Hi Evan,
I need to scatter the elements of a vector out to multiple processors.
The mapping is one to many (vector elements can go to many procs). I
would like to do this with a permutation matrix which has 1 nonzero
per row.
I'd like the process to run on the GPU, so a warp would need to
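Since such a matrix has exactly one nonzero per row, applying it is just an index gather, with repeated column indices giving the one-to-many fan-out; a minimal host-side sketch (names hypothetical, not ViennaCL API):

```cpp
#include <cstddef>
#include <vector>

// Row i of P picks source element col[i], so y = P * x reduces to a
// gather: y[i] = x[col[i]]. A source element may appear in col[] many
// times, which gives the one-to-many mapping described above.
std::vector<double> apply_permutation(const std::vector<std::size_t>& col,
                                      const std::vector<double>& x)
{
  std::vector<double> y(col.size());
  for (std::size_t i = 0; i < col.size(); ++i)
    y[i] = x[col[i]];
  return y;
}
```

On a GPU the same loop parallelizes trivially, one output element per thread.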
Hey,
I'm proud to announce that after about 3 weeks, I've recoded from scratch
the OpenCL code generator to integrate it fully with
viennacl::scheduler::statement.
Hurray :-) With the changes to the generator I pushed yesterday, there is
now a clear spot where to hand the expression over to
Hey,
My preferred option is to pad by default and to make the
padding a multiple of either four or sixteen. However, we need to maintain
a full set of unpadded operations, because user-provided buffers
need not be padded (and a subsequent padding may be too expensive)
I
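For reference, the usual round-up-to-a-multiple computation such a padding scheme would rely on (illustrative sketch, not the actual ViennaCL helper):

```cpp
#include <cstddef>

// Round n up to the next multiple of `pad` (pad = 4 or 16 as proposed above).
std::size_t padded_size(std::size_t n, std::size_t pad)
{
  return ((n + pad - 1) / pad) * pad;
}
```

A vector of logical size 42 would then be allocated with 44 (pad 4) or 48 (pad 16) elements, with the tail kept at zero.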
Hey,
see commit message here:
https://github.com/viennacl/viennacl-dev/commit/1a214259f577acd1b329197285e26cf2cd774e34
Best regards,
Karli
Hi Phil,
Thanks Karl !
This will serve us well when we deal with multiple GPUs ;)
I'm pretty happy with the model now, basically extending the concept of
a 'context' in OpenCL beyond OpenCL boundaries: Create vectors as follows:
viennacl::vector<T> x(42); // vector in default context
Hi guys,
as I was recently discussing asynchronous transfer and execution with
Evan in an MPI context, this is now addressed with
viennacl::async_copy()
Typical use case:
std::vector<double> std_x(SIZE);
viennacl::vector<double> vcl_x(SIZE);
viennacl::async_copy(std_x, vcl_x); // same
Hey,
Hmm, I'm not completely sure.
The best GEMM performers are generally not located (distance-wise in
the parameter space) around the sweet spot, since perturbing one
parameter can result in disastrous performance.
Yeah, I agree, the sweet spot may not be defined 'distance-wise', but
Hi,
A padding of 256 looks pretty expensive to me, resulting in a lot of
unnecessary FLOPs in the worst case. Can you please assemble a list of
all GEMM kernel configuration parameters and their execution times
for the GTX 470, Tesla C2050, HD 7970 and HD 5850? mL, nL, and kL
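For a rough feel of why a padding of 256 is expensive: GEMM costs about 2·m·n·k FLOPs, so in the worst case (all dimensions just above a multiple of 256) padding inflates the work by almost 8×. A small sketch of that estimate (illustrative only, not ViennaCL code):

```cpp
#include <cstddef>

std::size_t round_up(std::size_t n, std::size_t pad)
{
  return ((n + pad - 1) / pad) * pad;
}

// GEMM cost is ~2*m*n*k FLOPs; padding every dimension inflates it.
// Returns the ratio of padded work to exact work.
double gemm_padding_overhead(std::size_t m, std::size_t n, std::size_t k,
                             std::size_t pad)
{
  double padded = 2.0 * round_up(m, pad) * round_up(n, pad) * round_up(k, pad);
  double exact  = 2.0 * m * n * k;
  return padded / exact;
}
```

For m = n = k = 257 and pad = 256 the ratio is (512/257)³ ≈ 7.9, i.e. nearly eight times the necessary FLOPs.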
Hi Toby,
The main difficulty with following the conventions is that it's not
clear which convention to pick. NumPy provides both a matrix()
class and an ndarray() class -- the former has semantics closer to matrix
algebra, whilst the latter is designed to be closer to having more
Hi,
I've just realized I had forgotten to answer!
My computer is no longer laggy in single-threaded mode, which is already
a good thing :) It still cannot bear make -j4, even though it has 4GB of
RAM; my desktop computer can without any issue, though. I'll update this
when I have cleaned
Hi guys,
wow, AMD open-sourced their Math libraries...
Best regards,
Karli
---
*AMD Accelerated Parallel Processing Math Libraries (APPML) is now
available as open source as clMath.*
I am extremely pleased to have the opportunity to announce that the
APPML BLAS FFT
Hi Toby,
Karl Rupp r...@iue.tuwien.ac.at writes:
Parallella will ship their first (OpenCL-enabled!) boards in October and
also offers a university partner program:
http://www.parallella.org/pup/
I'd forgotten about these things! Their roadmap does look intriguing;
I'm looking forward
Hi,
I can't think of any such case where one would want to have control over
this. This would require knowledge of our implementations to make
appropriate choices anyway. In order to have a reasonable decision
process, we need to come up with some heuristics...
My first idea would be to
Hi Toby,
As I've been crafting the cmake files in order to prepare for a
PyViennaCL release, I've realised that the more I do this, the more the
PyViennaCL tree diverges from upstream, and the more merges I'll have to
take care of whenever I pull in upstream changes. So I've been more and
Hi Toby,
I get a lot of errors like the below when I enable T = char (or other
integer numeric types) in PyViennaCL. (...)
This is now fixed. You should be able to instantiate the basic types
with char, uchar, short, ushort as well. So far I have explicitly tested
this only with vector,
Hi guys,
we haven't had an IRC meeting for quite a while now. I'm finally done
with most of my relocation from the US back to Austria, so I propose to
have our next IRC meeting on Friday, October 4, at 15:00 UTC. Is this
okay for everybody interested in joining?
Potential topics:
- Final
.
Toby
Philippe Tillet phil.til...@gmail.com
writes:
Hey,
I'll be there!
Philippe
2013/10/2 Karl Rupp r...@iue.tuwien.ac.at
Hi guys,
we
Hi,
Rather than introducing yet another base class, what about allowing
implicit vectors in vector_base by suitable constructor arguments?
This will also keep compilation times under control :-)
I'm a bit confused, this solution would then allocate memory in the case
of :
Hey,
OPERATION_FUNCTION_SUB_TYPE_FAMILY (norm, prod, inner_prod, etc...)
OPERATION_ELEMENT_FUNCTION_SUB_TYPE_FAMILY (abs, pow, etc)
OPERATION_ELEMENT_OPERATOR_SUB_TYPE_FAMILY (+, ==, <, etc...)
I assume they are all within the same enum - go for it :-)
Best regards,
Karli
Hey, hey, hey,
The GRE is finally behind me...! I'm done with my vocabulary marathon
and can go back to some C++ coding!
Yeah! I hope GRE went well :-)
Is 2pm UTC tomorrow fine for everyone?
Fine for me. :-)
(Keep the change in daylight saving time in mind)
Best regards,
Karli
Hi Philippe,
I am done implementing:
x = viennacl::reduce<op>(viennacl::rows(A));
x = viennacl::reduce<op>(viennacl::cols(A));
s = viennacl::reduce<op>(x);
in the generator. For now, the supported ops are: add, mult, max, min. I
can't support them all, because I need to provide their neutral
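The neutral-element requirement mentioned above can be sketched as a generic host-side reduction (illustrative only, not the generated OpenCL code): each op needs an identity value to seed the accumulator.

```cpp
#include <algorithm>
#include <limits>
#include <vector>

// Each op needs a neutral (identity) element to seed the reduction:
// add -> 0, mult -> 1, max -> lowest value, min -> highest value.
template <typename T, typename Op>
T reduce_with_neutral(const std::vector<T>& v, Op op, T neutral)
{
  T acc = neutral;
  for (T x : v)
    acc = op(acc, x);
  return acc;
}
```

In a parallel reduction the neutral element additionally pads the unused work-item slots, so an op without a known identity cannot be supported generically.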
Hi Toby,
I don't have much to report, but I thought I should check in to show
that I'm still alive, since there's been a flurry of git activity
recently. I'm slowly getting on top of my workload, and should finally
be there around the beginning of December. I'll set aside a day then to
get
Hi Soufiane,
from your description it seems to me that you want to solve a
least-squares problem, while the GMRES implementation in ViennaCL is for
square systems. I suggest you use a QR-factorization as outlined in
examples/tutorial/least-squares.cpp.
If you have any good pointers for GMRES
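For reference, the QR route for least squares can be sketched without ViennaCL; this is a plain classical Gram-Schmidt toy (the actual tutorial uses ViennaCL's QR factorization, so the names here are illustrative):

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>; // matrix stored as a list of columns

static double dot(const Vec& a, const Vec& b)
{
  double s = 0;
  for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
  return s;
}

// Solve min ||A x - b|| via Gram-Schmidt QR: A = Q R, then R x = Q^T b.
Vec least_squares_qr(Mat A, Vec b)
{
  std::size_t n = A.size();
  std::vector<std::vector<double>> R(n, std::vector<double>(n, 0.0));
  for (std::size_t j = 0; j < n; ++j) {
    for (std::size_t i = 0; i < j; ++i) {  // orthogonalize against previous q_i
      R[i][j] = dot(A[i], A[j]);
      for (std::size_t k = 0; k < A[j].size(); ++k)
        A[j][k] -= R[i][j] * A[i][k];
    }
    R[j][j] = std::sqrt(dot(A[j], A[j])); // normalize; A[j] now holds q_j
    for (std::size_t k = 0; k < A[j].size(); ++k)
      A[j][k] /= R[j][j];
  }
  Vec x(n);
  for (std::size_t j = n; j-- > 0; ) {     // back-substitution on R x = Q^T b
    x[j] = dot(A[j], b);
    for (std::size_t i = j + 1; i < n; ++i)
      x[j] -= R[j][i] * x[i];
    x[j] /= R[j][j];
  }
  return x;
}
```

GMRES, by contrast, builds a Krylov basis for a square operator, which is why it is not the right tool for a rectangular system.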
Hi Soufiane,
My mistake came from a misunderstanding of the documentation. I read:
Conjugate Gradient (CG): symmetric positive definite
Stabilized Bi-CG (BiCGStab): non-symmetric
Generalized Minimum Residual (GMRES): general
I read 'general' as meaning A can be NxM, while it means any kind of square matrix.
Ah,
Hi Albert,
I thought that a good way to get good performance is to formulate all
the calculations somehow vectorized but I'm not sure if I have chosen
the best way because the code performs badly. The matrices are big,
about 10k × 10k in size.
This is correct, provided that the
be faster for large matrices because of the way
threads are assigned. I guess that most of your execution time is spent
elsewhere, so it's probably not worth optimizing further...
Best regards,
Karli
On Thu, Dec 12, 2013 at 2:49 PM, Karl Rupp r...@iue.tuwien.ac.at wrote:
Hi Albert,
I thought
Hi Albert,
this looks like you are running out of memory. Do you happen to know how
much video RAM you have on your machine? The matrix will eat up ~400 MB
of memory, which you need to add to the other video RAM consumed by the
operating system.
Best regards,
Karli
On 12/12/2013 03:19 PM,
I'm sure Philippe wanted to send this to viennacl-devel ;-)
Original Message
Subject:Re: [ViennaCL-devel] How to use multiple cores/CPUs
Date: Sun, 15 Dec 2013 16:58:14 +0800
From: Philippe Tillet phil.til...@gmail.com
To: Karl Rupp r...@iue.tuwien.ac.at
Hey,
I agree. However, it seems to me that setting the implementation for
each matrix would end up being tedious... One table per memory backend
seems to make sense conceptually to me, since the performance (and the
portability) of each BLAS implementation is determined by the underlying
Hi,
Yeah, it certainly is a bit tedious. Feel free to only do this for
matrix-matrix multiplications for now, a full operation table is
presumably too much of a refactoring for ViennaCL 1.x.y, but much
better suited for 2.0.0.
Yes. It's actually a pretty complicated
Hi Toby,
Hmm. Seems that that wasn't enough, so I split up the sources. I've got
the peak RAM usage down to ~1100 MiB, which hopefully will do the
trick. I'm off to bed, so we'll know in the morning..
Nice, ~1GB should be fine, as this is also not that unusual with other
projects. I intend
Hey,
There is some trickery going on with transpositions and layout, but it
works for every transpose/layout combination. One can also link A's BLAS
to his own GEMM function, provided a tiny wrapper (essentially to ensure
signature
Hi,
I've started back on the generator today, and realized how ugly the
dispatching mechanism was, to take advantage of the equivalencies based
on the fact that
RowMajor + Trans = ColMajor + NoTrans
Actually, I've been wondering : why wouldn't we do this on the whole
codebase?
We should
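The equivalence above can be checked at the index level: transposing a row-major m×n matrix gives exactly the column-major layout of the same buffer. A small self-contained sketch (illustrative, not ViennaCL's internal indexing code):

```cpp
#include <cstddef>

// Linear index of element (i, j) of an m x n matrix in each layout:
std::size_t row_major(std::size_t i, std::size_t j,
                      std::size_t /*m*/, std::size_t n)
{
  return i * n + j;
}

std::size_t col_major(std::size_t i, std::size_t j,
                      std::size_t m, std::size_t /*n*/)
{
  return j * m + i;
}
// Identity behind "RowMajor + Trans = ColMajor + NoTrans":
// col_major(i, j, m, n) == row_major(j, i, n, m) for all i < m, j < n.
```

So a kernel for one of the four transpose/layout cases can serve its mirror case by swapping indices, without any data movement.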
Hi Toby,
please allow for ~1 more day, then 1.5.0 is out and I'm available for
testing :-)
Best regards,
Karli
On 12/17/2013 08:14 AM, Toby St Clere Smithe wrote:
Toby St Clere Smithe m...@tsmithe.net
writes:
Yep, looks like the build was successful, so I'll go ahead and make sure
it's
Hey,
*Sneeks in*
(Seems like it's time to hide an if (rand() > RAND_MAX/2) return;
somewhere in the code where Karl won't find it !)
:D
:-P You'd have to trick git into not displaying the change ;-)
Best regards,
Karli
Hi Toby,
So I've uploaded PyViennaCL packages which don't have the shared_ptr
troubles that my previous ones did (in the end, I split what used to be
a single Python extension into many smaller extensions under one package
namespace). These packages seem mostly to work, and (for instance),
Hey,
So it turned out that it wasn't anything to do with my previous error,
just that having split up the files, I had not put the OpenCL #define in
all the right places.
In fact, come to think of it, that may be the cause of the shared_ptr
troubles. Agh!!
Ah, I see. What about setting
for matrices.
The full change logs can be found at
http://viennacl.sourceforge.net/changelog.txt
Thanks to all contributors :-)
Best regards and best wishes for 2014,
Karl Rupp
Hey,
In fact, there are still some features that I don't have in PyViennaCL:
Karl Rupp r...@iue.tuwien.ac.at writes:
- Added norm_frobenius() for computing the Frobenius norm of dense
matrices.
- Multiple OpenCL contexts can now be used in a multi-threaded setting
(one thread per
Hi Toby,
alright, finally some first testing experience. This is a bunch of very
basic information, as I'm only now in the state where I can mimic a new
PyViennaCL user ;-)
* Installation: Works nicely from the PPA, no problems with my Linux
Mint Maya (based on Ubuntu 12.04, so this is
!
Best regards,
Karl Rupp
Hi Philippe,
I'm slowly getting back to ViennaCL.
I have added one bullet point to the roadmap:
* Full integration of the micro-scheduler and the generator
Yep, definitely. See issue #8, it's already on the TODO-list for the
1.5.x branch. The nice thing is that this is completely internal
Hey,
So today I went back to ViennaCL. I tried to move the equivalence
column+trans = row+notrans upwards in the dispatching mechanism, but it
turns out to be impossible, because matrix<T, row_major> is not (and
should not be) convertible to matrix<T, column_major>, rendering the
underlying
Hi,
Yes it does! Actually, what we would ideally do is to, by default, link
ViennaCL to the integrated set of numerical kernels (those of
libviennacl, which would be generated dynamically for the OpenCL
backend), and allow one to switch backend to
MKL/OpenBLAS/CuBLAS/FunFunFunBLAS... The
Hey,
I am a bit confused, is there any reason for using reciprocal and
flip_sign, instead of just changing the scalar accordingly?
yes (with a drawback I'll discuss at the end): Consider the family of
operations
x = +- y OP1 a +- z OP2 b
where x, y, and z are vectors, OP1 and OP2 are
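The two flags can be folded into the scalar on the fly inside the kernel, so a single compiled kernel covers multiplication, division, and both signs. A minimal host-side C++ sketch of that logic (names hypothetical, not the actual ViennaCL kernel code):

```cpp
// The kernel receives alpha as-is plus two flags; applying them inside
// the kernel avoids an extra host-side scalar read or kernel launch.
double effective_scalar(double alpha, bool reciprocal, bool flip_sign)
{
  double a = reciprocal ? 1.0 / alpha : alpha;
  return flip_sign ? -a : a;
}

// x = +/- y OP1 a: one compiled kernel covers y*a, y/a, -y*a and -y/a.
double apply(double y, double alpha, bool reciprocal, bool flip_sign)
{
  return y * effective_scalar(alpha, reciprocal, flip_sign);
}
```

This matters most when alpha lives on the device: the host never needs to read it back just to invert or negate it.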
Hi,
I was in fact wondering why one passed reciprocal_alpha and flip_sign
into the kernel. After thinking more about it, I have noticed that this
permits us to do the corresponding inversion/multiplication within the
kernel, and therefore avoid some latency penalty / kernel launch
Hi,
I prefer option 3. This would allow for something like :
if (size(x) > 1e5 && stride == 1 && start == 0) {
Here we also need to check the internal_size to fit the vector width
//The following steps are costly for small vectors
NumericT cpu_alpha = alpha; // copy back to host when the scalar is
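That dispatch criterion can be sketched as plain C++ (the 1e5 threshold and all names here are illustrative assumptions, not ViennaCL's actual heuristic):

```cpp
#include <cstddef>

// Hypothetical dispatch: for large contiguous vectors, keep alpha on the
// device and let the kernel apply the flags; otherwise pay the
// (relatively cheap for small work) host read-back of the scalar.
enum class ScalarPath { keep_on_device, copy_to_host };

ScalarPath choose_scalar_path(std::size_t size, std::size_t stride,
                              std::size_t start)
{
  if (size > 100000 && stride == 1 && start == 0)
    return ScalarPath::keep_on_device;
  return ScalarPath::copy_to_host;
}
```

The point of option 3 is exactly this asymmetry: the read-back cost is only amortized away when the vector kernel itself is long-running.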
Hey hey hey,
Convergence depends on what is inside generate_execute() ;-) How is
the problem with alpha and beta residing on the GPU addressed? How
will the batch-compilation look like? The important point is that
for the default axpy kernels we really don't want to go
Hi Philippe,
I don't understand why this would go through more than one compilation...
This kernel is compiled only once, the value of flip_sign and reciprocal
only changes the dynamic value of the argument, not the source code.
This would eventually result in:
if(alpha_reciprocal)
Hi Phil,
Oh, I get it better now. I am not entirely convinced, though ;)
From my experience, the overhead of the jit launch is negligible
compared to the compilation of one kernel. I'm not sure whether
compiling two kernels in the same program or two different program
creates a big
Hi Philippe,
I have found this relatively new and interesting PDF file :
http://www.altera.com/literature/hb/opencl-sdk/aocl_optimization_guide.pdf
. I'll read it overnight. This is of course for a mid/long-term
perspective, but there are some remarkable points within, for example
(some
Hi guys,
the Google Summer of Code [1] is approaching. It attracted some great
contributors in the past, most notably Philippe and Toby, and I hope
there's more to come. So, guys, please provide your project ideas.
My experience is that good projects are those which don't require the
student
Hi Toby,
the Google Summer of Code [1] is approaching. It attracted some great
contributors in the past, most notably Philippe and Toby, and I hope
there's more to come. So, guys, please provide your project ideas.
I have a couple of inter-related things I'd like to work on, to make
Hey,
I'll be once more available as a mentor :)
Yeah, great! :-)
I'll be myself pretty busy with some BLAS2/BLAS3 tuning for Hawaii.
I don't think this is going to be a problem.
I'm
also in favor of ideas of projects which don't require a strong
knowledge of the current codebase, such
Hi Toby,
The first project I have in mind is the benchmarking GUI we brainstormed
about in IRC. It's probably a good idea to push out a first working
version in the next weeks and then let the student work on refinements
such as a visualization of the results, etc.
Were you thinking of
Hey,
So, as of now, the generation of row-wise reduction can be triggered
through the interface:
viennacl::reduce<op_add>(viennacl::row_wise(Mat))
viennacl::reduce<op_max>(viennacl::col_wise(Mat))
viennacl::reduce<op_min>(Vec)
This plugs into a statement under the form:
Hi guys,
in the past few days we worked here in Vienna on setting up an automated
nightly build system based on CTest and CDash. It isn't fully completed
yet, but it already starts to pay off:
http://jwein2.iue.tuwien.ac.at:5/CDash/index.php?project=ViennaCL
Philippe, could you please
Hi,
As long as you're a student, you're eligible to apply for GSoC. ;-)
However, I don't give any guarantees, your application will be treated
equally. You certainly have an advantage with respect to how things
work, but no other student should be excluded upfront.
It would definitely be
Hi Philippe,
I completely agree, concerning matrix-free implementations of the linear
solver.
Their absence is the very reason why I had to reimplement solvers for
UMinTL.
I assume you are aware that you can overload viennacl::linalg::prod()
for whatever custom 'matrix' type you pass to
on the PETSc-ViennaCL
bindings. (This would certainly require a fairly experienced student)
Best regards,
Karli
On Saturday, February 15, 2014, Karl Rupp r...@iue.tuwien.ac.at wrote:
Hi Philippe,
I completely agree, concerning matrix-free
Dear Gengdai Liu,
I'm a new user of viennacl. I have a problem when I used it.
I have already successfully built all examples with default setting (but
with Eigen and CUDA turned on). When I switched to using the CUDA backend
for the eigen-with-viennacl example (VIENNACL_WITH_CUDA is defined), I got
Hi Toby,
so there's one last thing before I can get a release out that's been
bugging me for the last couple of days: I can't seem to get the
iterative solvers to work, for either dense or sparse matrices. I've
tried an implementation of the 'mat65k.mtx' example, I've tried using
the
Hey,
I'm on it. A couple of notes on the setup:
* The manual checkouts of external/boost_numpy and
external/viennacl-dev don't work for me (git 1.7.9.5). Is this supposed
to be automatic? If not, there should be instructions in the README.
Hmm, yes -- the README as it is at the moment
Hi again,
back to the original problem: The RHS is not passed correctly. Use the
sample system attached, it is just 4x4 and should converge nicely. If I
print the RHS vector passed to the iterative solver, it is all zero.
Therefore, the solver doesn't even start to iterate, but instead
Hi,
Argl, the reason is that I had to do a manual clone of your Boost.NumPy
repo and did not change the branch. I don't think it's good to have a
patched repo for Boost.NumPy around...
Ah -- if you do the git submodule update --init command, I think it does
that for you. In any case, I
Hey,
Concerning the Norm, I have made minor changes in the
operator/operator-subfamily for a new version of the generator. Maybe your
statement parsing relies on the former family? The type
(OPERATION_UNARY_NORM_*_TYPE) has not changed, though...
Do you have a link to a diff I can read?
Hi Toby,
For most of the PyViennaCL functions, where the prototypes are
compatible, I pass vector_base objects to ViennaCL, because that way I
don't have to have a large number of identical functions for vector,
vector_proxy, etc. In this case, I was passing the vector as a
vector_base
Hi Toby,
That would explain why the vector mysteriously disappears. Notably, when
I put the print commands in my C++ code, I put them *after* the solver
call (probably a mistake to put them there, in hindsight, but
nonetheless it seems to have been useful!).
Why does *_base need a copy
Hi Toby,
The time has finally (almost) come. I'd like to make a 1.0.0 release
tomorrow (or rather, later today), but first of all, there are three
things I'd like to happen.
Yeah, great! :-)
Firstly, I need to move pyviennacl into its own repository under
viennacl-dev on GitHub. I don't
Hi again,
Karl Rupp r...@iue.tuwien.ac.at writes:
alright, I added you to the developer group. I don't know whether this
is sufficient for write permissions at the project web, but just give it
a try. User permission management is quite coarse-grained on sf.net.
Great! See
http
Hey,
Looks good! Evan's suggestion makes me think that the example code
should be made really obvious. Could you put another link there like
[PyViennaCL Examples] pointing at
http://viennacl.sourceforge.net/pyviennacl/doc/examples/index.html
?
Done.
Please let me know when the tarball
Hi Toby,
I don't have much expertise with MSVC, but I'm trying to build
pyviennacl, and I've got a couple of weird bugs. I've looked at the
source, but really have no idea what it's complaining about. Why can't
it resolve the type ambiguity here?
Hi Philippe,
I've recently been obtaining significant performance improvements out of
the kernel generator, which should bring ViennaCL 1.6 extremely close
(95%) to CuBLAS (on NVidia hardware) and clAmdBlas (on AMD hardware)
for BLAS1/Dense BLAS2/Dense BLAS3.
Excellent, great news! :-)
Hey,
Karl Rupp r...@iue.tuwien.ac.at writes:
alright, this looks like the issue is with dense matrices being passed
to BiCGStab. Does the build work if you disable the dense matrices for
the iterative solvers? If so, then I think you can temporarily fix this
within PyViennaCL and we don't
Hi Toby,
thanks, I'll test it tonight and let you know about the outcome. :-)
Best regards,
Karli
On 02/26/2014 03:05 PM, Toby St Clere Smithe wrote:
Hi all,
So with the new build system it is now possible to build PyViennaCL on
Windows. I only have Windows in a virtual machine, so it
Hi Toby,
sorry, I got delayed, have to wait until tomorrow. What's the current
status of the dense matrices in BiCGStab? Does the compilation problem
still show up?
Best regards,
Karli
On 02/26/2014 03:08 PM, Karl Rupp wrote:
Hi Toby,
thanks, I'll test it tonight and let you know about
Hey,
Yes, that worked, see my comment here:
https://github.com/viennacl/pyviennacl-dev/issues/2
Installation now succeeded. :-)
Did you change anything else since then?
Nope! I assume all is well, then :)
Okay, then all is well. Please wait another hour before you start the
packaging
Hey,
Okay, then all is well. Please wait another hour before you start the
packaging process, I might add something to the README file :-)
Sure -- just e-mail when ready :)
Ready:
https://github.com/viennacl/pyviennacl-dev/commit/261a3b8c5ad0f69e57a3f1c5e9ad469e29cf84ad
Best regards,
Karli
Hi,
See above :-) There are good reasons for dropping infos(),
particularly as we cannot assume that each OpenCL SDK returns the
requested information as rapidly as we might need it.
Hmm, then a good solution would be to internally use infos whenever a
viennacl::ocl object is
Hi,
(CC-ing viennacl-devel, as this is developer-talk ;-) )
Either way, I want to let you know that the generator/auto-tuner is
undergoing significant changes, and that you will, actually, not have to
worry about it for your GSoC project. The generator will be used
transparently via the
Hi,
Well, I think this is not entirely unrelated. The purpose of the GUI
is still to allow a broader community to feed us with benchmark
data, so somehow the loop over all possible configurations is still
essential. With an interface to Python I assume that an API to do
Hi,
Why is data pointless? I'd rather have only a few datapoints on new
hardware out there rather than having absolutely no data at all.
I mean, the data is pretty useful because it tells us about the best
default kernel for large square matrices, but it is not very useful if
we
can
expect in 1.6?
Thanks,
-Matt
On Jan 21, 2014 4:53 PM, Karl Rupp r...@iue.tuwien.ac.at wrote:
Hi Philippe,
I'm slowly getting back to ViennaCL.
I have added one bullet point to the roadmap:
* Full integration of the micro-scheduler
https://github.com/viennacl/viennacl-dev/wiki/ViennaCL-Developer-Meetings
On 05/11/2014 12:13 PM, Karl Rupp wrote:
Hi guys,
I think it's time to refine our plans for a 1.6.0 release as well as the
interaction with our two GSoC projects. To do so, please enter your
availability for an IRC session here
Philippe
2014-05-21 21:35 GMT+02:00 Karl Rupp r...@iue.tuwien.ac.at:
Hi,
I've slightly modified the CG implementation to handle
preconditioned
CG and unpreconditioned CG in the same routine (without
Hi Sumit,
If I were to use an Eigen Matrix, I can potentially access its raw data
using the Matrix.data() function. Is there anything similar in ViennaCL
also?
yes, there is. The respective member function is called handle() and
returns a multi-backend handle. It then depends where the
Hey,
The integration of the generator is going on slowly but safely. Vector
kernels are fully integrated and I'm about to support some matrix kernels
as well (excluding FFT, LU, and a few others).
:-)
I have one metaphysical question, though.
Then my answer is '42' ;-)
There are two
Hi,
Have you heard of this Google project :
https://code.google.com/p/cppclean/
For some reason it no longer contains code and has been forked on
https://github.com/myint/cppclean
I've once run the ViennaCL codebase against cppcheck
(http://cppcheck.sourceforge.net/), which is somewhat
Hey,
Bear in mind also that PyViennaCL is such a shared library interface
right now, and already has a fairly demanding compilation!
Indeed!
One quick question: does the explicit conversion step allocate any new
memory, or is it treated like a cast?
Such an explicit conversion creates new
Hi,
This sounds reasonable indeed. I need a casting operation_node_type for
the generator to control explicit casting within a generated kernel, but
it sounds very reasonable to only allow such constructors.
The casting functionality can still be part of the generator, there's no
reason
Hi,
This is a report on what was done so far, and what remains to be done on
the Benchmark GUI project.
Quick overview of what was done so far:
-all benchmarks implemented and runnable from the GUI
-result visualization of benchmarks
-UI menu and navigation
-CMake build system
Good,
Hi Toby,
I reiterate my call for an IRC meeting
fine with me. My schedule is very much in flux in the next ~10 days or
so, so I might be unavailable on short notice. Rather than having one
big IRC meeting with all topics crushed together, I suggest we have a
couple of smaller topic-oriented
Hi,
the cases 5, 6, and 7 are handled by running a kernel for four vectors,
then subtract '4' and run a dedicated kernel on the remaining 1, 2, or 3
vectors. This could also be handled by a generated kernel, yes, but I
haven't implemented this for two reasons:
1. fewer kernels to compile
2.
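The splitting described above (batches of four vectors plus one dedicated remainder kernel for 1, 2, or 3) can be sketched as follows (illustrative helper, not the actual ViennaCL dispatch):

```cpp
#include <vector>

// Split a request for n vectors into batches of 4 plus at most one
// remainder batch, so cases 5, 6 and 7 reuse the existing kernels:
// 5 -> {4, 1}, 6 -> {4, 2}, 7 -> {4, 3}.
std::vector<int> kernel_batches(int n)
{
  std::vector<int> batches;
  while (n >= 4) {
    batches.push_back(4);
    n -= 4;
  }
  if (n > 0)
    batches.push_back(n);
  return batches;
}
```

Only four kernels (for 1, 2, 3, and 4 vectors) ever need to be compiled, at the price of one extra kernel launch for the remainder.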
Hey,
Unfortunately I won't be available until Tuesday for a meeting. Python
and CUDA-based libraries are widely used by the Machine Learning
community. I also want to push OpenCL forwards, but supporting CUDA
through PyViennaCL would be a very good thing to do, since a lot of
researchers
Hi Toby,
I'll be available from Tuesday afternoon on. What about wednesday
13:00 UTC
and 15:00 UTC?
Both of these are fine by me! Sorry for the delay responding.
Let's stay with 15:00 UTC, then, after the tea!
Karl has been pretty busy lately. In case he cannot come on
Hi Andreas,
I should be fine with Wednesday, 15:00 UTC.
I can make that, too, for at least a half hour. :)
Excellent!
Where is this taking place? (I.e. what IRC channel or some such?)
If that's not yet determined, I run this:
https://ssl.tiker.net/chat/
which we could use. (Also
Hi,
I've already told this on IRC. The GEMV uses very conservative profiles
with very few threads. Now that I have ported a simple version of GEMM
(when only full matrices are used), I'll re-bind the generator into
pyviennacl and will try to get an auto-tuning up and running in Python.
Then,
Hey,
I made a small mistake when creating these conservative profiles. GEMV
runs with only one work group. I'll fix this, don't worry :)
ah, that's an easy fix then. Thanks!
Best regards,
Karli
2014-07-06 13:31 GMT+02:00 Karl Rupp r...@iue.tuwien.ac.at
hardware-specific tuning. But for everything else
(which is pretty much equivalent to memory bandwidth limited) we can
pretty much 'guess' a good configuration and get close to the practical
peak.
Best regards,
Karli
2014-07-06 13:37 GMT+02:00 Karl Rupp r...@iue.tuwien.ac.at
Hey,
our Nightly tests report new issues with some examples, which are
probably all due to GEMM:
http://viennastar.iue.tuwien.ac.at/CDash/index.php?project=ViennaCL
(also look at the previous day)
Philippe, I see a bunch of recent commits. Is it possible that this got
fixed in the meanwhile?