Hi Andreas,
> I have a program which has to iteratively solve a number of
> independent equation systems.
> The actual number of systems may vary between 1 and 10.
> Because they are totally independent from each other, we assign each system a
> number of OpenMP threads allowing them to be solved
Hi Evan,
> Building against Boost is a challenge with CMake. I repeatedly run
> into problems building ViennaCL against boost_filesystem and
> boost_system libraries depending on how the host has them built.
I agree, it's indeed often unnecessarily challenging on Windows. Do you
encounter the s
Hi Evan,
> Can you refresh my memory on how ViennaCL behaves with this operation:
>
> vector = vector - vector_range
>
> I would like to subtract an M < N sized vector_range from an N sized
> vector (by operating on the first M elements). Is this supported in
> 1.4.2?
Yes, this is supported. You
>>>
>>> On Sun, Jul 21, 2013 at 4:13 PM, Evan Bollig wrote:
>>>> Ah I see. Thanks for the clarification on projecting the LHS. I was
>>>> trying to assign the result to an unprojected vector assuming (vector
>>>> - vector
Hey,
thanks for the hint! There's one thing that is of particular interest
for us: The portable intermediate representation, which is essentially an
LLVM IR standardized for OpenCL. This should cut down compilation times
and at the same time allow for a few more optimizations which are hard
to
rn a vector_range (i.e., operation returns operand type).
>>
>> -E
>>
>> On Sun, Jul 21, 2013 at 4:09 PM, Karl Rupp wrote:
>>> Hi Evan,
>>>
>>>
>>>> Can you refresh my memory on how ViennaCL behaves with this operation:
>>>
Hi Evan,
thanks, this is now fixed.
Unfortunately GCC 4.6 and above use some mixed C++11 compilation mode,
in which these extra typename keywords are perfectly valid. They are not
allowed in C++03, thus GCC 4.4 is right about complaining here.
Any hints on how to make GCC 4.6 and above complai
Hi Andreas,
the rigorous solution of the problem turns out to require more effort
than anticipated if the generic solver implementations are not to get
messed up. I'll need another day or two to correct this.
Best regards,
Karli
On 07/21/2013 09:59 AM, Andreas IHU wrote:
> Hi again,
>
>
latest dev and test against that.
>>
>> Cheers,
>> -E
>>
>> On Mon, Jul 22, 2013 at 1:47 PM, Karl Rupp wrote:
>>> Hi Evan,
>>>
>>> I just pushed support for sparse matrix-vector products when using
>>> vector-ranges and vector-sli
Hi Evan,
> Hey Karl, does the cuthill mckee algorithm from ViennaCL account for
> unsymmetric matrices?
I looked it up in the bachelor thesis of the student and he assumed a
symmetric *graph*. Hence, if your matrix is structurally symmetric, but
non-symmetric in terms of values, things should
>> -E
>>
>> On Wed, Jul 24, 2013 at 4:53 PM, Karl Rupp wrote:
>>> Hi Evan,
>>>
>>> this is strange, as there are tests checking for exactly this.
>>> Could you please run a 'make clean && make' in the build folder? I su
>> Thanks Karl. I thought it was strange too. I blasted my ~/.nv folder
>> and am waiting for make clean && make to finish. I'll keep you posted.
>>
>> -E
>>
>> On Wed, Jul 24, 2013 at 4:53 PM, Karl Rupp wrote:
>>> Hi Evan,
>>>
>>> this is s
morning.
>
> -E
>
> On Wed, Jul 24, 2013 at 9:03 PM, Evan Bollig wrote:
>> Ok, I'll take a look. Yes, I'm using ELL.
>>
>> -Evan Bollig
>>
>> On Jul 24, 2013 6:34 PM, "Karl Rupp" wrote:
>>>
>>> Hi again,
>>>
>>> whic
Hi Evan,
this is now fixed for COO, ELL, and HYB formats on OpenCL. CPU and CUDA
backends will be fixed later today.
Best regards,
Karli
On 07/24/2013 11:24 PM, Evan Bollig wrote:
> Cool. I appreciate the help!
>
> -E
>
> On Wed, Jul 24, 2013 at 11:20 PM, Karl Rupp wr
Hi Evan,
> I need to scatter the elements of a vector out to multiple processors.
> The mapping is one to many (vector elements can go to many procs). I
> would like to do this with a permutation matrix which has 1 nonzero
> per row.
>
> I'd like the process to run on the GPU, so a warp would need
Hi,
@all: We will use this mailing list for *all* development discussions
now, replacing previous private email communications - expect quite some
additional traffic. :-)
> I did some brief benchmarks using the following function:
>
def dobench(size):
> ... v = p.Vector(size, 0.1)
> .
Hey,
> I'm proud to announce that after about 3 weeks, I've recoded from scratch
> the OpenCL code generator to integrate it fully with
> viennacl::scheduler::statement.
hurray :-) With the changes to the generator I pushed yesterday there is
now a clear spot where to hand the expression over
Hey,
> My preferred option is to pad by default and either to make the
> padding a multiple of four or sixteen. However, we need to maintain
> a full set of unpadded operations, because user-provided buffers
> need not be padded (and a subsequent padding may be too expensive)
>
>
Hey,
see commit message here:
https://github.com/viennacl/viennacl-dev/commit/1a214259f577acd1b329197285e26cf2cd774e34
Best regards,
Karli
--
See everything from the browser to the database with AppDynamics
Get end-to-en
Hi Andreas,
- good news:
I made good progress on introducing a generic context. For code lines
such as
viennacl::vector<T> x = y + z;
the vector x is created in the correct context (i.e. deduced from y, z).
This resolves most of the issues with temporaries.
- bad news:
I haven't found enough tim
Hi Andreas,
I just pushed the remaining code changes to the viennacl-dev repository.
examples/benchmarks/solver.cpp now runs the iterative solvers and
preconditioners in their own context, passing viennacl::context() to the
constructor to overwrite the default context. This should allow for the
Hi Phil,
> Thanks Karl !
> This will serve us well when we deal with multiple GPUs ;)
I'm pretty happy with the model now, basically extending the concept of
a 'context' in OpenCL beyond OpenCL boundaries: Create vectors as follows:
viennacl::vector<T> x(42); // vector in default context
vi
Hi guys,
as I was recently discussing asynchronous transfer and execution with
Evan in an MPI context, this is now addressed with
viennacl::async_copy()
Typical use case:
std::vector<double> std_x(SIZE);
viennacl::vector<double> vcl_x(SIZE);
viennacl::async_copy(std_x, vcl_x); // same as next line
Hi Phil,
> The generator code is pushed on the master branch.
Cool, thanks. I actually wasn't expecting this to arrive in master today
:-)
I commented the commit on github. The short summary is:
1.) I don't quite know/see why we need SYMBOLIC_*, since a true symbolic
operation could equally
Hi Toby,
thanks for submitting the evaluation :-)
> Now for the couple of questions...
>
> I'm currently thinking that I need to rewrite the majority of the Python
> expression tree code in C++/Boost.Python, because (as I feared a while
> ago) there's quite a lot of overhead in using pure Python
Hey,
>> I started going through the code, and you're right, the change isn't
>> very large. But it makes sense to split it up as you describe, to keep
>> the different semantics different, and same semantics shared (ie, to
>> make sure the concepts are as clear as possible). I also have another
>
Hi,
On 07/31/2013 03:48 PM, Philippe Tillet wrote:
> Hi,
>
> I've explored a bit of the execute_*.hpp files, but I'm not sure how to
> integrate the generator here, ie when to create the kernel generation
> object, when to add statements for generation, where to trigger
> generation, etc... Any id
Hi ho,
> However, my question was rather about packing multiple operations
> together, and specifically scoping the necessary kernel generator object
> (or more generally scoping the std::vector that has to be
> packed together for generation/execution)
> *bool code_generator::add(statement)*
> r
Hi Toby,
yes, I totally agree that we should have different types of exceptions.
Please commit it yourself directly, you should have push permissions. :-)
Best regards,
Karli
On 08/01/2013 02:01 PM, Toby St Clere Smithe wrote:
> Toby St Clere Smithe
> writes:
>> Actually, whoops -- that's bug
Hi again,
actually, please introduce an exception derived from std::exception just
like for the scheduler.
Thanks and best regards,
Karli
On 08/01/2013 02:01 PM, Toby St Clere Smithe wrote:
> Toby St Clere Smithe
> writes:
>> Actually, whoops -- that's buggy. I'll fix it...
>
> OK -- fixed pa
Hi,
>> yes, I totally agree that we should have different types of exceptions.
>> Please commit it yourself directly, you should have push permissions. :-)
>
> Right-ho, will do :) (I also managed to coerce Boost into handling char*
> exceptions -- the trick was remembering that they're all /cons
Hey,
>> Oh no - let's hope that this doesn't delay the process of actually
>> 'doing it right', i.e. throw exceptions derived from std::exception
>> rather than quick&dirty const char* stuff from the prototyping stage... ;-)
>
> Haha, it won't: I believe the time cost of not being able to write an
Hi Toby,
it certainly has to do with your scalar_vector. Have a look at the
documentation for fast_copy:
http://viennacl.sourceforge.net/doc/namespaceviennacl.html#a815cf9646ece6cc98ec80b3f925c482d
"However, keep in mind that the cpu type MUST represent a linear piece
of memory, otherwise you
Hi,
> I have had troubles compiling matrix-test-* for quite some time, but it
> has gotten worse over time. The compilation process appears to eat up one
> core at 100% (I have a Core i5!) and over 1GB of RAM, which is enough to
> freeze my computer for 20-25sec.
100% is just what a core is suppos
Hi,
> I've been thinking a bit about dynamically zero-padding
> viennacl::matrix<> for full hardware use ( best bandwidth for BLAS1,
> BLAS2, best performance for BLAS3).
>
> Basically, the big problem arising is that the blocking-parameter is not
> dependent on the hardware or the matrix, but ra
Hey,
>
> Hmm, I'm not completely sure.
> The best GEMM performances are not located "around" (distance-wise in the
> parameter space) the sweet spot, generally, since perturbing one
> parameter can result in disastrous performance.
Yeah, I agree, the sweet spot may not be defined 'distance-wise
Hi Evan,
> OpenMP 4.0 specification has been released, which includes support for
> accelerators, thread affinity, Fortran 2003, etc.:
>
> http://www.hpcwire.com/hpcwire/2013-07-31/openmp_40_specification_released_with_significant_new_standard_features.html
Thanks! I consider the thread affinity
Hey,
> I hope it won't take years.
First compiler implementations will be available in no time, sure.
However, it will take years until enterprise cluster systems like CentOS
have upgraded to these compilers. We still have clusters here with GCC
4.2.x...
> I saw a presentation earlier today th
Hi Phil,
the tests are now split into more light-weight units by separating
single and double precision. matrix-test was additionally split into
row-major and column-major tests. This should now allow you to build with
`make -j4`
on weaker machines with limited RAM.
Best regards,
Karli
On 0
Hi,
> A padding of 256 looks pretty expensive to me, resulting in a lot of
> unnecessary FLOPs in worst case. Can you please assemble a list of
> all GEMM kernel configuration parameters and their execution times
> for the GTX 470, Tesla C2050, HD 7970 and HD 5850? mL, nL, and kL
>
Hi,
> We actually need two sets of files: One for dumping the benchmark
> results, one for holding the 'best' parameter configuration. For
> dumping results, we probably want something more lightweight than XML:
> - JSON
> - Just CSV files with a metadata section, e.g.
> #
Hi,
> Suppose we have vectors v1 and v2. Then, we have four options for the
> semantics of "v1 * v2":
>
> 1) Element-wise product
> 2) Dot product
> 3) Outer product
> 4) Leave undefined
>
> Most of the time, in the rest of PyViennaCL, I've chosen semantics for
> the * operator that make sense gi
Hi Toby,
> The main difficulty with following the conventions is that it's not
> clear which is the convention to pick. NumPy provides both a matrix()
> class and a ndarray() class -- the former has semantics closer to matrix
> algebra, whilst the latter is designed to be closer to having more
>
Hi Toby,
> At the moment, I have the following semantics for *:
>
>Matrix * Matrix -> Matrix (matrix product)
>Matrix * Vector -> Vector (matrix product)
>Matrix * Scalar -> Matrix (scalar product)
>
>Vector * Vector -> Matrix (outer product)
>Vector * Scalar -> Vector (scalar
Hi Toby,
>> I consider Vector * Vector -> Matrix to be surprising or at least
>> somewhat non-intuitive. Following your operations, the most reasonable
>> definition would be
>>
>> >Matrix * Matrix -> Matrix (matrix product)
>> >Vector * Vector -> Vector (element-wise product)
>> >
Hi,
> I've just realized i had forgotten to answer!
> My computer is no longer laggy in single-threaded mode, which is already
> a good thing :) It still cannot bear make -j4, even though it has 4GB of
> RAM; my desktop computer can without any issue, though. I'll update this
> when I have cleane
Hi,
> For a few days, I've been playing around with AMD's CodeXL, the HD5850
> and the generator/autotuner:
>
>
> - First of all, I want to share something that made me completely crazy.
> Avoid :
> *vector += scalar*vector
> *
> in a compute bound context. After replacing the above by:
> *vector.
Hi Toby,
> I'm currently trying to implement matrix and vector proxies in
> PyViennaCL, and I can't get my matrices to look right. Suppose I have
> the following arbitrary 5x5 matrix, as displayed in Python:
>
m.value
> array([[ 1., 2., 3., 4., 0.],
> [ 5., 6., 7., 0.
Hi guys,
let's have another IRC meeting (#ViennaCL on irc.freenode.net) this
week! Everybody is welcome to join :-)
As I'm presumably unavailable on Friday and don't know about my
availability over the weekend, what about tomorrow, Thursday, at 19:00
UTC? Does this work for at least Toby and P
Hi,
> 19:00 UTC suits me fine.
Cool, so we have a meeting of four already. Welcome, Evan :-)
>> Potential topics (of course all work in progress):
>>- pyViennaCL: current status
>>- Scheduler: interface extensions required?
>>- Generator: Define functionality for 1.5.0 release
>>
Hi Phil,
please don't drop the devel mailing list unless you mention some French
secrets (maybe cheese or wine recipes?) which you are not allowed to
share in public ;-)
> They switched from a VLIW architecture to their GCN architecture within
> the HD7xxx series:
> http://en.wik
> OK, all working now:
>
m = p.Matrix(10,10,1.0)
p.Assign(m[0:5,0:5], p.Matrix(5,5,5.0)).execute(); m.value
>
> array([[ 5., 5., 5., 5., 5., 1., 1., 1., 1., 1.],
> [ 5., 5., 5., 5., 5., 1., 1., 1., 1., 1.],
> [ 5., 5., 5., 5., 5., 1., 1., 1., 1.
Hi,
>> Currently the closest thing to a roadmap is the issue tracker. We had a
>> couple of long-term ideas on our Trac server in Vienna, but since we are
>> about to close that down, I better not share the link and instead
>> transfer it over to the Github wiki to get this rectified. It's not
>>
Hi again,
I transferred all the interesting stuff over to the wiki:
https://github.com/viennacl/viennacl-dev/wiki/ViennaCL-Roadmap
Best regards,
Karli
On 08/07/2013 04:18 PM, Toby St Clere Smithe wrote:
> Karl Rupp writes:
>> Currently the closest thing to a roadmap is the issue tr
Hi Evan,
> I remembered I posted this link on my blog a while ago. Hit the link
> again today and see that the new version (May 2013) gives a good shout
> out to ViennaCL for direct solvers:
>
> http://www.netlib.org/utk/people/JackDongarra/la-sw.html
Haha, ViennaCL is certainly most well-known
Hi Xeon Phil ;-)
> There are a lot of problems related to coupling the current BLAS3
> implementation with the kernel generator:
>
> - While I think I could add some range support, adding slices will be
> extremely difficult, and it would probably result in bad performance
> whatever kernel is u
Hi,
> Good news : the GEMMs calls for OpenCL on dense non-proxy matrix now
> call the generator ! It's a good step towards performance portability.
Hurray, indeed it is! Well done! :-)
Now as you fixed some things in the autotuner, I could also give it
another shot on the MIC. Does the autotune
Hi,
> I still have some polishing to do for the autotuner, so that it indeed
> print the same thing as viennacl-info. The problem of the MIC is that I
> have absolutely ZERO idea of how to prune profiles. We can try, with
> the GPU configuration so that it remains tractable, but I don't
> gua
Hi guys,
as discussed at the last IRC meeting on Thursday, I've updated the
scheduler node members. Both LHS and RHS now have the following members:
- type_family: One out of:
COMPOSITE_OPERATION_FAMILY
SCALAR_TYPE_FAMILY
VECTOR_TYPE_FAMILY
MATRIX_TYPE_FAMILY
- subtype:
Hey,
alright, we've got some issues to fight ;-)
On GPUs with 16kB of shared memory (e.g. GTX 285), the generated GEMM
kernels now exceed the available memory:
Log: ptxas error : Entry function 'kernel_0x207f4b0_0' uses too much
shared data (0x40a0 bytes + 0x10 bytes system, 0x4000 max)
Thi
Hi,
> On GPUs with 16kB of shared memory (e.g. GTX 285), the generated
> GEMM kernels now exceed the available memory:
>
> Log: ptxas error : Entry function 'kernel_0x207f4b0_0' uses too
> much shared data (0x40a0 bytes + 0x10 bytes system, 0x4000 max)
>
> This is because of
Hi,
> We can directly query the available local device memory (which is the
> reason why I added all this buffering to the device class). Am I missing
> something?
>
>
> Yes, we could. But having the combination {vendor, local memory} seems a
> bit weird to me, I think {vendor, genera
Vendor ID: 4318
Version: OpenCL 1.0 CUDA
Driver Version: 304.43
Maybe the work group size exceeds 512? It works well on the GTX 470,
though...
Best regards,
Karli
On 08/13/2013 11:01 AM, Philippe Tillet wrote:
> Hi hi,
>
>
> 20
Hi,
> Yes, the default NVidia profile for double precision uses a work group
> size of 1024... All this is checked during the autotuning procedure so
> that it will work for the hardware it's tuned for...
> Meh, seems like we need a couple additional levels of abstraction to
> reach safety.
In
Hey,
> {vendor, generation} is the natural format for the handling the
> profile internally, yes. This will presumably involve string parsing
> of the device name, yes :-(
>
>
> I'll do that :) Should I add a "generation" method in the ocl::device
> class? I think it is most suited he
Hi guys,
wow, AMD open-sourced their Math libraries...
Best regards,
Karli
---
*AMD Accelerated Parallel Processing Math Libraries (APPML) is now
available as open source as clMath.*
I am extremely pleased to have the opportunity to announce that the
APPML BLAS & FFT proje
Hey,
> I've pushed the changes. Does it solve the GTX285 case?
thanks, it does!
> The policy is :
>
> - One global GPU fallback (very conservative)
> - One global CPU fallback (very conservative)
> - One global Accelerator fallback (very conservative)
> - One fallback per architecture family
>
Hi,
> Do we want to keep the full device name in the profiles map? With
> vendor and arch determined, we know pretty much everything we need
> to know. If we need to match the name 1:1, there may be too many
> devices which we miss even though the 'faster' profile should work?
>
>
Hi guys,
Parallella will ship their first (OpenCL-enabled!) boards in October and
also offers a university partner program:
http://www.parallella.org/pup/
I'm tempted to order one of the boards in September and eventually apply
for the university partnership program when I'm back in Vienna. Who
Hi Toby,
> Karl Rupp writes:
>> Parallella will ship their first (OpenCL-enabled!) boards in October and
>> also offers a university partner program:
>> http://www.parallella.org/pup/
>
> I'd forgotten about these things! Their roadmap does look intriguing;
>
Hi guys,
the scheduler for kernel fusion makes good progress. Toby, you should be
able to use all of the fundamental dense linear algebra operations now.
There should only be two blocks of functionality missing:
- Sparse matrices (i.e. matrix-vector products)
- In some cases where += and
Hi again,
what about an IRC meeting on Monday, August 19, 19:00 UTC?
The largest items for the release are about to be completed, so we
transition into the polishing phase.
Potential topics:
- Low-level C interface: How close to BLAS should it be?
- PyViennaCL: Current status
- Interface e
Hey,
> Nice job guys! The peanut gallery approves. I'll continue watching the
> masters at play. :-P
haha, we'll do our best to keep you entertained :-D
Best regards,
Karli
> On Thu, Aug 15, 2013 at 7:22 PM, Karl Rupp wrote:
>> Hi guys,
>>
>> the s
Hi,
> It seems to me that most of the differences between CUDA and OpenCL come
> from the respective APIs, but that the kernel code is very similar in
> the two cases.
> Do you guys think it's possible to easily translate the generated kernel
> from OpenCL to CUDA, by just doing one-to-one repla
Hi Toby,
>> the scheduler for kernel fusion makes good progress. Toby, you should be
>> able to use all of the fundamental dense linear algebra operations now.
>> There should only be two blocks of functionality missing:
>>- Sparse matrices (i.e. matrix-vector products)
>>- In some cas
Hi Philippe,
rather than having too much speculation here, what about adding a quick
OpenCL-to-CUDA translator (just string substitution, you don't need
more) to the generator? Put the best kernels for Fermi and Kepler into a
compilation unit and then hopefully Denis or Evan will give it a try?
Hey,
> Sorry for too much speculation, it seems like the problem comes from the
> generator, not the OpenCL SDK.
> Seems like I'm way too suspicious, sorry :D
Ok, apparently we need an even larger search space then...
> The good news is that the converter being done, I can have a better
> workf
Hi guys,
this is presumably mostly for Philippe:
I noticed that the autotuning targets in examples/autotuner use a couple
of static parameters. It would be nice to have them dynamically set via
named command line parameters, so that it's easier to run them in a
batched manner via some shell sc
n install Boost on a Parallella board right now. It's not that
hard to match the command line strings directly, is it? ;-)
Best regards,
Karli
>
> 2013/8/22 Karl Rupp <r...@iue.tuwien.ac.at>
>
> Hi guys,
>
> this is presumably mostly for Philippe:
>
Hey,
> Also, I'm not sure of the 'correct' way to construct a unary type
> node. For instance, if you try and print a unary type node with only lhs
> set (as would make sense), then you get into an infinite loop, because
> the operator<< and print_node functions in scheduler/io.hpp assume
> you'v
Hey,
> So I've implemented a PyViennaCL translation of the blas3_prod test, but
> I'm having difficulties with OPERATION_UNARY_FABS_TYPE. If I construct
> an OPERATION_UNARY_FABS_TYPE node with just a DENSE_ROW_MATRIX as a
> leaf, then everything passes as expected. But if I try and execute
> OPE
Hey,
>> Yeah. I thought that, since I'd fixed the segfault and the expression
>> tree worked for other expressions, the problem didn't lie in the
>> wrapper. But I began to doubt that judgement, so I wrote a C++ test
>> program[1] to do the equivalent construction manually. And that
>> worked.. S
Hey,
> This is done :)
> You can check on ./examples/autotuner/gemm_autotuning --help and tell me
> if it is alright! It is not heavily tested (I'm on my laptop) but the
> basic tests suggest that it works.
Cool, this looks great already. What about ordering the options such
that the required p
Hi guys,
the ViennaCL API does not allow the mix of row- and column-major dense
matrices for general operations yet, mixing is only allowed for
matrix-matrix products. You should run into some assertions, though, so
I wonder why this is not the case. Did you compile with NDEBUG?
Best regards,
Hi Toby,
> Can I convert all / most of the assertions in the scheduler code to
> exceptions? It would help me (so that Python doesn't just abort if
> something goes wrong), and I think it would help anyone else who happens
> to compile with NDEBUG defined, and who then does something undefined.
Hi Toby,
> Having fixed up a layout consistency check, I seem to have hit a real
> scheduler bug. I'm currently reading the code myself with a view to a
> fix, but since you wrote it, I suspect you'd be quicker. I've written a
> C++ program that produces a seg-fault: http://paste.ubuntu.com/603058
Hi,
> I have just realized that the Vendor ID was associated with the SDK
> provider, not the Hardware Vendor. That is, Apple's implementation of
> NVidia is slightly different from the original one :
>
> Apple :
> Vendor name : NVIDIA
> Vendor ID : 16918016
>
> NVidia :
> Vendor name : NVIDIA Co
Hi guys,
the OpenCL kernels are now integrated into the main source tree, the
auxiliary/ folder is no longer used. This makes the repository easier to
handle, as `make -j4` works out of the box again and no extra packaging
step is necessary to get a fully self-contained source tree.
There are
Hey,
> There are a few more relatively minor things left for the release, I
> hope to fix most of them today and update the documentation. Philippe,
> please also add more Doxygen documentation to the generator and focus on
> testing the generator output for corner cases. Presumab
Hi Philippe,
> About 6months ago I had heard of a library that also performed
> autotuning (http://raijincl.org), but that offered the same performance
> as ours back then.
> Since then, the performance has *greatly* improved, largely
> outperforming our autotuner :
> - Over 3TFLOP/s on HD7970
>
Hi Philippe,
> Since our generator is skeleton-based anyway, what about having a
look
> at the best performing kernels in RaijinCL and then extending the
> current generator accordingly such that these kernels are covered as
> well? I consider this to be *far* less painful than t
Hi Toby,
> Despite a fairly hectic last couple of days (girlfriend finished her
> PhD, then had to move house...),
congratulations!
> I've been doing some PyViennaCL bug
> fixing in order not to fall behind, and I think I've hit another
> scheduler bug. I've written a test case at [1], and a ba
Hi guys,
FYI: Due to the many new features we've added since the last release,
I'm still polishing the code. Rather than pushing out a few incomplete
pieces, I decided to invest a few more days to round things up, so I'll
postpone (again) the celebrations to next week...
Best regards,
Karli
-
Hey,
> On my side, I've been working a bit more on my unconstrained nonlinear
> minimization library, that I plan to keep as a separate package, because
> I've made it pretty generic (can work theoretically with any linear
> algebra backend - viennacl, eigen, armadillo ...) using template magic.
Hey,
> For my research I will have to deal with extremely nonsquare matrices.
> That is potentially 32*10 000, 4*80 000, or even 128*1 000 000.
> This often occurs in statistics, where one has a few numbers of
> variables in the first dimensions, and a significant number of samples
> in the other
Hi,
> I can't think of any such case where one would want to have control over
> this. This would require knowledge of our implementations to make
> appropriate choices anyway. In order to have a reasonable decision
> process, we need to come up with some heuristics...
> My first idea would be to
Hey hey hey,
> While integrating it completely is trivial. I have no real name for it
> right now though. Let's assume it is named fminpp, then :
>
> typedef fminpp::optimization_options optimization_options;
> typedef fminpp::directions::cg cg;
> //other aliases...
>
> template
> viennacl::vecto
Hi Toby,
sorry for the late reply. I'm finally back to Austria ;-)
> Just a quick update. I'm in the process of writing decent documentation
> for my classes and functions. There are quite a few: sloccount tells me
> that, including tests, I've got about 5500 source lines of code (not
> includi
Hi Toby,
> I get a lot of errors like the below when I enable T = char (or other
> integer numeric types) in PyViennaCL. The error always comes down to the
> "request for member 'handle' in 'val'". Previously, it was in the
> context of arithmetical functions like addition or element_pow, but here