Re: [ViennaCL-devel] CUDA slower than OpenCL in new R implementation?

2015-07-31 Thread Philippe Tillet
Hi Charles :) The BLAS kernels for CUDA and OpenCL are entirely different, actually. OpenCL kernels rely on a code-generator, and have been auto-tuned. As far as I know, the CUDA kernels have not been auto-tuned, and don't rely on the same generation engine as the OpenCL ones. While for BLAS1-2,

Re: [ViennaCL-devel] Column-wise kernels?

2015-07-27 Thread Philippe Tillet
Hi, Such row-wise / column-wise reductions could be generated by the OpenCL backend, but this won't work on the host or CUDA backends. Plus, this is not really maintained at the moment. I would recommend Karl's solution, even though it won't be optimal when the vector does not fit in the L2

Re: [ViennaCL-devel] ViennaCL Benchmark GUI 1.0.0 Release Candidate

2014-11-24 Thread Philippe Tillet
Hey :-) Worked well on my laptop :-) A couple of suggestions: - Maybe use layout N-T for GEMM, or perhaps it is already possible to choose? From my experience NT-col major (TN row major) always leads to higher performance on GEMM. - The plots were hard to read because they were rather small on my laptop.

Re: [ViennaCL-devel] Roadmap update

2014-11-09 Thread Philippe Tillet
Hey :) 2014-11-09 10:06 GMT-05:00 Karl Rupp r...@iue.tuwien.ac.at: Hi guys, I've updated our roadmap taking into account the latest release: https://github.com/viennacl/viennacl-dev/wiki/ViennaCL-Roadmap Feel free to add your topics and post your wishes :-) Awesome! Is it like a

Re: [ViennaCL-devel] More weird problems

2014-11-05 Thread Philippe Tillet
I remember us already having a problem with strlen on the cache with your NVidia SDK, which disappeared when you rebooted. Didn't we? 2014-11-05 16:25 GMT-05:00 Toby St Clere Smithe m...@tsmithe.net: Toby St Clere Smithe m...@tsmithe.net writes: The segfault happens when calling (in

Re: [ViennaCL-devel] Segfault running PyViennaCL direct solver tests

2014-11-04 Thread Philippe Tillet
Hey, Sorry for the late answer. I've been extremely busy with my stats homework lately. The caching mechanism indeed doesn't account for the device. This is pretty easy to add, ie append the device name + platform version + platform name when doing the hashing. Philippe 2014-11-04 16:12
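The fix described above (hash the device name, platform version and platform name together with the kernel source) could look roughly like the sketch below. The function names are hypothetical and this is not ViennaCL's actual caching code; it only assumes the cache entry is identified by a hash of a string.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical helper: builds the string that is hashed to identify a
// cached kernel binary. Device and platform information is appended so
// that two different devices do not share one cache entry.
inline std::string make_cache_key_input(const std::string& kernel_source,
                                        const std::string& device_name,
                                        const std::string& platform_version,
                                        const std::string& platform_name)
{
  assert(!device_name.empty());
  // Newline separators keep ("ab", "c") apart from ("a", "bc").
  return kernel_source + '\n' + device_name + '\n'
       + platform_version + '\n' + platform_name;
}

// Hash of the combined string; equal inputs always give equal keys.
inline std::size_t cache_key(const std::string& kernel_source,
                             const std::string& device_name,
                             const std::string& platform_version,
                             const std::string& platform_name)
{
  return std::hash<std::string>()(make_cache_key_input(
      kernel_source, device_name, platform_version, platform_name));
}
```

A real cache would probably want a stable hash (std::hash is not guaranteed to be identical across standard libraries), which is why the key string is kept separate from the hashing step here.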

Re: [ViennaCL-devel] Segfault running PyViennaCL direct solver tests

2014-11-04 Thread Philippe Tillet
, Philippe Tillet phil.til...@gmail.com writes: Sorry for the late answer. I've been extremely busy with my stats homework lately. The caching mechanism indeed doesn't account for the device. This is pretty easy to add, ie append the device name + platform version + platform name when

Re: [ViennaCL-devel] Benchmark GUI - GSoC Closing Words and Future Plans

2014-08-25 Thread Philippe Tillet
Hey Namik, Congratulations! :-) Yes, we very much hope that you'll stay with us in this adventure. I personally really like open-source development because (1) it's really educative, and (2) it makes me feel free. I think that research/jobs can put a lot of pressure on me, to the point that it can

Re: [ViennaCL-devel] Roadmap to 1.6 : Cleaning the code, refurbishing the test suite, the benchmark suite, etc...

2014-08-17 Thread Philippe Tillet
Hey, 2014-08-17 11:52 GMT+02:00 Karl Rupp r...@iue.tuwien.ac.at: Hi, So it seems like most of the features are ready for ViennaCL 1.6. My merge from a few days ago (finally) fully integrated the use of device-specific kernels for BLAS1, BLAS2, BLAS3. hurray! :-) The reduction API

Re: [ViennaCL-devel] Roadmap to 1.6 : Cleaning the code, refurbishing the test suite, the benchmark suite, etc...

2014-08-17 Thread Philippe Tillet
://github.com/viennacl/viennacl-dev/issues/71 https://github.com/viennacl/viennacl-dev/issues/77 https://github.com/viennacl/viennacl-dev/issues/66 https://github.com/viennacl/viennacl-dev/issues/2 Philippe 2014-08-17 19:36 GMT+02:00 Philippe Tillet phil.til...@gmail.com: So the dense benchmark suite

Re: [ViennaCL-devel] Benchmark GUI Expert Mode

2014-08-17 Thread Philippe Tillet
Hey Namik, The code looks fine. As a small tip, I would advise to use blas3MatrixSize{A,B,C} = {M, N, K} ; it's much more conventional. I would also suggest to remove LU from the benchmark. I only achieve 11 GFLOP/s on my machine (GEMM peaks at 120GFLOP/s). It will smash the overall score if you

[ViennaCL-devel] Roadmap to 1.6 : Cleaning the code, refurbishing the test suite, the benchmark suite, etc...

2014-08-16 Thread Philippe Tillet
Hey! So it seems like most of the features are ready for ViennaCL 1.6. My merge from a few days ago (finally) fully integrated the use of device-specific kernels for BLAS1, BLAS2, BLAS3. The reduction API is still missing, though, but I think that the priority should be to polish the code, and to

[ViennaCL-devel] Testing GEMM

2014-08-14 Thread Philippe Tillet
Hey, The GEMM kernel(s) are getting pretty tricky, with quite a few fallbacks involved. This gets hard to test, so I thought it could be a good idea to discuss this. Basically, here is how it works: A = [A1 A2; A3 A4] B = [B1 B2; B3 B4] C = [C1 C2; C3 C4] Where each block is divided according
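One way to test such a 2x2 decomposition on the host is to compare it against a plain triple loop. The sketch below assumes arbitrary split points (the rule by which the blocks are actually divided is cut off in the message above), and the Matrix type and function names are stand-ins, not ViennaCL code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Minimal dense row-major matrix; a stand-in, not a ViennaCL type.
struct Matrix {
  std::size_t rows, cols;
  std::vector<double> data;
  Matrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0) {}
  double& operator()(std::size_t i, std::size_t j) { return data[i * cols + j]; }
  double operator()(std::size_t i, std::size_t j) const { return data[i * cols + j]; }
};

// C[i0:i1, j0:j1] += A[i0:i1, k0:k1] * B[k0:k1, j0:j1]
inline void gemm_block(const Matrix& A, const Matrix& B, Matrix& C,
                       std::size_t i0, std::size_t i1,
                       std::size_t j0, std::size_t j1,
                       std::size_t k0, std::size_t k1)
{
  for (std::size_t i = i0; i < i1; ++i)
    for (std::size_t j = j0; j < j1; ++j) {
      double acc = 0.0;
      for (std::size_t k = k0; k < k1; ++k) acc += A(i, k) * B(k, j);
      C(i, j) += acc;
    }
}

// Reference result: a single block covering the whole product.
inline Matrix gemm_reference(const Matrix& A, const Matrix& B)
{
  assert(A.cols == B.rows);
  Matrix C(A.rows, B.cols);
  gemm_block(A, B, C, 0, A.rows, 0, B.cols, 0, A.cols);
  return C;
}

// 2x2 partition with split points (mi, ni, ki), i.e.
// C1 = A1*B1 + A2*B3, C2 = A1*B2 + A2*B4,
// C3 = A3*B1 + A4*B3, C4 = A3*B2 + A4*B4.
inline Matrix gemm_2x2(const Matrix& A, const Matrix& B,
                       std::size_t mi, std::size_t ni, std::size_t ki)
{
  assert(A.cols == B.rows);
  assert(mi <= A.rows && ni <= B.cols && ki <= A.cols);
  Matrix C(A.rows, B.cols);
  const std::size_t is[3] = {0, mi, A.rows};
  const std::size_t js[3] = {0, ni, B.cols};
  const std::size_t ks[3] = {0, ki, A.cols};
  for (int bi = 0; bi < 2; ++bi)
    for (int bj = 0; bj < 2; ++bj)
      for (int bk = 0; bk < 2; ++bk)
        gemm_block(A, B, C, is[bi], is[bi + 1], js[bj], js[bj + 1],
                   ks[bk], ks[bk + 1]);
  return C;
}
```

Running gemm_2x2 over a range of split points, including 0 and the full size so that some blocks are empty, exercises every fallback path against the same reference. Filling the inputs with small integers keeps the comparison exact despite the different summation order.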

Re: [ViennaCL-devel] Benchmark GUI Feedback Needed

2014-08-11 Thread Philippe Tillet
Hello ! This all looks pretty good. Good job! 2014-08-12 3:40 GMT+02:00 Namik Karovic namik.karo...@gmail.com: Hi Karl, I'm fine with splitting things into something like Basic Benchmark and Expert Benchmark ('view' sounds inappropriate), but as long as both benchmark do the same thing,

Re: [ViennaCL-devel] Tolerances for tests

2014-08-05 Thread Philippe Tillet
Hey Toby, My two cents: Don't forget that matrix-vector multiplication will still introduce some round-off errors. I.e., when you are computing y = A*[1,1,...] then you are actually computing something like y' = A*([1,1,...]+eps). GEMV is backward stable, so you are sure that y' will be
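To make the tolerance argument concrete, here is a small host-side sketch that checks a single-precision GEMV against a double-precision reference. It uses the standard componentwise bound |y_i - (Ax)_i| <= n * eps * sum_j |a_ij| |x_j|, which is slightly looser than the textbook constant. The function names are made up for illustration and this is not the PyViennaCL test code.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// y = A*x in single precision; A is row-major with x.size() columns.
inline std::vector<float> gemv_float(const std::vector<float>& A,
                                     const std::vector<float>& x)
{
  assert(!x.empty() && A.size() % x.size() == 0);
  const std::size_t n = x.size();
  const std::size_t m = A.size() / n;
  std::vector<float> y(m, 0.0f);
  for (std::size_t i = 0; i < m; ++i)
    for (std::size_t j = 0; j < n; ++j)
      y[i] += A[i * n + j] * x[j];
  return y;
}

// True if every entry of y is within n * eps * sum_j |a_ij * x_j|
// of a reference computed in double precision.
inline bool within_roundoff_bound(const std::vector<float>& A,
                                  const std::vector<float>& x,
                                  const std::vector<float>& y)
{
  const std::size_t n = x.size();
  const std::size_t m = y.size();
  assert(A.size() == m * n);
  const double eps = std::numeric_limits<float>::epsilon();
  for (std::size_t i = 0; i < m; ++i) {
    double ref = 0.0, scale = 0.0;
    for (std::size_t j = 0; j < n; ++j) {
      const double p = static_cast<double>(A[i * n + j]) * x[j];
      ref += p;
      scale += std::fabs(p);
    }
    if (std::fabs(static_cast<double>(y[i]) - ref) > n * eps * scale)
      return false;
  }
  return true;
}
```

The point of scaling by sum |a_ij x_j| rather than by |y_i| is that rows with heavy cancellation get a tolerance that reflects the size of the terms, not the size of the (possibly tiny) result.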

[ViennaCL-devel] On the use of vector types in viennacl's opencl kernels

2014-07-31 Thread Philippe Tillet
Hi, It's horrible! As soon as I want to introduce some vectorized types in an opencl template as simple as AXPY, everything starts exploding. Well, first things first, I probably need to justify why I think that we cannot do without double2, float4 in all of our dense kernel templates: - From my

[ViennaCL-devel] OpenMP Matrix Multiplication

2014-07-27 Thread Philippe Tillet
Hi guys, So I expect ViennaCL 1.6 to offer some really good performance on CPUs with the OpenCL backend -- possibly 80% of OpenBLAS / MKL on a Core i7 4770, for example. As the OpenCL kernel generator and the auto-tuner will get better, we can hope for further improvements. This will create a

[ViennaCL-devel] ViennaCL console benchmarks

2014-07-14 Thread Philippe Tillet
Hey, I've noted that the console benchmarks for ViennaCL were quite outdated; performance for AXPY is reported in FLOP/s, for example. I think it'd be great to have something compact, all incorporated in a single benchmarking executable: === BLAS [float, full]

Re: [ViennaCL-devel] Benchmark GUI First Look

2014-07-11 Thread Philippe Tillet
Hi Namik, Good job! It all looks very appealing. I don't have much to say. Just a few comments: - I'd rather use the median instead of the average, indeed. - As for the latency in the expert section, it would be great to also have an execution time vs size plot, in order to show until when the
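As a quick illustration of the median suggestion: a single slow run (a first-launch JIT compile, say) drags the mean far away from the typical timing, while the median barely moves. The sketch below is generic C++ and not taken from the Benchmark GUI sources.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Median of a non-empty sample set. Takes the vector by value so the
// caller's data is left unsorted.
inline double median(std::vector<double> samples)
{
  assert(!samples.empty());
  std::sort(samples.begin(), samples.end());
  const std::size_t n = samples.size();
  return (n % 2 == 1) ? samples[n / 2]
                      : 0.5 * (samples[n / 2 - 1] + samples[n / 2]);
}

// Arithmetic mean, for comparison.
inline double mean(const std::vector<double>& samples)
{
  assert(!samples.empty());
  double sum = 0.0;
  for (std::size_t i = 0; i < samples.size(); ++i) sum += samples[i];
  return sum / static_cast<double>(samples.size());
}
```

With timings of 1, 1, 1, 1 and 100 ms, the median stays at 1 ms whereas the mean jumps to 20.8 ms.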

Re: [ViennaCL-devel] ViennaCL 1.6 Roadmap

2014-07-09 Thread Philippe Tillet
if caching is disabled. Philippe 2014-07-09 17:53 GMT+02:00 Philippe Tillet phil.til...@gmail.com: Hey hey, 2014-07-09 14:47 GMT+02:00 Karl Rupp r...@iue.tuwien.ac.at: Hey, Philippe, did you by chance check the impact of the generator integration on kernel latency? We only have

[ViennaCL-devel] ViennaCL 1.6 Roadmap

2014-07-08 Thread Philippe Tillet
Hello, Looking at the roadmap: https://github.com/viennacl/viennacl-dev/wiki/ViennaCL-Roadmap I was concerned with 4 elements: (1) Hook in external BLAS libraries and use them as a computing backend (2) Distributed vectors and matrices (multiple devices, possibly mixed CUDA/OpenCL/OpenMP) (3)

Re: [ViennaCL-devel] ViennaCL 1.6 Roadmap

2014-07-08 Thread Philippe Tillet
Hi, 2014-07-08 20:59 GMT+02:00 Karl Rupp r...@iue.tuwien.ac.at: Hi Philippe, Watching at the roadmap: https://github.com/viennacl/viennacl-dev/wiki/ViennaCL-Roadmap argl, I forgot to update this after our IRC meeting. The protocol here defines features for 1.6.0 which are far more

Re: [ViennaCL-devel] GEMM broken in Nightly builds

2014-07-07 Thread Philippe Tillet
Hey, After some investigations it looks like the problem is not with the GEMM kernel but with the way the kernel is enqueued. It fails when A and B are associated with the same handle in C = alpha*op(A)*op(A) + beta*C... (this handle-checking feature is to allow for some optimizations in other

Re: [ViennaCL-devel] GEMM broken in Nightly builds

2014-07-07 Thread Philippe Tillet
Until this is fixed, I disable the use of the generator for GEMM. 2014-07-07 15:00 GMT+02:00 Philippe Tillet phil.til...@gmail.com: Hey, After some investigations it looks like the problem is not with the GEMM kernel but with the way the kernel is enqueued. It fails when A and B

Re: [ViennaCL-devel] Implementation of multi_inner_prod

2014-06-27 Thread Philippe Tillet
is less than 12.5% over the ideal case already, but at the same time the kernel still works for older GPUs with limited amounts of shared memory. Best regards, Karli On 06/26/2014 11:09 PM, Philippe Tillet wrote: I'll add something. I assume that multiple kernels are launched thanks

Re: [ViennaCL-devel] PyViennaCL midterm

2014-06-27 Thread Philippe Tillet
Hi, Unfortunately I won't be available until Tuesday for a meeting. Python and CUDA-based libraries are widely used by the Machine Learning community. I also want to push OpenCL forwards, but supporting CUDA through PyViennaCL would be a very good thing to do, since a lot of researchers think that

Re: [ViennaCL-devel] PyViennaCL midterm

2014-06-27 Thread Philippe Tillet
I'll be available from Tuesday afternoon on. What about wednesday 13:00 UTC and 15:00 UTC? 2014-06-27 18:30 GMT+02:00 Karl Rupp r...@iue.tuwien.ac.at: Hey, Unfortunately I won't be available until Tuesday for a meeting. Python and CUDA-based libraries are widely used by the Machine

[ViennaCL-devel] Implementation of multi_inner_prod

2014-06-26 Thread Philippe Tillet
Hello! I note this in the implementation of multi_inner_prod: switch (vec_tuple.const_size() - current_index) { case 7: case 6: case 5: case 4: //do stuff However, there is a test for 5,6,7 so I assume that these

Re: [ViennaCL-devel] Behavior of norm_* on vector<int>

2014-06-24 Thread Philippe Tillet
Hey 2014-06-24 12:29 GMT+02:00 Karl Rupp r...@iue.tuwien.ac.at: Hey, If yes,I think that it should be changed because this easily violates the axioms of a norm : we can have norm(alpha*v) != alpha*norm(v) because of the rounding. This will usually be the case
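The homogeneity violation is easy to reproduce on the host. The sketch below assumes that the 2-norm of an integer vector is returned as an integer obtained by truncation; that is an assumption made for illustration, since the message above is cut off before the details, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// 2-norm of an integer vector, truncated to an integer result.
// (Assumed behavior for illustration, not necessarily ViennaCL's.)
inline int norm_2_truncated(const std::vector<int>& v)
{
  double sum = 0.0;
  for (std::size_t i = 0; i < v.size(); ++i)
    sum += static_cast<double>(v[i]) * v[i];
  return static_cast<int>(std::sqrt(sum));
}

// Returns alpha * v.
inline std::vector<int> scaled(int alpha, const std::vector<int>& v)
{
  std::vector<int> r(v.size());
  for (std::size_t i = 0; i < v.size(); ++i) r[i] = alpha * v[i];
  return r;
}
```

For v = (1, 1) the truncated norm is 1, so 3 * norm(v) is 3, while norm(3 * v) is the truncation of sqrt(18), which is 4.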

Re: [ViennaCL-devel] Are {op_row, op_diag, op_column} unary or binary?

2014-06-17 Thread Philippe Tillet
sometimes it may coincide) Philippe 2014-06-17 10:29 GMT+02:00 Toby St Clere Smithe m...@tsmithe.net: Hey Philippe, Philippe Tillet phil.til...@gmail.com writes: The integration of the generator is going on slowly but safely. Vector kernels are fully integrated and I'm about to support some

Re: [ViennaCL-devel] Benchmark GUI warmup

2014-05-06 Thread Philippe Tillet
Hey Namik, 2014-05-06 19:43 GMT+02:00 Namik Karovic namik.karo...@gmail.com: Hello, Apologies for not replying earlier, I've been quite busy these last two days. Don't worry ;) So far I have been exploring the advantages/disadvantages of using QML/QtQuick vs traditional widget based

Re: [ViennaCL-devel] Benchmark GUI warmup

2014-05-06 Thread Philippe Tillet
Hi, 2014-05-06 9:38 GMT+02:00 Karl Rupp r...@iue.tuwien.ac.at: Hi, Why is data pointless? I'd rather have only a few datapoints on new hardware out there rather than having absolutely no data at all. I mean, the data is pretty useful because it tells us about the best default

Re: [ViennaCL-devel] Benchmark GUI warmup

2014-05-05 Thread Philippe Tillet
Hi, 2014-05-05 9:18 GMT+02:00 Karl Rupp r...@iue.tuwien.ac.at: Hi, (CC-ing viennacl-devel, as this is developer-talk ;-) ) Either way, I want to let you know that the generator/auto-tuner is undergoing significant changes, and that you will, actually, not have to worry about it for

Re: [ViennaCL-devel] Benchmark GUI warmup

2014-05-05 Thread Philippe Tillet
Hi hi, 2014-05-05 21:49 GMT+02:00 Karl Rupp r...@iue.tuwien.ac.at: Hi, Well, I think this is not entirely unrelated. The purpose of the GUI is still to allow a broader community to feed us with benchmark data, so somehow the loop over all possible configurations is still

Re: [ViennaCL-devel] OpenCL C++ API

2014-04-29 Thread Philippe Tillet
Hi, 2014-04-29 15:59 GMT+02:00 Karl Rupp r...@iue.tuwien.ac.at: Hi, So I can't help but to bring up this topic :) Is there any reason why we're using the OpenCL C API instead of the C++ one? Yes, the reason is simple: The C++ API was standardized quite some time *after* the

Re: [ViennaCL-devel] OpenCL C++ API

2014-04-29 Thread Philippe Tillet
Hi, 2014-04-29 16:54 GMT+02:00 Karl Rupp r...@iue.tuwien.ac.at: Hi, It seems like we could save several thousands of lines of code (and gain a lot of clarity) by using the C++ API directly. Well, I'm not so sure about that actually. I'd more conservatively

Re: [ViennaCL-devel] Benchmark GUI Project Overview

2014-04-27 Thread Philippe Tillet
Hey Namik, Congratulations for your acceptance to the GSoC! I don't know to which extent this blog is customizable, but it would be nice to have some sub-sections related to some sub-parts of the project, to clearly distinguish your updates/ideas on the GUI itself from those you'll have on e.g.

[ViennaCL-devel] WebCL 1.0 final specifications released

2014-03-24 Thread Philippe Tillet
Hello everybody, After two years pending, the final specifications for WebCL 1.0 were released a couple of days ago. It is logically based on OpenCL 1.1 since ViennaCL doesn't support anything more. I don't see any clear applications of ViennaCL with that, and I'm incredibly bad with everything

Re: [ViennaCL-devel] Ideas for Google Summer of Code?

2014-02-22 Thread Philippe Tillet
Hey everybody, My recent advances on auto-tuning gave birth to a new GSoC idea in my mind. More exactly, I've come up with something more complete around (crowd-sourced) auto-tuning and the GUI. This would include: - Developing a portable auto-tuning GUI (as of now : BLAS1 / Dense BLAS2 / Dense

Re: [ViennaCL-devel] Ideas for Google Summer of Code?

2014-02-15 Thread Philippe Tillet
Hi, I completely agree, concerning matrix-free implementations of the linear solver. Their absence is the very reason why I had to reimplement solvers for UMinTL. Furthermore, some other fancy stopping criteria may be provided. For example, some algorithms in unconstrained optimization use CG

Re: [ViennaCL-devel] More extensive Nightly Tests booting...

2014-02-14 Thread Philippe Tillet
Hi Karl, Wow, that's really neat! I'll fix the warnings for Clang and for generator_blas1-opencl Philippe 2014-02-14 10:38 GMT+01:00 Karl Rupp r...@iue.tuwien.ac.at: Hi guys, in the past few days we worked here in Vienna on setting up an automated nightly build system based on CTest and

[ViennaCL-devel] viennacl::reduce and viennacl::row/col_wise()

2014-02-12 Thread Philippe Tillet
Hello, So, as of now, the generation of row-wise reduction can be triggered through the interface: viennacl::reduce<op_add>(viennacl::row_wise(Mat)) viennacl::reduce<op_max>(viennacl::col_wise(Mat)) viennacl::reduce<op_min>(Vec) This plugs into a statement under the form:

Re: [ViennaCL-devel] Ideas for Google Summer of Code?

2014-02-03 Thread Philippe Tillet
Hi, I'll be once more available as a mentor :) I'll be myself pretty busy with some BLAS2/BLAS3 tuning for Hawaii. I'm also in favor of ideas of projects which don't require a strong knowledge of the current codebase, such as the GUI autotuning/benchmarking tool. I think that ViennaCL could also

Re: [ViennaCL-devel] AXPY and reciprocal, flip_sign parameters

2014-01-26 Thread Philippe Tillet
Hey, I think we agree on everything now! Okay, I will generate all the kernels, this will lead actually to 16 kernels for each cpu-gpu scalar combination, so 64 small kernels in total. This took time but it was a fruitful discussion :) Anyways, my ideas are much clearer now, thanks! Best

[ViennaCL-devel] Altera OpenCL optimization guide

2014-01-26 Thread Philippe Tillet
Hello everyone, I have found this relatively new and interesting PDF file : http://www.altera.com/literature/hb/opencl-sdk/aocl_optimization_guide.pdf. I'll read it overnight. This is of course for a mid/long-term perspective, but there are some remarkable points within, for example (some teasing

Re: [ViennaCL-devel] AXPY and reciprocal, flip_sign parameters

2014-01-25 Thread Philippe Tillet
Hey hey Karl, 2014/1/25 Karl Rupp r...@iue.tuwien.ac.at Hi Phil, Oh, I get it better now. I am not entirely convinced, though ;) From my experience, the overhead of the jit launch is negligible compared to the compilation of one kernel. I'm not sure whether compiling two kernels

[ViennaCL-devel] AXPY and reciprocal, flip_sign parameters

2014-01-24 Thread Philippe Tillet
Hello, I am a bit confused, is there any reason for using reciprocal and flip_sign, instead of just changing the scalar accordingly? Best regards, Philippe

Re: [ViennaCL-devel] AXPY and reciprocal, flip_sign parameters

2014-01-24 Thread Philippe Tillet
Hi Karl, 2014/1/24 Karl Rupp r...@iue.tuwien.ac.at Hey, I am a bit confused, is there any reason for using reciprocal and flip_sign, instead of just changing the scalar accordingly? yes (with a drawback I'll discuss at the end): Consider the family of operations x = +- y OP1 a +-

Re: [ViennaCL-devel] AXPY and reciprocal, flip_sign parameters

2014-01-24 Thread Philippe Tillet
Hey, 2014/1/24 Karl Rupp r...@iue.tuwien.ac.at Hi, I was in fact wondering why one passed reciprocal_alpha and flip_sign into the kernel. After thinking more about it, I have noticed that this permits us to do the corresponding inversion/multiplication within the kernel, and therefore

Re: [ViennaCL-devel] AXPY and reciprocal, flip_sign parameters

2014-01-24 Thread Philippe Tillet
Hey hey, 2014/1/25 Karl Rupp r...@iue.tuwien.ac.at Hi, I prefer option 3. This would allow for something like : if(size(x) > 1e5 && stride==1 && start==0){ Here we also need to check the internal_size to fit the vector width //The following steps are costly for small vectors

Re: [ViennaCL-devel] AXPY and reciprocal, flip_sign parameters

2014-01-24 Thread Philippe Tillet
Hey, 2014/1/25 Karl Rupp r...@iue.tuwien.ac.at Hey hey hey, Convergence depends on what is inside generate_execute() ;-) How is the problem with alpha and beta residing on the GPU addressed? How will the batch-compilation look like? The important point is that for the

Re: [ViennaCL-devel] AXPY and reciprocal, flip_sign parameters

2014-01-24 Thread Philippe Tillet
Hi, Oh, I get it better now. I am not entirely convinced, though ;) From my experience, the overhead of the jit launch is negligible compared to the compilation of one kernel. I'm not sure whether compiling two kernels in the same program or two different program creates a big difference. Plus,

Re: [ViennaCL-devel] Roadmap update after 1.5.0 release

2014-01-21 Thread Philippe Tillet
, they are duplicated between opencl/cuda/openmp . Once this is done, I will probably work towards the full integration of the micro-scheduler. Can we get rid of op_executor? Best regards, Philippe 2013/12/27 Philippe Tillet phil.til...@gmail.com Hey, Sorry for the late reply :P I'm supposed to defend my

[ViennaCL-devel] Blas linking and internal design

2014-01-21 Thread Philippe Tillet
Hey Karl, So today I went back to ViennaCL. I tried to move the equivalence column-major + trans = row-major + notrans upwards in the dispatching mechanism but it turns out to be impossible, because matrix<T, row_major> is not (and should not be) convertible to matrix<T, column_major>, rendering the underlying

Re: [ViennaCL-devel] Roadmap update after 1.5.0 release

2013-12-27 Thread Philippe Tillet
Hey, Sorry for the late reply :P I'm supposed to defend my MSc in 2 weeks, and I am yet to start writing my thesis... (I won't have a lot of time to give to ViennaCL until everything is sorted out) 2013/12/23 Karl Rupp r...@iue.tuwien.ac.at Hi guys, Now as 1.5.0 is out, I spent some thoughts

[ViennaCL-devel] Handling Layout/Transpose ASAP for GEMM/GEMV ?

2013-12-19 Thread Philippe Tillet
Hey, I've started back on the generator today, and realized how ugly the dispatching mechanism was, to take advantage of the equivalencies based on the fact that RowMajor + Trans = ColMajor + NoTrans Actually, I've been wondering : why wouldn't we do this on the whole codebase? We could
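The equivalence RowMajor + Trans = ColMajor + NoTrans boils down to index arithmetic: the same buffer read row-major as an M x N matrix A is, read column-major as an N x M matrix, exactly A^T. A minimal host-side check follows; the helper names are made up and nothing here is ViennaCL code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Element (i, j) of a rows x cols matrix stored row-major.
inline double at_row_major(const std::vector<double>& buf,
                           std::size_t rows, std::size_t cols,
                           std::size_t i, std::size_t j)
{
  assert(i < rows && j < cols);
  return buf[i * cols + j];
}

// Element (i, j) of a rows x cols matrix stored column-major.
inline double at_col_major(const std::vector<double>& buf,
                           std::size_t rows, std::size_t cols,
                           std::size_t i, std::size_t j)
{
  assert(i < rows && j < cols);
  return buf[i + j * rows];
}

// Interprets buf as a row-major M x N matrix A and as a column-major
// N x M matrix B, and checks that B(j, i) == A(i, j), i.e. B = A^T.
inline bool same_buffer_is_transpose(const std::vector<double>& buf,
                                     std::size_t M, std::size_t N)
{
  assert(buf.size() == M * N);
  for (std::size_t i = 0; i < M; ++i)
    for (std::size_t j = 0; j < N; ++j)
      if (at_row_major(buf, M, N, i, j) != at_col_major(buf, N, M, j, i))
        return false;
  return true;
}
```

Both accessors end up reading buf[i * N + j], which is why a dispatcher can fold the layout flag into the transpose flag without touching the data.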

Re: [ViennaCL-devel] Call for testing: PyViennaCL on Ubuntu

2013-12-19 Thread Philippe Tillet
*Sneaks in* (Seems like it's time to hide a if( rand() < RAND_MAX/2) return; somewhere in the code where Karl won't find it !) :D Philippe 2013/12/19 Karl Rupp r...@iue.tuwien.ac.at Hi Toby, please allow for ~1 more day, then 1.5.0 is out and I'm available for testing :-) Best

Re: [ViennaCL-devel] Linking ViennaCL (CUDA backend) to cuBLAS ...?

2013-12-18 Thread Philippe Tillet
Hey, 2013/12/18 Karl Rupp r...@iue.tuwien.ac.at Hi. A short update : I've implemented linkage to CBlas and CuBlas with dynamic selection. If activated through VIENNACL_WITH_CUBLAS, one can go back and forth between cublas and the original backend by doing: A.blas().gemm(NULL);

Re: [ViennaCL-devel] Call for testing: PyViennaCL on Ubuntu

2013-12-17 Thread Philippe Tillet
Hey Toby, Excellent ! Thank you ! I'm installing it right away, and I'll test it later tonight. Philippe 2013/12/17 Toby St Clere Smithe m...@tsmithe.net Toby St Clere Smithe m...@tsmithe.net writes: Yep, looks like the build was successful, so I'll go ahead and make sure it's all

Re: [ViennaCL-devel] Linking ViennaCL (CUDA backend) to cuBLAS ...?

2013-12-17 Thread Philippe Tillet
that option 2 is better, considering that there is already cuda_handle(), opencl_handle(), cpu_handle() or something similar, if I'm correct. Any advice? Best regards, Philippe 2013/12/15 Philippe Tillet phil.til...@gmail.com Hi, 2013/12/15 Karl Rupp r...@iue.tuwien.ac.at Hi, Yeah

Re: [ViennaCL-devel] Linking ViennaCL (CUDA backend) to cuBLAS ...?

2013-12-15 Thread Philippe Tillet
Hey, 2013/12/15 Karl Rupp r...@iue.tuwien.ac.at Hi again, While we're at it, let's discuss the dynamic dispatching mechanism we'd ideally want. I see two options: (1) A global function pointer table. So, one could for example set: viennacl::internal_blas::sgemv_ptr =
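Option (1), a global function pointer table, could be sketched as follows. Everything in the sketch namespace is hypothetical (the signature, the table layout, the accessor); it is not the viennacl::internal_blas interface discussed in the thread, whose details are cut off above.

```cpp
#include <cassert>
#include <cstddef>

namespace sketch {  // hypothetical names, for illustration only

// Signature shared by all sgemv implementations: y = A*x, A row-major m x n.
typedef void (*sgemv_fn)(std::size_t m, std::size_t n,
                         const float* A, const float* x, float* y);

// Built-in fallback implementation.
inline void default_sgemv(std::size_t m, std::size_t n,
                          const float* A, const float* x, float* y)
{
  for (std::size_t i = 0; i < m; ++i) {
    float acc = 0.0f;
    for (std::size_t j = 0; j < n; ++j) acc += A[i * n + j] * x[j];
    y[i] = acc;
  }
}

// The global table: one pointer per routine, set to the fallback at start.
struct blas_table { sgemv_fn sgemv; };

inline blas_table& table()
{
  static blas_table t = { &default_sgemv };
  return t;
}

// Entry point used by the rest of the library: dispatches through the table.
inline void sgemv(std::size_t m, std::size_t n,
                  const float* A, const float* x, float* y)
{
  assert(table().sgemv != 0);
  table().sgemv(m, n, A, x, y);
}

}  // namespace sketch
```

Swapping in another backend is then a single assignment to sketch::table().sgemv with any function of matching signature, for instance a thin wrapper around an external BLAS call. The trade-off raised in the thread still applies: a single global table is simple, but it cannot express a different choice per memory backend or per matrix.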

Re: [ViennaCL-devel] Linking ViennaCL (CUDA backend) to cuBLAS ...?

2013-12-15 Thread Philippe Tillet
Hi, 2013/12/15 Karl Rupp r...@iue.tuwien.ac.at Hey, I agree. However, it seems to me that setting the implementation for each matrix would end up being tedious... one table per memory backend since to make sense conceptually to me, since the performance (and the portability) of each

Re: [ViennaCL-devel] Linking ViennaCL (CUDA backend) to cuBLAS ...?

2013-12-15 Thread Philippe Tillet
Hi, 2013/12/15 Karl Rupp r...@iue.tuwien.ac.at Hi, Yeah, it certainly is a bit tedious. Feel free to only do this for matrix-matrix multiplications for now, a full operation table is presumably too much of a refactoring for ViennaCL 1.x.y, but much better suited for

Re: [ViennaCL-devel] Linking ViennaCL (CUDA backend) to cuBLAS ...?

2013-12-14 Thread Philippe Tillet
Hello, I've just realized that most BLAS implementations don't provide any way to do strided matrix accesses in the non-leading dimension ... ! Is this correct? I was hoping that we could have avoided such special cases, but it seems like a couple of tests will need to be made. Philippe

[ViennaCL-devel] Generator's repmat API

2013-11-04 Thread Philippe Tillet
Hello everybody, I am done implementing : x = viennacl::reduce<op>(viennacl::rows(A)); x = viennacl::reduce<op>(viennacl::cols(A)); s = viennacl::reduce<op>(x); In the generator. For now, the supported ops are : add, mult, max, min. I can't support them all, because I need to provide their neutral
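Assuming "neutral" above refers to the neutral (identity) element of each operator, a host-side sketch of the three reductions might look like this. The op_* tags and the reduce functions are stand-ins written for illustration; they only mimic the shape of the generator interface and are not the ViennaCL implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Each operator tag carries its neutral element and its binary operation.
struct op_add  { static double neutral() { return 0.0; }
                 static double apply(double a, double b) { return a + b; } };
struct op_mult { static double neutral() { return 1.0; }
                 static double apply(double a, double b) { return a * b; } };
struct op_max  { static double neutral() { return -std::numeric_limits<double>::infinity(); }
                 static double apply(double a, double b) { return std::max(a, b); } };
struct op_min  { static double neutral() { return std::numeric_limits<double>::infinity(); }
                 static double apply(double a, double b) { return std::min(a, b); } };

// s = reduce<OP>(x): folds the whole vector, starting from the neutral element.
template <class OP>
double reduce(const std::vector<double>& x)
{
  double acc = OP::neutral();
  for (std::size_t i = 0; i < x.size(); ++i) acc = OP::apply(acc, x[i]);
  return acc;
}

// One result per row of a row-major rows x cols matrix.
template <class OP>
std::vector<double> reduce_rows(const std::vector<double>& A,
                                std::size_t rows, std::size_t cols)
{
  assert(A.size() == rows * cols);
  std::vector<double> r(rows, OP::neutral());
  for (std::size_t i = 0; i < rows; ++i)
    for (std::size_t j = 0; j < cols; ++j)
      r[i] = OP::apply(r[i], A[i * cols + j]);
  return r;
}

// One result per column of a row-major rows x cols matrix.
template <class OP>
std::vector<double> reduce_cols(const std::vector<double>& A,
                                std::size_t rows, std::size_t cols)
{
  assert(A.size() == rows * cols);
  std::vector<double> r(cols, OP::neutral());
  for (std::size_t i = 0; i < rows; ++i)
    for (std::size_t j = 0; j < cols; ++j)
      r[j] = OP::apply(r[j], A[i * cols + j]);
  return r;
}
```

The neutral element is what lets every accumulator (or, on a GPU, every padded work-item) start from a value that does not change the result, which is why an operator without one is harder to support.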

Re: [ViennaCL-devel] implicit GPU-to-CPU scalar conversion of viennacl::scalar_expression...

2013-10-27 Thread Philippe Tillet
Hello, I had not noticed that only the first reduction would be executed in this case, so my arguments were indeed invalid :) However, I am now even more worried than before ;) This makes the assumption that the 2-way reduction will always be the best way to compute an inner-product on any OpenCL

Re: [ViennaCL-devel] implicit GPU-to-CPU scalar conversion of viennacl::scalar_expression...

2013-10-27 Thread Philippe Tillet
Hi hi, 2013/10/27 Karl Rupp r...@iue.tuwien.ac.at Hi, This makes the assumption that the 2-way reduction will always be the best way to compute an inner-product on any OpenCL device. We want the reduction-based programs to be device-specific, so these sometimes truncated operations

[ViennaCL-devel] implicit GPU-to-CPU scalar conversion of viennacl::scalar_expression...

2013-10-26 Thread Philippe Tillet
Hello, Now that I'm back to some C++ coding, I want to finish the integration of viennacl::op_reduce. I've noticed a lot of different operator overloads for viennacl::scalar_expression, with basically different implicit conversions to raw scalar. I'm a bit skeptical here :) This allows to handle

Re: [ViennaCL-devel] Adding op_element.subfamily_type into the scheduler

2013-10-17 Thread Philippe Tillet
A clearer classification : OPERATION_FUNCTION_SUB_TYPE_FAMILY (norm, prod, inner_prod, etc...) OPERATION_ELEMENT_FUNCTION_SUB_TYPE_FAMILY (abs, pow, etc) OPERATION_ELEMENT_OPERATOR_SUB_TYPE_FAMILY(+, ==, , etc...) Philippe 2013/10/18 Philippe Tillet phil.til...@gmail.com Hello, Currently

Re: [ViennaCL-devel] Adding op_element.subfamily_type into the scheduler

2013-10-17 Thread Philippe Tillet
Hey, While we're at it. I'm implementing reductions, now. There are two options here : template<class OP, class VectorType> reduce(VectorType const v) { return scalar_expression<VectorType, OP, reduce_type>(v, OP()); } or template<class OP, class VectorType> reduce(VectorType const v) {

Re: [ViennaCL-devel] Adding op_element.subfamily_type into the scheduler

2013-10-17 Thread Philippe Tillet
to the same end-tree anyway, which will lead to the same problem inside the statement... Philippe 2013/10/18 Philippe Tillet phil.til...@gmail.com Hey, While we're at it. I'm implementing reductions, now. There are two options here : template<class OP, class VectorType> reduce(VectorType

[ViennaCL-devel] Common base for implicit_vector_base and vector_base...makes sense?

2013-10-16 Thread Philippe Tillet
Hi, It seems like the behavior of scalar_vector, unit_vector etc has changed a bit since the appearance of the kernel generator. I am currently extending the API of the generator, with relational operators. I want to design a specific kernel which checks for X[i] 0.42, for all i. Since operator

Re: [ViennaCL-devel] Common base for implicit_vector_base and vector_base...makes sense?

2013-10-16 Thread Philippe Tillet
Hi hi, 2013/10/16 Karl Rupp r...@iue.tuwien.ac.at Hi, It seems like the behavior of scalar_vector, unit_vector etc has changed a bit since the appearance of the kernel generator. I am currently extending the API of the generator, with relational operators. I want to design a specific

Re: [ViennaCL-devel] Common base for implicit_vector_base and vector_base...makes sense?

2013-10-16 Thread Philippe Tillet
Hey hey, Well, the main problem I have with incorporating implicit_vector_base inside vector_base is that this sounds like replacing inheritance with switches on enum :P However, I think I have found a solution which will satisfy both of us: viennacl::vector_base already have this constructor:

Re: [ViennaCL-devel] IRC meeting on Friday, 15:00 UTC?

2013-10-03 Thread Philippe Tillet
Hey, I'll be there! Philippe 2013/10/2 Karl Rupp r...@iue.tuwien.ac.at Hi guys, we haven't had an IRC meeting for quite a while now. I'm finally done with most of my relocation from the US back to Austria, so I propose to have our next IRC meeting on Friday, October 4, at 15:00 UTC. Is

Re: [ViennaCL-devel] IRC meeting on Friday, 15:00 UTC?

2013-10-03 Thread Philippe Tillet
have to provide a fair comparison in order to orient the scientists that are looking for a high-level GPGPU solution. Philippe 2013/10/3 Toby St Clere Smithe m...@tsmithe.net Yep, so will I. Toby Philippe Tillet phil.til...@gmail.com writes: Hey, I'll be there! Philippe

[ViennaCL-devel] Incorporating reductions in ViennaCL

2013-09-06 Thread Philippe Tillet
Hi everybody :) Okay, so in the roadmap i've added Reductions support for ViennaCL 1.6 ... I plan to take care of it for the three backends, but there are several things to consider here. For now, I will call them reduce, reduce_rows, reduce_cols. A convenience layer such that reduce(mat.rows())

Re: [ViennaCL-devel] Auto-Tuner, GEMM, GEMV... : Integrating RaijinCL into the generator

2013-08-30 Thread Philippe Tillet
Hi hi, 2013/8/30 Karl Rupp r...@iue.tuwien.ac.at Hi Philippe, About 6 months ago I had heard of a library that also performed autotuning (http://raijincl.org), but that offered the same performance as ours back then. Since then, the performance has *greatly* improved, largely

Re: [ViennaCL-devel] Call to those with an NVidia GeForce Kepler graphic card : autotuning

2013-08-19 Thread Philippe Tillet
, 2013 4:14 PM, Philippe Tillet phil.til...@gmail.com wrote: Hello everybody, For providing good default GEMM kernels for the Kepler Architecture, I need the help of the community ! :) I'm looking for someone with an NVidia GeForce Kepler graphic card... If there is such person here, would he

[ViennaCL-devel] OpenCL to CUDA kernel translation

2013-08-16 Thread Philippe Tillet
Hey everyone, It seems to me that most of the differences between CUDA and OpenCL come from the respective APIs, but that the kernel code is very similar in the two cases. Do you guys think it's possible to easily translate the generated kernel from OpenCL to CUDA, by just doing one-to-one

Re: [ViennaCL-devel] Scheduler progresses

2013-08-15 Thread Philippe Tillet
Hi, 2013/8/16 Karl Rupp r...@iue.tuwien.ac.at Hi guys, the scheduler for kernel fusion makes good progress. Toby, you should be able to use all of the fundamental dense linear algebra operations now. There should be only be two blocks of functionality missing: - Sparse matrices (i.e.

Re: [ViennaCL-devel] Compilation load of matrix-test-*

2013-08-06 Thread Philippe Tillet
. This should now allow you to build with `make -j4` on weaker machines with limited RAM. Best regards, Karli On 08/01/2013 08:35 PM, Philippe Tillet wrote: Hi everybody, I have had troubles compiling matrix-test-* for quite some time, but it has gotten worse over time. The compilation process

[ViennaCL-devel] On Autotuning GEMM

2013-08-06 Thread Philippe Tillet
Hey everybody, For a few days, I've been playing around with AMD's CodeXL, the HD5850 and the generator/autotuner: - First of all, I want to share something that made me completely crazy. Avoid : *vector += scalar*vector * in a compute bound context. After replacing the above by: *vector.s0 +=

Re: [ViennaCL-devel] Kernel Generator wrap-up

2013-07-29 Thread Philippe Tillet
Hi again ! The generator code is pushed on the master branch. 2013/7/28 Karl Rupp r...@iue.tuwien.ac.at Hey, My preferred option is to pad by default and either to make the padding a multiple of four or sixteen. However, we need to maintain a full set of unpadded

[ViennaCL-devel] Kernel Generator wrap-up

2013-07-28 Thread Philippe Tillet
Hello everybody, I'm proud to announce that after about 3 weeks, I've recoded from scratch the OpenCL code generator to integrate it fully with viennacl::scheduler::statement. That being said, I'm entering the point where I need to ask for your opinion on (many) further design choices. Sorted by