Re: [ViennaCL-devel] Implementation of multi_inner_prod

2014-06-27 Thread Philippe Tillet
Ok, thanks! This sounds reasonable indeed.
Philippe

2014-06-26 23:51 GMT+02:00 Karl Rupp r...@iue.tuwien.ac.at:
> Hi, the cases 5, 6, and 7 are handled by running a kernel for four vectors, then subtracting 4 and running a dedicated kernel on the remaining 1, 2, or 3 vectors. This could also be

[ViennaCL-devel] Implementation of multi_inner_prod

2014-06-26 Thread Philippe Tillet
Hello! I note this in the implementation of multi_inner_prod:

    switch (vec_tuple.const_size() - current_index) {
      case 7:
      case 6:
      case 5:
      case 4:
        // do stuff

However, there is a test for 5, 6, 7 so I assume that these

Re: [ViennaCL-devel] Implementation of multi_inner_prod

2014-06-26 Thread Karl Rupp
Hi, the cases 5, 6, and 7 are handled by running a kernel for four vectors, then subtracting 4 and running a dedicated kernel on the remaining 1, 2, or 3 vectors. This could also be handled by a generated kernel, yes, but I haven't implemented this for two reasons:
1. fewer kernels to compile
2.
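The splitting described in this reply can be sketched roughly as follows. This is only an illustration, not ViennaCL code: `plan_kernel_groups` is a hypothetical helper, and the sketch assumes that four is the largest group size, since the thread only discusses up to seven vectors and does not say how eight or more are handled.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper (not part of ViennaCL): returns the sizes of the
// groups in which `num_vectors` vectors would be processed, one kernel
// launch per group.
std::vector<std::size_t> plan_kernel_groups(std::size_t num_vectors)
{
  std::vector<std::size_t> groups;
  std::size_t current_index = 0;

  while (current_index < num_vectors)
  {
    std::size_t remaining = num_vectors - current_index;

    // Assumption: 4 is the largest group. For 4..7 remaining vectors,
    // take four now and leave 0..3 for the next iteration; for 1..3,
    // take all of them with a dedicated smaller kernel.
    std::size_t group = (remaining >= 4) ? 4 : remaining;

    groups.push_back(group);
    current_index += group;
  }

  // Sanity check: the groups must cover every vector exactly once.
  assert(std::accumulate(groups.begin(), groups.end(), std::size_t(0)) == num_vectors);
  return groups;
}
```

Under these assumptions, calling `plan_kernel_groups(7)` yields the groups 4 and 3, which matches the "four vectors, then the remaining 1, 2, or 3" behaviour described above, without needing separate kernels for 5, 6, or 7 vectors.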