Ok, thanks!
This sounds reasonable indeed.
Philippe
2014-06-26 23:51 GMT+02:00 Karl Rupp r...@iue.tuwien.ac.at:
Hi,
the cases 5, 6, and 7 are handled by running a kernel for four vectors,
then subtracting 4 and running a dedicated kernel on the remaining 1, 2, or 3
vectors. This could also be
Hello!
I noticed this in the implementation of multi_inner_prod:
  switch (vec_tuple.const_size() - current_index)
  {
    case 7:
    case 6:
    case 5:
    case 4:
      // do stuff
However, there is a test for 5, 6, and 7, so I assume that these
Hi,
the cases 5, 6, and 7 are handled by running a kernel for four vectors,
then subtracting 4 and running a dedicated kernel on the remaining 1, 2, or 3
vectors. This could also be handled by a generated kernel, yes, but I
haven't implemented this for two reasons:
1. fewer kernels to compile
2.
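
For illustration, the block-wise dispatch described above could be sketched roughly as follows. This is a host-only sketch and not the actual ViennaCL code: inner_prod_block, next_block_size, and multi_inner_prod_sketch are hypothetical names, plain std::vector stands in for the device vectors, and the sketch assumes that any count of four or more is simply processed four vectors at a time (how the library treats larger counts is not covered in this thread).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

// Hypothetical stand-in for one kernel launch: inner products of x with the
// `count` vectors ys[first], ..., ys[first + count - 1].
void inner_prod_block(const Vec& x, const std::vector<Vec>& ys,
                      std::size_t first, std::size_t count, Vec& out)
{
    for (std::size_t k = 0; k < count; ++k) {
        assert(ys[first + k].size() == x.size());
        double sum = 0.0;
        for (std::size_t i = 0; i < x.size(); ++i)
            sum += x[i] * ys[first + k][i];
        out[first + k] = sum;
    }
}

// Number of vectors handled in the next step, given how many remain.
std::size_t next_block_size(std::size_t remaining)
{
    switch (remaining) {
    case 0:
        return 0;
    case 1:
    case 2:
    case 3:
        return remaining;  // dedicated kernel for 1, 2, or 3 vectors
    default:
        return 4;          // 4, 5, 6, 7 (and, in this sketch, anything larger)
    }
}

// Processes all vectors block by block, e.g. 7 vectors as 4 + 3.
Vec multi_inner_prod_sketch(const Vec& x, const std::vector<Vec>& ys)
{
    Vec out(ys.size(), 0.0);
    std::size_t current_index = 0;
    while (current_index < ys.size()) {
        const std::size_t count = next_block_size(ys.size() - current_index);
        inner_prod_block(x, ys, current_index, count, out);
        current_index += count;
    }
    return out;
}
```

With seven vectors this runs one four-vector step followed by one three-vector step, so only the kernels for 1, 2, 3, and 4 vectors are needed rather than one kernel per possible count.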