Hi,
It's horrible! As soon as I want to introduce some vectorized types in an
OpenCL template as simple as AXPY, everything starts exploding.
Well, first things first, I probably need to justify why I think that we
cannot do without double2, float4 in all of our dense kernel templates:
- From my own experience, it turns out that some element-wise expressions
can easily be compute-bound. In statistics it can be pretty easy to
encounter complicated element-wise transforms when evaluating a probability
density function. I've personally had to use SSE on my CPU a couple of
times to alleviate this problem.
- Some vendors explicitly state in their optimization guides that loads of
16 bytes will result in better bandwidth.
On the other hand, using stride!=1 will prevent the use of vectorized loads
in any kernel (AXPY, GEMM, etc.). We're definitely facing a dilemma here,
where we have to choose between higher JIT overhead (the programs can be
cached, however) and potentially higher execution time. My belief is that
we should provide a fallback program for stride!=1, which will be compiled
only if strided accesses are used.
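To make the fallback idea concrete, here is a rough host-side sketch in plain C. Everything in it is hypothetical (vec_view, axpy_variant, program_cache and the function names are made up for illustration and are not ViennaCL code). It only shows the selection logic and the "compile on first use" bookkeeping; the actual build step is reduced to a flag.

```c
#include <stddef.h>

/* Hypothetical variant identifiers; these names are not from ViennaCL. */
typedef enum {
    AXPY_VECTORIZED = 0, /* uses vector loads, requires unit stride */
    AXPY_STRIDED    = 1  /* scalar fallback, works for any stride   */
} axpy_variant;

/* Hypothetical description of a vector view, in elements. */
typedef struct {
    size_t offset;
    size_t stride;
    size_t size;
} vec_view;

/* Pick the vectorized program only when both operands have stride 1. */
static axpy_variant select_axpy_variant(const vec_view *x, const vec_view *y)
{
    if (x->stride == 1 && y->stride == 1)
        return AXPY_VECTORIZED;
    return AXPY_STRIDED;
}

/* Hypothetical cache: remembers which variants were already built. */
typedef struct {
    int compiled[2];
    int compile_count;
} program_cache;

/* Build a variant the first time it is requested, never before. */
static void ensure_compiled(program_cache *cache, axpy_variant v)
{
    if (!cache->compiled[v]) {
        /* A real implementation would generate the source and call
           clBuildProgram here; this sketch only records the event. */
        cache->compiled[v] = 1;
        cache->compile_count++;
    }
}
```

With this scheme, a user who never touches strided vectors never pays the JIT cost of the fallback; the price is one extra branch on the host per kernel launch.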
Note that even this wouldn't solve all our problems. How do we handle offsets
that are not a multiple of 4? How do we handle sizes that are not a multiple
of 4? We could use the same fallback, or provide a different optimized kernel.
http://paste.ubuntu.com/7915787/
optimized_1 should be able to handle the remaining cases quite well, while
optimized_0 should be faster because, unlike vload4, it doesn't have to
check for alignment, and it doesn't have to do any cleanup. In the case of
AXPY, I'd expect optimized_1 to be the better option. For GEMM, however, I'd
prefer the "cleanup" to be done in some other kernel calls.
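For readers without access to the paste, here is a rough illustration of what "cleanup" means for AXPY when the offset and size are not multiples of 4. This is a plain C emulation written for this note, not the content of the paste and not necessarily how optimized_1 is structured. It assumes x and y share the same offset, and axpy_peeled is a made-up name.

```c
#include <stddef.h>

/* Computes y[off + k] += alpha * x[off + k] for k in [0, n).
   Hypothetical helper: both vectors are assumed to share one offset. */
static void axpy_peeled(float alpha, const float *x, float *y,
                        size_t off, size_t n)
{
    size_t i = off;
    size_t end = off + n;

    /* Prologue: scalar steps until the index is a multiple of 4. */
    while (i < end && (i % 4) != 0) {
        y[i] += alpha * x[i];
        ++i;
    }

    /* Body: groups of 4 elements. In an OpenCL kernel each group
       would map to a single float4 load, multiply-add and store. */
    for (; i + 4 <= end; i += 4) {
        y[i + 0] += alpha * x[i + 0];
        y[i + 1] += alpha * x[i + 1];
        y[i + 2] += alpha * x[i + 2];
        y[i + 3] += alpha * x[i + 3];
    }

    /* Epilogue: scalar cleanup for the last (fewer than 4) elements. */
    for (; i < end; ++i)
        y[i] += alpha * x[i];
}
```

The prologue and epilogue are the part that costs extra branches in the kernel; moving them into separate kernel calls, as suggested for GEMM, keeps the main body free of them.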
Seriously, what a headache!! But discarding vector types for everything
but GEMM just sounds wrong to me...
Philippe
_______________________________________________
ViennaCL-devel mailing list
ViennaCL-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/viennacl-devel