[ViennaCL-devel] Kernel Generator wrap-up

Philippe Tillet Sun, 28 Jul 2013 08:07:57 -0700

Hello everybody,

I'm proud to announce that after about 3 weeks, I've rewritten the OpenCL
code generator from scratch to integrate it fully with
viennacl::scheduler::statement.


That being said, I've reached the point where I need to ask for your
opinion on (many) further design choices. Sorted by priority:

1 > How to handle padding? For example, the best kernels for a given
operation may use float4, in which case an alignment of 4 is required. For
GEMM, though, the kernel internally uses blocking. Since the iteration over
the blocks is unrolled, I prefer to keep the loop boundary static (known at
OpenCL compile time), so handling padding inside a kernel is not really an
option here. How to handle this?
Should we have a plethora of kernels optimized for a large number of
block sizes? If yes, how to choose the block sizes?
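
One way to keep the loop bounds static is to round every dimension up to the
next multiple of the block size on the host, so that a kernel never sees a
partial block. The sketch below only illustrates that arithmetic; round_up,
padded_dims and pad_matrix are hypothetical names, not ViennaCL functions.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of ViennaCL): smallest multiple of
// `block` that is greater than or equal to `n`.
inline std::size_t round_up(std::size_t n, std::size_t block)
{
  assert(block > 0);
  return ((n + block - 1) / block) * block;
}

// Padded storage dimensions for a matrix, again purely illustrative.
struct padded_dims
{
  std::size_t rows;
  std::size_t cols;
};

// Pads each dimension independently to a multiple of its block size.
inline padded_dims pad_matrix(std::size_t rows, std::size_t cols,
                              std::size_t block_rows, std::size_t block_cols)
{
  return padded_dims{ round_up(rows, block_rows), round_up(cols, block_cols) };
}
```

With this scheme the cost of supporting several block sizes is extra storage
rather than extra branches inside the kernel; the padding elements would still
have to be kept at zero for operations such as GEMM.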

2 > For each operation (BLAS1/BLAS2/BLAS3 for now), an infinite number of
kernels can be generated. Designing a proper test suite in such a situation
is a challenging task. I've thought about testing a fixed number of
randomly chosen kernels.
We also have to choose multiple sizes for the test (because of 1>)...
Finally, multiple operations can be packed together (multiple SAXPY,
multiple scalar reduction/inner product, multiple vector reduction/gemv).
If the number of packed operations is too high, the local memory usage
will be too high and the OpenCL kernel may not *compile*. Should we provide
a mechanism to evaluate this upper bound at runtime (doable) or just use a
very conservative value for now (the OpenCL standard guarantees 16kB of
local memory, and the kernel generator guarantees an upper bound on the
amount of local memory used)? I prefer the second option.
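
As a rough sketch of the conservative option, the bound can be computed on the
host from a fixed local-memory budget. The code assumes, purely for
illustration, that each packed operation needs one scratch element per
work-item; the real footprint depends on the generated kernel.
max_packed_operations is a hypothetical name. For the runtime option, the
budget would instead be queried with clGetDeviceInfo and
CL_DEVICE_LOCAL_MEM_SIZE.

```cpp
#include <cassert>
#include <cstddef>

// Conservative budget, taken from the 16kB figure quoted above.
const std::size_t conservative_local_mem_bytes = 16 * 1024;

// Hypothetical helper: how many operations can be packed into one kernel
// if each one needs `work_group_size` scratch elements of `scalar_bytes`
// bytes in local memory (an assumption made for this sketch).
inline std::size_t max_packed_operations(std::size_t local_mem_bytes,
                                         std::size_t work_group_size,
                                         std::size_t scalar_bytes)
{
  assert(work_group_size > 0 && scalar_bytes > 0);
  return local_mem_bytes / (work_group_size * scalar_bytes);
}
```

Under these assumptions, a work-group of 256 floats leaves room for 16 packed
operations within the conservative budget.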

3 > There are several expression nodes that should be supported only by the
generator for now (even though not yet implemented):
   - reduce<op>(vector_expression)
   - reduce_rows<op>(matrix_expression)
   - reduce_cols<op>(matrix_expression)
   - elementwise relational operators : operator<, operator<=, operator>,
operator>=, operator==, operator!=.
   - repmat(mat or vector, row_tiling, col_tiling)
   - vector expression : diag(Mat)
   - matrix expression : diag(vec)
My question is : how to give the user access to OpenCL-specific
content that is not (yet) available for other backends?
Another possibility is to defer this issue to ViennaCL versions after 1.5.
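
Whichever way these nodes are exposed, a host-side reference would be useful
for checking the generated kernels. The sketch below assumes that
reduce_rows<op>(A) yields one value per row, obtained by folding op over that
row, for a dense row-major matrix; that reading of the semantics is my
assumption, and reference_reduce_rows is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Host-side reference (hypothetical, not a ViennaCL function).
// Assumes `a` stores a rows x cols matrix in row-major order and that
// reducing rows means: result[i] = op(...op(a(i,0), a(i,1))..., a(i,cols-1)).
template <typename Op, typename T>
std::vector<T> reference_reduce_rows(const std::vector<T>& a,
                                     std::size_t rows, std::size_t cols,
                                     Op op)
{
  assert(cols > 0 && a.size() == rows * cols);
  std::vector<T> result(rows);
  for (std::size_t i = 0; i < rows; ++i)
  {
    T acc = a[i * cols];               // start from the first entry of row i
    for (std::size_t j = 1; j < cols; ++j)
      acc = op(acc, a[i * cols + j]);  // fold the remaining entries
    result[i] = acc;
  }
  return result;
}
```

Calling it with std::plus<float>() gives row sums; any other binary functor
can be passed in the same way.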

4 > I want to maintain explicit specifications of the generator (apart from
the hard-coded bool-returning C++ function) : what operations it supports,
what it doesn't support. Are you interested? If yes, what format would you
prefer?

Best regards,
Philippe
