I'm proud to announce that after about 3weeks, I've recoded from scratch
the OpenCL code generator to integrate it fully with
That being said, I'm entering the point where I need to inquire your
opinion for (many) further design choices. Sorted by priority :
1 > How to handle padding? For example, the best kernels for a given
operation may use float4, in which case an alignment of 4 is required. For
GEMM, though, the kernel internally used blocking. Since the iteration over
the blocks is unrolled, I prefer to keep the loop boundary static (known at
the OpenCL compile time), so padding inside a kernel is not really an
option here. How to handle this?
Should we have a plethora of kernels optimized for a large number of
block-sizes?If yes, how to choose the block sizes?
2 > For each operation (BLAS1/BLAS2/BLAS3 for now), an infinite number of
kernels can be generated. Designing a proper test suite in such a situation
is a challenging task. I've thought about testing a fixed amount of
randomly chosen kernel.
We also have to choose multiple sizes for the test (because of 1>)...
Finally, multiple operations can be packed together (multiple SAXPY,
multiple scalar reduction/inner product, multiple vector reduction/gemv).
If that number of packed operations is too high, the local memory usage
will be too high and the OpenCL kernel may not *compile*. Should we provide
a mechanism to evaluate this upper bound at runtime (doable) or just use a
very conservative value for now (The OpenCL standards guarantees 16kB of
local memory, the kernel generator guarantees an upperbound on the amount
of local memory used.) ? I prefer the second option.
3 > There are several expression nodes that should be supported only by the
generator for now (even though not yet implemented):
- elementwise relational operators : operator<, operator<= operator>,
operator >=, operator==, operator!=.
- repmat(mat or vector, row_tiling, col_tiling)
- vector expression : diag(Mat)
- matrix expression : diag(vec)
My question is : how to provide access for the user to OpenCL-specific
content, not available (yet) for other backends?
Another possibility is to keep this issue for ViennaCL.version > 1.5
4 > I want to maintain explicit specifications of the generator (apart from
the hard-coded bool-returning C++ function) : what operations it supports,
what it doesn't support. Are you interested? If yes, what format would you
See everything from the browser to the database with AppDynamics
Get end-to-end visibility with application monitoring from AppDynamics
Isolate bottlenecks and diagnose root cause in seconds.
Start your free trial of AppDynamics Pro today!
ViennaCL-devel mailing list