Hey,

> I'm proud to announce that after about 3 weeks, I've recoded from scratch
> the OpenCL code generator to integrate it fully with
> viennacl::scheduler::statement.

Hurray :-) With the changes to the generator I pushed yesterday, there is
now a clear spot for handing the expression over to the generator.


> That being said, I'm entering the point where I need to inquire your
> opinion for (many) further design choices. Sorted by priority :
>
> 1 > How to handle padding? For example, the best kernels for a given
> operation may use float4, in which case an alignment of 4 is required.
> For GEMM, though, the kernel internally used blocking. Since the
> iteration over the blocks is unrolled, I prefer to keep the loop
> boundary static (known at the OpenCL compile time), so padding inside a
> kernel is not really an option here. How to handle this?
> Should we have a plethora of kernels optimized for a large number of
> block-sizes? If yes, how to choose the block sizes?

My preferred option is to pad by default, with the padded size being a
multiple of either four or sixteen. However, we need to maintain a full
set of unpadded operations, because user-provided buffers need not be
padded (and padding them after the fact may be too expensive).
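To make the padding rule concrete, here is a minimal sketch. The names
padded_size and can_use_padded_kernel are hypothetical and not part of
ViennaCL; the alignment (4 or 16) is whichever value we settle on.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: round n up to the next multiple of 'alignment'.
inline std::size_t padded_size(std::size_t n, std::size_t alignment)
{
  assert(alignment > 0);
  return ((n + alignment - 1) / alignment) * alignment;
}

// Hypothetical dispatch check: a buffer may go through the padded
// (e.g. float4) kernels only if its internal size is a multiple of the
// alignment. User-provided buffers that fail this check would have to
// use the unpadded kernels.
inline bool can_use_padded_kernel(std::size_t internal_size, std::size_t alignment)
{
  assert(alignment > 0);
  return internal_size % alignment == 0;
}
```

With an alignment of 4, a vector of 10 entries would get an internal
size of 12, while a user-provided buffer of exactly 10 entries would be
routed to the unpadded kernels.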



> 2 > For each operation (BLAS1/BLAS2/BLAS3 for now), an infinite number
> of kernels can be generated. Designing a proper test suite in such a
> situation is a challenging task. I've thought about testing a fixed
> amount of randomly chosen kernel.

Please no random tests. They make failures awfully complicated to fix,
because one may not even be able to reproduce a previous failure.

Even though the number of possible kernel variations is large (though 
finite), there's only a limited set which actually gives good 
performance. These are the important kernels to be tested thoroughly.


> We also have to choose multiple sizes for the test (because of 1>)...

Sure. Keeping the sizes moderately small will give us a sufficiently 
fast test procedure.
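A rough sketch of what I have in mind for the tests: a fixed list of
kernel variants combined with a fixed list of small sizes, enumerated in
a fixed order so that any failure can be rerun by its index. The struct
names, the profile parameters and the concrete values are all made up
for illustration; the real list should contain the kernels that
actually perform well.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical description of one generated kernel variant.
struct kernel_profile
{
  unsigned int vector_width;  // e.g. 1 for float, 4 for float4
  unsigned int block_size;    // hypothetical blocking parameter
};

struct test_case
{
  kernel_profile profile;
  std::size_t    size;
};

// Builds all (profile, size) combinations in a fixed order.
// No randomness: the i-th case is the same on every run.
inline std::vector<test_case> make_test_cases()
{
  // Placeholder values, not measured optima.
  const std::vector<kernel_profile> profiles = { {1, 16}, {4, 16}, {4, 32} };
  // Small sizes, both multiples and non-multiples of the padding.
  const std::vector<std::size_t> sizes = { 1, 15, 16, 17, 64, 127 };

  std::vector<test_case> cases;
  for (const auto& p : profiles)
  {
    assert(p.vector_width > 0);
    for (std::size_t s : sizes)
      cases.push_back({p, s});
  }
  return cases;
}
```

Sizes next to a multiple of the padding (15, 16, 17) exercise both the
padded and the unpadded code paths while keeping the run time low.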


> Finally, multiple operations can be packed together (multiple SAXPY,
> multiple scalar reduction/inner product, multiple vector
> reduction/gemv). If that number of packed operations is too high, the
> local memory usage will be too high and the OpenCL kernel may not
> *compile*. Should we provide a mechanism to evaluate this upper bound at
> runtime (doable) or just use a very conservative value for now (the
> OpenCL standard guarantees 16kB of local memory, the kernel generator
> guarantees an upper bound on the amount of local memory used)? I prefer
> the second option.

Sooner or later we will have to go for the runtime option anyway. I
don't see any benefit in being overly pessimistic with 16 kB if we have
the true local memory size available at runtime.
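The runtime bound itself is cheap to compute. A minimal sketch,
assuming the generator can report an upper bound on the local memory
used per packed operation (local_mem_per_op below, a hypothetical
quantity). The device value would come from clGetDeviceInfo() with
CL_DEVICE_LOCAL_MEM_SIZE; it is passed in as a plain number here to
keep the example free of OpenCL headers.

```cpp
#include <cassert>
#include <cstddef>

// Conservative fallback quoted in this thread: 16 kB of local memory.
const std::size_t guaranteed_local_mem = 16 * 1024;

// Hypothetical helper: how many operations can be packed into one
// kernel without exceeding the local memory of the device.
//   device_local_mem : bytes of local memory reported by the device
//   local_mem_per_op : generator's upper bound per packed operation
inline std::size_t max_packed_operations(std::size_t device_local_mem,
                                         std::size_t local_mem_per_op)
{
  assert(local_mem_per_op > 0);
  return device_local_mem / local_mem_per_op;
}
```

For example, with 4 kB per operation the conservative bound allows 4
packed operations, whereas a device reporting 48 kB would allow 12.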



> 3 > There are several expression nodes that should be supported only by
> the generator for now (even though not yet implemented):
>     - reduce<op>(vector_expression)
>     - reduce_rows<op>(matrix_expression)
>     - reduce_cols<op>(matrix_expression)
>     - elementwise relational operators : operator<, operator<=,
> operator>, operator>=, operator==, operator!=.
>     - repmat(mat or vector, row_tiling, col_tiling)
>     - vector expression : diag(Mat)
>     - matrix expression : diag(vec)
> My question is : how to provide access for the user to OpenCL-specific
> content, not available (yet) for other backends?
> Another possibility is to keep this issue for ViennaCL.version > 1.5

After the 1.5.0 release. There's too much other new functionality, so
the release is already overdue. This gives us more time to design the
API properly rather than coming up with some quick fix.


> 4 > I want to maintain explicit specifications of the generator (apart
> from the hard-coded bool-returning C++ function) : what operations it
> supports, what it doesn't support. Are you interested? If yes, what
> format would you prefer?

I'm not sure about what you mean by 'explicit specifications'. Could you 
please elaborate?

Best regards,
Karli

