Hey,

> I'm proud to announce that after about 3 weeks, I've recoded from scratch
> the OpenCL code generator to integrate it fully with
> viennacl::scheduler::statement.

Hurray :-) With the changes to the generator I pushed yesterday, there is now a clear spot where the expression gets handed over to the generator.

> That being said, I'm entering the point where I need to inquire your
> opinion for (many) further design choices. Sorted by priority :
>
> 1
> How to handle padding? For example, the best kernels for a given
> operation may use float4, in which case an alignment of 4 is required.
> For GEMM, though, the kernel internally used blocking. Since the
> iteration over the blocks is unrolled, I prefer to keep the loop
> boundary static (known at the OpenCL compile time), so padding inside a
> kernel is not really an option here. How to handle this?
> Should we have a plethora of kernels optimized for a large number of
> block-sizes? If yes, how to choose the block sizes?

My preferred option is to pad by default, with the padded size being a multiple of either four or sixteen. However, we need to maintain a full set of unpadded operations, because user-provided buffers need not be padded (and padding them after the fact may be too expensive).

> 2
> For each operation (BLAS1/BLAS2/BLAS3 for now), an infinite number
> of kernels can be generated. Designing a proper test suite in such a
> situation is a challenging task. I've thought about testing a fixed
> amount of randomly chosen kernel.

Please no random tests. They make failures awfully complicated to fix, because eventually one may not even be able to reproduce a previous failure. Even though the number of possible kernel variations is large (though finite), only a limited set actually gives good performance. These are the important kernels to be tested thoroughly.

> We also have to choose multiple sizes for the test (because of 1>)...

Sure. Keeping the sizes moderately small will give us a sufficiently fast test procedure.

> Finally, multiple operations can be packed together (multiple SAXPY,
> multiple scalar reduction/inner product, multiple vector
> reduction/gemv).
> If that number of packed operations is too high, the
> local memory usage will be too high and the OpenCL kernel may not
> *compile*. Should we provide a mechanism to evaluate this upper bound at
> runtime (doable) or just use a very conservative value for now (The
> OpenCL standards guarantees 16kB of local memory, the kernel generator
> guarantees an upperbound on the amount of local memory used.) ? I prefer
> the second option.

Sooner or later we will have to go for the runtime option anyway. I don't see any benefit in being overly pessimistic with 16kB if we have the true local memory size available at runtime.

> 3
> There are several expression nodes that should be supported only by
> the generator for now (even though not yet implemented):
> - reduce<op>(vector_expression)
> - reduce_rows<op>(matrix_expression)
> - reduce_cols<op>(matrix_expression)
> - elementwise relational operators : operator<, operator<=
> operator>, operator >=, operator==, operator!=.
> - repmat(mat or vector, row_tiling, col_tiling)
> - vector expression : diag(Mat)
> - matrix expression : diag(vec)
> My question is : how to provide access for the user to OpenCL-specific
> content, not available (yet) for other backends?
> Another possibility is to keep this issue for ViennaCL.version > 1.5

After the 1.5.0 release. There's too much other new functionality, so the release is already overdue. This gives us more time to design the API properly rather than coming up with some quick fix.

> 4
> I want to maintain explicit specifications of the generator (apart
> from the hard-coded bool-returning C++ function) : what operations it
> supports, what it doesn't support. Are you interested? If yes, what
> format would you prefer?

I'm not sure what you mean by 'explicit specifications'. Could you please elaborate?
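
Coming back to the local memory bound in item 2: a minimal sketch of how the runtime option could look, assuming the generator can report an upper bound on the local memory used per packed operation. The function name max_packed_operations and its parameters are hypothetical and not part of ViennaCL. The device limit itself would be obtained from OpenCL via clGetDeviceInfo() with CL_DEVICE_LOCAL_MEM_SIZE, which is left out here so that the snippet compiles without an OpenCL installation.

```cpp
#include <cstddef>
#include <limits>

// Hypothetical helper (not part of ViennaCL): largest number of packed
// operations whose combined local memory footprint fits on the device.
//
// local_mem_bytes: local memory available on the device. At runtime this
//                  would be queried with clGetDeviceInfo() and
//                  CL_DEVICE_LOCAL_MEM_SIZE; the conservative alternative
//                  is to pass 16 * 1024.
// bytes_per_op:    the generator's upper bound on local memory used by a
//                  single packed operation (assumed to be known).
std::size_t max_packed_operations(std::size_t local_mem_bytes,
                                  std::size_t bytes_per_op)
{
  // An operation that needs no local memory is not limited by it.
  if (bytes_per_op == 0)
    return std::numeric_limits<std::size_t>::max();

  // Integer division rounds down, so the result never exceeds the budget.
  return local_mem_bytes / bytes_per_op;
}
```

With an assumed scratch buffer of 1 kB per packed reduction, the conservative 16kB bound allows max_packed_operations(16 * 1024, 1024) == 16 packed operations, whereas a device reporting 48kB would allow 48. The same function serves both options, so switching from the conservative value to the runtime query only changes the first argument.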

Best regards,
Karli