Hi Philippe,

> Looking at the roadmap:
> https://github.com/viennacl/viennacl-dev/wiki/ViennaCL-Roadmap

Argh, I forgot to update this after our IRC meeting. The meeting minutes
here define features for 1.6.0 that are far more reasonable:
https://github.com/viennacl/viennacl-dev/wiki/ViennaCL-Developer-Meetings

> I was concerned with 4 elements:
> (1) Hook in external BLAS libraries and use them as a computing backend
> (2) Distributed vectors and matrices (multiple devices, possibly mixed
> CUDA/OpenCL/OpenMP)
> (3) Support for reductions (vector-reduction, row-wise reduction,
> col-wise reduction). Naive OpenMP/CUDA implementation, but integrated in
> the kernel generator for OpenCL.
> (4) Full integration of the micro-scheduler and the generator.
>
> Needless to say that this seems overly ambitious!
> I had done a prototype for (1), but realized quickly that it would be
> pretty complicated to make it stable and robust with respect to devices,
> context, etc. Plus, the generator now gives the same (DENSE!)
> performance as CuBlas on NVidia GPUs (for Fermi, at least), and
> clAmdBlas on AMD GPUs. Linking could allow us to have very good
> performance on OPENMP/CUDA, as well as Sparse Linear algebra on OpenCL.
> This is interesting, but it is also a good amount of work!

We postponed that and instead agreed to focus on the full scheduler
integration.

> (2) Will also require a huge amount of work. Plus, I think it is
> dangerous to do that when we're not even sure of how we handle ViennaCL
> on a single device (considering input-dependent kernels, for example).
> I'd say we should postpone this

Certainly postpone this. Today I got notice that we will have funding
for a PhD student working on this. It's still hard to find a good
candidate, but at least we have the funding now ;-)

> I'll do (3). It's not a lot of work and the kernel generator already
> supports it. We just need to add an API.

Today there was a user requesting this on sourceforge.
I'll also have time in the next few days to work on this, but since you
volunteered for it, I'll go for the iterative solver optimizations
first.

> (4) is where I've spent and will spend most of my time. The Kernel
> Generator is now fully integrated for all the vector operations, all the
> matrix-vector operations (except rank1 updates) and most of the dense
> matrix operations (all but LU, FFT, Inplace triangular substitution).
> While the database is not populated yet, recent benchmarks suggest very
> good performance (Like CuBlas on GTX470, and 80% of the peak on R9
> 290x). I think it is necessary to push forward in this direction, and
> make ViennaCL 1.6 a performance-based
> release.

I'll help with stripping the op_executor<> beast, so that everything
interfaces with the scheduler directly. Philippe, did you by chance
check the impact of the generator integration on kernel latency? We
only have a 1-10us margin to work with, which I haven't checked yet.

> I've been very motivated to work on the kernel generator recently, and
> simply don't feel like working on (1) or (2) at the moment. Now, there
> are two different options, for (4):
> 4.1 - Implementing the kernel fusion mechanism inside the scheduler.
> 4.2 - Input-dependent kernels, and performance prediction.
>
> While I could help with 4.1, I don't feel like I could do this task
> alone, because I don't have a sufficient knowledge of the backend. Plus,
> it implies getting rid of op_executor(), and I'm not sure how I could do
> this, too!
> I feel operational, though, for 4.2. I feel like ViennaCL 1.6 should be
> a performance-oriented release, and having an (input+device)-dependent
> kernel selection mechanism is something we have to do!

I think we should not go for 4.1 with a 1.6.0 release, simply because it
would delay the release cycle.
We should provide features to our users fairly quickly after they are
stabilized, not have them hanging around in the developer repository
for too long. We have enough features for 1.6.0 already ;-) Some work
from your side on 4.2 would be good, so if you have some resources left,
please focus on that.

> Any thoughts on how the roadmap could/should be rearranged?

Does the one linked above sound more reasonable? ;-)

Best regards,
Karli

_______________________________________________
ViennaCL-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/viennacl-devel
