Hi Philippe,

> Looking at the roadmap:
> https://github.com/viennacl/viennacl-dev/wiki/ViennaCL-Roadmap

Argl, I forgot to update this after our IRC meeting. The meeting minutes 
here define features for 1.6.0 which are far more reasonable:

https://github.com/viennacl/viennacl-dev/wiki/ViennaCL-Developer-Meetings



> I was concerned with 4 elements:
> (1) Hook in external BLAS libraries and use them as a computing backend
> (2) Distributed vectors and matrices (multiple devices, possibly mixed
> CUDA/OpenCL/OpenMP)
> (3) Support for reductions (vector-reduction, row-wise reduction,
> col-wise reduction). Naive OpenMP/CUDA implementation, but integrated in
> the kernel generator for OpenCL.
> (4) Full integration of the micro-scheduler and the generator.
>
> Needless to say that this seems overly ambitious!
> I had done a prototype for (1), but realized quickly that it would be
> pretty complicated to make it stable and robust with respect to devices,
> context, etc. Plus, the generator now gives the same (DENSE!)
> performance as CuBlas on NVidia GPUs (for Fermi, at least), and
> clAmdBlas on AMD GPUs. Linking could allow us to have very good
> performance on OPENMP/CUDA, as well as Sparse Linear algebra on OpenCL.
> This is interesting, but it is also a good amount of work!

We postponed that and instead agreed to focus on the full scheduler 
integration.



> (2) Will also require a huge amount of work. Plus, I think it is
> dangerous to do that when we're not even sure of how we handle ViennaCL
> on a single device (considering input-dependent kernels, for example).
> I'd say we should postpone this

Certainly postpone this. Today I got notice that we will have funding 
for a PhD student working on this. It's still hard to find a good 
candidate, but at least we have the funding now ;-)


> I'll do (3). It's not a lot of work and the kernel generator already
> supports it. We just need to add an API.

Today a user requested this on SourceForge. I'll also have time in the 
next few days to work on this, but since you volunteered for it, 
I'll go for the iterative solver optimizations first.
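
Just to pin down the semantics for the naive host-side path, here is a minimal sketch of row-wise and column-wise sums. It assumes a dense row-major matrix in a plain std::vector; row_sums() and col_sums() are hypothetical names for illustration, not existing ViennaCL API, and the final interface may well look different.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not ViennaCL API): sum of each row of a
// row-major rows x cols matrix stored contiguously in 'a'.
std::vector<double> row_sums(const std::vector<double>& a,
                             std::size_t rows, std::size_t cols)
{
  assert(a.size() == rows * cols);
  std::vector<double> result(rows, 0.0);
  // Rows are independent, so the outer loop is the natural
  // candidate for a parallel-for in an OpenMP variant.
  for (std::size_t i = 0; i < rows; ++i)
    for (std::size_t j = 0; j < cols; ++j)
      result[i] += a[i * cols + j];
  return result;
}

// Hypothetical helper (not ViennaCL API): sum of each column.
std::vector<double> col_sums(const std::vector<double>& a,
                             std::size_t rows, std::size_t cols)
{
  assert(a.size() == rows * cols);
  std::vector<double> result(cols, 0.0);
  // Walk the matrix row by row to keep memory accesses contiguous.
  for (std::size_t i = 0; i < rows; ++i)
    for (std::size_t j = 0; j < cols; ++j)
      result[j] += a[i * cols + j];
  return result;
}
```

For a 2x3 matrix with entries 1..6, row_sums() yields (6, 15) and col_sums() yields (5, 7, 9).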


> (4) is where I've spent and will spend most of my time. The Kernel
> Generator is now fully integrated for all the vector operations, all the
> matrix-vector operations (except rank1 updates) and most of the dense
> matrix operations (all but LU, FFT, in-place triangular substitution).
> While the database is not populated yet, recent benchmarks suggest very
> good performance (Like CuBlas on GTX470, and 80% of the peak on R9
> 290x). I think it is necessary to push forward in this direction, and
> make ViennaCL 1.6 a performance-based
> release.

I'll help with stripping the op_executor<> beast, so that everything 
interfaces with the scheduler directly.

Philippe, did you by chance check the impact of the generator 
integration on kernel latency? We only have a 1-10us margin to work 
with, which I haven't checked yet.
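
A rough way to get a number for this could be a small timing harness like the sketch below. It is an assumption on my side, not something in the repository: average_call_time_us() is a hypothetical helper that times any callable with std::chrono. For a real OpenCL or CUDA measurement the callable would have to enqueue the kernel, and one would need to synchronize with the device before reading the stop time.

```cpp
#include <chrono>
#include <cstddef>

// Hypothetical helper (not ViennaCL API): average wall-clock time per
// call of 'launch' in microseconds, measured over 'repetitions' calls.
template <typename F>
double average_call_time_us(F&& launch, std::size_t repetitions)
{
  typedef std::chrono::steady_clock clock;
  if (repetitions == 0)
    return 0.0;

  launch(); // warm-up call, excluded from the timing

  clock::time_point start = clock::now();
  for (std::size_t i = 0; i < repetitions; ++i)
    launch();
  // For asynchronous backends, a device synchronization belongs here.
  clock::time_point stop = clock::now();

  std::chrono::duration<double, std::micro> elapsed = stop - start;
  return elapsed.count() / static_cast<double>(repetitions);
}
```

Running this once with the generator path and once with the existing kernels on a tiny vector should show whether the difference stays within the 1-10us margin.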


> I've been very motivated to work on the kernel generator recently, and
> simply don't feel like working on (1) or (2) at the moment. Now, there
> are two different options, for (4):
> 4.1 - Implementing the kernel fusion mechanism inside the scheduler.
> 4.2 - Input-dependent kernels, and performance prediction.
>
> While I could help with 4.1, I don't feel like I could do this task
> alone, because I don't have sufficient knowledge of the backend. Plus,
> it implies getting rid of op_executor(), and I'm not sure how I could do
> that either!
> I feel operational, though, for 4.2. I feel like ViennaCL 1.6 should be
> a performance-oriented release, and having an (input+device)-dependent
> kernel selection mechanism is something we have to do!

I think we should not go for 4.1 with a 1.6.0 release, simply because it 
would delay the release cycle. We should provide features to our users 
fairly quickly after they are stabilized, not have them hanging around 
in the developer repository for too long. We have enough features for 
1.6.0 already ;-)

Some work from your side on 4.2 would be good, so if you have some 
resources left, please focus on that.
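
To make the idea of (input+device)-dependent selection a bit more concrete, here is a minimal sketch of what such a lookup could look like. Everything in it is hypothetical: kernel_profile, profile_database, the device string and the fallback values are made up for illustration and do not reflect the actual generator or database layout.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical tuning parameters for one kernel variant.
struct kernel_profile
{
  std::size_t local_size;
  std::size_t num_groups;
};

// Hypothetical database: per device, profiles keyed by the smallest
// input size for which they should be used.
class profile_database
{
public:
  void add(const std::string& device, std::size_t min_size,
           const kernel_profile& profile)
  {
    table_[device][min_size] = profile;
  }

  // Returns the profile with the largest min_size <= n for the given
  // device, or a conservative fallback if nothing matches.
  kernel_profile select(const std::string& device, std::size_t n) const
  {
    kernel_profile fallback = {128, 128}; // made-up default values
    auto dev = table_.find(device);
    if (dev == table_.end() || dev->second.empty())
      return fallback;
    // upper_bound() points at the first entry with min_size > n,
    // so the entry just before it is the one we want.
    auto it = dev->second.upper_bound(n);
    if (it == dev->second.begin())
      return fallback;
    --it;
    return it->second;
  }

private:
  std::map<std::string, std::map<std::size_t, kernel_profile> > table_;
};
```

A size threshold is only the simplest possible input dependence; performance prediction would replace the threshold lookup with a model, but the selection interface could stay similar.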


> Any thoughts on how the roadmap could/should be rearranged?

Does the one linked above sound more reasonable? ;-)

Best regards,
Karli

