Hi,



2014-07-08 20:59 GMT+02:00 Karl Rupp <r...@iue.tuwien.ac.at>:

> Hi Philippe,
>
>
> > Looking at the roadmap:
>
>> https://github.com/viennacl/viennacl-dev/wiki/ViennaCL-Roadmap
>>
>
> argl, I forgot to update this after our IRC meeting. The protocol here
> defines features for 1.6.0 which are far more reasonable:
>
> https://github.com/viennacl/viennacl-dev/wiki/ViennaCL-Developer-Meetings
>
>
>
>
>  I was concerned with 4 elements:
>> (1) Hook in external BLAS libraries and use them as a computing backend
>> (2) Distributed vectors and matrices (multiple devices, possibly mixed
>> CUDA/OpenCL/OpenMP)
>> (3) Support for reductions (vector-reduction, row-wise reduction,
>> col-wise reduction). Naive OpenMP/CUDA implementation, but integrated in
>> the kernel generator for OpenCL.
>> (4) Full integration of the micro-scheduler and the generator.
>>
>> Needless to say that this seems overly ambitious!
>> I had done a prototype for (1), but realized quickly that it would be
>> pretty complicated to make it stable and robust with respect to devices,
>> context, etc. Plus, the generator now gives the same (DENSE!)
>> performance as CuBlas on NVidia GPUs (for Fermi, at least), and
>> clAmdBlas on AMD GPUs. Linking could allow us to have very good
>> performance on OPENMP/CUDA, as well as Sparse Linear algebra on OpenCL.
>> This is interesting, but it is also a good amount of work!
>>
>
> We postponed that and instead agreed to focus on the full scheduler
> integration.
>
>
>
>
>  (2) Will also require a huge amount of work. Plus, I think it is
>> dangerous to do that when we're not even sure of how we handle ViennaCL
>> on a single device (considering input-dependent kernels, for example).
>> I'd say we should postpone this
>>
>
> Certainly postpone this. Today I got notice that we will have funding for
> a PhD student working on this. It's still hard to find a good candidate,
> but at least we have the funding now ;-)
>
>
>
>  I'll do (3). It's not a lot of work and the kernel generator already
>> supports it. We just need to add an API.
>>
>
> Today there was a user requesting this on sourceforge. I'll also have time
> in the next days to work on this, but since you volunteered for it, I'll go
> for the iterative solver optimizations first.
>
>
>
>  (4) is where I've spent and will spend most of my time. The Kernel
>> Generator is now fully integrated for all the vector operations, all the
>> matrix-vector operations (except rank1 updates) and most of the dense
>> matrix operations (all but LU, FFT, in-place triangular substitution).
>> While the database is not populated yet, recent benchmarks suggest very
>> good performance (Like CuBlas on GTX470, and 80% of the peak on R9
>> 290x). I think it is necessary to push forward in this direction, and
>> make ViennaCL 1.6 a performance-based
>> release.
>>
>
> I'll help with stripping the op_executor<> beast, so that everything
> interfaces the scheduler directly.
>
> Philippe, did you by chance check the impact of the generator integration
> on kernel latency? We only have a 1-10us margin to work with, which I
> haven't checked yet.
>
>
>
Don't worry about the overhead. It used to be fine. I'll re-check to see
whether everything is still fine, but when the program name and the kernel
name prefix are known in advance (i.e. for the pre-compiled programs), I don't
see where a significant overhead could come from! I'll benchmark this ASAP,
once some other modifications are done.


>
>  I've been very motivated to work on the kernel generator recently, and
>> simply don't feel like working on (1) or (2) at the moment. Now, there
>> are two different options, for (4):
>> 4.1 - Implementing the kernel fusion mechanism inside the scheduler.
>> 4.2 - Input-dependent kernels, and performance prediction.
>>
>> While I could help with 4.1, I don't feel like I could do this task
>> alone, because I don't have a sufficient knowledge of the backend. Plus,
>> it implies to get rid of op_executor(), and I'm not sure how I could do
>> this, too!
>> I feel operational, though, for 4.2. I feel like ViennaCL 1.6 should be
>> a performance-oriented release, and having an (input+device)-dependent
>> kernel selection mechanism is something we have to do!
>>
>
> I think we should not go for 4.1 with a 1.6.0 release, simply because it
> would delay the release cycle. We should provide features to our users
> fairly quickly after they are stabilized, not have them hanging around in
> the developer repository for too long. We have enough features for 1.6.0
> already ;-)
>
> Some work from your side on 4.2 would be good, so if you have some
> resources left, please focus on that.
>
>
Sure. 4.2 is part of my (future) PhD work, so I can't expect to have
everything working flawlessly for ViennaCL 1.6.0. But I feel like I should
be able to create the backbone for this release: a simple
environment-variable-based mechanism that points to a folder containing the
files spat out by the Python auto-tuner. I'd like an
environment-variable-based extension, since it can easily be exploited by
advanced users in C++ and generalized by pyviennacl (since Python has a
portable filesystem framework)!

Here's my idea. We could have VIENNACL_MODELS_PATH pointing to a directory
of files named after standardized device names (lower-case, spaces replaced
by dashes). At runtime, we check whether the environment variable is set and
whether we can open the file corresponding to the current device. If not, we
fall back on the built-in, input-agnostic database.
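
To make the lookup concrete, here is a minimal Python sketch of the logic I
have in mind. Only VIENNACL_MODELS_PATH and the naming rule come from the
proposal above; the function names are hypothetical, and I assume the file is
named exactly after the standardized device name, with no extension. The real
check would live in the C++ library.

```python
import os
from pathlib import Path


def standardize_device_name(name):
    """Lower-case the device name and replace spaces with dashes."""
    return name.strip().lower().replace(" ", "-")


def find_model_file(device_name, env=os.environ):
    """Return the tuned-model file for this device, or None.

    None means: fall back on the built-in, input-agnostic database.
    Assumption: one file per device, named after the standardized
    device name, directly inside VIENNACL_MODELS_PATH.
    """
    root = env.get("VIENNACL_MODELS_PATH")
    if not root:
        return None  # variable not set
    candidate = Path(root) / standardize_device_name(device_name)
    # Only use the file if it actually exists and is a regular file.
    return candidate if candidate.is_file() else None
```

With this rule, a device reported as "GeForce GTX 470" would be looked up as
"geforce-gtx-470" inside the models directory.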

The good point is that the auto-tuner can be integrated in pyviennacl's
installation, since there is no other dependency!

python configure.py --autotune
python setup.py build;
python setup.py install;

Of course, --autotune can take some more options (activated for all the
devices by default, but we can choose to auto-tune just one device, if
needed). I suggest, too, that it is activated by default and that a warning
is printed at the beginning of setup.py that explains what auto-tuning does,
that it can lengthen the compilation time, and how to deactivate it.
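
A rough sketch of how such an option could be parsed, assuming a plain
argparse-based configure.py. Only --autotune is part of the proposal; the
--no-autotune flag, the function names, and the wording of the warning are
placeholders.

```python
import argparse


def parse_args(argv):
    """Parse the hypothetical auto-tuning options of configure.py."""
    parser = argparse.ArgumentParser(description="configure.py sketch")
    # --autotune          -> tune all devices (also the default)
    # --autotune DEVICE   -> tune only the named device
    parser.add_argument("--autotune", nargs="?", const="all", default="all",
                        metavar="DEVICE", dest="autotune")
    # --no-autotune       -> skip auto-tuning (hypothetical flag name)
    parser.add_argument("--no-autotune", action="store_const", const=None,
                        default="all", dest="autotune")
    return parser.parse_args(argv)


def autotune_notice(selection):
    """Return the warning to print before the build starts, or None."""
    if selection is None:
        return None
    target = "all devices" if selection == "all" else selection
    return ("Auto-tuning is enabled for %s. It selects kernel parameters for "
            "your hardware and can lengthen the build time. "
            "Pass --no-autotune to skip it." % target)
```

For example, parse_args(["--autotune", "geforce-gtx-470"]) restricts tuning to
one device, while parse_args([]) tunes everything.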

Philippe




>
>  Any thoughts on how the roadmap could/should be rearranged?
>>
>
> Does the one linked above sound more reasonable? ;-)
>

Yes, it does!

>
> Best regards,
> Karli
>
>
_______________________________________________
ViennaCL-devel mailing list
ViennaCL-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/viennacl-devel