Hello!

I have finally started my work at the crossroads of Machine Learning and
HPC, and it is an excellent example of how PyViennaCL and ViennaCL can
interact.


Goal
--------------------
We want to execute a routine (GEMM, GEMV, DOT, FFT, etc.) on some
hardware and a set of inputs.
For now, the auto-tuner / generator optimizes the routine only with
respect to the hardware. I'm working on optimizing it with respect to the
properties of the inputs as well (in the case of GEMM: the three sizes
involved).

Solution
--------------------
The idea is to run a large enough number of auto-tuning procedures and to
record the best profiles for different given inputs (different M, N, K for
GEMM). One can then use supervised learning to find the most suitable
profile (i.e. the kernel to generate) for new inputs, without re-running
the auto-tuning procedure.
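To make this concrete, here is a minimal sketch of the supervised-learning
step. It assumes scikit-learn purely for illustration (any library with an
SVM would do), and the sizes and profile labels below are made up, not
measured data:

    import numpy as np
    from sklearn import svm
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    # Each training example is a GEMM input configuration (M, N, K)...
    X = np.array([[1024, 1024, 1024],
                  [4096,  256,  256],
                  [ 256, 4096,  256],
                  [ 256,  256, 4096]])
    # ...labelled with the index of the best profile found by the auto-tuner.
    y = np.array([0, 1, 2, 1])

    # Scale the features first, since M, N, K span a wide range.
    clf = make_pipeline(StandardScaler(), svm.SVC(kernel='rbf'))
    clf.fit(X, y)

    # Given new sizes, predict which profile to generate a kernel for.
    print(clf.predict([[2048, 2048, 2048]]))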

Experiments
--------------------
I have carried out about 30 carefully-chosen auto-tuning procedures for
SGEMM on Hawaii, and I can tell that both the size and the shape of the
inputs matter: with the wrong kernel, you can lose up to 20-30% of the
performance.

Anyway, 30 is an extremely small number considering that we are spanning
three dimensions. I obtained 13 different optimal kernels, and in many
cases an optimal kernel appears only once. Things get better if we accept
that different inputs share the same optimum when it doesn't hurt the
performance too much (say, by not more than 5%).
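That merging step can be expressed in a few lines. The sketch below is
hypothetical: 'perf' and its layout (input -> profile -> measured GFLOP/s)
are my own naming, not something the auto-tuner produces as-is:

    from collections import Counter

    def merge_labels(perf, tolerance=0.05):
        # perf: {input: {profile: GFLOP/s}}
        best = {i: max(p, key=p.get) for i, p in perf.items()}
        counts = Counter(best.values())  # how often each profile is optimal
        labels = {}
        for i, p in perf.items():
            top = p[best[i]]
            # Profiles costing at most `tolerance` relative performance...
            ok = [q for q, v in p.items() if v >= (1 - tolerance) * top]
            # ...among which we prefer the one that is optimal for the most
            # inputs overall, so that fewer distinct kernels survive.
            labels[i] = max(ok, key=lambda q: counts[q])
        return labels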
For now, the results I have obtained with an SVM classifier seem to make
sense, but I think we need between 50 and 100 examples to make it work
properly. That is not very tractable as of now, but another part of my
research is to find ways to speed up the auto-tuning procedure. Altogether,
this is an interesting research direction which could lead to nice
performance improvements on average. I'm not sure whether an SVM is the
most appropriate classifier for this task, but it is what makes the most
sense to me in this particular case.

Discussion
-------------
There are two very distinct steps in this procedure, which I'll recall for
those who don't have an ML background:

-> The training step: this is where the parameters of the classifier are
found. All the auto-tuning procedures execute here, and this is what
potentially takes forever (a couple of days, perhaps), so we don't care
about the overhead. The point is that this is also a separate routine, so
there's absolutely no reason to write it in C++! Plus, the whole Machine
Learning community uses Python. What we want to do here is to provide a
few wrappers in PyViennaCL to generate a kernel for a given profile. From
that point, we can re-use the existing work of other researchers to speed
up the auto-tuning procedure, and *train* a classifier for input-dependent
kernel generation. Once the classifier is trained, we can export the model
to a file (most ML libraries allow this). Ideally, we could replace the
vendor-specific model file by some header-only C++ source code.
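For instance, a fitted scikit-learn SVC could be dumped into such a header
along these lines. This is only a sketch under my own assumptions:
'export_header' and the array names are hypothetical, not an existing
PyViennaCL API (and with the pipeline from the earlier sketch, the SVC
itself would be clf.named_steps['svc']):

    def export_header(clf, path="gemm_model.hpp"):
        # Dump the parameters of a fitted sklearn SVC as C++ arrays.
        def fmt(a):
            return "{" + ", ".join("%.9g" % v for v in a.ravel()) + "}"
        with open(path, "w") as f:
            f.write("// Auto-generated SVM model; do not edit by hand.\n")
            f.write("static const double support_vectors[] = %s;\n"
                    % fmt(clf.support_vectors_))
            f.write("static const double dual_coef[] = %s;\n"
                    % fmt(clf.dual_coef_))
            f.write("static const double intercept[] = %s;\n"
                    % fmt(clf.intercept_))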

-> The prediction step: this is executed every time a matrix
multiplication is carried out. A prediction is made at run-time from the
inputs, the hardware and the model created during the training step, which
triggers the generation/compilation of a hardware- and input-specific
kernel for optimal performance. I'm not afraid of the prediction overhead
if we use C++, since the input is only 3-dimensional.
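To give an idea of how cheap the prediction is, this is the decision
function an RBF-kernel SVM evaluates at run-time, written in Python for
readability (the real thing would be the straightforward C++ equivalent;
the binary case is shown for brevity):

    import math

    def rbf_decision(x, support_vectors, dual_coef, intercept, gamma):
        # Evaluate sum_i alpha_i * exp(-gamma * ||x - sv_i||^2) + b.
        # With x = (M, N, K), this is a handful of flops per support
        # vector: negligible next to the GEMM itself.
        acc = intercept
        for alpha, sv in zip(dual_coef, support_vectors):
            d2 = sum((xi - si) ** 2 for xi, si in zip(x, sv))
            acc += alpha * math.exp(-gamma * d2)
        # The sign of 'acc' selects one of the two profiles.
        return 1 if acc > 0 else 0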

This is imho a perfect example of how PyViennaCL could be used to
increase productivity on the core library.

@Toby: Do you think it would be possible to provide a wrapper for the
class inside viennacl/generator/generate.hpp? I would love to do it
myself, but I don't know much about how Python wrappers are written...

Philippe