Hey,

>     Philippe, did you by chance check the impact of the generator
>     integration on kernel latency? We only have a 1-10us margin to work
>     with, which I haven't checked yet.
>
>
>
> Don't worry about the overhead. It used to be fine. I'll re-check to see
> whether everything is still fine, but when the program name and the
> kernel name prefix are known in advance (i.e. for the pre-compiled
> programs), I don't see where a significant overhead could come from!
> I'll benchmark this ASAP, once some other modifications are done.

The overhead could come from too many indirections in memory accesses, 
i.e. if too many lookups in maps and string comparisons are involved. 
Since you know the implementation better than me and don't think this is 
an issue, it should be fine. Either way, it needs to be checked, as it 
is such a fundamental quantity for the onset of scaling behavior of 
almost all 'higher-level' algorithms.
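
To make the check concrete, a minimal timing sketch for a name-based 
lookup path could look as follows. It assumes C++11 (for std::chrono). 
The names kernel_handle, registry_type, find_kernel and 
average_lookup_us are hypothetical and not part of ViennaCL, and the 
two-level std::map is only a guess at what a string-keyed lookup might 
involve.

```cpp
#include <chrono>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical stand-in for a compiled kernel handle; not a ViennaCL type.
struct kernel_handle { int id; };

// Assumed layout: program name -> kernel name -> handle.
typedef std::map<std::string, std::map<std::string, kernel_handle> > registry_type;

// Looks up a kernel by name; returns a null pointer if either name is unknown.
const kernel_handle * find_kernel(registry_type const & reg,
                                  std::string const & program,
                                  std::string const & kernel)
{
  registry_type::const_iterator p = reg.find(program);
  if (p == reg.end())
    return nullptr;
  std::map<std::string, kernel_handle>::const_iterator k = p->second.find(kernel);
  return (k == p->second.end()) ? nullptr : &k->second;
}

// Average wall-clock cost of one lookup in microseconds.
// `repetitions` must be greater than zero.
double average_lookup_us(registry_type const & reg,
                         std::string const & program,
                         std::string const & kernel,
                         std::size_t repetitions)
{
  volatile int sink = 0;  // keeps the compiler from removing the loop
  std::chrono::steady_clock::time_point start = std::chrono::steady_clock::now();
  for (std::size_t i = 0; i < repetitions; ++i)
  {
    const kernel_handle * h = find_kernel(reg, program, kernel);
    if (h)
      sink = sink + h->id;
  }
  std::chrono::duration<double, std::micro> elapsed =
      std::chrono::steady_clock::now() - start;
  return elapsed.count() / static_cast<double>(repetitions);
}
```

Comparing the returned value against the 1-10us margin gives a first 
indication only; the actual generator code path would have to be 
measured in the same way.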


>         I've been very motivated to work on the kernel generator
>         recently, and simply don't feel like working on (1) or (2)
>         at the moment. Now, there are two different options for (4):
>         4.1 - Implementing the kernel fusion mechanism inside the scheduler.
>         4.2 - Input-dependent kernels, and performance prediction.
>
>         While I could help with 4.1, I don't feel like I could do this
>         task alone, because I don't have sufficient knowledge of the
>         backend. Plus, it implies getting rid of op_executor(), and I'm
>         not sure how I could do this, either!
>         I feel operational, though, for 4.2. I feel like ViennaCL 1.6
>         should be a performance-oriented release, and having an
>         (input+device)-dependent kernel selection mechanism is
>         something we have to do!
>
>
>     I think we should not go for 4.1 with a 1.6.0 release, simply
>     because it would delay the release cycle. We should provide features
>     to our users fairly quickly after they are stabilized, not have them
>     hanging around in the developer repository for too long. We have
>     enough features for 1.6.0 already ;-)
>
>     Some work from your side on 4.2 would be good, so if you have some
>     resources left, please focus on that.
>
>
> Sure. 4.2 is part of my (future) PhD work, so I can't expect to have
> everything working flawlessly for ViennaCL 1.6.0.

As always, it's better to have a smaller set of reliable features in a 
release rather than a larger set of broken features ;-)



> But I feel like I
> should be able to create the backbone for this release: a simple
> environment-variable based mechanism that points to a folder holding
> the files spat out by the Python auto-tuner. I'd like an
> environment-variable based extension, as it can be easily exploited by
> advanced users in C++, and generalized by pyviennacl (since Python has
> a portable filesystem framework)!
>
> Here's my idea. We could have VIENNACL_MODELS_PATH pointing to a
> directory containing standardized device names (lower-case, spaces
> replaced by dashes). At runtime, we check if the environment variable is
> set and if we can open the corresponding file. If not, we fall back on
> the built-in, input-agnostic database.

This sounds to me much more like a researcher's facility than 
something an average user wants to be exposed to. Keep in mind that 
whenever something needs to go through the file system, it is subject to 
additional problems: permission problems, problems with blanks (or 
umlauts, etc.) in paths, random IO errors, or tricky problems in batch 
systems on supercomputers. As part of the PETSc developer team, I've 
learned about many such problems on machines 'out there', where 
Murphy's law is constantly in action. Can we focus on populating the 
built-in database for the 1.6.0 release instead? A standard user with a 
standard GPU should not have to worry about any tuning or stumble upon 
file system problems.
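
For reference, the proposed lookup with its fallback could be sketched 
roughly as below. Only VIENNACL_MODELS_PATH and the naming rule 
(lower-case, spaces replaced by dashes) come from the proposal. The 
function names, the '/' path separator, and the assumption that the 
file name equals the standardized device name without an extension are 
hypothetical. An empty return value means "use the built-in database".

```cpp
#include <cctype>
#include <cstddef>
#include <cstdlib>
#include <fstream>
#include <string>

// Lower-case the device name and replace spaces by dashes.
std::string standardize_device_name(std::string const & name)
{
  std::string result(name);
  for (std::size_t i = 0; i < result.size(); ++i)
  {
    unsigned char c = static_cast<unsigned char>(result[i]);
    result[i] = (c == ' ') ? '-' : static_cast<char>(std::tolower(c));
  }
  return result;
}

// Returns the path of an openable model file inside `dir`, or an empty
// string if `dir` is null or empty, or the file cannot be opened.
std::string find_model_file_in(const char * dir, std::string const & device_name)
{
  if (dir == nullptr || *dir == '\0')
    return std::string();
  std::string path = std::string(dir) + "/" + standardize_device_name(device_name);
  std::ifstream file(path.c_str());
  return file.good() ? path : std::string();
}

// Same lookup, with the directory taken from the environment variable.
std::string find_model_file(std::string const & device_name)
{
  return find_model_file_in(std::getenv("VIENNACL_MODELS_PATH"), device_name);
}
```

Every failure (variable not set, missing directory, missing 
permissions) silently ends in the fallback here, which keeps a standard 
user away from file system errors but also hides misconfiguration from 
the researcher who set the variable on purpose.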


> The good point is that the auto-tuner can be integrated in pyviennacl's
> installation, since there is no other dependency!
>
> python configure.py --autotune
> python setup.py build;
> python setup.py install;
>
> Of course, --autotune can take some more options (activated for all the
> devices by default, but we can choose to auto-tune just one device, if
> needed). I suggest, too, that it is activated by default and that a
> warning is printed at the beginning of setup.py that explains what
> auto-tuning does, that it can lengthen the compilation time, and how to
> deactivate it.

Regarding "python configure.py --autotune": If we are not super-careful 
about an efficient tuning process, we will pretty much inherit the 
problems of ATLAS, i.e. endless installations. I think it is a good 
feature to have, yes, but I think we need to develop some performance 
models and heuristics first before we can really provide this to our 
users. In essence, the autotuner should remain more important for us 
(and whoever wants to carry out some research in that direction) rather 
than for average users.

Just one more question on the interaction with the GUI: Is it still 
possible to iterate over all reasonable configurations within C++ so 
that we don't have to require Python for the benchmark GUI?
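
What I have in mind is a plain C++ enumeration along the lines of the 
sketch below. The parameters (local_size_0, local_size_1, 
vector_width), their power-of-two ranges, and the validity rule are 
hypothetical placeholders; the actual generator profiles may expose 
different parameters and constraints. max_work_group_size is assumed to 
be the device limit queried at runtime.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical tuning parameters of one kernel configuration.
struct kernel_config
{
  std::size_t local_size_0;
  std::size_t local_size_1;
  std::size_t vector_width;
};

// Assumed validity rule: the work group must fit on the device.
bool is_reasonable(kernel_config const & c, std::size_t max_work_group_size)
{
  return c.local_size_0 * c.local_size_1 <= max_work_group_size;
}

// Enumerates all power-of-two combinations that pass is_reasonable().
std::vector<kernel_config> enumerate_configs(std::size_t max_work_group_size)
{
  std::vector<kernel_config> result;
  for (std::size_t ls0 = 1; ls0 <= max_work_group_size; ls0 *= 2)
    for (std::size_t ls1 = 1; ls1 <= max_work_group_size; ls1 *= 2)
      for (std::size_t vw = 1; vw <= 8; vw *= 2)
      {
        kernel_config c = {ls0, ls1, vw};
        if (is_reasonable(c, max_work_group_size))
          result.push_back(c);
      }
  return result;
}
```

The benchmark GUI could then loop over the returned vector and time 
each configuration without any Python dependency.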

Best regards,
Karli (the hesitant...)



_______________________________________________
ViennaCL-devel mailing list
ViennaCL-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/viennacl-devel