Hi hi hi,

 > Hey hey,

> (...)
> The process of enqueueing the generator is extremely lightweight; there
> is no map involved. It does basically two things:
> - Parse the statement to retrieve some quantities (e.g. M, N, K in the
> case of GEMM)
> - Recursively enqueue the elements of the statement (matrix, vector,
> scalar, etc.)
> When the program name is known in advance, there is no need to build the
> representation of the statement (which fills a char*), but even this
> should be fast enough. I remember having measured, some time ago, a
> total overhead of < 10 microseconds when building this representation. But
> I'll re-evaluate this ASAP.

Ok, this sounds indeed fairly light-weight.
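If it helps with the re-evaluation: a micro-benchmark as simple as the 
sketch below should already give a good number. (Just a sketch; it assumes 
a C++11 compiler for the timing harness only, and enqueue_generated_statement() 
is a made-up stand-in for the actual call that builds the representation.)

#include <chrono>
#include <cstddef>
#include <iostream>

// Stand-in for the actual call that parses the statement and builds its
// char* representation -- replace with the real generator enqueue.
void enqueue_generated_statement() {}

int main()
{
  std::size_t const runs = 10000;

  auto start = std::chrono::high_resolution_clock::now();
  for (std::size_t i = 0; i < runs; ++i)
    enqueue_generated_statement();
  auto stop = std::chrono::high_resolution_clock::now();

  double total_us = std::chrono::duration<double, std::micro>(stop - start).count();
  std::cout << "average overhead per enqueue: " << total_us / runs << " us" << std::endl;
  return 0;
}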



>     This sounds to me much more like a researchers facility rather than
>     something an average user wants to be exposed to. Keep in mind that
>     whenever something needs to go through the file system, it is
>     subject to additional problems: These can be permission problems,
>     problems with blanks (or umlauts, etc.), random IO errors, or tricky
>     problems in batch systems on supercomputers. Since I'm part of the
>     PETSc developer team I've learned about so many problems on machines
>     'out there', where Murphy's law is constantly in action. Can we
>     focus on populating the built-in database for the 1.6.0 release
>     instead? A standard-user with a standard-GPU should not have to
>     worry about any tuning or stumble upon file system problems.
>
>
>
> I like to look at it the other way around. A portable filesystem
> implementation is planned for C++17. Since, in 2014, we are still far
> away from using C++11, will we ever be able to rely on such C++17
> features?

Well, first of all C++17 needs to be on time. :-P
Then, considering that current enterprise systems still use GCC 4.2
today, which is ~8 years old, it's unlikely that we'll be able to use
C++17 any earlier than 2025.


> If we want ViennaCL to use input-dependent kernels (which is
> not reasonably doable without a model file) preferably before 2025,
> we'll have to deal with the lack of a portable (!= Boost) filesystem
> toolkit. Of course, the environment variables involved would be disabled
> by default.

Yeah, that's very annoying. On the other hand, we don't need all the bells 
and whistles, so it may be worth either relying on a lightweight 
external library (!= Boost) or implementing the necessary layer ourselves.
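If we end up implementing it ourselves, the subset we actually need is 
tiny: create a directory if it does not exist, and check that it is there. 
A rough sketch of such a helper (the function name and the POSIX/Windows 
split are just my assumptions, nothing settled):

#include <string>
#include <sys/stat.h>
#ifdef _WIN32
  #include <direct.h>   // _mkdir
#endif

// Create the directory if it does not exist yet; return true on success
// or if it was already present.
inline bool ensure_directory(std::string const & path)
{
#ifdef _WIN32
  int err = _mkdir(path.c_str());
#else
  int err = mkdir(path.c_str(), 0755);
#endif
  if (err == 0)
    return true;

  struct stat info;
  return stat(path.c_str(), &info) == 0 && (info.st_mode & S_IFDIR);
}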

> Similarly, it sounds ridiculous not to provide an optional
> caching mechanism because it would involve using the filesystem! I'm
> ready to bet that a lot of users would prefer significantly increased
> performance (kernel caching, input-dependent kernels) at the cost of
> optionally messing with the filesystem. I'm even sure that the python
> community would totally laugh at us if we didn't choose this option! In
> the worst case, if there are filesystem problems, the environment
> variable can be unset and ViennaCL will only use the built-in database.

I'm not against optionally using the filesystem. For example, the 
optional kernel caching mechanism is great. Still, I want to make sure 
that we provide the best possible core and not rely too much on optional 
functionality just because it is convenient for us not to think about 
improving the core. A well-populated device database is absolutely vital 
for such a healthy core.
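Just to make the fallback you describe concrete: the gate can stay as 
simple as the sketch below (VIENNACL_CACHE_PATH is a made-up variable 
name, and the two helpers are only stubs for whatever the real hooks end 
up being):

#include <cstdlib>
#include <string>

struct kernel_profile { int dummy; /* tuned parameters would live here */ };

// Stub: the built-in database is always available and touches no files.
kernel_profile builtin_database_profile() { return kernel_profile(); }

// Stub: try to read a tuned profile from 'path'; return false on any problem.
bool lookup_cached_profile(std::string const & /*path*/, kernel_profile & /*result*/)
{
  return false;
}

kernel_profile select_profile()
{
  char const * cache_path = std::getenv("VIENNACL_CACHE_PATH"); // made-up name
  if (cache_path) // opt-in: unset variable means built-in database only
  {
    kernel_profile cached;
    if (lookup_cached_profile(cache_path, cached))
      return cached;
  }
  return builtin_database_profile(); // safe default, no filesystem involved
}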


>
>
>         The good point is that the auto-tuner can be integrated into
>         pyviennacl's installation, since there is no other dependency!
>
>         python configure.py --autotune
>         python setup.py build
>         python setup.py install
>
>         Of course, --autotune can take some more options (it is activated
>         for all the devices by default, but we can choose to auto-tune
>         just one device, if needed). I suggest, too, that it is activated
>         by default and that a warning is printed at the beginning of
>         setup.py explaining what auto-tuning does, that it can lengthen
>         the compilation time, and how to deactivate it.
>
>
>     Regarding "python configure.py --autotune": If we are not
>     super-careful about an efficient tuning process, we will pretty much
>     inherit the problems of ATLAS, i.e. endless installations. I think
>     it is a good feature to have, yes, but I think we need to develop
>     some performance models and heuristics first before we can really
>     provide this to our users.
>
>
> Hmm, in the worst case it should still be possible to do this the other
> way around:
> sudo python autotune.py --models-path ...
>
> You're right, perhaps we should, by default, disable the auto-tuner for
> the pyviennacl installation.

Definitely! Imagine that users install PyViennaCL via some packaging 
system: If the tuning process runs by default and takes too long, the 
build system might decide to just cancel the job.


> The auto-tuning process for square matrices
> is pretty short if we choose large square matrices only and the right
> parameter space (~15 to 30 mins on most desktop GPUs less than 5 years
> old), but most pyviennacl users will.

I recall computer games: In a number of cases I had the *option* to run 
an autotuning session at first start to get a few extra frames per 
second, but the game still runs pretty well without it.


>     In essence, the autotuner should remain more important for us (and
>     whoever wants to carry out some research in that direction) rather
>     than for average users.
>
>
> Yep, but it can also allow us to collect more data if it's simple enough
> for the users to run it!

Valid point! I'm totally in favor of an easy autotuning process. It's 
just a matter of defaults: Some users will be pretty upset if we 
automatically run the autotuning process *and* send results back.


>     Just one more question on the interaction with the GUI: Is it still
>     possible to iterate over all reasonable configurations within C++ so
>     that we don't have to require Python for the benchmark GUI?
>
>
> Yes, it is possible, even though there is no API.
>
> for(p1 in values_p1)
>   for(p2 in values_p2)
>      ....
>
> device_specific::execute(some_template(some_parameters(p1,p2,...),
> some_options), some_statement, some_context)

Ok, this is fine.
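For the GUI, a plain C++ sweep along the lines of your pseudocode would 
then look roughly like this (execute_configuration() is just a placeholder 
wrapping the device_specific::execute(...) call, since, as you say, the 
signatures are not a stable API yet):

#include <cstddef>
#include <vector>

// Placeholder: wraps some_template(some_parameters(p1, p2, ...), some_options)
// and device_specific::execute(...) for one parameter configuration.
void execute_configuration(int p1, int p2) { /* ... */ }

void sweep_configurations(std::vector<int> const & values_p1,
                          std::vector<int> const & values_p2)
{
  for (std::size_t i = 0; i < values_p1.size(); ++i)
    for (std::size_t j = 0; j < values_p2.size(); ++j)
      execute_configuration(values_p1[i], values_p2[j]); // time this and keep the fastest
}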

Best regards,
Karli

