Hey,
Philippe, did you by chance check the impact of the generator
integration on kernel latency? We only have a 1-10us margin to work
with, which I haven't checked yet.
Don't worry about the overhead. It used to be fine. I'll re-check to see
whether everything is still fine,
Hi,
I'd like to point out that input-dependent kernels are pointless
without kernel caching (both would use an environment variable and the
filesystem). Indeed, each program will contain multiple versions of a
given operation, which can make the compilation time very long if
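
To make the caching idea concrete, here is a minimal sketch of a filesystem-backed kernel cache that is switched on through an environment variable. Everything named in it is an assumption for illustration only: the variable KERNEL_CACHE_DIR, the functions cache_dir(), compile_kernel() and get_kernel(), and the use of std::hash as the cache key are hypothetical and not taken from ViennaCL.

```cpp
#include <cstdlib>
#include <filesystem>
#include <fstream>
#include <functional>
#include <iterator>
#include <optional>
#include <string>

namespace fs = std::filesystem;

// Hypothetical variable name; caching is disabled when it is unset or empty.
std::optional<fs::path> cache_dir() {
    const char* dir = std::getenv("KERNEL_CACHE_DIR");
    if (dir == nullptr || *dir == '\0') return std::nullopt;
    return fs::path(dir);
}

// Placeholder for the expensive step (e.g. building a device program).
std::string compile_kernel(const std::string& source) {
    return "binary:" + source;
}

// Return the cached binary for `source` if one exists,
// otherwise compile it and store the result for the next run.
std::string get_kernel(const std::string& source) {
    std::optional<fs::path> dir = cache_dir();
    if (!dir) return compile_kernel(source);

    // std::hash is only a stand-in key: it is not guaranteed to be stable
    // across standard libraries, and collisions are possible.
    const std::size_t key = std::hash<std::string>{}(source);
    const fs::path file = *dir / (std::to_string(key) + ".bin");

    std::ifstream in(file, std::ios::binary);
    if (in) {
        // Cache hit: read the whole file back.
        return std::string(std::istreambuf_iterator<char>(in),
                           std::istreambuf_iterator<char>());
    }

    // Cache miss: compile once, then persist the result.
    std::string binary = compile_kernel(source);
    fs::create_directories(*dir);
    std::ofstream out(file, std::ios::binary);
    out << binary;
    return binary;
}
```

With the variable set, the first call to get_kernel() for a given source pays the compilation cost and every later call (including in later runs) only reads a file. A real cache would also need the device and build options in the key.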
Hello,
Looking at the roadmap:
https://github.com/viennacl/viennacl-dev/wiki/ViennaCL-Roadmap
I was concerned with 4 elements:
(1) Hook in external BLAS libraries and use them as a computing backend
(2) Distributed vectors and matrices (multiple devices, possibly mixed
CUDA/OpenCL/OpenMP)
(3)
Hi Philippe,
Looking at the roadmap:
https://github.com/viennacl/viennacl-dev/wiki/ViennaCL-Roadmap
argl, I forgot to update this after our IRC meeting. The minutes here
define features for 1.6.0 which are far more reasonable:
Hi,
2014-07-08 20:59 GMT+02:00 Karl Rupp r...@iue.tuwien.ac.at:
Hi Philippe,
Looking at the roadmap:
https://github.com/viennacl/viennacl-dev/wiki/ViennaCL-Roadmap
argl, I forgot to update this after our IRC meeting. The minutes here
define features for 1.6.0 which are far more