Jerome Kieffer <jerome.kief...@esrf.fr> writes:
> On Fri, 20 Apr 2018 11:17:15 -0500
> Andreas Kloeckner <li...@informa.tiker.net> wrote:
>
>> I have (at one point) verified that this does work. In order for
>> overlapped transfers to actually happen, you need to allocate the
>> host-side end of the transfer with ALLOC_HOST_PTR (or some such--I don't
>> remember precisely)--the same as 'page-locked' memory in CUDA.
>
> Yes, this is what Vincent noticed. We are still working on it.
> My question was also about all processing/IO appearing in the same
> queue while being submitted to different ones. If it actually occurs like
> this, it is a bug in my opinion (unless the profiler enforces only one
> queue?).
Have you raised this question on the Nvidia forums? If you learn more, I
would be grateful to hear what you find out.

>> Another (mostly speculative--but interesting) option might be to go
>> through the (experimental!) CUDA backend for pocl--that goes through
>> the CUDA API and, as a result, restores the ability to profile.
>
> Thanks for the hint. I re-compiled pocl with CUDA support (it backports
> smoothly to debian9) and it works.
>
> nvprof is now usable for profiling kernels under pocl, while it sees
> none when using the Nvidia OpenCL driver.
>
> This is information worth sharing with the other OpenCL developers.

I'll try: https://andreask.cs.illinois.edu/the-state-of-opencl-for-scientific-computing-in-2018/ :)

Andreas
_______________________________________________
PyOpenCL mailing list
PyOpenCL@tiker.net
https://lists.tiker.net/listinfo/pyopencl
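For anyone wanting to reproduce the pocl/nvprof result, one plausible invocation is sketched below. The script name is a placeholder, and the platform-selection string is an assumption (PYOPENCL_CTX accepts an answer that matches the platform; pocl's platform is named "Portable Computing Language"):

```shell
# Pick the pocl platform non-interactively and run the PyOpenCL
# program under nvprof so its kernels show up in the profile.
# "my_script.py" stands in for your own program.
PYOPENCL_CTX='Portable' nvprof python my_script.py
```
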