Jerome Kieffer <jerome.kief...@esrf.fr> writes:

> On Fri, 20 Apr 2018 11:17:15 -0500
> Andreas Kloeckner <li...@informa.tiker.net> wrote:
>
>> I have (at one point) verified that this does work. In order for
>> overlapped transfers to actually happen, you need to allocate the
>> host-side end of the transfer with ALLOC_HOST_PTR (or some such--I don't
>> remember precisely)--the same as 'page-locked' memory in CUDA.
>
> Yes, this is what Vincent noticed. We are still working on it.
> My question was also about all processing/IO showing up in a single
> queue even though it was submitted to different queues. If that is
> really what happens, it is a bug in my view (unless the profiler
> forces everything onto one queue?).

Have you raised this question on the Nvidia forums? If you learn more, I
would be grateful to learn what you hear.
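
Regarding ALLOC_HOST_PTR: here is a minimal sketch of the pattern I had
in mind, in case it is useful. The buffer size, the two-queue split and
the toy kernel are only illustrative assumptions, not a tested overlap
benchmark.

    import numpy as np
    import pyopencl as cl

    ctx = cl.create_some_context()
    props = cl.command_queue_properties.PROFILING_ENABLE
    transfer_queue = cl.CommandQueue(ctx, properties=props)
    compute_queue = cl.CommandQueue(ctx, properties=props)

    n = 1 << 20
    nbytes = n * np.dtype(np.float32).itemsize
    mf = cl.mem_flags

    # Host-side staging buffer allocated with ALLOC_HOST_PTR and then
    # mapped: the OpenCL analogue of CUDA's page-locked ("pinned") memory.
    staging = cl.Buffer(ctx, mf.READ_WRITE | mf.ALLOC_HOST_PTR, size=nbytes)
    host_ary, _ = cl.enqueue_map_buffer(
        transfer_queue, staging, cl.map_flags.WRITE, 0, (n,), np.float32)
    host_ary[:] = np.random.rand(n).astype(np.float32)

    dev_buf = cl.Buffer(ctx, mf.READ_WRITE, size=nbytes)

    # Non-blocking copy on the transfer queue; work submitted to the other
    # queue may (driver permitting) overlap with this transfer.
    copy_evt = cl.enqueue_copy(transfer_queue, dev_buf, host_ary,
                               is_blocking=False)

    prg = cl.Program(ctx, """
        __kernel void scale(__global float *x)
        { x[get_global_id(0)] *= 2.0f; }
    """).build()
    prg.scale(compute_queue, (n,), None, dev_buf, wait_for=[copy_evt])
    compute_queue.finish()

Whether the copy and the kernel actually overlap is up to the driver;
the point is only that the host side of the transfer comes from an
ALLOC_HOST_PTR mapping.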

>> Another (mostly speculative--but interesting) option might be to go
>> through the (experimental!) CUDA backend for pocl--that goes through
>> the CUDA API, and, as a result, restores the ability to profile.
>
> Thanks for the hint. I re-compiled pocl with CUDA support (it backports
> smoothly to Debian 9) and it works.
>
> nvprof is now usable for profiling kernels when running through pocl,
> whereas it sees no kernels at all when using the Nvidia OpenCL driver.
>
> This is information worth sharing with the other OpenCL developers.

I try:

https://andreask.cs.illinois.edu/the-state-of-opencl-for-scientific-computing-in-2018/

:)
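
In case it helps others reproduce the pocl/nvprof setup, a rough sketch
of picking the pocl platform from PyOpenCL and profiling the whole
script with nvprof. The platform-name match and the script name are
assumptions; pocl usually reports itself as "Portable Computing
Language":

    import pyopencl as cl

    # Pick the pocl platform explicitly so the NVIDIA OpenCL ICD is not
    # used. Adjust the name match if your installation reports otherwise.
    platform, = [p for p in cl.get_platforms()
                 if "Portable Computing Language" in p.name]
    device, = platform.get_devices()  # assumes a single CUDA device via pocl
    ctx = cl.Context([device])
    queue = cl.CommandQueue(ctx)

    # ... build programs and enqueue kernels as usual ...

    # Then profile the whole script from the shell, e.g.:
    #   nvprof python my_script.py
    # With pocl's CUDA backend, the kernel launches show up in the nvprof
    # timeline; with NVIDIA's own OpenCL driver they do not.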

Andreas
