Dear all,

As some of you may have noticed, Nvidia dropped the capability to
profile OpenCL code since CUDA 8. I am looking into the profiling info
available in PyOpenCL's events to see whether it would be possible to
regenerate such a profile file.

Has anybody looked into this? It would save me from re-inventing the wheel.
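For reference, this is roughly how I am pulling the timestamps out of the
events (a minimal sketch, not any particular profiler file format; the queue
has to be created with PROFILING_ENABLE):

import numpy as np
import pyopencl as cl

ctx = cl.create_some_context()
queue = cl.CommandQueue(
    ctx, properties=cl.command_queue_properties.PROFILING_ENABLE)

host = np.arange(1 << 20, dtype=np.float32)
dev = cl.Buffer(ctx, cl.mem_flags.READ_WRITE, size=host.nbytes)

evt = cl.enqueue_copy(queue, dev, host, is_blocking=False)
evt.wait()

# Timestamps are in nanoseconds, on the device clock.
print("queued->submit:", evt.profile.submit - evt.profile.queued)
print("submit->start :", evt.profile.start - evt.profile.submit)
print("start->end    :", evt.profile.end - evt.profile.start)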

I found some "oddities" while trying to profile multi-queue processing.
I collected ~100 events, evenly distributed over 5 queues.

Every single event has a different command queue object (as obtained from
event.command_queue), but they all point to the same queue at the C level
according to event.command_queue.int_ptr.
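For what it's worth, this is the kind of check I am doing (a small helper,
hypothetical name, taking the list of collected pyopencl.Event objects):

def distinct_queues(events):
    """Count distinct Python wrappers vs. distinct underlying
    cl_command_queue handles carried by a list of pyopencl.Event objects."""
    wrappers = {id(evt.command_queue) for evt in events}
    handles = {evt.command_queue.int_ptr for evt in events}
    return len(wrappers), len(handles)

# In my case this returns (~100, 1): one wrapper per event, one handle total.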

This would be consistent with my observation that using multiple queues
runs at exactly the same speed as using only one :(

Did anybody manage to (actually) interleave sending buffers, retrieving
buffers and computation on the GPU with PyOpenCL?
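What I am trying to achieve looks roughly like the sketch below, assuming
the driver overlaps work issued on independent in-order queues (the
`process` kernel is made up for the example, and true overlap on NVIDIA
typically also needs pinned host memory, which is omitted here):

import numpy as np
import pyopencl as cl

ctx = cl.create_some_context()
prg = cl.Program(ctx, """
__kernel void process(__global float *buf) {
    int i = get_global_id(0);
    buf[i] = buf[i] * 2.0f;
}
""").build()

n_queues, chunk = 5, 1 << 20
queues = [cl.CommandQueue(ctx) for _ in range(n_queues)]
host_in = [np.random.rand(chunk).astype(np.float32) for _ in range(n_queues)]
host_out = [np.empty_like(a) for a in host_in]
dev = [cl.Buffer(ctx, cl.mem_flags.READ_WRITE, size=a.nbytes) for a in host_in]

downloads = []
for q, hin, hout, d in zip(queues, host_in, host_out, dev):
    # Upload, compute and download non-blocking, one chain per queue.
    up = cl.enqueue_copy(q, d, hin, is_blocking=False)
    run = prg.process(q, (chunk,), None, d, wait_for=[up])
    down = cl.enqueue_copy(q, hout, d, wait_for=[run], is_blocking=False)
    downloads.append(down)

# Only synchronize once everything has been enqueued on all queues.
cl.wait_for_events(downloads)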

Thanks for your help

-- 
Jérôme Kieffer
