Dear all,

As some of you may have noticed, Nvidia dropped the capability to profile OpenCL code since CUDA 8. I am looking into the profiling info available in PyOpenCL's events to see whether it would be possible to re-generate such a profiling file.
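In case it clarifies what I mean, here is a minimal sketch of how I read those timestamps back (it assumes the queue was created with PROFILING_ENABLE; the buffer and its size are just placeholders):

    import numpy as np
    import pyopencl as cl

    ctx = cl.create_some_context()
    queue = cl.CommandQueue(
        ctx, properties=cl.command_queue_properties.PROFILING_ENABLE)

    # Any enqueued command returns an event carrying the timestamps.
    host = np.random.rand(1 << 20).astype(np.float32)
    dev = cl.Buffer(ctx, cl.mem_flags.READ_ONLY, host.nbytes)
    evt = cl.enqueue_copy(queue, dev, host)
    evt.wait()

    # Timestamps are device-side counters, in nanoseconds.
    print("queued -> submit:", evt.profile.submit - evt.profile.queued)
    print("submit -> start :", evt.profile.start - evt.profile.submit)
    print("start  -> end   :", evt.profile.end - evt.profile.start)

From these four timestamps per event it should be possible to rebuild a timeline similar to what the Nvidia profiler used to produce.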
Did anybody already look into this? It would save me from re-inventing the wheel.

I also found some "oddities" while trying to profile multi-queue processing. I collected ~100 events, evenly distributed over 5 queues. Every event returns a distinct Python object from event.command_queue, but they all point to the same object at the C level according to event.command_queue.int_ptr. This would be consistent with the fact that using multiple queues runs at exactly the same speed as using only one :(

Did anybody manage to (actually) interleave sending buffers, retrieving buffers and calculation on the GPU with PyOpenCL? (A minimal reproducer of the int_ptr observation is below, after my signature.)

Thanks for your help,

--
Jérôme Kieffer
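P.S. A minimal sketch of the int_ptr observation, in case someone wants to reproduce it (the number of queues, the buffer sizes and the use of plain copies instead of my actual kernels are arbitrary choices):

    import numpy as np
    import pyopencl as cl

    ctx = cl.create_some_context()
    props = cl.command_queue_properties.PROFILING_ENABLE
    queues = [cl.CommandQueue(ctx, properties=props) for _ in range(5)]

    # Each queue gets its own chunk of work; collect all resulting events.
    chunks = [np.random.rand(1 << 20).astype(np.float32) for _ in queues]
    buffers = [cl.Buffer(ctx, cl.mem_flags.READ_WRITE, c.nbytes) for c in chunks]
    events = [cl.enqueue_copy(q, b, c) for q, b, c in zip(queues, buffers, chunks)]
    for q in queues:
        q.finish()

    # The five queues report five distinct cl_command_queue handles...
    print("queues :", sorted({q.int_ptr for q in queues}))
    # ...so I would expect the events to report five distinct handles as well,
    # yet on my setup they all show the same one.
    print("events :", sorted({e.command_queue.int_ptr for e in events}))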