Hi everybody,
I have definitely achieved parallel send, retrieve and GPU calculations
with PyOpenCL on Nvidia devices using multiple queues, although I did
the profiling by wrapping the PyOpenCL events on the Python host and
enriching the timing information with the queue each event came from.
As far as I know it also works with image objects; if you are
interested in an MWE, please let me know.
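For reference, the host-side wrapping can be sketched roughly like this. It is only a sketch: MockEvent stands in for pyopencl.Event so it runs without a device; with real PyOpenCL the timestamps would come from ev.profile.start and ev.profile.end (nanoseconds, with PROFILING_ENABLE set on the queue), and the queue label from the queue the work was enqueued on.

```python
# Host-side sketch of enriching event timings with their queue.
# MockEvent is a stand-in for pyopencl.Event; with real PyOpenCL,
# ev.profile.start / ev.profile.end give device timestamps (ns)
# when the queue was created with PROFILING_ENABLE.
from collections import defaultdict

class MockEvent:
    def __init__(self, queue_id, start_ns, end_ns, label):
        self.queue_id = queue_id
        self.start_ns = start_ns
        self.end_ns = end_ns
        self.label = label

def timeline_per_queue(events):
    """Group (label, start_ns, duration_ms) tuples by queue, time-ordered."""
    timeline = defaultdict(list)
    for ev in sorted(events, key=lambda e: e.start_ns):
        duration_ms = (ev.end_ns - ev.start_ns) * 1e-6
        timeline[ev.queue_id].append((ev.label, ev.start_ns, duration_ms))
    return dict(timeline)

events = [
    MockEvent(0, 0, 2_000_000, "send"),
    MockEvent(1, 500_000, 1_500_000, "send"),
    MockEvent(0, 2_000_000, 6_000_000, "kernel"),
]
print(timeline_per_queue(events))
```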
Regards
Jonathan
On 04/24/2018 09:54 PM, pyopencl-requ...@tiker.net wrote:
Today's Topics:
1. Profiling events in PyOpenCL (Jerome Kieffer)
2. Re: Profiling events in PyOpenCL (Andreas Kloeckner)
3. Re: Profiling events in PyOpenCL (Vincent Favre-Nicolin)
4. Re: Profiling events in PyOpenCL (Jerome Kieffer)
5. Re: Profiling events in PyOpenCL (Andreas Kloeckner)
6. Re: Profiling events in PyOpenCL (Jerome Kieffer)
----------------------------------------------------------------------
Message: 1
Date: Fri, 20 Apr 2018 17:26:15 +0200
From: Jerome Kieffer <jerome.kief...@esrf.fr>
To: pyopencl@tiker.net
Subject: [PyOpenCL] Profiling events in PyOpenCL
Message-ID: <20180420172615.6b107...@lintaillefer.esrf.fr>
Content-Type: text/plain; charset=UTF-8
Dear all,
As some of you may have noticed, Nvidia dropped the ability to
profile OpenCL code since CUDA 8. I am looking into the profiling info
available in PyOpenCL's events to see whether it would be possible to
regenerate such a profile file.
Has anybody looked into this? It would save me from re-inventing the wheel.
I found some "oddities" while trying to profile multi-queue processing.
I collected ~100 events, evenly distributed in 5 queues.
Every single event has a different command queue (as obtained from
event.command_queue) but they all point to the same object at the
C-level according to their event.command_queue.int_ptr.
This would be consistent with the fact that using multiple queues runs
at exactly the same speed as using only one :(
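A quick diagnostic for the int_ptr observation is to count the distinct underlying C objects behind the Python wrappers. This is only a hedged sketch: MockQueue stands in for pyopencl.CommandQueue, and with real PyOpenCL one would compare ev.command_queue.int_ptr across the collected events.

```python
# Sketch: distinct Python wrapper objects may still refer to the same
# underlying cl_command_queue. MockQueue is a stand-in for
# pyopencl.CommandQueue; with real PyOpenCL, compare
# ev.command_queue.int_ptr across the collected events.

class MockQueue:
    def __init__(self, int_ptr):
        self.int_ptr = int_ptr

def count_distinct_queues(queues):
    """Number of distinct underlying queues, by C-level pointer."""
    return len({q.int_ptr for q in queues})

# Five wrapper objects but a single underlying C object -- the
# situation described above.
wrappers = [MockQueue(0xDEAD) for _ in range(5)]
print(len(wrappers), count_distinct_queues(wrappers))
```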
Did anybody manage to (actually) interleave sending buffers, retrieving
buffers and computation on the GPU with PyOpenCL?
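One way to answer this from the profiling data itself is to check whether events on different queues actually overlapped in device time. A hedged sketch, with MockEvent again standing in for pyopencl.Event (real timestamps would come from ev.profile.start / ev.profile.end):

```python
# Sketch: detect true concurrency across queues from event timestamps.
# MockEvent is a stand-in for pyopencl.Event; real timestamps come from
# ev.profile.start / ev.profile.end (ns) with profiling enabled.

class MockEvent:
    def __init__(self, queue_id, start_ns, end_ns):
        self.queue_id = queue_id
        self.start_ns = start_ns
        self.end_ns = end_ns

def queues_overlap(events):
    """True if any two events on different queues overlap in device time."""
    for i, a in enumerate(events):
        for b in events[i + 1:]:
            if (a.queue_id != b.queue_id
                    and a.start_ns < b.end_ns
                    and b.start_ns < a.end_ns):
                return True
    return False

serialized = [MockEvent(0, 0, 10), MockEvent(1, 10, 20)]  # back to back
concurrent = [MockEvent(0, 0, 10), MockEvent(1, 5, 15)]   # truly parallel
print(queues_overlap(serialized), queues_overlap(concurrent))
```

If the queues were silently serialized by the driver, every pair of events ends up back to back and this check returns False, which would match the "same speed as one queue" observation.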
Thanks for your help
_______________________________________________
PyOpenCL mailing list
PyOpenCL@tiker.net
https://lists.tiker.net/listinfo/pyopencl