Hi Andreas,
I didn't know what settings you had in your temp_cl_profiler.conf file,
so I made a run with the default settings.
$ python matrix-multiply.py
$ cat opencl_profile.log
# OPENCL_PROFILE_LOG_VERSION 1.0
# OPENCL_DEVICE 0 Tesla C1060
# TIMESTAMPFACTOR fcec360a5d451e4
method,gputime,cputime,occupancy
method=[ memcpyHtoDasync ] gputime=[ 3487.264 ] cputime=[ 4092.000 ]
method=[ memcpyHtoDasync ] gputime=[ 2722.880 ] cputime=[ 3283.000 ]
method=[ matrixMul ] gputime=[ 91482.430 ] cputime=[ 13.000 ] occupancy=[ 0.750 ]
method=[ matrixMul ] gputime=[ 91467.359 ] cputime=[ 12.000 ] occupancy=[ 0.750 ]
method=[ matrixMul ] gputime=[ 91516.477 ] cputime=[ 13.000 ] occupancy=[ 0.750 ]
method=[ memcpyDtoHasync ] gputime=[ 1119.680 ] cputime=[ 1893.000 ]
method=[ memcpyDtoHasync ] gputime=[ 2664.768 ] cputime=[ 3464.000 ]
method=[ memcpyDtoHasync ] gputime=[ 1117.952 ] cputime=[ 1823.000 ]
method=[ memcpyDtoHasync ] gputime=[ 2690.976 ] cputime=[ 3514.000 ]
The last 3 lines are the extra memory transfers. gputime and cputime are
not accurate because other jobs were running on the host computer at the
time. For instance, gputime for the 1st, 2nd, 7th and last rows should
be equal (5.12 MB transfers).
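In case it helps to compare runs, log lines in this format can be pulled apart with a short script. This is just a sketch based on the sample output above (the field layout is taken from that output, not from any official format spec; the profiler reports times in microseconds):

```python
import re

# Matches lines like:
#   method=[ matrixMul ] gputime=[ 91482.430 ] cputime=[ 13.000 ]
# Trailing fields such as occupancy=[ ... ] are ignored.
LINE_RE = re.compile(
    r"method=\[\s*(?P<method>\S+)\s*\]\s*"
    r"gputime=\[\s*(?P<gputime>[\d.]+)\s*\]\s*"
    r"cputime=\[\s*(?P<cputime>[\d.]+)\s*\]"
)

def parse_profile(text):
    """Return a list of (method, gputime_us, cputime_us) records."""
    records = []
    for line in text.splitlines():
        m = LINE_RE.search(line)
        if m:
            records.append((m.group("method"),
                            float(m.group("gputime")),
                            float(m.group("cputime"))))
    return records

# A few lines from the log above, used as sample input:
sample = """\
method=[ memcpyHtoDasync ] gputime=[ 3487.264 ] cputime=[ 4092.000 ]
method=[ matrixMul ] gputime=[ 91482.430 ] cputime=[ 13.000 ]
method=[ memcpyDtoHasync ] gputime=[ 1119.680 ] cputime=[ 1893.000 ]
"""

for method, gpu, cpu in parse_profile(sample):
    print(f"{method:18s} gputime={gpu:10.3f} us  cputime={cpu:8.3f} us")
```

Counting how many memcpyDtoHasync records show up per run makes the "extra transfers" easy to spot mechanically.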
My NVIDIA driver is 190.29 and the device is a Tesla C1060.
Feel free to ask for additional information.
Regards,
Nicolas
Andreas Klöckner wrote:
Hi Nicolas,
On Tuesday 02 February 2010, Bonnel wrote:
I was just playing with the profiler from NVIDIA and I'm wondering why
all data from the graphics card is read back. I thought memory was read
back only when using cl.enqueue_read_buffer. Here is the result I get
from profiling matrix-multiply.py:
method          memory transfer size
memcpyHtoDasync 5.12e+06
memcpyHtoDasync 5.12e+06
memcpyDtoHasync 2.56e+06
memcpyDtoHasync 5.12e+06
memcpyDtoHasync 2.56e+06
memcpyDtoHasync 5.12e+06
As there is only one cl.enqueue_read_buffer call, there should be only
one memcpyDtoHasync call.
I recently had an informative conversation with someone on the Nvidia
driver team, and they indicated that CL may 'transparently' issue
transfers after kernel launches based on the flags with which the buffer
was created.
Now I'm faced with two problems. First, all the Nvidia profiler does for
me is crash. I've figured out that I can invoke it from the command line
by specifying
export OPENCL_PROFILE=1
export OPENCL_PROFILE_CONFIG='temp_cl_profiler.conf'
and then find data in "opencl_profile_0.log". However, no matter what I
put in temp_cl_profiler.conf, I can't see the extra transfers you are
seeing. Can you grab and post the generated config file, perhaps by
import os; print open(os.environ["OPENCL_PROFILE_CONFIG"], "r").read()
That would be very helpful. (If you could generate a survey of what the
file can look like, that would of course help even more!)
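As a data point, the file named by OPENCL_PROFILE_CONFIG appears to use the same one-option-per-line format as the CUDA profiler's CUDA_PROFILE_CONFIG file. A sketch, with option names taken from NVIDIA's CUDA profiler documentation (whether the OpenCL profiler under driver 190.x honors all of them is an assumption):

```
# temp_cl_profiler.conf
# One profiler option per line; '#' starts a comment.
# Option names from the CUDA profiler docs.
timestamp
gridsize
threadblocksize
memtransfersize
```

The memtransfersize option is what should produce the "memory transfer size" column shown in Nicolas's output.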
As far as flags were concerned, COPY_HOST_PTR was a natural suspect, but
removing that didn't change the timings. It would really help if I could
observe the extra transfers.
Thanks for posting your observations!
Andreas
_______________________________________________
PyOpenCL mailing list
[email protected]
http://host304.hostmonster.com/mailman/listinfo/pyopencl_tiker.net