The Visual Profiler shows overlapping mem-copies and kernel execution for Working.py. You are probably sitting at your computer, so if you are in doubt, try it :D
(and this was one of my original questions ... how do you profile the code if the profiler is obviously broken?)

-Magnus

On Mon, Mar 21, 2011 at 8:04 PM, Andreas Kloeckner <li...@buster.tiker.net> wrote:
> On Mon, 21 Mar 2011 19:55:31 +0100, Magnus Paulsson <paulsso...@gmail.com>
> wrote:
>> > Wild theory: Maybe the print statements introduce GPU synchronization?
>> > Does your observation change with multiple loops through the code?
>> >
>> > Also note that the profiler won't help you debug overlap. If it is
>> > active, all GPU activity is synchronous.
>> >
>> > Andreas
>>
>> No. None of the above. The "Working.py" code runs overlapping using
>> the profiler including print statements.
>
> CUDA 4.0 programming guide, 3.2.5.1:
>
> "When an application is run via a CUDA debugger or profiler (cuda-gdb, CUDA
> Visual Profiler, Parallel Nsight), all launches are synchronous."
>
> (and that sentence has been around for a few versions)
>
> Either you are or that sentence is wrong. :)
>
> Andreas
>

--
-----------------------------------------------
Magnus Paulsson
Assistant Professor
School of Computer Science, Physics and Mathematics
Linnaeus University
Phone: +46-480-446308
Mobile: +46-70-6942987

_______________________________________________
PyCUDA mailing list
PyCUDA@tiker.net
http://lists.tiker.net/listinfo/pycuda
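If the profiler makes all launches synchronous, as the programming guide passage quoted in this thread says, one possible way to check for overlap without it is to time three runs with no profiler attached: the async copies alone, the kernel alone, and both issued together on separate streams. You could take these timings with CUDA events (PyCUDA exposes `pycuda.driver.Event` with `record()` and `time_till()`) and then compare them. The helper below is a minimal pure-Python sketch of that arithmetic only. The function name `overlap_fraction` and the three-measurement approach are illustrative assumptions; nobody in this thread proposed them.

```python
def overlap_fraction(t_copy_ms: float, t_kernel_ms: float, t_both_ms: float) -> float:
    """Estimate how much of the shorter operation was hidden by overlap.

    Hypothetical helper. All three durations are assumed to be measured
    WITHOUT a profiler attached (e.g. via CUDA events):
      t_copy_ms   -- async host<->device copies run alone
      t_kernel_ms -- kernel(s) run alone
      t_both_ms   -- copies and kernels issued together on separate streams

    Returns 0.0 for fully serialized execution and 1.0 when the shorter
    operation is completely hidden behind the longer one.
    """
    if min(t_copy_ms, t_kernel_ms) <= 0:
        raise ValueError("durations must be positive")
    serial = t_copy_ms + t_kernel_ms       # expected time with no overlap
    perfect = max(t_copy_ms, t_kernel_ms)  # best case: shorter op fully hidden
    hidden = serial - t_both_ms            # time saved relative to serial
    frac = hidden / (serial - perfect)     # denominator == shorter duration
    # Clamp to [0, 1] to absorb timing noise.
    return max(0.0, min(1.0, frac))
```

For example, `overlap_fraction(10.0, 4.0, 12.0)` gives 0.5: half of the 4 ms operation was hidden. A value near 0.0 would suggest the copies and the kernel ran back to back.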