I have a weird problem when using the visual profiler: for about two seconds my program works fine, but after that the kernel launches become extremely slow (total running time goes up more than a hundredfold). I wrote a small signal handler that reacts to SIGUSR1, and saw that while the program does move on slowly, it's always busy waiting in func._launch_kernel of driver.py. I then tried reducing the number of loops my simulation does, to bring the total run time down to those crucial two seconds for testing purposes, but the profiler runs the program multiple times, and on the second run it's just as slow right from the first iteration.
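
A handler along these lines is enough to see where the program is sitting when the signal arrives. This is a minimal sketch rather than the exact code from my program; the name install_stack_dump and its stream parameter are made up for illustration, and it assumes a Unix system where SIGUSR1 exists.

```python
import signal
import sys
import traceback


def install_stack_dump(signum=signal.SIGUSR1, stream=None):
    """Install a handler that prints the current Python stack on `signum`.

    `install_stack_dump` and `stream` are illustrative names, not part of
    PyCUDA. Must be called from the main thread, as signal.signal requires.
    """
    def handler(sig, frame):
        # Fall back to stderr so the dump is not mixed into normal output.
        out = stream if stream is not None else sys.stderr
        out.write("--- stack at signal %d ---\n" % sig)
        # `frame` is the frame that was executing when the signal arrived.
        traceback.print_stack(frame, file=out)
        out.flush()

    signal.signal(signum, handler)
    return handler


if __name__ == "__main__":
    install_stack_dump()
    # ... run the simulation here ...
```

With this installed, running `kill -USR1 <pid>` from another shell prints the stack each time, which is how the wait in func._launch_kernel shows up.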
I also tried running the program with CUDA_PROFILE=1, and everything works just fine, with the runtime roughly doubled compared to running without any profiling. Trying to use nvprof (which the visual profiler uses underneath, IIUC) just gives "Warning: Application received signal 139". Have you used the visual profiler or nvprof successfully? Or noticed similar behaviour? In case it matters, I'm running the program on a remote headless server over ssh -X, and using CUDA 5.0.

--
Tomi Pieviläinen, +358 400 487 504
