Hello Michiel,

On Wed, Apr 4, 2012 at 8:39 PM, Michiel Bruinink <[email protected]> wrote:
> I don't think streams will do any good, because I have seen that the memcpy
> time is a small part of the total time and it is the same for nvcc and
> pyCuda.

Streams can be used for kernels too, not only for operations with memory.
But I agree, from your explanations it seems that streams are not the
issue here.

> The larger pyCuda execution time is pure calculation time.
> In fact, when I comment out a section of the device code, the nvcc and
> pyCuda times are almost equal.

This sounds interesting, could you possibly quote this section here?
Or, even better, construct two simple programs, in Python and in C,
which reproduce this effect?

_______________________________________________
PyCUDA mailing list
[email protected]
http://lists.tiker.net/listinfo/pycuda
