Hello Michiel,

On Wed, Apr 4, 2012 at 8:39 PM, Michiel Bruinink
<[email protected]> wrote:
> I don't think streams will do any good, because I have seen that the memcpy
> time is a small part of the total time and it is the same for nvcc and
> pyCuda.

Streams can be used for kernel launches too, not only for memory
transfers. But I agree: from your explanation it seems that streams are
not the issue here.

> The larger pyCuda execution time is pure calculation time.
> In fact, when I comment out a section of the device code, the nvcc and
> pyCuda times are almost equal.

This sounds interesting. Could you post that section here?
Or, even better, could you write two simple programs, one in Python and
one in C, that reproduce this effect?

_______________________________________________
PyCUDA mailing list
[email protected]
http://lists.tiker.net/listinfo/pycuda
