On Friday 07 May 2010, Ian Ozsvald wrote:
> Hi Andreas, great to have you back.
>
> I should have said - the 100 items example was just so a visual
> comparison could be made to confirm that the results were equivalent.
>
> Using 10,000,000 items in the list the timing *swaps* around:
> cumath took: 0.179667449951 seconds
> ElementwiseKernel took: 0.327829986572 seconds
> so whereas cumath was slower before, now it is faster. Up until about
> 100,000 items I observed the previous pattern, my bad for not
> extending the list.
>
> I don't understand the above result so I guess I need to break out the
> profiler.
Well, your benchmark code compares apples to oranges. The cumath code path
contains one to-GPU transfer, and it also measures the time it takes for the
sin() CUDA code to be generated and compiled or pulled in from disk (hence
my warm-up recommendation). The elementwise code path contains two to-GPU
transfers, a bunch of large numpy operations, and no compilation. That the
two yield different times doesn't surprise me, but by itself that doesn't
mean much.

You should be timing a set of tens of applications of a sine to a large
array, and just those. Each variant should be warmed up at least once or
twice, and there should be no other operations between timer start and
timer stop. See the P.S. below for a sketch of what I mean.

> Could you confirm that my earlier assumptions are correct?
>
> 1) When a cumath operation is performed (e.g. cumath.sin()) the result
> isn't copied back from the GPU to the CPU

Correct.

> 2) If multiple cumath operations are applied in sequence to a piece of
> data then the data still stays on the GPU (i.e. it doesn't have to be
> copied back to the CPU then back to the GPU to apply each subsequent
> cumath operation)

Correct.

> 3) The act of applying .get() (e.g. "print sinop" in the code) to a
> gpuarray is the only thing in the example below that causes the GPU
> memory to be copied back to the CPU

Correct.

> The above is what I understand from looking at the code but some of it
> is a bit beyond me, I just want to confirm that I understand what's
> happening behind the scenes with the auto-generated code from Python.

If the code is unclear, I'd be happy to take patches that improve it.

Andreas
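
P.S. The kind of timing harness I have in mind looks roughly like the
following. Treat it as an untested sketch: the round count, array size and
kernel name are arbitrary choices, and CUDA events would give more precise
timings than host-side wall-clock time. Note the explicit synchronize
before each timer stop, since kernel launches are asynchronous.

import time
import numpy as np

import pycuda.autoinit  # creates a CUDA context as a side effect
import pycuda.driver as drv
import pycuda.gpuarray as gpuarray
import pycuda.cumath as cumath
from pycuda.elementwise import ElementwiseKernel

n = 10000000
a = np.random.randn(n).astype(np.float32)

# Transfer once, outside the timed region.
a_gpu = gpuarray.to_gpu(a)
dest_gpu = gpuarray.empty_like(a_gpu)

sin_knl = ElementwiseKernel(
    "float *dest, float *src",
    "dest[i] = sin(src[i])",
    "sin_knl")

# Warm up both variants so compilation and caching are not timed.
cumath.sin(a_gpu)
sin_knl(dest_gpu, a_gpu)

n_rounds = 20

start = time.time()
for i in range(n_rounds):
    cumath.sin(a_gpu)  # note: allocates a new result array each round
drv.Context.synchronize()  # wait for all kernels before stopping the timer
print("cumath: %g s per round" % ((time.time() - start) / n_rounds))

start = time.time()
for i in range(n_rounds):
    sin_knl(dest_gpu, a_gpu)
drv.Context.synchronize()
print("elementwise: %g s per round" % ((time.time() - start) / n_rounds))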
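
On points 1-3, here is a tiny example that makes the data movement explicit
(again an untested sketch, reusing the imports from above):

x_gpu = gpuarray.to_gpu(np.linspace(0, 1, 1000).astype(np.float32))

y_gpu = cumath.sin(x_gpu)   # result is a GPUArray, stays in GPU memory
z_gpu = cumath.exp(y_gpu)   # chained op works directly on GPU-resident data

z = z_gpu.get()             # the only device-to-host copy in this snippet
print(z[:5])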