On Friday 07 May 2010, Ian Ozsvald wrote:
> Hi Andreas, great to have you back.
>
> I should have said - the 100 items example was just so a visual
> comparison could be made to confirm that the results were equivalent.
>
> Using 10,000,000 items in the list the timing *swaps* around:
> cumath took: 0.179667449951 seconds
> ElementwiseKernel took: 0.327829986572 seconds
> so whereas cumath was slower before, now it is faster. Up until about
> 100,000 items I observed the previous pattern, my bad for not
> extending the list.
>
> I don't understand the above result so I guess I need to break out the
> profiler.
Well, your benchmark code compares apples to oranges. The cumath code path
contains one to-GPU transfer, and it also measures the time it takes for the
sin() CUDA code to be generated and compiled or pulled in from disk (hence
my warm-up recommendation). The elementwise code path contains two to-GPU
transfers, a bunch of large numpy operations, and no compilation. That the
two yield different times doesn't surprise me, but by itself that doesn't
mean much.

You should be timing a set of tens of applications of a sine to a large
array, and just those. Each variant should be warmed up at least once or
twice, and there should be no other operations between timer start and
timer stop. See the P.S. below for a sketch of what I mean.

> Could you confirm that my earlier assumptions are correct?
>
> 1) When a cumath operation is performed (e.g. cumath.sin()) the result
> isn't copied back from the GPU to the CPU

Correct.

> 2) If multiple cumath operations are applied in sequence to a piece of
> data then the data still stays on the GPU (i.e. it doesn't have to be
> copied back to the CPU then back to the GPU to apply each subsequent
> cumath operation)

Correct.

> 3) The act of applying .get() (e.g. "print sinop" in the code) to a
> gpuarray is the only thing in the example below that causes the GPU
> memory to be copied back to the CPU

Correct.

> The above is what I understand from looking at the code but some of it
> is a bit beyond me, I just want to confirm that I understand what's
> happening behind the scenes with the auto-generated code from Python.

If the code is unclear, I'd be happy to take patches that improve it.

Andreas
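
P.S. The kind of timing harness I have in mind looks roughly like the
following. Treat it as an untested sketch: the round count, array size and
kernel name are arbitrary choices, and CUDA events would give more precise
timings than host-side wall-clock time. Note the explicit synchronize
before each timer stop, since kernel launches are asynchronous.

import time
import numpy as np

import pycuda.autoinit  # creates a CUDA context as a side effect
import pycuda.driver as drv
import pycuda.gpuarray as gpuarray
import pycuda.cumath as cumath
from pycuda.elementwise import ElementwiseKernel

n = 10000000
a = np.random.randn(n).astype(np.float32)

# Transfer once, outside the timed region.
a_gpu = gpuarray.to_gpu(a)
dest_gpu = gpuarray.empty_like(a_gpu)

sin_knl = ElementwiseKernel(
    "float *dest, float *src",
    "dest[i] = sin(src[i])",
    "sin_knl")

# Warm up both variants so compilation and caching are not timed.
cumath.sin(a_gpu)
sin_knl(dest_gpu, a_gpu)

n_rounds = 20

start = time.time()
for i in range(n_rounds):
    cumath.sin(a_gpu)  # note: allocates a new result array each round
drv.Context.synchronize()  # wait for all kernels before stopping the timer
print("cumath: %g s per round" % ((time.time() - start) / n_rounds))

start = time.time()
for i in range(n_rounds):
    sin_knl(dest_gpu, a_gpu)
drv.Context.synchronize()
print("elementwise: %g s per round" % ((time.time() - start) / n_rounds))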
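
On points 1-3, here is a tiny example that makes the data movement explicit
(again an untested sketch, reusing the imports from above):

x_gpu = gpuarray.to_gpu(np.linspace(0, 1, 1000).astype(np.float32))

y_gpu = cumath.sin(x_gpu)   # result is a GPUArray, stays in GPU memory
z_gpu = cumath.exp(y_gpu)   # chained op works directly on GPU-resident data

z = z_gpu.get()             # the only device-to-host copy in this snippet
print(z[:5])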