Hi Bogdan,
On Tue, 5 Jul 2011 19:17:52 +1000, Bogdan Opanchuk wrote:
> I just bumped into a certain problem with copying numpy arrays to
> the GPU. Consider the following code:
>
> ---
> import pycuda.autoinit
> import pycuda.gpuarray as gpuarray
> from pycuda.elementwise import ElementwiseKernel
> import numpy
>
> arr = numpy.random.randn(50, 50).astype(numpy.float32)
> arr_tr = arr.transpose()
> target = numpy.empty_like(arr)
>
> arr_gpu = gpuarray.to_gpu(arr_tr)
> target_gpu = gpuarray.to_gpu(target)
>
> copy = ElementwiseKernel(
>     "float *x, float *y",
>     "x[i] = y[i]")
> copy(target_gpu, arr_gpu)
>
> print numpy.linalg.norm(target_gpu.get() - arr_gpu.get())
> ---
>
> Here the array 'arr' is transposed, copied to the GPU and then copied
> to target_gpu via a simple elementwise kernel. Surprisingly, it turns
> out that there is a non-zero difference between target_gpu and
> arr_gpu.
>
> The explanation is that transpose() does not actually transpose the
> data --- it just swaps the strides of 'arr'. gpuarray.to_gpu() then
> copies the linear memory behind 'arr_tr' (which is unchanged) to
> 'arr_gpu', along with its strides. If we now call arr_gpu.get(), the
> result is equal to 'arr_tr', because the strides are copied back too.
> But the kernel knows nothing about the strides and copies the
> supposedly transposed data linearly. 'target_gpu' has the strides of
> 'arr', not 'arr_tr', so when it is copied back to the CPU, the same
> linear data is read in a different order, which is why norm() is
> non-zero.
>
> Now I must say that this approach to copying is logical in its own
> way --- but if we copy strides to GPUArray, one will have to pass
> them to kernels every time; besides, many kernels are optimized under
> the assumption of a 'normal' data layout. So I think it would be more
> convenient if to_gpu() performed flatten() (or something like this)
> before copying. Otherwise a warning in the documentation is
> necessary.
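For reference, the stride-swapping you describe can be demonstrated with
NumPy alone (no GPU required, so to_gpu() is left out of this sketch):

```python
# NumPy-only demonstration: transpose() swaps strides, not data.
import numpy

arr = numpy.random.randn(50, 50).astype(numpy.float32)
arr_tr = arr.transpose()

# Same buffer, swapped strides: (200, 4) vs (4, 200) for 50x50 float32.
assert arr.strides == (200, 4)
assert arr_tr.strides == (4, 200)
assert not arr_tr.flags['C_CONTIGUOUS']

# Reading arr_tr's buffer in memory order (which is what a raw,
# stride-oblivious copy sees) yields the *untransposed* data:
assert numpy.array_equal(arr_tr.ravel(order='K'), arr.ravel())

# flatten(), by contrast, really rearranges the data into C order:
assert numpy.array_equal(arr_tr.flatten(), arr.T.ravel())
```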
Ok, we should introduce a warning when to_gpu'ing arrays that are not
in C order. And probably also add a function
gpuarray.i_know_about_strides() to turn that warning off.
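A rough sketch of what that check might look like --- the function and
warning text here are placeholders, not existing PyCUDA API:

```python
# Hypothetical sketch of the proposed C-order check for to_gpu().
import warnings
import numpy

_stride_warning_enabled = True

def i_know_about_strides():
    """Silence the non-C-contiguous-array warning."""
    global _stride_warning_enabled
    _stride_warning_enabled = False

def check_c_order(ary):
    """Warn if 'ary' is not C-contiguous, since a raw buffer copy
    will see the data in memory order, not logical order."""
    if _stride_warning_enabled and not ary.flags['C_CONTIGUOUS']:
        warnings.warn("to_gpu() called on a non-C-contiguous array; "
                      "the device copy will see the data in memory order")

check_c_order(numpy.zeros((4, 4)))    # C order: no warning
check_c_order(numpy.zeros((4, 4)).T)  # transposed view: warns
```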
In addition, it's arguably an issue that ElementwiseKernel happily
copies between arrays with different strides. We could warn about
that, too, although it's a bit harder to tell where legitimate use
ends and abuse starts.
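Such a stride-consistency check might look roughly like this (again
just a sketch with a made-up helper name, not PyCUDA code):

```python
# Hypothetical layout check before launching an elementwise kernel.
import warnings
import numpy

def check_same_layout(*arrays):
    """Warn if arrays that a kernel will index linearly do not share
    the same shape and strides, and return whether they do."""
    first = arrays[0]
    for other in arrays[1:]:
        if other.shape != first.shape or other.strides != first.strides:
            warnings.warn("elementwise operation on arrays with "
                          "different layouts; the results will be "
                          "permuted relative to each other")
            return False
    return True

a = numpy.zeros((3, 4), dtype=numpy.float32)
assert check_same_layout(a, numpy.zeros_like(a))  # same layout: fine
assert not check_same_layout(a, a.T)              # mismatch: warns
```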
Can you make a patch for this?
Andreas
_______________________________________________
PyCUDA mailing list
[email protected]
http://lists.tiker.net/listinfo/pycuda