2011/4/29 Andreas Kloeckner <[email protected]>:
> It's not clear to me that this is a fix, or rather, it's not clear that
> what you're seeing is a problem.
>
> The error you're getting tells you that you are trying to copy a
> non-contiguous array. Since GPU<->host transfers are defined as
> byte-for-byte, they depend on the memory layout of the host array, so it
> seems invalid to just do the copy and thereby change the memory layout
> behind the user's back. I'm more comfortable making the user responsible
> for handing us a contiguous array, everything else seems ill-defined.

I'm OK with that definition. I don't consider it bad behavior for the
transfer to change the strides on the GPU when needed, but I have no
problem with requiring that this be done explicitly.

What made me think you wanted this to happen automatically is that in
gpuarray.py, in the set() function, there are these two lines:
        if not ary.flags.forc:
            ary = ary.copy()

They are placed in a way that makes them never execute, but if they
were moved into the to_gpu() function, they would work.
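For illustration, here is a minimal host-side sketch of that copy-if-needed
logic, using plain numpy so it runs without a GPU (ensure_contiguous is a
hypothetical helper name, not a PyCUDA function):

```python
import numpy as np

def ensure_contiguous(ary):
    # flags.forc is True when the array is C- or Fortran-contiguous;
    # copy() returns a fresh C-contiguous array otherwise.
    if not ary.flags.forc:
        ary = ary.copy()
    return ary

a = np.arange(12).reshape(3, 4)[:, ::2]  # strided, non-contiguous view
b = ensure_contiguous(a)
assert b.flags.c_contiguous
assert np.array_equal(a, b)
```

The point being debated is whether such a copy should happen silently
inside the transfer, or be left to the caller.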

> I finally see what you're saying, although I don't think that the
> two-class solution would help much. Most everybody might not even check
> the type of what they're getting. A better solution might be to change
> the GPUArray.gpudata attribute to assert contiguity, and create new
> .noncontig_data attribute that does not assert contiguity. (We'll have
> to think of a better name.)

We can work directly on GPUArray if you prefer. In some past work, we
made extensive use of advanced Python features to access data. We
moved away from that because it was very time-consuming... I will try
to measure the overhead of your solution. Do you see any reason why
removing gpudata (only when the array is not contiguous) would be a
problem? The only reason I can see to keep it is to give a better
error message.

What about gpunddata (not easy to read)? Or gpu_nd_data? The "nd"
comes from numpy's ndarray; I haven't seen "nd" used in PyCUDA. Or
gpu_strided_data (longer)? We need underscores to separate words if we
want to follow the PEP 8 coding style.


> Sure, something like this seems useful. One possibility might be to
> imitate the host-side array interface, rather than enforcing a common
> base type. On the other hand, what were your reasons for not using
> GPUArrays in Theano? Anything that could be fixed?

We needed to be able to call CUBLAS, and at the time that was not
possible with PyCUDA due to NVIDIA restrictions (they have since been
lifted). So we started our own array class with the features we
needed: strides and broadcasting. They appear in so many places in our
code that we must have basic support for them on the GPU. Not all of
our GPU code accepts them, but in those cases we just copy the data
contiguously before calling the GPU code. Most of the time this
involves GPU code that is not a bottleneck, so we don't spend as much
time optimizing it.
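That fallback strategy can be sketched with plain numpy (the wrapper name
call_contiguous_only is hypothetical, used only to illustrate the idea):

```python
import numpy as np

def call_contiguous_only(kernel, ary):
    # If a kernel only handles contiguous inputs, fall back to a
    # contiguous copy of the strided array before invoking it.
    if not ary.flags.c_contiguous:
        ary = np.ascontiguousarray(ary)
    return kernel(ary)

view = np.arange(10)[::2]  # strided view: 0, 2, 4, 6, 8
out = call_contiguous_only(lambda x: x.sum(), view)
assert out == 20
```

The copy costs memory and bandwidth, which is acceptable here precisely
because it is reserved for non-bottleneck kernels.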

Otherwise, we would have added strides and broadcasting to PyCUDA, or
on top of it, from the start.

My suggestion would be to mimic numpy.ndarray as closely as possible.

thanks

Fred

_______________________________________________
PyOpenCL mailing list
[email protected]
http://lists.tiker.net/listinfo/pyopencl
