Re: [PyOpenCL] transposition and zeropadding

Bogdan Opanchuk Fri, 15 Jan 2016 19:37:04 -0800

Hi Zac,

The main problem with fancy indexing is that transfers to and from
global memory become inefficient if successive threads are not accessing
successive memory elements (somewhat simplified; read about
coalescing for more details). So you can easily implement something like
`a[:,1] = 0`, but you will have to remember that it may be slower than
`b = a.transpose(); b[1,:] = 0; a = b.transpose()`. The same applies to
random access indexing like `b[a]` where `a` is an array.
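
To make the coalescing point a bit more concrete, here is a rough sketch in
plain Python (no GPU involved). It only counts how many memory segments a
group of threads would touch for each of the two assignments. The row-major
layout, the 4-byte element size, the 128-byte segment size and the 32-thread
group are all assumptions chosen for illustration; real devices differ, and
the helper names are made up for this example.

```python
# Illustrative model only: counts how many memory segments a group of
# threads touches. The sizes below are assumptions, not properties of
# any particular device.

ITEM_SIZE = 4        # bytes per element (assumed float32)
SEGMENT_SIZE = 128   # assumed size of one global-memory transaction
GROUP_SIZE = 32      # assumed number of threads served together


def flat_offset(row, col, n_cols):
    """Byte offset of element (row, col) in a row-major 2D array."""
    return (row * n_cols + col) * ITEM_SIZE


def segments_touched(offsets):
    """Number of distinct segments covering the given byte offsets."""
    return len({off // SEGMENT_SIZE for off in offsets})


n_rows, n_cols = 1024, 1024

# a[:,1] = 0 : thread i writes element (i, 1), one full row apart
column_write = [flat_offset(i, 1, n_cols) for i in range(GROUP_SIZE)]

# b[1,:] = 0 : thread i writes element (1, i), adjacent in memory
row_write = [flat_offset(1, i, n_cols) for i in range(GROUP_SIZE)]

print(segments_touched(column_write))  # 32
print(segments_touched(row_write))     # 1
```

In this toy model the column write needs 32 separate segments where the row
write needs one, which is the kind of gap that can make the two transposes
worth their cost.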

Allocating memory for temporary arrays may be an issue too, because GPU
memory is usually much smaller than host RAM, and there's no swap
file to help (although if you hit swap in numerical calculations you're
already doing something wrong).

On Sat, Jan 16, 2016 at 2:30 AM, Zac Diggum <[email protected]> wrote:

> Hi Bogdan,
>
> thank you for your suggestions. I must admit I'd rather stick with using
> high level functions coming with pyopencl or reikna. Writing my own
> opencl kernels is a little out of reach for me. I'll deal with this when
> I have more complex sub tasks to solve. That transposing thing of mine
> works reasonably well and is still faster than padding on the host.
> Newbie question: is it even possible that fancy indexing will work one
> day on GPUs?
>
> Thanks again...
>
> _______________________________________________
> PyOpenCL mailing list
> [email protected]
> http://lists.tiker.net/listinfo/pyopencl
>

