Received from Mike Tischler on Thu, Mar 24, 2011 at 03:41:30PM EDT:

> Hi,
> I'm new to CUDA and PyCUDA, and am having a problem indexing across
> multiple grids.
> I'm using an older CUDA-enabled card (Quadro FX 1700) before I begin
> writing for a larger GPU. I've been trying to understand the
> relationship between threads, blocks, and grids in the context of my
> individual card. To do so, I've set up a simple script.
(snip)

> However, what if I have an array that's 1024 in length? If I understand
> the documentation correctly, block=(16,16,1) is the max value (256
> threads) allowed for my hardware, which means I have to increase the
> number of blocks in the grid. If I change the parameters of my script
> to:
>
>     z1 = numpy.zeros((1024)).astype(numpy.float32)
>     kernel1(drv.Out(z1), block=(16,16,1), grid=(2,2))
>
> how do I correctly index the array locations in my kernel function
> given multiple blocks (z1[???] = ???)? There is a gridDim property, but
> no gridIdx property like there is for threads and blocks.
>
> Thanks!
> Mike

threadIdx identifies a thread within a single block. To access a 1D array
of 1024 elements with a maximum of 256 threads per block, you can combine
the values in threadIdx and blockIdx, e.g.,

    int idx = blockIdx.x*blockDim.x + threadIdx.x;

and launch the kernel with a thread block of dimensions (256, 1, 1) and a
grid of dimensions (4, 1). See Chapter 2 of the CUDA Programming Guide
for more info.

L.G.

_______________________________________________
PyCUDA mailing list
PyCUDA@tiker.net
http://lists.tiker.net/listinfo/pycuda
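[Editor's note: the index arithmetic from the reply above can be checked on
the CPU with plain Python, no GPU required. This is an illustrative sketch,
not PyCUDA API; the names mirror CUDA's built-in variables, and the 2D
flattening shown for Mike's original block=(16,16,1), grid=(2,2) launch is
one common convention, not the only possible one.]

    # CPU-side check of the CUDA global-index formulas (no GPU needed).

    def global_indices_1d(grid_x, block_x):
        """idx = blockIdx.x * blockDim.x + threadIdx.x for a 1D launch."""
        return [bx * block_x + tx
                for bx in range(grid_x)      # blockIdx.x
                for tx in range(block_x)]    # threadIdx.x

    def global_indices_2d(grid, block):
        """Flatten a 2D grid of 2D blocks into one linear index per thread."""
        gx, gy = grid
        bx_dim, by_dim, _ = block
        threads_per_block = bx_dim * by_dim
        idx = []
        for by in range(gy):                 # blockIdx.y
            for bx in range(gx):             # blockIdx.x
                block_id = by * gx + bx      # blockIdx.y*gridDim.x + blockIdx.x
                for ty in range(by_dim):     # threadIdx.y
                    for tx in range(bx_dim): # threadIdx.x
                        thread_id = ty * bx_dim + tx
                        idx.append(block_id * threads_per_block + thread_id)
        return idx

    # Suggested launch: block=(256,1,1), grid=(4,1) covers indices 0..1023:
    print(sorted(global_indices_1d(4, 256)) == list(range(1024)))            # True

    # Mike's launch: block=(16,16,1), grid=(2,2) also covers 0..1023:
    print(sorted(global_indices_2d((2, 2), (16, 16, 1))) == list(range(1024)))  # True

Both launch configurations enumerate every element of the 1024-element
array exactly once; the 1D form is simply easier to index.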