Hi,

You may want to hold out for a more authoritative response from someone else, but I have noticed, and write my code assuming, that:
- func() will launch the kernel and return (almost) immediately
- attempts to access gpuarrays involved in a launched kernel will block until the launched kernel has completed
- pycuda.driver.Context.synchronize() can be called to wait explicitly for a launched kernel to complete (which is useful if you have two kernels operating on the same data, as they could otherwise run simultaneously)

cheers,
Marmaduke

On Fri, Jul 6, 2012 at 11:39 AM, Orestis K <[email protected]> wrote:
> Hello everyone!
>
> I'm new to PyCUDA and GPU programming, but my initial experiences have
> been very pleasant. I started out with some simple tasks, and it seems
> blazingly faster than running on a CPU. However, I would like to confirm
> that it is indeed as fast as it seems.
>
> My main question is: after 'func' is called and control of the prompt is
> regained, are any of the tasks still running on the GPU? If so, is there
> a way to block the next tasks from starting until the GPU has finished?
>
> I've posted the code below for reference. You can change the value of N
> so that it runs faster. I set it very close to the limit so that I might
> witness a delay before control of the command prompt is returned.
>
> Thank you in advance, and please keep up the excellent work!
> -Orestis
>
> =================================================================
> import pycuda.driver as cuda
> import pycuda.autoinit
> from pycuda.compiler import SourceModule
> import pycuda.gpuarray as gpuarray
>
> import sys, numpy, random, string
>
> # create random input data
> N = 33500000
> buf = ''.join(random.choice(string.ascii_uppercase +
>     string.ascii_lowercase + string.digits) for x in xrange(N))
>
> mod = SourceModule("""
> __global__ void get_words(int N, char *a, unsigned int *b)
> {
>     int idx = blockIdx.x * blockDim.x + threadIdx.x;
>     if (idx < N-3)
>     {
>         b[idx] = (a[idx] << 24) + (a[idx+3]);
>     }
> }
> """)
> func = mod.get_function("get_words")
>
> # copy buffer to GPU
> bufArray = cuda.mem_alloc(N)
> cuda.memcpy_htod(bufArray, buf)
>
> # create results array on GPU
> resArray = gpuarray.to_gpu(numpy.zeros((N-3, 1), dtype=numpy.int32))
>
> # setup parameters and execute function
> threadsPerBlock = 512
> blocksPerGrid = (N + threadsPerBlock - 1) / threadsPerBlock
> func(numpy.int32(len(buf)), bufArray, resArray,
>      grid=(blocksPerGrid, 1), block=(threadsPerBlock, 1, 1))
>
> # get back results
> a = numpy.zeros((N-3, 1), dtype=numpy.int32)
> resArray.get(a)
> a = a.reshape(-1).tolist()
>
> _______________________________________________
> PyCUDA mailing list
> [email protected]
> http://lists.tiker.net/listinfo/pycuda
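P.S. To make the blocking behaviour above concrete, here is a minimal timing sketch (my own toy kernel `double_it` and array sizes, not from your code, and it of course needs a CUDA-capable GPU to run): the launch call should return almost immediately, while Context.synchronize() blocks until the kernel has actually finished.

```python
import time

import numpy
import pycuda.autoinit
import pycuda.driver as cuda
import pycuda.gpuarray as gpuarray
from pycuda.compiler import SourceModule

# trivial kernel: double every element in place
mod = SourceModule("""
__global__ void double_it(float *a)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    a[idx] *= 2.0f;
}
""")
double_it = mod.get_function("double_it")

a_gpu = gpuarray.to_gpu(numpy.ones(512 * 1024, dtype=numpy.float32))

t0 = time.time()
double_it(a_gpu, grid=(1024, 1), block=(512, 1, 1))
t1 = time.time()            # launch has returned; kernel may still be running
cuda.Context.synchronize()  # blocks until the kernel has completed
t2 = time.time()

print "launch returned after  %.6f s" % (t1 - t0)
print "kernel finished after  %.6f s" % (t2 - t0)
# a_gpu.get() would also block implicitly before copying the results back
```

So to time a kernel honestly, call Context.synchronize() (or access the gpuarray) before taking the second timestamp; otherwise you are only timing the launch.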
