Consider the following code:

    import numpy
    import pyopencl as cl
    import pyopencl.array as array


    def to_device(ctx, queue, arr):
        size = arr.size * arr.dtype.itemsize
        buf = cl.Buffer(ctx, cl.mem_flags.READ_WRITE, size=size)
        arr_dev = array.Array(queue, arr.shape, arr.dtype, data=buf)
        arr_dev.set(arr, queue=queue, async_=True)
        return arr_dev


    ctx = cl.create_some_context()
    queue = cl.CommandQueue(ctx)

    a = numpy.empty(1024 * 4000, numpy.uint64)
    ad = to_device(ctx, queue, a)

    b = numpy.empty(1024 * 4000, numpy.int32)
    bd = to_device(ctx, queue, b)

    c = numpy.empty(1024 * 32, numpy.int32)
    cd = to_device(ctx, queue, c)

    # queue.finish()  # uncommenting this line fixes the problem

    ad.get()

When I run it on Linux with a Tesla P100, using CUDA as the OpenCL platform, execution hangs on the last line most of the time (but not always). Does anyone have any idea what might be happening here?

(If I just use array.to_device(), the problem disappears. This is an extract from a larger codebase, where the buffer has to be created separately.)