Consider the following code:

    import numpy
    import pyopencl as cl
    import pyopencl.array as array

    def to_device(ctx, queue, arr):
        size = arr.size * arr.dtype.itemsize
        buf = cl.Buffer(ctx, cl.mem_flags.READ_WRITE, size=size)
        arr_dev = array.Array(queue, arr.shape, arr.dtype, data=buf)
        arr_dev.set(arr, queue=queue, async_=True)
        return arr_dev

    ctx = cl.create_some_context()
    queue = cl.CommandQueue(ctx)

    a = numpy.empty(1024 * 4000, numpy.uint64)
    ad = to_device(ctx, queue, a)

    b = numpy.empty(1024 * 4000, numpy.int32)
    bd = to_device(ctx, queue, b)

    c = numpy.empty(1024 * 32, numpy.int32)
    cd = to_device(ctx, queue, c)

    # queue.finish() # uncommenting this line fixes the problem

    ad.get()

When I run it on Linux with a Tesla P100, using CUDA as the OpenCL
platform, the execution hangs on the last line (ad.get()) most of the
time, but not always. Does anyone have an idea what might be happening
here?

(If I just use array.to_device(), the problem disappears. This is an
extract from a larger program in which the buffer has to be created
separately.)
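
For comparison, the array.to_device() variant that does not hang for me
looks roughly like the sketch below. The names to_device_builtin and
run_repro are only for illustration and are not part of PyOpenCL;
run_repro() needs a working OpenCL device and is not called automatically.

```python
def to_device_builtin(queue, arr):
    # Imported lazily so the helper can be defined without an OpenCL runtime.
    import pyopencl.array as array

    # array.to_device() allocates the device buffer itself and copies arr
    # into it, instead of wrapping a separately created cl.Buffer.
    return array.to_device(queue, arr)


def run_repro():
    # Same sequence of transfers as in the code above, but using
    # to_device_builtin() in place of the hand-written to_device().
    import numpy
    import pyopencl as cl

    ctx = cl.create_some_context()
    queue = cl.CommandQueue(ctx)

    a = numpy.empty(1024 * 4000, numpy.uint64)
    ad = to_device_builtin(queue, a)

    b = numpy.empty(1024 * 4000, numpy.int32)
    bd = to_device_builtin(queue, b)

    c = numpy.empty(1024 * 32, numpy.int32)
    cd = to_device_builtin(queue, c)

    # Copy the first array back to the host.
    return ad.get()
```

This does not help in the larger program, since there the buffer has to
exist before the array is created.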
_______________________________________________
PyOpenCL mailing list -- pyopencl@tiker.net
To unsubscribe send an email to pyopencl-le...@tiker.net
