On Thu, 7 Jul 2011 17:38:25 -0700, Eli Stevens (Gmail) wrote:
We're getting an invalid command queue error from the following code.
I've tried to reduce it as much as possible to the minimal test case,
and I think that this is as small as it gets.

We (my coworker and I) are both running on MacBook Pros, OS X 10.7, with
PyOpenCL 2011.1 or 2011.2.
<pyopencl.Device 'GeForce GT 330M' at 0x1022600>
<pyopencl.Context at 0x103b00bf8 on <pyopencl.Device 'GeForce GT 330M'
at 0x1022600>>

The thing that's stumping us is that if we make some very minor
changes (like "change all of the uints to ints") then it stops
crashing.  The things that we've seen stop the error are:

- replace all uint with int.
- remove any of the three levels of for loop; replace with a line like
int v=get_global_id(1); outside the loops.
- remove the while loop.
- hard code max_index to 128.

I'm sure that some of the "change this and the error goes away" items
are the result of things being optimized away, but it's not clear
exactly which those are.
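
One way to narrow down which of the changes above matter is to generate each variant of the kernel source mechanically and then build and run the variants one at a time. The following is a minimal sketch, assuming the kernel lives in a Python string as KNL does in the attached test case. The helper name kernel_variants and the variant labels are hypothetical (they are not part of PyOpenCL), and only the two workarounds that are plain text substitutions are covered.

```python
import re


def kernel_variants(source):
    """Return {label: modified_source} for the purely textual workarounds.

    Hypothetical helper for bisecting the crash; not part of PyOpenCL.
    """
    variants = {"original": source}

    # Workaround: replace every uint with int. Whole words only, so an
    # identifier that merely contains "uint" is left alone.
    variants["uint_to_int"] = re.sub(r"\buint\b", "int", source)

    # Workaround: hard-code max_index to 128 instead of reading it
    # from data_shape[0].
    variants["hardcoded_max_index"] = source.replace(
        "max_index = data_shape[0];", "max_index = 128;")

    return variants


# Small stand-in for the kernel's declarations, just to show the effect.
SAMPLE = """
__kernel void crashing(__global float *data, __global const uint *data_shape) {
    const uint max_index = data_shape[0];
    const uint max_row = data_shape[1];
}
"""

variants = kernel_variants(SAMPLE)
for label, text in variants.items():
    # Report whether each variant still mentions uint or data_shape[0].
    print(label, "uint" in text, "data_shape[0]" in text)
```

Each variant's text could then be passed to cl.Program(ctx, text).build() and launched exactly as in the test case, recording which variants raise the invalid command queue error. The loop-removal workarounds change the kernel's structure and would still need to be written by hand.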

Sadly, none of the things we've been trying are applicable to our
actual code (or they don't resolve the issue).

Any suggestions for trying to figure out what's going on?  We are
quite stumped.  If there's a better list or forum to use, we'd be
happy to move the discussion there (the dev forums for the OSX Lion
beta seem pretty dead; not sure where else is a good place to go for
help).

Works without a hitch with recent Nvidia drivers (275.09) on Linux (attached version, slightly changed). Looks like an Apple bug. No idea as to fixes, sorry.

Andreas
KNL = """
__kernel void
crashing(
    __global float *data,
    __global const uint *data_shape
) {
    const uint max_index = data_shape[0];
    const uint max_row = data_shape[1];
    const uint max_col = data_shape[2];

    int4 myVector;

    int max_u, max_v, max_slab;
    max_slab = max_index;
    max_u = max_row;
    max_v = max_col;

    for (int slab=0; slab < max_slab; slab++) {
        for (int u=get_global_id(0); u < max_u; u += get_global_size(0)) {
            for (int v=get_global_id(1); v < max_v; v += get_global_size(1)) {
                myVector = (int4)(slab, 0, 0, 0);

                int counter = 0;

                while (counter < 1) {
                    if (myVector.s0 < max_index) {
                        counter = 999;
                    }
                    counter++;
                }
            }
        }
    }
}
"""

import os

import pyopencl as cl
mf = cl.mem_flags
import numpy

ctx = cl.create_some_context()
max_work_item_sizes = ctx.devices[0].max_work_item_sizes
queue = cl.CommandQueue(ctx)

code = KNL

prg = cl.Program(ctx, code).build()

data_farray = numpy.zeros((128, 128, 128), dtype=numpy.float32)
data_buf = cl.Buffer(ctx, mf.READ_WRITE | mf.COPY_HOST_PTR |
        mf.ALLOC_HOST_PTR, hostbuf=data_farray)
data_shape_ndary = numpy.array(data_farray.shape, dtype=numpy.uint32)
data_shape_buf = cl.Buffer(ctx, mf.READ_ONLY |
        mf.COPY_HOST_PTR | mf.ALLOC_HOST_PTR, hostbuf=data_shape_ndary)

globalsize_tup = (1,1) # XXX DEBUG
localsize_tup = None

#evt =
prg.crashing(queue, globalsize_tup, localsize_tup,
        data_buf,
        data_shape_buf,
        ).wait()
# enqueue_copy(queue, dest, src) supersedes the deprecated enqueue_read_buffer.
cl.enqueue_copy(queue, data_farray, data_buf).wait()

