Bogdan Opanchuk <bog...@opanchuk.net> writes:
> Hello all,
>
> I have an old macbook that has a discrete GeForce on it, and have run into
> the following problem. The simplified example is here:
> https://gist.github.com/fjarri/9aff0474868e2faf438f7e8229d194ec
>
> Basically, what I'm trying to do:
> - create a two-device context
> - create a buffer
> - split it into two subregions to use on each device
> - run a kernel on each device in parallel working with the corresponding
>   subregion
> - get the result back on the host
>   (the expected result is [0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7])
>
> First, it turned out that if the context includes an nVidia card, the
> Buffer must be necessarily created with the cl.mem_flags.ALLOC_HOST_PTR
> flag, otherwise if one uses its subregion in a kernel, the program crashes.
> If the context is created on a CPU + Iris Pro (the other two devices
> available), everything works fine without this flag, giving the expected
> result.
>
> After fixing that, the program finishes without crashing when run on a CPU
> + GeForce or Iris Pro + GeForce context, but the result is [0 1 2 3 4 5 6 7
> 0 0 0 0 0 0 0 0] - that is, the second kernel (on the GeForce device)
> either did not run, or its changes to the subregion were not incorporated
> into the whole buffer. Uncommenting the explicit migration in the end does
> not help either. Does anyone know what I'm missing here? Or is it an
> nVidia/Apple bug?

I feel like maybe the context should be created to encompass the
subregions? I've never used these features much, so that's a bit of a
shot in the dark.

HTH,
Andreas
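
One thing that may be worth ruling out, offered as a guess rather than a diagnosis: OpenCL requires the origin of a sub-buffer to be aligned to CL_DEVICE_MEM_BASE_ADDR_ALIGN (reported in bits) for the devices in the context. The sketch below computes the (origin, size) pairs for an even split and checks that alignment before calling Buffer.get_sub_region. The helper names split_into_subregions and make_subbuffers are hypothetical and do not come from the gist, and whether alignment is actually the cause of the zeros seen on the GeForce is not established here.

```python
def split_into_subregions(total_bytes, n_parts, align_bits):
    """Return (origin, size) byte pairs for n_parts equal sub-buffers.

    align_bits is the strictest CL_DEVICE_MEM_BASE_ADDR_ALIGN value
    (which OpenCL reports in bits) among the devices in the context.
    Raises ValueError if the split is uneven or an origin is misaligned.
    """
    align_bytes = max(align_bits // 8, 1)
    if total_bytes % n_parts:
        raise ValueError("buffer size is not divisible by the number of parts")
    size = total_bytes // n_parts
    regions = [(i * size, size) for i in range(n_parts)]
    for origin, _ in regions:
        if origin % align_bytes:
            raise ValueError(
                "origin %d is not a multiple of %d bytes" % (origin, align_bytes)
            )
    return regions


def make_subbuffers(ctx, buf):
    """Hypothetical helper: create one sub-buffer per device in ctx.

    ctx is a pyopencl.Context and buf a pyopencl.Buffer created in it.
    """
    # Use the strictest alignment requirement of any device in the context.
    align_bits = max(dev.mem_base_addr_align for dev in ctx.devices)
    regions = split_into_subregions(buf.size, len(ctx.devices), align_bits)
    return [buf.get_sub_region(origin, size) for origin, size in regions]
```

As an illustration with made-up numbers: a 64-byte buffer split in two puts the second origin at byte 32, so split_into_subregions(64, 2, 4096) raises ValueError because 4096 bits is 512 bytes. The value 4096 is only an example; the real figure should be read from each device's mem_base_addr_align.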
_______________________________________________ PyOpenCL mailing list -- pyopencl@tiker.net To unsubscribe send an email to pyopencl-le...@tiker.net