Bogdan Opanchuk <bog...@opanchuk.net> writes:

> Hello all,
>
> I have an old MacBook with a discrete GeForce, and have run into
> the following problem. A simplified example is here:
> https://gist.github.com/fjarri/9aff0474868e2faf438f7e8229d194ec
>
> Basically, what I'm trying to do:
> - create a two-device context
> - create a buffer
> - split it into two subregions to use on each device
> - run a kernel on each device in parallel working with the corresponding
> subregion
> - get the result back on the host
> (the expected result is [0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7])
>
> First, it turned out that if the context includes an nVidia card, the
> Buffer must be created with the cl.mem_flags.ALLOC_HOST_PTR flag;
> otherwise, using one of its subregions in a kernel crashes the program.
> If the context is created on a CPU + Iris Pro (the other two devices
> available), everything works fine without this flag, giving the expected
> result.
>
> After fixing that, the program finishes without crashing when run on a CPU
> + GeForce or Iris Pro + GeForce context, but the result is [0 1 2 3 4 5 6 7
> 0 0 0 0 0 0 0 0] - that is, the second kernel (on the GeForce device)
> either did not run, or its changes to the subregion were not incorporated
> into the whole buffer. Uncommenting the explicit migration at the end does
> not help either. Does anyone know what I'm missing here? Or is it an
> nVidia/Apple bug?
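
For readers without access to the gist, the steps listed above might look roughly like this in PyOpenCL. This is a minimal sketch, not the gist's code: the kernel `fill`, the helpers `subregion_layout` and `run`, 4-byte int elements, and taking the first two devices of the first platform are all assumptions (adjust the device slice to get the CPU/Iris Pro/GeForce pair you want).

```python
# Hypothetical kernel: each work-item writes its index within its sub-buffer.
KERNEL_SRC = """
__kernel void fill(__global int *dest)
{
    int i = get_global_id(0);
    dest[i] = i;
}
"""


def subregion_layout(n_per_device, n_devices, itemsize):
    """(origin, size) in bytes for one contiguous sub-buffer per device."""
    size = n_per_device * itemsize
    return [(i * size, size) for i in range(n_devices)]


def run(n_per_device=8):
    # Imported here so the layout helper above works without OpenCL installed.
    import numpy as np
    import pyopencl as cl

    # Assumption: the first platform exposes the two devices of interest.
    devices = cl.get_platforms()[0].get_devices()[:2]
    ctx = cl.Context(devices)
    queues = [cl.CommandQueue(ctx, device=d) for d in devices]
    prg = cl.Program(ctx, KERNEL_SRC).build()

    itemsize = np.dtype(np.int32).itemsize
    total = n_per_device * len(devices)
    mf = cl.mem_flags
    # ALLOC_HOST_PTR, which the post reports is needed once a GeForce is in the context.
    buf = cl.Buffer(ctx, mf.READ_WRITE | mf.ALLOC_HOST_PTR, size=total * itemsize)

    # One sub-buffer per device; origin and size are given in bytes.
    subs = [buf.get_sub_region(origin, size)
            for origin, size in subregion_layout(n_per_device, len(devices), itemsize)]

    # One launch per device, each on that device's own queue and sub-buffer.
    events = [prg.fill(q, (n_per_device,), None, sub) for q, sub in zip(queues, subs)]
    cl.wait_for_events(events)

    # Optional explicit migration of the parent buffer (requires OpenCL 1.2):
    # cl.enqueue_migrate_mem_objects(queues[0], [buf]).wait()

    result = np.empty(total, dtype=np.int32)
    cl.enqueue_copy(queues[0], result, buf)  # blocking read of the whole parent buffer
    return result
```

Calling `run()` on a machine with the devices should, if everything works, return the expected [0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7].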

I feel like maybe the context should be created to encompass the
subregions? I've never used these features much, so that's a bit of a
shot in the dark.
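
One context-wide constraint that may be worth ruling out (an assumption on my part, not verified on this hardware): per the OpenCL spec, a sub-buffer's origin has to be a multiple of the device's CL_DEVICE_MEM_BASE_ADDR_ALIGN, which is reported in bits, otherwise you can get CL_MISALIGNED_SUB_BUFFER_OFFSET. With 8 four-byte elements per device the second origin is only 32 bytes, which a device with a coarser alignment would not accept. The hypothetical helpers below pad each device's slot up to the alignment:

```python
def aligned_layout(n_per_device, n_devices, itemsize, align_bits):
    """Return ([(origin, size), ...], total_bytes) with every origin aligned.

    align_bits is CL_DEVICE_MEM_BASE_ADDR_ALIGN, which OpenCL reports in bits.
    """
    align_bytes = max(1, align_bits // 8)
    size = n_per_device * itemsize
    # Round each device's slot up to the next multiple of the alignment.
    stride = -(-size // align_bytes) * align_bytes
    layout = [(i * stride, size) for i in range(n_devices)]
    return layout, stride * n_devices


def is_aligned(origin, align_bits):
    """True if a sub-buffer origin (in bytes) satisfies an alignment given in bits."""
    return origin % max(1, align_bits // 8) == 0
```

On a real context, `align_bits` could be taken as `max(d.mem_base_addr_align for d in ctx.devices)` so one layout suits every device; the parent buffer then needs `total_bytes`, and the padding has to be skipped when reading the result back.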

HTH,
Andreas


_______________________________________________
PyOpenCL mailing list -- pyopencl@tiker.net
To unsubscribe send an email to pyopencl-le...@tiker.net
