Jerome Kieffer <jerome.kief...@esrf.fr> writes:
> LogicError: clEnqueueFillBuffer failed: INVALID_OPERATION
>
> The same "bug" occurs in the PoCL driver when targeting an Nvidia GPU,
> since the corresponding low-level primitive is absent in NVVM.
>
> I wonder whether we should address this issue within our own code or
> whether it could be addressed at a higher level. Expecting Nvidia to fix
> their code to conform to the specification is an illusion. But does it
> make sense to address this as part of PyOpenCL?

Huh, that's not ideal. I don't have Fermi-gen hardware around any more,
so I didn't notice it. There is a fallback path for buffer fills, and
PyOpenCL tries to be careful about selecting it [0]. According to the
CL spec, clEnqueueFillBuffer is unconditionally available if the device
advertises CL 1.2. So this looks like an Nvidia bug to me, but that
realization likely won't buy us much, since I'm pretty sure Nvidia isn't
going to fix it. We *could* complicate the fallback logic to mop up
after Nvidia. I'd be open to reviewing a patch.
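
To make "mop up after Nvidia" concrete, here is a minimal sketch of the
idea: try the native fill, and if it raises the error from the report,
run a fallback instead. The helper name fill_with_fallback and its
parameters are hypothetical, not existing PyOpenCL API, and this is not
how the code at [0] is written. The sketch takes plain callables so that
it runs without a device.

```python
def fill_with_fallback(native_fill, fallback_fill, recoverable_errors):
    """Run native_fill(); if it raises one of recoverable_errors,
    run fallback_fill() instead.

    Returns "native" or "fallback" to report which path was used.
    Hypothetical helper for illustration only.
    """
    try:
        native_fill()
    except recoverable_errors:
        # The driver advertised the feature but rejected the call,
        # so retry via the slower path.
        fallback_fill()
        return "fallback"
    return "native"


# Assumed wiring with PyOpenCL (not executed here, needs a device;
# queue, buf, n and nbytes are placeholders):
#
#   import numpy as np
#   import pyopencl as cl
#
#   fill_with_fallback(
#       lambda: cl.enqueue_fill_buffer(queue, buf, np.float32(0), 0, nbytes),
#       lambda: cl.enqueue_copy(queue, buf, np.zeros(n, np.float32)),
#       cl.LogicError,
#   )
```

Catching only the specific error class keeps unrelated failures visible.
A real patch would probably also want to remember the failure per device
so that the native call is not retried on every fill.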

As for pocl, my PhD student Isuru recently submitted a PR that might
help [1]. You could try pocl master to see if that makes things
better. As an added bonus, pocl master also contains significant
performance fixes for pocl's CUDA backend [2], also due to Isuru.

Hope that helps,
Andreas

[0] https://github.com/inducer/pyopencl/blob/fc5847239728d67cad157c1bbe50431e331553be/pyopencl/array.py#L1231-L1240
[1] https://github.com/pocl/pocl/pull/834
[2] https://github.com/pocl/pocl/issues/830

_______________________________________________
PyOpenCL mailing list -- pyopencl@tiker.net
To unsubscribe send an email to pyopencl-le...@tiker.net