On Wed, 24 Jun 2020 11:26:49 -0500 Andreas Kloeckner <li...@informa.tiker.net> wrote:
> Jerome Kieffer <jerome.kief...@esrf.fr> writes:
> > LogicError: clEnqueueFillBuffer failed: INVALID_OPERATION
> >
> > The same "bug" occurs in the PoCL driver when addressing Nvidia GPUs,
> > since the corresponding low-level primitive is absent in NVVM.
> >
> > I wonder whether it would be best to address this issue within our
> > code or whether it could be addressed at a higher level. Getting
> > Nvidia to fix their code to conform to the specification is an
> > illusion. But does it make sense to address this as part of pyopencl?
>
> Huh, that's not ideal.

I agree.

> I don't have Fermi-gen hardware around any more, so I didn't notice it.

We recently noticed many errors specific to Fermi cards (some of them
related to textures as well).

> There exists a fallback path for the thing you mention, and PyOpenCL
> tries to be careful about selecting it [0]. According to the CL spec,
> clEnqueueFillBuffer is unconditionally available if the device
> advertises CL1.2.

Interesting piece of information. On my computer I have a Maxwell and a
Fermi generation GPU, and indeed the Fermi one advertises only OpenCL 1.1,
even though they share the same driver:

  Platform Name                 NVIDIA CUDA
  Number of devices             2
    Device Name                 GeForce GTX 750 Ti
    Device Vendor               NVIDIA Corporation
    Device Vendor ID            0x10de
    Device Version              OpenCL 1.2 CUDA
    Driver Version              390.132
    Device OpenCL C Version     OpenCL C 1.2
  [...]
    Device Name                 NVS 310
    Device Vendor               NVIDIA Corporation
    Device Vendor ID            0x10de
    Device Version              OpenCL 1.1 CUDA
    Driver Version              390.132
    Device OpenCL C Version     OpenCL C 1.1
    Device Type                 GPU

> So this looks like an Nvidia bug to me, but that realization likely
> won't buy us much, since I'm pretty sure Nvidia isn't going to fix it.

Indeed, Nvidia dropped support for Fermi last year (or so).

> We *could* complicate the fallback logic to mop up after Nvidia. I'd be
> open to reviewing a patch.

So in the end this comes down to checking whether the target device
advertises OpenCL 1.1 or 1.2 and, in the former case, first allocating
the buffer and then running a kernel that performs the memset. As your
code already implements this, I guess the problem is a bug in the
detection of the OpenCL version:

In [1]: import pyopencl

In [2]: ctx_maxwell = pyopencl.create_some_context()
Choose platform:
[0] <pyopencl.Platform 'NVIDIA CUDA' at 0x1655690>
[1] <pyopencl.Platform 'Intel(R) OpenCL' at 0x1613b60>
Choice [0]:
Choose device(s):
[0] <pyopencl.Device 'GeForce GTX 750 Ti' on 'NVIDIA CUDA' at 0xdde4f0>
[1] <pyopencl.Device 'NVS 310' on 'NVIDIA CUDA' at 0x15886b0>
Choice, comma-separated [0]:1
Set the environment variable PYOPENCL_CTX=':1' to avoid being asked again.

In [3]: queue = pyopencl.CommandQueue(ctx_maxwell)

In [4]: queue._get_cl_version()
Out[4]: (1, 2)

But queue._get_cl_version() goes down to ctx._get_cl_version(), which asks
for the version of the platform, not of the device, and here they differ:

```
In [8]: device = ctx_maxwell.devices[0]

In [9]: device._get_cl_version()
Out[9]: (1, 1)

In [10]: device.platform._get_cl_version()
Out[10]: (1, 2)
```

I opened a PR on it.

> As for pocl, my PhD student Isuru recently submitted a PR that might
> help [1]. You could try pocl master to see if that makes things better.
> As an added bonus, pocl master also contains significant performance
> fixes for CUDA POCL [2], also due to Isuru.

That's great. Thanks to him.

Cheers,

Jerome
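
As a minimal sketch of the workaround discussed above: check the OpenCL
version of each device in the context (rather than the platform version),
and only call clEnqueueFillBuffer when every device advertises OpenCL 1.2,
falling back to a small fill kernel otherwise. The helpers device_cl_version
and fill_bytes below are hypothetical names introduced for illustration;
this is not the actual fallback code inside PyOpenCL.

```
import numpy as np
import pyopencl as cl

def device_cl_version(dev):
    # CL_DEVICE_VERSION is a string like "OpenCL 1.1 CUDA";
    # take the numeric part and return it as a (major, minor) tuple.
    major, minor = dev.version.split()[1].split(".")
    return int(major), int(minor)

# Trivial kernel used as a memset fallback on pre-1.2 devices.
FILL_SRC = """
__kernel void fill_bytes(__global uchar *buf, uchar value, ulong n)
{
    size_t gid = get_global_id(0);
    if (gid < n)
        buf[gid] = value;
}
"""

def fill_bytes(ctx, queue, buf, value, nbytes):
    # Decide based on the *devices* in the context, not the platform:
    # the platform may advertise 1.2 while an individual device
    # (e.g. a Fermi GPU) only supports 1.1.
    min_ver = min(device_cl_version(d) for d in ctx.devices)
    if min_ver >= (1, 2):
        pattern = np.array([value], dtype=np.uint8)
        cl.enqueue_fill_buffer(queue, buf, pattern, 0, nbytes)
    else:
        prg = cl.Program(ctx, FILL_SRC).build()
        prg.fill_bytes(queue, (nbytes,), None,
                       buf, np.uint8(value), np.uint64(nbytes))

if __name__ == "__main__":
    ctx = cl.create_some_context()
    queue = cl.CommandQueue(ctx)
    nbytes = 1024
    buf = cl.Buffer(ctx, cl.mem_flags.READ_WRITE, size=nbytes)
    fill_bytes(ctx, queue, buf, 0x2a, nbytes)

    out = np.empty(nbytes, dtype=np.uint8)
    cl.enqueue_copy(queue, out, buf)
    assert (out == 0x2a).all()
```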