First, the cl_khr_fp16 extension is supported and has been enabled on my Intel GPU device.
When I run the code below on the device 'Intel(R) HD Graphics', using 16-bit half-precision floats instead of 32-bit floats, the build fails with the error "loading directly from pointer to type 'const __global half' is not allowed", even though the device supports the half data type for OpenCL.

'''
Build on <pyopencl.Device 'Intel(R) HD Graphics' on 'Intel(R) OpenCL' at 0x1cd1a10>:

1:6:16: error: loading directly from pointer to type 'const __global half' is not allowed
  res_g[gid] = a_g[gid] + b_g[gid];
               ^
(options: -I /usr/lib/python3/dist-packages/pyopencl/cl)
'''

Here is the code:

'''
from __future__ import absolute_import, print_function
import numpy as np
import pyopencl as cl

a_np = np.random.rand(50000).astype(np.float16)
b_np = np.random.rand(50000).astype(np.float16)

ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)  # Create a command queue with your context

mf = cl.mem_flags
a_g = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=a_np)
b_g = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=b_np)

prg = cl.Program(ctx, """
__kernel void sum(
    __global const half *a_g, __global const half *b_g, __global half *res_g)
{
  int gid = get_global_id(0);
  res_g[gid] = a_g[gid] + b_g[gid];
}
""").build()

res_g = cl.Buffer(ctx, mf.WRITE_ONLY, a_np.nbytes)
prg.sum(queue, a_np.shape, None, a_g, b_g, res_g)

res_np = np.empty_like(a_np)
cl.enqueue_copy(queue, res_np, res_g)

print(res_np)
print(res_np - (a_np + b_np))
print(np.linalg.norm(res_np - (a_np + b_np)))
'''

How can I correct my code?
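For reference, here is a minimal sketch of two changes that might address this error. It has not been verified on this device. It assumes the compiler rejects direct half loads because cl_khr_fp16 is not enabled inside the kernel source itself, which OpenCL C does with a #pragma. The second variant assumes the extension is unavailable and falls back to the core vload_half / vstore_half built-ins, which do the arithmetic in float. The names KERNEL_FP16_PRAGMA, KERNEL_VLOAD_HALF and run_half_sum are made up for this illustration.

```python
# Variant 1 (assumption: the device compiler honours cl_khr_fp16 once it is
# enabled in the kernel source). Only the #pragma line differs from the
# kernel in the post.
KERNEL_FP16_PRAGMA = """
#pragma OPENCL EXTENSION cl_khr_fp16 : enable
__kernel void sum(
    __global const half *a_g, __global const half *b_g, __global half *res_g)
{
  int gid = get_global_id(0);
  res_g[gid] = a_g[gid] + b_g[gid];
}
"""

# Variant 2 (fallback sketch): keep half storage, but load into float, add in
# float, and round back to half on store. vload_half / vstore_half are core
# OpenCL C built-ins and do not require the extension.
KERNEL_VLOAD_HALF = """
__kernel void sum(
    __global const half *a_g, __global const half *b_g, __global half *res_g)
{
  int gid = get_global_id(0);
  float a = vload_half(gid, a_g);
  float b = vload_half(gid, b_g);
  vstore_half(a + b, gid, res_g);
}
"""


def run_half_sum(kernel_src, n=50000):
    """Hypothetical helper: build kernel_src and add two float16 arrays."""
    # Imported here so the kernel strings above can be inspected on a
    # machine without NumPy or an OpenCL runtime.
    import numpy as np
    import pyopencl as cl

    a_np = np.random.rand(n).astype(np.float16)
    b_np = np.random.rand(n).astype(np.float16)

    ctx = cl.create_some_context()
    queue = cl.CommandQueue(ctx)

    mf = cl.mem_flags
    a_g = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=a_np)
    b_g = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=b_np)
    res_g = cl.Buffer(ctx, mf.WRITE_ONLY, a_np.nbytes)

    prg = cl.Program(ctx, kernel_src).build()
    prg.sum(queue, a_np.shape, None, a_g, b_g, res_g)

    res_np = np.empty_like(a_np)
    cl.enqueue_copy(queue, res_np, res_g)
    return a_np, b_np, res_np
```

Calling run_half_sum(KERNEL_FP16_PRAGMA) first and run_half_sum(KERNEL_VLOAD_HALF) if the build still fails would show which variant the driver accepts. The returned arrays can then be compared with a_np + b_np, as in the post.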
_______________________________________________
PyOpenCL mailing list
PyOpenCL@tiker.net
https://lists.tiker.net/listinfo/pyopencl