trevor-m opened a new pull request #5857: URL: https://github.com/apache/incubator-tvm/pull/5857
Some fixes made a few months ago to the `get_valid_counts` CUDA implementation also broke OpenCL, because of the atomic add intrinsic they introduced. This PR fixes `get_valid_counts` for OpenCL with the following changes:

1. Register the atomic add intrinsic for OpenCL.
2. Override `intrinsic::tvm_address_of` to include the storage scope.
3. Enable `cl_khr_global_int32_base_atomics`. This isn't required for [OpenCL 1.1+ because atomic_add became a core feature](https://www.khronos.org/registry/OpenCL/specs/2.2/html/OpenCL_Ext.html#cl_khr_int32_atomics). I'm happy to remove this if we don't care about OpenCL 1.0. Alternatively, we can handle the `op->call_type == CallNode::PureExtern` case and set a flag so that the extension is enabled only when `atomic_add` is actually used.

Original error messages before this fix:

1. During compilation: `Unresolved intrinsic atomic_add with return type int32`
2. During runtime:
```
<source>:6922:43: error: casting '__global void *' to type 'int *' changes address space of pointer
atomic_add_return[(0)] = atomic_add(((int *)get_valid_counts_v0 + 0), 1);
```
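
To illustrate changes 2 and 3, here is a minimal sketch of how a code generator might keep the address space in the pointer cast and emit the extension pragma only when atomics are used. It is not the code from this PR: `ScopeQualifier`, `AddressOf`, and `AtomicsPreamble` are hypothetical helpers, and the mapping of the `shared` scope to `__local` is an assumption about how storage scopes translate to OpenCL qualifiers.

```cpp
#include <string>

// Hypothetical helper (not TVM's API): map a storage scope name to the
// OpenCL address-space qualifier. The "shared" -> "__local" mapping is an
// assumption made for this sketch.
std::string ScopeQualifier(const std::string& scope) {
  if (scope == "global") return "__global ";
  if (scope == "shared") return "__local ";
  return "";  // any other scope: emit no qualifier
}

// Hypothetical helper: print the address of buffer[index] as a typed pointer
// whose cast keeps the address space, so the OpenCL compiler does not reject
// it with "changes address space of pointer".
std::string AddressOf(const std::string& scope, const std::string& dtype,
                      const std::string& buffer, const std::string& index) {
  return "((" + ScopeQualifier(scope) + dtype + " *)" + buffer + " + " +
         index + ")";
}

// Hypothetical helper: emit the extension pragma only when the kernel
// actually calls atomic_add (the flag-based alternative mentioned above).
std::string AtomicsPreamble(bool uses_atomics) {
  if (!uses_atomics) return "";
  return "#pragma OPENCL EXTENSION cl_khr_global_int32_base_atomics : enable\n";
}
```

With these helpers, `AddressOf("global", "int", "get_valid_counts_v0", "0")` yields `((__global int *)get_valid_counts_v0 + 0)`, which differs from the failing expression in the error message only by the added `__global` qualifier.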