trevor-m opened a new pull request #5857:
URL: https://github.com/apache/incubator-tvm/pull/5857


   Some fixes made a few months ago to the `get_valid_counts` CUDA implementation also broke OpenCL, because they introduced an atomic add intrinsic that the OpenCL backend did not support.
   
   This PR fixes `get_valid_counts` for OpenCL with the following changes:
   
   1. Register the `atomic_add` intrinsic for OpenCL.
   2. Override `intrinsic::tvm_address_of` so that the generated pointer cast includes the storage scope (e.g. `__global`).
   3. Enable `cl_khr_global_int32_base_atomics`. This isn't required for [OpenCL 1.1+, because `atomic_add` became a core feature](https://www.khronos.org/registry/OpenCL/specs/2.2/html/OpenCL_Ext.html#cl_khr_int32_atomics). I'm happy to remove it if we don't care about OpenCL 1.0. Alternatively, we could override the handling of calls where `op->call_type == CallNode::PureExtern` and set a flag so that the extension is enabled only when `atomic_add` is actually used.
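To make changes 2 and 3 more concrete, here is a rough Python sketch of a toy string emitter. It is not TVM's actual codegen: `emit_address_of`, `emit_atomic_add` and `finalize_source` are hypothetical names, the mapping from a storage scope to an OpenCL qualifier (prefixing `__`, so `global` becomes `__global`) is an assumption, and the extension pragma is gated on a flag, i.e. the alternative mentioned in item 3 rather than always enabling it.

```python
# Hypothetical toy emitter for illustration only; these helpers are NOT part of TVM's API.

EXTENSION_PRAGMA = "#pragma OPENCL EXTENSION cl_khr_global_int32_base_atomics : enable"


def emit_address_of(buffer: str, elem_type: str, scope: str, offset: str = "0") -> str:
    """Emit a pointer expression for buffer[offset] that keeps the address-space qualifier.

    Assumption: a storage scope such as "global" maps to the OpenCL qualifier "__global".
    With an empty scope the cast becomes "(int *)", which drops the qualifier and is
    what triggers the "changes address space of pointer" error.
    """
    qualifier = f"__{scope} " if scope else ""
    return f"(({qualifier}{elem_type} *){buffer} + {offset})"


def emit_atomic_add(result: str, buffer: str, elem_type: str, scope: str, value: str) -> str:
    """Emit an atomic_add call whose returned (old) value is stored in result[0]."""
    addr = emit_address_of(buffer, elem_type, scope)
    return f"{result}[(0)] = atomic_add({addr}, {value});"


def finalize_source(kernel_body: str, uses_atomic_add: bool) -> str:
    """Prepend the extension pragma only when an atomic_add was emitted (flag-based variant)."""
    if uses_atomic_add:
        return EXTENSION_PRAGMA + "\n" + kernel_body
    return kernel_body


if __name__ == "__main__":
    line = emit_atomic_add("atomic_add_return", "get_valid_counts_v0", "int", "global", "1")
    print(finalize_source(line, uses_atomic_add=True))
    # prints:
    # #pragma OPENCL EXTENSION cl_khr_global_int32_base_atomics : enable
    # atomic_add_return[(0)] = atomic_add(((__global int *)get_valid_counts_v0 + 0), 1);
```

Gating on a flag keeps the pragma out of kernels that never use atomics, at the cost of tracking state during codegen; always emitting it is simpler but only matters for OpenCL 1.0 devices.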
   
   
   Error messages before this fix:
   
   1. During compilation: `Unresolved intrinsic atomic_add with return type int32`
   2. During runtime:
      ```
      <source>:6922:43: error: casting '__global void *' to type 'int *' changes address space of pointer
            atomic_add_return[(0)] = atomic_add(((int *)get_valid_counts_v0 + 0), 1);
      ```

