Oops, I just realized I replied only to Andreas, not the whole list. For the 
record, here it is: 


This has come up multiple times with the C++ bindings and even in the OpenCL 
working group. The stance taken there is that the user should explicitly create 
copies of kernels for each thread that needs to set arguments. Doing so is 
somewhat cumbersome from a C API perspective, as it means multiple calls to 
clCreateKernel with the associated kernel-name lookup. A clCloneKernel API has 
been discussed, but it needs someone to champion it all the way to inclusion in 
the specification. 

PyOpenCL's API is interesting because users don't really deal with kernel 
objects directly that much; the primary way of accessing a kernel seems to be 
as an attribute of a program object. With that in mind, I would propose using 
thread-local storage instead of a mutex. This provides a separate kernel for 
each thread, working around the set_args and enqueue_nd_range_kernel problem 
as well. It also nicely side-steps any possible performance problem. The 
implementation would look something like this: 

def __init__(self):
    self._kernels = threading.local()

def __getattr__(self, attr):
    try:
        knl = getattr(self._kernels, attr, None)
        if knl is None:
            knl = Kernel(self, attr)
            setattr(self._kernels, attr, knl)
        # Nvidia does not raise errors even for invalid names,
        # but this will give an error if the kernel is invalid.
        knl.num_args
        return knl
    except LogicError:
        raise AttributeError("'%s' was not found as a program "
                             "info attribute or as a kernel name" % attr)
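For illustration, here is a minimal, self-contained sketch of the idea. It uses
a hypothetical FakeKernel stand-in (no OpenCL calls, and none of the num_args
validation above) purely to show that threading.local() hands each thread its
own cached kernel object, so no two threads ever share one:

```python
import threading


class FakeKernel:
    # Hypothetical stand-in for pyopencl.Kernel; just records its name.
    def __init__(self, program, name):
        self.name = name


class FakeProgram:
    def __init__(self):
        self._kernels = threading.local()

    def __getattr__(self, attr):
        # Look up a per-thread cached kernel; create one on first access.
        knl = getattr(self._kernels, attr, None)
        if knl is None:
            knl = FakeKernel(self, attr)
            setattr(self._kernels, attr, knl)
        return knl


prg = FakeProgram()
seen = []


def worker():
    # Keep a reference so the per-thread objects stay alive for comparison.
    seen.append(prg.mykernel)


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Four threads -> four distinct kernel objects ...
assert len(set(map(id, seen))) == 4
# ... while repeated access within one thread returns the cached object.
assert prg.mykernel is prg.mykernel
```

Since set_args and enqueue_nd_range_kernel then always operate on a kernel
owned by exactly one thread, their combination no longer races.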


Cheers, 
Brian

On Oct 7, 2012, at 7:04 PM, Andreas Kloeckner wrote:

> Hi Bogdan,
> 
> Bogdan Opanchuk <[email protected]> writes:
>> I personally do not mind the lock, as I won't notice an overhead so
>> small in my applications. I am just not completely sure about whether
>> such introduction is justified logically. You are saying that "it's
>> the only spot where the CL API is not thread-safe", but __call__ is
>> not CL API — set_args and enqueue_nd_range_kernel are. __call__ is a
>> convenience function that contains several calls to CL API, same as,
>> say, pyopencl.array.sum(). Does __call__ get special treatment because
>> it is somewhat closer to the "core" and used more often?
> 
> set_arg and enqueue_nd_range_kernel individually do not race, it's the
> combination of them that does. __call__ constitutes that combination, so
> it would need to be protected. That said, I'm inclined to follow what
> Brian said and simply say in the docs that if multiple threads enqueue a
> kernel, each should make its own copy. That's easy:
> 
> prg.mykernel
> 
> Doing that individually for each thread is no big deal, IMO, and it
> avoids this whole locking story. I'll add a remark to the docs once
> everyone has had a chance to reply.
> 
> Andreas
> 
> 
> _______________________________________________
> PyOpenCL mailing list
> [email protected]
> http://lists.tiker.net/listinfo/pyopencl

