Okay, my bad. I was only looping 40 times, so the initial call (which builds the kernel) was eating all the time. Iterating 4000 times through the loop gives much more reasonable per-call times -- @memoize_method is indeed working.
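For anyone following along, here is a minimal sketch of why the loop count matters (this is not PyCUDA's actual code; FakeKernel and the 0.05 s sleep are stand-ins for the real kernel build): a memoize_method-style decorator only pays the build cost on the first call, so 40 iterations are dominated by it while 4000 amortize it away.

import functools
import time

def memoize_method(method):
    # Per-instance result cache -- a rough sketch of what pytools.memoize_method does.
    @functools.wraps(method)
    def wrapper(self, *args):
        cache = self.__dict__.setdefault("_memoize_cache", {})
        key = (method.__name__, args)
        if key not in cache:
            cache[key] = method(self, *args)   # expensive path, taken once per key
        return cache[key]
    return wrapper

class FakeKernel:
    @memoize_method
    def generate(self, use_range):
        time.sleep(0.05)                       # stand-in for the real kernel compilation
        return "compiled kernel"

    def __call__(self):
        return self.generate(False)            # a dictionary lookup after the first call

for n in (40, 4000):
    k = FakeKernel()                           # fresh instance, so the first call rebuilds
    t0 = time.time()
    for _ in range(n):
        k()
    print("%5d calls: %.2e s per call" % (n, (time.time() - t0) / n))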
That being said, having a way to explicitly prepare the function before using it could still be helpful. One use case is keeping the one-time compilation out of profiling loops... :-) (A rough warm-up sketch follows the quoted message below.) Sorry for the red herring.

Michael.

On Jul 19, 2013, at 11:50 AM, Andreas Kloeckner <[email protected]> wrote:

> Michael McNeil Forbes <[email protected]> writes:
>> Here is the profile of the slow __call__. All the time is spent in
>> generate_stride_kernel_and_types:
>>
>> Line #  Hits    Time  Per Hit  % Time  Line Contents
>> ==============================================================
>>    192                                  def __call__(self, *args, **kwargs):
>>    193    78     145      1.9     0.1      vectors = []
>> ...
>>    204    78     104      1.3     0.1      func, arguments = self.generate_stride_kernel_and_types(
>>    205    78  199968   2563.7    97.3          range_ is not None or slice_ is not None)
>>    206
>>    207   156     354      2.3     0.2      for arg, arg_descr in zip(args, arguments):
>> ...
>>    241
>>    242    78    2780     35.6     1.4      func.prepared_async_call(grid, block, stream, *invocation_args)
>
> Now this is just confusing to me. generate_stride_kernel_and_types has a
> @memoize_method decorator, which should take care of caching the built
> kernel. Unless you're instantiating a new ElementwiseKernel for each
> call, generate_stride_kernel_and_types should only ever get called
> once. The default (cached) case should amount to one dictionary lookup,
> so I'm confused as to how that would eat up so much time. Can you
> perhaps create a small reproducer for this?
>
> Thanks,
> Andreas
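The warm-up sketch mentioned above: in the meantime, "preparing" the kernel can be approximated by calling it once on throwaway data before the timed region, so compilation (the generate_stride_kernel_and_types work) stays out of the measurement. The doubler kernel and the 4000-iteration loop here are made up for illustration, not anything from the thread:

import numpy as np
import pycuda.autoinit                      # creates a context
import pycuda.gpuarray as gpuarray
from pycuda.elementwise import ElementwiseKernel
from time import time

doubler = ElementwiseKernel(
    "float *y, float *x",
    "y[i] = 2.0f * x[i]",
    "doubler")

x = gpuarray.to_gpu(np.random.randn(1 << 20).astype(np.float32))
y = gpuarray.empty_like(x)

# Warm-up call: the kernel gets built and cached here,
# outside the region being timed or profiled.
doubler(y, x)

t0 = time()
for _ in range(4000):
    doubler(y, x)
pycuda.autoinit.context.synchronize()       # let the GPU finish before reading the clock
print("per call: %.2e s" % ((time() - t0) / 4000))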
