Michael McNeil Forbes <[email protected]> writes:
> Here is the profile of the slow __call__. All the time is spent in
> generate_stride_kernel_and_types:
>
> Line #  Hits    Time  Per Hit  % Time  Line Contents
> ==============================================================
>    192                                  def __call__(self, *args, **kwargs):
>    193     78     145      1.9     0.1      vectors = []
> ...
>    204     78     104      1.3     0.1      func, arguments = self.generate_stride_kernel_and_types(
>    205     78  199968   2563.7    97.3          range_ is not None or slice_ is not None)
>    206
>    207    156     354      2.3     0.2      for arg, arg_descr in zip(args, arguments):
> ...
>    241
>    242     78    2780     35.6     1.4      func.prepared_async_call(grid, block, stream, *invocation_args)
Now this is just confusing to me. generate_stride_kernel_and_types has a @memoize_method decorator, which should take care of caching the built kernel. Unless you're instantiating a new ElementwiseKernel for each call, generate_stride_kernel_and_types should only ever get called once. The default (cached) case should amount to one dictionary lookup, so I'm confused as to how that would eat up so much time.

Can you perhaps create a small reproducer for this?

Thanks,
Andreas
_______________________________________________
PyCUDA mailing list
[email protected]
http://lists.tiker.net/listinfo/pycuda
