Hey,
alright, we've got some issues to fight ;-)
On GPUs with 16 kB of shared memory (e.g. GTX 285), the generated GEMM
kernels now exceed the available shared memory:
Log: ptxas error : Entry function 'kernel_0x207f4b0_0' uses too much
shared data (0x40a0 bytes + 0x10 bytes system, 0x4000 max)
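For reference, the hex values in that log line can be converted and summed as in the short Python sketch below. The variable names are only illustrative; the three constants are the ones ptxas reports above.

```python
# Values copied from the ptxas error message.
kernel_shared = 0x40A0  # shared data requested by the kernel
system_shared = 0x10    # shared data reserved by the system
limit = 0x4000          # maximum reported by ptxas (16 kB)

total = kernel_shared + system_shared
overshoot = total - limit

print(f"requested: {total} bytes, limit: {limit} bytes, over by: {overshoot} bytes")
```

The kernel asks for 16560 bytes against a 16384-byte limit, so it is over by 176 bytes.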
Thi
Hi Karl,
2013/8/12 Karl Rupp
> Hi Xeon Phil ;-)
>
>
> > There are a lot of problems related to coupling the current BLAS3
> > implementation with the kernel generator:
> >
> > - While I think I could add some range support, adding slices will be
> > extremely difficult, and it would probably r
Hi Xeon Phil ;-)
> There are a lot of problems related to coupling the current BLAS3
> implementation with the kernel generator:
>
> - While I think I could add some range support, adding slices will be
> extremely difficult, and it would probably result in bad performance
> whatever kernel is u
Hi everybody...
There are a lot of problems related to coupling the current BLAS3
implementation with the kernel generator:
- While I think I could add some range support, adding slices will be
extremely difficult, and it would probably result in bad performance
whatever kernel is used. The most