Re: [ViennaCL-devel] BLAS3, range, slice, compilation time...

2013-08-13 Thread Karl Rupp
Hey,
alright, we've got some issues to fight ;-) On GPUs with 16 kB of shared memory (e.g. GTX 285), the generated GEMM kernels now exceed the available memory:
Log: ptxas error : Entry function 'kernel_0x207f4b0_0' uses too much shared data (0x40a0 bytes + 0x10 bytes system, 0x4000 max)
Thi
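To make the numbers in the ptxas message easier to read, here is a minimal sketch that converts the hex figures to decimal and computes by how much the kernel overshoots the limit. The helper name `shared_mem_overshoot` is made up for illustration and is not part of ViennaCL or any CUDA tool; only the three hex values come from the log.

```python
def shared_mem_overshoot(user_bytes: int, system_bytes: int, limit_bytes: int) -> int:
    """Return how many bytes the request exceeds the limit by (<= 0 means it fits)."""
    return user_bytes + system_bytes - limit_bytes

# Values copied from the ptxas message in the log.
user_bytes = 0x40a0    # shared data used by the kernel
system_bytes = 0x10    # additional shared data reserved by the system
limit_bytes = 0x4000   # maximum reported by ptxas (16 kB)

overshoot = shared_mem_overshoot(user_bytes, system_bytes, limit_bytes)
print(f"requested {user_bytes + system_bytes} bytes, "
      f"limit {limit_bytes} bytes, over by {overshoot} bytes")
```

With these values the kernel asks for 16560 bytes against a 16384-byte limit, so it is 176 bytes over.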

Re: [ViennaCL-devel] BLAS3, range, slice, compilation time...

2013-08-12 Thread Philippe Tillet
Hi Karl,
2013/8/12 Karl Rupp
> Hi Xeon Phil ;-)
>
> > There are a lot of problems related to coupling the current BLAS3
> > implementation with the kernel generator:
> >
> > - While I think I could add some range support, adding slices will be
> > extremely difficult, and it would probably r

Re: [ViennaCL-devel] BLAS3, range, slice, compilation time...

2013-08-11 Thread Karl Rupp
Hi Xeon Phil ;-)
> There are a lot of problems related to coupling the current BLAS3
> implementation with the kernel generator:
>
> - While I think I could add some range support, adding slices will be
> extremely difficult, and it would probably result in bad performance
> whatever kernel is u

[ViennaCL-devel] BLAS3, range, slice, compilation time...

2013-08-11 Thread Philippe Tillet
Hi everybody...
There are a lot of problems related to coupling the current BLAS3 implementation with the kernel generator:
- While I think I could add some range support, adding slices will be extremely difficult, and it would probably result in bad performance whatever kernel is used. The most