Hey,
alright, we've got some issues to fight ;-)
On GPUs with 16kB of shared memory (e.g. GTX 285), the generated GEMM
kernels now exceed the available memory:
Log: ptxas error : Entry function 'kernel_0x207f4b0_0' uses too much
shared data (0x40a0 bytes + 0x10 bytes system, 0x4000 max)
Thi
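
For reference, the arithmetic behind the ptxas message can be checked with a small sketch. The helper names below are hypothetical and not part of ViennaCL; the device limit is assumed to come from an OpenCL query such as clGetDeviceInfo() with CL_DEVICE_LOCAL_MEM_SIZE, and here it is simply passed in as a number.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not ViennaCL API): true if the kernel's shared/local
// memory plus the system-reserved part fits into the device limit.
// In practice device_limit_bytes would be obtained from the device, e.g. via
// clGetDeviceInfo(..., CL_DEVICE_LOCAL_MEM_SIZE, ...).
bool fits_in_local_memory(std::size_t kernel_bytes,
                          std::size_t system_bytes,
                          std::size_t device_limit_bytes)
{
  return kernel_bytes + system_bytes <= device_limit_bytes;
}

// Hypothetical helper: number of bytes by which the limit is exceeded
// (0 if the kernel fits).
std::size_t local_memory_excess(std::size_t kernel_bytes,
                                std::size_t system_bytes,
                                std::size_t device_limit_bytes)
{
  std::size_t const total = kernel_bytes + system_bytes;
  return total > device_limit_bytes ? total - device_limit_bytes : 0;
}

// Numbers taken from the ptxas log: 0x40a0 + 0x10 bytes against 0x4000 max.
void check_reported_case()
{
  assert(!fits_in_local_memory(0x40a0, 0x10, 0x4000));
}
```

With the values from the log, the kernel asks for 16560 bytes against a limit of 16384 bytes, so it is 176 bytes over.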
Oops, again did "reply" instead of "reply to all". :)
-- Forwarded message --
From: Philippe Tillet
Date: 2013/8/13
Subject: Re: [ViennaCL-devel] BLAS3, range, slice, compilation time...
To: Karl Rupp
Hey,
2013/8/13 Karl Rupp
> Hey,
>
> alright, we've got some issues to fig
Hi,
> On GPUs with 16kB of shared memory (e.g. GTX 285), the generated
> GEMM kernels now exceed the available memory:
>
> Log: ptxas error : Entry function 'kernel_0x207f4b0_0' uses too
> much shared data (0x40a0 bytes + 0x10 bytes system, 0x4000 max)
>
> This is because of
Hi hi,
2013/8/13 Karl Rupp
> Hi,
>
> > On GPUs with 16kB of shared memory (e.g. GTX 285), the generated
> > GEMM kernels now exceed the available memory:
> >
> > Log: ptxas error : Entry function 'kernel_0x207f4b0_0' uses too
> > much shared data (0x40a0 bytes + 0x10 bytes sys
Hi,
> We can directly query the available local device memory (which is the
> reason why I added all this buffering to the device class). Am I missing
> something?
>
>
> Yes, we could. But having the combination {vendor, local memory} seems a
> bit weird to me, I think {vendor, genera
Hi again,
thanks, the compilation problem is fixed. Unfortunately, there's still
the invalid work group size error showing up. Output from viennacl-info:
Address Bits: 32
Available: 1
Compiler Available:1
Endian Little: 1
Error Cor
Hi hi,
Yes, the default NVIDIA profile for double precision uses a work group size
of 1024... All this is checked during the autotuning procedure so that it
will work for the hardware it's tuned for...
Meh, seems like we need a couple additional levels of abstraction to reach
safety.
Best regard
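
A minimal sketch of the kind of safety check discussed here: validate a profile's work group size against the device limit before using it. The struct and function names are hypothetical and not ViennaCL's; the two-dimensional local size is an assumption made for illustration, and the limit is assumed to come from a query such as CL_DEVICE_MAX_WORK_GROUP_SIZE.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical description of a tuned kernel profile (2D local work size).
struct kernel_profile
{
  std::size_t local_size_0;
  std::size_t local_size_1;
};

// Hypothetical subset of device limits; max_work_group_size is assumed to be
// filled from clGetDeviceInfo(..., CL_DEVICE_MAX_WORK_GROUP_SIZE, ...).
struct device_limits
{
  std::size_t max_work_group_size;
};

// True if the total number of work items per group is allowed on the device.
bool work_group_fits(kernel_profile const & p, device_limits const & d)
{
  return p.local_size_0 * p.local_size_1 <= d.max_work_group_size;
}

// A profile with 32 x 32 = 1024 work items fails on a device that only allows
// 512 per group (512 is an illustrative limit here).
void check_example_case()
{
  kernel_profile const tuned = {32, 32};
  device_limits const small_device = {512};
  assert(!work_group_fits(tuned, small_device));
}
```

A profile that fails this check would be rejected before kernel launch, so a more conservative one can be used instead of hitting the invalid work group size error.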
Hey,
2013/8/13 Karl Rupp
> Hi,
>
>
> > We can directly query the available local device memory (which is the
>
>> reason why I added all this buffering to the device class). Am I
>> missing
>> something?
>>
>>
>> Yes, we could. But having the combination {vendor, local memory} seems
Hi,
> Yes, the default NVIDIA profile for double precision uses a work group
> size of 1024... All this is checked during the autotuning procedure so
> that it will work for the hardware it's tuned for...
> Meh, seems like we need a couple additional levels of abstraction to
> reach safety.
In
Hey,
> {vendor, generation} is the natural format for handling the
> profile internally, yes. This will presumably involve string parsing
> of the device name, yes :-(
>
>
> I'll do that :) Should I add a "generation" method in the ocl::device
> class? I think it is most suited he
Hey hey,
I've pushed the changes. Does it solve the GTX285 case?
The policy is:
- One global GPU fallback (very conservative)
- One global CPU fallback (very conservative)
- One global Accelerator fallback (very conservative)
- One fallback per architecture family
if the vendor is not i
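
The policy listed above could be sketched roughly as follows. This is only an illustration under stated assumptions: the types, the function name, and the profile labels are hypothetical, and the actual lookup in ViennaCL may be organized differently.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical device categories matching the three global fallbacks.
enum class device_type { gpu, cpu, accelerator };

// Hypothetical profile; a real one would carry tuning parameters.
struct profile
{
  std::string label;
};

// Key for the per-family fallbacks: {vendor, architecture family}.
typedef std::pair<std::string, std::string> family_key;

// Pick the per-family fallback if one is registered, otherwise fall back to
// the (very conservative) global profile for the device type.
profile select_profile(std::map<family_key, profile> const & per_family,
                       std::string const & vendor,
                       std::string const & family,
                       device_type type)
{
  std::map<family_key, profile>::const_iterator it =
      per_family.find(family_key(vendor, family));
  if (it != per_family.end())
    return it->second;

  switch (type)
  {
    case device_type::cpu:         return profile{"global-cpu-fallback"};
    case device_type::accelerator: return profile{"global-accelerator-fallback"};
    default:                       return profile{"global-gpu-fallback"};
  }
}

// An unknown vendor/family pair ends up on the global fallback for its type.
void check_unknown_device()
{
  std::map<family_key, profile> const empty;
  assert(select_profile(empty, "unknown", "unknown", device_type::gpu).label
         == "global-gpu-fallback");
}
```

The per-family map is consulted first, so a device from a known family never receives the global fallback.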
Hi guys,
wow, AMD open-sourced their Math libraries...
Best regards,
Karli
---
*AMD Accelerated Parallel Processing Math Libraries (APPML) is now
available as open source as clMath.*
I am extremely pleased to have the opportunity to announce that the
APPML BLAS & FFT proje
Hey,
> I've pushed the changes. Does it solve the GTX285 case?
thanks, it does!
> The policy is:
>
> - One global GPU fallback (very conservative)
> - One global CPU fallback (very conservative)
> - One global Accelerator fallback (very conservative)
> - One fallback per architecture family
>
Hi,
2013/8/14 Karl Rupp
> Hey,
>
> > I've pushed the changes. Does it solve the GTX285 case?
>
> thanks, it does!
>
>
Cool !
>
>
> The policy is:
>>
>> - One global GPU fallback (very conservative)
>> - One global CPU fallback (very conservative)
>> - One global Accelerator fallback (very co
Hi,
> Do we want to keep the full device name in the profiles map? With
> vendor and arch determined, we know pretty much everything we need
> to know. If we need to match the name 1:1, there may be too many
> devices which we miss even though the 'faster' profile should work?
>
>