Re: [ViennaCL-devel] Fwd: BLAS3, range, slice, compilation time...

Philippe Tillet Tue, 13 Aug 2013 15:10:33 -0700

Hey hey,

I've pushed the changes. Does it solve the GTX285 case?


The policy is:

- One global GPU fallback (very conservative)
- One global CPU fallback (very conservative)
- One global Accelerator fallback (very conservative)
- One fallback per architecture family
--------
- If the vendor is not in the database, return the global fallback profile.
- If the vendor is in the database but the architecture is not, return the
  global fallback.
- If the vendor and architecture are in the database but the device name is
  not, test the architecture fallback. If that profile is invalid (work group
  size not compatible, too much local memory), return the global fallback.
  Otherwise, return the architecture fallback.
- If everything is found, return the specific device profile.
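
To make the lookup order concrete, here is a minimal C++ sketch of the policy above. Every name in it (Profile, DeviceInfo, Database, select_profile, ...) is hypothetical and is not the code I pushed; the fallback numbers are placeholders, not tuned values. It only assumes that a profile is valid when its work group fits the device limit and its local memory usage fits the device's local memory.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical types for illustration only; not ViennaCL's actual API.
enum class DeviceType { GPU, CPU, Accelerator };
enum class Architecture { Unknown, Tesla, Fermi, Kepler };

struct Profile {
  std::size_t local_size_0;     // work group size, first dimension
  std::size_t local_size_1;     // work group size, second dimension
  std::size_t local_mem_bytes;  // local memory used by the kernel
};

struct DeviceInfo {
  DeviceType type;
  std::string vendor;
  Architecture arch;
  std::string name;
  std::size_t max_work_group_size;
  std::size_t local_mem_size;
};

// One fallback per architecture family, plus the device-specific profiles.
struct ArchEntry {
  Profile fallback;
  std::map<std::string, Profile> devices;
};

using VendorEntry = std::map<Architecture, ArchEntry>;
using Database = std::map<std::string, VendorEntry>;

// Very conservative global fallbacks (placeholder values).
inline Profile global_fallback(DeviceType type) {
  if (type == DeviceType::GPU) return Profile{8, 8, 1024};
  if (type == DeviceType::CPU) return Profile{1, 1, 0};
  return Profile{4, 4, 512};  // Accelerator
}

// A profile is usable if the work group and local memory fit the device.
inline bool is_valid(const Profile& p, const DeviceInfo& dev) {
  return p.local_size_0 * p.local_size_1 <= dev.max_work_group_size &&
         p.local_mem_bytes <= dev.local_mem_size;
}

inline Profile select_profile(const Database& db, const DeviceInfo& dev) {
  // Vendor unknown: global fallback.
  auto vendor_it = db.find(dev.vendor);
  if (vendor_it == db.end()) return global_fallback(dev.type);

  // Vendor known, architecture unknown: global fallback.
  auto arch_it = vendor_it->second.find(dev.arch);
  if (arch_it == vendor_it->second.end()) return global_fallback(dev.type);

  // Everything known: specific device profile.
  auto dev_it = arch_it->second.devices.find(dev.name);
  if (dev_it != arch_it->second.devices.end()) return dev_it->second;

  // Device name unknown: architecture fallback if valid, else global.
  const Profile& arch_fallback = arch_it->second.fallback;
  return is_valid(arch_fallback, dev) ? arch_fallback
                                      : global_fallback(dev.type);
}
```

With this layout a device that maps to a known architecture but reports a small maximum work group size drops to the global fallback instead of failing at kernel launch, which is the behaviour we want for the weak-GPU case.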

Best regards,
Philippe


2013/8/13 Karl Rupp <[email protected]>

> Hey,
>
>
>>     {vendor, generation} is the natural format for handling the
>>     profile internally, yes. This will presumably involve string parsing
>>     of the device name, yes :-(
>>
>>
>> I'll do that :) Should I add a "generation" method in the ocl::device
>> class? I think it is most suited here. We know however that vendors
>> offer revisions of the current generation. Should GTX 480 and GTX 580
>> both be parsed as "Fermi"? Since this would just be used for a fallback, I
>> think that having both GeForce 4xx and GeForce 5xx parsed as Fermi would
>> be a good solution.
>>
>
> "Fermi" should be fine (return that via some enum value rather than a
> string). What matters for us is the architecture generation, a change of
> the fabrication process shouldn't have any significant impact on the best
> kernels.
>
>
>
>>     However, the local memory available might vary between devices of
>>     the same generation (think of desktop vs. mobile), so probably we
>>     extend it to:
>>        {vendor, generation, min_local_mem_required}
>>     If the device does not have enough local memory even though it is
>>     mapped correctly to a certain generation, we simply fall back to a
>>     legacy profile which stays within the 16kB.
>>
>>
>> Right. In this case we probably have to deal with a pretty low-end GPU,
>> and we should just fall back to the conservative vendor-default... It's
>> probably too error-prone to maintain multiple fallback profiles for a
>> given generation :P
>>
>
> Right, one fall-back profile per generation has to be enough (it can even
> be one fall-back profile for all generations for a start).
>
>
>
>> I have pushed the modifications to the vendor fallbacks so that they use
>> below 16kB.
>> It may not solve the invalid work group size problem... Seems we'll have
>> to be conservative on both the local size and the work group sizes...
>>
>
> I don't see a big problem with using a fairly 'slow' kernel when going for
> the fallback option. As you say, this is usually an indication of a rather
> weak GPU, so it's better to go for a reliable execution rather than to
> error out trying to squeeze out performance which isn't there anyway.
>
> Best regards,
> Karli
>
>

_______________________________________________
ViennaCL-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/viennacl-devel