Hi hi hi,
On 07/21/2014 03:59 PM, Philippe Tillet wrote:
> Hi hi,
>
> Yes, the vendor id looks useless for our purpose. I think it's attached
> to a given machine though, contrary to the device id, which changes
> across executions. There are some OpenCL extensions (provided by Apple)
> to find out which OpenCL device is driving the display, using vendor
> ids, if I remember correctly.
>
> Let's agree on a device matching process. Here is how it works for now
> (including the improvements I made yesterday):
>
> 1) Device vendor (should be from the hardware generation, since it
> looks like we've agreed on an SDK-independent mechanism)
> 2) Device Type
> 3) Hardware generation
> 4) Device name
>
> 1) and 2) should be swapped, indeed.
If everything else fails, at least the device type is reliable
information. :-)
> I don't really use vendor defaults,
> here is how fallbacks are handled:
>
> -> There are global defaults for each device type, for both double and
> floats.
> -> When the generation is not found in the database, we look for the
> closest generation for this vendor.
> Example. Generations are stored as:
> enum architecture_generation
> {
> tesla,
> fermi,
> kepler,
> maxwell,
>
> evergreen,
> northern_islands,
> southern_islands
> }
>
> if the user has the combination (nvidia, kepler) and there is no kepler
> in the database, then the lookup will scan the database for (nvidia,
> $architecture), and will select the $architecture which is closest to
> kepler (in terms of difference in the enum).
>
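A minimal sketch of the closest-generation lookup described above, assuming the database is a flat list of (vendor, generation) entries. The names device_vendor, db_entry and closest_generation are made up for illustration and are not ViennaCL API; ties go to the entry found first.

```cpp
#include <cstdlib>
#include <vector>

// Hypothetical vendor tag; only entries of the same vendor are compared.
enum device_vendor { nvidia, amd };

// Same order as the enum quoted above.
enum architecture_generation
{
  tesla, fermi, kepler, maxwell,
  evergreen, northern_islands, southern_islands
};

// Hypothetical database entry; a real one would also carry the profile.
struct db_entry
{
  device_vendor vendor;
  architecture_generation generation;
};

// Returns the entry of vendor `v` whose generation has the smallest enum
// distance to `wanted`, or nullptr if the vendor has no entry at all.
const db_entry* closest_generation(const std::vector<db_entry>& db,
                                   device_vendor v,
                                   architecture_generation wanted)
{
  const db_entry* best = nullptr;
  int best_dist = 0;
  for (const db_entry& e : db)
  {
    if (e.vendor != v)
      continue;
    int dist = std::abs(static_cast<int>(e.generation) - static_cast<int>(wanted));
    if (best == nullptr || dist < best_dist)
    {
      best = &e;
      best_dist = dist;
    }
  }
  return best;
}
```

Filtering by vendor first matters because both vendors share one enum: without it, a maxwell query could land on evergreen.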
> -> If the device name is not found, the first device found for this
> architecture is selected. We should change this to have a similar
> mechanism as above, so that similar devices are closer in an enum.
>
> enum device_name
> {
> ...
> gtx560,
> gtx570,
> gtx580,
> gtx590,
> ...
> }
>
> So that if there is a profile for the gtx520 and one for the gtx580,
> then the gtx530 will pick the former and the gtx570 will pick the
> latter. There is a pitfall with this approach, though, since it won't
> handle how close devices from different generations are. (I.e., a
> gtx470 may be similar to a gtx550?)
What if 'closer' always picks the weaker GPU?
GPU-rebranding is indeed annoying here, but I think it can be addressed
in the same way vendors relabel it: We just take the tuning profile for
device 'a' and rebrand it with name 'b'. ;-)
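To make the 'weaker GPU' idea concrete, here is a rough sketch, assuming the device enum is ordered from weaker to stronger. The enum excerpt, profile_map, rebrand_map, resolve_rebrand and pick_profile are all hypothetical names, and a string stands in for the actual tuning profile.

```cpp
#include <map>
#include <string>

// Hypothetical excerpt of the device enum, ordered from weaker to stronger.
enum device_name { gtx520, gtx530, gtx560, gtx570, gtx580, gtx590 };

// Maps a device to its tuning profile (placeholder type).
typedef std::map<device_name, std::string> profile_map;

// Maps a rebranded device 'b' to the device 'a' whose profile it reuses.
typedef std::map<device_name, device_name> rebrand_map;

// Returns the device whose profile should be looked up for `d`.
device_name resolve_rebrand(const rebrand_map& aliases, device_name d)
{
  rebrand_map::const_iterator it = aliases.find(d);
  return it == aliases.end() ? d : it->second;
}

// Returns the profile of `wanted` if present, otherwise that of the nearest
// weaker device. If nothing weaker is profiled, the weakest profiled device
// is used. Returns nullptr only for an empty database.
const std::string* pick_profile(const profile_map& profiles, device_name wanted)
{
  if (profiles.empty())
    return nullptr;
  // First entry strictly stronger than `wanted`.
  profile_map::const_iterator it = profiles.upper_bound(wanted);
  if (it != profiles.begin())
    --it;  // step back to the exact match or the nearest weaker device
  return &it->second;
}
```

With profiles for the gtx520 and the gtx580 only, the gtx570 would now pick the gtx520 profile rather than the gtx580 one, which is the price of always erring on the conservative side.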
> -> Finally, even with all these fallbacks we cannot ensure correctness
> for an unknown device. The kernel might require too many resources, for
> example. For now, an exception would be thrown at
> template_base::generate(), which is not acceptable. How should we handle
> this?
> My idea is that we should check for invalidity when constructing the
> template, and if the profile is not valid for this device, then fall
> back to the defaults.
Does this affect kernels other than GEMM? Either way, this needs some
kind of hierarchical fallback, which, in the simplest terms, is just
falling back to a super-conservative profile.
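Roughly what I have in mind, as a sketch only: check each candidate against the device limits when the template is constructed and take the first one that fits. The structs and functions below are hypothetical, and the two limits are just examples of what could be checked (OpenCL reports them as CL_DEVICE_MAX_WORK_GROUP_SIZE and CL_DEVICE_LOCAL_MEM_SIZE).

```cpp
#include <cstddef>
#include <vector>

// Hypothetical subset of the limits reported by a device.
struct device_limits
{
  std::size_t max_work_group_size;
  std::size_t local_mem_bytes;
};

// Hypothetical resource requirements of a tuning profile.
struct profile
{
  std::size_t work_group_size;
  std::size_t local_mem_bytes;
};

// A profile is usable if it stays within every limit of the device.
bool is_valid(const profile& p, const device_limits& d)
{
  return p.work_group_size <= d.max_work_group_size
      && p.local_mem_bytes <= d.local_mem_bytes;
}

// `candidates` is ordered from most specific (device name) to most generic
// (device type default). The conservative profile is returned if none fits.
profile select_profile(const std::vector<profile>& candidates,
                       const device_limits& dev,
                       const profile& conservative)
{
  for (const profile& p : candidates)
  {
    if (is_valid(p, dev))
      return p;
  }
  return conservative;
}
```

This way nothing has to throw at generation time, provided the conservative profile is chosen so that it fits on every device.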
> As a general rule, when the slow default profile is
> used, should we output a warning?
We should provide a diagnostics function to the user, yes. We should not
dump anything to stdout without the user explicitly asking for it.
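As an illustration of what such a diagnostics facility could look like (a sketch, not an existing ViennaCL interface; the class and method names are made up): fallbacks are recorded silently and the user decides whether to query or print them.

```cpp
#include <string>
#include <vector>

// Hypothetical collector for tuning diagnostics. Nothing is printed here;
// the user has to ask for the messages explicitly.
class tuning_diagnostics
{
public:
  // Called by the library whenever a fallback profile is selected.
  void record(const std::string& message) { messages_.push_back(message); }

  // True if at least one fallback has been recorded.
  bool used_fallback() const { return !messages_.empty(); }

  // All recorded messages, in the order they were recorded.
  const std::vector<std::string>& messages() const { return messages_; }

private:
  std::vector<std::string> messages_;
};
```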
Best regards,
Karli
_______________________________________________
ViennaCL-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/viennacl-devel