Hi hi,

Yes, the vendor id looks useless for our purpose. I think it's attached to
a given machine though, contrary to the device id, which changes across
executions. There are some OpenCL extensions (provided by Apple) to find
out which OpenCL device is driving the display, using vendor ids, if I
remember correctly.

Let's agree on a device matching process. Here is how it works for now
(including the improvements I made yesterday):

1)  Device vendor (should be inferred from the hardware generation, since it
looks like we've agreed on an SDK-independent mechanism)
2)  Device Type
3)  Hardware generation
4)  Device name
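
Just to fix ideas, here is a minimal sketch of a profile database keyed by
these four criteria (all type names are hypothetical, not ViennaCL's actual
classes):

#include <cstddef>
#include <map>
#include <string>

// Hypothetical sketch only -- not the actual ViennaCL database layout.
enum device_type             { cpu, gpu, accelerator };
enum device_vendor           { nvidia, amd, intel, other_vendor };
enum architecture_generation { tesla, fermi, kepler, maxwell,
                               evergreen, northern_islands, southern_islands };

// Illustrative kernel parameters only.
struct profile { std::size_t local_size_0, local_size_1, num_groups; };

// Each level that matches narrows the candidate set; each level that fails
// triggers one of the fallbacks described below. (Whether device type or
// vendor comes first is exactly the point discussed next; the device name
// could later become an enum instead of a string, see further below.)
typedef std::map<std::string,             profile>        by_name;
typedef std::map<architecture_generation, by_name>        by_generation;
typedef std::map<device_vendor,           by_generation>  by_vendor;
typedef std::map<device_type,             by_vendor>      database_type;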

1) and 2) should be swapped, indeed. I don't really use vendor defaults;
here is how fallbacks are handled:

-> There are global defaults for each device type, for both float and
double precision.
-> When the generation is not found in the database, we look for the
closest generation for this vendor.
Example. Generations are stored as:
enum architecture_generation
{
  // NVIDIA
  tesla,
  fermi,
  kepler,
  maxwell,

  // AMD
  evergreen,
  northern_islands,
  southern_islands
};

If the user has the combination (nvidia, kepler) and there is no kepler
entry in the database, then the lookup will scan the database for (nvidia,
$architecture) and select the $architecture closest to kepler (in terms of
the difference between the enum values).
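
A minimal sketch of that closest-match lookup, assuming the entries for a
vendor are kept in a std::map keyed by the enum (the helper name
closest_entry is made up):

#include <cstdlib>   // std::abs
#include <map>

// Returns the entry whose enum value is numerically closest to the
// requested one. Assumes 'entries' is non-empty (otherwise the global
// defaults apply) and that neighbouring generations have adjacent values.
template <typename EnumT, typename ProfileT>
typename std::map<EnumT, ProfileT>::const_iterator
closest_entry(std::map<EnumT, ProfileT> const & entries, EnumT requested)
{
  typedef typename std::map<EnumT, ProfileT>::const_iterator const_iterator;
  const_iterator best = entries.begin();
  for (const_iterator it = entries.begin(); it != entries.end(); ++it)
    if (std::abs(int(it->first) - int(requested)) < std::abs(int(best->first) - int(requested)))
      best = it;
  return best;
}

// E.g. (nvidia, kepler) requested but only fermi and maxwell are present:
// closest_entry() returns one of the two neighbouring generations.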

-> If the device name is not found, the first device found for this
architecture is selected. We should change this to use a mechanism similar
to the one above, so that similar devices are close to each other in an
enum:

enum device_name
{
  ...
  gtx560,
  gtx570,
  gtx580,
  gtx590,
  ...
};

That way, if there is a profile for the gtx520 and one for the gtx580, the
gtx530 will pick the former and the gtx570 the latter. There is a pitfall
with this approach, though, since it won't capture how close devices from
different generations are (e.g., a gtx470 may be similar to a gtx550?).
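
Reusing the hypothetical closest_entry helper and profile struct sketched
above, the example would read roughly as follows (assuming gtx520 and
gtx530 also sit behind the '...' in the enum):

// Hypothetical situation: only the gtx520 and gtx580 have tuned profiles.
std::map<device_name, profile> profiles;
profiles[gtx520] = profile();
profiles[gtx580] = profile();

// gtx530 resolves to the gtx520 entry, gtx570 to the gtx580 entry:
profile const & p530 = closest_entry(profiles, gtx530)->second;
profile const & p570 = closest_entry(profiles, gtx570)->second;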

-> Finally, even with all these fallbacks we cannot ensure correctness for
an unknown device. The kernel might require too many resources, for
example. For now, an exception would be thrown at
template_base::generate(), which is not acceptable. How should we handle
this?
My idea is that we should check for invalidity when constructing the
template, and if the profile is not valid for this device, fall back on
the defaults. As a general rule, when the slow default profile is used,
should we output a warning?
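
To make the idea concrete, here is a rough sketch of such a
construction-time check (the struct and field names are made up and not
the actual template_base interface; the limits would come from
clGetDeviceInfo, e.g. CL_DEVICE_MAX_WORK_GROUP_SIZE and
CL_DEVICE_LOCAL_MEM_SIZE):

#include <cstddef>
#include <iostream>

// Hypothetical device limits, queried once from the OpenCL runtime.
struct device_limits
{
  std::size_t max_work_group_size;
  std::size_t local_mem_size;     // bytes
};

// Hypothetical resource requirements of a tuned profile.
struct kernel_profile
{
  std::size_t work_group_size;
  std::size_t local_mem_usage;    // bytes

  bool is_valid_for(device_limits const & limits) const
  {
    return work_group_size <= limits.max_work_group_size
        && local_mem_usage <= limits.local_mem_size;
  }
};

// If the matched profile cannot run on this device, fall back to the slow
// defaults and (optionally) warn the user.
kernel_profile select_profile(kernel_profile const & matched,
                              kernel_profile const & defaults,
                              device_limits  const & limits)
{
  if (matched.is_valid_for(limits))
    return matched;
  std::cerr << "ViennaCL: tuned profile not valid for this device, "
               "falling back to default parameters." << std::endl;
  return defaults;
}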

Philippe



2014-07-21 15:33 GMT+02:00 Karl Rupp <r...@iue.tuwien.ac.at>:

> Hi hi,
>
>
>
>>         Yesterday, I've realized that Toby and I didn't have the same
>>         vendor id
>>         for our Intel integrated GPU.
>>         I'm not sure of what the vendor id is supposed to represent, then.
>>
>>
>>     Looks like a unique identifier within each SDK. Not portable across
>>     SDKs...
>>
>>
>>
>> Unfortunately, this doesn't even seem to be the case! Toby and I both
>> use Beignet, but end up with a different Vendor ID...!   I could also
>> point to this stackoverflow question:
>> http://stackoverflow.com/questions/8146056/how-to-programmatically-discover-specific-gpu-on-platform-with-multiple-gpus-op
>> According to what that person gets, we can't assume a one-to-one mapping
>> between vendor ids and platforms.
>>
>> ... Device ATI Radeon HD 5770[AMD]: vendorId[1021b00] ...
>> ... Device ATI Radeon HD 5770[AMD]: vendorId[2021b00] ...
>>
>
> Grml, so the only conclusion is that the vendor ID is totally useless?
>
>
>
>>         Having
>>         (CL_DEVICE_TYPE_GPU, haswell) on Beignet or Windows' SDK
>>         wouldn't make
>>         any difference from our point of view, and having
>>         (CL_DEVICE_TYPE_CPU,
>>         haswell) on Intel's SDK or AMD's wouldn't make any difference
>>         either.
>>
>>
>>     Why should we treat them equally? I can well imagine that different
>>     compiler backends work differently, so I'd actually *expect* that
>>     the best performance on these SDKs is obtained with different
>>     configurations. For example, a LLVM-backend vs. a non-LLVM-backend
>>     is unlikely to behave similarly with such an SDK.
>>
>>
>> Well, I would also expect that... but I would expect the worse
>> performance to come from missing optimizations (no automatic loop
>> unrolling, no auto-vectorization, etc...).
>>
>
> Not necessarily, as different compilers might have different approaches for
> vectorization. One compiler may decide to only vectorize the vector data
> types (double2, etc.)
>
>
>> My insight is that
>> auto-tuning the same device for two different SDKs is conceptually
>> similar to auto-tuning the same device for two versions of the same SDK.
>>
>
> Agreed.
>
>
>
>> Now, do we also want to store the platform version in the builtin
>> database? We could, but it will certainly involve a pretty complicated
>> fallback mechanism, which will be practically always used because of the
>> fragmentation of the SDK versions.
>>
>
> Not for 1.6.0. What I have in mind is that we don't design this 'database'
> to be too restrictive and suffer from that later. In other words, we should
> be reasonably prepared for regular user benchmark logs coming in,
> particularly when the benchmark GUI gets released. It would be a waste if
> we can't make use of such valuable data ;-)
>
>
>
>
>>         This would also prevent some headache when populating the
>>         database!
>>
>>
>>     Which headaches? I think if we treat all OpenCL SDKs equally, we
>>     will later have to refactor this because we will find differences
>>     among the SDKs...
>>
>>
>> Well, populating the database will be much longer if we consider the
>> variations in the compiler (platform versions, sdks...). Just like we
>> stick to auto-tuning a routine for the latest SDK version of a vendor...
>> Well, the vendor_id key could still be replaced by a platform enum that
>> could be obtained by parsing the platform name + version...
>>
>
> For the upcoming 1.6.0 it's certainly reasonable to ignore the platform
> and SDK versions and only tune on the latest software stack (i.e. SDK plus
> driver).
>
>
>
>> This also means: do we want to run all our tuning procedures for Apple,
>> since it has its own SDK? If not, should we use a fallback for Apple.
>> Which one for the CPU? AMD? Intel? Which one for the iGPU? Beignet,
>> Intel? Why?
>>
>
> Well, ultimately we have to (at least for verification purposes), even
> though the Apple SDK uses primarily vendor components under the hood.
>
>
> Can we agree on a device matching process? I suggest that we use the
> following iterative matching procedure
>
> 1.) Device Type
> 2.) Device Vendor
> 3.) Hardware Generation (from Device Name?)
> 4.) Device Name
> (further checks like driver SDK version, etc. can be added later)
>
> The matching proceeds as far as possible: If we only match the device
> type, we use defaults for that. If we can also match the vendor, we have
> better defaults for that. If we can map the device name to a vendor
> architecture, we can use an improved configuration for that. Ultimately, if
> we even match the full name, we have a full hardware-aware kernel parameter
> set. I'm confident that with some effort we can manage to get the first
> three points to match for most hardware out there.
>
> Best regards,
> Karli
>
>
>