Hey,

>         So that if there is a profile for the gtx520 and one for the gtx580,
>         then the gtx530 will pick the former and the gtx570 will pick the
>         latter. There is a pitfall with this approach, though, since it
>         won't handle how close devices from different generations are
>         (i.e. a gtx470 may be similar to a gtx550?)
>
>
>     What if 'closer' always picks the weaker GPU?
>
>
> Well, this is indeed tricky. Should a gtx560 fall back to a gtx580 or
> a gtx540? What about a gtx660? I would argue that, considering what a
> fallback is meant to be, we shouldn't be too greedy about finding the
> best possible fallback.
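
For reference, here is a minimal sketch of the "always pick the closest weaker GPU" rule. It assumes profiles are keyed by the numeric model number (e.g. 530 for a gtx530), which is only a crude performance proxy; the `profile` struct and `closest_weaker` function are hypothetical, not existing ViennaCL API:

```cpp
#include <cassert>
#include <iterator>
#include <map>
#include <optional>
#include <string>

// Hypothetical stand-in for a tuned kernel profile.
struct profile
{
  std::string name;
};

// Return the profile of the closest device that is not stronger than the
// queried one, using the model number as a (crude) performance proxy.
// Returns std::nullopt if every known profile belongs to a stronger device.
std::optional<profile> closest_weaker(std::map<int, profile> const & profiles, int model)
{
  assert(model > 0);
  auto it = profiles.upper_bound(model);  // first entry with a larger model number
  if (it == profiles.begin())
    return std::nullopt;
  return std::prev(it)->second;           // largest model number <= model
}
```

With only gtx520 and gtx580 profiles, both the gtx530 and the gtx570 get the gtx520 profile: always valid in spirit, but conservative, and it still says nothing about devices from different generations.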

By taking a close look at
  http://en.wikipedia.org/wiki/List_of_Nvidia_graphics_processing_units
  http://en.wikipedia.org/wiki/List_of_AMD_graphics_processing_units
we should be able to define a reasonable fallback for each GPU. A similar
strategy can be applied to CPUs. The only maintenance work involved is
updating the mapping about once a year for the latest hardware releases.


>     GPU-rebranding is indeed annoying here, but I think it can be
>     addressed in the same way vendors relabel it: We just take the
>     tuning profile for device 'a' and rebrand it with name 'b'. ;-)
>
>
> That's the best way to handle it indeed! We could rebrand a new version
> of ViennaCL which just adds support for the rebranded devices.
> Kidding... :-p

Actually, that's not that unreasonable. Vendors do the same with their
libraries (drivers) whenever new hardware hits the market.
So, is there some easy (non-static) way to forward/inherit the profile
for device 'a' to device 'b'?
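
One possible non-static approach, sketched under the assumption that profiles live in a runtime registry keyed by device name (`profile_registry`, `add_profile`, `add_alias` and `find` are hypothetical names, not existing ViennaCL API):

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical tuning profile; stands in for whatever the generator stores.
struct profile
{
  unsigned local_size_0;
  unsigned local_size_1;
};

// Hypothetical runtime registry: tuned profiles plus aliases for rebranded devices.
class profile_registry
{
public:
  void add_profile(std::string const & device, profile const & p) { profiles_[device] = p; }

  // Device 'rebranded' inherits whatever profile 'original' has at lookup time.
  void add_alias(std::string const & rebranded, std::string const & original)
  {
    assert(rebranded != original);
    aliases_[rebranded] = original;
  }

  // Returns nullptr if neither the device nor any alias target has a profile.
  profile const * find(std::string const & device) const
  {
    std::string name = device;
    // Follow alias chains; the bound guards against accidental alias cycles.
    for (std::size_t hops = 0; hops <= aliases_.size(); ++hops)
    {
      auto p = profiles_.find(name);
      if (p != profiles_.end())
        return &p->second;
      auto a = aliases_.find(name);
      if (a == aliases_.end())
        return nullptr;
      name = a->second;
    }
    return nullptr;  // alias cycle
  }

private:
  std::map<std::string, profile> profiles_;
  std::map<std::string, std::string> aliases_;
};
```

Since the alias is resolved at lookup time, re-tuning device 'a' automatically carries over to 'b'.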


>     Does this affect kernels other than GEMM? Either way, this needs
>     some kind of hierarchical fallback, which, in simplest terms, is just
>     falling back to a super-conservative profile.
>
>
> Yes, even the simplest vector addition template can be made invalid:
> just feed it a local size that is too big and it'll die like a goldfish.

Well, it 'can' die in theory, but keep in mind that we've been using a 
local size of 128 for years now and there haven't been any significant 
issues with this so far.
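
A hierarchical fallback could look roughly like the sketch below: exact device first, then a per-vendor profile, then the conservative local size of 128, clamped to the device limit since 128 itself is not guaranteed to be valid. All names here (`device_info`, `select_profile`, `conservative_default`) are hypothetical; the device fields are assumed to be filled in via `clGetDeviceInfo`:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical tuning profile; only the local size is modelled here.
struct profile
{
  std::size_t local_size;  // work-items per work group
};

// Hypothetical device description (vendor, name, max work group size).
struct device_info
{
  std::string vendor;
  std::string name;
  std::size_t max_work_group_size;
};

// Last-resort profile: the long-standing local size of 128, clamped to the
// device limit.
profile conservative_default(std::size_t max_work_group_size)
{
  assert(max_work_group_size > 0);
  return profile{std::min<std::size_t>(128, max_work_group_size)};
}

// Lookup from most to least specific:
//   1. a profile tuned for this exact device,
//   2. a generic profile for the device's vendor,
//   3. the conservative default.
profile select_profile(device_info const & dev,
                       std::map<std::string, profile> const & by_device,
                       std::map<std::string, profile> const & by_vendor)
{
  if (auto it = by_device.find(dev.name); it != by_device.end())
    return it->second;
  if (auto it = by_vendor.find(dev.vendor); it != by_vendor.end())
    return it->second;
  return conservative_default(dev.max_work_group_size);
}
```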


> I will experiment with another solution to handle this, since I'll need
> it for evolutionary tuning anyway. I'll come up with a mechanism to
> project a profile back onto the space of valid configurations. For
> example, if work_group_size > max_work_group_size, change the local
> sizes so that work_group_size = max_work_group_size. For some criteria
> it gets trickier, but it should give us a better device-adaptive
> fallback. It is actually the only viable solution, since the OpenCL
> standard only guarantees work_group_size >= 1, and for GPUs we
> certainly want a better lower bound. If we use a fallback with a work
> group size of 128, we have no guarantee that it will work everywhere
> (even though it probably will). We can also do nice things like
> ensuring that the local size is a multiple of 32 for NVIDIA and a
> multiple of 64 for AMD.
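
A minimal sketch of such a projection, assuming a 2D local size and power-of-two sizes as in typical tuned profiles (`local_sizes`, `project_to_valid` and `round_to_simd_width` are hypothetical names). Halving the larger dimension reaches exactly max_work_group_size when both the sizes and the limit are powers of two; otherwise it stops at the first valid size below the limit:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical 2D local-size configuration of a generated kernel.
struct local_sizes
{
  std::size_t ls0;
  std::size_t ls1;
  std::size_t work_group_size() const { return ls0 * ls1; }
};

// Project a possibly invalid configuration back onto the valid set by
// halving the larger local size until the work group fits the device limit.
local_sizes project_to_valid(local_sizes cfg, std::size_t max_work_group_size)
{
  assert(cfg.ls0 > 0 && cfg.ls1 > 0 && max_work_group_size > 0);
  while (cfg.work_group_size() > max_work_group_size)
  {
    if (cfg.ls0 >= cfg.ls1)
      cfg.ls0 /= 2;
    else
      cfg.ls1 /= 2;
  }
  return cfg;
}

// Round a 1D local size down to a multiple of the vendor's preferred
// granularity (e.g. 32 for NVIDIA, 64 for AMD). Sizes already smaller than
// one such unit are left untouched rather than rounded up past the limit.
std::size_t round_to_simd_width(std::size_t local_size, std::size_t simd_width)
{
  assert(simd_width > 0);
  std::size_t rounded = (local_size / simd_width) * simd_width;
  return rounded > 0 ? rounded : local_size;
}
```

For example, a 32x32 configuration on a device with max_work_group_size = 256 is projected to 16x16.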

This somewhat boils down to the question of whether we can provide a 
fallback mechanism that is more robust than simply taking a conservative 
work group size of 64 or 128.



>
>         As a general rule, when the slow default profile is
>         used, should we output a warning?
>
>
>     We should provide a diagnostics function to the user, yes. We should
>     not dump anything to stdout without the user explicitly asking for it.
>
>
> The most convenient option is probably to introduce a flag
> VIENNACL_DEBUG_GENERATOR.

You probably want to have more fine-grained control over the individual 
components. For example, to only warn about the fallbacks, you might 
want to use something similar to
   VIENNACL_DEBUG_GENERATOR_WARN_ON_FALLBACK

I can well imagine that users will want to query this at runtime, so
it's probably better to provide a function returning true/false, or even
more detailed information about the profile used, rather than requiring
recompilation with output dumped to a terminal.
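
Such a runtime query could return something along the lines of the sketch below. The `profile_source` enum, the `profile_info` struct and `describe` are hypothetical names for illustration only; nothing is printed unless the user asks for the description:

```cpp
#include <cassert>
#include <string>

// Hypothetical classification of where the chosen profile came from.
enum class profile_source
{
  exact_match,     // tuned for this very device
  vendor_default,  // generic profile for the device's vendor
  fallback         // conservative last-resort profile
};

// Hypothetical result of a runtime query about the profile in use.
struct profile_info
{
  std::string device_name;   // device the kernel was generated for
  std::string profile_name;  // profile actually used
  profile_source source;

  // True whenever the profile was not tuned for this exact device.
  bool is_fallback() const { return source != profile_source::exact_match; }
};

// Human-readable summary, produced only on request.
std::string describe(profile_info const & info)
{
  switch (info.source)
  {
    case profile_source::exact_match:
      return info.device_name + ": tuned profile '" + info.profile_name + "'";
    case profile_source::vendor_default:
      return info.device_name + ": vendor default '" + info.profile_name + "'";
    case profile_source::fallback:
      return info.device_name + ": conservative fallback '" + info.profile_name + "'";
  }
  assert(false && "unhandled profile_source");
  return {};
}
```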

Best regards,
Karli


_______________________________________________
ViennaCL-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/viennacl-devel
