Hey,
> So that if there is a profile for the gtx520 and one for the gtx580,
> then the gtx530 will pick the former and the gtx570 will pick the
> latter. There is a pitfall with this approach, though, since it won't
> handle how close devices from different generations are. (I.e., a
> gtx470 may be similar to a gtx550?)
>
> What if 'closer' always picks the weaker GPU?
>
> Well. This is indeed something tricky. Should a gtx560 fall back on a
> gtx580 or a gtx540... what about a gtx660... Well, I would argue
> that, considering what a fallback should be, we wouldn't want to be
> too greedy about finding the best fallback possible.

By having a close look at
http://en.wikipedia.org/wiki/List_of_Nvidia_graphics_processing_units
http://en.wikipedia.org/wiki/List_of_AMD_graphics_processing_units
we should be able to define a reasonable fallback for each GPU. Similar
strategies can be applied to CPUs. There is no maintenance work
associated with that other than updating it about once a year for the
latest hardware releases.

> GPU-rebranding is indeed annoying here, but I think it can be
> addressed in the same way vendors relabel it: We just take the
> tuning profile for device 'a' and rebrand it with name 'b'. ;-)
>
> That's the best way to handle it indeed! We could rebrand a new
> version of viennacl which just adds support for the rebranded
> devices. Kidding... :-p

Actually, that's not that unreasonable. Vendors are doing the same with
their libraries (drivers) whenever some new hardware hits the market.
So, is there some easy (non-static) way to forward/inherit the profile
for device 'a' to device 'b'?

> Does this affect kernels other than GEMM? Either way, this needs
> some kind of hierarchical fallback, which, in the simplest case, is
> just falling back to a super-conservative profile.
>
> Yes, even the simplest vector addition template can be made invalid:
> just feed it a local size that is too big and it'll die like a
> goldfish.

Well, it 'can' die in theory, but keep in mind that we've been using a
local size of 128 for years now and there haven't been any significant
issues with this so far.

> I will experiment with another solution to handle this, since I'll
> need to try this for evolutionary tuning anyway. I'll come up with a
> mechanism to project the profile back onto the space of valid
> configurations. For example, if work_group_size > max_work_group_size,
> change the local sizes so that we obtain
> work_group_size = max_work_group_size. For some criteria it gets more
> tricky, but it should give us a better device-adaptive fallback. It
> is actually the only viable solution, since the OpenCL standard only
> guarantees work_group_size >= 1, and for GPUs we certainly want to
> find a better lower bound. If we use a fallback which uses a work
> group size of 128, we have no guarantee that it will work everywhere
> (even though it probably will). Also, we can do nice things like
> ensuring for NVidia that the local size is a multiple of 32, and for
> AMD that it is a multiple of 64.

This somewhat boils down to the question of whether we can provide a
fallback mechanism that is more robust than simply taking a
conservative work group size of 64 or 128.
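To make that projection idea a bit more concrete, here is a minimal,
self-contained sketch. The helper names (project_1d, project_2d) and
the plain-integer interface are invented for illustration only and are
not ViennaCL API; the device limit corresponds to what
CL_DEVICE_MAX_WORK_GROUP_SIZE reports, and the granularity to the warp
size (32) on NVIDIA or the wavefront size (64) on AMD:

// Minimal sketch (not ViennaCL's actual API): project a requested
// work-group configuration back onto what the device can execute.
#include <algorithm>
#include <cstddef>
#include <iostream>

// 'max_wg' corresponds to CL_DEVICE_MAX_WORK_GROUP_SIZE,
// 'granularity' to the warp size (32, NVIDIA) or wavefront size (64, AMD).
inline std::size_t project_1d(std::size_t requested,
                              std::size_t max_wg,
                              std::size_t granularity)
{
  std::size_t s = std::min(requested, max_wg);
  if (s > granularity)
    s -= s % granularity;            // round down to a warp/wavefront multiple
  return std::max<std::size_t>(s, std::size_t(1));
}

// For 2D profiles (e.g. GEMM): halve the larger dimension until the
// total number of work items fits into the device limit.
inline void project_2d(std::size_t & ls0, std::size_t & ls1, std::size_t max_wg)
{
  while (ls0 * ls1 > max_wg && (ls0 > 1 || ls1 > 1))
  {
    if (ls0 >= ls1) ls0 /= 2;
    else            ls1 /= 2;
  }
}

int main()
{
  std::cout << project_1d(128, 256, 32) << std::endl;  // 128: already valid
  std::cout << project_1d(200,  96, 64) << std::endl;  // 64: clamped to 96, rounded down

  std::size_t ls0 = 32, ls1 = 32;                      // 1024 work items requested
  project_2d(ls0, ls1, 256);                           // device limit of 256
  std::cout << ls0 << " x " << ls1 << std::endl;       // 16 x 16
  return 0;
}

A real 2D GEMM profile would additionally have to respect the
per-dimension limits from CL_DEVICE_MAX_WORK_ITEM_SIZES as well as
register and local-memory constraints, but the clamping logic stays the
same in spirit.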
> As a general rule, when the slow default profile is used, should we
> output a warning?
>
> We should provide a diagnostics function to the user, yes. We should
> not dump anything to stdout without the user explicitly asking for
> it.
>
> The most convenient is probably to introduce a flag
> VIENNACL_DEBUG_GENERATOR.

You probably want to have more fine-grained control over the individual
components. For example, to only warn about the fallbacks, you might
want to use something similar to
VIENNACL_DEBUG_GENERATOR_WARN_ON_FALLBACK.

I can well imagine that this is something the user wants to query at
runtime, so it is probably better to provide a function returning
true/false (or even more detailed information about the profile in use)
rather than requiring a recompilation with dumps to a terminal.
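Purely as an illustration of that runtime-query idea, the usage could
look roughly like the sketch below. All names in it (profile_info,
last_profile_info) are invented for this sketch and do not exist in
ViennaCL:

// Illustrative only: a runtime query instead of a compile-time debug define.
#include <iostream>
#include <string>

struct profile_info
{
  bool        is_fallback;     // true if no tuned profile matched the device
  std::string device_name;     // device the profile was selected for
  std::string profile_source;  // e.g. "tuned: GTX 580" or "generic fallback"
};

// Stub: in a real implementation the generator would record this
// whenever it selects a profile for a kernel launch.
profile_info const & last_profile_info()
{
  static profile_info info = { true, "GeForce GTX 560", "generic fallback" };
  return info;
}

int main()
{
  profile_info const & info = last_profile_info();

  if (info.is_fallback)
    std::cout << "Warning: " << info.profile_source << " in use for "
              << info.device_name << " - performance may be suboptimal."
              << std::endl;
  return 0;
}

The advantage over a compile-time define is that an application can
decide at runtime whether to warn, to log, or to switch to a different
code path.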
Best regards,
Karli