Hello everybody!

So, it seems that padding sizes really do matter, and IMHO making
ViennaCL truly performance-portable will at some point require
hardware-adaptive padding sizes.

For now, the generator only handles NonTrans x Trans multiplication, but I
can already report the following:

----------------------------------------------------
-> Fermi:
SGEMM - Optimal padding size is 64. About 10-15% performance loss if using
a padding size of 48 or 96.
DGEMM - Not tested yet.

-> Hawaii:
SGEMM - Optimal padding size is 96. About 10-15% performance loss if using
a padding size of 64 or 128.
DGEMM - Optimal padding size is 48. About 10-15% performance loss if using
a padding size of 64 or 128.
More importantly, AMD GPUs usually have horrible performance on some sizes
(4096, 4608, etc.) where bank conflicts occur and performance drops by 5x
to 10x. Using a padding size of 48/96 not only gives better peak
performance, but also circumvents this weird issue.
---------------------------------------------------

So, it seems that not choosing the correct padding size costs 10-15% in
performance (and sometimes much more on AMD hardware). On Hawaii, this means
getting (wowow, exclusive report on ViennaCL 1.6's performance on Hawaii!)
3.8 TFLOP/s instead of 4.2 TFLOP/s, and I think this difference is
significant enough to be worth dealing with.

I'm not sure, however, how we should deal with this issue. Since kernels
are compiled at the context level and since we plan to use one device per
context, what do you think about handling the matrix padding size within
the context instead of within the matrix?

I think we shouldn't expose it to the user, though, since the kernels have
to be entirely compatible with the padding size (and we don't want the user
to break everything!). What do you think about querying the generator for
the optimal padding size of the current device at context initialization?
And if a context has multiple devices with incompatible padding sizes, how
should we handle it? Display a low-performance warning and use a crappy
fallback kernel?

Best regards,
Philippe
_______________________________________________
ViennaCL-devel mailing list
ViennaCL-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/viennacl-devel
