Hi,

> Good news: the GEMM calls for OpenCL on dense non-proxy matrices now
> call the generator! It's a good step towards performance portability.

Hurray, indeed it is! Well done! :-)
Now that you have fixed some things in the autotuner, I could also give
it another shot on the MIC. Does the autotuner already print all the
interesting device information?

Oh, and I could also go for a K20X. Ready? ;-)


> For now, single precision : (...)
>
> All row-major.

This does not yet include the 'trick' of using the transposed case for
column-major, does it?
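
A minimal sketch of that trick, assuming it refers to the usual identity
C^T = B^T * A^T: a column-major buffer read as row-major is the
transposed matrix, so a row-major kernel can serve the column-major case
by swapping the operands and the dimensions. The names gemm_row_major
and gemm_col_major are made up for illustration and are not ViennaCL
functions; the naive triple loop only stands in for a tuned kernel.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Plain row-major GEMM: C (m x n) = A (m x k) * B (k x n).
// Hypothetical stand-in for a tuned row-major kernel.
void gemm_row_major(std::size_t m, std::size_t n, std::size_t k,
                    const std::vector<float>& A,
                    const std::vector<float>& B,
                    std::vector<float>& C)
{
  assert(A.size() == m * k && B.size() == k * n && C.size() == m * n);
  for (std::size_t i = 0; i < m; ++i)
    for (std::size_t j = 0; j < n; ++j) {
      float acc = 0.0f;
      for (std::size_t l = 0; l < k; ++l)
        acc += A[i * k + l] * B[l * n + j];
      C[i * n + j] = acc;
    }
}

// Column-major GEMM expressed through the row-major kernel.
// Read as row-major, the column-major buffers hold A^T (k x m),
// B^T (n x k) and C^T (n x m). Since C^T = B^T * A^T, it is enough
// to swap the operands and exchange m and n.
void gemm_col_major(std::size_t m, std::size_t n, std::size_t k,
                    const std::vector<float>& A,  // m x k, column-major
                    const std::vector<float>& B,  // k x n, column-major
                    std::vector<float>& C)        // m x n, column-major
{
  gemm_row_major(n, m, k, B, A, C);
}
```

No data is copied or transposed here; only the argument order changes,
so the column-major case would inherit whatever profile the row-major
kernel has.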


> Bad news: peaky performance:
> There's no missing digit for the "2048" case. Dealing with it is fairly
> complicated, since it involves having different profiles for different
> sizes. Since it seems to only affect AMD hardware, I think we should
> just warn about the issue...

We should document this, yes. However, I don't know whether we can
reliably warn about this. It may even be driver- or OS-dependent.


> However, this is the size used by default in the blas3_bench, which made
> me freak out.

Ouch... :-/


> What would you guys think about a more "graphical" (i.e. either a plot
> or a list, like above) benchmark, so that people who really care a lot
> about performance can have an idea of when to use what?

Printing a list of performance figures for different sizes would be
nice, yes. Feel free to just print the output above, but better truncate
it near 1500 so that it does not crash (or take too long) on weaker
hardware.
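
A rough sketch of such a size sweep, under the assumption that square
single-precision products are timed and the sweep stops at a
caller-supplied cap (for example 1500). Everything here is hypothetical:
naive_gemm is only a placeholder for the real kernel call, and
measure_gflops and benchmark_sizes are illustrative names, not part of
blas3_bench.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <utility>
#include <vector>

// Placeholder for the kernel under test: naive row-major n x n GEMM.
static void naive_gemm(std::size_t n, const std::vector<float>& A,
                       const std::vector<float>& B, std::vector<float>& C)
{
  for (std::size_t i = 0; i < n; ++i)
    for (std::size_t l = 0; l < n; ++l)
      for (std::size_t j = 0; j < n; ++j)
        C[i * n + j] += A[i * n + l] * B[l * n + j];
}

// Times one n x n product and returns the rate in GFLOP/s.
double measure_gflops(std::size_t n)
{
  std::vector<float> A(n * n, 1.0f), B(n * n, 1.0f), C(n * n, 0.0f);
  auto start = std::chrono::steady_clock::now();
  naive_gemm(n, A, B, C);
  auto stop = std::chrono::steady_clock::now();
  volatile float sink = C[0];  // keep the result alive for the optimizer
  (void)sink;
  double seconds = std::chrono::duration<double>(stop - start).count();
  if (seconds <= 0.0) return 0.0;  // timer resolution too coarse
  return 2.0 * n * n * n / seconds * 1e-9;
}

// Prints one line per size, from step up to max_size inclusive.
std::vector<std::pair<std::size_t, double>>
benchmark_sizes(std::size_t step, std::size_t max_size)
{
  assert(step > 0);
  std::vector<std::pair<std::size_t, double>> results;
  for (std::size_t n = step; n <= max_size; n += step) {
    double gflops = measure_gflops(n);
    std::printf("%6zu  %10.2f GFLOP/s\n", n, gflops);
    results.emplace_back(n, gflops);
  }
  return results;
}
```

Calling benchmark_sizes(128, 1536) would print a list like the one
above while staying near the proposed cutoff; a single run per size is
noisy, so a real benchmark would likely average several runs.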


> *Examples:*
> -> I have a matrix of internal size 384*384 => In C=A*B, I'd prefer
> to either order A as column-major or to use C=trans(A)*B where A is
> row-major instead (on my specific hardware).
> -> If I want some performance increase for big problems, I'd rather go
> for C=A*B in all row-major.
>
> What do you think about that?

We should also have a section in the manual about this, including some
of our (i.e. particularly your) experiences.

Best regards,
Karli


_______________________________________________
ViennaCL-devel mailing list
ViennaCL-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/viennacl-devel
