On 21 Oct 2010, at 03:28, Simon Urbanek wrote:

> It's not vague at all, it's MacPro4,1 and MacPro5,1 models (you can use 
> "sysctl hw.model" to find out what you have). If in doubt, check on Wikipedia 
> ;)
> 
> The latter uses the Nehalem architecture but I don't have a specimen of those 
> so I can't confirm that the bug still holds true for those.

Not just those ... I'm plagued by the same problem on my Penryn-based 
MacBookPro4,1.  In 64-bit mode, BLAS performance drops to single-core 
levels, whereas in 32-bit mode (i.e. R --arch=i386) it uses both cores.  I 
posted some benchmark results to this list a few weeks ago.
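
To see which case a given session falls into, something along these lines 
should report the architecture of the running R build and the hardware model.  
It is only a sketch: the variable names are arbitrary, and the sysctl call is 
the same query as above with -n added so that only the value is printed.

```r
# Architecture of the running R build, as recorded at build time.
arch <- R.version$arch
# Pointer width in bits: 64 for a 64-bit build, 32 for a 32-bit build.
bits <- 8 * .Machine$sizeof.pointer

# Hardware model; only queried on Mac OS X (Darwin), otherwise left as NA.
model <- NA_character_
if (Sys.info()[["sysname"]] == "Darwin") {
  model <- system2("sysctl", c("-n", "hw.model"), stdout = TRUE)
}

cat("R arch:  ", arch, "\n")
cat("pointers:", bits, "bit\n")
cat("hw.model:", model, "\n")
```

Running it once under plain R and once under R --arch=i386 shows which build 
each set of timings belongs to.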

My solution has also been to switch to the reference BLAS, which outperforms 
vecLib on most of the operations I benchmarked, except for crossprod(), which 
is terribly slow (more than 10x slower than tcrossprod()).  I've just tested 
again with R 2.12.0, and the situation has become even worse: now an explicit 
matrix multiplication M %*% t(M) -- which used to be fast -- performs as poorly 
as crossprod().
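
For reference, a comparison of this kind can be sketched in R as follows.  The 
matrix size, the repetition count and the helper time_it() are arbitrary 
choices for illustration, not the settings used in the benchmarks mentioned 
above.

```r
# Timing sketch: crossprod(), tcrossprod() and the explicit products.
set.seed(42)
n <- 500                                  # assumed size; increase for clearer differences
M <- matrix(rnorm(n * n), nrow = n)

# Hypothetical helper: elapsed seconds for `reps` calls of a zero-argument function.
time_it <- function(f, reps = 5) {
  system.time(for (i in seq_len(reps)) f())[["elapsed"]]
}

timings <- c(
  "crossprod(M)"  = time_it(function() crossprod(M)),    # computes t(M) %*% M
  "tcrossprod(M)" = time_it(function() tcrossprod(M)),   # computes M %*% t(M)
  "M %*% t(M)"    = time_it(function() M %*% t(M)),
  "t(M) %*% M"    = time_it(function() t(M) %*% M)
)
print(timings)

# Sanity check: the specialised functions agree with the explicit products.
stopifnot(isTRUE(all.equal(crossprod(M), t(M) %*% M)),
          isTRUE(all.equal(tcrossprod(M), M %*% t(M))))
```

Timings depend on the BLAS in use and on the matrix size, so the numbers are 
only comparable within one machine and one R build.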

Any ideas about this?  The crossprod() slowdown isn't a Mac problem: I got 
similar results on a Pentium Dual Core laptop running Ubuntu.  If this is a 
known problem of the reference BLAS, is there any way to work around it?

Apart from the speed hiccups, in my benchmarks vecLib BLAS performed 
consistently slower than the reference BLAS.  Is there evidence from other 
benchmarks / hardware architectures that vecLib can be faster?  If not, perhaps 
the default should be _not_ to use vecLib on Mac?  Or perhaps it would be 
possible to autodetect hardware in the R startup wrapper and select the BLAS 
that's known to run faster on this setup?

Best wishes,
Stefan

_______________________________________________
R-SIG-Mac mailing list
R-SIG-Mac@stat.math.ethz.ch
https://stat.ethz.ch/mailman/listinfo/r-sig-mac
