On Oct 21, 2010, at 7:47 AM, Stefan Evert wrote:

>
> On 21 Oct 2010, at 03:28, Simon Urbanek wrote:
>
>> It's not vague at all, it's MacPro4,1 and MacPro5,1 models (you can use
>> "sysctl hw.model" to find out what you have). If in doubt, check on
>> Wikipedia ;)
>>
>> The latter uses the Nehalem architecture but I don't have a specimen of
>> those so I can't confirm that the bug still holds true for those.
>
> Not just those ... I'm plagued by the same problem on my Penryn-based
> MacBookPro4,1. In 64-bit mode, BLAS performance breaks down to single core
> levels, whereas in 32-bit mode (i.e. R --arch=i386) it uses both cores. I
> posted some benchmark results to this list a few weeks ago.
>

Well, given that it is only a two-thread CPU there is not much you can gain,
so I wouldn't lose any sleep over it. If you have a 16-thread CPU it's a whole
different story ;). For illustration, these are the timings from your
benchmarks (only those that use BLAS) for 64-bit R 2.1...@10.6.4 on a 2.66GHz
MacPro4,1:

test                      R BLAS  vecLib   ATLAS     MKL
inner M %*% t(M) D        19.961   3.470   0.519   0.662
inner tcrossprod D         0.658   1.867   0.243   0.235
inner crossprod t(M) D     9.574   1.849   0.242   0.256
cosine normalised D        0.798   2.009   0.385   0.411
cosine general D           0.770   1.993   0.380   0.352
euclid() D                 2.072   3.271   1.637   1.635
euclid() small D           0.515   0.821   0.421   0.395

As you can see, both MKL and ATLAS outperform vecLib and R BLAS by an order of
magnitude. It's sad, because vecLib used to be fairly well optimized ... (in
fact it is actually some version of ATLAS, which is even stranger ...).

> My solution has also been to switch to the reference BLAS, which outperforms
> vecLib on most of the operations I benchmarked, except for crossprod(), which
> is terribly slow (more than 10x slower than tcrossprod()). I've just tested
> again with R 2.12.0, and the situation has become even worse: now an explicit
> matrix multiplication M %*% t(M) -- which used to be fast -- performs as
> poorly as crossprod().
>
> Any ideas about this? The crossprod() slowdown isn't a Mac problem: I got
> similar results on a Pentium Dual Core laptop running Ubuntu. If this is a
> known problem of the reference BLAS, is there any way to work around it?
>
> Apart from the speed hiccups, in my benchmarks vecLib BLAS performed
> consistently slower than the reference BLAS. Is there evidence from other
> benchmarks / hardware architectures that vecLib can be faster? If not,
> perhaps the default should be _not_ to use vecLib on Mac? Or perhaps it
> would be possible to autodetect hardware in the R startup wrapper and select
> the BLAS that's known to run faster on this setup?
>

I don't think we would want to do that since that would prevent the user from
choosing the BLAS they want to use. We will probably abandon vecLib as the
default for the next release (more due to its numerical instability issues)
and maybe provide all three options (vecLib, R BLAS, ATLAS) for the user to
choose from in case they have a machine that can take advantage of it.

Cheers,
Simon

_______________________________________________
R-SIG-Mac mailing list
R-SIG-Mac@stat.math.ethz.ch
https://stat.ethz.ch/mailman/listinfo/r-sig-mac
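
For anyone who wants to check how their own BLAS behaves on the three
"inner" operations discussed in this thread, here is a minimal R sketch. It is
not the benchmark script that produced the timings quoted above: the matrix
dimensions, the repetition count and the helper name time_it() are all
assumptions chosen for illustration. It times three equivalent ways of
computing the product of M with its transpose and checks that they agree.

```r
# Illustrative sketch only: sizes, reps and time_it() are assumptions,
# not taken from the benchmark discussed in the thread.
set.seed(42)
n <- 200   # rows of M (hypothetical size; increase for a real test)
k <- 100   # columns of M (hypothetical size)
M <- matrix(rnorm(n * k), nrow = n, ncol = k)

# Median elapsed wall-clock time of f() over `reps` runs.
time_it <- function(f, reps = 5) {
  median(replicate(reps, system.time(f())[["elapsed"]]))
}

# Three equivalent ways to compute M multiplied by its transpose.
timings <- c(
  "M %*% t(M)"      = time_it(function() M %*% t(M)),
  "tcrossprod(M)"   = time_it(function() tcrossprod(M)),
  "crossprod(t(M))" = time_it(function() crossprod(t(M)))
)
print(timings)

# Sanity check: all three variants give the same result.
stopifnot(isTRUE(all.equal(M %*% t(M), tcrossprod(M))))
stopifnot(isTRUE(all.equal(M %*% t(M), crossprod(t(M)))))
```

Running the same script once in the default 64-bit mode and once with
R --arch=i386 should show whether the 32-bit/64-bit difference described
above also appears on a given machine. With matrices this small the timings
will be close to the timer resolution, so n and k would need to be raised
considerably before drawing any conclusions.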