Re: [R-SIG-Mac] How to determine if a Mac is Nehalem-based

Simon Urbanek Thu, 21 Oct 2010 09:36:34 -0700

On Oct 21, 2010, at 7:47 AM, Stefan Evert wrote:

> 
> On 21 Oct 2010, at 03:28, Simon Urbanek wrote:
> 
>> It's not vague at all, it's MacPro4,1 and MacPro5,1 models (you can use 
>> "sysctl hw.model" to find out what you have). If in doubt, check on 
>> Wikipedia ;)
>> 
>> The latter uses the Nehalem architecture but I don't have a specimen of 
>> those so I can't confirm that the bug still holds true for those.
> 
> Not just those ... I'm plagued by the same problem on my Penryn-based 
> MacBookPro4,1.  In 64-bit mode, BLAS performance breaks down to single core 
> levels, whereas in 32-bit mode (i.e. R --arch=i386) it uses both cores.  I 
> posted some benchmark results to this list a few weeks ago.
>
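
For anyone who wants to script the model check quoted above, here is a minimal sketch. It assumes a POSIX shell; the function names classify_model and current_model and the labels they print are made up for illustration, and only the model identifiers named in this thread are listed. Anything else is reported as "unknown" rather than guessed.

```shell
#!/bin/sh
# Classify a model identifier as reported by `sysctl -n hw.model`.
# Hypothetical helper: only the models mentioned in this thread are known.
classify_model() {
    case "$1" in
        MacPro4,1|MacPro5,1) echo "macpro" ;;             # Mac Pro models named in the thread
        MacBookPro4,1)       echo "macbookpro-penryn" ;;  # Penryn-based, per the thread
        *)                   echo "unknown" ;;
    esac
}

# hw.model only exists on Mac OS X; print nothing on other systems.
current_model() {
    sysctl -n hw.model 2>/dev/null || true
}

model="$(current_model)"
echo "model: ${model:-not available} -> $(classify_model "$model")"
```

The -n flag makes sysctl print only the value, without the "hw.model:" prefix, so the result can be passed straight to the case statement.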


Well, given that it is only a two-thread CPU there is not much you can gain, so 
I wouldn't lose sleep over it. If you have a 16-thread CPU it's a whole 
different story ;). For illustration, here are the timings from your 
benchmarks (only those that use BLAS) for 64-bit R 2.1...@10.6.4 on a 2.66GHz 
MacPro4,1:

test                    R BLAS  vecLib  ATLAS   MKL
inner M %*% t(M) D      19.961  3.470   0.519   0.662
inner tcrossprod D      0.658   1.867   0.243   0.235
inner crossprod t(M) D  9.574   1.849   0.242   0.256
cosine normalised D     0.798   2.009   0.385   0.411
cosine general D        0.770   1.993   0.380   0.352
euclid() D              2.072   3.271   1.637   1.635
euclid() small D        0.515   0.821   0.421   0.395

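As you can see, both MKL and ATLAS outperform vecLib and R BLAS, often by a 
wide margin: roughly 5-8x over vecLib on the product and cosine tests, and 
almost 40x over R BLAS for M %*% t(M) and crossprod. It's sad, because vecLib 
used to be fairly well optimized ... (in fact it is itself some version of 
ATLAS, which makes it even stranger ...).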


> My solution has also been to switch to the reference BLAS, which outperforms 
> vecLib on most of the operations I benchmarked, except for crossprod(), which 
> is terribly slow (more than 10x slower than tcrossprod()).  I've just tested 
> again with R 2.12.0, and the situation has become even worse: now an explicit 
> matrix multiplication M %*% t(M) -- which used to be fast -- performs as 
> poorly as crossprod().
> 
> Any ideas about this?  The crossprod() slowdown isn't a Mac problem: I got 
> similar results on a Pentium Dual Core laptop running Ubuntu.  If this is a 
> known problem of the reference BLAS, is there any way to work around it?
> 
> Apart from the speed hiccups, in my benchmarks vecLib BLAS performed 
> consistently slower than the reference BLAS.  Is there evidence from other 
> benchmarks / hardware architectures that vecLib can be faster?  If not, 
> perhaps the default should be _not_ to use vecLib on Mac?  Or perhaps it 
> would be possible to autodetect hardware in the R startup wrapper and select 
> the BLAS that's known to run faster on this setup?
> 

I don't think we would want to do that since that would prevent the user from 
choosing the BLAS they want to use. We will probably abandon vecLib as the 
default for the next release (more due to its numerical instability issues) and 
maybe provide all three options (vecLib, R BLAS, ATLAS) for the user to choose 
from in case they have a machine that can take advantage of it.
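
To make "choosing the BLAS" concrete: as far as I know, the CRAN builds of R for Mac OS X select the BLAS through a libRblas.dylib symlink in the framework's lib directory, as described in the R for Mac OS X FAQ. The sketch below assumes that layout; the default path, the file names in the usage note, and the helper names current_blas and select_blas are assumptions for illustration, so check your own installation before relying on them.

```shell
#!/bin/sh
# Assumed default location of the R framework libraries (CRAN build layout).
R_LIBDIR_DEFAULT="/Library/Frameworks/R.framework/Resources/lib"

# Print the file that libRblas.dylib currently points at, or "unknown".
current_blas() {
    libdir="${1:-$R_LIBDIR_DEFAULT}"
    if [ -L "$libdir/libRblas.dylib" ]; then
        readlink "$libdir/libRblas.dylib"
    else
        echo "unknown"
    fi
}

# Point libRblas.dylib at another BLAS file in the same directory.
# Refuses to touch the link if the requested file does not exist.
select_blas() {
    target="$1"
    libdir="${2:-$R_LIBDIR_DEFAULT}"
    if [ ! -e "$libdir/$target" ]; then
        echo "no such BLAS library: $libdir/$target" >&2
        return 1
    fi
    ln -sf "$target" "$libdir/libRblas.dylib"
}

echo "current BLAS: $(current_blas)"
```

Under these assumptions, "select_blas libRblas.0.dylib" would switch to the R BLAS and "select_blas libRblas.vecLib.dylib" back to vecLib. Writing to the framework directory will likely need administrator rights, and R has to be restarted for the change to take effect.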

Cheers,
Simon

_______________________________________________
R-SIG-Mac mailing list
R-SIG-Mac@stat.math.ethz.ch
https://stat.ethz.ch/mailman/listinfo/r-sig-mac
