Re: [R-SIG-Mac] How to determine if a Mac is Nehalem-based
Hi,

Although I should, I don't follow the right places closely enough (I don't even know where those places are) to know if Apple is aware of this vecLib breakdown issue ... are they? Where can we go to add our voices/votes for them to fix it?

Apple still has their apple.com/science section, so I guess this should be (somehow) important to them ... maybe we can make an R-SIG-Mac 20-person-strong tidal wave to help force their hand? ;-)

-steve

On Thu, Oct 21, 2010 at 12:36 PM, Simon Urbanek simon.urba...@r-project.org wrote:

> On Oct 21, 2010, at 7:47 AM, Stefan Evert wrote:
>
>> On 21 Oct 2010, at 03:28, Simon Urbanek wrote:
>>
>>> It's not vague at all, it's MacPro4,1 and MacPro5,1 models (you can use sysctl hw.model to find out what you have). If in doubt, check on Wikipedia ;) The latter uses the Nehalem architecture but I don't have a specimen of those so I can't confirm that the bug still holds true for those.
>>
>> Not just those ... I'm plagued by the same problem on my Penryn-based MacBookPro4,1. In 64-bit mode, BLAS performance breaks down to single core levels, whereas in 32-bit mode (i.e. R --arch=i386) it uses both cores. I posted some benchmark results to this list a few weeks ago.
>
> Well, given that it is only a two-thread CPU there is not much you can gain, so I wouldn't lose any sleep over it. If you have a 16-thread CPU it's a whole different story ;). For illustration, these are the timings from your benchmarks (only those that use BLAS) for 64-bit R 2.1...@10.6.4 on a 2.66GHz MacPro4,1:
>
> test                     R BLAS   vecLib   ATLAS     MKL
> inner M %*% t(M) D       19.961    3.470   0.519   0.662
> inner tcrossprod D        0.658    1.867   0.243   0.235
> inner crossprod t(M) D    9.574    1.849   0.242   0.256
> cosine normalised D       0.798    2.009   0.385   0.411
> cosine general D          0.770    1.993   0.380   0.352
> euclid() D                2.072    3.271   1.637   1.635
> euclid() small D          0.515    0.821   0.421   0.395
>
> As you can see, both MKL and ATLAS outperform vecLib and R BLAS by an order of magnitude. It's sad, because vecLib used to be fairly well optimized ... (in fact it is actually some version of ATLAS, which is even more strange ...).
>
>> My solution has also been to switch to the reference BLAS, which outperforms vecLib on most of the operations I benchmarked, except for crossprod(), which is terribly slow (more than 10x slower than tcrossprod()). I've just tested again with R 2.12.0, and the situation has become even worse: now an explicit matrix multiplication M %*% t(M) -- which used to be fast -- performs as poorly as crossprod(). Any ideas about this?
>>
>> The crossprod() slowdown isn't a Mac problem: I got similar results on a Pentium Dual Core laptop running Ubuntu. If this is a known problem of the reference BLAS, is there any way to work around it?
>>
>> Apart from the speed hiccups, in my benchmarks vecLib BLAS performed consistently slower than the reference BLAS. Is there evidence from other benchmarks / hardware architectures that vecLib can be faster? If not, perhaps the default should be _not_ to use vecLib on Mac? Or perhaps it would be possible to autodetect hardware in the R startup wrapper and select the BLAS that's known to run faster on this setup?
>
> I don't think we would want to do that since that would prevent the user from choosing the BLAS they want to use. We will probably abandon vecLib as the default for the next release (more due to its numerical instability issues) and maybe provide all three options (vecLib, R BLAS, ATLAS) for the user to choose from in case they have a machine that can take advantage of it.
>
> Cheers,
> Simon

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact

___
R-SIG-Mac mailing list
R-SIG-Mac@stat.math.ethz.ch
https://stat.ethz.ch/mailman/listinfo/r-sig-mac
Re: [R-SIG-Mac] How to determine if a Mac is Nehalem-based
On Oct 22, 2010, at 9:58 AM, Steve Lianoglou wrote:

> Hi,
>
> Although I should, I don't follow the right places closely enough (I don't even know where those places are) to know if Apple is aware of this vecLib breakdown issue ... are they? Where can we go to add our voices/votes for them to fix it?

The way to go is to file a concise bug report (simply using, let's say, DGEMM timings) - but I did not have time to do so. If someone else wants to go ahead, let me know and I can pass on the bug # to our contacts at Apple.

> Apple still has their apple.com/science section, so I guess this should be (somehow) important to them ... maybe we can make an R-SIG-Mac 20-person-strong tidal wave to help force their hand? ;-)

Good question - and it's not only R people that are appalled.

Cheers,
Simon

> On Thu, Oct 21, 2010 at 12:36 PM, Simon Urbanek simon.urba...@r-project.org wrote:
>
>> On Oct 21, 2010, at 7:47 AM, Stefan Evert wrote:
>>
>>> On 21 Oct 2010, at 03:28, Simon Urbanek wrote:
>>>
>>>> It's not vague at all, it's MacPro4,1 and MacPro5,1 models (you can use sysctl hw.model to find out what you have). If in doubt, check on Wikipedia ;) The latter uses the Nehalem architecture but I don't have a specimen of those so I can't confirm that the bug still holds true for those.
>>>
>>> Not just those ... I'm plagued by the same problem on my Penryn-based MacBookPro4,1. In 64-bit mode, BLAS performance breaks down to single core levels, whereas in 32-bit mode (i.e. R --arch=i386) it uses both cores. I posted some benchmark results to this list a few weeks ago.
>>
>> Well, given that it is only a two-thread CPU there is not much you can gain, so I wouldn't lose any sleep over it. If you have a 16-thread CPU it's a whole different story ;). For illustration, these are the timings from your benchmarks (only those that use BLAS) for 64-bit R 2.1...@10.6.4 on a 2.66GHz MacPro4,1:
>>
>> test                     R BLAS   vecLib   ATLAS     MKL
>> inner M %*% t(M) D       19.961    3.470   0.519   0.662
>> inner tcrossprod D        0.658    1.867   0.243   0.235
>> inner crossprod t(M) D    9.574    1.849   0.242   0.256
>> cosine normalised D       0.798    2.009   0.385   0.411
>> cosine general D          0.770    1.993   0.380   0.352
>> euclid() D                2.072    3.271   1.637   1.635
>> euclid() small D          0.515    0.821   0.421   0.395
>>
>> As you can see, both MKL and ATLAS outperform vecLib and R BLAS by an order of magnitude. It's sad, because vecLib used to be fairly well optimized ... (in fact it is actually some version of ATLAS, which is even more strange ...).
>>
>>> My solution has also been to switch to the reference BLAS, which outperforms vecLib on most of the operations I benchmarked, except for crossprod(), which is terribly slow (more than 10x slower than tcrossprod()). I've just tested again with R 2.12.0, and the situation has become even worse: now an explicit matrix multiplication M %*% t(M) -- which used to be fast -- performs as poorly as crossprod(). Any ideas about this?
>>>
>>> The crossprod() slowdown isn't a Mac problem: I got similar results on a Pentium Dual Core laptop running Ubuntu. If this is a known problem of the reference BLAS, is there any way to work around it?
>>>
>>> Apart from the speed hiccups, in my benchmarks vecLib BLAS performed consistently slower than the reference BLAS. Is there evidence from other benchmarks / hardware architectures that vecLib can be faster? If not, perhaps the default should be _not_ to use vecLib on Mac? Or perhaps it would be possible to autodetect hardware in the R startup wrapper and select the BLAS that's known to run faster on this setup?
>>
>> I don't think we would want to do that since that would prevent the user from choosing the BLAS they want to use. We will probably abandon vecLib as the default for the next release (more due to its numerical instability issues) and maybe provide all three options (vecLib, R BLAS, ATLAS) for the user to choose from in case they have a machine that can take advantage of it.
>>
>> Cheers,
>> Simon
>
> --
> Steve Lianoglou
> Graduate Student: Computational Systems Biology
>  | Memorial Sloan-Kettering Cancer Center
>  | Weill Medical College of Cornell University
> Contact Info: http://cbio.mskcc.org/~lianos/contact
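A concise report of the kind Simon describes could be as small as one timed matrix product per architecture. The sketch below is only an assumption about what such a script might look like, not something posted in the thread: the matrix size, the variable `R_EXPR` and the function `run_timings` are made up for illustration, `--arch=i386` is the flag Stefan mentions, and `x86_64` is assumed to be the name of the 64-bit sub-architecture in the installed R build. The product of two dense double matrices is what R would normally hand to the BLAS DGEMM routine.

```shell
# R expression that times one dense matrix product.
# The size n = 1000 is an arbitrary choice for illustration.
R_EXPR='set.seed(1); n <- 1000; A <- matrix(rnorm(n * n), n); B <- matrix(rnorm(n * n), n); print(system.time(A %*% B))'

# Run the same timing under each sub-architecture of a multi-arch R build.
# Prints a notice instead of failing when R or a sub-architecture is missing.
run_timings() {
    command -v R >/dev/null 2>&1 || { echo "R not found; nothing to run"; return 0; }
    for arch in i386 x86_64; do
        echo "== $arch =="
        R --arch="$arch" --vanilla -q -e "$R_EXPR" || echo "sub-architecture $arch not available"
    done
}

run_timings
```

If the 64-bit run takes several times longer than the 32-bit run on the same machine, the two `system.time` lines together with the output of `sysctl hw.model` would probably be enough for a report.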
Re: [R-SIG-Mac] How to determine if a Mac is Nehalem-based
On Oct 21, 2010, at 7:47 AM, Stefan Evert wrote:

> On 21 Oct 2010, at 03:28, Simon Urbanek wrote:
>
>> It's not vague at all, it's MacPro4,1 and MacPro5,1 models (you can use sysctl hw.model to find out what you have). If in doubt, check on Wikipedia ;) The latter uses the Nehalem architecture but I don't have a specimen of those so I can't confirm that the bug still holds true for those.
>
> Not just those ... I'm plagued by the same problem on my Penryn-based MacBookPro4,1. In 64-bit mode, BLAS performance breaks down to single core levels, whereas in 32-bit mode (i.e. R --arch=i386) it uses both cores. I posted some benchmark results to this list a few weeks ago.

Well, given that it is only a two-thread CPU there is not much you can gain, so I wouldn't lose any sleep over it. If you have a 16-thread CPU it's a whole different story ;). For illustration, these are the timings from your benchmarks (only those that use BLAS) for 64-bit R 2.1...@10.6.4 on a 2.66GHz MacPro4,1:

test                     R BLAS   vecLib   ATLAS     MKL
inner M %*% t(M) D       19.961    3.470   0.519   0.662
inner tcrossprod D        0.658    1.867   0.243   0.235
inner crossprod t(M) D    9.574    1.849   0.242   0.256
cosine normalised D       0.798    2.009   0.385   0.411
cosine general D          0.770    1.993   0.380   0.352
euclid() D                2.072    3.271   1.637   1.635
euclid() small D          0.515    0.821   0.421   0.395

As you can see, both MKL and ATLAS outperform vecLib and R BLAS by an order of magnitude. It's sad, because vecLib used to be fairly well optimized ... (in fact it is actually some version of ATLAS, which is even more strange ...).

> My solution has also been to switch to the reference BLAS, which outperforms vecLib on most of the operations I benchmarked, except for crossprod(), which is terribly slow (more than 10x slower than tcrossprod()). I've just tested again with R 2.12.0, and the situation has become even worse: now an explicit matrix multiplication M %*% t(M) -- which used to be fast -- performs as poorly as crossprod(). Any ideas about this?
>
> The crossprod() slowdown isn't a Mac problem: I got similar results on a Pentium Dual Core laptop running Ubuntu. If this is a known problem of the reference BLAS, is there any way to work around it?
>
> Apart from the speed hiccups, in my benchmarks vecLib BLAS performed consistently slower than the reference BLAS. Is there evidence from other benchmarks / hardware architectures that vecLib can be faster? If not, perhaps the default should be _not_ to use vecLib on Mac? Or perhaps it would be possible to autodetect hardware in the R startup wrapper and select the BLAS that's known to run faster on this setup?

I don't think we would want to do that since that would prevent the user from choosing the BLAS they want to use. We will probably abandon vecLib as the default for the next release (more due to its numerical instability issues) and maybe provide all three options (vecLib, R BLAS, ATLAS) for the user to choose from in case they have a machine that can take advantage of it.

Cheers,
Simon
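To see how large the gaps in the quoted timings are, the ratios can be worked out directly. The following is a small sketch, not part of the original exchange: the variable `timings` and the function `slowdown_vs_atlas` are hypothetical names, and the numbers are copied from the R BLAS, vecLib and ATLAS columns of Simon's message.

```shell
# Timings copied from the message above, one test per line:
# test|R BLAS|vecLib|ATLAS|MKL
timings='inner M %*% t(M)|19.961|3.470|0.519|0.662
inner tcrossprod|0.658|1.867|0.243|0.235
inner crossprod t(M)|9.574|1.849|0.242|0.256
cosine normalised|0.798|2.009|0.385|0.411
cosine general|0.770|1.993|0.380|0.352
euclid()|2.072|3.271|1.637|1.635
euclid() small|0.515|0.821|0.421|0.395'

# For each test, print how many times longer R BLAS and vecLib take than ATLAS.
slowdown_vs_atlas() {
    printf '%s\n' "$timings" |
    awk -F'|' '{ printf "%-22s R BLAS %5.1fx  vecLib %5.1fx\n", $1, $2 / $4, $3 / $4 }'
}

slowdown_vs_atlas
```

On these numbers the gap varies a lot by test: for M %*% t(M), R BLAS takes roughly 38 times as long as ATLAS and vecLib roughly 7 times as long, while for the euclid() tests the factor is about 2 or less.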
[R-SIG-Mac] How to determine if a Mac is Nehalem-based
Hi Mac Gurus,

I have been searching the web trying to find out how to determine if my Mac is Nehalem-based. I have not been able to find any discussion about how to determine this, beyond something such as "Macs built in early 2009" or similarly vague statements.

Does anyone know where I can find a description of which Intel chips in a Mac constitute "Nehalem-based", and what R or Mac OS scripts, commands etc. will yield information on the CPU chip in my Mac so I can make this determination?

From "About This Mac" I get the following information, but I'm unclear on what, if anything, in such output will reliably show whether my Mac is Nehalem-based.

Hardware Overview:

  Model Name: Mac Pro
  Model Identifier: MacPro4,1
  Processor Name: Quad-Core Intel Xeon
  Processor Speed: 2.66 GHz
  Number Of Processors: 2
  Total Number Of Cores: 8
  L2 Cache (per core): 256 KB
  L3 Cache (per processor): 8 MB
  Memory: 32 GB
  Processor Interconnect Speed: 6.4 GT/s
  Boot ROM Version: MP41.0081.B08
  SMC Version (system): 1.39f5
  SMC Version (processor tray): 1.39f5
  Serial Number (system): H00230C320H
  Serial Number (processor tray): C070183004XDCVHAX
  Hardware UUID: 482F1E2A-2588-5FA3-80AE-11BE11615B02

Any information appreciated

Steven McKinney
Statistician
Molecular Oncology and Breast Cancer Program
British Columbia Cancer Research Centre
Re: [R-SIG-Mac] How to determine if a Mac is Nehalem-based
On Oct 20, 2010, at 5:55 PM, Steven McKinney wrote:

> Hi Mac Gurus,
>
> I have been searching the web trying to find out how to determine if my Mac is Nehalem-based. I have not been able to find any discussion about how to determine this, beyond something such as "Macs built in early 2009" or similarly vague statements.

It's not vague at all, it's MacPro4,1 and MacPro5,1 models (you can use sysctl hw.model to find out what you have). If in doubt, check on Wikipedia ;) The latter uses the Nehalem architecture but I don't have a specimen of those so I can't confirm that the bug still holds true for those.

Cheers,
Simon

> Does anyone know where I can find a description of which Intel chips in a Mac constitute "Nehalem-based", and what R or Mac OS scripts, commands etc. will yield information on the CPU chip in my Mac so I can make this determination?
>
> From "About This Mac" I get the following information, but I'm unclear on what, if anything, in such output will reliably show whether my Mac is Nehalem-based.
>
> Hardware Overview:
>
>   Model Name: Mac Pro
>   Model Identifier: MacPro4,1
>   Processor Name: Quad-Core Intel Xeon
>   Processor Speed: 2.66 GHz
>   Number Of Processors: 2
>   Total Number Of Cores: 8
>   L2 Cache (per core): 256 KB
>   L3 Cache (per processor): 8 MB
>   Memory: 32 GB
>   Processor Interconnect Speed: 6.4 GT/s
>   Boot ROM Version: MP41.0081.B08
>   SMC Version (system): 1.39f5
>   SMC Version (processor tray): 1.39f5
>   Serial Number (system): H00230C320H
>   Serial Number (processor tray): C070183004XDCVHAX
>   Hardware UUID: 482F1E2A-2588-5FA3-80AE-11BE11615B02
>
> Any information appreciated
>
> Steven McKinney
> Statistician
> Molecular Oncology and Breast Cancer Program
> British Columbia Cancer Research Centre
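Simon's check can be scripted. The sketch below assumes a macOS shell where `sysctl -n hw.model` prints only the model identifier; the function names `current_model` and `classify_mac_model` are hypothetical, and the only identifiers it recognises are the two Mac Pro models named in the reply above, so it is not a general Nehalem detector.

```shell
# Print the model identifier, e.g. "MacPro4,1".
# On systems without the hw.model key this prints nothing.
current_model() {
    sysctl -n hw.model 2>/dev/null || true
}

# Compare an identifier against the two models named in this thread.
classify_mac_model() {
    case "$1" in
        MacPro4,1|MacPro5,1) echo "Nehalem-based Mac Pro (per this thread)" ;;
        "")                  echo "unknown (no model identifier available)" ;;
        *)                   echo "not one of the models named in this thread" ;;
    esac
}

classify_mac_model "$(current_model)"
```

On the machine described in the original question, where the Model Identifier is MacPro4,1, the first branch would apply.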
Re: [R-SIG-Mac] How to determine if a Mac is Nehalem-based
Wikipedia didn't tell me about sysctl hw.model, and I had to correct their misspelled Gainestown (Gainstown on Intel sites, and in log reports on my Mac).

I prefer to check on Simon :)

Thank you very much for your clear explanations.

Steven McKinney

From: Simon Urbanek [simon.urba...@r-project.org]
Sent: October 20, 2010 6:28 PM
To: Steven McKinney
Cc: R-SIG-Mac@stat.math.ethz.ch
Subject: Re: [R-SIG-Mac] How to determine if a Mac is Nehalem-based

On Oct 20, 2010, at 5:55 PM, Steven McKinney wrote:

> Hi Mac Gurus,
>
> I have been searching the web trying to find out how to determine if my Mac is Nehalem-based. I have not been able to find any discussion about how to determine this, beyond something such as "Macs built in early 2009" or similarly vague statements.

It's not vague at all, it's MacPro4,1 and MacPro5,1 models (you can use sysctl hw.model to find out what you have). If in doubt, check on Wikipedia ;) The latter uses the Nehalem architecture but I don't have a specimen of those so I can't confirm that the bug still holds true for those.

Cheers,
Simon

> Does anyone know where I can find a description of which Intel chips in a Mac constitute "Nehalem-based", and what R or Mac OS scripts, commands etc. will yield information on the CPU chip in my Mac so I can make this determination?
>
> From "About This Mac" I get the following information, but I'm unclear on what, if anything, in such output will reliably show whether my Mac is Nehalem-based.
>
> Hardware Overview:
>
>   Model Name: Mac Pro
>   Model Identifier: MacPro4,1
>   Processor Name: Quad-Core Intel Xeon
>   Processor Speed: 2.66 GHz
>   Number Of Processors: 2
>   Total Number Of Cores: 8
>   L2 Cache (per core): 256 KB
>   L3 Cache (per processor): 8 MB
>   Memory: 32 GB
>   Processor Interconnect Speed: 6.4 GT/s
>   Boot ROM Version: MP41.0081.B08
>   SMC Version (system): 1.39f5
>   SMC Version (processor tray): 1.39f5
>   Serial Number (system): H00230C320H
>   Serial Number (processor tray): C070183004XDCVHAX
>   Hardware UUID: 482F1E2A-2588-5FA3-80AE-11BE11615B02
>
> Any information appreciated
>
> Steven McKinney
> Statistician
> Molecular Oncology and Breast Cancer Program
> British Columbia Cancer Research Centre