Re: [R-SIG-Mac] [R-sig-ME] lme4 missing from repositories?
Interesting. When using R 2.12.0 in 32-bit mode, I always get deterministic behavior with the reference BLAS, but random behavior with vecLib. With R 2.11.1, I always get random behavior in 32-bit mode, regardless of which BLAS implementation I choose. Finally, with R in 64-bit mode, behavior is always deterministic (for both R 2.11.1 and R 2.12.0).

The following table summarizes the behavior (for details including sessionInfo() scroll down to end of post):

            R 2.11.1          R 2.12.0
            32-bit  64-bit    32-bit  64-bit
    vecLib  rand    det       rand    det
    RBLAS   rand    det       det     det

All of this has been tested on a MacPro4,1 (Quad-Core Intel Xeon, 2.26 GHz) and the 2.12.0 behavior confirmed on a MacBookPro5,5 (Intel Core 2 Duo, 2.53 GHz), with lme4_0.999375-35 and Matrix_0.999375-44.

Maybe there were two bugs, and one has been fixed in 2.12.0? Also, could it be that vecLib takes some numerical shortcuts that escalate? In that case, maybe the behavior should be brought to Apple's attention, using bugreport.apple.com.

Finally, should the actual BLAS version used be included in sessionInfo()?

Thanks,
Jochen

On Oct 21, 2010, at 19:15, Marc Schwartz wrote:

Interesting. No matter what I do here, I can't seem to get the test to fail using R's BLAS with clean 32 bit builds. So perhaps it is not just the BLAS, but a combination of R's BLAS and specific hardware, which gets me into a realm of knowledge below the event horizon.

Have there been any repeatable scenarios where vecLib can be used without failure on a particular Mac platform?

Also, I just noted Simon's reply to a different thread on r-sig-mac to Stefan Evert, in which he notes that there may be a change in the default BLAS for OSX to vecLib in the next R release. Of course, now given Prof. Ripley's observations, it will be interesting to see the actual impact in the wild.
Thanks,
Marc

On Oct 21, 2010, at 11:23 AM, Prof Brian Ripley wrote:

Let me point out https://stat.ethz.ch/pipermail/r-sig-mac/2010-July/007608.html

This is not just a BLAS issue: I saw it with both vecLib and the reference BLAS. The lme4 code is doing exactly the same calculation for M2. and M2, but sometimes when it does that calculation the first time in a session it gives a different answer. That makes it really hard to get a handle on, and easy to suppose one has a fix (been there a few times myself).

On Thu, 21 Oct 2010, Marc Schwartz wrote:

On Oct 21, 2010, at 8:47 AM, Federico Calboli wrote:

Mark,

To the extent that it may be helpful here and I can do more if need be, I built 32 bit R 2.12.0 patched on Snow Leopard (10.6.4), using the R BLAS rather than Apple's vecLib. This is on an early 2009 17" MBP with a 2.93 GHz Core 2 Duo (MacBookPro5,2) and 4 GB of RAM.

Based upon Doug's comment in this thread that the issue may be related to the use of Apple's vecLib BLAS, as opposed to R's reference BLAS, I ran some tests.

My config includes:

    --without-blas --without-lapack

just to be sure that the above is the correct invocation, based upon what I found online.

Using this build, with all CRAN packages freshly installed using this build, I ran the example used here with lme4 0.999375-35. I get:

    library(lme4)
    y <- (1:20)*pi; x <- (1:20)^2; group <- gl(2,10)
    M2. <- lmer (y ~ 1 + x + (1 + x | group))
    M2 <- lmer (y ~ x + ( x | group))
    identical(fixef(M2), fixef(M2.))
    [1] TRUE

I then created a function so that I could use replicate() to run this test a larger number of times:

    testlme4 <- function() {
      y <- (1:20)*pi; x <- (1:20)^2; group <- gl(2,10)
      M2. <- lmer (y ~ 1 + x + (1 + x | group))
      M2 <- lmer (y ~ x + ( x | group))
      identical(fixef(M2), fixef(M2.))
    }

    RES <- replicate(1000, testlme4())

    all(RES)
    [1] TRUE

    table(RES)
    RES
    TRUE
    1000

Does the example need to be run a very large number of times to be sure that it does not fail, or is the above a reasonable indication that the use of R's BLAS is a more appropriate default option for R on OSX?

If I am not mistaken (and somebody correct me if wrong), R's BLAS is the default on Windows and Linux (from my recollections on Fedora). Why should OSX be different in that regard?

Thanks for the very informative post. I added R-Mac in my reply to see if someone can come up with a response to your query. It would also be interesting to know if it were possible to switch the OSX R binary to use the R BLAS library.

Also, as an aside to Federico, I use 32 bit R on OSX largely because I have to interact with an Oracle server via RODBC. The only ODBC drivers available for Oracle on OSX are 32 bit and they are not compatible with 64 bit R. It would be rather cumbersome when running reports (via Sweave) to first extract the data in 32 bit R and then switch to 64 bit R to run the reports. I can run it all in a single
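The repeat-and-compare approach used in the replicate() test above can also be tried without lme4. The following is only a minimal sketch under stated assumptions: the function name check_determinism and its arguments are hypothetical, and crossprod() on a random matrix is a stand-in for the lmer() fits, so a clean result here says nothing definite about the lme4 failure itself.

```r
# Stand-in for the lme4 comparison (assumption: crossprod() replaces the
# lmer() fits). Repeat one BLAS-backed calculation on fixed input and count
# how often the result is bit-for-bit identical to the first run.
check_determinism <- function(n_rep = 200, n = 50, seed = 1) {
  set.seed(seed)                        # fixed input, so only the arithmetic can vary
  X <- matrix(rnorm(n * n), n, n)
  reference <- crossprod(X)             # t(X) %*% X, the first result
  same <- replicate(n_rep, identical(crossprod(X), reference))
  # Always report both outcomes, even if one of them never occurs
  table(factor(same, levels = c(FALSE, TRUE)))
}

res <- check_determinism()
print(res)
```

Any count under FALSE would mean that the same input gave different bits on different runs. As Prof. Ripley notes in the thread, the lme4 discrepancy sometimes shows up only on the first calculation in a session, so a check like this is probably best run in a fresh R process as well.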
Re: [R-SIG-Mac] How to determine if a Mac is Nehalem-based
Hi,

Although I should, I don't follow too closely in the places I should (I don't know where those places are) to know if Apple is aware of this vecLib breakdown issue ... are they? Where can we go to add our voices/votes for them to fix it?

Apple still has their apple.com/science section, so I guess this should be (somehow) important to them ... maybe we can make an R-SIG-Mac 20-person-strong tidal wave to help force their hand? ;-)

-steve

On Thu, Oct 21, 2010 at 12:36 PM, Simon Urbanek simon.urba...@r-project.org wrote:

On Oct 21, 2010, at 7:47 AM, Stefan Evert wrote:

On 21 Oct 2010, at 03:28, Simon Urbanek wrote:

It's not vague at all, it's MacPro4,1 and MacPro5,1 models (you can use sysctl hw.model to find out what you have). If in doubt, check on Wikipedia ;) The latter uses the Nehalem architecture but I don't have a specimen of those so I can't confirm that the bug still holds true for those.

Not just those ... I'm plagued by the same problem on my Penryn-based MacBookPro4,1. In 64-bit mode, BLAS performance breaks down to single core levels, whereas in 32-bit mode (i.e. R --arch=i386) it uses both cores. I posted some benchmark results to this list a few weeks ago.

Well, given that it is only a two-thread CPU there is not much you can gain so I wouldn't lose any sleep over it. If you have a 16-thread CPU it's a whole different story ;).

For illustration, those are the timings from your benchmarks (only those that use BLAS) for 64-bit R 2.1...@10.6.4 on a 2.66GHz MacPro4,1:

    test                      R BLAS  vecLib   ATLAS     MKL
    inner M %*% t(M) D        19.961   3.470   0.519   0.662
    inner tcrossprod D         0.658   1.867   0.243   0.235
    inner crossprod t(M) D     9.574   1.849   0.242   0.256
    cosine normalised D        0.798   2.009   0.385   0.411
    cosine general D           0.770   1.993   0.380   0.352
    euclid() D                 2.072   3.271   1.637   1.635
    euclid() small D           0.515   0.821   0.421   0.395

As you can see both MKL and ATLAS outperform vecLib and R BLAS by an order of magnitude. It's sad, because vecLib used to be fairly well optimized ... (in fact it is actually some version of ATLAS which is even more strange ...).

My solution has also been to switch to the reference BLAS, which outperforms vecLib on most of the operations I benchmarked, except for crossprod(), which is terribly slow (more than 10x slower than tcrossprod()). I've just tested again with R 2.12.0, and the situation has become even worse: now an explicit matrix multiplication M %*% t(M) -- which used to be fast -- performs as poorly as crossprod().

Any ideas about this? The crossprod() slowdown isn't a Mac problem: I got similar results on a Pentium Dual Core laptop running Ubuntu. If this is a known problem of the reference BLAS, is there any way to work around it?

Apart from the speed hiccups, in my benchmarks vecLib BLAS performed consistently slower than the reference BLAS. Is there evidence from other benchmarks / hardware architectures that vecLib can be faster? If not, perhaps the default should be _not_ to use vecLib on Mac? Or perhaps it would be possible to autodetect hardware in the R startup wrapper and select the BLAS that's known to run faster on this setup?

I don't think we would want to do that since that would prevent the user from choosing the BLAS they want to use. We will probably abandon vecLib as the default for the next release (more due to its numerical instability issues) and maybe provide all three options (vecLib, R BLAS, ATLAS) for the user to choose from in case they have a machine that can take advantage of it.

Cheers,
Simon

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
| Memorial Sloan-Kettering Cancer Center
| Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact

___
R-SIG-Mac mailing list
R-SIG-Mac@stat.math.ethz.ch
https://stat.ethz.ch/mailman/listinfo/r-sig-mac
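For anyone who wants to repeat the first three benchmark rows quoted above on their own BLAS, the three ways of forming the same cross-product can be timed side by side. This is a rough sketch, not Stefan's benchmark script: the matrix size is an arbitrary assumption, and a single system.time() call per variant is too coarse for a careful comparison.

```r
# Three mathematically equivalent ways to compute M %*% t(M).
# The 400 x 300 size is an assumption chosen to keep the run short.
set.seed(42)
M <- matrix(rnorm(400 * 300), nrow = 400, ncol = 300)

timings <- c(
  explicit   = system.time(A <- M %*% t(M))[["elapsed"]],
  tcrossprod = system.time(B <- tcrossprod(M))[["elapsed"]],
  crossprod  = system.time(C <- crossprod(t(M)))[["elapsed"]]
)
print(timings)

# The variants should agree up to floating-point tolerance, whichever BLAS is used
agree <- isTRUE(all.equal(A, B)) && isTRUE(all.equal(A, C))
print(agree)
```

Running the same script under each BLAS build (and under both --arch settings on a Mac) gives directly comparable numbers, and the all.equal() check guards against comparing timings of results that do not actually match.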
Re: [R-SIG-Mac] How to determine if a Mac is Nehalem-based
On Oct 22, 2010, at 9:58 AM, Steve Lianoglou wrote:

Hi,

Although I should, I don't follow too closely in the places I should (I don't know where those places are) to know if Apple is aware of this vecLib breakdown issue ... are they? Where can we go to add our voices/votes for them to fix it?

The way to go is to file a concise bug report (simply using, let's say, DGEMM timings), but I have not had time to do so. If someone else wants to go ahead, let me know and I can pass on the bug # to our contacts at Apple.

Apple still has their apple.com/science section, so I guess this should be (somehow) important to them ... maybe we can make an R-SIG-Mac 20-person-strong tidal wave to help force their hand? ;-)

Good question - and it's not only R people that are appalled.

Cheers,
Simon

On Thu, Oct 21, 2010 at 12:36 PM, Simon Urbanek simon.urba...@r-project.org wrote:

On Oct 21, 2010, at 7:47 AM, Stefan Evert wrote:

On 21 Oct 2010, at 03:28, Simon Urbanek wrote:

It's not vague at all, it's MacPro4,1 and MacPro5,1 models (you can use sysctl hw.model to find out what you have). If in doubt, check on Wikipedia ;) The latter uses the Nehalem architecture but I don't have a specimen of those so I can't confirm that the bug still holds true for those.

Not just those ... I'm plagued by the same problem on my Penryn-based MacBookPro4,1. In 64-bit mode, BLAS performance breaks down to single core levels, whereas in 32-bit mode (i.e. R --arch=i386) it uses both cores. I posted some benchmark results to this list a few weeks ago.

Well, given that it is only a two-thread CPU there is not much you can gain so I wouldn't lose any sleep over it. If you have a 16-thread CPU it's a whole different story ;).
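Simon's suggestion of a concise bug report based on DGEMM timings could start from something like the sketch below. It is an illustration only: dgemm_report and its arguments are hypothetical names, it assumes that R's %*% on two double matrices is handed to the BLAS DGEMM routine, and the hardware model is read with the sysctl hw.model command mentioned in the thread, which is only attempted on macOS.

```r
# Time a plain double-precision matrix product and collect some context
# for a bug report. Assumption: %*% on double matrices dispatches to DGEMM.
dgemm_report <- function(n = 500, reps = 3) {
  set.seed(1)
  A <- matrix(rnorm(n * n), n, n)
  B <- matrix(rnorm(n * n), n, n)
  # Elapsed wall-clock time of each repetition, in seconds
  elapsed <- vapply(seq_len(reps),
                    function(i) system.time(A %*% B)[["elapsed"]],
                    numeric(1))
  # sysctl hw.model exists on macOS only; elsewhere record NA
  hw_model <- if (Sys.info()[["sysname"]] == "Darwin") {
    system("sysctl hw.model", intern = TRUE)
  } else {
    NA_character_
  }
  list(r_version = R.version.string,
       arch      = R.version$arch,
       hw_model  = hw_model,
       n         = n,
       elapsed   = elapsed)
}

report <- dgemm_report()
str(report)
```

Running this once per BLAS and once per architecture (32-bit and 64-bit) on the same machine would give the small set of numbers a report needs, without involving any R packages.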