Re: [R-SIG-Mac] [R-sig-ME] lme4 missing from repositories?

2010-10-22 Thread jochen laubrock
Interesting. 

When using R 2.12.0 in 32-Bit mode, I always get deterministic behavior with 
the reference BLAS, but random behavior with vecLib. With R 2.11.1, I always 
get random behavior in 32-Bit mode, regardless of which BLAS implementation I 
choose. Finally, with R in 64-Bit mode, behavior is always deterministic (for 
both R 2.11.1 and R 2.12.0).

The following table summarizes the behavior (for details including 
sessionInfo() scroll down to end of post):

          R 2.11.1          R 2.12.0
          32-Bit  64-Bit    32-Bit  64-Bit
vecLib    rand    det       rand    det
RBLAS     rand    det       det     det

All of this has been tested on a MacPro4,1 (Quad-Core Intel Xeon, 2.26 GHz) and 
the 2.12.0 behavior confirmed on a MacBookPro5,5 (Intel Core 2 Duo, 2.53 GHz), 
with lme4_0.999375-35 and Matrix_0.999375-44.

Maybe there were two bugs, and one has been fixed in 2.12.0? Also, could it be 
that vecLib takes some numerical shortcuts that escalate? In that case, maybe 
the behavior should be brought to Apple's attention, using bugreport.apple.com. 
Finally, should the actual BLAS version used be included in sessionInfo()?
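
On the last question, here is a minimal sketch of how a session could report its BLAS. It assumes a much later R release than the 2.11/2.12 versions discussed here: as far as I know, the BLAS element of sessionInfo() and the function La_library() only appeared in the R 3.4 series. blas_info() is a hypothetical helper name; on builds that lack these features it simply returns NA.

```r
# Collect whatever BLAS/LAPACK information the running R exposes.
# Assumption: sessionInfo()$BLAS and La_library() exist only in newer R.
blas_info <- function() {
  si <- sessionInfo()
  blas <- si[["BLAS"]]                      # NULL on older R versions
  list(
    r_version = R.version.string,
    blas      = if (is.null(blas)) NA_character_ else blas,
    lapack    = if (exists("La_library")) La_library() else NA_character_
  )
}

info <- blas_info()
str(info)
```

The reported paths are only as informative as the build: a shared libRblas may itself be a link to another implementation.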

Thanks,
Jochen



On Oct 21, 2010, at 19:15 , Marc Schwartz wrote:

 Interesting. No matter what I do here, I can't seem to get the test to fail 
 using R's BLAS with clean 32 bit builds. So perhaps it is not just the BLAS, 
 but a combination of R's BLAS and specific hardware? That gets me into a 
 realm of knowledge below the event horizon. 
 
 Have there been any repeatable scenarios where vecLib can be used without 
 failure on a particular Mac platform?
 
 Also, I just noted Simon's reply to a different thread on r-sig-mac to Stefan 
 Evert, in which he notes that there may be a change in the default BLAS for 
 OSX to vecLib in the next R release. Of course, now given Prof. Ripley's 
 observations, it will be interesting to see the actual impact in the wild.
 
 Thanks,
 
 Marc
 
 On Oct 21, 2010, at 11:23 AM, Prof Brian Ripley wrote:
 
 Let me point out 
 https://stat.ethz.ch/pipermail/r-sig-mac/2010-July/007608.html
 
 This is not just a BLAS issue: I saw it with both vecLib and the reference 
 BLAS.
 
 The lme4 code is doing exactly the same calculation for M2. and M2, but 
 sometimes when it does that calculation the first time in a session it gives 
 a different answer.  That makes it really hard to get a handle on, and easy 
 to suppose one has a fix (been there a few times myself).
 
 
 On Thu, 21 Oct 2010, Marc Schwartz wrote:
 
 
 On Oct 21, 2010, at 8:47 AM, Federico Calboli wrote:
 
 Mark,
 
 To the extent that it may be helpful here and I can do more if need be, I 
 built 32 bit R 2.12.0 patched on Snow Leopard (10.6.4), using the R BLAS 
 rather than Apple's vecLib. This is on an early 2009 17-inch MBP with a 2.93 
 GHz Core 2 Duo (MacBookPro5,2) and 4 GB of RAM.
 
 Based upon Doug's comment in this thread that the issue may be related to 
 the use of Apple's vecLib BLAS, as opposed to R's reference BLAS, I ran 
 some tests.
 
 My config includes:
 
 --without-blas --without-lapack
 
 just to be sure that the above is the correct invocation, based upon what 
 I found online.
 
 Using this build, with all CRAN packages freshly installed using this 
 build, I ran the example used here with lme4 0.999375-35. I get:
 
 library(lme4)
 y <- (1:20)*pi; x <- (1:20)^2; group <- gl(2,10)
 M2. <- lmer (y ~ 1 + x + (1 + x | group))
 M2 <- lmer (y ~ x + ( x | group))
 
 identical(fixef(M2), fixef(M2.))
 [1] TRUE
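
A side note on the check itself: identical() demands bit-for-bit equality, so even a last-digit rounding difference makes it FALSE. A small sketch of the distinction, using plain doubles rather than lmer fits (the values are illustrative and not taken from the lme4 example):

```r
# Two doubles that differ only by floating-point rounding.
a <- 0.1 + 0.2
b <- 0.3

exact    <- identical(a, b)          # bitwise comparison: FALSE
tolerant <- isTRUE(all.equal(a, b))  # comparison within a tolerance: TRUE
gap      <- abs(a - b)               # size of the discrepancy

print(c(exact = exact, tolerant = tolerant))
print(gap)
```

When a run of the test does fail, something like abs(fixef(M2) - fixef(M2.)) might show whether the two fits differ in the last bit or by something larger.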
 
 
 
 I then created a function so that I could use replicate() to run this 
 test a larger number of times:
 
 testlme4 <- function()
 {
 y <- (1:20)*pi; x <- (1:20)^2; group <- gl(2,10)
 M2. <- lmer (y ~ 1 + x + (1 + x | group))
 M2 <- lmer (y ~ x + ( x | group))
 identical(fixef(M2), fixef(M2.))
 }
 
 
 RES <- replicate(1000, testlme4())
 
 all(RES)
 [1] TRUE
 
 table(RES)
 RES
 TRUE
 1000
 
 Does the example need to be run a very large number of times to be sure 
 that it does not fail, or is the above a reasonable indication that the 
 use of R's BLAS is a more appropriate default option for R on OSX?  If I 
 am not mistaken (and somebody correct me if wrong), R's BLAS is the 
 default on Windows and Linux (from my recollections on Fedora). Why 
 should OSX be different in that regard?
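
On the "how many runs" question, a rough sketch of what 1000 clean runs can and cannot show. It assumes the runs are independent, each failing with the same probability. That assumption is mine, and Prof. Ripley's remark in this thread that the discrepancy tends to appear the first time the calculation is done in a session suggests it may not hold for repeated runs inside one session.

```r
# Upper 95% confidence bound on the per-run failure probability
# after observing zero failures in n independent runs.
n <- 1000

upper_exact <- 1 - 0.05^(1 / n)  # solves (1 - p)^n = 0.05 for p
upper_rule3 <- 3 / n             # the "rule of three" approximation

print(c(exact = upper_exact, rule_of_three = upper_rule3))
```

So 1000 clean runs are compatible with a failure rate of up to roughly 0.3% per run. If the failure depends on session start-up, launching many fresh R processes would probably be a more telling test than one long replicate() call.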
 
 Thanks for the very informative post. I added R-Mac in my reply to see if 
 someone can come up with a response to your query. It would also be 
 interesting to know if it were possible to switch the OSX R binary to use 
 the R BLAS library.
 
 Also, as an aside to Federico, I use 32 bit R on OSX largely because I 
 have to interact with an Oracle server via RODBC. The only ODBC drivers 
 available for Oracle on OSX are 32 bit and they are not compatible with 
 64 bit R. It would be rather cumbersome when running reports (via Sweave) 
 to first extract the data in 32 bit R and then switch to 64 bit R to run 
 the reports. I can run it all in a single 

Re: [R-SIG-Mac] How to determine if a Mac is Nehalem-based

2010-10-22 Thread Steve Lianoglou
Hi,

Although I should, I don't follow too closely in the places I should
(I don't know where those places are) to know if Apple is aware of
this vecLib breakdown issue ... are they? Where can we go to add our
voices/votes for them to fix it?

Apple still has their apple.com/science section, so I guess this
should be (somehow) important to them ... maybe we can make an
R-SIG-Mac 20-person-strong tidal wave to help force their hand? ;-)

-steve


On Thu, Oct 21, 2010 at 12:36 PM, Simon Urbanek
simon.urba...@r-project.org wrote:
 On Oct 21, 2010, at 7:47 AM, Stefan Evert wrote:


 On 21 Oct 2010, at 03:28, Simon Urbanek wrote:

 It's not vague at all, it's MacPro4,1 and MacPro5,1 models (you can use 
 sysctl hw.model to find out what you have). If in doubt, check on 
 Wikipedia ;)
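
The same lookup can be done from within R. A minimal sketch, assuming the sysctl binary is on the PATH; mac_model() is a hypothetical helper name, and it returns NA on anything that is not macOS:

```r
# Query the Mac model identifier (e.g. "MacPro4,1") via sysctl.
# On non-Darwin systems there is no hw.model key, so return NA.
mac_model <- function() {
  if (Sys.info()[["sysname"]] != "Darwin") {
    return(NA_character_)
  }
  # -n prints only the value, without the "hw.model: " prefix
  system2("sysctl", c("-n", "hw.model"), stdout = TRUE)
}

model <- mac_model()
print(model)
```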

 The latter uses the Nehalem architecture but I don't have a specimen of 
 those so I can't confirm that the bug still holds true for those.

 Not just those ... I'm plagued by the same problem on my Penryn-based 
 MacBookPro4,1.  In 64-bit mode, BLAS performance breaks down to single core 
 levels, whereas in 32-bit mode (i.e. R --arch=i386) it uses both cores.  I 
 posted some benchmark results to this list a few weeks ago.


 Well, given that it is only a two-thread CPU there is not much you can gain 
 so I wouldn't lose any sleep over it. If you have a 16-thread CPU it's a whole 
 different story ;). For illustration, those are the timings from your 
 benchmarks (only those that use BLAS) for 64-bit R 2.1...@10.6.4 on a 2.66GHz 
 MacPro4,1:

 test                    R BLAS  vecLib  ATLAS   MKL
 inner M %*% t(M) D      19.961  3.470   0.519   0.662
 inner tcrossprod D      0.658   1.867   0.243   0.235
 inner crossprod t(M) D  9.574   1.849   0.242   0.256
 cosine normalised D     0.798   2.009   0.385   0.411
 cosine general D        0.770   1.993   0.380   0.352
 euclid() D              2.072   3.271   1.637   1.635
 euclid() small D        0.515   0.821   0.421   0.395

 As you can see both MKL and ATLAS outperform vecLib and R BLAS by an order of 
 magnitude. It's sad, because vecLib used to be fairly well optimized ... (in 
 fact it is actually some version of ATLAS which is even more strange ...).
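
To put numbers on this, here is a small sketch that recomputes the ratios from the table above. The timings are copied verbatim; the variable names are mine, and I am assuming the figures are elapsed seconds, which the table does not state.

```r
# Timings from the table, one row per test, one column per BLAS.
timings <- rbind(
  "inner M %*% t(M) D"     = c(19.961, 3.470, 0.519, 0.662),
  "inner tcrossprod D"     = c( 0.658, 1.867, 0.243, 0.235),
  "inner crossprod t(M) D" = c( 9.574, 1.849, 0.242, 0.256),
  "cosine normalised D"    = c( 0.798, 2.009, 0.385, 0.411),
  "cosine general D"       = c( 0.770, 1.993, 0.380, 0.352),
  "euclid() D"             = c( 2.072, 3.271, 1.637, 1.635),
  "euclid() small D"       = c( 0.515, 0.821, 0.421, 0.395)
)
colnames(timings) <- c("R BLAS", "vecLib", "ATLAS", "MKL")

# Slowdown of each BLAS relative to ATLAS, row by row.
slowdown <- round(timings / timings[, "ATLAS"], 1)
print(slowdown)
```

The gap varies a lot by test: the R BLAS to ATLAS ratio ranges from about 1.2 on the euclid() rows to roughly 40 on the two slow matrix products.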


 My solution has also been to switch to the reference BLAS, which outperforms 
 vecLib on most of the operations I benchmarked, except for crossprod(), 
 which is terribly slow (more than 10x slower than tcrossprod()).  I've just 
 tested again with R 2.12.0, and the situation has become even worse: now an 
 explicit matrix multiplication M %*% t(M) -- which used to be fast -- 
 performs as poorly as crossprod().
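
For anyone who wants to repeat the comparison on their own build, a minimal timing sketch of the three equivalent products. The matrix size and seed are arbitrary assumptions of mine and are not the benchmark data referred to above; a matrix this small runs quickly, so the dimensions need to be raised to get meaningful timings.

```r
# Three ways to compute M M^T; all should agree up to rounding.
set.seed(1)
M <- matrix(rnorm(200 * 50), nrow = 200)

t_matmul     <- system.time(A <- M %*% t(M))[["elapsed"]]
t_tcrossprod <- system.time(B <- tcrossprod(M))[["elapsed"]]
t_crossprod  <- system.time(C <- crossprod(t(M)))[["elapsed"]]

# Sanity check before trusting any timing comparison.
same <- isTRUE(all.equal(A, B)) && isTRUE(all.equal(A, C))

print(c(matmul = t_matmul, tcrossprod = t_tcrossprod,
        crossprod_t = t_crossprod))
print(same)
```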

 Any ideas about this?  The crossprod() slowdown isn't a Mac problem: I got 
 similar results on a Pentium Dual Core laptop running Ubuntu.  If this is a 
 known problem of the reference BLAS, is there any way to work around it?

 Apart from the speed hiccups, in my benchmarks vecLib BLAS performed 
 consistently slower than the reference BLAS.  Is there evidence from other 
 benchmarks / hardware architectures that vecLib can be faster?  If not, 
 perhaps the default should be _not_ to use vecLib on Mac?  Or perhaps it 
 would be possible to autodetect hardware in the R startup wrapper and select 
 the BLAS that's known to run faster on this setup?


 I don't think we would want to do that since that would prevent the user from 
 choosing the BLAS they want to use. We will probably abandon vecLib as the 
 default for the next release (more due to its numerical instability issues) 
 and maybe provide all three options (vecLib, R BLAS, ATLAS) for the user to 
 choose from in case they have a machine that can take advantage of it.

 Cheers,
 Simon

 ___
 R-SIG-Mac mailing list
 R-SIG-Mac@stat.math.ethz.ch
 https://stat.ethz.ch/mailman/listinfo/r-sig-mac




-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact



Re: [R-SIG-Mac] How to determine if a Mac is Nehalem-based

2010-10-22 Thread Simon Urbanek

On Oct 22, 2010, at 9:58 AM, Steve Lianoglou wrote:

 Hi,
 
 Although I should, I don't follow too closely in the places I should
 (I don't know where those places are) to know if Apple is aware of
 this vecLib breakdown issue ... are they? Where can we go to add our
 voices/votes for them to fix it?
 

The way to go is to file a concise bug report (simply using let's say DGEMM 
timings) - but I did not have time to do so. If someone else wants to go ahead, 
let me know and I can pass on the bug # to our contacts at Apple.


 Apple still has their apple.com/science section, so I guess this
 should be (somehow) important to them ... maybe we can make an
 R-SIG-Mac 20-person-strong tidal wave to help force their hand? ;-)
 

Good question - and it's not only R people that are appalled.

Cheers,
Simon


 
 
