I have been timing a particular model fit using lmer on several different computers and came up with a peculiar result - the model fit is considerably slower on a dual-core Athlon 64 using Goto's multithreaded BLAS than on a single-core processor.
Here is the timing on a single-core Athlon 64 3000+ running under today's R-devel with version 0.995-5 of the Matrix package. > library(Matrix) > data(star, package = 'mlmRev') > system.time(fm1 <- lmer(math~gr+sx+eth+cltype+(yrs|id)+(1|tch)+(yrs|sch), > star, control = list(nit=0,grad=0,msV=1))) 0 241720.: 1.16440 0.335239 0.00000 1.78732 0.867209 0.382318 0.00000 1 239722.: 1.94952 5.00000e-10 0.00933767 1.65999 0.858003 0.341520 0.00908757 2 239580.: 1.95924 0.0884059 0.00933767 1.65308 0.857487 0.339296 0.00954718 3 239215.: 2.60877 0.0765848 0.0177699 1.45739 0.843562 0.275100 0.0236849 4 239204.: 2.62582 0.106670 0.0239698 1.41976 0.841086 0.261033 0.0267073 5 239176.: 2.63149 0.0787924 0.0367185 1.37952 0.838527 0.245076 0.0301134 6 239141.: 2.64949 0.108534 0.0594586 1.28846 0.832543 0.208404 0.0375456 7 239049.: 2.64794 0.0789214 0.121782 1.10436 0.819711 0.126101 0.0524965 8 239004.: 2.66044 0.117957 0.181505 0.932202 0.798982 0.0718116 0.0628958 9 238944.: 2.66310 0.0819653 0.334477 0.631735 0.740855 0.258359 0.0806590 10 238893.: 2.72626 0.0975205 0.653432 0.703912 0.666067 0.109922 0.201809 11 238892.: 2.74381 0.111146 0.666155 0.693544 0.662000 0.104060 0.207591 12 238888.: 2.75052 0.0990238 0.689199 0.694588 0.655781 0.106516 0.216460 13 238861.: 2.80325 0.126935 1.05912 0.733914 0.556162 0.159296 0.360938 14 238832.: 2.82656 0.117617 1.59471 0.607916 0.441371 0.0749944 0.976142 15 238811.: 2.87350 0.136332 1.59046 0.653141 0.353763 0.226061 1.79285 16 238810.: 2.87663 0.125135 1.58992 0.656808 0.352605 0.220488 1.79282 17 238806.: 2.89342 0.141551 1.58607 0.676523 0.344212 0.181833 1.79268 18 238804.: 2.90080 0.125137 1.56624 0.682921 0.261295 0.180598 1.74325 19 238802.: 2.91950 0.128569 1.56836 0.680436 0.336051 0.159940 1.80400 20 238801.: 2.92795 0.134762 1.56597 0.685121 0.331695 0.145547 1.80414 21 238801.: 2.93741 0.125667 1.56139 0.687827 0.332700 0.138854 1.81495 22 238800.: 2.94588 0.131757 1.55294 0.687909 0.330414 0.137834 1.82826 23 238799.: 2.96867 0.129716 1.52943 0.688678 0.323171 0.139912 1.84615 24 238799.: 2.98994 0.133378 1.52188 0.700038 0.337387 0.131403 1.77623 25 238799.: 3.00312 0.135308 1.51475 0.697550 0.311750 0.145683 1.78037 26 238799.: 3.00461 0.129920 1.51083 0.697666 0.306722 0.138745 1.81188 27 238799.: 3.00504 0.134882 1.50539 0.696745 0.302949 0.135897 1.84405 28 238799.: 3.00422 0.134013 1.47947 0.698115 0.303243 0.133806 1.86486 29 238799.: 3.00819 0.134378 1.48185 0.701871 0.307097 0.134637 1.84996 30 238799.: 3.01313 0.134279 1.49098 0.702883 0.304788 0.133682 1.86254 31 238799.: 3.01291 0.134253 1.49060 0.701818 0.303155 0.133771 1.84613 32 238799.: 3.01142 0.134314 1.48921 0.701782 0.303589 0.134439 1.84653 33 238799.: 3.01174 0.134315 1.48926 0.701641 0.304120 0.134145 1.84635 34 238799.: 3.01175 0.134304 1.48942 0.701740 0.303762 0.134185 1.84649 35 238799.: 3.01173 0.134307 1.48937 0.701724 0.303809 0.134206 1.84647 [1] 43.10 3.78 48.41 0.00 0.00 (If you run the timing yourself and don't want to see the iteration output, take the msV=1 out of the control list. I keep it in there so I can monitor the progress.) If I time the same model fit on a dual-core Athlon 64 X2 3800+ with the same version of R, BLAS and Matrix package, the timing ends up with something like 90 140 235 0 0 I do see that the multithreading is working on a calculation that is essentially BLAS-bound such as > mm <- Matrix(rnorm(1e6), nc = 1e3) > system.time(crossprod(mm)) [1] 0.57 0.02 0.60 0.00 0.00 On the X2 processor it still takes about 0.6 seconds user time but only 0.3 seconds elapsed time when the machine is otherwise idle and both cores are available for the calculation. Any suggestions why the dual-core processor is so much slower than the single core processor? By the way, I would be interested in accumulating timings of this model fit on other systems. If you do time it please send me (off-list) a summary of the version of R, version of the accelerated BLAS if you use them, processor speed and configuration (i.e. multiprocessor, multicore, etc.) and, if you know it, memory speed. This is an example of a complex multilevel model with crossed grouping factors fit to a relatively large (30000 observations on 10000 students, 1400 teachers and 80 schools) data set. ______________________________________________ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel