Luke - Thanks for your comments on the speed patches I submitted. I'm glad you like patch-transpose, patch-for, patch-evalList, and patch-vec-arith. I'll be interested to hear what you or other people think about patch-dollar and patch-vec-subset after you've had more time to consider them. (Recall I later posted a split of patch-vecsubset into patch-vec-subset and a new patch-subscript, fixing in patch-subscript a bug in my original combined patch.)
Regarding patch-square and patch-sum-prod, your changes address the largest inefficiencies in a different way, but I'd like to do some runs comparing your version with mine to see how big the remaining differences are. This is complicated by the fact that your changes to sum and prod seem to trigger a bug in the C compiler I'm using (gcc 4.2.4 for i486-linux-gnu) at optimization level -O2 (the default for the R configuration): sum delivers the right answer, but just as slowly as before. This doesn't happen with -O3. I'll investigate this further and report the conclusion.

Similarly, I'll do some more timing tests on patch-protect, patch-fast-base, patch-fast-spec, and patch-save-alloc, and then comment further on the gains that they produce.

Regarding patch-matprod, the question of what BLAS routines do with NaN and NA seems like one that needs to be resolved, preferably in a way that doesn't slow down vector dot products by a factor of six. However, I don't know what actual problem reports motivated the current costly check for NAs.

This all interacts with the extreme slowness on some machines of arithmetic on LDOUBLEs, which also seems to need some resolution. It's not clear to me what the expectations regarding the accuracy of functions like sum should be. One could certainly argue that users would expect the same accuracy as adding the elements up with "+", and no huge slowdown from trying to get better accuracy. But maybe there's some history here, or packages that depend on the increased accuracy (though of course there's no guarantee that a C "long double" will actually be bigger than a "double").

Regarding patch-parens, I don't understand your reluctance to incorporate this two-line code change. According to my timing tests (medians of five runs), it speeds up the test-parens.r script by 4.5%. (Recall this is "for (i in 1:n) d <- (a+1)/(a*(b+c))".)
This is not a huge amount, but of course the speedup for the actual parenthesis operator is greater, since there is other overhead in this test. It would be even better to make all BUILTIN operators faster (which my patch-evalList does), but there are limits to how much is possible.

The fact that "(" is conceptually a BUILTIN rather than a SPECIAL doesn't seem relevant to me. "{" is also conceptually a BUILTIN. Both are "special" from the point of view of the users, few of whom will even imagine that one could call "(" and "{" as ordinary functions. Even if they can imagine this, it makes no difference, since the patch has no user-visible effects. If one is worried that someone reading the code in src/main/eval.c might become confused about the semantics of parentheses, a two-line comment saying "Parens are conceptually a BUILTIN but are implemented as a SPECIAL to bypass the overhead of creating an evaluated argument list" would avoid any problem.

As an analogy, consider a C compiler generating code for "2*x". One could argue that multiplication is conceptually repeated addition, so converting this to "x+x" is OK, but converting it to "x<<1" would be wrong. But I think no one would argue this. Rather, everyone would expect the compiler to choose between "x+x" and "x<<1" purely on the basis of which one is faster.

Radford

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel