Oops, I was thinking backwards. This sort of hack could avoid the Fortran aliasing rules, not run afoul of them.
Bill Dunlap TIBCO Software Inc - Spotfire Division wdunlap tibco.com > -----Original Message----- > From: r-devel-boun...@r-project.org > [mailto:r-devel-boun...@r-project.org] On Behalf Of William Dunlap > Sent: Monday, March 23, 2009 6:18 PM > To: Wacek Kusnierczyk; r-devel@r-project.org > Subject: Re: [Rd] [R] variance/mean > > Doesn't Fortran still require that the arguments to > a function not alias each other (in whole or in part)? > I could imagine that var() might call into Fortran code > (BLAS or LAPACK). Wouldn you want to chance erroneous > results at a high optimization level to save a bit of > time in an unusual situation? > > (I could also imagine someone changing the R interpreter > so that x and x[-length(x)] could share the same memory > block and that could cause Fortran aliasing problems as > well.) > > Bill Dunlap > TIBCO Software Inc - Spotfire Division > wdunlap tibco.com > > > -----Original Message----- > > From: r-devel-boun...@r-project.org > > [mailto:r-devel-boun...@r-project.org] On Behalf Of Wacek > Kusnierczyk > > Sent: Monday, March 23, 2009 4:40 PM > > To: r-devel@r-project.org > > Cc: r-h...@r-project.org; rkevinbur...@charter.net; Bert Gunter > > Subject: Re: [Rd] [R] variance/mean > > > > > > (this post suggests a patch to the sources, so i allow myself > > to divert > > it to r-devel) > > > > Bert Gunter wrote: > > > x a numeric vector, matrix or data frame. > > > y NULL (default) or a vector, matrix or data frame with compatible > > > dimensions to x. The default is equivalent to y = x (but > > more efficient). > > > > > > > > bert points to an interesting fragment of ?var: it suggests that > > computing var(x) is more efficient than computing var(x,x), > for any x > > valid as input to var. indeed: > > > > set.seed(0) > > x = matrix(rnorm(10000), 100, 100) > > > > library(rbenchmark) > > benchmark(replications=1000, columns=c('test', 'elapsed'), > > var(x), > > var(x, x)) > > # test elapsed > > # 1 var(x) 1.091 > > # 2 var(x, x) 2.051 > > > > that's of course, so to speak, unreasonable: for what > var(x) does is > > actually computing the covariance of x and x, which should > be the same > > as var(x,x). > > > > the hack is that if y is given, there's an overhead of memory > > allocation > > for *both* x and y when y is given, as seen in src/main/cov.c:720+. > > incidentally, it seems that the problem can be solved with a > > trivial fix > > (see the attached patch), so that > > > > set.seed(0) > > x = matrix(rnorm(10000), 100, 100) > > > > library(rbenchmark) > > benchmark(replications=1000, columns=c('test', 'elapsed'), > > var(x), > > var(x, x)) > > # test elapsed > > # 1 var(x) 1.121 > > # 2 var(x, x) 1.107 > > > > with the quick checks > > > > all.equal(var(x), var(x, x)) > > # TRUE > > > > all(var(x) == var(x, x)) > > # TRUE > > > > and for cor it seems to make cor(x,x) slightly faster than > > cor(x), while > > originally it was twice slower: > > > > # original > > benchmark(replications=1000, columns=c('test', 'elapsed'), > > cor(x), > > cor(x, x)) > > # test elapsed > > # 1 cor(x) 1.196 > > # 2 cor(x, x) 2.253 > > > > # patched > > benchmark(replications=1000, columns=c('test', 'elapsed'), > > cor(x), > > cor(x, x)) > > # test elapsed > > # 1 cor(x) 1.207 > > # 2 cor(x, x) 1.204 > > > > (there is a visible penalty due to an additional pointer > > test, but it's > > 10ms on 1000 replications with 10000 data points, which i think is > > negligible.) > > > > > This is as clear as I would know how to state. > > > > i believe bert is right. > > > > however, with the above fix, this can now be rewritten as: > > > > " > > x: a numeric vector, matrix or data frame. > > y: a vector, matrix or data frame with dimensions compatible > > to those of x. > > By default, y = x. > > " > > > > which, to my simple mind, is even more clear than what bert > would know > > how to state, and less likely to cause the sort of confusion that > > originated this thread. > > > > the attached patch suggests modifications to src/main/cov.c and > > src/library/stats/man/cor.Rd. > > it has been prepared and checked as follows: > > > > svn co https://svn.r-project.org/R/trunk trunk > > cd trunk > > # edited the sources > > svn diff > cov.diff > > svn revert -R src > > patch -p0 < cov.diff > > > > tools/rsync-recommended > > ./configure > > make > > make check > > bin/R > > # subsequent testing within R > > > > if you happen to consider this patch for a commit, please be sure to > > examine and test it carefully first. > > > > vQ > > > > ______________________________________________ > R-devel@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel > ______________________________________________ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel