>>>>> Wacek Kusnierczyk <waclaw.marcin.kusnierc...@idi.ntnu.no>
>>>>>     on Tue, 24 Mar 2009 00:39:58 +0100 writes:

> (this post suggests a patch to the sources, so i allow myself to divert
> it to r-devel)

> Bert Gunter wrote:
>> x    a numeric vector, matrix or data frame.
>> y    NULL (default) or a vector, matrix or data frame with compatible
>>      dimensions to x. The default is equivalent to y = x (but more
>>      efficient).

> bert points to an interesting fragment of ?var: it suggests that
> computing var(x) is more efficient than computing var(x,x), for any x
> valid as input to var. indeed:

>     set.seed(0)
>     x = matrix(rnorm(10000), 100, 100)

>     library(rbenchmark)
>     benchmark(replications=1000, columns=c('test', 'elapsed'),
>         var(x),
>         var(x, x))
>     #        test elapsed
>     # 1    var(x)   1.091
>     # 2 var(x, x)   2.051

> that's of course, so to speak, unreasonable: for what var(x) does is
> actually computing the covariance of x and x, which should be the same
> as var(x,x).

> the hack is that if y is given, there's an overhead of memory allocation
> for *both* x and y, as seen in src/main/cov.c:720+.

> incidentally, it seems that the problem can be solved with a trivial fix
> (see the attached patch), so that

>     set.seed(0)
>     x = matrix(rnorm(10000), 100, 100)

>     library(rbenchmark)
>     benchmark(replications=1000, columns=c('test', 'elapsed'),
>         var(x),
>         var(x, x))
>     #        test elapsed
>     # 1    var(x)   1.121
>     # 2 var(x, x)   1.107

> with the quick checks

>     all.equal(var(x), var(x, x))
>     # TRUE

>     all(var(x) == var(x, x))
>     # TRUE

> and for cor it seems to make cor(x,x) slightly faster than cor(x), while
> originally it was twice as slow:

>     # original
>     benchmark(replications=1000, columns=c('test', 'elapsed'),
>         cor(x),
>         cor(x, x))
>     #        test elapsed
>     # 1    cor(x)   1.196
>     # 2 cor(x, x)   2.253

>     # patched
>     benchmark(replications=1000, columns=c('test', 'elapsed'),
>         cor(x),
>         cor(x, x))
>     #        test elapsed
>     # 1    cor(x)   1.207
>     # 2 cor(x, x)   1.204

> (there is a visible penalty due to an additional pointer test, but it's
> 10ms on 1000 replications with 10000 data points, which i think is
> negligible.)

>> This is as clear as I would know how to state.

> i believe bert is right.

> however, with the above fix, this can now be rewritten as:

> "
> x: a numeric vector, matrix or data frame.
> y: a vector, matrix or data frame with dimensions compatible to those
>    of x. By default, y = x.
> "

> which, to my simple mind, is even more clear than what bert would know
> how to state, and less likely to cause the sort of confusion that
> originated this thread.

Your patch is basically only affecting the default
method = "pearson".  For (most) other cases, 'y = NULL' would
still remain *the* way to save computations, unless we'd start
to use an R-level equivalent [which I think does not exist] of
your C trick (DATAPTR(x) == DATAPTR(y)).

Also, for S and R back-compatibility reasons, we'd need to
continue allowing y = NULL (as your patch would, too), so
currently I think this whole idea -- as slick as it is, I
learned something! -- does not make sense to apply here.

> the attached patch suggests modifications to src/main/cov.c and
> src/library/stats/man/cor.Rd.

BTW: since you didn't (and shouldn't, because of
method != "pearson"!) change the R code, the docs \usage{.}
part should not have been changed either!

And as I mentioned: using 'y = NULL' in the function call must
continue to work, hence should also be documented as a
possibility
==> the docs would not really become more clear, I think.

Martin Maechler, ETH Zurich

> it has been prepared and checked as follows:

>     svn co https://svn.r-project.org/R/trunk trunk
>     cd trunk
>     # edited the sources
>     svn diff > cov.diff
>     svn revert -R src
>     patch -p0 < cov.diff

>     tools/rsync-recommended
>     ./configure
>     make
>     make check
>     bin/R
>     # subsequent testing within R

> if you happen to consider this patch for a commit, please be sure to
> examine and test it carefully first.
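
For readers who want to see the pointer comparison in isolation, here is a minimal, self-contained C sketch of the idea. It is not the attached patch and not the code in src/main/cov.c: the function names (col_means, cov_sketch), the mean_passes counter and the plain column-major double arrays are all assumptions made for illustration, with no R API involved. The sketch treats a missing y and a y that is the very same buffer as x as one and the same case, so the column means are computed and allocated only once.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: means of the p columns of an n-by-p column-major
   matrix. The caller frees the result. If passes is non-NULL, it counts
   how many times a means pass was actually performed. */
double *col_means(const double *a, size_t n, size_t p, int *passes)
{
    double *m = malloc(p * sizeof *m);
    if (m == NULL)
        return NULL;
    for (size_t j = 0; j < p; j++) {
        double s = 0.0;
        for (size_t i = 0; i < n; i++)
            s += a[i + j * n];
        m[j] = s / (double)n;
    }
    if (passes != NULL)
        ++*passes;
    return m;
}

/* Hypothetical sketch: sample covariance between the columns of x (n-by-px)
   and the columns of y (n-by-py), written to out (px-by-py, column-major).
   Returns 0 on success, -1 on failure. */
int cov_sketch(const double *x, size_t px,
               const double *y, size_t py,
               size_t n, double *out, int *mean_passes)
{
    assert(x != NULL && out != NULL && px > 0);
    if (n < 2)
        return -1;

    /* The shortcut: y == NULL and y == x (same buffer) both mean
       "covariance of x with itself", so x's work is reused for y. */
    int same = (y == NULL || y == x);
    if (same) {
        y = x;
        py = px;
    }

    double *mx = col_means(x, n, px, mean_passes);
    double *my = same ? mx : col_means(y, n, py, mean_passes);
    if (mx == NULL || my == NULL) {
        free(mx);
        if (!same)
            free(my);
        return -1;
    }

    for (size_t j = 0; j < px; j++) {
        for (size_t k = 0; k < py; k++) {
            double s = 0.0;
            for (size_t i = 0; i < n; i++)
                s += (x[i + j * n] - mx[j]) * (y[i + k * n] - my[k]);
            out[j + k * px] = s / (double)(n - 1);
        }
    }

    free(mx);
    if (!same)
        free(my);
    return 0;
}
```

Calling cov_sketch with y = NULL, with y pointing at x itself, and with y pointing at a separate copy of x gives the same covariance matrix in all three cases, but only the third performs two means passes. That is also the limitation raised in the reply above: an identity test only helps when the caller passes the very same buffer, so any path that transforms its inputs first (as the non-"pearson" methods are said to do) would not benefit.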

> vQ

> [Attachment cov.diff (text/x-diff) deleted]

> ______________________________________________
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel