On Fri, Jun 5, 2009 at 4:09 AM, Allan Engelhardt <[email protected]> wrote:
> I'm all done now. The "max2" version below is what I went with in the end
> for my proposed change to caret::nearZeroVar (which used the "sort" method).
> Max Kuhn will make it available on CRAN soon. It speeds up that routine by
> a factor 2-5 on my test cases and uses much less memory.
You can save a little in max2 like this:
max2a = {w<-which.max(x); x[w]/max(x[-w], na.rm=TRUE);}
If you don't need to handle NA's (or if you know a priori how many there
are), you can also speed up part:
parta = {sel <- length(x)+c(-1,0); a<-sort.int(x, partial=sel,
na.last=NA)[2:1]; a[1]/a[2];}
which becomes about as fast as max2.
> library("rbenchmark")
set.seed(1); x <- runif(1e7, max=1e8);
benchmark(
replications=20,
columns=c("test","elapsed"),
order="elapsed"
, sort = {a<-sort(x, decreasing=TRUE, na.last=NA)[1:2];
a[1]/a[2];}
, qsrt = {a<-sort(x, decreasing=TRUE, na.last=NA, method="quick")[1:2];
a[1]/a[2];}
, part = {a<-sort.int(-x, partial=1:2, na.last=NA)[1:2];
a[1]/a[2];}
, parta = {end<-length(x)+c(-1,0);
a<-sort.int(x, partial=end, na.last=FALSE)[end];
a[1]/a[2]; }
, max1 = {m<-max(x, na.rm=TRUE);
w<-which(x==m)[1];
m/max(x[-w],na.rm=TRUE);}
, max2 = {w<-which.max(x);
max(x, na.rm=TRUE)/max(x[-w], na.rm=TRUE);}
, max2a = {w<-which.max(x);
x[w]/max(x[-w], na.rm=TRUE);}
)
test elapsed
7 max2a 7.80
6 max2 8.94
4 parta 9.05
3 part 10.72
5 max1 20.21
2 qsrt 49.33
1 sort 94.18
For what it is worth, I also made a C version ("cmax" below) which of
> course is faster yet again and scales nicely for returning the top n values
> of the array:
>
> cmax <- function (v) {max <- vector("double",2); max <- .C("test",
> as.double(v), as.integer(length(v)), max, NAOK=TRUE)[[3]];
> return(max[1]/max[2]);}
>
> library("rbenchmark")
> set.seed(1); x <- runif(1e7, max=1e8); x[1] <- NA;
> benchmark(
> replications=20,
> columns=c("test","elapsed"),
> order="elapsed"
> , sort = {a<-sort(x, decreasing=TRUE, na.last=NA)[1:2]; a[1]/a[2];}
> , qsrt = {a<-sort(x, decreasing=TRUE, na.last=NA, method="quick")[1:2];
> a[1]/a[2];}
> , part = {a<-sort.int(-x, partial=1:2, na.last=NA)[1:2]; a[1]/a[2];}
> , max1 = {m<-max(x, na.rm=TRUE); w<-which(x==m)[1];
> m/max(x[-w],na.rm=TRUE);}
> , max2 = {w<-which.max(x); max(x, na.rm=TRUE)/max(x[-w], na.rm=TRUE);}
> , cmax = {cmax(x);}
> )
> # test elapsed
> # 6 cmax 4.394
> # 5 max2 8.954
> # 4 max1 18.835
> # 3 part 21.749
> # 2 qsrt 46.692
> # 1 sort 77.679
>
> Thanks for all the suggestions and comments.
>
> Allan.
>
>
> PS: Slightly off-topic but is there a way within the syntax of R to set up
> things so that 'sort' (or any function) would know it is called in a partial
> list context in sort(x)[1:2] and it therefore could choose to use the
> "partial" argument automatically for small [] lists? The R interpreter of
> course knows full well that it is going to drop all but the first two values
> of the result before it calls 'sort'. Perl has 'use Want' where howmany()
> and want(n) provides a subset of this functionality (essentially for []
> lists of the form 1:n).
>
> ______________________________________________
> [email protected] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
[[alternative HTML version deleted]]
______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.