A few hours ago, I was making a small point on the R-SIG-robust
mailing list on the point that ifelse() was not too efficient
in a situation where pmax() could easily be used instead.
However, this has reminded me of some timing experiments that I
did 13 years ago with S-plus -- where I found that pmin() /
pmax() were really relatively slow for the most common case
where they are used with only two arguments {and typically one
of the arguments is a scalar; but that's not even important here}.
The main reason is that the function accept an arbitrary number
of arguments and that they do recycling.
Their source is at
https://svn.R-project.org/R/trunk/src/library/base/R/pmax.R
In April 2001 (as I see), I had repeated my timings with R (1.2.2)
which confirmed the picture more or less, but for some reason I
never drew "proper" consequences of my findings.
Of course one can argue pmax() & pmin() are still quite fast
functions; OTOH the experiment below shows that -- at least the
special case with 2 (matching) arguments could be made faster by
about a factor of 19 ...
I don't have yet a constructive proposition; just note the fact that
pmin. <- function(k,x) (x+k - abs(x-k))/2
pmax. <- function(k,x) (x+k + abs(x-k))/2
are probably the fastest way of computing pmin() and pmax() of
two arguments {yes, they "suffer" from rounding error of about 1
to 2 bits...} currently in R.
One "solution" could be to provide pmin2() and pmax2()
functions based on trival .Internal() versions.
The experiments below are for the special case of k=0 where I
found the above mentioned factor of 19 which is a bit
overoptimistic for the general case; here is my pmax-ex.R source file
(as text/plain attachment ASCII-code --> easy cut & paste)
demonstrating what I claim above.
#### Martin Maechler, Aug.1992 (S+ 3.x) --- the same applies to R 1.2.2
####
#### Observation: (abs(x) + x) / 2 is MUCH faster than pmax(0,x) !!
####
#### The function pmax.fast below is very slightly slower than (|x|+x)/2
### "this" directory --- adapt!
thisDir <- "/u/maechler/R/MM/MISC/Speed"
### For R's source,
### egrep 'pm(ax|in)' src/library/*/R/*.R
### shows you that most uses of pmax() / pmin() are really just with arguments
### (<number> , <vector>) where the fast versions would be much better!
pmax.fast <- function(scalar, vector)
{
## Purpose: FAST substitute for pmax(s, v) when length(s) == 1
## Author: Martin Maechler, Date: 21 Aug 1992 (for S)
vector[ scalar > vector ] <- scalar
vector
}
## 2 things:
## 1) the above also works when 'scalar' == vector of same length
## 2) The following is even (quite a bit!) faster :
pmin. <- function(k,x) (x+k - abs(x-k))/2
pmax. <- function(k,x) (x+k + abs(x-k))/2
### The following are small scale timing simulations which confirm:
N <- 20 ## number of experiment replications
kinds <- c("abs", "[.>.]<-", "Fpmax", "pmax0.", "pmax.0")
Tmat <- matrix(NA, nrow = N, ncol = length(kinds),
dimnames = list(NULL, kinds))
T0 <- Tmat[,1] # `control group'
n.inner <- 800 # should depend on the speed of your R / S (i.e. CPU)
set.seed(101)
for(k in 1:N) {
cat(k,"")
## no longer set.seed(101)
x <- rnorm(1000)
Tmat[k, "pmax0."] <- system.time(for(i in 1:n.inner)y <- pmax(0,x))[1]
Tmat[k, "pmax.0"] <- system.time(for(i in 1:n.inner)y <- pmax(x,0))[1]
Tmat[k, "abs"] <- system.time(for(i in 1:n.inner)y <- (x + abs(x))/2)[1]
Tmat[k,"[.>.]<-"] <- system.time(for(i in 1:n.inner)y <-{x[0 > x] <-
0;x})[1]
Tmat[k, "Fpmax"] <- system.time(for(i in 1:n.inner)y <- pmax.fast(0,x))[1]
}
save(Tmat, file = file.path(thisDir, "pmax-Tmat.rda"))
###-- Restart here {saving simulation/timing}:
if(!exists("Tmat"))
load(file.path(thisDir, "pmax-Tmat.rda"))
(Tm <- apply(Tmat, 2, mean, trim = .1))
## abs [.>.]<- Fpmax pmax0. pmax.0
## 0.025625 0.078750 0.077500 0.488125 0.511250
round(100 * Tm / Tm[1])
## abs [.>.]<- Fpmax pmax0. pmax.0
## 100 307 302 1905 1995
## earlier:
## abs [.>.]<- Fpmax pmax0. pmax.0
## 100 289 344 1804 1884
## pmax0. is really a bit faster than pmax.0 :
## P < .001 (for Wilcoxon; outliers!)
t.test(Tmat[,4], Tmat[,5], paired = TRUE)
t.test(Tmat[,4], Tmat[,5], paired = FALSE)# since random samples
wilcox.test(Tmat[,4], Tmat[,5])# P = 0.00012 {but ties -> doubt}
boxplot(data.frame(Tmat, check.names = FALSE),
notch = TRUE, ylim = range(0,Tmat),
main = "CPU times used for versions of pmax(0,x)")
mtext(paste("x <- rnorm(1000)"," ", N, "replicates"), side = 3)
mtext(paste(R.version.string, file.path(thisDir, "pmax-ex.R")),
side = 4, cex =.8, adj = 0)
## see if trimmed means were robust enough:
points(Tm, col = 2, cex = 1.5, pch = "X")
mtext("X : mean( * , trim = 0.10)", side = 1, line = -2.5, col = 2)
mtext(paste(round(100 * Tm / Tm[1]),"%"), side = 1, line = -1.2, col = 2,
at = 1:5)
Martin Maechler, ETH Zurich
______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel