On Wed, Dec 9, 2009 at 12:11 AM, Gabor Grothendieck <ggrothendi...@gmail.com> wrote:
> Here are a couple of solutions. The first uses by and the second sqldf:
Brilliant! Now I have a whole collection of solutions.

I did a simple performance comparison with a data frame that has 7929 lines. The results were as follows (times in seconds; loading the required packages is not included in the measurements):

times <- c(0.248, 0.551, 41.080, 0.16, 0.190)
names(times) <- c("aggregate", "summaryBy", "by+transform", "sqldf", "tapply")
barplot(times, log = "y", ylab = "time (s, log scale)")

So sqldf clearly wins, followed by tapply and aggregate. summaryBy is slower than necessary because it computes both the mean /and/ the sum for both x and dur. by+transform presumably suffers from the construction of many intermediate data frames.

Are there any canonical places where R recipes are collected? If so, I would write up a summary.

These were the competitors:

# Gary's and Nikhil's aggregate solution:
aggregate.fixations1 <- function(d) {
  idx <- c(TRUE, diff(d$roi) != 0)   # TRUE where a new run of roi starts
  d2 <- d[idx, ]                     # keep the first row of each run
  idx <- cumsum(idx)                 # run id for every row
  # [[2]] extracts the aggregated column as a plain vector
  d2$dur <- aggregate(d$dur, list(idx), sum)[[2]]
  d2$x <- aggregate(d$x, list(idx), mean)[[2]]
  d2
}

# Marek's summaryBy solution:
library(doBy)
aggregate.fixations2 <- function(d) {
  idx <- c(TRUE, diff(d$roi) != 0)
  d2 <- d[idx, ]
  d$idx <- cumsum(idx)
  # computes sum and mean of both dur and x; only two of the four are kept
  s <- summaryBy(dur + x ~ idx, data = d, FUN = c(sum, mean))
  d2$dur <- s$dur.sum
  d2$x <- s$x.mean
  d2
}

# Gabor's by+transform solution:
aggregate.fixations3 <- function(d) {
  idx <- cumsum(c(TRUE, diff(d$roi) != 0))
  # one data frame per run: replace dur and x, then keep the first row
  d2 <- do.call(rbind, by(d, idx, function(x)
    transform(x, dur = sum(dur), x = mean(x))[1, , drop = FALSE]))
  d2
}

# Gabor's sqldf solution:
library(sqldf)
aggregate.fixations4 <- function(d) {
  idx <- c(TRUE, diff(d$roi) != 0)
  d2 <- d[idx, ]
  d$idx <- cumsum(idx)
  # "order by idx" makes sure the result rows line up with d2
  r <- sqldf("select sum(dur) dur, avg(x) x from d group by idx order by idx")
  d2$dur <- r$dur
  d2$x <- r$x
  d2
}

# Titus' solution using plain old tapply:
aggregate.fixations5 <- function(d) {
  idx <- c(TRUE, diff(d$roi) != 0)
  d2 <- d[idx, ]
  idx <- cumsum(idx)
  d2$dur <- tapply(d$dur, idx, sum)
  d2$x <- tapply(d$x, idx, mean)
  d2
}

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
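The 7929-line data set used for the timings above is not part of this message, so here is a minimal, self-contained sketch for trying the run-collapsing step on made-up data. It assumes numeric roi, dur and x columns, as the functions above do, and swaps in base R's rle() and rowsum() to build the run ids and per-run sums. collapse.runs() is a hypothetical name, and the timing line is only a rough check on synthetic data, not a reproduction of the numbers above.

```r
# Hypothetical toy data: the columns roi, dur and x mirror those used in the
# functions above; the values are invented for illustration.
d <- data.frame(roi = c(1, 1, 2, 2, 2, 1, 3),
                dur = c(100, 50, 20, 30, 10, 70, 40),
                x   = c(10, 20, 30, 40, 50, 60, 70))

# Collapse runs of consecutive rows with the same roi into one row:
# dur is summed and x is averaged within each run; all other columns are
# taken from the first row of the run (same idea as aggregate.fixations5).
# Assumes roi is a plain numeric vector (rle() does not accept factors).
collapse.runs <- function(d) {
  r     <- rle(d$roi)                          # lengths of consecutive runs
  run   <- rep(seq_along(r$lengths), r$lengths) # run id for every row
  first <- !duplicated(run)                    # first row of each run
  d2 <- d[first, ]
  d2$dur <- as.vector(rowsum(d$dur, run))      # per-run sum, in run order
  d2$x   <- as.vector(tapply(d$x, run, mean))  # per-run mean, in run order
  d2
}

res <- collapse.runs(d)
print(res)

# Rough timing on a larger synthetic data frame (sizes and ranges are arbitrary).
set.seed(1)
n <- 7929
big <- data.frame(roi = sample(1:5, n, replace = TRUE),
                  dur = runif(n, 50, 500),
                  x   = runif(n, 0, 1024))
print(system.time(collapse.runs(big)))
```

Any of the aggregate.fixationsN() functions can be timed the same way by calling it on big inside system.time().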