On Jun 14, 2008, at 2:59 AM, T.D.Rudolph wrote:


I can't speak to the intricacies of the formula, but when I ran the
byDataFrame() function you provided on a subsample of my data (n=50), it returned only the very first id value in the output; the rest came out as <NA>. This is not to say it has not properly selected the rows with min(x$diff), but I have no way of verifying that without the id membership in the output.

And equally, we can't help you with that without a reproducible example. Doesn't it do the right thing in the little sample I posted? It moves the id and day columns to the end. Without an example to look at, the only things I can think of that might cause trouble are that you have a matrix instead of a data.frame, or that the columns have some class I have not anticipated. Perhaps you can send me part of your data off-list, if you can't post it here?
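
A minimal sketch of how those two possibilities could be checked, and how a small postable sample could be produced. The object `mydata` is a made-up stand-in, not data from this thread:

```r
# `mydata` is a hypothetical placeholder for the real data set.
mydata <- data.frame(id   = c(1, 1, 2),
                     day  = c("01-01-09", "01-01-09", "01-02-09"),
                     diff = c(0.5, 0.7, 0.2))

is.data.frame(mydata)        # FALSE would point to a matrix or another structure
col_classes <- sapply(mydata, class)
col_classes                  # class of each column, to spot anything unusual

# Convert if the object turns out not to be a data.frame
if (!is.data.frame(mydata)) mydata <- as.data.frame(mydata)

# Text that can be pasted into an email and re-creates the first rows exactly
dput(head(mydata, 10))
```

The output of dput() is what makes an example reproducible on the list: anyone can paste it back into R and get the same object, classes included.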

Haris Skiadas
Department of Mathematics and Computer Science
Hanover College

Charilaos Skiadas-3 wrote:


On Jun 14, 2008, at 1:25 AM, T.D.Rudolph wrote:


aggregate() is indeed a useful function in this case, but it only
returns the columns by which it was grouped. Is there a way I can use this while simultaneously retaining all the other column values in the dataframe?

e.g. add a superfluous (yet pertinent for later) column containing any
information at all and retain it in the final output
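
To make the question concrete, here is a rough sketch of what aggregate() gives back and one possible way to recover the remaining columns by merging the result with the original data. The data frame is toy data in the shape discussed in this thread (id, day, diff, plus an extra column A); the merge step is an assumption about what might be acceptable, and it would return several rows per group if the minimum is tied.

```r
# Toy data, rebuilt here so the block runs on its own
DF <- data.frame(
  id   = c(1, 1, 2, 2, 1, 1, 2, 2),
  day  = rep(c("01-01-09", "01-02-09"), each = 4),
  diff = c(0.5, 0.7, 0.2, 0.4, 0.1, 0.3, 0.3, 0.4),
  A    = 1:8
)

# aggregate() returns the grouping columns plus the summarised column
# (named x); column A is not carried along
agg <- aggregate(DF$diff, by = list(id = DF$id, day = DF$day), FUN = min)
names(agg)

# Merging on id, day and the minimum value brings the other columns back
kept <- merge(agg, DF, by.x = c("id", "day", "x"), by.y = c("id", "day", "diff"))
kept
```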

I have had exactly this kind of need many times, and I have finally
created a function for it, which I hope to include soon in an
upcoming package. Here is a run of it (I added an extra "A" column
containing just the numbers 1:8):

> DF
  id      day diff A
1  1 01-01-09  0.5 1
2  1 01-01-09  0.7 2
3  2 01-01-09  0.2 3
4  2 01-01-09  0.4 4
5  1 01-02-09  0.1 5
6  1 01-02-09  0.3 6
7  2 01-02-09  0.3 7
8  2 01-02-09  0.4 8
> byDataFrame(DF, list(id, day), function(x) x[which.min(x$diff),])
  diff A id      day
1  0.5 1  1 01-01-09
2  0.2 3  2 01-01-09
3  0.1 5  1 01-02-09
4  0.3 7  2 01-02-09

Would that do what you want?
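
For anyone who wants to cross-check the run above without the appended functions, here is a sketch that rebuilds the sample data and does the same selection with split(), lapply() and rbind() from base R. It is an alternative technique, not the byDataFrame implementation, and it keeps the columns in their original order rather than moving id and day to the end.

```r
# The sample data shown above
DF <- data.frame(
  id   = c(1, 1, 2, 2, 1, 1, 2, 2),
  day  = rep(c("01-01-09", "01-02-09"), each = 4),
  diff = c(0.5, 0.7, 0.2, 0.4, 0.1, 0.3, 0.3, 0.4),
  A    = 1:8
)

# One data frame per id/day combination; drop = TRUE skips empty combinations
pieces <- split(DF, list(DF$id, DF$day), drop = TRUE)

# Keep the row with the smallest diff in each piece, then stack the results
res <- do.call(rbind, lapply(pieces, function(x) x[which.min(x$diff), ]))
rownames(res) <- NULL
res
```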

I've appended the function byDataFrame and its prerequisite, a
function parseIndexList. I'm not quite set on the names yet, but
anyway. Hope this helps. I haven't really tested it on large sets, so it
might perform poorly. Any suggestions for speeding up the code, or
corrections, are welcome.

Haris Skiadas
Department of Mathematics and Computer Science
Hanover College



parseIndexList <- function(indexList) {
   # browser()
   if (!is.list(indexList))
     indexList <- as.list(indexList)
   nI <- length(indexList)
   namelist <- vector("list", nI)
   names(namelist) <- names(indexList)
   extent <- integer(nI)
   nx <- length(indexList[[1]])
   one <- as.integer(1)
   group <- rep.int(one, nx)
   ngroup <- one
   for (i in seq_along(indexList)) {
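       # factor() drops unused levels, so nlevels(index) agrees with
       # the number of unique values stored in namelist below
       index <- factor(indexList[[i]])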
       if (length(index) != nx)
           stop("arguments must have same length")
       namelist[[i]] <- sort(unique(indexList[[i]]))
       extent[i] <- length(namelist[[i]])
       group <- group + ngroup * (as.integer(index) - one)
       ngroup <- ngroup * nlevels(index)
   }
   nms <- do.call(expand.grid, namelist)
   ind <- unique(sort(group))
   res <- data.frame(index=ind, nms[ind, , drop=FALSE])
   return(list(cases=group, groups=res))
}
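
The loop in parseIndexList gives every combination of index values a single integer code, with the first index varying fastest. A small sketch of that numbering for two indices, assuming the id and day values from the sample data, compared against what interaction() produces:

```r
id  <- factor(c(1, 1, 2, 2, 1, 1, 2, 2))
day <- factor(rep(c("01-01-09", "01-02-09"), each = 4))

# Same arithmetic as the loop: start at 1, then add (code - 1) times the
# number of groups accumulated so far for each index in turn
group <- 1L + (as.integer(id) - 1L) + nlevels(id) * (as.integer(day) - 1L)
group

# interaction() numbers the combinations the same way (first factor fastest)
same_as_interaction <- all(group == as.integer(interaction(id, day)))
same_as_interaction
```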

byDataFrame <- function (data, INDEX, FUN, newnames,
                         omit.index.cols=TRUE, ...) {
   # Part of the code shamelessly stolen from tapply
   IND <- eval(substitute(INDEX), data)
   nms <- as.character(as.list(substitute(INDEX)))
   if (!is.list(IND)) {
     IND <- list(IND)
     names(IND) <- nms
   } else {
     names(IND) <- nms[-1]
   }
   funname <- paste(as.character(substitute(FUN)), collapse=".")
   indexInfo <- parseIndexList(IND)
   FUNx <- if (omit.index.cols) {
     omit.cols <- match(names(indexInfo$groups)[-1], names(data))
     # drop=FALSE keeps a data frame even if only one column remains
     function(x, ...) FUN(data[x, -omit.cols, drop=FALSE], ...)
   } else {
     function(x, ...) FUN(data[x, ], ...)
   }
   ans <- lapply(split(seq_len(nrow(data)), indexInfo$cases), FUNx, ...)
   index <- as.numeric(names(ans))
   if (!is.data.frame(ans[[1]])) {
     ans <- lapply(ans, function(x) {
       dframe <- as.data.frame(t(x))
       if (is.null(names(x)))
         names(dframe) <- funname
       dframe
     })
   }
   lengths <- sapply(ans, nrow)
   ans <- do.call(rbind, ans)
   if (!missing(newnames))
     names(ans) <- newnames
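   # Look up each group number in indexInfo$groups by value, not by row
   # position: the two differ (giving NA rows) when some combinations of
   # the index values do not occur in the data
   rows <- match(index, indexInfo$groups$index)
   nms <- indexInfo$groups[rep(rows, lengths), -1, drop=FALSE]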
   res <- cbind(ans, nms)
   res
}
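
One thing that may be worth checking with real data: when not every id/day combination occurs, the group numbers are no longer 1, 2, 3, ... in sequence, so looking a group up by row position and looking it up by value give different answers. This could be related to the <NA> values reported at the top of the thread, though that is a guess without seeing the data. A toy illustration follows; the `groups` table is made up for the purpose and is not output from the functions above.

```r
# Hypothetical lookup table: 2 ids x 2 days, but combination number 2
# (id 2 on the first day) never occurs in the data
groups <- data.frame(index = c(1, 3, 4),
                     id    = c(1, 1, 2),
                     day   = c("01-01-09", "01-02-09", "01-02-09"))
index <- c(1, 3, 4)   # group numbers present in the results

# Positional lookup: row 3 is really group 4, and row 4 does not exist
by_pos <- groups[index, ]
by_pos

# Lookup by value returns the intended rows
by_val <- groups[match(index, groups$index), ]
by_val
```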

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
