Re: [R] Subset by Factor by date

Charilaos Skiadas Fri, 13 Jun 2008 22:50:15 -0700


On Jun 14, 2008, at 1:25 AM, T.D.Rudolph wrote:

aggregate() is indeed a useful function in this case, but it onlyreturns the

columns by which it was grouped.  Is there a way I can use this while
simultaneously retaining all the other column values in the dataframe?

e.g. add superfluous (yet pertinent for later) column containing any
information at all and retain it in the final output

I had exactly this kind of need many times, and I have finallycreated a function for it, which I hope to include soon in anupcoming package. Here is a run of it (I added an extra "A" columncontaining just the numbers 1:8):


> DF
  id      day diff A
1  1 01-01-09  0.5 1
2  1 01-01-09  0.7 2
3  2 01-01-09  0.2 3
4  2 01-01-09  0.4 4
5  1 01-02-09  0.1 5
6  1 01-02-09  0.3 6
7  2 01-02-09  0.3 7
8  2 01-02-09  0.4 8
> byDataFrame(DF, list(id, day), function(x) x[which.min(x$diff),])
  diff A id      day
1  0.5 1  1 01-01-09
2  0.2 3  2 01-01-09
3  0.1 5  1 01-02-09
4  0.3 7  2 01-02-09

Would that do what you want?

I've appended the function byDataFrame, and its prerequisite, afunction parseIndexList. I'm not quite set on the names yet, butanyway. Hope this helps. I haven't really tested it on large sets, itmight perform poorly. Any suggestions on speeding the code /corrections are welcome.


Haris Skiadas
Department of Mathematics and Computer Science
Hanover College



parseIndexList <- function(indexList) {
  # browser()
  if (!is.list(indexList))
    indexList <- as.list(indexList)
  nI <- length(indexList)
  namelist <- vector("list", nI)
  names(namelist) <- names(indexList)
  extent <- integer(nI)
  nx <- length(indexList[[1]])
  one <- as.integer(1)
  group <- rep.int(one, nx)
  ngroup <- one
  for (i in seq.int(indexList)) {
      index <- as.factor(indexList[[i]])
      if (length(index) != nx)
          stop("arguments must have same length")
      namelist[[i]] <- sort(unique(indexList[[i]]))
      extent[i] <- length(namelist[[i]])
      group <- group + ngroup * (as.integer(index) - one)
      ngroup <- ngroup * nlevels(index)
  }
  nms <- do.call(expand.grid, namelist)
  ind <- unique(sort(group))
  res <- data.frame(index=ind, nms[ind, , drop=FALSE])
  return(list(cases=group, groups=res))
}

byDataFrame <- function (data, INDEX, FUN, newnames,omit.index.cols=TRUE, ...) {

# # Part of the code shamelessly stolen from tapply
  IND <- eval(substitute(INDEX), data)
  nms <- as.character(as.list(substitute(INDEX)))
  if (!is.list(IND)) {
    IND <- list(IND)
    names(IND) <- nms
  } else {
    names(IND) <- nms[-1]
  }
  funname <- paste(as.character(substitute(FUN)), collapse=".")
  indexInfo <- parseIndexList(IND)
  FUNx <- if (omit.index.cols) {
    omit.cols <- match(names(indexInfo$groups)[-1], names(data))
    function(x, ...) FUN(data[x, -omit.cols], ...)
  } else {
    function(x, ...) FUN(data[x, ], ...)
  }
  ans <- lapply(split(1:nrow(data), indexInfo$cases), FUNx, ...)
  index <- as.numeric(names(ans))
  if (!is.data.frame(ans[[1]])) {
    ans <- lapply(ans, function(x) {
      dframe <- as.data.frame(t(x))
      if (is.null(names(x)))
        names(dframe) <- funname
      dframe
    })
  }
  lengths <- sapply(ans, nrow)
  ans <- do.call(rbind, ans)
  if (!missing(newnames))
    names(ans) <- newnames
  nms <- indexInfo$groups[rep(index, lengths),-1, drop=FALSE]
  res <- cbind(ans, nms)
  res
}

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Subset by Factor by date

Reply via email to