On Sep 20, 2010, at 2:01 PM, David Winsemius wrote:


On Sep 20, 2010, at 1:40 PM, Joshua Wiley wrote:

On Mon, Sep 20, 2010 at 10:27 AM, Phil Spector
<spec...@stat.berkeley.edu> wrote:
Harold -
 Two ways that come to mind:

1) do.call(rbind,lapply(split(tmp,tmp$index),function(x)x[1:5,]))
2) subset(tmp,unlist(tapply(foo,index,seq))<=5)
3) do.call(rbind, by(tmp, tmp$index, .Primitive("["), 1:5, 1:2))
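As a quick check, here is a minimal sketch that runs approaches 1) and 2) against the manual construction from Harold's original question. The set.seed() call and the helper same_rows() are assumptions added only to make the comparison reproducible; they are not part of anyone's posted code.

```r
set.seed(1)  # assumed seed, only so the sketch is reproducible
tmp <- data.frame(index = gl(2, 20), foo = rnorm(40))
tmp <- tmp[order(tmp$index, tmp$foo), ]

# Harold's manual construction, used here as the reference
manual <- rbind(subset(tmp, index == 1)[1:5, ],
                subset(tmp, index == 2)[1:5, ])

# Approach 1: split by index, keep five rows per piece, bind the pieces
a1 <- do.call(rbind, lapply(split(tmp, tmp$index), function(x) x[1:5, ]))

# Approach 2: within-group row counter; relies on tmp being sorted by index
a2 <- subset(tmp, unlist(tapply(foo, index, seq)) <= 5)

# Hypothetical helper: compare column values only, ignoring row names
same_rows <- function(x, y) {
    identical(x$foo, y$foo) &&
        identical(as.character(x$index), as.character(y$index))
}

same_rows(a1, manual)
same_rows(a2, manual)
```

The row names differ between the results (approach 1 prefixes them with the group label), which is why the helper compares column values only.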

I found that rather interesting but somewhat puzzling. I had thought that passing "[" as a character string should work, but by() complained:
Error in FUN(X[[1L]], ...) : could not find function "FUN"

So I tried back-quotes instead and got a sensible result.

The need for back-quoting disappears if we add a match.fun call to by.data.frame():

by.data.frame <-
function (data, INDICES, FUN, ..., simplify = TRUE)
{
    FUN <- match.fun(FUN)
    if (!is.list(INDICES)) {
        IND <- vector("list", 1L)
        IND[[1L]] <- INDICES
        names(IND) <- deparse(substitute(INDICES))[1L]
    }
    else IND <- INDICES
    FUNx <- function(x) FUN(data[x, , drop = FALSE], ...)
    nd <- nrow(data)
    ans <- eval(substitute(tapply(1L:nd, IND, FUNx, simplify = simplify)),
        data)
    attr(ans, "call") <- match.call()
    class(ans) <- "by"
    ans
}

I would have thought such a call would be in the by.data.frame and by.default code but they seem to be "missing in action". Would there be any downside to modifying those functions in that manner?
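The effect of a match.fun() call can be seen in isolation with a small sketch. The two wrappers below, call_raw() and call_matched(), are hypothetical stand-ins for the "without" and "with" versions of the function body; they are not taken from the R sources.

```r
# Hypothetical stand-ins: call_raw() uses FUN as given,
# call_matched() resolves it with match.fun() first.
call_raw <- function(x, FUN, ...) FUN(x, ...)
call_matched <- function(x, FUN, ...) {
    FUN <- match.fun(FUN)
    FUN(x, ...)
}

df <- data.frame(a = 1:3, b = 4:6)

# A character string is not a function, so the direct call fails;
# the error message is captured instead of stopping the script.
raw_msg <- tryCatch(call_raw(df, "[", 1:2, "a"),
                    error = function(e) conditionMessage(e))

# With match.fun(), the string and the back-quoted function agree
from_string   <- call_matched(df, "[", 1:2, "a")
from_function <- call_matched(df, `[`, 1:2, "a")
identical(from_string, from_function)
```

match.fun() accepts either a function or a character string naming one, which is why functions such as lapply() that call it internally take "[" without back-quotes.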

--
David.



> do.call(rbind, by(tmp, tmp$index, FUN=`[`, 1:5, 1:2))
    index        foo
1.6      1 -3.0267759
1.7      1 -1.3725536
1.19     1 -1.1476048
1.16     1 -1.0963967
1.2      1 -1.0684793
2.29     2 -1.6601486
2.21     2 -1.2633632
2.22     2 -0.9875626
2.38     2 -0.9515301
2.30     2 -0.8638903

Unlike Dalgaard, who arrived at a similar result via a different route and called the row names "silly", I thought they were informative. But maybe the sobriquet was directed at his second solution; I couldn't tell.
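For anyone wondering where those row names come from: when rbind() is given a named list of data frames, it appears to build each row name from the list name, a dot, and the original row name. A minimal sketch, assuming a seed for reproducibility and using head() in place of "[" (both are my choices, not from the posts above):

```r
set.seed(1)  # assumed seed, only so the sketch is reproducible
tmp <- data.frame(index = gl(2, 20), foo = rnorm(40))
tmp <- tmp[order(tmp$index, tmp$foo), ]

# split() returns a list named by the levels of index: "1" and "2"
pieces <- split(tmp, tmp$index)
top5 <- do.call(rbind, lapply(pieces, head, n = 5))

# Row names are "<group>.<original row name>"
rownames(top5)

# If the prefixed names are unwanted, reset them to 1..n
plain <- top5
rownames(plain) <- NULL
rownames(plain)
```

Whether the prefixed names are "silly" or informative is a matter of taste; they do record which original row each result came from.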

--
David.


Josh


- Phil Spector
  Statistical Computing Facility
  Department of Statistics
  UC Berkeley
  spec...@stat.berkeley.edu



On Mon, 20 Sep 2010, Doran, Harold wrote:

Suppose I have a data frame, such as the one below:

tmp <- data.frame(index = gl(2,20), foo = rnorm(40))

And further assume it is sorted by index and then by the variable foo.

tmp <- tmp[order(tmp$index, tmp$foo) , ]

Now, I want to grab the first N rows of tmp for each index. In the end,
what I want is the data frame 'result'

tmp1 <- subset(tmp, index == 1)
tmp2 <- subset(tmp, index == 2)

tmp1 <- tmp1[1:5,]
tmp2 <- tmp2[1:5,]
result <- rbind(tmp1, tmp2)

Does anyone see a way to subset and subsequently bind without a loop?

Harold




______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.






--
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/


David Winsemius, MD
West Hartford, CT

