Richard Tan asked a very similar question last week ('get top n rows group by a column from a dataframe'). You could use ave() to make a sequence-number-within-group vector and keep the rows whose value there is at most N:

    tmp[ave(integer(nrow(tmp)), tmp$index, FUN=seq_along) <= N, ]

If a given index has fewer than N rows, this returns all of them; it does not pad the result up to N rows.
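As a rough, self-contained sketch of the ave() idiom above, the following runs it on the data frame from the quoted question and checks it against the subset()/rbind() version. The seed, the value N <- 5 (taken from the 1:5 in the question), and the helper names pos and expected are assumptions made for illustration only.

```r
set.seed(1)  # assumed seed, only so the example is reproducible
N <- 5       # assumed value, matching the 1:5 used in the question

# Data frame from the question, sorted by index and then by foo.
tmp <- data.frame(index = gl(2, 20), foo = rnorm(40))
tmp <- tmp[order(tmp$index, tmp$foo), ]

# Position of each row within its index group (1, 2, 3, ... per level).
pos <- ave(integer(nrow(tmp)), tmp$index, FUN = seq_along)

# Keep the first N rows of each group.
result <- tmp[pos <= N, ]

# Loop-free result compared with the explicit subset()/rbind() version.
expected <- rbind(subset(tmp, index == 1)[1:N, ],
                  subset(tmp, index == 2)[1:N, ])
stopifnot(identical(result$foo, expected$foo))
```

Note that the two approaches differ when a group has fewer than N rows: indexing with 1:N pads with NA rows, while the ave() version simply returns the rows that exist.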
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

> -----Original Message-----
> From: r-help-boun...@r-project.org
> [mailto:r-help-boun...@r-project.org] On Behalf Of Doran, Harold
> Sent: Monday, September 20, 2010 10:16 AM
> To: R-help
> Subject: [R] Sorting and subsetting
>
> Suppose I have a data frame, such as the one below:
>
> tmp <- data.frame(index = gl(2,20), foo = rnorm(40))
>
> And further assume it is sorted by index and then by the variable foo.
>
> tmp <- tmp[order(tmp$index, tmp$foo) , ]
>
> Now, I want to grab the first N rows of tmp for each index.
> In the end, what I want is the data frame 'result'
>
> tmp1 <- subset(tmp, index == 1)
> tmp2 <- subset(tmp, index == 2)
>
> tmp1 <- tmp1[1:5,]
> tmp2 <- tmp2[1:5,]
> result <- rbind(tmp1, tmp2)
>
> Does anyone see a way to subset and subsequently bind without a loop?
>
> Harold
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.