Very nice! This almost duplicates the SAS first.var and last.var ability to choose the first and last observations by group(s). Substituting the head function where Marc has the tail function below will adapt it to the first n. It is more flexible than the SAS approach because it can select the first/last n rather than just the single first or last.
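A minimal sketch of that head() substitution, for anyone who wants to try it. The data frame is rebuilt by hand from the output printed in Marc's message quoted below, so its construction is an assumption. The name first_two is just illustrative.

```r
# Example data, retyped from the 'score' printout in Marc's message
score <- data.frame(
  id      = c(1, 1, 1, 2, 3, 3),
  reading = c(65, 70, 88, NA, 90, NA),
  math    = c(80, 75, 70, 65, 65, 70)
)

# Same expression as Marc's, with head() in place of tail():
# take the first 2 row names within each id, then subset by them
first_two <- score[unlist(tapply(rownames(score), score$id, head, 2)), ]

# Keeps rows 1, 2 (id 1), 4 (id 2), 5, 6 (id 3)
print(first_two)
```

Group 2 has only one row, so head() simply returns that row rather than failing.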
Let's say we want to choose the last observation in a county, and counties have duplicate names in different states. You could sort by state, then county, then use only county where Marc uses score$id in his last example below, and it would get the last record for *every* county regardless of duplicates. Does this sound correct?

That's a handy bit of code!

Cheers,
Bob

=========================================================
Bob Muenchen (pronounced Min'-chen), Manager
Statistical Consulting Center
U of TN Office of Information Technology
200 Stokely Management Center, Knoxville, TN 37996-0520
Voice: (865) 974-5230
FAX: (865) 974-4810
Email: [EMAIL PROTECTED]
Web: http://oit.utk.edu/scc,
News: http://listserv.utk.edu/archives/statnews.html
=========================================================

> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:r-help-[EMAIL PROTECTED] On Behalf Of Marc Schwartz
> Sent: Tuesday, March 20, 2007 10:59 AM
> To: Lauri Nikkinen
> Cc: [email protected]
> Subject: Re: [R] Select the last two rows by id group
>
> On Tue, 2007-03-20 at 16:33 +0200, Lauri Nikkinen wrote:
> > Hi R-users,
> >
> > Following this post
> > http://tolstoy.newcastle.edu.au/R/help/06/06/28965.html ,
> > how do I get last two rows (or six or ten) by id group out of the
> > data frame? Here the example gives just the last row.
> >
> > Sincere thanks,
> > Lauri
>
> A slight modification to Gabor's solution:
>
> > score
>   id reading math
> 1  1      65   80
> 2  1      70   75
> 3  1      88   70
> 4  2      NA   65
> 5  3      90   65
> 6  3      NA   70
>
> # Return the last '2' rows
> # Note the addition of unlist()
>
> > score[unlist(tapply(rownames(score), score$id, tail, 2)), ]
>   id reading math
> 2  1      70   75
> 3  1      88   70
> 4  2      NA   65
> 5  3      90   65
> 6  3      NA   70
>
> Note that when tail() returns more than one value, tapply() will create
> a list rather than a vector:
>
> > tapply(rownames(score), score$id, tail, 2)
> $`1`
> [1] "2" "3"
>
> $`2`
> [1] "4"
>
> $`3`
> [1] "5" "6"
>
> Thus, we need to unlist() the indices to use them in the subsetting
> process that Gabor used in his solution.
>
> Another alternative, if the rownames do not correspond to the sequential
> row indices as they do in this example:
>
> > do.call("rbind", lapply(split(score, score$id), tail, 2))
>     id reading math
> 1.2  1      70   75
> 1.3  1      88   70
> 2    2      NA   65
> 3.5  3      90   65
> 3.6  3      NA   70
>
> This uses split() to create a list of data frames from score, where each
> data frame is 'split' by the 'id' column values. tail() is then applied
> to each data frame using lapply(), the results of which are then
> rbind()ed back to a single data frame.
>
> HTH,
>
> Marc Schwartz
>
> ______________________________________________
> [email protected] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
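On the county question raised at the top of this message, a small check on made-up data may help. Unlike SAS BY-group processing, split() and tapply() group by value rather than by runs in the sort order, so grouping on county alone would appear to pool same-named counties from different states. The sketch below compares that with grouping on state and county together. The data frame and the names counties, by_county and by_both are hypothetical, invented only for illustration.

```r
# Hypothetical data: a "Knox" county exists in two states
counties <- data.frame(
  state  = c("IL", "IL", "TN", "TN", "TN"),
  county = c("Knox", "Knox", "Blount", "Knox", "Knox"),
  value  = c(10, 20, 30, 40, 50),
  stringsAsFactors = FALSE
)

# Sort by state, then county
counties <- counties[order(counties$state, counties$county), ]

# Grouping on county alone: both "Knox" counties fall into one group,
# so only one "Knox" row comes back
by_county <- do.call("rbind",
                     lapply(split(counties, counties$county), tail, 1))

# Grouping on state and county together keeps them separate;
# drop = TRUE discards state/county combinations that have no rows
by_both <- do.call("rbind",
                   lapply(split(counties,
                                list(counties$state, counties$county),
                                drop = TRUE),
                          tail, 1))

print(by_county)  # 2 rows: Blount, Knox
print(by_both)    # 3 rows: one per state/county pair
```

If that reading is right, passing both columns as the grouping list (or a pasted state/county key) is what gives the last record for every county, and the sort is then needed only to decide which record counts as "last" within each group.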
