Marc, thanks for so many great variations! I especially like: tail(sort(table(DF$County)))
I often have frequency tables that are of interest only towards the end. Cheers, Bob ========================================================= Bob Muenchen (pronounced Min'-chen), Manager Statistical Consulting Center U of TN Office of Information Technology 200 Stokely Management Center, Knoxville, TN 37996-0520 Voice: (865) 974-5230 FAX: (865) 974-4810 Email: [EMAIL PROTECTED] Web: http://oit.utk.edu/scc, News: http://listserv.utk.edu/archives/statnews.html ========================================================= > -----Original Message----- > From: Marc Schwartz [mailto:[EMAIL PROTECTED] > Sent: Tuesday, March 20, 2007 8:58 PM > To: Muenchen, Robert A (Bob) > Cc: [email protected] > Subject: Re: [R] Select the last two rows by id group > > On Tue, 2007-03-20 at 11:53 -0400, Muenchen, Robert A (Bob) wrote: > > Very nice! This is almost duplicates the SAS first.var and last.var > > ability to choose the first and last observations by group(s). > > Substituting the head function in where Marc has the tail function > below > > will adapt it to the first n. It is more flexible than the SAS > approach > > because it can do the first/last n rather than just the single first > or > > last. > > > > Let's say we want to choose the last observation in a county, and > > counties have duplicate names in different states. You could sort by > > state, then county, then use only county where Marc uses score$id in > his > > last example below, and it would get the last record for *every* > county > > regardless of duplicates. Does this sound correct? > > > > That's a handy bit of code! > > > > Cheers, > > Bob > > Bob, > > You can test it using data here: > > DF <- read.csv("http://www.nws.noaa.gov/nwr/SameCode.txt", > header = FALSE) > > colnames(DF) <- c("Code", "County", "State") > > > str(DF) > 'data.frame': 3288 obs. of 3 variables: > $ Code : int 1001 1003 1005 1007 1009 1011 1013 1015 1017 1019 ... > $ County: Factor w/ 1996 levels "Abbeville","Acadia",..: 97 105 116 > 169 186 249 259 272 326 348 ... > $ State : Factor w/ 60 levels "AK","AL","AR",..: 2 2 2 2 2 2 2 2 2 2 > ... > > > The data is already sorted by State and then County. > > > > system.time(DF.tail <- do.call("rbind", lapply(split(DF, DF$County), > tail, 1))) > [1] 6.851 0.085 7.085 0.000 0.000 > > > > str(DF.tail) > 'data.frame': 1996 obs. of 3 variables: > $ Code : int 45001 22001 16001 40001 55001 50001 72001 72003 72005 > 72007 ... > $ County: Factor w/ 1996 levels "Abbeville","Acadia",..: 1 2 3 4 5 6 7 > 8 9 10 ... > $ State : Factor w/ 60 levels "AK","AL","AR",..: 48 22 17 42 58 56 45 > 45 45 45 ... > > > # How many unique county names in the source dataset? > > > length(unique(DF$County)) > [1] 1996 > > > # Are they all the same unique counties? > > > all(DF.tail$County == sort(unique(DF$County))) > [1] TRUE > > > It is curious to see just how many duplicates there are. For example: > > > tail(sort(table(DF$County))) > > Madison Jackson Lincoln Franklin Jefferson Washington > 20 24 24 25 26 31 > > > > subset(DF, County == "Washington") > Code County State > 65 1129 Washington AL > 181 5143 Washington AR > 304 8121 Washington CO > 385 12133 Washington FL > 535 13303 Washington GA > 593 16087 Washington ID > 688 17189 Washington IL > 783 18175 Washington IN > 879 19183 Washington IA > 987 20201 Washington KS > 1106 21229 Washington KY > 1167 22117 Washington LA > 1189 23029 Washington ME > 1211 24043 Washington MD > 1393 27163 Washington MN > 1474 28151 Washington MS > 1590 29221 Washington MO > 1740 31177 Washington NE > 1883 36115 Washington NY > 1981 37187 Washington NC > 2124 39167 Washington OH > 2202 40147 Washington OK > 2239 41067 Washington OR > 2304 42125 Washington PA > 2313 44009 Washington RI > 2515 47179 Washington TN > 2759 48477 Washington TX > 2800 49053 Washington UT > 2814 50023 Washington VT > 2904 51191 Washington VA > 3108 55131 Washington WI > > > # The last state with Washington County (my neighbors, the > "Cheeseheads") was in the result set > > > subset(DF.tail, County == "Washington") > Code County State > Washington 55131 Washington WI > > > > > subset(DF, County == "Allen") > Code County State > 697 18003 Allen IN > 887 20001 Allen KS > 993 21003 Allen KY > 1113 22003 Allen LA > 2042 39003 Allen OH > > > # The last state with Allen County (OH) was in the result set > > > subset(DF.tail, County == "Allen") > Code County State > Allen 39003 Allen OH > > > Just noticed a Big Ten theme there...Go Gophers! ;-) > > > So, it would seem that your hypothesis is correct, at least in this > limited testing. I would want to validate it more rigorously of > course. > > HTH, > > Marc Schwartz > ______________________________________________ [email protected] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
