Re: [R] by group problem
Perfect, except for one little bit... topN.2 is missing one comma. It should read as follows:

topN.2 <- function(data, n=5) data[order(data[,3], decreasing=T),][1:n,]

Thank you very much.

cn

From: Petr PIKAL [mailto:[EMAIL PROTECTED]
Sent: Mon 9/3/2007 3:51 AM
To: Cory Nissen
Cc: r-help@stat.math.ethz.ch
Subject: RE: [R] by group problem

Hi

now I understand better what you want

topN.2 <- function(data, n=5) data[order(data[,3], decreasing=T),][1:n]
# I presume data is a data frame with 3 columns and the third is percent
lapply(split(data, data$state), topN.2)

Regards
Petr
[EMAIL PROTECTED]

Cory Nissen [EMAIL PROTECTED] wrote on 31.08.2007 17:21:01:

That didn't work for me... Here's some data to help with a solution.

data <- NULL
data$state <- c(rep("Illinois", 10), rep("Wisconsin", 10))
data$county <- c("Adams", "Brown", "Bureau", "Cass", "Champaign",
                 "Christian", "Coles", "De Witt", "Douglas", "Edgar",
                 "Adams", "Ashland", "Barron", "Bayfield", "Buffalo",
                 "Burnett", "Chippewa", "Clark", "Columbia", "Crawford")
data$percentOld <- c(17.554849, 16.826594, 18.196593, 17.139242, 8.743823,
                     17.862746, 13.747967, 16.626302, 15.258940, 18.984435,
                     19.347022, 17.814436, 16.903067, 17.632781, 16.659305,
                     20.337817, 14.293354, 17.252820, 15.647179, 16.825596)

I want to return something like this...

$Illinois
Edgar 18.984435
Bureau 18.196593
...
$Wisconsin
Burnett 20.33782
Adams 19.34702
...

My solution gives...

topN <- function(column, n=5) {
  column <- sort(column, decreasing=T)
  return(column[1:n])
}
tapply(data$percentOld, data$state, topN)

$Illinois
[1] 18.98444 18.19659 17.86275 17.55485 17.13924
$Wisconsin
[1] 20.33782 19.34702 17.81444 17.63278 17.25282

I get an error with this try...

aggregate(data$percentOld, list(data$state, data$county), topN)
Error in aggregate.data.frame(as.data.frame(x), ...) :
  'FUN' must always return a scalar

Thanks
cn

From: Petr PIKAL [mailto:[EMAIL PROTECTED]
Sent: Fri 8/31/2007 8:15 AM
To: Cory Nissen
Cc: r-help@stat.math.ethz.ch
Subject: Odp: [R] by group problem

Hi

I am working with census data. My columns of interest are...

PercentOld - the percentage of people in each county that are over 65
County - the county in each state
State - the state in the US

There are about 3100 rows, with each row corresponding to a county within a state. I want to return the top five PercentOld by state. But I want the County and the Value. I tried this...

topN <- function(column, n=5) {
  column <- sort(column, decreasing=T)
  return(column[1:n])
}
top5PerState <- tapply(data$percentOld, data$STATE, topN)

Try

aggregate(data$PercentOld, list(data$State, data$County), topN)

Regards
Petr

But this only returns the value for percentOld per state, I also want the corresponding County. I think I'm close, but I just can't get it...

Thanks
cn

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
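For anyone who wants to run the corrected version end to end, here is a minimal sketch. It assumes the sample data from the thread is stored as a data frame (called `census` here, a name not used in the thread) rather than built up from `data <- NULL`, and it writes `TRUE` instead of `T`.

```r
# Sample data from the thread, built directly as a data frame.
# The name `census` and the data.frame() construction are assumptions.
census <- data.frame(
  state = rep(c("Illinois", "Wisconsin"), each = 10),
  county = c("Adams", "Brown", "Bureau", "Cass", "Champaign",
             "Christian", "Coles", "De Witt", "Douglas", "Edgar",
             "Adams", "Ashland", "Barron", "Bayfield", "Buffalo",
             "Burnett", "Chippewa", "Clark", "Columbia", "Crawford"),
  percentOld = c(17.554849, 16.826594, 18.196593, 17.139242, 8.743823,
                 17.862746, 13.747967, 16.626302, 15.258940, 18.984435,
                 19.347022, 17.814436, 16.903067, 17.632781, 16.659305,
                 20.337817, 14.293354, 17.252820, 15.647179, 16.825596),
  stringsAsFactors = FALSE
)

# Corrected topN.2: sort rows by the third column (descending),
# then keep the first n rows. The trailing comma selects rows, not columns.
topN.2 <- function(data, n = 5) data[order(data[, 3], decreasing = TRUE), ][1:n, ]

# One data frame per state, each holding its top five counties.
top5 <- lapply(split(census, census$state), topN.2)
print(top5$Illinois[, c("county", "percentOld")])
```

One thing to keep in mind: `[1:n, ]` pads the result with `NA` rows when a state has fewer than `n` counties, so `head(..., n)` may be a safer choice on the full census data.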
Re: [R] by group problem
Hi

now I understand better what you want

topN.2 <- function(data, n=5) data[order(data[,3], decreasing=T),][1:n]
# I presume data is a data frame with 3 columns and the third is percent
lapply(split(data, data$state), topN.2)

Regards
Petr
[EMAIL PROTECTED]
Re: [R] by group problem
Perhaps you want this?

data <- NULL
data$state <- c(rep("Illinois", 10), rep("Wisconsin", 10))
data$county <- c("Adams", "Brown", "Bureau", "Cass", "Champaign",
                 "Christian", "Coles", "De Witt", "Douglas", "Edgar",
                 "Adams", "Ashland", "Barron", "Bayfield", "Buffalo",
                 "Burnett", "Chippewa", "Clark", "Columbia", "Crawford")
data$percentOld <- c(17.554849, 16.826594, 18.196593, 17.139242, 8.743823,
                     17.862746, 13.747967, 16.626302, 15.258940, 18.984435,
                     19.347022, 17.814436, 16.903067, 17.632781, 16.659305,
                     20.337817, 14.293354, 17.252820, 15.647179, 16.825596)
data <- data.frame(data, stringsAsFactors=FALSE)
# rank within each state; negating percentOld makes the largest value rank 1
rankWithinState <- unlist(tapply(-data$percentOld, data$state, rank))
names(rankWithinState) <- NULL
data <- data.frame(data, rankWithinState)
# keep the five highest counties per state, then sort for display
highCounties <- data[data$rankWithinState <= 5,]
highCountiesSorted <- highCounties[order(highCounties$state, -highCounties$percentOld),]

--
Erich Neuwirth, University of Vienna
Faculty of Computer Science
Computer Supported Didactics Working Group
Visit our SunSITE at http://sunsite.univie.ac.at
Phone: +43-1-4277-39464 Fax: +43-1-4277-39459
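The `unlist(tapply(...))` step above lines the ranks up with the rows because the sample data is already grouped by state. As a sketch of the same ranking idea that does not depend on row order, `ave()` returns its result in the original row order. The data frame name `census` and the abbreviated, deliberately interleaved subset of the thread's rows are assumptions made for illustration.

```r
# Six rows taken from the thread's sample data, interleaved on purpose.
# The name `census` and this abbreviated subset are assumptions.
census <- data.frame(
  state = c("Wisconsin", "Illinois", "Wisconsin",
            "Illinois", "Wisconsin", "Illinois"),
  county = c("Burnett", "Edgar", "Chippewa",
             "Champaign", "Adams", "Bureau"),
  percentOld = c(20.337817, 18.984435, 14.293354,
                 8.743823, 19.347022, 18.196593),
  stringsAsFactors = FALSE
)

# Rank within each state; negating makes the largest value rank 1.
# ave() hands back one rank per row, in the original row order.
census$rankWithinState <- ave(-census$percentOld, census$state, FUN = rank)

# Keep the top two per state, then sort by state and descending percentOld.
high <- census[census$rankWithinState <= 2, ]
highSorted <- high[order(high$state, -high$percentOld), ]
print(highSorted)
```

With ties, `rank` assigns averaged ranks by default, so a cutoff such as `<= 2` can return more or fewer rows than expected; `ties.method` can be passed through `ave()` if that matters.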
Re: [R] by group problem
See the examples labelled head in the examples section near the bottom of:

http://sqldf.googlecode.com/svn/trunk/man/sqldf.Rd

These show how to do it using order as well as using SQL via sqldf.
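The sqldf examples themselves are not reproduced here. As a rough base-R sketch of the `order` plus `head` idea (written for this thread's data, not copied from the sqldf documentation), one could sort once and then take the first rows of each state. The data frame name `census` and the abbreviated subset of rows are assumptions.

```r
# Six rows taken from the thread's sample data.
# The name `census` and this abbreviated subset are assumptions.
census <- data.frame(
  state = c("Illinois", "Illinois", "Illinois",
            "Wisconsin", "Wisconsin", "Wisconsin"),
  county = c("Bureau", "Champaign", "Edgar",
             "Adams", "Burnett", "Chippewa"),
  percentOld = c(18.196593, 8.743823, 18.984435,
                 19.347022, 20.337817, 14.293354),
  stringsAsFactors = FALSE
)

# Sort by state, then by descending percentOld within each state.
sorted <- census[order(census$state, -census$percentOld), ]

# head() keeps at most n rows per state, so small groups are not padded with NA.
top2 <- do.call(rbind, lapply(split(sorted, sorted$state), head, n = 2))
rownames(top2) <- NULL
print(top2)
```

Binding the pieces back together with `rbind` gives a single data frame, which is often easier to work with afterwards than a list with one element per state.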
Re: [R] by group problem
That didn't work for me... Here's some data to help with a solution.

data <- NULL
data$state <- c(rep("Illinois", 10), rep("Wisconsin", 10))
data$county <- c("Adams", "Brown", "Bureau", "Cass", "Champaign",
                 "Christian", "Coles", "De Witt", "Douglas", "Edgar",
                 "Adams", "Ashland", "Barron", "Bayfield", "Buffalo",
                 "Burnett", "Chippewa", "Clark", "Columbia", "Crawford")
data$percentOld <- c(17.554849, 16.826594, 18.196593, 17.139242, 8.743823,
                     17.862746, 13.747967, 16.626302, 15.258940, 18.984435,
                     19.347022, 17.814436, 16.903067, 17.632781, 16.659305,
                     20.337817, 14.293354, 17.252820, 15.647179, 16.825596)

I want to return something like this...

$Illinois
Edgar 18.984435
Bureau 18.196593
...
$Wisconsin
Burnett 20.33782
Adams 19.34702
...

My solution gives...

topN <- function(column, n=5) {
  column <- sort(column, decreasing=T)
  return(column[1:n])
}
tapply(data$percentOld, data$state, topN)

$Illinois
[1] 18.98444 18.19659 17.86275 17.55485 17.13924
$Wisconsin
[1] 20.33782 19.34702 17.81444 17.63278 17.25282

I get an error with this try...

aggregate(data$percentOld, list(data$state, data$county), topN)
Error in aggregate.data.frame(as.data.frame(x), ...) :
  'FUN' must always return a scalar

Thanks
cn
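One small change to the `topN` approach keeps the county labels: `sort()` preserves element names, so naming the values by county before calling `tapply()` carries the names through to the result. A minimal sketch, assuming the columns live in a data frame called `census` (a name not used in the thread) and using six of the sample rows:

```r
# Six rows taken from the thread's sample data.
# The name `census` and this abbreviated subset are assumptions.
census <- data.frame(
  state = c("Illinois", "Illinois", "Illinois",
            "Wisconsin", "Wisconsin", "Wisconsin"),
  county = c("Bureau", "Champaign", "Edgar",
             "Adams", "Burnett", "Chippewa"),
  percentOld = c(18.196593, 8.743823, 18.984435,
                 19.347022, 20.337817, 14.293354),
  stringsAsFactors = FALSE
)

# Same idea as topN in the thread, but never asks for more elements than exist.
topN <- function(column, n = 5) {
  column <- sort(column, decreasing = TRUE)
  column[seq_len(min(n, length(column)))]
}

# Attach county names to the values so they travel with them through tapply().
named <- setNames(census$percentOld, census$county)
top2 <- tapply(named, census$state, topN, n = 2)
print(top2[["Illinois"]])
print(top2[["Wisconsin"]])
```

Each element of the result is a named numeric vector, which is close to the `$Illinois Edgar 18.984435 Bureau 18.196593 ...` layout asked for above.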
Re: [R] by group problem
Hi,

try this:

by(data$percentOld, list(data$state, data$county), FUN=topN)

Is this what you want?

--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
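In the sample data each state/county pair occurs only once, so grouping by both factors hands `topN` a single value per cell. A sketch of a `by()` call that groups by state only and passes whole rows, so the county column stays attached, could look like the following. The data frame `census`, the helper `topRows`, and the abbreviated data are assumptions made for illustration and do not come from the thread.

```r
# Six rows taken from the thread's sample data.
# The name `census` and this abbreviated subset are assumptions.
census <- data.frame(
  state = c("Illinois", "Illinois", "Illinois",
            "Wisconsin", "Wisconsin", "Wisconsin"),
  county = c("Bureau", "Champaign", "Edgar",
             "Adams", "Burnett", "Chippewa"),
  percentOld = c(18.196593, 8.743823, 18.984435,
                 19.347022, 20.337817, 14.293354),
  stringsAsFactors = FALSE
)

# Hypothetical helper: the n rows of d with the largest percentOld.
topRows <- function(d, n = 5) {
  head(d[order(d$percentOld, decreasing = TRUE), ], n)
}

# by() splits the data frame by state and applies topRows to each piece.
top2 <- by(census, census$state, topRows, n = 2)
print(top2)
```

Individual states can then be pulled out with, for example, `top2[["Wisconsin"]]`, which is an ordinary data frame with the county and value side by side.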