Re: [R] by group problem

2007-09-04 Thread Cory Nissen
Perfect, except for one little bit...
 
topN.2 is missing one comma...
 
It should read as follows
 
topN.2 <- function(data,n=5) data[order(data[,3], decreasing=T),][1:n,]
 
Thank you very much.
 
cn
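
The comma matters because a data frame indexed with a single argument selects columns, while `[rows, ]` selects rows. A minimal sketch, assuming a small made-up data frame (the values and the names `df`, `sorted`, `first_rows`, `first_cols` and `no_comma` are illustrative only, not from the census data):

```r
# Hypothetical three-column data frame standing in for the census data
df <- data.frame(state = rep("Illinois", 6),
                 county = c("Adams", "Brown", "Bureau", "Cass", "Coles", "Edgar"),
                 percentOld = c(17.6, 16.8, 18.2, 17.1, 13.7, 19.0),
                 stringsAsFactors = FALSE)

# Sort by the third column, largest first
sorted <- df[order(df[, 3], decreasing = TRUE), ]

first_rows <- sorted[1:5, ]  # with the comma: the first five ROWS
first_cols <- sorted[1:2]    # without the comma: the first two COLUMNS

# With only three columns, asking for columns 1:5 is an error;
# tryCatch() captures the message instead of stopping the script
no_comma <- tryCatch(sorted[1:5], error = function(e) conditionMessage(e))
```

So with n=5 and a three-column data frame, the version without the comma fails instead of returning the top five rows.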





__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] by group problem

2007-09-03 Thread Petr PIKAL
Hi

Now I understand better what you want:

topN.2 <- function(data,n=5) data[order(data[,3], decreasing=T),][1:n]

# I presume data is a data frame with 3 columns and the third is the percentage

lapply(split(data,data$state), topN.2)

Regards

Petr

[EMAIL PROTECTED]
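
For reference, the split/lapply idea can be run end to end on the sample data posted in this thread. This is only a sketch under two assumptions of mine, not Petr's: the helper is renamed `topN.3` (a hypothetical name), and it uses `head()` rather than `[1:n, ]`, so that a state with fewer than n counties does not get padded with NA rows.

```r
# Sample data from the thread, built directly as a data frame
data <- data.frame(
  state = c(rep("Illinois", 10), rep("Wisconsin", 10)),
  county = c("Adams", "Brown", "Bureau", "Cass", "Champaign",
             "Christian", "Coles", "De Witt", "Douglas", "Edgar",
             "Adams", "Ashland", "Barron", "Bayfield", "Buffalo",
             "Burnett", "Chippewa", "Clark", "Columbia", "Crawford"),
  percentOld = c(17.554849, 16.826594, 18.196593, 17.139242,  8.743823,
                 17.862746, 13.747967, 16.626302, 15.258940, 18.984435,
                 19.347022, 17.814436, 16.903067, 17.632781, 16.659305,
                 20.337817, 14.293354, 17.252820, 15.647179, 16.825596),
  stringsAsFactors = FALSE)

# topN.3 is a hypothetical variant: sort rows by the third column,
# largest first, then keep at most n rows
topN.3 <- function(d, n = 5) head(d[order(d[, 3], decreasing = TRUE), ], n)

# One data frame per state, each holding state, county and percentOld
top5 <- lapply(split(data, data$state), topN.3)
```

`top5$Illinois` and `top5$Wisconsin` then carry the county next to each value, which is what the original question asked for.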



Re: [R] by group problem

2007-09-01 Thread Erich Neuwirth
Perhaps you want this?

data <- NULL
data$state <- c(rep("Illinois", 10), rep("Wisconsin", 10))
data$county <- c("Adams", "Brown", "Bureau", "Cass", "Champaign",
 "Christian", "Coles", "De Witt", "Douglas", "Edgar",
 "Adams", "Ashland", "Barron", "Bayfield", "Buffalo",
 "Burnett", "Chippewa", "Clark", "Columbia", "Crawford")
data$percentOld <- c(17.554849, 16.826594, 18.196593, 17.139242,  8.743823,
 17.862746, 13.747967, 16.626302, 15.258940, 18.984435,
 19.347022, 17.814436, 16.903067, 17.632781, 16.659305,
 20.337817, 14.293354, 17.252820, 15.647179, 16.825596)

# Turn the list into a data frame, keeping the names as character strings
data <- data.frame(data, stringsAsFactors=FALSE)
# Rank within each state; negating the values makes rank 1 the largest percentage
rankWithinState <- unlist(tapply(-data$percentOld, data$state, rank))
names(rankWithinState) <- NULL
data <- data.frame(data, rankWithinState)
# Keep the five highest-ranked counties per state, then sort for display
highCounties <- data[data$rankWithinState <= 5,]
highCountiesSorted <- highCounties[order(highCounties$state, -highCounties$percentOld),]
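
One caveat with the ranking step above: `unlist(tapply(...))` returns the ranks grouped by state in sorted order, so it only lines up with the rows when the data is already ordered that way (as the sample data is). A possible variant, assuming base R's `ave()` is acceptable here, keeps the ranks aligned with the rows in any order. The name `shuffled` is mine, introduced just to show that row order no longer matters.

```r
data <- data.frame(
  state = c(rep("Illinois", 10), rep("Wisconsin", 10)),
  county = c("Adams", "Brown", "Bureau", "Cass", "Champaign",
             "Christian", "Coles", "De Witt", "Douglas", "Edgar",
             "Adams", "Ashland", "Barron", "Bayfield", "Buffalo",
             "Burnett", "Chippewa", "Clark", "Columbia", "Crawford"),
  percentOld = c(17.554849, 16.826594, 18.196593, 17.139242,  8.743823,
                 17.862746, 13.747967, 16.626302, 15.258940, 18.984435,
                 19.347022, 17.814436, 16.903067, 17.632781, 16.659305,
                 20.337817, 14.293354, 17.252820, 15.647179, 16.825596),
  stringsAsFactors = FALSE)

# Interleave the two states by sorting on county name (illustration only)
shuffled <- data[order(data$county), ]

# ave() applies rank() within each state and returns one value per row,
# in the same order as the rows themselves
shuffled$rankWithinState <- ave(-shuffled$percentOld, shuffled$state, FUN = rank)

highCounties <- shuffled[shuffled$rankWithinState <= 5, ]
highCountiesSorted <- highCounties[order(highCounties$state,
                                         -highCounties$percentOld), ]
```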

Cory Nissen wrote:
> I am working with census data.  My columns of interest are...
>
> PercentOld - the percentage of people in each county that are over 65
> County - the county in each state
> State - the state in the US
-- 
Erich Neuwirth, University of Vienna
Faculty of Computer Science
Computer Supported Didactics Working Group
Visit our SunSITE at http://sunsite.univie.ac.at
Phone: +43-1-4277-39464 Fax: +43-1-4277-39459



Re: [R] by group problem

2007-08-31 Thread Gabor Grothendieck
See the examples labelled head in the examples section near the bottom of:

http://sqldf.googlecode.com/svn/trunk/man/sqldf.Rd

These show how to do it using order as well as using SQL via sqldf.
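
The following is not taken from the sqldf help page; it is only a rough base R sketch of the order-then-head idea, assuming the sample data posted elsewhere in this thread and assuming a single combined data frame is the wanted result. The names `sorted` and `top5` are illustrative.

```r
data <- data.frame(
  state = c(rep("Illinois", 10), rep("Wisconsin", 10)),
  county = c("Adams", "Brown", "Bureau", "Cass", "Champaign",
             "Christian", "Coles", "De Witt", "Douglas", "Edgar",
             "Adams", "Ashland", "Barron", "Bayfield", "Buffalo",
             "Burnett", "Chippewa", "Clark", "Columbia", "Crawford"),
  percentOld = c(17.554849, 16.826594, 18.196593, 17.139242,  8.743823,
                 17.862746, 13.747967, 16.626302, 15.258940, 18.984435,
                 19.347022, 17.814436, 16.903067, 17.632781, 16.659305,
                 20.337817, 14.293354, 17.252820, 15.647179, 16.825596),
  stringsAsFactors = FALSE)

# Sort once: by state, then by percentOld from largest to smallest
sorted <- data[order(data$state, -data$percentOld), ]

# Take the first five rows of each state and stack them into one data frame
top5 <- do.call(rbind, lapply(split(sorted, sorted$state), head, n = 5))
rownames(top5) <- NULL
```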

On 8/31/07, Cory Nissen [EMAIL PROTECTED] wrote:
> I am working with census data.  My columns of interest are...
>
> PercentOld - the percentage of people in each county that are over 65
> County - the county in each state
> State - the state in the US
>
> There are about 3100 rows, with each row corresponding to a county within a
> state.
>
> I want to return the top five PercentOld by state.  But I want the County
> and the Value.
>
> I tried this...
>
> topN <- function(column, n=5)
>   {
>     column <- sort(column, decreasing=T)
>     return(column[1:n])
>   }
> top5PerState <- tapply(data$percentOld, data$STATE, topN)
>
> But this only returns the value for percentOld per state, I also want the
> corresponding County.
>
> I think I'm close, but I just can't get it...
>
> Thanks
>
> cn



Re: [R] by group problem

2007-08-31 Thread Cory Nissen
That didn't work for me...
 
Here's some data to help with a solution.
 
data <- NULL
data$state <- c(rep("Illinois", 10), rep("Wisconsin", 10))
data$county <- c("Adams", "Brown", "Bureau", "Cass", "Champaign",
 "Christian", "Coles", "De Witt", "Douglas", "Edgar",
 "Adams", "Ashland", "Barron", "Bayfield", "Buffalo",
 "Burnett", "Chippewa", "Clark", "Columbia", "Crawford")
data$percentOld <- c(17.554849, 16.826594, 18.196593, 17.139242,  8.743823,
 17.862746, 13.747967, 16.626302, 15.258940, 18.984435,
 19.347022, 17.814436, 16.903067, 17.632781, 16.659305,
 20.337817, 14.293354, 17.252820, 15.647179, 16.825596)

I would like it to return something like this...
$Illinois
Edgar
18.984435
Bureau
18.196593
...
$Wisconsin
Burnett
20.33782
Adams
19.34702
...
 
My Solution gives...
topN <- function(column, n=5)
  {
    column <- sort(column, decreasing=T)
    return(column[1:n])
  }
tapply(data$percentOld, data$state, topN)
 
$Illinois
[1] 18.98444 18.19659 17.86275 17.55485 17.13924
$Wisconsin
[1] 20.33782 19.34702 17.81444 17.63278 17.25282
 
I get an error with this try...
aggregate(data$percentOld, list(data$state, data$county), topN)

Error in aggregate.data.frame(as.data.frame(x), ...) : 
 'FUN' must always return a scalar
 
Thanks
 
cn
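
The tapply() attempt above returns bare numbers because the percentOld vector carries no names, and the aggregate() attempt fails because topN returns five values per group where a single value was expected. A minimal sketch of one way to get the layout asked for, assuming it is acceptable to attach the county names to the values first (the names `named` and `top5` are mine; split() plus lapply() stands in for tapply()):

```r
state <- c(rep("Illinois", 10), rep("Wisconsin", 10))
county <- c("Adams", "Brown", "Bureau", "Cass", "Champaign",
            "Christian", "Coles", "De Witt", "Douglas", "Edgar",
            "Adams", "Ashland", "Barron", "Bayfield", "Buffalo",
            "Burnett", "Chippewa", "Clark", "Columbia", "Crawford")
percentOld <- c(17.554849, 16.826594, 18.196593, 17.139242,  8.743823,
                17.862746, 13.747967, 16.626302, 15.258940, 18.984435,
                19.347022, 17.814436, 16.903067, 17.632781, 16.659305,
                20.337817, 14.293354, 17.252820, 15.647179, 16.825596)

# Same function as in the message above
topN <- function(column, n = 5) {
  column <- sort(column, decreasing = TRUE)
  column[1:n]
}

# Label each value with its county; split() and sort() both keep the names
named <- percentOld
names(named) <- county

# One named vector per state: county names on top, percentages below
top5 <- lapply(split(named, state), topN)
```

Printing `top5` shows each state's top five percentages labelled with their counties.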
 
 



From: Petr PIKAL [mailto:[EMAIL PROTECTED]
Sent: Fri 8/31/2007 8:15 AM
To: Cory Nissen
Cc: r-help@stat.math.ethz.ch
Subject: Odp: [R] by group problem



Hi

> I am working with census data.  My columns of interest are...
>
> PercentOld - the percentage of people in each county that are over 65
> County - the county in each state
> State - the state in the US
>
> There are about 3100 rows, with each row corresponding to a county
> within a state.
>
> I want to return the top five PercentOld by state.  But I want the
> County and the Value.
>
> I tried this...
>
> topN <- function(column, n=5)
>   {
>     column <- sort(column, decreasing=T)
>     return(column[1:n])
>   }
> top5PerState <- tapply(data$percentOld, data$STATE, topN)

Try

aggregate(data$PercentOld, list(data$State, data$County), topN)

Regards
Petr

> But this only returns the value for percentOld per state, I also want
> the corresponding County.
>
> I think I'm close, but I just can't get it...
>
> Thanks
>
> cn



Re: [R] by group problem

2007-08-31 Thread Henrique Dallazuanna
Hi, try this:

by(data$percentOld, list(data$state, data$county), FUN=topN)

Is this what you want?

-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
