[R] FW: variable format

2007-09-07 Thread Cory Nissen
 

Anybody?  




From: Cory Nissen
Sent: Tue 9/4/2007 9:30 AM
To: r-help@stat.math.ethz.ch
Subject: variable format


Okay, I want to do something similar to SAS proc format.

I usually do this...

a <- NULL
a$divisionOld <- c(1,2,3,4,5)
divisionTable <- matrix(c(1, "New England",
  2, "Middle Atlantic",
  3, "East North Central",
  4, "West North Central",
  5, "South Atlantic"),
ncol=2, byrow=T)
a$divisionNew[match(a$divisionOld, divisionTable[,1])] <- divisionTable[,2]

But how do I handle the case where...
a$divisionOld <- c(0,1,2,3,4,5)   #no format available for 0, this throws an error.
OR
divisionTable <- matrix(c(1, "New England",
  2, "Middle Atlantic",
  3, "East North Central",
  4, "West North Central",
  5, "South Atlantic",
  6, "East South Central",
  7, "West South Central",
  8, "Mountain",
  9, "Pacific"),
ncol=2, byrow=T)
There are extra formats available... this throws a warning.

Thanks

Cory

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] FW: variable format

2007-09-07 Thread Cory Nissen
This was what I was looking for.  I figured factor was the way to go, but I 
wasn't sure how to implement it.  
 
The car recommendation looks good too, but I want to try to stay away from 
having to download another package if I can.
 
Thanks
 
cn



From: Martin Becker [mailto:[EMAIL PROTECTED]
Sent: Fri 9/7/2007 10:55 AM
To: Cory Nissen
Cc: r-help@stat.math.ethz.ch
Subject: Re: [R] FW: variable format



Dear Cory,

I am not familiar with SAS, but is this what you are looking for?

divisionTable <- matrix(c(1, "New England",
  2, "Middle Atlantic",
  3, "East North Central",
  4, "West North Central",
  5, "South Atlantic",
  6, "East South Central",
  7, "West South Central",
  8, "Mountain",
  9, "Pacific"),
ncol=2, byrow=T)
a <- NULL
a$divisionOld <- c(0,1,2,3,4,5)
a$divisionNew <-
as.character(factor(a$divisionOld,levels=divisionTable[,1],labels=divisionTable[,2]))
a$divisionNew

[1] NA                   "New England"        "Middle Atlantic"
[4] "East North Central" "West North Central" "South Atlantic"
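
A base-R alternative to the factor() call is to index the label column of the table directly with match(). This is only a minimal sketch, assuming the same two-column lookup table as in the thread; divisionOld and divisionNew are plain vectors here (an assumption for brevity, not list components as in the original), and the repeated code 3 is added to show that the input need not be in table order.

```r
# Lookup table: column 1 holds the codes, column 2 the labels.
# matrix() stores everything as character, so the codes become "1", "2", ...
divisionTable <- matrix(c(1, "New England",
                          2, "Middle Atlantic",
                          3, "East North Central",
                          4, "West North Central",
                          5, "South Atlantic"),
                        ncol = 2, byrow = TRUE)

# Codes to translate: 0 has no entry in the table, 3 appears twice.
divisionOld <- c(0, 1, 2, 3, 4, 5, 3)

# match() returns the table row for each code (NA when there is none);
# indexing the label column with those rows gives one label per input value.
divisionNew <- divisionTable[match(divisionOld, divisionTable[, 1]), 2]
divisionNew
```

Because match() is used on the right-hand side as a row index, unmatched codes simply become NA and unused table rows are ignored.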


Kind regards,

  Martin




Re: [R] by group problem

2007-09-04 Thread Cory Nissen
Perfect, except for one little bit...
 
topN.2 is missing one comma...
 
It should read as follows
 
topN.2 <- function(data,n=5) data[order(data[,3], decreasing=T),][1:n,]
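
For reference, the corrected function can be run end to end on the sample data posted earlier in the thread. This is a sketch under one assumption not stated in the thread: the data are built as a data.frame (rather than with data <- NULL and $ assignments), with percentOld as the third column, as Petr presumed.

```r
# Sample data from the thread, assembled as a data frame (an assumption;
# the thread builds a plain list). percentOld must be the third column.
data <- data.frame(
  state = c(rep("Illinois", 10), rep("Wisconsin", 10)),
  county = c("Adams", "Brown", "Bureau", "Cass", "Champaign",
             "Christian", "Coles", "De Witt", "Douglas", "Edgar",
             "Adams", "Ashland", "Barron", "Bayfield", "Buffalo",
             "Burnett", "Chippewa", "Clark", "Columbia", "Crawford"),
  percentOld = c(17.554849, 16.826594, 18.196593, 17.139242,  8.743823,
                 17.862746, 13.747967, 16.626302, 15.258940, 18.984435,
                 19.347022, 17.814436, 16.903067, 17.632781, 16.659305,
                 20.337817, 14.293354, 17.252820, 15.647179, 16.825596),
  stringsAsFactors = FALSE
)

# Sort the rows by the third column (descending) and keep the first n rows.
topN.2 <- function(data, n = 5) data[order(data[, 3], decreasing = TRUE), ][1:n, ]

# Split into one data frame per state, then take the top rows of each.
top5 <- lapply(split(data, data$state), topN.2)
top5$Illinois[, c("county", "percentOld")]
top5$Wisconsin[, c("county", "percentOld")]
```

Note that [1:n, ] pads the result with NA rows when a state has fewer than n counties; head(sorted, n) would be one way to avoid that.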
 
Thank you very much.
 
cn




From: Petr PIKAL [mailto:[EMAIL PROTECTED]
Sent: Mon 9/3/2007 3:51 AM
To: Cory Nissen
Cc: r-help@stat.math.ethz.ch
Subject: RE: [R] by group problem



Hi

now I understand better what you want

topN.2 <- function(data,n=5) data[order(data[,3], decreasing=T),][1:n]

# I presume data is data frame with 3 columns and the third is percent

lapply(split(data,data$state), topN.2)

Regards

Petr



[R] variable format

2007-09-04 Thread Cory Nissen
Okay, I want to do something similar to SAS proc format.
 
I usually do this...
 
a <- NULL
a$divisionOld <- c(1,2,3,4,5)
divisionTable <- matrix(c(1, "New England",
  2, "Middle Atlantic",
  3, "East North Central",
  4, "West North Central",
  5, "South Atlantic"),
ncol=2, byrow=T)
a$divisionNew[match(a$divisionOld, divisionTable[,1])] <- divisionTable[,2]

But how do I handle the case where...
a$divisionOld <- c(0,1,2,3,4,5)   #no format available for 0, this throws an error.
OR
divisionTable <- matrix(c(1, "New England",
  2, "Middle Atlantic",
  3, "East North Central",
  4, "West North Central",
  5, "South Atlantic",
  6, "East South Central",
  7, "West South Central",
  8, "Mountain",
  9, "Pacific"),
ncol=2, byrow=T)
There are extra formats available... this throws a warning.
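
The two failure cases can be reproduced in isolation. The sketch below is illustrative only: the shortened three-row table and the names idx1, idx2, new1, new2, case1 and case2 are assumptions introduced for the example, and tryCatch() is used just to record which kind of condition is raised.

```r
# Shortened lookup table (illustrative): codes in column 1, labels in column 2.
# matrix() stores everything as character, so the codes become "1", "2", "3".
divisionTable <- matrix(c(1, "New England",
                          2, "Middle Atlantic",
                          3, "East North Central"),
                        ncol = 2, byrow = TRUE)

# Case 1: the code 0 has no row in the table, so match() returns NA for it,
# and an NA subscript is rejected when assigning several values at once.
idx1 <- match(c(0, 1, 2, 3), divisionTable[, 1])
new1 <- NULL
case1 <- tryCatch({ new1[idx1] <- divisionTable[, 2]; "ok" },
                  error = function(e) "error")

# Case 2: the table has more rows than there are values to recode, so the
# replacement is longer than the target and R warns about the mismatch.
idx2 <- match(c(1, 2), divisionTable[, 1])
new2 <- NULL
case2 <- tryCatch({ new2[idx2] <- divisionTable[, 2]; "ok" },
                  warning = function(w) "warning")

c(case1 = case1, case2 = case2)
```

Both problems come from using match() on the left-hand side of the assignment, where the subscript and the replacement must line up exactly.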
 
Thanks
 
Cory



[R] by group problem

2007-08-31 Thread Cory Nissen
I am working with census data.  My columns of interest are...
 
PercentOld - the percentage of people in each county that are over 65
County - the county in each state
State - the state in the US
 
There are about 3100 rows, with each row corresponding to a county within a 
state.
 
I want to return the top five PercentOld by state.  But I want the County and 
the Value.
 
I tried this...
 
topN <- function(column, n=5)
  {
    column <- sort(column, decreasing=T)
    return(column[1:n])
  }
top5PerState <- tapply(data$percentOld, data$STATE, topN)
 
But this only returns the value for percentOld per state, I also want the 
corresponding County.
 
I think I'm close, but I just can't get it...
 
Thanks
 
cn



Re: [R] by group problem

2007-08-31 Thread Cory Nissen
That didn't work for me...
 
Here's some data to help with a solution.
 
data <- NULL
data$state <- c(rep("Illinois", 10), rep("Wisconsin", 10))
data$county <- c("Adams", "Brown", "Bureau", "Cass", "Champaign",
                 "Christian", "Coles", "De Witt", "Douglas", "Edgar",
                 "Adams", "Ashland", "Barron", "Bayfield", "Buffalo",
                 "Burnett", "Chippewa", "Clark", "Columbia", "Crawford")
data$percentOld <- c(17.554849, 16.826594, 18.196593, 17.139242,  8.743823,
                     17.862746, 13.747967, 16.626302, 15.258940, 18.984435,
                     19.347022, 17.814436, 16.903067, 17.632781, 16.659305,
                     20.337817, 14.293354, 17.252820, 15.647179, 16.825596)

return something like this...
$Illinois
Edgar
18.984435
Bureau
18.196593
...
$Wisconsin
Burnett
20.33782
Adams
19.34702
...
 
My Solution gives...
topN <- function(column, n=5)
  {
    column <- sort(column, decreasing=T)
    return(column[1:n])
  }
tapply(data$percentOld, data$state, topN)
 
$Illinois
[1] 18.98444 18.19659 17.86275 17.55485 17.13924
$Wisconsin
[1] 20.33782 19.34702 17.81444 17.63278 17.25282
 
I get an error with this try...
aggregate(data$percentOld, list(data$state, data$county), topN)

Error in aggregate.data.frame(as.data.frame(x), ...) : 
 'FUN' must always return a scalar
 
Thanks
 
cn
 
 



From: Petr PIKAL [mailto:[EMAIL PROTECTED]
Sent: Fri 8/31/2007 8:15 AM
To: Cory Nissen
Cc: r-help@stat.math.ethz.ch
Subject: Odp: [R] by group problem



Hi

> I am working with census data.  My columns of interest are...
>
> PercentOld - the percentage of people in each county that are over 65
> County - the county in each state
> State - the state in the US
>
> There are about 3100 rows, with each row corresponding to a county
> within a state.
>
> I want to return the top five PercentOld by state.  But I want the
> County and the Value.
>
> I tried this...
>
> topN <- function(column, n=5)
>   {
>     column <- sort(column, decreasing=T)
>     return(column[1:n])
>   }
> top5PerState <- tapply(data$percentOld, data$STATE, topN)

Try

aggregate(data$PercentOld, list(data$State, data$County), topN)

Regards
Petr

> But this only returns the value for percentOld per state, I also want
> the corresponding County.
>
> I think I'm close, but I just can't get it...
>
> Thanks
>
> cn



[R] First elements of a list.

2006-08-29 Thread cory
Suppose I have the following list:

a <- strsplit(c("John;Smith", "Jane;Doe", "koda", "gunner"), ";")

I want to get to these two vectors without looping...

firstNames: c("John", "Jane", "koda", "gunner")
lastNames: c("Smith", "Doe", NA, NA)

Thanks

cn
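
One common way to do this without an explicit for loop is to apply the "[" extraction function to every list element with sapply(). This is offered as a sketch, not as the answer given on the list; sapply() still iterates internally, it just hides the loop.

```r
# Split each entry on ";" to get a list of character vectors.
a <- strsplit(c("John;Smith", "Jane;Doe", "koda", "gunner"), ";")

# "[" is the subsetting function; sapply() calls it on each element with
# the given index. Asking for a position that does not exist yields NA.
firstNames <- sapply(a, "[", 1)
lastNames  <- sapply(a, "[", 2)

firstNames
lastNames
```

Entries without a ";" have only one element, so their second position comes back as NA, which matches the lastNames vector asked for above.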
