A couple of follow-up questions:

1. Is there any way to modify this so that non-numeric values are ignored? (As
it is, length() seems to "count" the NA values.)

2. In order for the cbind call "x <- cbind(x, do.call("rbind", r))" to
work as intended, does the data need to be “Ordered” by State and Year?

e.g. "x <- x[order(x$State,x$Year), ]"
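
To make the ordering question concrete, here is a minimal sketch on a made-up four-row data frame (the name d and its values are hypothetical, not from my real data). As far as I can tell, by() returns one result per group, so rbind-ing them stacks the rows group by group rather than in the original row order. Sorting first, with State as the slowest-varying key, seems to keep the cbind aligned. ave() is a base R alternative that returns values in the original row order, so it should not need the sort at all.

```r
# Hypothetical toy data: rows deliberately interleaved across groups
d <- data.frame(Year   = c(2000, 2000, 2000, 2000),
                State  = c("TX", "AL", "TX", "AL"),
                Income = c(10, 40, 30, 20))

# Option 1: sort so each Year/State group is contiguous and appears in the
# order by() emits its groups (first index variable varies fastest)
d1 <- d[order(d$State, d$Year), ]
r <- by(d1$Income, d1[c("Year", "State")], function(v) {
  rk <- rank(v)
  cbind(Rank = rk, PRank = rk / length(v))
})
d1 <- cbind(d1, do.call("rbind", r))

# Option 2: ave() writes each group's result back into the original
# row positions, so no sorting is required
d$Rank <- ave(d$Income, d$Year, d$State, FUN = rank)
```

With the sort in place, each row of d1 should carry the rank of its own income within its State/Year group; option 2 gives the same ranks on the unsorted d.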

 

Here is some sample data, with non-numeric values included:

 

Year,State,Subject,Income

2000,TX,1,30776

2000,AL,1,81240

2000,TX,2,28035

2000,AL,2,35947

2000,TX,3,42010

2000,AL,3,48830

2000,TX,4,18040

2000,AL,4,77758

2000,TX,5,20771

2000,AL,5,59132

2000,TX,6,46370

2000,AL,6,45573

2000,TX,7,57256

2000,AL,7,83402

2000,TX,8,3780

2000,AL,8,90695

2000,TX,9,51745

2000,AL,9,4105

2000,TX,10,1154

2000,AL,10,96598

2001,TX,1,25767

2001,AL,1,37032

2001,TX,2,39848

2001,AL,2,69029

2001,TX,3,17142

2001,AL,3,92850

2001,TX,4,62939

2001,AL,4,82730

2001,TX,5,30708

2001,AL,5,25339

2001,TX,6,64710

2001,AL,6,44541

2001,TX,7,96699

2001,AL,7,9151

2001,TX,8,57793

2001,AL,8,20981

2001,TX,9,12523

2001,AL,9,36139

2001,TX,10,53553

2001,AL,10,3767

2002,TX,1,55232

2002,AL,1,54655

2002,TX,2,76255

2002,AL,2,53581

2002,TX,3,77030

2002,AL,3,34869

2002,TX,4,98956

2002,AL,4,60332

2002,TX,5,33052

2002,AL,5,12348

2002,TX,6,96057

2002,AL,6,24509

2002,TX,7,66177

2002,AL,7,45952

2002,TX,8,73331

2002,AL,8,35813

2002,TX,9,3014

2002,AL,9,57097

2002,TX,10,83657

2002,AL,10,91640

2003,TX,1,5638

2003,AL,1,17026

2003,TX,2,66902

2003,AL,2,71080

2003,TX,3,88195

2003,AL,3,95415

2003,TX,4,13028

2003,AL,4,49123

2003,TX,5,19867

2003,AL,5,22990

2003,TX,6,67639

2003,AL,6,69435

2003,TX,7,62469

2003,AL,7,59939

2003,TX,8,24874

2003,AL,8,44829

2003,TX,9,77180

2003,AL,9,68488

2003,TX,10,80686

2003,AL,10,72622

2004,TX,1,46854

2004,AL,1,62499

2004,TX,2,20461

2004,AL,2,53834

2004,TX,3,54909

2004,AL,3,69527

2004,TX,4,33066

2004,AL,4,78035

2004,TX,5,23569

2004,AL,5,59757

2004,TX,6,44514

2004,AL,6,41223

2004,TX,7,85665

2004,AL,7,91972

2004,TX,8,30073

2004,AL,8,90642

2004,TX,9,32741

2004,AL,9,97111

2004,TX,10,8093

2004,AL,10,20077

2005,TX,1,48377

2005,AL,1,88216

2005,TX,2,35752

2005,AL,2,74897

2005,TX,3,27772

2005,AL,3,88945

2005,TX,4,86512

2005,AL,4,88422

2005,TX,5,27488

2005,AL,5,21140

2005,TX,6,35777

2005,AL,6,32772

2005,TX,7,77477

2005,AL,7,98282

2005,TX,8,73346

2005,AL,8,38943

2005,TX,9,38947

2005,AL,9,70195

2005,TX,10,23890

2005,AL,10,84020

2000,TX,11,na

2005,AL,11,null
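
For the non-numeric values, the kind of result I am after can be sketched as follows. This is only a sketch under assumptions: it uses a few rows copied from the sample above, treats the literal strings "na" and "null" as missing when reading, and uses ave() in place of the by()/cbind approach. The column name n is my own choice.

```r
# A few rows copied from the sample data; the full file would be read the same way
txt <- "Year,State,Subject,Income
2000,TX,1,30776
2000,TX,2,28035
2000,TX,11,na
2005,AL,1,88216
2005,AL,2,74897
2005,AL,11,null"

# Treat "na" and "null" as missing so that Income is read as numeric
x <- read.csv(text = txt, na.strings = c("na", "null"))

# Rank within each Year/State group; missing incomes keep an NA rank
x$Rank <- ave(x$Income, x$Year, x$State,
              FUN = function(v) rank(v, na.last = "keep"))

# n counts only the non-missing incomes in each group
x$n <- ave(x$Income, x$Year, x$State,
           FUN = function(v) sum(!is.na(v)))

# Percent rank as defined in the original question: rank / n
x$PRank <- x$Rank / x$n
```

Here the two rows with "na" and "null" get NA for both Rank and PRank, and they do not inflate n for their groups.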


Sundar Dorai-Raj <[EMAIL PROTECTED]> wrote: 

t c wrote:
> What is the easiest way to calculate a percent rank “by” an index key?
>
> For example, I have a dataset with 3 fields:
>
> Year, State, Income
>
> I wish to calculate the rank, by year, by state.
>
> I also wish to calculate the “percent rank”, where I define percent rank as
> rank/n.
>
> (n is the number of numeric data points within each date-state grouping.)
>
> This is what I am currently doing:
>
> 1. I create a “group by” field by using the paste function to combine date
> and state into a field called date_state. I then use the rank function to
> calculate the rank by date, by state.
>
> 2. I then add a field called “one” that I set to 1 if the value in income is
> numeric and to 0 if it is not.
>
> 3. I then take an aggregate sum of “one”. This gives me a count (n) for each
> date-state grouping.
>
> 4. I next use merge to add this count to the table.
>
> 5. Finally, I calculate the percent rank.
>
> Pr<-rank/n
>
> The merge takes quite a bit of time to process.
>
> Is there an easier/more efficient way to calculate the percent rank?
>

How about using ?by:

set.seed(100)
# fake data set, replace with your own
# "Subject" is just a dummy to produce replicates
x <- expand.grid(Year = 2000:2005,
                 State = c("TX", "AL"),
                 Subject = 1:10)
x$Income <- floor(runif(NROW(x)) * 100000)

r <- by(x$Income, x[c("Year", "State")],
        function(x) {
          r <- rank(x)
          n <- length(x)
          cbind(Rank = r, PRank = r/n)
        })
x <- cbind(x, do.call("rbind", r))

HTH,

--sundar



______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


                