Re: [R] Calculating Summaries for each level of a Categorical variable

2010-06-28 Thread Greg Snow
The problem is that tapply expects a vector as its first argument. Your first 
argument is a list or data frame, so the length that it sees is the number of 
list elements (the columns of the data frame). You need either to pass a single 
vector, or to use functions like aggregate or the plyr package to work on all 
the columns of a data frame.
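
As a minimal base R sketch of the point above (not necessarily what the original poster ended up using): the measure values are copied from the posted dataset, but the weights in C were never posted, so the C column here is invented purely for illustration. The built-in weighted.mean is swapped in for the poster's WA function.

```r
# Measures from the original post; the weights in C are HYPOTHETICAL
RT <- data.frame(
  R = c("R1", "R2", "R3", "R4", "R1", "R3", "R3", "R2"),
  A = c(10, 60, 45, 68, 73, 25, 36, 29),
  T = c(20, 20, 10, 50, 20, 30, 90, 10),
  W = c(20, 50, 20, 20, 40, 10, 20, 30),
  H = c(10, 10, 50, 10, 46, 54, 10, 30),
  C = c(1, 2, 1, 3, 3, 2, 1, 2)
)

# Why the lengths disagree: a data frame's length is its number of columns
length(RT)     # number of columns
length(RT$R)   # number of rows

# Split the rows by level of R, then take the weighted mean of every measure
measures <- c("A", "T", "W", "H")
by_level <- sapply(split(RT, RT$R), function(d) {
  sapply(d[measures], weighted.mean, w = d$C)
})

t(by_level)    # one row per level of R, one column per measure
```

The split/sapply combination needs no extra packages; plyr or aggregate would be reasonable alternatives.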

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111


> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
> On Behalf Of RaoulD
> Sent: Saturday, June 26, 2010 10:47 PM
> To: r-help@r-project.org
> Subject: Re: [R] Calculating Summaries for each level of a Categorical
> variable
> 
> 
> Hi Corey,
> 
> Thanks so much for this. However, I get this error for tapply:
> "Error in tapply(RT, RT$R, fun=WA):
>   arguments must have same length". Any idea how to get around this?
> 
> Thanks again,
> Raoul

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Calculating Summaries for each level of a Categorical variable

2010-06-27 Thread Christos Argyropoulos

Hi Raoul,
I presume you need these summaries for a table of descriptive statistics for a
thesis/report/paper ("Table 1", as medical researchers informally call it). If
this is the case, then specify method="reverse" to summary.formula. In the
following small example, I create 4 groups of patients, specify 2
characteristics per patient (age and gender), and use summary.formula to
summarize the characteristics by group. Running the stats on patient
characteristics by group is optional but is included for completeness. If you
are looking for something like this, I strongly advise you to spend some time
fiddling around with summary.formula and to read:

Harrell FE (2004): Statistical tables and plots using S and LaTeX
(available from
http://biostat.mc.vanderbilt.edu/twiki/pub/Main/StatReport/summary.pdf)

The 2-3 hours you will need to familiarize yourself with this package are
well worth spending (especially if you are going to use LaTeX on the
output). If you are a Windows user, copy and paste the output of the print
function into Excel or OpenOffice and use the Text to Columns facility of
either program to format the output into a table that can be used inside a
manuscript.

Christos

## R-code follows

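library(Hmisc)
## One baseline factor (e.g. patient group); fixing the levels keeps the
## labels valid even if a group happens not to be drawn
grp <- round(runif(20, 1, 4))
grp <- factor(grp, levels = 1:4, labels = paste("Group", 1:4))

## Another factor (e.g. sex)
sex <- round(runif(20, 1, 2))
sex <- factor(sex, levels = 1:2, labels = c("Male", "Female"))

## A continuous variable (e.g. age)
age <- rlnorm(20, 4, 0.1)

## A data frame
data <- data.frame(age = age, grp = grp, sex = sex)

## Table 1 (the data are random, so the numbers differ from run to run)
sm <- summary(grp ~ sex + age, data = data, method = "reverse",
              overall = TRUE, test = TRUE)
print(sm, digits = 2, exclude1 = FALSE)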

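Descriptive Statistics by grp

+----------+----------+----------+----------+----------+----------+----------------------------+
|          |Group 1   |Group 2   |Group 3   |Group 4   |Combined  |            Test            |
|          |(N=3)     |(N=6)     |(N=8)     |(N=3)     |(N=20)    |         Statistic          |
+----------+----------+----------+----------+----------+----------+----------------------------+
|sex : Male|  67% ( 2)|  67% ( 4)|  25% ( 2)|  67% ( 2)|  50% (10)|Chi-square=3.3 d.f.=3 P=0.34|
+----------+----------+----------+----------+----------+----------+----------------------------+
|    Female|  33% ( 1)|  33% ( 2)|  75% ( 6)|  33% ( 1)|  50% (10)|                            |
+----------+----------+----------+----------+----------+----------+----------------------------+
|age       |  60/62/65|  51/55/60|  46/51/57|  46/48/52|  49/54/60|   F=2.9 d.f.=3,16 P=0.068  |
+----------+----------+----------+----------+----------+----------+----------------------------+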
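
For readers who want to see what goes into such a table, here is a rough base R sketch of its two kinds of cells: counts with column percentages for a factor, and lower quartile/median/upper quartile for a continuous variable. It is only an approximation of the layout, not a replacement for summary.formula; the data frame dat and its values are hypothetical and are not the simulated data used above.

```r
# HYPOTHETICAL toy data, chosen so the results are easy to check by hand
dat <- data.frame(
  grp = factor(rep(c("Group 1", "Group 2"), each = 4)),
  sex = factor(c("Male", "Male", "Female", "Male",
                 "Female", "Female", "Male", "Female"),
               levels = c("Male", "Female")),
  age = c(60, 62, 65, 58, 51, 55, 60, 49)
)

# Counts of sex within each group, and the same as column percentages
counts <- table(dat$sex, dat$grp)
pct <- round(100 * prop.table(counts, margin = 2))

# Lower quartile, median and upper quartile of age within each group
age_q <- sapply(split(dat$age, dat$grp), quantile,
                probs = c(0.25, 0.5, 0.75))

counts
pct
age_q
```

The test statistics column is left out here; chisq.test and an F test from a linear model would be the usual base R starting points.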

  

> Date: Sat, 26 Jun 2010 21:48:05 -0700
> From: raoul.t.dso...@gmail.com
> To: r-help@r-project.org
> Subject: Re: [R] Calculating Summaries for each level of a Categorical
> variable
> 
> 
> Hi Christos,
> 
> Thanks for this. I had a look at summary.formula in the Hmisc package and it
> is extremely complicated for me. Still trying to decipher how I could use
> it.
> 
> Regards,
> Raoul


Re: [R] Calculating Summaries for each level of a Categorical variable

2010-06-27 Thread David W Freedman

I'd suggest using the functions within the Hmisc and plyr packages:

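library(Hmisc); library(plyr)
## sample() needs an explicit size of 10 so that every row gets its own weight
df <- data.frame(v1 = rnorm(10), v2 = rnorm(10),
                 wt = sample(1:5, 10, replace = TRUE), sex = rep(0:1, each = 5))
df
ddply(df, ~sex, summarise, v1.avg = wtd.mean(v1, wt), v2.avg = wtd.mean(v2, wt))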

hope this helps, david freedman

-- 
View this message in context: 
http://r.789695.n4.nabble.com/Calculating-Summaries-for-each-level-of-a-Categorical-variable-tp2269349p2270001.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Calculating Summaries for each level of a Categorical variable

2010-06-27 Thread Corey Sparks

The variable you want to analyze (the first argument to tapply) and the 
variable you want to analyze by (the factor, the second argument to tapply) 
must both have the same length (one element per row); that's how I read this.
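
A small sketch of what "same length" means in practice, assuming the posted measures plus an invented weight column C (the post did not list the weights). Passing one column works directly; for a weighted mean, one possible trick is to group the row numbers and index both the values and the weights inside the function.

```r
# Measures from the original post; the weights in C are HYPOTHETICAL
RT <- data.frame(
  R = c("R1", "R2", "R3", "R4", "R1", "R3", "R3", "R2"),
  A = c(10, 60, 45, 68, 73, 25, 36, 29),
  C = c(1, 2, 1, 3, 3, 2, 1, 2)
)

# A single column and the factor both have 8 elements, so this works
plain_A <- tapply(RT$A, RT$R, FUN = mean)

# Weighted version: tapply hands FUN the row numbers of each level,
# which are then used to pick the matching values and weights
wa_A <- tapply(seq_len(nrow(RT)), RT$R,
               function(i) weighted.mean(RT$A[i], RT$C[i]))

plain_A
wa_A
```

This handles one measure at a time; wrapping it in sapply over the column names would cover A, T, W and H together.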
CS

Corey Sparks
Assistant Professor
Department of Demography and Organization Studies
College of Public Policy
501 West Durango Blvd
Monterey Building 2.270C
San Antonio, TX 78207
corey.sparks 'at' utsa.edu
210 458 3166

On Jun 26, 2010, at 11:46 PM, RaoulD [via R] wrote:

> Hi Corey,
>
> Thanks so much for this. However, I get this error for tapply -  
> "Error in tapply(RT, RT$R, fun=WA):
>   arguments must have same length". Any idea how to get around this?
>
> Thanks again,
> Raoul




-
Corey Sparks, PhD
Assistant Professor
Department of Demography and Organization Studies
University of Texas at San Antonio
501 West Durango Blvd
Monterey Building 2.270C
San Antonio, TX 78207
210-458-3166
corey.sparks 'at' utsa.edu
https://rowdyspace.utsa.edu/users/ozd504/www/index.htm


Re: [R] Calculating Summaries for each level of a Categorical variable

2010-06-27 Thread David Hajage
You could try the remix function in the remix package.

David

On 27 June 2010 at 06:48, RaoulD wrote:

>
> Hi Christos,
>
> Thanks for this. I had a look at summary.formula in the Hmisc package and it
> is extremely complicated for me. Still trying to decipher how I could use
> it.
>
> Regards,
> Raoul


Re: [R] Calculating Summaries for each level of a Categorical variable

2010-06-27 Thread RaoulD

Hi Corey,

Thanks so much for this. However, I get this error for tapply - "Error in
tapply(RT, RT$R, fun=WA): 
  arguments must have same length". Any idea how to get around this?

Thanks again,
Raoul


Re: [R] Calculating Summaries for each level of a Categorical variable

2010-06-27 Thread RaoulD

Hi Christos,

Thanks for this. I had a look at summary.formula in the Hmisc package, and it
is extremely complicated for me. I am still trying to decipher how I could use
it.

Regards,
Raoul


Re: [R] Calculating Summaries for each level of a Categorical variable

2010-06-26 Thread Christos Argyropoulos

Look at the summary.formula function in the Hmisc package.

Christos



[R] Calculating Summaries for each level of a Categorical variable

2010-06-26 Thread RaoulD

Hi,

I have a dataset which has a categorical variable "R", a count variable C
(integer) and 4 or more numeric variables (A, T, W, H - integers) containing
measures for "R". I would like to summarize each level of the variable R by
the average of A, T, W and H.

I have written a function to calculate weighted averages using C as the
weight and this is given below. The function works perfectly but how do I
add the additional dimension I require to this function?

Dataset: RT=
R    A  T  W  H
R1   10 20 20 10
R2   60 20 50 10
R3   45 10 20 50
R4   68 50 20 10
R1   73 20 40 46
R3   25 30 10 54
R3   36 90 20 10
R2   29 10 30 30

# FUNCTION TO CALCULATE THE WEIGHTED AVERAGE FOR A WEIGHTED BY C
WA <- function(A, C) {
  sp_A  <- c(A %*% C)   # sum of the products A[i] * C[i]
  sum_C <- sum(C)       # total weight
  sp_A / sum_C          # weighted average (returned)
}

I am trying to incorporate the additional step of calculating the weighted
average of A,T,W and H for each level of R. Need help with this.
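
For reference, a quick check that the function above gives the same answer as base R's built-in weighted.mean, which may save maintaining a custom function. The values of A and C below are hypothetical, since the post does not list the counts.

```r
# Compact form of the weighted-average function from this post
WA <- function(A, C) {
  c(A %*% C) / sum(C)
}

# HYPOTHETICAL values and weights for one level of R
A <- c(10, 73)
C <- c(1, 3)

WA(A, C)
weighted.mean(A, C)
all.equal(WA(A, C), weighted.mean(A, C))
```

Because both take a value vector and a weight vector, either one can be applied to the rows belonging to a single level of R.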

Thanks in advance!
Raoul


Re: [R] Calculating Summaries for each level of a Categorical variable

2010-06-26 Thread Corey Sparks

Did you try tapply?
?tapply

tapply(RT, RT$R, fun=WA)

or something like that
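
A sketch of one way the call above could be adapted, for readers trying it literally: tapply takes a single vector as its first argument, and its function argument is spelled FUN in upper case (R argument names are case-sensitive). The example uses the posted measures and a plain, unweighted mean, since the counts in C were not posted.

```r
# Measures copied from the original post (no weights used here)
RT <- data.frame(
  R = c("R1", "R2", "R3", "R4", "R1", "R3", "R3", "R2"),
  A = c(10, 60, 45, 68, 73, 25, 36, 29),
  T = c(20, 20, 10, 50, 20, 30, 90, 10),
  W = c(20, 50, 20, 20, 40, 10, 20, 30),
  H = c(10, 10, 50, 10, 46, 54, 10, 30)
)

# One column at a time: a vector of values and a grouping vector
mean_A <- tapply(RT$A, RT$R, FUN = mean)

# Every measure column: call tapply once per column and collect the results
means <- sapply(RT[c("A", "T", "W", "H")],
                function(col) tapply(col, RT$R, FUN = mean))

mean_A
means   # rows are the levels of R, columns are the measures
```

The anonymous function is there because sapply has its own FUN argument, so FUN = mean cannot be passed straight through it.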

-
Corey Sparks, PhD
Assistant Professor
Department of Demography and Organization Studies
University of Texas at San Antonio
501 West Durango Blvd
Monterey Building 2.270C
San Antonio, TX 78207
210-458-3166
corey.sparks 'at' utsa.edu
https://rowdyspace.utsa.edu/users/ozd504/www/index.htm