Re: [R] Conditional mean for groups, new variables

2014-06-02 Thread arun


Hi,
Regarding your first comment, you didn't provide a reproducible example, so I 
created one with letters as SCHOOLIDs.  According to your original post, you 
have a real dataset with 36000 SCHOOLIDs.  Suppose I create the SCHOOLIDs 
using:
length(outer(LETTERS, 1:2000, paste, sep=""))
#[1] 52000

#Please note that I am creating only 6 columns as an example
set.seed(42)
rev1 <- data.frame(SCHOOLID = sample(outer(LETTERS, 1:1000, paste, sep=""), 36e3, 
replace=TRUE), matrix(sample(180, 36e3*5, replace=TRUE), ncol=5, 
dimnames=list(NULL, c("MATH", "AGE", "STO2Q01", "BFMJ", 
"BMMJ"))), stringsAsFactors=FALSE)
dim(rev1)
#[1] 36000 6


res1 <- aggregate(rev1[,-1], list(SCHOOLID=rev1[,1]), mean, na.rm=TRUE)
dim(res1)
#[1] 26010 6
head(res1,2)
#  SCHOOLID  MATH AGE STO2Q01 BFMJ BMMJ
#1       A1 107.5  30    41.5   75  149
#2     A100 159.5 132   107.0   66   15
colMeans(rev1[rev1$SCHOOLID=="A1",-1])
#   MATH     AGE STO2Q01    BFMJ    BMMJ 
#  107.5    30.0    41.5    75.0   149.0 


I am not following the second statement.  Please provide a reproducible 
example using ?dput().
Maybe you want the results in this form:

# ave() treats extra arguments as grouping variables, so na.rm goes inside FUN
rev2 <- data.frame(SCHOOLID=rev1[,1], sapply(rev1[-1], function(x) ave(x, 
rev1[,1], FUN=function(v) mean(v, na.rm=TRUE))))
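
For what it's worth, the difference between the two forms can be seen on a tiny made-up data frame. This is only a sketch: `toy`, `na_mean`, `per_school` and `per_row` are names invented here, not objects from the thread. aggregate() returns one row per school, whereas ave() keeps one row per observation and repeats the school mean on each row.

```r
# Toy data; the column names mirror the thread, the values are made up.
toy <- data.frame(SCHOOLID = c("A1", "A1", "B2", "B2", "B2"),
                  MATH = c(10, 20, 30, NA, 50),
                  AGE  = c(15, 17, 16, 16, 19),
                  stringsAsFactors = FALSE)

# NA-safe mean, wrapped so that ave() receives it as a single FUN argument.
na_mean <- function(v) mean(v, na.rm = TRUE)

# One row per school.
per_school <- aggregate(toy[, -1], list(SCHOOLID = toy$SCHOOLID), na_mean)

# One row per observation, each value replaced by its school mean.
per_row <- data.frame(SCHOOLID = toy$SCHOOLID,
                      lapply(toy[-1], function(x) ave(x, toy$SCHOOLID, FUN = na_mean)),
                      stringsAsFactors = FALSE)

per_school
per_row
```

The per-row form is handy when the school means have to sit next to the individual observations; the per-school form is the one to use when each school should count once.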

A.K.


I'm sorry, but it does not :(
It gives results for at most the first 26 schools (the number of letters in 
the alphabet). And judging by the result, it does not compute the average 
values of the factors. 


On Sunday, June 1, 2014 8:37 PM, arun smartpink...@yahoo.com wrote:
Hi,
Maybe this helps:


set.seed(42)
rev1 <- data.frame(SCHOOLID=sample(LETTERS[1:4], 20, replace=TRUE), 
matrix(sample(25, 20*5, replace=TRUE), ncol=5, dimnames=list(NULL, c("MATH", 
"AGE", "STO2Q01", "BFMJ", "BMMJ"))), stringsAsFactors=FALSE)
res1 <- aggregate(rev1[,-1], list(SCHOOLID=rev1[,1]), mean, na.rm=TRUE)
res1
#if you need to change the names
res2 <- setNames(aggregate(rev1[,-1], list(SCHOOLID=rev1[,1]), 
mean, na.rm=TRUE), c("SCHOOLID", paste(colnames(rev1)[-1], "MEAN", sep="_")))
res2
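
Since the stated goal is a regression on the school averages, here is a rough sketch of that last step, assuming the averaged table is all that is needed. The data (`dat`), the chosen predictors and the model formula are illustrative assumptions, not taken from the original dataset.

```r
set.seed(1)  # arbitrary seed, only so the sketch is repeatable

# Made-up data: 6 schools with 10 pupils each.
dat <- data.frame(SCHOOLID = rep(LETTERS[1:6], each = 10),
                  MATH = rnorm(60, mean = 500, sd = 50),
                  AGE  = rnorm(60, mean = 15.5, sd = 0.3),
                  BFMJ = rnorm(60, mean = 40, sd = 10),
                  stringsAsFactors = FALSE)

# School-level means for every column except the ID, then renamed with a _MEAN suffix.
means <- aggregate(dat[, -1], list(SCHOOLID = dat$SCHOOLID), mean, na.rm = TRUE)
names(means)[-1] <- paste(names(means)[-1], "MEAN", sep = "_")

# The averaged table is an ordinary data frame, so lm() can use it directly.
fit <- lm(MATH_MEAN ~ AGE_MEAN + BFMJ_MEAN, data = means)
coef(fit)
```

Passing the table through `data=` means the averaged columns never have to be copied into the workspace one by one, and attach() is not needed.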

A.K.


Hello! I have a problem: I want to calculate conditional means for my dataset. 
First, I attach it:
rev <- read.csv("MATH1.csv", header=TRUE, sep=";", dec=",")
attach(rev)
I have 65 observations (test score) and 36000 groups (schoolid).
I need to calculate the mean for every group (schoolid) for all my 
variables (MATH, AGE, ST02Q01, BFMJ, BMMJ; actually, I have 34 variables, I just 
don't want to list them here) and then create new variables from the resulting 
columns, because I want to estimate a new regression on the obtained 
average values.
The following method is not appropriate for me, because it gives me a 
table with schoolid and the average of only one variable, and I don't know how to 
extract the MATH column of average values from the results table into the 
workspace (environment) separately.
aggregate(MATH ~ SCHOOLID, rev, mean)
How can I solve this problem? Thanks for the help!

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Conditional mean for groups, new variables

2014-06-02 Thread arun
Hi,
If you want to extract only particular variables, check ?subset, ?Extract.
Using my first example:
aggregate(MATH~SCHOOLID,rev1, mean)[,-1,drop=FALSE]
#  MATH
#1 14.5
#2 17.2
#3 13.71429
#4 13.8
# more than one variable
res1 <- aggregate(rev1[,-1], list(SCHOOLID=rev1[,1]), mean, na.rm=TRUE) ##Column1 is SCHOOLID
res1New <- res1[,-1]
res1New
#      MATH      AGE  STO2Q01     BFMJ     BMMJ
#1     14.5     10.5    15.50     8.00     14.0
#2     17.2      7.6    10.20    18.60     12.8
#3 13.71429 17.28571 9.142857 9.857143 17.85714
#4     13.8     15.3    13.67    11.67     11.0
#or
res1[!grepl("SCHOOLID", colnames(res1))]
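
If the aim is to get one averaged column as a standalone object in the workspace, something along these lines may do. A minimal sketch on made-up data; `toy`, `res` and `math_mean` are hypothetical names.

```r
# Made-up data with three schools.
toy <- data.frame(SCHOOLID = c("A", "A", "B", "C", "C"),
                  MATH = c(10, 20, 30, 40, 60),
                  stringsAsFactors = FALSE)

res <- aggregate(MATH ~ SCHOOLID, toy, mean)

# `$` returns the column as a plain numeric vector;
# setNames() labels each mean with its school.
math_mean <- setNames(res$MATH, as.character(res$SCHOOLID))
math_mean
```

Unlike `[, -1, drop=FALSE]`, which keeps a one-column data frame, `$` (or `[[`) gives a vector that can be used on its own.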
A.K.


I tried to explain all the things that I want to do in this picture :) Sorry, 
if it's not so understandable, but I tried :) 




