Dear R experts,
I would like to ask for your help with repeating steps in an apply
statement.
I have a data frame that lists multiple variables for each id and visit,
as well as the drug treatment.
> head(exp)
  id visit variable1 variable2 variable3 variable4 drug
1  3     1        13        10         7        11    0
2  3     5        10        15         9         9    0
3  3    12         9        10         8         8    0
4  7     1        12         8         9         8    1
5  7     5        16         9         3        10    1
6  7    12         5        11         9        14    1
I would like to process these variables to find the difference between visit 5
and visit 1 for each id, and then summarize these differences in terms of means
and measures of error (SD, SE, 95% CI).
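One way to get the visit 5 minus visit 1 difference for every variable at once is to take the visit 1 and visit 5 rows separately and merge() them by id. This is a sketch of an alternative to the split/lapply approach used in the code below, not that approach itself. The six rows are the ones printed by head(exp) above; the names dat, vars and wide_diff are only illustrative.

```r
# The six rows printed by head(exp) above, typed in by hand
dat <- data.frame(
  id        = c(3, 3, 3, 7, 7, 7),
  visit     = c(1, 5, 12, 1, 5, 12),
  variable1 = c(13, 10, 9, 12, 16, 5),
  variable2 = c(10, 15, 10, 8, 9, 11),
  variable3 = c(7, 9, 8, 9, 3, 9),
  variable4 = c(11, 9, 8, 8, 10, 14),
  drug      = c(0, 0, 0, 1, 1, 1)
)
vars <- c("variable1", "variable2", "variable3", "variable4")

# One row per id for each of the two visits of interest
v1 <- dat[dat$visit == 1, c("id", "drug", vars)]
v5 <- dat[dat$visit == 5, c("id", "drug", vars)]

# Line the visits up by id (drug is constant within an id, so it is a key too);
# the shared variable columns get the suffixes .v1 and .v5
m <- merge(v1, v5, by = c("id", "drug"), suffixes = c(".v1", ".v5"))

# Visit 5 minus visit 1 for all variables at once, one column per variable
wide_diff <- data.frame(m[c("id", "drug")],
                        m[paste0(vars, ".v5")] - m[paste0(vars, ".v1")])
names(wide_diff) <- c("id", "drug", vars)
wide_diff
```

Because merge() does an inner join, an id without a visit 1 or visit 5 row simply drops out, while an NA measurement gives an NA difference.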
So far, following your brilliant advice to use do.call and lapply, I have
been able to process one variable at a time, but I would much prefer to
loop over the variables, or otherwise repeat the process for each one, so that
all the results end up in one efficiently stored data set. I would like to get
data sets such as these:
> exp1
     id  variable drug d5.3
  3   3 variable1    0   -3
  7   7 variable1    1    4
 13  13 variable1    0   -5
 56  56 variable1    0    4
 78  78 variable1    0    7
109 109 variable1    0   -3
145 145 variable1    0   -2
173 173 variable1    0    9
212 212 variable1    1   -7
  3   3 variable2    ?    ?
  7   7 variable2    ?    ?
 13  13 variable2    ?    ?
 56  56 variable2    ?    ?
 78  78 variable2    ?    ?
109 109 variable2    ?    ?
145 145 variable2    ?    ?
173 173 variable2    ?    ?
212 212 variable2    ?    ?
  3   3 variable3    ?    ?
etc.
> exp2
   variable difference gel mean       sd n       se     X95ci    mean.sd     se.sd  X95ci.sd
0 variable1       d5.1   0  1.0 5.567764 7 2.104417  5.149323  0.1796053 0.3779645 0.9248457
1 variable1       d5.1   1 -1.5 7.778175 2 5.500000 69.884126 -0.1928473 0.7071068 8.9846435
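If the differences are already in a wide table (one column per variable), base R's stack() gives the long layout of exp1 above. The numbers below are the visit 5 minus visit 1 differences for ids 3 and 7, worked out by hand from the head(exp) rows; wide_diff and exp1_long are illustrative names, and this is a sketch rather than the method used in the code below.

```r
# Visit 5 minus visit 1 differences for ids 3 and 7, computed by hand
# from the head(exp) rows (one column per variable)
wide_diff <- data.frame(id = c(3, 7), drug = c(0, 1),
                        variable1 = c(-3, 4), variable2 = c(5, 1),
                        variable3 = c(2, -6), variable4 = c(-2, 2))
vars <- c("variable1", "variable2", "variable3", "variable4")

# stack() returns two columns: values, and ind (the source column name)
s <- stack(wide_diff[vars])

# stack() walks the columns one after another, so id and drug are
# repeated whole once per variable (times =, not each =)
exp1_long <- data.frame(id       = rep(wide_diff$id, times = length(vars)),
                        variable = as.character(s$ind),
                        drug     = rep(wide_diff$drug, times = length(vars)),
                        d5.3     = s$values,
                        stringsAsFactors = FALSE)
exp1_long
```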
However, I have only been able to get the data for the first variable,
despite having tried a for loop over the variable names, i.e.
for (i in c('variable1', 'variable2', 'variable3', 'variable4')).
Do you have any thoughts on how to repeat the lapply across many column
variables? I greatly appreciate your help. I have supplied the code for the
example and my work so far below:
# Example data: 9 ids, each measured at visits 1, 5 and 12
set.seed(1)  # arbitrary seed so the random example is reproducible
exp <- data.frame(id        = rep(c(3, 7, 13, 56, 78, 109, 145, 173, 212), each = 3),
                  visit     = rep(c(1, 5, 12), times = 9),
                  variable1 = round(rnorm(mean = 10, sd = 3, n = 27), 0),
                  variable2 = round(rnorm(mean = 10, sd = 3, n = 27), 0),
                  variable3 = round(rnorm(mean = 10, sd = 3, n = 27), 0),
                  variable4 = round(rnorm(mean = 10, sd = 3, n = 27), 0),
                  drug      = rep(round(rnorm(mean = 0.5, sd = 0.1, n = 9), 0), each = 3))
# Set two variable1 values to NA to mimic missing measurements
# (a bare $variable would refer to a column that does not exist)
exp[exp$visit == 1 & exp$id == 3, 'variable1'] <- NA
exp[exp$visit == 5 & exp$id == 56, 'variable1'] <- NA
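# Difference visit 5 minus visit 1 of variable1, one row per id.
# data.frame() turns the name 'd5-3' into the syntactic column name d5.3
exp1 <- do.call(rbind,
                lapply(split(exp, exp$id), function(.grp) {
                  data.frame('id'       = .grp$id[1L],
                             'variable' = 'variable1',
                             'drug'     = .grp$drug[1L],
                             'd5-3'     = .grp[.grp[['visit']] == 5, ]$variable1 -
                                          .grp[.grp[['visit']] == 1, ]$variable1)
                }))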
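# Summary statistics of d5.3 within each drug group.
# data.frame() renames '95ci', 'mean/sd', 'se/sd' and '95ci/sd'
# to X95ci, mean.sd, se.sd and X95ci.sd
exp2 <- do.call(rbind,
                lapply(split(exp1, exp1$drug), function(.grp) {
                  a <- na.omit(.grp$d5.3)   # drop ids with a missing difference
                  data.frame('variable'   = 'variable1',
                             'difference' = 'd5.1',
                             'gel'        = .grp$drug[1L],
                             'mean'       = mean(a),
                             'sd'         = sd(a),
                             'n'          = length(a),
                             'se'         = sd(a) / sqrt(length(a)),
                             # half-width of a t-based 95% confidence interval
                             '95ci'       = qt(0.975, length(a) - 1) * sd(a) / sqrt(length(a)),
                             'mean/sd'    = mean(a) / sd(a),
                             'se/sd'      = (sd(a) / sqrt(length(a))) / sd(a),
                             '95ci/sd'    = (qt(0.975, length(a) - 1) * sd(a) / sqrt(length(a))) / sd(a))
                }))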
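To get one summary row per variable and drug group, the same statistics can be computed on split(exp1, list(exp1$variable, exp1$drug)) instead of split(exp1, exp1$drug), with the variable name taken from the group rather than typed in. A sketch, where summarise_group() is a hypothetical helper; the input is the variable1 block of the exp1 printout above, and for that input the result should agree with the exp2 printout (mean 1.0 and -1.5, n 7 and 2). With more variables in exp1, the same two lines at the end produce one row per variable and drug.

```r
# variable1 block of the exp1 printout above (id, drug and d5.3 as shown there)
exp1 <- data.frame(id       = c(3, 7, 13, 56, 78, 109, 145, 173, 212),
                   variable = "variable1",
                   drug     = c(0, 1, 0, 0, 0, 0, 0, 0, 1),
                   d5.3     = c(-3, 4, -5, 4, 7, -3, -2, 9, -7),
                   stringsAsFactors = FALSE)

# Hypothetical helper: the statistics of the exp2 code, for one group
summarise_group <- function(.grp) {
  a  <- na.omit(.grp$d5.3)          # drop ids with a missing difference
  n  <- length(a)
  s  <- sd(a)
  se <- s / sqrt(n)                 # standard error of the mean
  ci <- qt(0.975, n - 1) * se       # half-width of a t-based 95% CI
  data.frame(variable = .grp$variable[1L], difference = "d5.1",
             gel = .grp$drug[1L], mean = mean(a), sd = s, n = n,
             se = se, X95ci = ci, mean.sd = mean(a) / s,
             se.sd = se / s, X95ci.sd = ci / s,
             stringsAsFactors = FALSE)
}

# One group per variable/drug combination; drop = TRUE skips empty ones
groups <- split(exp1, list(exp1$variable, exp1$drug), drop = TRUE)
exp2 <- do.call(rbind, lapply(groups, summarise_group))
exp2
```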
Thanks again for your help, Matt
______________________________________________
R-help mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.