Dear R experts,
I would like to ask for your help with repeating steps in an apply
statement.
I have a data frame that lists multiple variables for a given id and visit,
as well as the drug treatment.

> head(exp)
  id visit variable1 variable2 variable3 variable4 drug
1  3     1        13        10         7        11    0
2  3     5        10        15         9         9    0
3  3    12         9        10         8         8    0
4  7     1        12         8         9         8    1
5  7     5        16         9         3        10    1
6  7    12         5        11         9        14    1

I would like to process these variables to find the difference between
visit 5 and visit 1 for each id, then summarize these data in terms of means
and errors. Thus far, with your brilliant advice to employ do.call and
lapply, I have been able to process one variable at a time, but I would much
prefer to loop or repeat the process for each variable in order to create an
efficiently stored set of data. I would like to get a data set such as:

> exp1
     id  variable drug d5.3
3     3 variable1    0   -3
7     7 variable1    1    4
13   13 variable1    0   -5
56   56 variable1    0    4
78   78 variable1    0    7
109 109 variable1    0   -3
145 145 variable1    0   -2
173 173 variable1    0    9
212 212 variable1    1   -7
3     3 variable2    ?    ?
7     7 variable2    ?    ?
13   13 variable2    ?    ?
56   56 variable2    ?    ?
78   78 variable2    ?    ?
109 109 variable2    ?    ?
145 145 variable2    ?    ?
173 173 variable2    ?    ?
212 212 variable2    ?    ?
3     3 variable3    ?    ?
etc...

> exp2
   variable difference gel mean       sd n       se     X95ci    mean.sd
0 variable1       d5.1   0  1.0 5.567764 7 2.104417  5.149323  0.1796053
1 variable1       d5.1   1 -1.5 7.778175 2 5.500000 69.884126 -0.1928473
      se.sd  X95ci.sd
0 0.3779645 0.9248457
1 0.7071068 8.9846435

But I have only been able to get the data for the first variable, despite
having attempted loop statements over the variable names, i.e.
for (i in c('variable1','variable2','variable3','variable4')).
Do you have any thoughts about how to repeat lapply across many column
variables? I greatly appreciate your thoughts. I have supplied the code for
the example and my work thus far below:

# Example data: 9 ids, 3 visits each, 4 measured variables, 1 drug flag per id
exp <- data.frame(id = rep(c(3,7,13,56,78,109,145,173,212), each = 3),
                  visit = rep(c(1,5,12), times = 9),
                  variable1 = round(rnorm(mean = 10, sd = 3, n = 27), 0),
                  variable2 = round(rnorm(mean = 10, sd = 3, n = 27), 0),
                  variable3 = round(rnorm(mean = 10, sd = 3, n = 27), 0),
                  variable4 = round(rnorm(mean = 10, sd = 3, n = 27), 0),
                  drug = rep(round(rnorm(mean = 0.5, sd = 0.1, n = 9), 0), each = 3))

# Introduce two missing values in variable1
exp[exp[,'visit'] == 1 & exp[,'id'] == 3, 'variable1'] <- NA
exp[exp[,'visit'] == 5 & exp[,'id'] == 56, 'variable1'] <- NA


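# Difference between visit 5 and visit 1 for each id (variable1 only).
# data.frame() renames the column 'd5-3' to d5.3.
exp1 <- do.call(rbind,
  lapply(split(exp, exp$id), function(.grp) {
    data.frame('id' = .grp$id[1L],
               'variable' = 'variable1',
               'drug' = .grp$drug[1L],
               'd5-3' = .grp[.grp[['visit']] == 5, ]$variable1 -
                        .grp[.grp[['visit']] == 1, ]$variable1)
  }))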



# Summarize the differences by drug group (variable1 only)
exp2 <- do.call(rbind,
  lapply(split(exp1, exp1$drug), function(.grp) {
    a <- na.omit(.grp$d5.3)
    data.frame('variable' = 'variable1',
               'difference' = 'd5.1',
               'gel' = .grp$drug[1L],
               'mean' = mean(a),
               'sd' = sd(a),
               'n' = length(a),
               'se' = sd(a)/sqrt(length(a)),
               '95ci' = qt(0.975, (length(a)-1)) * sd(a)/sqrt(length(a)),
               'mean/sd' = mean(a)/sd(a),
               'se/sd' = (sd(a)/sqrt(length(a)))/sd(a),
               '95ci/sd' = (qt(0.975, (length(a)-1))*sd(a)/sqrt(length(a)))/sd(a))
  }))
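
One possible way to repeat the two steps above over every variable column
is to wrap the per-id lapply in an outer lapply over the column names, and
then to split the combined result by variable and drug together. The
following is only a minimal sketch under assumptions, not a tested solution
for the real data: it assumes each id has exactly one row for visit 1 and
one for visit 5, and the names dat, vars, diffs, summ, d5.1 and ci95 are
made up for the illustration (dat is used instead of exp so that base::exp
is not masked). The drug assignment is made deterministic only to keep the
example reproducible.

```r
# Hypothetical example data, shaped like the data frame in this post.
set.seed(1)
ids  <- c(3, 7, 13, 56, 78, 109, 145, 173, 212)
vars <- paste0("variable", 1:4)
dat  <- data.frame(id = rep(ids, each = 3),
                   visit = rep(c(1, 5, 12), times = 9))
for (v in vars) dat[[v]] <- round(rnorm(27, mean = 10, sd = 3))
dat$drug <- rep(rep(c(0, 1), length.out = 9), each = 3)
dat[dat$visit == 1 & dat$id == 3, "variable1"] <- NA

# Step 1: outer lapply over variable names, inner lapply over ids.
# g[[v]] selects the column whose name is stored in v.
diffs <- do.call(rbind, lapply(vars, function(v) {
  do.call(rbind, lapply(split(dat, dat$id), function(g) {
    data.frame(id = g$id[1L],
               variable = v,
               drug = g$drug[1L],
               d5.1 = g[[v]][g$visit == 5] - g[[v]][g$visit == 1])
  }))
}))
rownames(diffs) <- NULL

# Step 2: one summary row per (variable, drug) combination.
groups <- split(diffs, list(diffs$variable, diffs$drug), drop = TRUE)
summ <- do.call(rbind, lapply(groups, function(g) {
  a  <- g$d5.1[!is.na(g$d5.1)]   # drop missing differences
  n  <- length(a)
  se <- sd(a) / sqrt(n)
  data.frame(variable = g$variable[1L],
             drug = g$drug[1L],
             mean = mean(a),
             sd = sd(a),
             n = n,
             se = se,
             ci95 = qt(0.975, n - 1) * se)
}))
rownames(summ) <- NULL
```

Passing the column name as a string and indexing with g[[v]] is what lets
the same function body serve every variable; the $ operator cannot do this
because it takes the name literally.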


Thanks again for your help, Matt


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
