Dear R Helpers,
   I have a fairly large data frame (150,000 variables with 10,000 entries
each) and need to run a regression on each of the variables, recording the
p-values.
 I wrote a function and apply it with sapply. The function looks something like this:
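calcpval <- function(x) {
        # x is one response column; the covariates are taken from m
        modela <- lm(x ~ age, data = m)
        modelb <- lm(x ~ age + ageSquared, data = m)
        modelc <- lm(x ~ age + ageSquared + bmi, data = m)
        # p-value of the F test comparing each pair of nested models
        p_main <- anova(modela, modelb)[["Pr(>F)"]][2]
        p_main_i <- anova(modela, modelc)[["Pr(>F)"]][2]
        p_i <- anova(modelb, modelc)[["Pr(>F)"]][2]
        return(c(p_main, p_main_i, p_i))
}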
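
To make the setup concrete, here is a minimal sketch on simulated data of how such a function might be applied with sapply. The names covars, responses and calcpval_demo, the split into two data frames, and all sizes are assumptions for illustration only, not the real data.

```r
set.seed(1)
n <- 200   # entries per variable (10,000 in the real data)
p <- 5     # number of variables (150,000 in the real data)

# Hypothetical covariates shared by every regression
covars <- data.frame(age = runif(n, 20, 80), bmi = rnorm(n, 25, 4))
covars$ageSquared <- covars$age^2

# Hypothetical responses, one column per variable
responses <- as.data.frame(matrix(rnorm(n * p), nrow = n))

# Three nested models per response column x, compared pairwise by F test
calcpval_demo <- function(x, covars) {
  modela <- lm(x ~ age, data = covars)
  modelb <- lm(x ~ age + ageSquared, data = covars)
  modelc <- lm(x ~ age + ageSquared + bmi, data = covars)
  c(p_main   = anova(modela, modelb)[2, "Pr(>F)"],
    p_main_i = anova(modela, modelc)[2, "Pr(>F)"],
    p_i      = anova(modelb, modelc)[2, "Pr(>F)"])
}

# One column of p-values per response variable
pvals <- sapply(responses, calcpval_demo, covars = covars)
```

The result is a matrix with three rows (one per comparison) and one column per variable. Each column costs three lm fits and three anova calls, which is where the time goes.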

This whole thing is terribly slow. I observed that it runs faster when the
file is broken into smaller pieces. What other suggestions could you make to
speed it up (say, days instead of weeks)?
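
One direction that may be worth trying, sketched here under assumptions rather than as a tested solution: lm.fit accepts a matrix of responses, so each of the three models can be fitted once for a whole block of columns, and the F test can be computed from the residual sums of squares. The names Y, covars, rss_by_column and ftest are hypothetical, the data are simulated, and the sketch assumes there are no missing values.

```r
set.seed(1)
n <- 200
p <- 50
covars <- data.frame(age = runif(n, 20, 80), bmi = rnorm(n, 25, 4))
covars$ageSquared <- covars$age^2
Y <- matrix(rnorm(n * p), nrow = n)   # hypothetical block of response columns

# Fit one model to all columns of Y at once; return the residual sum of
# squares per column and the residual degrees of freedom
rss_by_column <- function(formula, Y, data) {
  X <- model.matrix(formula, data)
  fit <- lm.fit(X, Y)
  list(rss = colSums(fit$residuals^2), df = fit$df.residual)
}

# F test for two nested models (small inside big), one p-value per column
ftest <- function(small, big) {
  df_num <- small$df - big$df
  f <- ((small$rss - big$rss) / df_num) / (big$rss / big$df)
  pf(f, df_num, big$df, lower.tail = FALSE)
}

fit_a <- rss_by_column(~ age, Y, covars)
fit_b <- rss_by_column(~ age + ageSquared, Y, covars)
fit_c <- rss_by_column(~ age + ageSquared + bmi, Y, covars)

pvals_fast <- rbind(p_main   = ftest(fit_a, fit_b),
                    p_main_i = ftest(fit_a, fit_c),
                    p_i      = ftest(fit_b, fit_c))
```

This replaces three lm calls per variable with three fits per block, so the block size can be chosen to fit in memory. With missing values the rows used would differ from column to column, and this shortcut would no longer match the per-column results.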
Thank you and best regards, Georg.
*****************
Georg Ehret, Johns Hopkins Medicine