Re: [R] Help with big data and parallel computing: 500, 000 x 4 linear models

2016-08-08 Thread Charles C. Berry
On Mon, 8 Aug 2016, Ellis, Alicia M wrote:
> I have a large dataset with ~500,000 columns and 1264 rows. Each column represents the percent methylation at a given location in the genome. I need to run 500,000 linear models for each of 4 predictors of interest in the form of:

Re: [R] Help with big data and parallel computing: 500, 000 x 4 linear models

2016-08-08 Thread Aaron Mackey
Don't run 500K separate models. Use the limma package to fit one model that can learn the variance parameters jointly. Run it on your laptop. And don't use %methylation (the Beta value) as your Y variable; use logit(percent), i.e. the M-value.
-Aaron
On Mon, Aug 8, 2016 at 2:49 PM, Ellis, Alicia M
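The advice above can be sketched roughly as follows. This is a minimal illustration on simulated data, not the poster's actual analysis: the object names (`pheno`, `beta`, `m_values`), the small dimensions, and the covariate list are assumptions chosen to mirror the formula in the original question. It assumes the Bioconductor package limma is installed.

```r
library(limma)  # Bioconductor package; assumed to be installed

set.seed(1)
n_samples <- 50    # stand-in for the 1264 rows in the question
n_sites   <- 200   # stand-in for the ~500,000 columns

# Hypothetical sample-level data; names follow the formula in the question.
pheno <- data.frame(
  predictor1 = rnorm(n_samples),
  covariate1 = rnorm(n_samples),
  covariate2 = rnorm(n_samples)
)

# Simulated methylation proportions (samples x sites), strictly inside (0, 1).
beta <- matrix(runif(n_samples * n_sites, 0.05, 0.95),
               nrow = n_samples,
               dimnames = list(NULL, paste0("site", seq_len(n_sites))))

# Logit transform. qlogis(p) is log(p / (1 - p)) with the natural log;
# the log2 version differs only by a constant factor.
m_values <- qlogis(beta)

# One design matrix shared by every site.
design <- model.matrix(~ predictor1 + covariate1 + covariate2, data = pheno)

# limma expects one row per site and one column per sample, hence t().
fit <- lmFit(t(m_values), design)

# Empirical Bayes step: shrinks site-wise variances toward a common value.
fit <- eBayes(fit)

# One row per site for the predictor of interest.
results <- topTable(fit, coef = "predictor1", number = Inf)
head(results)
```

With four predictors of interest, one option is to repeat the `model.matrix()`, `lmFit()`, `eBayes()` and `topTable()` steps once per predictor, which is four fits in total rather than 500,000 x 4. Real data with proportions of exactly 0 or 1 would need an offset before the logit, since `qlogis()` returns infinite values there.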

[R] Help with big data and parallel computing: 500, 000 x 4 linear models

2016-08-08 Thread Ellis, Alicia M
I have a large dataset with ~500,000 columns and 1264 rows. Each column represents the percent methylation at a given location in the genome. I need to run 500,000 linear models for each of 4 predictors of interest in the form of: Methylation.site1 ~ predictor1 + covariate1 + covariate2 + ...