Check out the biglm package. It fits linear and generalized linear models to data processed in chunks, so the full dataset never has to sit in RAM at once.
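
On the slow line-by-line loading script: one possible alternative, sketched below in base R only, is to keep a single connection open, read a few thousand rows per call, and let read.csv() drop the unwanted columns as it reads (colClasses = "NULL"). The function name read_columns_in_chunks, its arguments, and the chunk size are hypothetical, chosen for illustration; the sketch also assumes a comma-separated file with a header row, which may not match the SIPP core files.

```r
# Hypothetical helper: read only the columns named in `keep`, chunk by chunk.
read_columns_in_chunks <- function(path, keep, chunk_size = 10000L) {
  con <- file(path, open = "r")
  on.exit(close(con))

  # Read the header line once to learn the column names.
  header_line <- readLines(con, n = 1L)
  header <- scan(text = header_line, what = character(), sep = ",",
                 quiet = TRUE)

  # "NULL" makes read.csv skip a column; NA lets it guess the type.
  col_classes <- rep("NULL", length(header))
  col_classes[header %in% keep] <- NA

  chunks <- list()
  repeat {
    chunk <- tryCatch(
      read.csv(con, header = FALSE, nrows = chunk_size,
               col.names = header, colClasses = col_classes,
               stringsAsFactors = FALSE),
      error = function(e) NULL  # read.csv errors once no lines are left
    )
    if (is.null(chunk) || nrow(chunk) == 0L) break
    chunks[[length(chunks) + 1L]] <- chunk
    if (nrow(chunk) < chunk_size) break  # short chunk: end of file reached
  }

  # Stack the chunks; only the kept columns were ever held in memory.
  do.call(rbind, chunks)
}
```

A call might look like read_columns_in_chunks("core.csv", keep = c("id", "income")), where the file name and column names are made up. Adding a forgotten variable then only means changing `keep` and rerunning. Instead of stacking the chunks, each one could also be handed to biglm: biglm() fits the model on the first chunk and update() folds in each later one.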

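On "carefully vectorize my code": as I understand the advice, it means applying a function to a whole vector in one call rather than looping over its elements in R code. A minimal sketch follows; the variable names and the simulated data are assumptions for illustration, not anything from the SIPP.

```r
# Simulated data standing in for a real variable (illustration only).
set.seed(1)
income <- runif(1e5, min = 0, max = 1e5)

# Loop version: R interprets the loop body once per element.
log_income_loop <- numeric(length(income))
for (i in seq_along(income)) {
  log_income_loop[i] <- log(income[i] + 1)
}

# Vectorized version: a single call to log() handles the whole vector.
log_income_vec <- log(income + 1)

# Both approaches give the same values.
stopifnot(isTRUE(all.equal(log_income_loop, log_income_vec)))
```

Wrapping each version in system.time() shows how the two compare on a given machine.
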
-----Original Message-----
From: "Eric Doviak" <[EMAIL PROTECTED]>
To: "r-help@stat.math.ethz.ch" <r-help@stat.math.ethz.ch>
Sent: 7/30/07 9:54 AM
Subject: [R] the large dataset problem

Dear useRs,

I recently began a job at a very large and heavily bureaucratic organization. 
We're setting up a research office and statistical analysis will form the 
backbone of our work. We'll be working with large datasets such as the SIPP as 
well as our own administrative data.

Due to the bureaucracy, it will take some time to get the licenses for 
proprietary software like Stata. Right now, R is the only statistical software 
package on my computer. 

This, of course, is a huge limitation because R loads data directly into RAM 
making it difficult (if not impossible) to work with large datasets. My 
computer only has 1000 MB of RAM, of which Microsucks Winblows devours 400 MB. 
To make my memory issues even worse, my computer has a virus scanner that runs 
every day, and I do not have the administrative rights to turn the damn thing 
off. 

I need to find some way to overcome these constraints and work with large 
datasets. Does anyone have any suggestions?

I've read that I should "carefully vectorize my code." What does that mean?

The "Introduction to R" manual suggests modifying input files with Perl. Any 
tips on how to get started? Would Perl Data Language (PDL) be a good choice?  
http://pdl.perl.org/index_en.html

I wrote a script which loads large datasets a few lines at a time, writes the 
dozen or so variables of interest to a CSV file, removes the loaded data and 
then (via a "for" loop) loads the next few lines .... I managed to get it to 
work with one of the SIPP core files, but it's SLOOOOW. Worse, if I discover 
later that I omitted a relevant variable, then I'll have to run the whole 
script all over again.

Any suggestions?

Thanks,
- Eric

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
