On Jan 3, 2008 9:00 AM, BEP <[EMAIL PROTECTED]> wrote:
> Hello all,
>
> I am trying to read a very large data set into R, and I have no interest in
> reviving my SAS skills. To do this, I will need to drop unwanted variables
> given the size of the data file. The most common strategy seems to be
> subsetting the data after it is read into R. Unfortunately, given the size
> of the data set, I can't get the file read and then subsequently do the
> subset procedure. I would be appreciative of help on the following:
>
> 1. What are the possibilities of reading in just a small set of variables
> during the <read.table> statement (or another 'read' statement)? That is,
> is it possible to specify just the variables that I want to keep?

read.table can skip columns. Set the relevant component of colClasses to
"NULL" (the quoted string, not the NULL object).

> 2. Can I randomly select a set of observations during the 'read' statement?
>
> I have searched various R resources for this information, so if I am simply
> overlooking a key resource on this issue, pointing that out to me would be
> greatly appreciated.

The development version of sqldf can do all of the above (i.e. read in a
subset of columns, a subset of rows or a random subset of rows) subject to
certain limitations on the input format. See Example 6 on the home page:

http://sqldf.googlecode.com

readTable in the R.utils package can also read in a subset of rows and
columns.

______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
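
The colClasses suggestion in the reply above could look roughly like the
following. This is only a sketch: the temporary file, its column names (id,
age, income, notes) and the choice of which columns to keep are all made up
for illustration. It reads just the header line first, then builds a
colClasses vector with "NULL" for every column that should be skipped.

```r
# Write a small example file so the sketch is self-contained.
# (Hypothetical data; a real file would already exist on disk.)
path <- tempfile(fileext = ".csv")
write.csv(
  data.frame(id = 1:5,
             age = c(34, 51, 27, 45, 62),
             income = c(40, 55, 38, 72, 60),
             notes = letters[1:5]),
  path, row.names = FALSE
)

# Read a single row just to learn the column names.
hdr <- names(read.table(path, header = TRUE, sep = ",", nrows = 1))

# Columns we want to keep (an assumption for this example).
wanted <- c("id", "age")

# One colClasses entry per column, in file order:
# NA lets read.table guess the type, the string "NULL" skips the column.
cc <- ifelse(hdr %in% wanted, NA, "NULL")

dat <- read.table(path, header = TRUE, sep = ",", colClasses = cc)
str(dat)  # only the id and age columns are present
```

Skipped columns are still scanned as part of each line, but they are never
stored in the result, which is what matters when the full table will not fit
in memory.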

