Laura Quinn wrote:
Hi,

I want to perform some analysis on subsets of huge data files. There are
20 of the files and I want to select the same subsets of each one (each
subset is a chunk of 1500 or so consecutive rows from several million). To
save time and processing power is there a method to tell R to *only* read
in these rows, rather than reading in the entire dataset then selecting
subsets and deleting the extraneous data? This method takes a rather silly
amount of time and results in memory problems.

I am using R 1.9.0 on SuSE 9.0.

Thanks in advance!


Hi Laura,

I guess if you knew which row of the file your subset started from, and how many lines you wanted to read in, you could use scan() with the arguments skip and nlines (see ?scan).
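
As a rough sketch of the skip/nlines idea: the file, its size, and the names first_row, n_rows and chunk below are made up for illustration, and the example assumes a whitespace-separated numeric file with no header line and three columns.

```r
# Hypothetical example file: 10 rows, 3 numeric columns, no header line.
tmp <- tempfile(fileext = ".txt")
write.table(matrix(1:30, ncol = 3), tmp, row.names = FALSE, col.names = FALSE)

first_row <- 4   # first row of the wanted chunk (assumed known in advance)
n_rows    <- 3   # number of consecutive rows to read

# scan() skips the first (first_row - 1) lines, then reads n_rows lines
# and returns the values as one long numeric vector.
vals  <- scan(tmp, skip = first_row - 1, nlines = n_rows, quiet = TRUE)
chunk <- matrix(vals, ncol = 3, byrow = TRUE)

# read.table() takes the same approach via skip and nrows,
# and returns a data frame instead.
chunk_df <- read.table(tmp, skip = first_row - 1, nrows = n_rows)
```

R still has to read past the skipped lines, but it does not parse or store them, so only the chunk ends up in memory. The same call could be wrapped in lapply() over a vector of file names to pull the same rows from each of the 20 files.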

A better way, and one that gets recommended a lot on the list, is to store your data in a database and use one of the R packages or tools that can connect to it and extract only the rows you need.
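
A minimal sketch of the database route, assuming the DBI and RSQLite packages are installed (the choice of SQLite is an assumption here, not something specified above). The table name big and the columns row_id and x are hypothetical, and a small in-memory table stands in for one of the large files.

```r
library(DBI)

# Assumption: an in-memory SQLite database stands in for a real one.
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Hypothetical stand-in for one large data file, with an explicit row index.
big <- data.frame(row_id = 1:10, x = (1:10)^2)
dbWriteTable(con, "big", big)

# Only the requested rows are returned to R; the rest stay in the database.
chunk <- dbGetQuery(con, "SELECT * FROM big WHERE row_id BETWEEN 4 AND 6")

dbDisconnect(con)
```

Storing an explicit row index when the data are loaded means the same query can be reused against each table to extract the same block of rows.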

See the R Data Import/Export manual for more on scan and using relational databases with R.

Hope this helps,

Gav
--
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
Gavin Simpson                     [T] +44 (0)20 7679 5522
ENSIS Research Fellow             [F] +44 (0)20 7679 7565
ENSIS Ltd. & ECRC                 [E] [EMAIL PROTECTED]
UCL Department of Geography       [W] http://www.ucl.ac.uk/~ucfagls/cv/
26 Bedford Way                    [W] http://www.ucl.ac.uk/~ucfagls/
London.  WC1H 0AP.
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

______________________________________________
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html