On Mon, 25 Aug 2003, Murray Jorgensen wrote:

> At 08:12 25/08/2003 +0100, Prof Brian Ripley wrote:
> >I think that is only a medium-sized file.
>
> "Large" for my purposes means "more than I really want to read into memory"
> which in turn means "takes more than 30s". I'm at home now and the file
> isn't, so I'm not sure if the file is large or not.
>
> More responses interspersed below. BTW, I forgot to mention that I'm using
> Windows and so do not have nice unix tools readily available.

But you do, thanks to me, as you need them to install R packages.

> >On Mon, 25 Aug 2003, Murray Jorgensen wrote:
> >
> >> I'm wondering if anyone has written some functions or code for handling
> >> very large files in R. I am working with a data file that is 41
> >> variables times who knows how many observations making up 27MB altogether.
> >>
> >> The sort of thing that I am thinking of having R do is
> >>
> >> - count the number of lines in a file
> >
> >You can do that without reading the file into memory: use
> >system(paste("wc -l", filename))
>
> Don't think that I can do that in Windows XL.

I presume you mean Windows XP? Of course you can, and wc.exe is in Rtools.zip!

> >or read in blocks of lines via a
> >connection
>
> But that does sound promising!
>
> >
> >> - form a data frame by selecting all cases whose line numbers are in a
> >> supplied vector (which could be used to extract random subfiles of
> >> particular sizes)
> >
> >R should handle that easily in today's memory sizes. Buy some more RAM if
> >you don't already have 1/2Gb. As others have said, for a really large file,
> >use an RDBMS to do the selection for you.
>
> It's just that R is so good in reading in initial segments of a file that I
> can't believe that it can't be effective in reading more general
> (pre-specified) subsets.

-- 
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

______________________________________________
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
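Ripley's second suggestion, reading blocks of lines through a connection, can plausibly cover both tasks Murray lists (counting lines and pulling out pre-specified line numbers) in base R with no external tools. What follows is a minimal sketch under stated assumptions, not code from the thread: the function names `count_lines` and `read_selected_lines`, the default block size of 10000 lines, and the assumption of a plain-text file with one record per line are all hypothetical choices for illustration.

```r
# Hypothetical helper: count lines by reading the file through a
# connection in blocks, so the whole file is never held in memory.
# block_size = 10000L is an arbitrary assumption; tune to taste.
count_lines <- function(filename, block_size = 10000L) {
  con <- file(filename, open = "r")
  on.exit(close(con))
  n <- 0L
  repeat {
    chunk <- readLines(con, n = block_size)
    if (length(chunk) == 0L) break          # end of file reached
    n <- n + length(chunk)
  }
  n
}

# Hypothetical helper: return the lines whose 1-based line numbers
# are in `wanted`, in file order (duplicates are dropped), reading the
# file block by block through a single open connection.
read_selected_lines <- function(filename, wanted, block_size = 10000L) {
  wanted <- sort(unique(as.integer(wanted)))
  if (length(wanted) == 0L) return(character(0))
  con <- file(filename, open = "r")
  on.exit(close(con))
  out <- character(0)
  offset <- 0L                              # lines consumed so far
  repeat {
    chunk <- readLines(con, n = block_size)
    if (length(chunk) == 0L) break
    # positions of wanted lines that fall inside this block
    idx <- wanted[wanted > offset & wanted <= offset + length(chunk)] - offset
    out <- c(out, chunk[idx])
    offset <- offset + length(chunk)
    # stop early once the last wanted line has been passed
    if (offset >= wanted[length(wanted)]) break
  }
  out
}
```

For example, assuming a header on line 1 of a file `f`, `keep <- c(1, 1 + sample(count_lines(f) - 1, 100))` picks 100 random data lines, and `read.table(textConnection(read_selected_lines(f, keep)), header = TRUE)` turns them into a data frame. Apart from the selected lines, only one block is in memory at a time; the cost is a second pass over the file when `count_lines` is used first.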