That looks like a perfect job for (g)awk, which is in every Linux distribution and is also available for Windows. It can be called with something like

    system( "awk -f script.awk inputfile.txt" )

and does its job silently and very fast. 650MB should not be an issue. I'm not proficient in awk but would offer my help anyway (off-list...).

Rgds,
Rainer

On Wednesday 14 September 2011 13:08:14 Stefan McKinnon Høj-Edwards wrote:
> Dear R-help,
>
> I have a very large ascii data file, of which I only want to read in
> selected lines (e.g. one fourth of the lines); determining which lines
> depends on each line's content. So far, I have found two approaches for doing
> this in R: 1) Read the file line by line using a repeat-loop and save the
> result in a temporary file or a variable, and 2) Read the entire file and
> filter/reshape it using *apply methods. To my understanding, repeat{}-loops
> are quite slow in R, and reading an entire file to discard three
> quarters of the data is a bit of an overkill. Not to mention loading a
> 650MB text file into memory.
>
> What I am looking for is a function that works like the first approach, but
> avoiding do- or repeat-loops, so I imagine it is implemented in a
> lower-level language, to be more efficient. Naturally, when calling the
> function, one would provide a function that determines if/how the line
> should be appended to a variable. Alternatively, an object working as a
> generator (in Python terms) could be used with the normal *apply
> functions. I imagine this working differently from e.g.
> sapply(readLines("myfile.txt"), FUN=selector), in that "readLines" would be
> executed first, loading the entire file into memory and supplying it to
> sapply, whereas the generator-object only reads a line when sapply requests
> the next element.
>
> Are there options for this kind of operation?
>
> Kind regards,
>
> Stefan McKinnon Høj-Edwards     Dept. of Genetics and Biotechnology
> PhD student                     Faculty of Agricultural Sciences
> stefan.hoj-edwa...@agrsci.dk    Aarhus University
> Tel.: +45 8999 1291             Blichers Allé 20, Postboks 50
> Web: www.iysik.com              DK-8830 Tjele
>                                 Tel.: +45 8999 1900
>                                 Web: www.agrsci.au.dk
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
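
A minimal sketch of the awk approach suggested at the top of this message. The file names script.awk and inputfile.txt come from the reply. The sample data, the selection rule (keep lines whose first field is "keep") and the output file selected.txt are assumptions made up for illustration, since the original post does not say how lines are to be chosen.

```shell
#!/bin/sh
# Hypothetical sample data: the first field marks whether a line is wanted.
cat > inputfile.txt <<'EOF'
keep 1 alpha
skip 2 beta
keep 3 gamma
skip 4 delta
EOF

# Hypothetical selection rule (an assumption, not from the original post):
# print only the lines whose first whitespace-separated field is "keep".
cat > script.awk <<'EOF'
$1 == "keep" { print }
EOF

# awk processes the input one line at a time, so the full file is never
# held in memory; only the selected lines are written out.
awk -f script.awk inputfile.txt > selected.txt
cat selected.txt
```

From R, the reduced file could then be read with the usual functions, or the awk command could probably be read directly through a connection such as pipe("awk -f script.awk inputfile.txt"), so that only the selected lines ever reach R.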