On Thu, 2 Jun 2005, Peter Dalgaard wrote:

> "Andy Bunn" <[EMAIL PROTECTED]> writes:
>
> > Hi all:
> >
> > I have acquired hundreds of data files that I need to preprocess to
> > make them usable in R. The files are fixed width (up to a point) and
> > contain 1 to 3 header lines, followed by a variable number of
> > fixed-width data lines (which I can read with read.fwf). I want to
> > read through each file and remove every _line_ where the characters
> > in columns 83-86 do not equal "STD". If I can do that and store the
> > result in a text file, then I can get the data I need using
> > read.fwf. I can't figure out how to do this because of the irregular
> > header information buried in each file. It seems like the kind of
> > thing perl or emacs would be good at, but I'd like to do it all in R
> > if possible. Any pointers appreciated.

> How large are the files? With today's RAM sizes, it could be feasible
> to do something along the lines of
>
> 1) x <- readLines(....); i <- read.fwf(... columns 83-86 only ...)
> 2) read.fwf(textConnection(x[i[[1]] %in% "STD"]), ...)
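
Filled out (using substr() to pull the flag column instead of a second
read.fwf() pass), that sketch might run along these lines; the file name
"station.dat" and the widths are placeholders for the real layout:

x <- readLines("station.dat")
flag <- trimws(substr(x, 83, 86))           # the 4-column flag field
dat <- read.fwf(textConnection(x[flag == "STD"]),
                widths = c(10, 8, 64, 4))   # placeholder widths, 86 cols total

Header lines presumably lack the flag, so they fail the test and drop
out along with the non-"STD" data lines.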

Alternatively, use an anonymous file() connection (file() called with no arguments), which will be faster than textConnection() for large files (and which read.fwf() should probably use internally).

I would have used

x <- readLines(...)
tmp <- file()    # anonymous file connection used as a scratch buffer
## substr(x, 83, 86) is four characters wide, so trim the padding
## before comparing it with the three-character flag "STD"
writeLines(x[trimws(substr(x, 83, 86)) == "STD"], tmp)
read.fwf(tmp, ...)
close(tmp)
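
Since the original problem involves hundreds of such files, the same
idiom drops straight into a loop. A sketch, assuming the files sit in a
"data" directory with a ".dat" extension and hypothetical field widths:

files <- list.files("data", pattern = "\\.dat$", full.names = TRUE)
result <- lapply(files, function(f) {
    x <- readLines(f)
    tmp <- file()                    # one scratch connection per file
    on.exit(close(tmp))              # released when the function returns
    writeLines(x[trimws(substr(x, 83, 86)) == "STD"], tmp)
    read.fwf(tmp, widths = c(10, 8, 64, 4))   # placeholder widths
})
names(result) <- basename(files)     # label each data frame by its file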


--
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

