On Mon, 16 Jul 2007, Pat Carroll wrote:

> Hello, all.
>
> I am working on a project with a large (~350Mb, about 5800 rows)
> insurance claims dataset. It was supplied in a tilde(~)-delimited
> format. I imported it into a data frame in R by setting memory.limit to
> maximum (4Gb) for my computer and using read.table.
>
> The resulting data frame had 10 bad rows. The errors appear due to
> read.table missing delimiter characters, with multiple data being
> imported into the same cell, then the remainder of the row and the next
> run together and garbled due to the reading frame shift (example: a
> single cell might contain: <datum>~ ~ <datum> ~<datum>, after which all
> the cells of the row and the next are wrong).
>
> To replicate, I tried the same import procedure on a smaller
> demographics data set from the same supplier, only about 1Mb, and got
> the same kinds of errors (5 bad rows in about 3500). I also imported as
> much of the file as Excel would hold and cross-checked; Excel did not
> produce the same errors but can't handle the entire file. I have used
> read.table on a number of other formats (mainly csv and tab-delimited)
> without such problems; so far it appears there's something different
> about these files that produces the errors but I can't see what it would
> be.
The usual cause is that the user forgot about quotes and comment
characters. By default read.table treats both ' and " as quote
characters and # as the comment character, so a stray apostrophe inside
a field starts a quoted string that swallows the following delimiters
(and possibly line ends) until the next apostrophe, while a # discards
the rest of its line. Try adding

    quote = "", comment.char = ""

to the read.table call. If that does not work, please follow the request
in the footer of every message on this list.

> Does anyone have any thoughts about what is going wrong? And is there a
> way, short of manual correction, for fixing it?
>
> Thanks for all help,
> ~Pat.
>
>
> Pat Carroll.
> what matters most is how well you walk through the fire.
> bukowski.
>
> ______________________________________________
> [email protected] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

-- 
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
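A minimal sketch of the quote = "", comment.char = "" suggestion, assuming a small hypothetical tilde-delimited file: the sample rows, the column names (id, name, addr, amount) and the temporary file are invented for illustration and are not taken from the claims data. It writes a file containing an apostrophe and a #, uses count.fields() to check that every line splits into the same number of fields, and then reads the file with quoting and comments switched off.

```r
# Hypothetical sample data: an apostrophe in a name and a '#' in an address,
# the kind of characters that read.table's defaults (quote = "\"'",
# comment.char = "#") treat specially.
sample_lines <- c("id~name~addr~amount",
                  "1~O'Brien~12 Main St~100",
                  "2~Smith~Apt #4~200",
                  "3~Jones~9 Elm St~300")
tf <- tempfile(fileext = ".txt")
writeLines(sample_lines, tf)

# Diagnostic: count fields per line with quoting and comments turned off.
# In a well-formed file every line gives the same count as the header.
n_fields <- count.fields(tf, sep = "~", quote = "", comment.char = "")
bad_lines <- which(n_fields != n_fields[1])
print(bad_lines)  # integer(0) means no line deviates from the header count

# Read with quote and comment characters treated as ordinary text.
claims <- read.table(tf, sep = "~", header = TRUE,
                     quote = "", comment.char = "")
str(claims)
```

On a real file, a non-empty bad_lines points at the rows worth inspecting with readLines(). One trade-off of quote = "": if the supplier does wrap some fields in genuine quotes, those quote marks are kept as part of the values and would need stripping afterwards.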
