Thanks, yeah, I think scan is more promising. I'll check it out.

On 4/8/24 11:49, Bert Gunter wrote:
> No idea, but have you tried using ?scan to read those next 5 rows? It
> might give you a better idea of the pathologies that are causing
> problems. For example, an unmatched quote might result in some huge
> number of characters trying to be read into a single element of a
> character variable. As your previous respondent said, resolving such
> problems can be a challenge.
>
> Cheers,
> Bert
>
> On Mon, Apr 8, 2024 at 8:06 AM Dave Dixon <ddi...@swcp.com> wrote:
> >
> > Greetings,
> >
> > I have a csv file of 76 fields and about 4 million records. I know that
> > some of the records have errors - unmatched quotes, specifically.
> > Reading the file with readLines and parsing the lines with
> > read.csv(text = ...) is really slow. I know that the first 2459465
> > records are good. So I try this:
> >
> > > startTime <- Sys.time()
> > > first_records <- read.csv(file_name, nrows = 2459465)
> > > endTime <- Sys.time()
> > > cat("elapsed time = ", endTime - startTime, "\n")
> >
> > elapsed time = 24.12598
> >
> > > startTime <- Sys.time()
> > > second_records <- read.csv(file_name, skip = 2459465, nrows = 5)
> > > endTime <- Sys.time()
> > > cat("elapsed time = ", endTime - startTime, "\n")
> >
> > This appears to never finish. I have been waiting over 20 minutes.
> >
> > So why would (skip = 2459465, nrows = 5) take orders of magnitude longer
> > than (nrows = 2459465) ?
> >
> > Thanks!
> >
> > -dave
> >
> > PS: readLines(n=2459470) takes 10.42731 seconds.
> >
> > ______________________________________________
> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
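
For reference, one way the scan suggestion above might be tried is sketched below. It is only an illustration under stated assumptions: the small temporary file stands in for the real csv, and `n_good`, `raw_lines`, `n_quotes` and `suspect` are hypothetical names, not anything from the thread. The idea is to read the next few lines as raw text with quote handling switched off, then flag lines with an odd number of double quotes.

```r
# Stand-in data so the sketch runs as-is; for the real case, point
# file_name at the actual csv instead (hypothetical example content).
file_name <- tempfile(fileext = ".csv")
writeLines(c('id,name,note',
             '1,"alpha","ok"',
             '2,"beta,unmatched quote',
             '3,"gamma","ok"'), file_name)

# Number of physical lines (header included) assumed to be good.
n_good <- 2

# sep = "\n" makes each line one element; quote = "" stops scan from
# treating quote characters specially, so an unmatched quote cannot
# swallow the following lines. Note that blank lines are skipped by default.
raw_lines <- scan(file_name, what = character(), sep = "\n", quote = "",
                  skip = n_good, nlines = 5, quiet = TRUE)

# Count double quotes per line; an odd count suggests an unmatched quote.
n_quotes <- lengths(regmatches(raw_lines,
                               gregexpr('"', raw_lines, fixed = TRUE)))
suspect <- raw_lines[n_quotes %% 2 == 1]
print(suspect)
```

With the real file, `n_good` would be set from the number of lines already known to be good. `skip` counts physical lines, so the header line and any line breaks embedded in quoted fields need to be taken into account.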