No idea, but have you tried using ?scan to read those next 5 rows? It might give you a better idea of the pathologies that are causing problems. For example, an unmatched quote might cause some huge number of characters to be read into a single element of a character vector. As your previous respondent said, resolving such problems can be a challenge.
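Something along these lines might be a starting point. It is only a sketch: the small temporary file stands in for your real one, and n_good and n_peek are hypothetical names for your 2459465 known-good lines and the 5 lines you want to look at. The idea is to have scan() return the raw lines with quote processing switched off, then flag any line with an odd number of double quotes.

```r
# Hypothetical stand-in for the real file: a small CSV whose record 3
# contains an unmatched double quote.
file_name <- tempfile(fileext = ".csv")
writeLines(c(
  "id,name,note",
  "1,alpha,\"fine\"",
  "2,beta,\"also fine\"",
  "3,gamma,\"unterminated",
  "4,delta,\"fine again\""
), file_name)

n_good <- 2  # lines assumed clean (2459465 in the thread); counts the header here
n_peek <- 3  # lines to inspect after that (5 in the thread)

# sep = "\n" together with quote = "" makes scan() return one raw string
# per line, so a stray quote cannot swallow the rest of the file.
raw_lines <- scan(file_name, what = character(), sep = "\n", quote = "",
                  skip = n_good, nlines = n_peek, quiet = TRUE)

# Count the double quotes on each line; an odd count marks a suspect line.
n_quotes <- lengths(regmatches(raw_lines,
                               gregexpr("\"", raw_lines, fixed = TRUE)))
suspect <- raw_lines[n_quotes %% 2 == 1]
print(suspect)
```

An odd count is only a hint: a quoted field that legitimately contains a line break would also show up as two odd lines, so the flagged lines still need a look by eye.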
Cheers,
Bert

On Mon, Apr 8, 2024 at 8:06 AM Dave Dixon <ddi...@swcp.com> wrote:
> Greetings,
>
> I have a csv file of 76 fields and about 4 million records. I know that
> some of the records have errors - unmatched quotes, specifically.
> Reading the file with readLines and parsing the lines with read.csv(text
> = ...) is really slow. I know that the first 2459465 records are good.
> So I try this:
>
> > startTime <- Sys.time()
> > first_records <- read.csv(file_name, nrows = 2459465)
> > endTime <- Sys.time()
> > cat("elapsed time = ", endTime - startTime, "\n")
>
> elapsed time = 24.12598
>
> > startTime <- Sys.time()
> > second_records <- read.csv(file_name, skip = 2459465, nrows = 5)
> > endTime <- Sys.time()
> > cat("elapsed time = ", endTime - startTime, "\n")
>
> This appears to never finish. I have been waiting over 20 minutes.
>
> So why would (skip = 2459465, nrows = 5) take orders of magnitude longer
> than (nrows = 2459465) ?
>
> Thanks!
>
> -dave
>
> PS: readLines(n=2459470) takes 10.42731 seconds.
>
> ______________________________________________
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.