[R] Exceptional slowness with read.csv

Dave Dixon Mon, 08 Apr 2024 08:06:17 -0700

Greetings,

I have a CSV file with 76 fields and about 4 million records. I know that some of the records have errors, specifically unmatched quotes. Reading the file with readLines and parsing the lines with read.csv(text = ...) is really slow. I know that the first 2459465 records are good, so I try this:


> startTime <- Sys.time()
> first_records <- read.csv(file_name, nrows = 2459465)
> endTime <- Sys.time()
> cat("elapsed time = ", endTime - startTime, "\n")

elapsed time =   24.12598

> startTime <- Sys.time()
> second_records <- read.csv(file_name, skip = 2459465, nrows = 5)
> endTime <- Sys.time()
> cat("elapsed time = ", endTime - startTime, "\n")

This appears to never finish. I have been waiting over 20 minutes.

So why would (skip = 2459465, nrows = 5) take orders of magnitude longer than (nrows = 2459465)?
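
Since the bad records are known to contain unmatched quotes, one cheap diagnostic is to count double-quote characters on each line and flag lines where the count is odd. This is a heuristic, not something read.csv() does itself. The sketch assumes '"' is the quote character, that embedded quotes are doubled ("") as in standard CSV, and that no legitimate field spans several lines; the function name odd_quote_lines() and the toy input are hypothetical.

```r
# Heuristic sketch: flag lines with an odd number of '"' characters.
# Assumes '"' is the quote character, embedded quotes are doubled (""),
# and no valid quoted field contains a line break.
odd_quote_lines <- function(lines) {
  # Delete every character except '"', then count what is left per line.
  quote_counts <- nchar(gsub('[^"]', "", lines))
  which(quote_counts %% 2 == 1)
}

# Toy input standing in for readLines(file_name); line 3 is missing a quote.
lines <- c('id,name,value',
           '1,"alpha",10',
           '2,"beta,20',
           '3,"say ""hi""",30')
print(odd_quote_lines(lines))
```

On the real file this would be something like head(odd_quote_lines(readLines(file_name))), to see whether the first flagged line sits just past the known-good records (line numbers include the header, so line k holds record k - 1).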


Thanks!

-dave

PS: readLines(n=2459470) takes 10.42731 seconds.
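
Given that readLines() gets through the file in about 10 seconds, one possible workaround is to read the lines once and hand a slice of them to read.csv(text = ...) in a single call, instead of asking read.csv() to skip ahead. With the default header = TRUE, read.csv() treats the first line it reads after skip as column names, so the sketch prepends the real header line. It does not explain why skip = is slow, and a slice that starts inside an unterminated quoted field will still parse badly. The tiny temporary file, n_good, and the other variable names are stand-ins for the real 76-field file and its 2459465 good records.

```r
# Sketch: read every line once, then parse slices from memory.
# Assumption: a tiny stand-in file and a hypothetical n_good replace the
# real file and its 2459465 known-good records.
file_name <- tempfile(fileext = ".csv")
writeLines(c('id,name,value',
             '1,"alpha",10',
             '2,"beta",20',
             '3,"gamma",30',
             '4,"delta",40'), file_name)

all_lines <- readLines(file_name)  # single pass over the whole file
header    <- all_lines[1]          # column names are on line 1
n_good    <- 2                     # hypothetical number of known-good records

# Known-good block: header plus the first n_good records, parsed in one call.
first_records <- read.csv(text = all_lines[seq_len(n_good + 1)])

# The next (up to) 5 records, taken from memory instead of via skip =.
# Assumes at least one line follows the good block. The header is prepended
# so read.csv() does not use a data row as column names.
next_idx       <- (n_good + 2):min(n_good + 6, length(all_lines))
second_records <- read.csv(text = c(header, all_lines[next_idx]))

print(second_records)
```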

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
