Hi Dave,

That's rather frustrating. I've found vroom (from the package vroom) to be
helpful with large files like this.

Does the following give you any better luck?

vroom(file_name, delim = ",", skip = 2459465, n_max = 5)

Of course, when you know you've got errors & the files are big like that it
can take a bit of work resolving things. The command line tools awk & sed
might even be a good plan for finding lines that have errors & figuring out
a fix, but I certainly don't envy you.

All the best

Stevie

On Tue, 9 Apr 2024 at 00:36, Dave Dixon <ddi...@swcp.com> wrote:

> Greetings,
>
> I have a csv file of 76 fields and about 4 million records. I know that
> some of the records have errors - unmatched quotes, specifically.
> Reading the file with readLines and parsing the lines with read.csv(text
> = ...) is really slow. I know that the first 2459465 records are good.
> So I try this:
>
>  > startTime <- Sys.time()
>  > first_records <- read.csv(file_name, nrows = 2459465)
>  > endTime <- Sys.time()
>  > cat("elapsed time = ", endTime - startTime, "\n")
>
> elapsed time =   24.12598
>
>  > startTime <- Sys.time()
>  > second_records <- read.csv(file_name, skip = 2459465, nrows = 5)
>  > endTime <- Sys.time()
>  > cat("elapsed time = ", endTime - startTime, "\n")
>
> This appears to never finish. I have been waiting over 20 minutes.
>
> So why would (skip = 2459465, nrows = 5) take orders of magnitude longer
> than (nrows = 2459465) ?
>
> Thanks!
>
> -dave
>
> PS: readLines(n=2459470) takes 10.42731 seconds.
>
> ______________________________________________
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

        [[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to