It looks like you are reading directly from URLs? How do you know the delay is not network I/O rather than parsing time?
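One quick way to answer that question is to time the download phase and the import phase separately. A minimal sketch below, with simulated stand-ins (`fake_download`, `fake_import` are hypothetical placeholders, since the real URLs and readxl calls are specific to the package):

```r
# Stand-ins so the sketch runs anywhere: fake_download() simulates the
# network transfer, fake_import() simulates parsing an already-local file.
dest <- tempfile(fileext = ".csv")
write.csv(data.frame(x = runif(1e5)), dest, row.names = FALSE)

fake_download <- function() Sys.sleep(0.5)   # pretend network I/O
fake_import   <- function() read.csv(dest)   # pretend readxl::read_excel()

t_download <- system.time(fake_download())["elapsed"]
t_import   <- system.time(fake_import())["elapsed"]

# In the real package: replace fake_download() with download.file(url, dest)
# and fake_import() with readxl::read_excel(dest, ...). If t_download
# dominates, parallelizing the import cannot shrink the total by much.
c(download = t_download, import = t_import)
```

If the download time dominates, the fix is to overlap or cache the transfers, not to add CPU workers.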
Parallel computation is not a panacea. It allows tasks _that are CPU-bound_ to get through the CPU-intensive work faster. You need to be certain that your tasks can actually benefit from parallelism before using it... there is significant overhead and added complexity to parallel processing that will lead to SLOWER processing if misused.

On October 4, 2022 11:29:54 AM PDT, Igor L <igorlal...@gmail.com> wrote:
>Hello all,
>
>I'm developing an R package that basically downloads, imports, cleans and
>merges nine files in xlsx format, updated monthly, from a public institution.
>
>The problem is that importing files in xlsx format is time consuming.
>
>My initial idea was to parallelize the execution of the read_xlsx function
>according to the number of cores in the user's processor, but apparently it
>didn't make much difference: when I tried to parallelize it, the
>execution time went from 185.89 to 184.12 seconds:
>
># not parallelized code
>y <- purrr::map_dfr(paste0(dir.temp, '/', lista.arquivos.locais),
>                    readxl::read_excel, sheet = 1, skip = 4,
>                    col_types = c(rep('text', 30)))
>
># parallelized code
>plan(strategy = future::multicore(workers = 4))
>y <- furrr::future_map_dfr(paste0(dir.temp, '/', lista.arquivos.locais),
>                           readxl::read_excel, sheet = 1, skip = 4,
>                           col_types = c(rep('text', 30)))
>
>Any suggestions to reduce the import processing time?
>
>Thanks in advance!

--
Sent from my phone. Please excuse my brevity.

______________________________________________
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel
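The CPU-bound-versus-overhead point can be seen with only the base `parallel` package. A sketch (the task functions are illustrative; `mclapply()` forks the R process, so on Windows you would use `makeCluster()`/`parLapply()` instead):

```r
library(parallel)

# A CPU-bound task: pure computation, so extra cores can genuinely help.
cpu_task <- function(i) sum(sqrt(seq_len(2e6)))

t_serial   <- system.time(lapply(1:8, cpu_task))["elapsed"]
t_parallel <- system.time(mclapply(1:8, cpu_task, mc.cores = 4))["elapsed"]

# An I/O-bound stand-in: the worker mostly waits (simulated with Sys.sleep).
# Forking more processes does not make the wait shorter; any gain comes from
# overlapping the waits, and fork/serialization overhead eats into that.
io_task <- function(i) { Sys.sleep(0.1); i }

# Sanity check: parallel and serial runs must return the same results.
same <- identical(unlist(mclapply(1:8, io_task, mc.cores = 4)),
                  unlist(lapply(1:8, io_task)))
c(serial = t_serial, parallel = t_parallel, results_match = same)
```

For the CPU-bound task the parallel run is typically faster; for very cheap tasks, the same call can come out slower because the fork and result-serialization overhead exceeds the work saved. That matches the original poster's 185.89 vs 184.12 seconds: if the time is spent waiting on the network or on disk, four workers waiting in parallel finish no sooner than one.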