It looks like you are reading directly from URLs? How do you know the delay is not network I/O rather than parsing time?
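One quick way to answer that question is to time the download phase and the import phase separately. A minimal sketch below, with simulated stand-ins (`fake_download`, `fake_import` are hypothetical placeholders, since the real URLs and readxl calls are specific to the package):

```r
# Stand-ins so the sketch runs anywhere: fake_download() simulates the
# network transfer, fake_import() simulates parsing an already-local file.
dest <- tempfile(fileext = ".csv")
write.csv(data.frame(x = runif(1e5)), dest, row.names = FALSE)

fake_download <- function() Sys.sleep(0.5)   # pretend network I/O
fake_import   <- function() read.csv(dest)   # pretend readxl::read_excel()

t_download <- system.time(fake_download())["elapsed"]
t_import   <- system.time(fake_import())["elapsed"]

# In the real package: replace fake_download() with download.file(url, dest)
# and fake_import() with readxl::read_excel(dest, ...). If t_download
# dominates, parallelizing the import cannot shrink the total by much.
c(download = t_download, import = t_import)
```

If the download time dominates, the fix is to overlap or cache the transfers, not to add CPU workers.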
Parallel computation is not a panacea. It allows tasks _that are CPU-bound_ to get through the CPU-intensive work faster. You need to be certain that your tasks can actually benefit from parallelism before using it... there is significant overhead and added complexity to parallel processing that will lead to SLOWER processing if misused.

On October 4, 2022 11:29:54 AM PDT, Igor L <igorlal...@gmail.com> wrote:
>Hello all,
>
>I'm developing an R package that basically downloads, imports, cleans and
>merges nine files in xlsx format, updated monthly, from a public institution.
>
>The problem is that importing files in xlsx format is time consuming.
>
>My initial idea was to parallelize the execution of the read_xlsx function
>according to the number of cores in the user's processor, but apparently it
>didn't make much difference: when I tried to parallelize it, the
>execution time went from 185.89 to 184.12 seconds:
>
># not parallelized code
>y <- purrr::map_dfr(paste0(dir.temp, '/', lista.arquivos.locais),
>                    readxl::read_excel, sheet = 1, skip = 4,
>                    col_types = c(rep('text', 30)))
>
># parallelized code
>plan(strategy = future::multicore(workers = 4))
>y <- furrr::future_map_dfr(paste0(dir.temp, '/', lista.arquivos.locais),
>                           readxl::read_excel, sheet = 1, skip = 4,
>                           col_types = c(rep('text', 30)))
>
>Any suggestions to reduce the import processing time?
>
>Thanks in advance!

--
Sent from my phone. Please excuse my brevity.

______________________________________________
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel
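The CPU-bound-versus-overhead point can be seen with only the base `parallel` package. A sketch (the task functions are illustrative; `mclapply()` forks the R process, so on Windows you would use `makeCluster()`/`parLapply()` instead):

```r
library(parallel)

# A CPU-bound task: pure computation, so extra cores can genuinely help.
cpu_task <- function(i) sum(sqrt(seq_len(2e6)))

t_serial   <- system.time(lapply(1:8, cpu_task))["elapsed"]
t_parallel <- system.time(mclapply(1:8, cpu_task, mc.cores = 4))["elapsed"]

# An I/O-bound stand-in: the worker mostly waits (simulated with Sys.sleep).
# Forking more processes does not make the wait shorter; any gain comes from
# overlapping the waits, and fork/serialization overhead eats into that.
io_task <- function(i) { Sys.sleep(0.1); i }

# Sanity check: parallel and serial runs must return the same results.
same <- identical(unlist(mclapply(1:8, io_task, mc.cores = 4)),
                  unlist(lapply(1:8, io_task)))
c(serial = t_serial, parallel = t_parallel, results_match = same)
```

For the CPU-bound task the parallel run is typically faster; for very cheap tasks, the same call can come out slower because the fork and result-serialization overhead exceeds the work saved. That matches the original poster's 185.89 vs 184.12 seconds: if the time is spent waiting on the network or on disk, four workers waiting in parallel finish no sooner than one.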