Dear R users,

I have developed the following code for importing a series of zipped CSV files using parallel computing.

My problems are:

A) Some ZIP files (which contain the CSVs) are corrupted and cannot be opened.
B) After running parRapply, the only way I have to find out which CSVs failed on each node is the last.warning variable, and it shows only one warning at a time, not all of them.

So:

* To show a list of all warnings from all nodes, I was thinking of using the following call in the code:

    warnings(DISPOIN_CSV_List <- parRapply(c1, DISPOIN_DIR_REL, parRaplly_Function))

  Would this work?
* Also, how could I check that a CSV can be opened before applying the function, and create an empty data.frame for the CSVs that cannot?

Thank you,
Juan

CODE

################################################################################
## DISPOIN Data Import Into MariaDB
################################################################################

## -----------------------------------------------------------------------------
## Packages
## -----------------------------------------------------------------------------
# The first argument of update.packages() is lib.loc, not a package name:
# update.packages(oldPkgs = c("RODBC", "tidyverse"))

## -----------------------------------------------------------------------------
## Libraries
## -----------------------------------------------------------------------------
suppressMessages(require(RODBC))
suppressMessages(require(tidyverse))
suppressMessages(require(parallel))

## -----------------------------------------------------------------------------
## CMD: Command for DISPOIN's Directory Acquisition
## -----------------------------------------------------------------------------
# Backslashes must be doubled inside an R string:
# shell(cmd = 'pushd "\\\\srvdiscsv\\data" && dir *AL*.zip /b /s > D:\\DISPOIN_Data_Directories.csv && popd')

## -----------------------------------------------------------------------------
## RODBC
## -----------------------------------------------------------------------------
## A) MariaDB Connection String
con <- odbcConnect("MariaDB_Tornado24")
invisible(sqlQuery(con, "USE dispoin;"))

# B) Import R Data Directories from MariaDB
DISPOIN_DIR_REL <- as_tibble(sqlFetch(con, "dispoin.t_DISPOIN_DIR_REL"))
odbcClose(con)

# C) Import zipped CSV data into a list of data frames, which are later
#    combined into a single data frame with rbind

# C.1) parRapply Function Initialization:
parRaplly_Function <- function(DISPOIN_CSV_Row) {
  return(read_csv2(
    file = DISPOIN_CSV_Row,
    col_names = c(
      "SCADA", "TAG", "ID_del_AEG", "Descripcion", "Time_ON", "Time_OFF",
      "Delta_Time", "Comentario", "Es_Alarma", "Es_Ultima", "Comentarios"),
    col_types = cols(
      "SCADA" = "c", "TAG" = "c", "ID_del_AEG" = "c", "Descripcion" = "c",
      "Time_ON" = "c", "Time_OFF" = "c", "Delta_Time" = "c",
      "Comentario" = "c", "Es_Alarma" = "c", "Es_Ultima" = "c",
      "Comentarios" = "c"),
    locale = default_locale(),
    na = c("", " "),
    quoted_na = TRUE,
    quote = "\"",
    comment = "",
    trim_ws = TRUE,
    skip = 0,
    n_max = Inf,
    # was min(1000, n_max): n_max is not defined where this argument is
    # evaluated, and min(1000, Inf) is 1000 anyway
    guess_max = 1000,
    progress = FALSE))
}

# C.2) parallel Package: Environment Settings
no_cores <- detectCores()
c1 <- makeCluster(no_cores)
invisible(clusterEvalQ(c1, library(readr)))
setDefaultCluster(c1)

# C.3) parRapply Function Application:
DISPOIN_CSV_List <- parRapply(c1, DISPOIN_DIR_REL, parRaplly_Function)
suppressWarnings(stopCluster(c1))

# D) List's Tibbles Compilation into a single Tibble:
DISPOIN_CSV <- do.call(rbind, DISPOIN_CSV_List)

# E) Write Compiled Table into CSV:
write_csv(
  DISPOIN_CSV,
  path = file.path("D:/MySQL/R", "DISPOIN_CSV.csv"),
  na = "\\N",
  append = FALSE,
  col_names = TRUE)

# F) Cleanup: remove all variables from the environment
rm(list = ls())
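A minimal sketch for point B (seeing every warning from every node). It assumes that warnings raised on the worker processes of a parallel socket cluster are not sent back to the master session, so wrapping the call in warnings() on the master would most likely not list them. Instead, each worker records all of its own warnings and errors with withCallingHandlers()/tryCatch() and returns them together with the data. The names read_with_log, read_all_logged and all_warnings are hypothetical; utils::read.csv2 stands in for parRaplly_Function so the example runs without readr; and it swaps parRapply() over the tibble for parLapply() over a plain character vector of file paths.

```r
library(parallel)

# Hypothetical worker: read one file and return the data together with
# *every* warning (and any error) raised while reading it.
# utils::read.csv2 is a stand-in for parRaplly_Function / readr::read_csv2.
read_with_log <- function(path) {
  caught <- character(0)
  data <- tryCatch(
    withCallingHandlers(
      utils::read.csv2(path, colClasses = "character"),
      warning = function(w) {
        # record the warning, then let the read carry on
        caught <<- c(caught, conditionMessage(w))
        invokeRestart("muffleWarning")
      }
    ),
    error = function(e) {
      # record the error and return an empty data frame for this file
      caught <<- c(caught, paste("ERROR:", conditionMessage(e)))
      data.frame()
    }
  )
  list(file = path, data = data, warnings = caught)
}

# Hypothetical driver: one list element per file, so the messages of every
# file processed on every node come back to the master session.
read_all_logged <- function(paths, n_workers = 2) {
  cl <- makeCluster(n_workers)
  on.exit(stopCluster(cl))
  parLapply(cl, paths, read_with_log)
}

# Flatten the per-file logs into one "file: message" character vector.
all_warnings <- function(logs) {
  unlist(lapply(logs, function(x) {
    if (length(x$warnings) == 0) return(character(0))
    paste(basename(x$file), x$warnings, sep = ": ")
  }))
}
```

With the original table this might be called as read_all_logged(DISPOIN_DIR_REL[[1]]), assuming the first column of t_DISPOIN_DIR_REL holds the ZIP paths (the post does not show the column names). Returning one element per file keeps each file name next to its messages, which last.warning cannot do.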
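For the first half of the second question (checking a file before reading it), one option is to try to list the archive's contents with utils::unzip(list = TRUE) and treat any error or warning as "not readable". This is a sketch under the assumption that a corrupted ZIP already fails at that stage; an archive whose directory is intact but whose CSV data is damaged would still pass, so the check does not replace error handling around the read itself. zip_is_readable() is a hypothetical name.

```r
# Hypothetical helper: TRUE if `path` is a ZIP archive that R can open and
# that lists at least one entry; FALSE if it is missing, corrupted or empty.
# Only the archive's directory is read, not the CSV inside it.
zip_is_readable <- function(path) {
  tryCatch({
    entries <- utils::unzip(path, list = TRUE)
    nrow(entries) > 0
  },
  error = function(e) FALSE,
  warning = function(w) FALSE)
}
```

Inside the worker it could be used as if (zip_is_readable(p)) parRaplly_Function(p) else followed by an empty placeholder data frame.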
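For the second half (an empty data.frame for files that cannot be read), a sketch that skips the pre-check and simply catches the error raised by the reader, returning a zero-row data frame with the eleven column names from col_names in the original code. empty_dispoin() and safe_read() are hypothetical helpers, and reader is a placeholder for a function such as parRaplly_Function.

```r
# Column names copied from col_names in the original parRaplly_Function.
dispoin_cols <- c("SCADA", "TAG", "ID_del_AEG", "Descripcion", "Time_ON",
                  "Time_OFF", "Delta_Time", "Comentario", "Es_Alarma",
                  "Es_Ultima", "Comentarios")

# Hypothetical helper: zero-row data frame with all-character columns,
# used as a placeholder for a file that could not be read.
empty_dispoin <- function(cols = dispoin_cols) {
  as.data.frame(
    sapply(cols, function(x) character(0), simplify = FALSE),
    stringsAsFactors = FALSE
  )
}

# Hypothetical wrapper: call `reader(path)`; if it raises an error
# (e.g. a corrupted ZIP), return the empty placeholder instead and keep
# the error message in the "read_error" attribute.
safe_read <- function(path, reader) {
  tryCatch(
    reader(path),
    error = function(e) {
      out <- empty_dispoin()
      attr(out, "read_error") <- conditionMessage(e)
      out
    }
  )
}
```

Because the placeholders have the same all-character columns as the parsed files, do.call(rbind, ...) in step D should still be able to stack them (dplyr::bind_rows() would be an alternative). The read_error attribute is dropped by rbind, so it would need to be inspected before combining.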