Hello,

I have a package in which cross-validation runs slower with parallel computing than sequentially. Please see the following reproducible example, which uses 10 cores, and the relevant lines of the corresponding R function. The R package is also attached. I expected parallel computing to improve the speed, and it is not obvious to me what went wrong.
Many thanks in advance.

Zhu Wang

library("mpath")
library("pscl")
data("bioChemists", package = "pscl")

> system.time(cv.zipath(art ~ . | ., data = bioChemists, family = "negbin",
+     nlambda=100, parallel=FALSE))
   user  system elapsed
 77.430   0.031  79.547
> system.time(cv.zipath(art ~ . | ., data = bioChemists, family = "negbin",
+     nlambda=100, parallel=TRUE, n.cores=10))
   user  system elapsed
 95.694   0.517 106.072

R> sessionInfo()
R version 3.5.2 (2018-12-20)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.2 LTS

The relevant lines in cv.zipath regarding parallel argument are below:

    if(parallel){
        registerDoParallel(cores=n.cores)
        i <- 1  ###needed to pass R CMD check with parallel code below
        residmat <- foreach(i=seq(K), .combine=cbind, .packages='mpath') %dopar% {
            omit <- all.folds[[i]]
            fitcv <- do.call("zipath", list(formula, data[-omit,], weights[-omit],
                             lambda.count=lambda.count, lambda.zero=lambda.zero,
                             nlambda=nlambda, ...))
            logLik(fitcv, newdata=data[omit,, drop=FALSE], Y[omit],
                   weights=weights[omit])
        }
        stopImplicitCluster()
    }
    else{
        residmat <- matrix(NA, nlambda, K)
        for(i in seq(K)) {
            if(trace)
                cat("\n CV Fold", i, "\n\n")
            omit <- all.folds[[i]]
            fitcv <- do.call("zipath", list(formula, data[-omit,], weights[-omit],
                             lambda.count=lambda.count, lambda.zero=lambda.zero,
                             nlambda=nlambda, ...))
            residmat[, i] <- logLik(fitcv, newdata=data[omit,, drop=FALSE], Y[omit],
                                    weights=weights[omit])
        }
    }
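One way to check whether the slowdown comes from the parallel backend rather than from zipath itself might be to time a toy per-fold workload both ways. The sketch below rests on several assumptions: fold_work is a hypothetical stand-in for one CV fold, parallel::mclapply (shipped with R) is swapped in for foreach/doParallel to keep the example dependency-free, and the worker count of 2 is arbitrary. It is not the code used in cv.zipath.

```r
library(parallel)  # ships with base R; used here instead of foreach/doParallel

K <- 10  # number of folds, mirroring the loop over seq(K) above

# Hypothetical stand-in for fitting one fold: a fixed amount of CPU work.
fold_work <- function(i) {
  x <- 0
  for (j in seq_len(2e5)) x <- x + sqrt(j)
  x
}

# Sequential baseline.
t_seq <- system.time(res_seq <- lapply(seq_len(K), fold_work))

# Forked workers; mclapply only supports mc.cores = 1 on Windows.
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L
t_par <- system.time(
  res_par <- mclapply(seq_len(K), fold_work, mc.cores = n_cores)
)

# Both runs should produce the same per-fold results.
stopifnot(isTRUE(all.equal(res_seq, res_par)))

# Compare wall-clock time only; user time of forked children is
# reported separately by system.time().
elapsed <- c(sequential = t_seq[["elapsed"]], parallel = t_par[["elapsed"]])
print(elapsed)
```

If the toy workload speeds up with more workers while cv.zipath does not, the cause is more likely in what happens inside each fold (or in moving data to the workers) than in the backend setup.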
mpath_0.3-13.tar.gz
______________________________________________
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel