To follow up (tangentially) on Klaus' message, I released a book last year on some advanced topics in R programming and development:
https://hal.ird.fr/ird-03850685
Chapter 9 is on parallelization and HPC. There are a few (detailed) examples showing when multi-core is beneficial and when it is not. I've also written a list of rules on this topic that overlap with Klaus' advice.

Cheers,

Emmanuel

----- On 7 Mar 23, at 19:01, Klaus Schliep klaus.schl...@gmail.com wrote:

> Dear Vojtěch,
> nice work. Just a few random comments:
> Parallelization is often not straightforward, as it depends on the hardware
> and the operating system. My preference is the future package for
> parallelization, as it provides a nice abstraction over the different R
> packages, so you can try different things without writing different
> versions of the code. ChatGPT seems to have missed this.
> Defaulting to "cores = detectCores()" is not a good idea. The CRAN
> Repository Policy (https://cran.r-project.org/web/packages/policies.html)
> states: "If running a package uses multiple threads/cores it must never use
> more than two simultaneously: the check farm is a shared resource and will
> typically be running many checks simultaneously." In short, such a default
> can cause serious problems on a cluster; I got properly ripleyed for this
> years ago, but that's another story. So the default should be something
> like min(2L, detectCores()). With great power comes great responsibility,
> and users should read the man pages.
> Kind regards,
> Klaus
>
> On Mon, Mar 6, 2023 at 2:43 PM Vojtěch Zeisek <vo...@trapa.cz> wrote:
>
>> Hello dear colleagues,
>> I often use ape::dist.topo (see dist.topo.r), which does the
>> calculations sequentially and is therefore very slow for large data sets.
>
> What is large? Large number of trees or large trees?
>
>> I'm sorry, I haven't found any relevant Git repository or similar, so I
>> hope Emmanuel won't mind if I discuss it here.
>> I discussed various options with ChatGPT, and dist.topo.par1.r is the
>> simplest solution, basically using mclapply instead of two for loops. Good
>> study material for how to do it in general. Small enhancements are in
>> dist.topo.par2.r, which should be slightly better in case some pair of
>> comparisons returns NA or similar, but from my tests there doesn't seem to
>> be any difference.
>
> I think there is room for improvement with preprocessing trees. If you have,
> e.g., bootstrap trees with short edges, you might want to use di2multi()
> first to avoid spurious differences. Filtering out duplicated trees could
> also speed things up, if there are many. This will depend on the trees you
> want to compare and the methods.
>
>> And finally there is dist.topo.par3.r, which doesn't load parallel (and
>> uses plain lapply) for cores == 1, but loads parallel and doParallel for
>> multiple cores. It also contains some checks and error handling. From my
>> testing it works well. I'm not sure if tryCatch is really needed there.
>
> I am not sure if it is necessary, but you don't want to have it in the
> inner loop ;)
>
>> In any case, improvements are welcome. :-)
>> So, what do you think? Is this a usable improvement of ape::dist.topo?
>> Sincerely,
>> V.
>>
>> --
>> Vojtěch Zeisek
>> https://trapa.cz/en/
>>
>> Department of Botany, Faculty of Science
>> Charles University, Prague, Czech Republic
>> https://www.natur.cuni.cz/biology/botany/
>> https://lab-allience.natur.cuni.cz/
>>
>> Institute of Botany, Czech Academy of Sciences
>> Průhonice, Czech Republic
>> https://www.ibot.cas.cz/en/
>> Computing cluster
>> https://sorbus.ibot.cas.cz/en/start
>> _______________________________________________
>> R-sig-phylo mailing list - R-sig-phylo@r-project.org
>> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
>> Searchable archive at
>> http://www.mail-archive.com/r-sig-phylo@r-project.org/
>
> --
> Klaus Schliep
>
> Senior Scientist
> Institute of Molecular Biotechnology
> TU Graz
> https://www.imbt.tugraz.at
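A minimal sketch combining two points from the thread: replacing the two nested for loops with a single apply over index pairs (mclapply() when more than one core is requested, plain lapply() otherwise), and Klaus' CRAN-friendly default of at most two cores. The helper names `default_cores()` and `par_pairwise_dist()` are hypothetical and not part of ape; `pair_fun` stands in for whatever per-pair comparison is used, and a toy numeric distance replaces ape::dist.topo so the example runs without ape.

```r
# Hypothetical helper: at most 2 cores by default, in line with the CRAN
# Repository Policy quoted in the thread; falls back to 1 if detection fails.
default_cores <- function() {
  n <- parallel::detectCores()
  if (is.na(n)) 1L else as.integer(min(2L, n))
}

# Hypothetical helper (not an ape function): all pairwise distances among
# the elements of the list `x`, each computed by `pair_fun(a, b)`.
par_pairwise_dist <- function(x, pair_fun, cores = default_cores()) {
  n <- length(x)
  stopifnot(n >= 2L)
  # One flat list of index pairs c(i, j) with i < j replaces the two
  # nested for loops; every pair is an independent task.
  pairs <- utils::combn(n, 2, simplify = FALSE)
  one_pair <- function(p) pair_fun(x[[p[1]]], x[[p[2]]])
  if (cores > 1L && .Platform$OS.type != "windows") {
    # mclapply() forks the R session, which is not available on Windows
    d <- parallel::mclapply(pairs, one_pair, mc.cores = cores)
  } else {
    # Sequential path: plain lapply(), no parallel overhead
    d <- lapply(pairs, one_pair)
  }
  # Anything that is not a single number (e.g. a "try-error" object
  # returned by mclapply() for a failed task) becomes NA.
  d <- vapply(d, function(v) {
    if (is.numeric(v) && length(v) == 1L) as.numeric(v) else NA_real_
  }, numeric(1))
  # Fill a symmetric matrix and convert it to a "dist" object
  m <- matrix(0, n, n, dimnames = list(names(x), names(x)))
  idx <- do.call(rbind, pairs)
  m[idx] <- d
  m[idx[, 2:1, drop = FALSE]] <- d
  stats::as.dist(m)
}
```

With ape loaded, the per-pair function could be something like `function(t1, t2) ape::dist.topo(t1, t2)` applied to a multiPhylo object (not tested here). Calling dist.topo() pair by pair probably repeats per-tree work that a single call on the whole multiPhylo object can do once, so whether this is faster depends on the number and size of the trees. Swapping lapply()/mclapply() for future.apply::future_lapply() would be closer to the backend-agnostic approach Klaus prefers; that variant is not shown.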
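Klaus' suggestion to filter out duplicated trees before comparing them can be sketched generically: evaluate the distance function only on the distinct elements, then expand the result back to all inputs. With k distinct items out of n, the number of pairwise evaluations drops from n(n-1)/2 to k(k-1)/2. The helper `dedup_dist()` and its `dist_fun` argument are hypothetical; equality here is base R `identical()`, which for phylo objects is much stricter than topological equality, so for real trees a tree-aware comparison (for example ape's `unique()` method for multiPhylo objects or `all.equal.phylo()`) would be the more appropriate choice.

```r
# Hypothetical helper: full distance matrix for the list `x`, evaluating
# `dist_fun` only on the distinct elements. `dist_fun` must take a list
# and return a "dist" object (or a square matrix) for that list.
dedup_dist <- function(x, dist_fun) {
  u <- unique(x)  # distinct elements, first occurrences kept
  # Map every element of x to its copy in u, using exact identity
  idx <- vapply(x, function(el) {
    which(vapply(u, identical, logical(1), el))[1]
  }, integer(1))
  du <- as.matrix(dist_fun(u))        # the expensive step, on unique(x) only
  full <- du[idx, idx, drop = FALSE]  # expand back to all original inputs
  dimnames(full) <- list(names(x), names(x))
  stats::as.dist(full)
}
```

Preprocessing such as di2multi(), which Klaus mentions, may make more trees equal under a topology-aware comparison and so increase the savings; how much depends on the data set.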