To follow (tangently) on Klaus' message, I've released a book last year on some 
advanced topics in R programming and development:

https://hal.ird.fr/ird-03850685

Chapter 9 is on parallelization and HPC. There are a few (detailed) examples 
showing when multi-core is benefitial and when it is not. I've also written a 
list of rules on this topic that overlap with Klaus' advices.

Cheers,

Emmanuel

----- Le 7 Mar 23, à 19:01, Klaus Schliep klaus.schl...@gmail.com a écrit :

> Dear Vojtěch,
> nice work. Just a few random comments:
> Parallelization is often not straightforward as it depends on the hardware
> and the operating system. My preference is using the future package for
> parallelization as it does some nice abstraction for the different R
> packages, so you can try different things and no need to write different
> versions of the code. chatGPT seems to have missed this.
> Defaulting to "cores = detectCores()" is not a good idea. The CRAN
> Repository Policy (https://cran.r-project.org/web/packages/policies.html)
> states: "If running a package uses multiple threads/cores it must never use
> more than two simultaneously: the check farm is a shared resource and will
> typically be running many checks simultaneously." In short such a default
> can cause serious problems on a cluster and I got properly ripleyed for
> this years ago, but that's another story. So the default should be
> something like min(2L, detectCores()). So with great power comes great
> responsibility and users should read the man pages.
> Kind regards,
> Klaus
> 
> 
> On Mon, Mar 6, 2023 at 2:43 PM Vojtěch Zeisek <vo...@trapa.cz> wrote:
> 
>> Hello dear colleagues,
>> I use often ape::dist.topo (see here dist.topo.r), which is doing the
>> calculations sequentially, which is very slow for large data sets.
> 
> 
> What is large? Large number of trees or large trees?
> 
> I'm sorry,
>> I haven't found any relevant Git repository or so, so I hope Emmanuel
>> won't
>> mind if I discuss it here.
>> I discussed various options with ChatGPT and dist.topo.par1.r is the
>> simplest
>> solution, basically using mc.lapply instead of 2 for loops. Good study
>> material for how to do it in general. Little enhancements are in
>> dist.topo.par2.r, which should be slightly better in case some pair of
>> comparisons would return NA or so, but from my tests there doesn't seem to
>> be
>> any difference.
>>
> 
> I think there is room for improvement with preprocessing trees. If you have
> e.g. bootstrap trees with short edges you might want to use di2multi()
> first to avoid spurious differences. Filtering out duplicated trees could
> also speed up things, if there are many. This will depend on the trees you
> want to compare and the methods.
> 
> And finally there is dist.topo.par3.r which doesn't load parallel (and uses
>> plain lapply) for cores==1, while parallel and doParallel for multiple
>> cores.
>> It also contains some checks and error handling. From my testing it works
>> well. I'm not sure if tryCatch is really needed there.
> 
> 
> I am not sure if it is necessary, but you don't want to have it in the
> inner loop ;)
> 
> In any case,
>> improvements welcomed. :-)
>> So, what do You think? Is this usable improvement of ape::dist.topo?
>> Sincerely,
>> V.
>>
>> --
>> Vojtěch Zeisek
>> https://trapa.cz/en/
>>
>> Department of Botany, Faculty of Science
>> Charles University, Prague, Czech Republic
>> https://www.natur.cuni.cz/biology/botany/
>> https://lab-allience.natur.cuni.cz/
>>
>> Institute of Botany, Czech Academy of Sciences
>> Průhonice, Czech Republic
>> https://www.ibot.cas.cz/en/
>> Computing cluster
>> https://sorbus.ibot.cas.cz/en/start
>> _______________________________________________
>> R-sig-phylo mailing list - R-sig-phylo@r-project.org
>> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
>> Searchable archive at
>> http://www.mail-archive.com/r-sig-phylo@r-project.org/
>>
> 
> 
> --
> Klaus Schliep
> 
> Senior Scientist
> Institute of Molecular Biotechnology
> TU Graz
> https://www.imbt.tugraz.at
> 
>       [[alternative HTML version deleted]]
> 
> _______________________________________________
> R-sig-phylo mailing list - R-sig-phylo@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
> Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

_______________________________________________
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Reply via email to