[OMPI users] MPI_Bcast performance doesn't improve after enabling tree implementation

2017-10-17 Thread Konstantinos Konstantinidis
I have implemented some algorithms in C++ whose performance is greatly affected by the time spent shuffling data among nodes, which is done through broadcast calls. Up to now, I have been testing them by running something like "mpirun -mca btl ^openib -mca plm_rsh_no_tree_spawn 1 ./my_test", which I think makes MPI_Bcast to

Re: [OMPI users] MPI_Bcast performance doesn't improve after enabling tree implementation

2017-10-17 Thread Gilles Gouaillardet
Konstantinos, I am afraid there is some confusion here. The plm_rsh_no_tree_spawn parameter is only used at startup time (i.e. when remotely launching one orted daemon on every node except the one running mpirun). It has zero impact on the performance of MPI communications such as MPI_Bcast(). The

Re: [OMPI users] MPI_Bcast performance doesn't improve after enabling tree implementation

2017-10-17 Thread Konstantinos Konstantinidis
Thanks for clarifying that, Gilles. I have now seen that omitting "-mca plm_rsh_no_tree_spawn 1" requires setting up passwordless SSH among the machines, but that this is not required for setting "--mca coll_tuned_bcast_algo". Is this correct, or am I missing something? Also, among all possible

Re: [OMPI users] MPI_Bcast performance doesn't improve after enabling tree implementation

2017-10-17 Thread Gilles Gouaillardet
If you use the rsh tree spawn mechanism, then yes, every node must be able to SSH without a password to every other node. This mechanism is only used to spawn one orted per node. When the number of nodes is large, a tree spawn is faster and avoids having all the SSH connections issued and maintained from the node