Re: [OMPI devel] Memory performance with Bcast

2019-03-24 Thread Gilles Gouaillardet
Marcin, Based on your description, you might want to investigate non-blocking collectives (e.g. MPI_Ibcast) or even upcoming persistent collectives (e.g. MPIX_Bcast_init). If you know the address of the receive buffer, then you can MPI_Ibcast() on non-root ranks very early, and then

Re: [OMPI devel] Memory performance with Bcast

2019-03-22 Thread marcin.krotkiewski
21, 2019 7:31 PM *To:* Open MPI Developers *Subject:* Re: [OMPI devel] Memory performance with Bcast Marcin, I am not sure I understand your question; a bcast is a collective operation that must be posted by all participants. Regardless of the level at which the bcast is serviced, if some

Re: [OMPI devel] Memory performance with Bcast

2019-03-22 Thread marcin.krotkiewski
On 3/21/19 5:31 PM, George Bosilca wrote: I am not sure I understand your question; a bcast is a collective operation that must be posted by all participants. Regardless of the level at which the bcast is serviced, if some of the participants have not posted their participation to the collective,

Re: [OMPI devel] Memory performance with Bcast

2019-03-22 Thread Valentin Petrov
/get semantics) but this will require some study if you are not familiar with it. From: devel On Behalf Of George Bosilca Sent: Thursday, March 21, 2019 7:31 PM To: Open MPI Developers Subject: Re: [OMPI devel] Memory performance with Bcast Marcin, I am not sure I understand your question

Re: [OMPI devel] Memory performance with Bcast

2019-03-21 Thread George Bosilca
Marcin, I am not sure I understand your question; a bcast is a collective operation that must be posted by all participants. Regardless of the level at which the bcast is serviced, if some of the participants have not posted their participation to the collective, only partial progress can be made.

Re: [OMPI devel] Memory performance with Bcast

2019-03-21 Thread Joshua Ladd
Marcin, HPC-X implements the MPI BCAST operation by leveraging hardware multicast capabilities. Starting with HPC-X v2.3 we introduced a new multicast-based algorithm for large messages as well. Hardware multicast scales as O(1) modulo switch hops. It is the most efficient way to broadcast a

Re: [OMPI devel] Memory performance with Bcast

2019-03-21 Thread marcin.krotkiewski
Thanks, George! So, the function you mentioned is used when I turn off HCOLL and use OpenMPI's tuned coll instead. That helps a lot. Another thing that makes me think is that in my case the data is sent to the targets asynchronously, or rather - it is a 'put' operation in nature, and the

Re: [OMPI devel] Memory performance with Bcast

2019-03-20 Thread George Bosilca
If you have support for FCA then it might happen that the collective will use the hardware support. In any case, most of the bcast algorithms have a logarithmic behavior, so there will be at most O(log(P)) memory accesses on the root. If you want to take a look at the code in OMPI to understand
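
To illustrate the logarithmic claim above, here is a minimal sketch that simulates a binomial-tree broadcast and counts how many messages each rank sends. It is an illustration only: the function name `binomial_bcast_sends` is hypothetical, the binomial tree is just one common tree-based scheme, and nothing here is taken from the Open MPI source or reflects which algorithm its tuned component actually selects.

```python
def binomial_bcast_sends(num_ranks, root=0):
    """Simulate a binomial-tree broadcast; return the number of sends per rank.

    Illustrative model only (not Open MPI code). Ranks are renumbered
    relative to the root, so the root is always relative rank 0.
    """
    sends = [0] * num_ranks
    has_data = [False] * num_ranks
    has_data[root] = True

    distance = 1
    while distance < num_ranks:
        # In this round, every relative rank below `distance` already holds
        # the data and forwards it to the rank `distance` positions away.
        for rel in range(distance):
            target = rel + distance
            if target < num_ranks:
                src = (rel + root) % num_ranks
                dst = (target + root) % num_ranks
                sends[src] += 1
                has_data[dst] = True
        distance *= 2

    # Every rank must have received the message by the end.
    assert all(has_data)
    return sends


if __name__ == "__main__":
    for p in (2, 8, 64, 1024):
        sends = binomial_bcast_sends(p)
        print(f"P={p}: root sends {sends[0]} messages, {sum(sends)} in total")
```

In this model the root sends one message per round, so its buffer is read ceil(log2(P)) times, while the total number of messages across all ranks is P - 1.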

[OMPI devel] Memory performance with Bcast

2019-03-20 Thread marcin.krotkiewski
Hi! I'm wondering about the details of the Bcast implementation in OpenMPI. I'm specifically interested in IB interconnects, but information about other architectures (and OpenMPI in general) would also be very useful. I am working with a code that sends the same (large) message to a bunch