Re: [OMPI devel] barrier before calling del_procs
My understanding is that both of these clauses assume there are ongoing communications between two processes when one of them decides to shut down. From an MPI perspective, I can hardly see a case where this is legitimate.

George.

On Wed, Jul 23, 2014 at 8:33 AM, Yossi Etigin <yos...@mellanox.com> wrote:

> 1. If the barrier is before del_procs, it does guarantee that all MPI calls have been completed by all other ranks, but it does not guarantee that all ACKs have been delivered. For MXM, closing the connection (del_procs call completed) guarantees that my rank got all ACKs. So we need a barrier between del_procs and pml_finalize, because only when all other ranks have closed their connections is it safe to destroy the global pml resources.
>
> 2. To avoid a situation where rankA starts disconnecting from rankB while rankB is still doing MPI work. In this case rankB will not be able to communicate with rankA any more, while it still has work to do.
>
> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of George Bosilca
> Sent: Monday, July 21, 2014 9:11 PM
> To: Open MPI Developers
> Subject: Re: [OMPI devel] barrier before calling del_procs
>
> On Mon, Jul 21, 2014 at 1:41 PM, Yossi Etigin <yos...@mellanox.com> wrote:
>
> > Right, but:
> > 1. IMHO the rte_barrier is in the wrong place (in the trunk)
>
> In the trunk we have the rte_barrier prior to del_procs, which is what I would have expected: quiesce the BTLs by reaching a point where everybody agrees that no more MPI messages will be exchanged, and then delete the BTLs.
>
> > 2. In addition to the rte_barrier, we also need an mpi_barrier
>
> Care to provide the reasoning for this barrier? Why and where should it be placed?
>
> George.
> > ___
> > devel mailing list
> > de...@open-mpi.org
> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> > Link to this post: http://www.open-mpi.org/community/lists/devel/2014/07/15204.php
Re: [OMPI devel] barrier before calling del_procs
Right, but:

1. IMHO the rte_barrier is in the wrong place (in the trunk).
2. In addition to the rte_barrier, we also need an mpi_barrier.

From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of George Bosilca
Sent: Monday, July 21, 2014 8:19 PM
To: Open MPI Developers
Subject: Re: [OMPI devel] barrier before calling del_procs

There was a long thread of discussion on why we must use an rte_barrier and not an mpi_barrier during the finalize. Basically, as long as we have connectionless unreliable BTLs, we need an external mechanism to ensure complete tear-down of the entire infrastructure. Thus, we need to rely on an rte_barrier not because it guarantees the correctness of the code, but because it gives all processes enough time to flush all HPC traffic.

George.
Re: [OMPI devel] barrier before calling del_procs
There was a long thread of discussion on why we must use an rte_barrier and not an mpi_barrier during the finalize. Basically, as long as we have connectionless unreliable BTLs, we need an external mechanism to ensure complete tear-down of the entire infrastructure. Thus, we need to rely on an rte_barrier not because it guarantees the correctness of the code, but because it gives all processes enough time to flush all HPC traffic.

George.

On Mon, Jul 21, 2014 at 1:10 PM, Yossi Etigin <yos...@mellanox.com> wrote:

> I see. But in branch v1.8, in 31869, Ralph reverted the commit which moved del_procs after the barrier:
> "Revert r31851 until we can resolve how to close these leaks without causing the usnic BTL to fail during disconnect of intercommunicators
> Refs #4643"
> Also, we need an rte barrier after del_procs, because otherwise rankA could call pml_finalize() before rankB finishes disconnecting from rankA.
>
> I think the order in finalize should be like this:
> 1. mpi_barrier(world)
> 2. del_procs()
> 3. rte_barrier()
> 4. pml_finalize()
Re: [OMPI devel] barrier before calling del_procs
I see. But in branch v1.8, in 31869, Ralph reverted the commit which moved del_procs after the barrier:

"Revert r31851 until we can resolve how to close these leaks without causing the usnic BTL to fail during disconnect of intercommunicators
Refs #4643"

Also, we need an rte barrier after del_procs, because otherwise rankA could call pml_finalize() before rankB finishes disconnecting from rankA.

I think the order in finalize should be like this:

1. mpi_barrier(world)
2. del_procs()
3. rte_barrier()
4. pml_finalize()

-----Original Message-----
From: Nathan Hjelm [mailto:hje...@lanl.gov]
Sent: Monday, July 21, 2014 8:01 PM
To: Open MPI Developers
Cc: Yossi Etigin
Subject: Re: [OMPI devel] barrier before calling del_procs

I should add that it is an rte barrier and not an MPI barrier for technical reasons.

-Nathan
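The four-step finalize order proposed in this message (mpi_barrier, del_procs, rte_barrier, pml_finalize) can be sketched as a logical-time model. This is only an illustration under stated assumptions: the function `proposed_finalize_timeline` and its inputs are hypothetical, barriers are treated as instantaneous once everyone has arrived, and nothing here is Open MPI code.

```python
def proposed_finalize_timeline(work_done_at, del_procs_cost):
    """Logical-time model of the proposed four-step order (hypothetical helper).

    work_done_at[r]   -- time at which rank r enters finalize (assumed input)
    del_procs_cost[r] -- time rank r spends disconnecting (assumed input)
    """
    ranks = list(work_done_at)

    # 1. mpi_barrier(world): nobody proceeds until every rank has arrived.
    after_mpi_barrier = max(work_done_at.values())

    # 2. del_procs(): each rank disconnects at its own pace.
    del_procs_done = {r: after_mpi_barrier + del_procs_cost[r] for r in ranks}

    # 3. rte_barrier(): nobody proceeds until every rank has disconnected.
    after_rte_barrier = max(del_procs_done.values())

    # 4. pml_finalize(): global PML resources are destroyed only now.
    pml_finalize_at = {r: after_rte_barrier for r in ranks}

    return del_procs_done, pml_finalize_at


del_procs_done, pml_finalize_at = proposed_finalize_timeline(
    work_done_at={"A": 1, "B": 3},
    del_procs_cost={"A": 1, "B": 2},
)
print(del_procs_done)   # {'A': 4, 'B': 5}
print(pml_finalize_at)  # {'A': 5, 'B': 5}
```

In this model rank A finishes disconnecting at time 4 but does not run pml_finalize until time 5, when rank B has also finished del_procs. That is the property the second barrier is meant to provide; whether the first barrier should be an MPI barrier or an rte barrier is the point still under discussion in the thread.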
Re: [OMPI devel] barrier before calling del_procs
I should add that it is an rte barrier and not an MPI barrier for technical reasons.

-Nathan

On Mon, Jul 21, 2014 at 09:42:53AM -0700, Ralph Castain wrote:
> We already have an rte barrier before del_procs
Re: [OMPI devel] barrier before calling del_procs
We already have an rte barrier before del_procs.

> On Jul 21, 2014, at 8:21 AM, Yossi Etigin wrote:
>
> Hi,
>
> We get occasional hangs with MTL/MXM during finalize, because a global synchronization is needed before calling del_procs.
> E.g., rank A may call del_procs() and disconnect from rank B while rank B is still working.
> What do you think about adding an MPI barrier on COMM_WORLD before calling del_procs()?
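The scenario in the quoted message (rank A disconnecting while rank B is still working) can be illustrated with a small toy model. It assumes logical timestamps in place of real MPI calls; `Rank`, `disconnect_time`, and `lost_sends` are hypothetical names made up for this sketch and do not correspond to Open MPI or MXM code.

```python
from dataclasses import dataclass


@dataclass
class Rank:
    name: str
    work_done_at: int  # logical time at which this rank finishes its MPI work
    last_send_at: int  # logical time of its last message to its peers


def disconnect_time(rank, peers, barrier_before_del_procs):
    """Logical time at which `rank` calls del_procs()."""
    if barrier_before_del_procs:
        # A barrier releases nobody until every rank has arrived.
        return max(r.work_done_at for r in [rank, *peers])
    return rank.work_done_at


def lost_sends(ranks, barrier_before_del_procs):
    """Return (sender, receiver) pairs whose last send comes after the receiver disconnected."""
    lost = []
    for receiver in ranks:
        peers = [r for r in ranks if r is not receiver]
        t_disc = disconnect_time(receiver, peers, barrier_before_del_procs)
        for sender in peers:
            if sender.last_send_at > t_disc:
                lost.append((sender.name, receiver.name))
    return lost


rank_a = Rank("A", work_done_at=1, last_send_at=1)
rank_b = Rank("B", work_done_at=3, last_send_at=3)

print(lost_sends([rank_a, rank_b], barrier_before_del_procs=False))  # [('B', 'A')]
print(lost_sends([rank_a, rank_b], barrier_before_del_procs=True))   # []
```

Without a barrier, rank A disconnects at time 1 and rank B's send at time 3 has no peer to reach. With a barrier ahead of del_procs, rank A waits until time 3 before disconnecting. The model does not capture which kind of barrier is in place, which is what the reply above is about.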