Charles,
Having implemented some of the underlying collective algorithms, I am
puzzled by the need to force the sync value down to 1 to get things
flowing. I would definitely appreciate a reproducer so that I can identify
(and hopefully fix) the underlying problem.
Thanks,
George.
On Tue, Oct 29, 2019
Last time I did a reply on here, it created a new thread. Sorry about that
everyone. I just hit the Reply via email button. Hopefully this one will work.
To Gilles Gouaillardet:
My first thread has a reproducer that causes the problem.
To George Bosilca:
I had to set
Charles,
There is a known issue with calling collectives in a tight loop, due to
lack of control flow at the network level. It results in a significant
slow-down that might appear as a deadlock to users. The workaround is to
enable the sync collective module, which will insert a fake
Charles,
Unless you expect yes or no answers, can you please post a simple
program that evidences the issue you are facing?
Cheers,
Gilles
On 10/29/2019 6:37 AM, Garrett, Charles via users wrote:
Does anyone have any idea why this is happening? Has anyone seen this
problem before?
I have a problem where MPI_Bcast hangs when called rapidly over and over again.
This problem manifests itself on our new cluster, but not on our older one.
The new cluster has Cascade Lake processors. Each node contains 2 sockets with
18 cores per socket. Cluster size is 128 nodes with an
mething else?
Yes, this is with regards to collective hang issue.
All the best,
Alex
- Original Message -
From: "Jeff Squyres" <jsquy...@cisco.com>
To: "Alex A. Granovsky" <g...@classic.chem.msu.su>;
Sent: Saturday, December 03, 2011 3:36 PM
Subject: Re:
very close to the hardware limits does not
make us happy at all.
Kind regards,
Alex Granovsky
- Original Message -
From: "Jeff Squyres" <jsquy...@cisco.com>
To: "Open MPI Users" <us...@open-mpi.org>
Sent: Wednesday, November 30, 2011 11:45 PM
Subject: Re: [OMPI
Fair enough. Thanks anyway!
On Nov 30, 2011, at 3:39 PM, Tom Rosmond wrote:
Jeff,
I'm afraid trying to produce a reproducer of this problem wouldn't be
worth the effort. It is a legacy code that I wasn't involved in
developing and will soon be discarded, so I can't justify spending time
trying to understand its behavior better. The bottom line is that it
works
Yes, but I'd like to see a reproducer that requires setting
coll_sync_barrier_before=5. Your reproducers allowed much higher values, IIRC.
I'm curious to know what makes that code require such a low value (i.e., 5)...
On Nov 30, 2011, at 1:50 PM, Ralph Castain wrote:
Oh - and another one at orte/test/mpi/reduce-hang.c
On Nov 30, 2011, at 11:50 AM, Ralph Castain wrote:
FWIW: we already have a reproducer from prior work I did chasing this down a
couple of years ago. See orte/test/mpi/bcast_loop.c
On Nov 29, 2011, at 9:35 AM, Jeff Squyres wrote:
That's quite weird/surprising that you would need to set it down to *5* --
that's really low.
Can you share a simple reproducer code, perchance?
On Nov 15, 2011, at 11:49 AM, Tom Rosmond wrote:
Ralph,
Thanks for the advice. I have to set 'coll_sync_barrier_before=5' to do
the job. This is a big change from the default value (1000), so our
application seems to be a pretty extreme case.
T. Rosmond
On Mon, 2011-11-14 at 16:17 -0700, Ralph Castain wrote:
Yes, this is well documented - may be on the FAQ, but certainly has been in the
user list multiple times.
The problem is that one process falls behind, which causes it to begin
accumulating "unexpected messages" in its queue. This causes the matching logic
to run a little slower, thus making
Hello:
A colleague and I have been running a large F90 application that does an
enormous number of mpi_bcast calls during execution. I deny any
responsibility for the design of the code and why it needs these calls,
but it is what we have inherited and have to work with.
Recently we ported the