The more I look at this bug, the more I'm convinced it is in Open MPI and not in our code. Here is why: our code generates a communication/execution schedule, and at each timestep that schedule is executed and all communication and computation are performed. Our problem uses AMR, which means the communication schedule may change from time to time. In this case the schedule has not changed in many timesteps, so the same communication schedule has been used for the last X timesteps (X being around 20 here). Our code does have a very large communication volume. I have been able to reduce the hang down to 16 processors, and it seems to me the hang occurs when we have lots of work per processor: if I add more processors it may not hang, but reducing the processor count makes it more likely to hang.
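
To make this concrete, here is a rough sketch (not our actual code; the schedule contents, counts, and names are made up) of what each timestep does: every rank posts the nonblocking receives and sends named by its schedule and then drives them to completion with MPI_Waitsome, which is the call the hung ranks are sitting in according to the stack trace further down this thread.

/* Sketch of one timestep of a schedule-driven exchange (illustrative only). */
#include <mpi.h>
#include <stdlib.h>

void run_timestep(int nmsgs, const int *partners, const int *tags,
                  double **sendbufs, double **recvbufs, int count)
{
    MPI_Request *reqs = malloc(2 * nmsgs * sizeof(MPI_Request));
    int *indices = malloc(2 * nmsgs * sizeof(int));
    int remaining = 2 * nmsgs;

    /* Post every receive and send listed in the (unchanged) schedule. */
    for (int i = 0; i < nmsgs; i++) {
        MPI_Irecv(recvbufs[i], count, MPI_DOUBLE, partners[i], tags[i],
                  MPI_COMM_WORLD, &reqs[i]);
        MPI_Isend(sendbufs[i], count, MPI_DOUBLE, partners[i], tags[i],
                  MPI_COMM_WORLD, &reqs[nmsgs + i]);
    }

    /* Drain completions; the real code overlaps local work here. */
    while (remaining > 0) {
        int ndone;
        MPI_Waitsome(2 * nmsgs, reqs, &ndone, indices, MPI_STATUSES_IGNORE);
        if (ndone == MPI_UNDEFINED)
            break;                /* no active requests left */
        remaining -= ndone;
    }

    free(reqs);
    free(indices);
}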
What is the status on the fix for this particular freelist deadlock?

Thanks,
Justin

Jeff Squyres wrote:
George --

Is this the same issue that you're working on?

(we have a "blocker" bug for v1.3 about deadlock at heavy messaging volume -- on Tuesday, it looked like a bug in our freelist...)


On Dec 9, 2008, at 10:28 AM, Justin wrote:

I have tried disabling shared memory by running with the following parameters to mpirun:

--mca btl openib,self --mca btl_openib_ib_timeout 23 --mca btl_openib_use_srq 1 --mca btl_openib_use_rd_max 2048

Unfortunately this did not get rid of the hangs and, if anything, seems to have made them more common. I have now been able to reproduce the deadlock at 32 processors. I am also working with an MPI deadlock-detection research tool which will hopefully tell me whether there are any deadlocks in our code. In the meantime, if any of you have suggestions for Open MPI parameters that might alleviate these deadlocks, I would be grateful.


Thanks,
Justin




Rolf Vandevaart wrote:

The current version of Open MPI installed on Ranger is 1.3a1r19685, which is from early October. This version has a fix for ticket #1378. Ticket #1449 is not an issue in this case because each node has 16 processors and #1449 applies to larger SMPs.

However, I am wondering whether this is because of ticket https://svn.open-mpi.org/trac/ompi/ticket/1468, which was not yet fixed in the version running on Ranger.

As was suggested earlier, running without the sm btl would give us a clue as to whether this is the problem.

mpirun --mca btl ^sm a.out

Another way to potentially work around the issue is to increase the size of the shared memory backing file.

mpirun --mca mpool_sm_min_size 1073741824 --mca mpool_sm_max_size 1073741824 a.out
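
If it is easier than editing the mpirun line (for example, from inside a batch script), the same MCA parameters can also be set through environment variables of the form OMPI_MCA_<parameter name>, e.g.:

export OMPI_MCA_mpool_sm_min_size=1073741824
export OMPI_MCA_mpool_sm_max_size=1073741824
mpirun a.out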

We will also work with TACC to get an upgraded version of Open MPI 1.3 on there.

Let us know what you find.

Rolf


On 12/09/08 08:05, Lenny Verkhovsky wrote:
also see https://svn.open-mpi.org/trac/ompi/ticket/1449



On 12/9/08, *Lenny Verkhovsky* <lenny.verkhov...@gmail.com <mailto:lenny.verkhov...@gmail.com>> wrote:

maybe it's related to https://svn.open-mpi.org/trac/ompi/ticket/1378 ??


   On 12/5/08, *Justin* <luitj...@cs.utah.edu
   <mailto:luitj...@cs.utah.edu>> wrote:

        The reason I'd like to disable these eager buffers is to help
        detect the deadlock better.  I would not run this way for a
        normal run, but it would be useful for debugging.  If the
        deadlock is indeed due to our code, then disabling any shared
        buffers or eager sends should make that deadlock reproducible.
        In addition, we might be able to reduce the number of
        processors.  Right now, determining which processor is
        deadlocked when we are using 8K cores and each processor has
        hundreds of outstanding messages would be quite difficult.
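
        For example (a made-up toy, not from our application): two
        ranks that each start with a blocking send to the other will
        appear to work as long as the messages fit in the eager
        buffers, but the same exchange hangs once the size crosses the
        rendezvous threshold (or the eager limit is forced to 0), which
        is exactly the kind of latent deadlock I want to surface:

        /* Both ranks MPI_Send before either posts a receive.  With
         * eager buffering the sends complete; in rendezvous mode both
         * block waiting for the matching receive and the job hangs. */
        #include <mpi.h>
        #include <stdio.h>

        #define COUNT (1 << 20)  /* big enough to exceed typical eager limits */

        int main(int argc, char **argv)
        {
            static double sendbuf[COUNT], recvbuf[COUNT];
            int rank, peer;

            MPI_Init(&argc, &argv);
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            peer = 1 - rank;              /* run with exactly 2 ranks */

            MPI_Send(sendbuf, COUNT, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD);
            MPI_Recv(recvbuf, COUNT, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);

            printf("rank %d done\n", rank);
            MPI_Finalize();
            return 0;
        }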

       Thanks for your suggestions,
       Justin

       Brock Palen wrote:

            Open MPI has different eager limits for each of the network
            types.  On your system, run:

            ompi_info --param btl all

            and look for the eager_limit parameters.
            You can set these values to 0 using the syntax I showed you
            before; that would disable eager messages.  There might be
            a better way to disable them.  I'm not sure why you would
            want to, though, since they are there for performance.

            You might still see the deadlock even if every message were
            below the threshold.  I think there is a limit on the
            number of eager messages a receiving CPU will accept, but
            I'm not sure about that, and I rather doubt it matters here.

            Try tweaking your buffer sizes: make the openib btl eager
            limit the same as the shared memory one, and see if you get
            lockups between hosts and not just within shared memory.
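
            For example (using the sm value of 4096 from the quoted
            message below; I haven't verified the openib parameter
            name, so double check it with ompi_info first):

            mpirun --mca btl_openib_eager_limit 4096 a.out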

           Brock Palen
           www.umich.edu/~brockp <http://www.umich.edu/~brockp>
           Center for Advanced Computing
           bro...@umich.edu <mailto:bro...@umich.edu>
           (734)936-1985



           On Dec 5, 2008, at 2:10 PM, Justin wrote:

                Thank you for this info.  I should add that our code
                tends to post a lot of sends before the other side has
                posted its receives, which creates a lot of unexpected
                messages.  Our code explicitly matches up all tags and
                processors (that is, we do not use MPI wildcards).  If
                we had a deadlock, I would think we would see it
                regardless of whether or not we cross the rendezvous
                threshold.  I guess one way to test this would be to
                set that threshold to 0; if it then deadlocks, we would
                likely be able to track the deadlock down.  Are there
                any other parameters we can pass MPI that will turn off
                buffering?
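
                For instance, is adding something like the following to
                our mpirun line (using the btl_sm_eager_limit parameter
                from your message below, plus whatever the openib
                equivalent is) the right way to test it?

                --mca btl_sm_eager_limit 0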

               Thanks,
               Justin

               Brock Palen wrote:

                    Whenever this has happened before, we found the
                    code to have a deadlock; users never saw it until
                    they crossed the eager->rendezvous threshold.

                   Yes you can disable shared memory with:

                   mpirun --mca btl ^sm

                   Or you can try increasing the eager limit.

                   ompi_info --param btl sm

                    MCA btl: parameter "btl_sm_eager_limit"
                             (current value: "4096")

                    You can modify this limit at run time.  I think
                    (I can't test it right now) it is just:

                   mpirun --mca btl_sm_eager_limit 40960

                    I think that when tweaking these values you can
                    also use environment variables in place of putting
                    it all on the mpirun line:

                   export OMPI_MCA_btl_sm_eager_limit=40960

                   See:
                   http://www.open-mpi.org/faq/?category=tuning
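
                    The FAQ also shows how to put MCA settings in a
                    per-user file so you don't have to repeat them on
                    every mpirun; if I remember right it is
                    $HOME/.openmpi/mca-params.conf with lines like:

                    btl_sm_eager_limit = 40960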


                   Brock Palen
                    www.umich.edu/~brockp <http://www.umich.edu/~brockp>
                   Center for Advanced Computing
                   bro...@umich.edu <mailto:bro...@umich.edu>
                   (734)936-1985



                   On Dec 5, 2008, at 12:22 PM, Justin wrote:

                       Hi,

                        We are currently using Open MPI 1.3 on Ranger
                        for large processor jobs (8K+ cores).  Our code
                        appears to be occasionally deadlocking at
                        random within point-to-point communication (see
                        the stack trace below).  This code has been
                        tested on many different MPI versions and, as
                        far as we know, it does not contain a deadlock.
                        However, in the past we have run into problems
                        with shared memory optimizations within MPI
                        causing deadlocks.  We can usually avoid these
                        by setting a few environment variables to
                        either increase the size of the shared memory
                        buffers or disable the shared memory
                        optimizations altogether.  Does Open MPI have
                        any known deadlocks that might be causing ours?
                        If so, are there any workarounds?  Also, how do
                        we disable shared memory within Open MPI?

                        Here is an example of where the processors are
                        hanging:

                        #0  0x00002b2df3522683 in mca_btl_sm_component_progress ()
                            from /opt/apps/intel10_1/openmpi/1.3/lib/openmpi/mca_btl_sm.so
                        #1  0x00002b2df2cb46bf in mca_bml_r2_progress ()
                            from /opt/apps/intel10_1/openmpi/1.3/lib/openmpi/mca_bml_r2.so
                        #2  0x00002b2df0032ea4 in opal_progress ()
                            from /opt/apps/intel10_1/openmpi/1.3/lib/libopen-pal.so.0
                        #3  0x00002b2ded0d7622 in ompi_request_default_wait_some ()
                            from /opt/apps/intel10_1/openmpi/1.3//lib/libmpi.so.0
                        #4  0x00002b2ded109e34 in PMPI_Waitsome ()
                            from /opt/apps/intel10_1/openmpi/1.3//lib/libmpi.so.0


                       Thanks,
                       Justin



















