So to be clear: each parent launches 10 children, and no other parents 
participate in that spawn?

And there is no threading in the app, yes?
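
Just so we're talking about the same pattern, here is a minimal sketch of what I 
understand it to be - not your actual cpi code; "./worker" and the argument 
handling are placeholders I made up. Each parent spawns its 10 children over 
MPI_COMM_SELF, so no other parent participates in that spawn:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Comm children;
    int rank;
    int errcodes[10];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* MPI_COMM_SELF as the spawning communicator means this call is
     * local to each parent; with 8 parents, 8 independent spawns run. */
    MPI_Comm_spawn("./worker", MPI_ARGV_NULL, 10, MPI_INFO_NULL,
                   0, MPI_COMM_SELF, &children, errcodes);

    printf("parent %d spawned 10 children\n", rank);

    MPI_Comm_disconnect(&children);
    MPI_Finalize();
    return 0;
}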


> On Jun 11, 2015, at 12:53 PM, Leiter, Kenneth W CIV USARMY ARL (US) 
> <kenneth.w.leiter2....@mail.mil> wrote:
> 
> Howard,
> 
> I do not run into a problem when I have one parent spawning many children 
> (tested up to 100 child ranks), but I am seeing the problem when I have, for 
> example, 8 parents launching 10 children each.
> 
> - Ken
> From: users [users-boun...@open-mpi.org] on behalf of Howard Pritchard 
> [hpprit...@gmail.com]
> Sent: Thursday, June 11, 2015 2:36 PM
> To: Open MPI Users
> Subject: Re: [OMPI users] orted segmentation fault in pmix on master
> 
> Hi Ken,
> 
> Could you post the output of your ompi_info?
> 
> I have PrgEnv-gnu/5.2.56 and gcc/4.9.2 loaded in my environment on the NERSC 
> system, and I used the following configure line:
> 
> ./configure --enable-mpi-java --prefix=my_favorite_install_location
> 
> The general rule of thumb on Crays with master (though not with older 
> versions) is that you should be able to do a plain ./configure with an 
> install location and be ready to go; no complicated platform files, etc., 
> are needed just to build vanilla.
> 
> As you're probably guessing, I'm going to say it works for me, at least up to 
> 68 slave ranks.
> 
> I do notice there's some glitch with the mapping of the ranks though.  The 
> binding logic seems
> to think there's oversubscription of cores even when there should not be.  I 
> had to use the
> 
> --bind-to none
> 
> option on the command line once I asked for more than 22 slave ranks.  The 
> Edison system has 24 cores/node.
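> 
> For example (the binary name and rank count here are placeholders, not my 
> exact command line):
> 
> mpirun -np 1 --bind-to none ./cpi-master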
> 
> Howard
> 
> 
> 
> 2015-06-11 12:10 GMT-06:00 Leiter, Kenneth W CIV USARMY ARL (US) 
> <kenneth.w.leiter2....@mail.mil>:
> I will try on a non-cray machine as well.
> 
> - Ken
> 
> -----Original Message-----
> From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Howard Pritchard
> Sent: Thursday, June 11, 2015 12:21 PM
> To: Open MPI Users
> Subject: Re: [OMPI users] orted segmentation fault in pmix on master
> 
> Hello Ken,
> 
> Could you give the details of the allocation request (qsub args) as well as 
> the mpirun command line args? I'm trying to reproduce on the nersc system.
> 
> It would be interesting to see, if you have access to a similar-size 
> non-Cray cluster, whether you get the same problems there.
> 
> Howard
> 
> 
> 2015-06-11 9:13 GMT-06:00 Ralph Castain <r...@open-mpi.org>:
> 
> 
>         I don’t have a Cray, but let me see if I can reproduce this on 
> something else
> 
>         > On Jun 11, 2015, at 7:26 AM, Leiter, Kenneth W CIV USARMY ARL (US) 
>         > <kenneth.w.leiter2....@mail.mil> wrote:
>         >
>         > Hello,
>         >
>         > I am attempting to use the Open MPI development master for a code 
>         > that uses dynamic process management (i.e., MPI_Comm_spawn) on our 
>         > Cray XC40 at the Army Research Laboratory. After reading through 
>         > the mailing list, I came to the conclusion that the master branch 
>         > is the only hope for getting this to work on the newer Cray 
>         > machines.
>         >
>         > To test, I am using the cpi-master.c / cpi-worker.c example. The 
>         > test works when executing on a small number of processors, five or 
>         > fewer, but begins to fail with segmentation faults in orted when 
>         > using more processors. Even with five or fewer processors, I am 
>         > spreading the computation across more than one node. I am using 
>         > the Cray ugni btl through the ALPS scheduler.
>         >
>         > I get a core file from orted and have tracked the seg fault down 
>         > to pmix_server_process_msgs.c:420, where req->proxy is NULL. I 
>         > have tried reading the code to understand how this happens, but am 
>         > unsure. I do see that, in the if statement where I take the else 
>         > branch, the other branch specifically checks "if (NULL == 
>         > req->proxy)" - however, no such check is done in the else branch.
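>         >
>         > To make the shape of that concrete, here is a tiny standalone 
>         > illustration of the pattern - this is NOT the actual pmix code, 
>         > just a made-up sketch of a NULL check present in one branch and 
>         > missing in the other:
>         >
>         > #include <stdio.h>
>         >
>         > struct peer    { int rank; };
>         > struct request { struct peer *proxy; /* may be NULL */ };
>         >
>         > static void process(struct request *req, int local)
>         > {
>         >     if (local) {
>         >         if (NULL == req->proxy) {   /* this branch checks */
>         >             fprintf(stderr, "no proxy - handled\n");
>         >             return;
>         >         }
>         >         printf("local request, rank %d\n", req->proxy->rank);
>         >     } else {
>         >         /* no NULL check here: if req->proxy is NULL, this
>         >          * dereference segfaults, as in the orted core file */
>         >         printf("remote request, rank %d\n", req->proxy->rank);
>         >     }
>         > }
>         >
>         > int main(void)
>         > {
>         >     struct request req = { .proxy = NULL };
>         >     process(&req, 1);   /* handled gracefully */
>         >     process(&req, 0);   /* segfaults */
>         >     return 0;
>         > }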
>         >
>         > I have debug output dumped for the failing runs. I can provide 
>         > the output along with ompi_info output and config.log to anyone 
>         > who is interested.
>         >
>         > - Ken Leiter
>         >