The ompi_info output is attached.

I can try with a vanilla configure, here is what I configured with:


./configure --with-alps --with-ugni --without-verbs --with-cray-xpmem --with-cray-pmi --with-udreg --without-tm --enable-debug


I am using PrgEnv-intel/5.2.40

________________________________
From: users [users-boun...@open-mpi.org] on behalf of Howard Pritchard 
[hpprit...@gmail.com]
Sent: Thursday, June 11, 2015 2:36 PM
To: Open MPI Users
Subject: Re: [OMPI users] orted segmentation fault in pmix on master

Hi Ken,

Could you post the output of your ompi_info?

I have PrgEnv-gnu/5.2.56 and gcc/4.9.2 loaded in my environment on the NERSC system, with the following configure line:

./configure --enable-mpi-java --prefix=my_favorite_install_location

The general rule of thumb on Crays with master (though not with older versions) is that you should be able to do a plain ./configure (install location) and be ready to go, with no need for complicated platform files, etc., to build vanilla.

As you're probably guessing, I'm going to say it works for me, at least up to 68 slave ranks.

I do notice there's some glitch with the mapping of the ranks, though. The binding logic seems to think there's oversubscription of cores even when there should not be. I had to use the

--bind-to none

option on the command line once I asked for more than 22 slave ranks. The Edison system has 24 cores/node.
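For reference, the workaround on the mpirun command line might look like this (a hypothetical invocation; the rank count and binary name are illustrative, not taken from the thread):

```
# Disable binding so the mistaken oversubscription check is skipped:
mpirun --bind-to none -np 1 ./cpi-master
```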

Howard



2015-06-11 12:10 GMT-06:00 Leiter, Kenneth W CIV USARMY ARL (US) <kenneth.w.leiter2....@mail.mil>:
I will try on a non-cray machine as well.

- Ken

-----Original Message-----
From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Howard Pritchard
Sent: Thursday, June 11, 2015 12:21 PM
To: Open MPI Users
Subject: Re: [OMPI users] orted segmentation fault in pmix on master

Hello Ken,

Could you give the details of the allocation request (qsub args) as well as the mpirun command-line args? I'm trying to reproduce on the NERSC system.

It would be interesting to see whether you get the same problems on a similarly sized non-Cray cluster, if you have access to one.

Howard


2015-06-11 9:13 GMT-06:00 Ralph Castain <r...@open-mpi.org>:


        I don’t have a Cray, but let me see if I can reproduce this on something else

        > On Jun 11, 2015, at 7:26 AM, Leiter, Kenneth W CIV USARMY ARL (US) <kenneth.w.leiter2....@mail.mil> wrote:
        >
        > Hello,
        >
        > I am attempting to use the Open MPI development master for a code that
        > uses dynamic process management (i.e. MPI_Comm_spawn) on our Cray XC40
        > at the Army Research Laboratory. After reading through the mailing list
        > I came to the conclusion that the master branch is the only hope for
        > getting this to work on the newer Cray machines.
        >
        > To test I am using the cpi-master.c / cpi-worker.c example. The test
        > works when executing on a small number of processors, five or fewer,
        > but begins to fail with segmentation faults in orted when using more
        > processors. Even with five or fewer processors, I am spreading the
        > computation to more than one node. I am using the Cray ugni btl
        > through the ALPS scheduler.
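For context, the shape of that test can be sketched as follows (a hypothetical reconstruction of the cpi-master side, not the actual example source; it requires an MPI installation to compile and run):

```c
/* Sketch of a cpi-master-style parent: spawn workers with
 * MPI_Comm_spawn, then communicate over the intercommunicator.
 * Worker count and binary name are illustrative. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Comm workers;
    int errcodes[4];

    MPI_Init(&argc, &argv);
    /* Spawn 4 worker ranks running ./cpi-worker; the thread reports
     * failures once the total rank count spans multiple nodes. */
    MPI_Comm_spawn("./cpi-worker", MPI_ARGV_NULL, 4, MPI_INFO_NULL,
                   0, MPI_COMM_SELF, &workers, errcodes);
    /* ... exchange partial sums with the workers via `workers` ... */
    MPI_Comm_disconnect(&workers);
    MPI_Finalize();
    return 0;
}
```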
        >
        > I get a core file from orted and have tracked the seg fault down to
        > pmix_server_process_msgs.c:420, where req->proxy is NULL. I have tried
        > reading the code to understand how this happens, but am unsure. I do
        > see that in the if statement where I take the else branch, the other
        > branch specifically checks "if (NULL == req->proxy)"; however, no
        > such check is done in the else branch.
        >
        > I have debug output dumped for the failing runs. I can provide the
        > output along with ompi_info output and config.log to anyone who is
        > interested.
        >
        > - Ken Leiter
        >
        > _______________________________________________
        > users mailing list
        > us...@open-mpi.org
        > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
        > Link to this post: http://www.open-mpi.org/community/lists/users/2015/06/27094.php

        _______________________________________________
        users mailing list
        us...@open-mpi.org
        Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
        Link to this post: http://www.open-mpi.org/community/lists/users/2015/06/27095.php



_______________________________________________
users mailing list
us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post: http://www.open-mpi.org/community/lists/users/2015/06/27103.php

Attachment: ompi_info_output.tar.bz2
Description: ompi_info_output.tar.bz2
