Re: [OMPI devel] [2.0.1.rc1] Solaris MPIX failure

2016-08-24 Thread Paul Hargrove
My previous response was composed too quickly.
I should have said "successfully built and RUN".

-Paul


On Wed, Aug 24, 2016 at 9:04 PM, Gilles Gouaillardet wrote:

> Thanks Paul !
>
>
> Yes, this snapshot does include the patch I posted earlier.
>
> btw, the issue was a runtime error, not a build error.
>
>
> Cheers,
>
>
> Gilles
>
> On 8/25/2016 12:00 PM, Paul Hargrove wrote:
>
> Gilles,
>
> I have successfully built openmpi-v2.0.0-227-g917d293 (tonight's nightly
> tarball) on Solaris 11.3 with both the GNU and Studio compilers.  Based on
> Ralph's previous email, I assume that included the patch you had directed
> me to (though I did not attempt to verify that myself).
>
> -Paul
>
> On Wed, Aug 24, 2016 at 10:44 AM, Paul Hargrove 
> wrote:
>
>> Ralph,
>>
>> That will allow me to test much sooner.
>>
>> -Paul
>>
>> On Wed, Aug 24, 2016 at 10:41 AM, r...@open-mpi.org 
>> wrote:
>>
>>> When you do, that PR has already been committed, so you can just pull
>>> the next nightly 2.x tarball and test from there
>>>
>>> On Aug 24, 2016, at 10:39 AM, Paul Hargrove  wrote:
>>>
>>> I am afraid it might take a day or two before I can get to testing that
>>> patch.
>>>
>>> -Paul
>>>
>>> On Tue, Aug 23, 2016 at 10:16 PM, Gilles Gouaillardet wrote:
>>>
 Paul,


 you can download a patch at https://patch-diff.githubusercontent.com/raw/open-mpi/ompi-release/pull/1336.patch

 (note you need recent autotools in order to use it)


 Cheers,


 Gilles

 On 8/23/2016 10:40 PM, r...@open-mpi.org wrote:

 Looks like Solaris has a “getpeerucred” - can you take a look at it,
 Gilles? We’d have to add that to our AC_CHECK_FUNCS and update the native
 sec component.


 On Aug 23, 2016, at 6:32 AM, r...@open-mpi.org wrote:

 I took a quick glance at this one, and the only way I can see to get
 that error is from this block of code:

 #if defined(HAVE_STRUCT_UCRED_UID)
 euid = ucred.uid;
 gid = ucred.gid;
 #else
 euid = ucred.cr_uid;
 gid = ucred.cr_gid;
 #endif

 #elif defined(HAVE_GETPEEREID)
 pmix_output_verbose(2, pmix_globals.debug_output,
 "sec:native checking getpeereid for peer
 credentials");
 if (0 != getpeereid(peer->sd, &euid, &gid)) {
 pmix_output_verbose(2, pmix_globals.debug_output,
 "sec: getsockopt getpeereid failed: %s",
 strerror (pmix_socket_errno));
 return PMIX_ERR_INVALID_CRED;
 }
 #else
 return PMIX_ERR_NOT_SUPPORTED;
 #endif


 I can only surmise, therefore, that Solaris doesn’t pass either of the
 two #if define’d tests. Is there a Solaris alternative?


 On Aug 23, 2016, at 5:55 AM, r...@open-mpi.org wrote:

 Thanks Gilles!

 On Aug 23, 2016, at 3:42 AM, Gilles Gouaillardet <
 gilles.gouaillar...@gmail.com> wrote:

 Thanks Paul,

 at first glance, something is going wrong in the sec module under
 solaris.
 I will keep digging tomorrow

 Cheers,

 Gilles

 On Tuesday, August 23, 2016, Paul Hargrove  wrote:

> On Solaris 11.3 on x86-64:
>
> $ mpirun -mca btl sm,self,openib -np 2 -host pcp-d-3,pcp-d-4
> examples/ring_c'
> [pcp-d-4:25075] PMIX ERROR: NOT-SUPPORTED in file
> /shared/OMPI/openmpi-2.0.1rc1-solaris11-x86-ib-gcc/openmpi-2
> .0.1rc1/opal/mca/pmix/pmix112/pmix/src/server/pmix_server_listener.c
> at line 529
> [pcp-d-4:25078] PMIX ERROR: UNREACHABLE in file
> /shared/OMPI/openmpi-2.0.1rc1-solaris11-x86-ib-gcc/openmpi-2
> .0.1rc1/opal/mca/pmix/pmix112/pmix/src/client/pmix_client.c at line
> 983
> [pcp-d-4:25078] PMIX ERROR: UNREACHABLE in file
> /shared/OMPI/openmpi-2.0.1rc1-solaris11-x86-ib-gcc/openmpi-2
> .0.1rc1/opal/mca/pmix/pmix112/pmix/src/client/pmix_client.c at line
> 199
> 
> --
> It looks like MPI_INIT failed for some reason; your parallel process is
> likely to abort.  There are many reasons that a parallel process can
> fail during MPI_INIT; some of which are due to configuration or
> environment
> problems.  This failure appears to be an internal failure; here's some
> additional information (which may only be relevant to an Open MPI
> developer):
>
>   ompi_mpi_init: ompi_rte_init failed
>   --> Returned "(null)" (-43) instead of "Success" (0)
> 
> --
> *** An error occurred in MPI_Init
> *** on a NULL communicator
> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now
> abort,

Re: [OMPI devel] [2.0.1.rc1] Solaris MPIX failure

2016-08-24 Thread Gilles Gouaillardet

Thanks Paul !


Yes, this snapshot does include the patch I posted earlier.

btw, the issue was a runtime error, not a build error.


Cheers,


Gilles


On 8/25/2016 12:00 PM, Paul Hargrove wrote:

Gilles,

I have successfully built openmpi-v2.0.0-227-g917d293 (tonight's
nightly tarball) on Solaris 11.3 with both the GNU and Studio
compilers.  Based on Ralph's previous email, I assume that included
the patch you had directed me to (though I did not attempt to verify
that myself).


-Paul

On Wed, Aug 24, 2016 at 10:44 AM, Paul Hargrove wrote:


Ralph,

That will allow me to test much sooner.

-Paul

On Wed, Aug 24, 2016 at 10:41 AM, r...@open-mpi.org wrote:

When you do, that PR has already been committed, so you can
just pull the next nightly 2.x tarball and test from there


On Aug 24, 2016, at 10:39 AM, Paul Hargrove wrote:

I am afraid it might take a day or two before I can get to
testing that patch.

-Paul

On Tue, Aug 23, 2016 at 10:16 PM, Gilles Gouaillardet wrote:

Paul,


you can download a patch at

https://patch-diff.githubusercontent.com/raw/open-mpi/ompi-release/pull/1336.patch



(note you need recent autotools in order to use it)


Cheers,


Gilles


On 8/23/2016 10:40 PM, r...@open-mpi.org
 wrote:

Looks like Solaris has a “getpeerucred” - can you take a
look at it, Gilles? We’d have to add that to our
AC_CHECK_FUNCS and update the native sec component.



On Aug 23, 2016, at 6:32 AM, r...@open-mpi.org
 wrote:

I took a quick glance at this one, and the only way I
can see to get that error is from this block of code:

#if defined(HAVE_STRUCT_UCRED_UID)
euid = ucred.uid;
gid = ucred.gid;
#else
euid = ucred.cr_uid;
gid = ucred.cr_gid;
#endif

#elif defined(HAVE_GETPEEREID)
pmix_output_verbose(2, pmix_globals.debug_output,
"sec:native checking getpeereid for peer credentials");
if (0 != getpeereid(peer->sd, &euid, &gid)) {
pmix_output_verbose(2, pmix_globals.debug_output,
"sec: getsockopt getpeereid failed: %s",
strerror (pmix_socket_errno));
return PMIX_ERR_INVALID_CRED;
}
#else
return PMIX_ERR_NOT_SUPPORTED;
#endif


I can only surmise, therefore, that Solaris doesn’t
pass either of the two #if define’d tests. Is there a
Solaris alternative?



On Aug 23, 2016, at 5:55 AM, r...@open-mpi.org
 wrote:

Thanks Gilles!


On Aug 23, 2016, at 3:42 AM, Gilles Gouaillardet wrote:

Thanks Paul,

at first glance, something is going wrong in the sec
module under solaris.
I will keep digging tomorrow

Cheers,

Gilles

On Tuesday, August 23, 2016, Paul Hargrove wrote:

On Solaris 11.3 on x86-64:

$ mpirun -mca btl sm,self,openib -np 2 -host
pcp-d-3,pcp-d-4 examples/ring_c'
[pcp-d-4:25075] PMIX ERROR: NOT-SUPPORTED in file

/shared/OMPI/openmpi-2.0.1rc1-solaris11-x86-ib-gcc/openmpi-2.0.1rc1/opal/mca/pmix/pmix112/pmix/src/server/pmix_server_listener.c
at line 529
[pcp-d-4:25078] PMIX ERROR: UNREACHABLE in file

/shared/OMPI/openmpi-2.0.1rc1-solaris11-x86-ib-gcc/openmpi-2.0.1rc1/opal/mca/pmix/pmix112/pmix/src/client/pmix_client.c
at line 983
[pcp-d-4:25078] PMIX ERROR: UNREACHABLE in file

/shared/OMPI/openmpi-2.0.1rc1-solaris11-x86-ib-gcc/openmpi-2.0.1rc1/opal/mca/pmix/pmix112/pmix/src/client/pmix_client.c
at line 199

--
It looks like MPI_INIT failed for some reason;
your parallel process is
likely to abort. There are many reasons that a
parallel process can
fail during MPI_INIT; some of which are due to

Re: [OMPI devel] [2.0.1.rc1] Solaris MPIX failure

2016-08-24 Thread Paul Hargrove
Gilles,

I have successfully built openmpi-v2.0.0-227-g917d293 (tonight's nightly
tarball) on Solaris 11.3 with both the GNU and Studio compilers.  Based on
Ralph's previous email, I assume that included the patch you had directed
me to (though I did not attempt to verify that myself).

-Paul

On Wed, Aug 24, 2016 at 10:44 AM, Paul Hargrove  wrote:

> Ralph,
>
> That will allow me to test much sooner.
>
> -Paul
>
> On Wed, Aug 24, 2016 at 10:41 AM, r...@open-mpi.org 
> wrote:
>
>> When you do, that PR has already been committed, so you can just pull the
>> next nightly 2.x tarball and test from there
>>
>> On Aug 24, 2016, at 10:39 AM, Paul Hargrove  wrote:
>>
>> I am afraid it might take a day or two before I can get to testing that
>> patch.
>>
>> -Paul
>>
>> On Tue, Aug 23, 2016 at 10:16 PM, Gilles Gouaillardet 
>> wrote:
>>
>>> Paul,
>>>
>>>
>>> you can download a patch at https://patch-diff.githubusercontent.com/raw/open-mpi/ompi-release/pull/1336.patch
>>>
>>> (note you need recent autotools in order to use it)
>>>
>>>
>>> Cheers,
>>>
>>>
>>> Gilles
>>>
>>> On 8/23/2016 10:40 PM, r...@open-mpi.org wrote:
>>>
>>> Looks like Solaris has a “getpeerucred” - can you take a look at it,
>>> Gilles? We’d have to add that to our AC_CHECK_FUNCS and update the native
>>> sec component.
>>>
>>>
>>> On Aug 23, 2016, at 6:32 AM, r...@open-mpi.org wrote:
>>>
>>> I took a quick glance at this one, and the only way I can see to get
>>> that error is from this block of code:
>>>
>>> #if defined(HAVE_STRUCT_UCRED_UID)
>>> euid = ucred.uid;
>>> gid = ucred.gid;
>>> #else
>>> euid = ucred.cr_uid;
>>> gid = ucred.cr_gid;
>>> #endif
>>>
>>> #elif defined(HAVE_GETPEEREID)
>>> pmix_output_verbose(2, pmix_globals.debug_output,
>>> "sec:native checking getpeereid for peer
>>> credentials");
>>> if (0 != getpeereid(peer->sd, &euid, &gid)) {
>>> pmix_output_verbose(2, pmix_globals.debug_output,
>>> "sec: getsockopt getpeereid failed: %s",
>>> strerror (pmix_socket_errno));
>>> return PMIX_ERR_INVALID_CRED;
>>> }
>>> #else
>>> return PMIX_ERR_NOT_SUPPORTED;
>>> #endif
>>>
>>>
>>> I can only surmise, therefore, that Solaris doesn’t pass either of the
>>> two #if define’d tests. Is there a Solaris alternative?
>>>
>>>
>>> On Aug 23, 2016, at 5:55 AM, r...@open-mpi.org wrote:
>>>
>>> Thanks Gilles!
>>>
>>> On Aug 23, 2016, at 3:42 AM, Gilles Gouaillardet <
>>> gilles.gouaillar...@gmail.com> wrote:
>>>
>>> Thanks Paul,
>>>
>>> at first glance, something is going wrong in the sec module under
>>> solaris.
>>> I will keep digging tomorrow
>>>
>>> Cheers,
>>>
>>> Gilles
>>>
>>> On Tuesday, August 23, 2016, Paul Hargrove  wrote:
>>>
 On Solaris 11.3 on x86-64:

 $ mpirun -mca btl sm,self,openib -np 2 -host pcp-d-3,pcp-d-4
 examples/ring_c'
 [pcp-d-4:25075] PMIX ERROR: NOT-SUPPORTED in file
 /shared/OMPI/openmpi-2.0.1rc1-solaris11-x86-ib-gcc/openmpi-2
 .0.1rc1/opal/mca/pmix/pmix112/pmix/src/server/pmix_server_listener.c
 at line 529
 [pcp-d-4:25078] PMIX ERROR: UNREACHABLE in file
 /shared/OMPI/openmpi-2.0.1rc1-solaris11-x86-ib-gcc/openmpi-2
 .0.1rc1/opal/mca/pmix/pmix112/pmix/src/client/pmix_client.c at line 983
 [pcp-d-4:25078] PMIX ERROR: UNREACHABLE in file
 /shared/OMPI/openmpi-2.0.1rc1-solaris11-x86-ib-gcc/openmpi-2
 .0.1rc1/opal/mca/pmix/pmix112/pmix/src/client/pmix_client.c at line 199
 
 --
 It looks like MPI_INIT failed for some reason; your parallel process is
 likely to abort.  There are many reasons that a parallel process can
 fail during MPI_INIT; some of which are due to configuration or
 environment
 problems.  This failure appears to be an internal failure; here's some
 additional information (which may only be relevant to an Open MPI
 developer):

   ompi_mpi_init: ompi_rte_init failed
   --> Returned "(null)" (-43) instead of "Success" (0)
 
 --
 *** An error occurred in MPI_Init
 *** on a NULL communicator
 *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
 ***and potentially your MPI job)
 [pcp-d-4:25078] Local abort before MPI_INIT completed completed
 successfully, but am not able to aggregate error messages, and not able to
 guarantee that all other processes were killed!
 ---
 Primary job  terminated normally, but 1 process returned
 a non-zero exit code.. Per user-direction, the job has been aborted.
 ---
 

Re: [OMPI devel] [2.0.1.rc1] Solaris MPIX failure

2016-08-24 Thread Paul Hargrove
Ralph,

That will allow me to test much sooner.

-Paul

On Wed, Aug 24, 2016 at 10:41 AM, r...@open-mpi.org  wrote:

> When you do, that PR has already been committed, so you can just pull the
> next nightly 2.x tarball and test from there
>
> On Aug 24, 2016, at 10:39 AM, Paul Hargrove  wrote:
>
> I am afraid it might take a day or two before I can get to testing that
> patch.
>
> -Paul
>
> On Tue, Aug 23, 2016 at 10:16 PM, Gilles Gouaillardet 
> wrote:
>
>> Paul,
>>
>>
>> you can download a patch at https://patch-diff.githubusercontent.com/raw/open-mpi/ompi-release/pull/1336.patch
>>
>> (note you need recent autotools in order to use it)
>>
>>
>> Cheers,
>>
>>
>> Gilles
>>
>> On 8/23/2016 10:40 PM, r...@open-mpi.org wrote:
>>
>> Looks like Solaris has a “getpeerucred” - can you take a look at it,
>> Gilles? We’d have to add that to our AC_CHECK_FUNCS and update the native
>> sec component.
>>
>>
>> On Aug 23, 2016, at 6:32 AM, r...@open-mpi.org wrote:
>>
>> I took a quick glance at this one, and the only way I can see to get that
>> error is from this block of code:
>>
>> #if defined(HAVE_STRUCT_UCRED_UID)
>> euid = ucred.uid;
>> gid = ucred.gid;
>> #else
>> euid = ucred.cr_uid;
>> gid = ucred.cr_gid;
>> #endif
>>
>> #elif defined(HAVE_GETPEEREID)
>> pmix_output_verbose(2, pmix_globals.debug_output,
>> "sec:native checking getpeereid for peer
>> credentials");
>> if (0 != getpeereid(peer->sd, &euid, &gid)) {
>> pmix_output_verbose(2, pmix_globals.debug_output,
>> "sec: getsockopt getpeereid failed: %s",
>> strerror (pmix_socket_errno));
>> return PMIX_ERR_INVALID_CRED;
>> }
>> #else
>> return PMIX_ERR_NOT_SUPPORTED;
>> #endif
>>
>>
>> I can only surmise, therefore, that Solaris doesn’t pass either of the
>> two #if define’d tests. Is there a Solaris alternative?
>>
>>
>> On Aug 23, 2016, at 5:55 AM, r...@open-mpi.org wrote:
>>
>> Thanks Gilles!
>>
>> On Aug 23, 2016, at 3:42 AM, Gilles Gouaillardet <
>> gilles.gouaillar...@gmail.com> wrote:
>>
>> Thanks Paul,
>>
>> at first glance, something is going wrong in the sec module under solaris.
>> I will keep digging tomorrow
>>
>> Cheers,
>>
>> Gilles
>>
>> On Tuesday, August 23, 2016, Paul Hargrove  wrote:
>>
>>> On Solaris 11.3 on x86-64:
>>>
>>> $ mpirun -mca btl sm,self,openib -np 2 -host pcp-d-3,pcp-d-4
>>> examples/ring_c'
>>> [pcp-d-4:25075] PMIX ERROR: NOT-SUPPORTED in file
>>> /shared/OMPI/openmpi-2.0.1rc1-solaris11-x86-ib-gcc/openmpi-2
>>> .0.1rc1/opal/mca/pmix/pmix112/pmix/src/server/pmix_server_listener.c at
>>> line 529
>>> [pcp-d-4:25078] PMIX ERROR: UNREACHABLE in file
>>> /shared/OMPI/openmpi-2.0.1rc1-solaris11-x86-ib-gcc/openmpi-2
>>> .0.1rc1/opal/mca/pmix/pmix112/pmix/src/client/pmix_client.c at line 983
>>> [pcp-d-4:25078] PMIX ERROR: UNREACHABLE in file
>>> /shared/OMPI/openmpi-2.0.1rc1-solaris11-x86-ib-gcc/openmpi-2
>>> .0.1rc1/opal/mca/pmix/pmix112/pmix/src/client/pmix_client.c at line 199
>>> 
>>> --
>>> It looks like MPI_INIT failed for some reason; your parallel process is
>>> likely to abort.  There are many reasons that a parallel process can
>>> fail during MPI_INIT; some of which are due to configuration or
>>> environment
>>> problems.  This failure appears to be an internal failure; here's some
>>> additional information (which may only be relevant to an Open MPI
>>> developer):
>>>
>>>   ompi_mpi_init: ompi_rte_init failed
>>>   --> Returned "(null)" (-43) instead of "Success" (0)
>>> 
>>> --
>>> *** An error occurred in MPI_Init
>>> *** on a NULL communicator
>>> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
>>> ***and potentially your MPI job)
>>> [pcp-d-4:25078] Local abort before MPI_INIT completed completed
>>> successfully, but am not able to aggregate error messages, and not able to
>>> guarantee that all other processes were killed!
>>> ---
>>> Primary job  terminated normally, but 1 process returned
>>> a non-zero exit code.. Per user-direction, the job has been aborted.
>>> ---
>>> 
>>> --
>>> mpirun detected that one or more processes exited with non-zero status,
>>> thus causing
>>> the job to be terminated. The first process to do so was:
>>>
>>>   Process name: [[25599,1],1]
>>>   Exit code:1
>>> 
>>> --
>>>
>>> -Paul
>>>
>>> --
>>> Paul H. Hargrove  phhargr...@lbl.gov
>>> Computer Languages & Systems Software (CLaSS) Group
>>> Computer Science Department   Tel: 

Re: [OMPI devel] [2.0.1.rc1] Solaris MPIX failure

2016-08-24 Thread r...@open-mpi.org
When you do, that PR has already been committed, so you can just pull the next 
nightly 2.x tarball and test from there

> On Aug 24, 2016, at 10:39 AM, Paul Hargrove  wrote:
> 
> I am afraid it might take a day or two before I can get to testing that patch.
> 
> -Paul
> 
> On Tue, Aug 23, 2016 at 10:16 PM, Gilles Gouaillardet wrote:
> Paul,
> 
> 
> you can download a patch at 
> https://patch-diff.githubusercontent.com/raw/open-mpi/ompi-release/pull/1336.patch
>  
> 
> (note you need recent autotools in order to use it)
> 
> 
> Cheers,
> 
> 
> Gilles
> 
> On 8/23/2016 10:40 PM, r...@open-mpi.org  wrote:
>> Looks like Solaris has a “getpeerucred” - can you take a look at it, Gilles?
>> We’d have to add that to our AC_CHECK_FUNCS and update the native sec 
>> component.
>> 
>> 
>>> On Aug 23, 2016, at 6:32 AM, r...@open-mpi.org  
>>> wrote:
>>> 
>>> I took a quick glance at this one, and the only way I can see to get that 
>>> error is from this block of code:
>>> 
>>> #if defined(HAVE_STRUCT_UCRED_UID)
>>> euid = ucred.uid;
>>> gid = ucred.gid;
>>> #else
>>> euid = ucred.cr_uid;
>>> gid = ucred.cr_gid;
>>> #endif
>>> 
>>> #elif defined(HAVE_GETPEEREID)
>>> pmix_output_verbose(2, pmix_globals.debug_output,
>>> "sec:native checking getpeereid for peer 
>>> credentials");
>>> if (0 != getpeereid(peer->sd, &euid, &gid)) {
>>> pmix_output_verbose(2, pmix_globals.debug_output,
>>> "sec: getsockopt getpeereid failed: %s",
>>> strerror (pmix_socket_errno));
>>> return PMIX_ERR_INVALID_CRED;
>>> }
>>> #else
>>> return PMIX_ERR_NOT_SUPPORTED;
>>> #endif
>>> 
>>> 
>>> I can only surmise, therefore, that Solaris doesn’t pass either of the two 
>>> #if define’d tests. Is there a Solaris alternative?
>>> 
>>> 
 On Aug 23, 2016, at 5:55 AM, r...@open-mpi.org  
 wrote:
 
 Thanks Gilles!
 
> On Aug 23, 2016, at 3:42 AM, Gilles Gouaillardet wrote:
> 
> Thanks Paul,
> 
> at first glance, something is going wrong in the sec module under solaris.
> I will keep digging tomorrow 
> 
> Cheers,
> 
> Gilles
> 
> On Tuesday, August 23, 2016, Paul Hargrove wrote:
> On Solaris 11.3 on x86-64:
> 
> $ mpirun -mca btl sm,self,openib -np 2 -host pcp-d-3,pcp-d-4 
> examples/ring_c'
> [pcp-d-4:25075] PMIX ERROR: NOT-SUPPORTED in file 
> /shared/OMPI/openmpi-2.0.1rc1-solaris11-x86-ib-gcc/openmpi-2.0.1rc1/opal/mca/pmix/pmix112/pmix/src/server/pmix_server_listener.c
>  at line 529
> [pcp-d-4:25078] PMIX ERROR: UNREACHABLE in file 
> /shared/OMPI/openmpi-2.0.1rc1-solaris11-x86-ib-gcc/openmpi-2.0.1rc1/opal/mca/pmix/pmix112/pmix/src/client/pmix_client.c
>  at line 983
> [pcp-d-4:25078] PMIX ERROR: UNREACHABLE in file 
> /shared/OMPI/openmpi-2.0.1rc1-solaris11-x86-ib-gcc/openmpi-2.0.1rc1/opal/mca/pmix/pmix112/pmix/src/client/pmix_client.c
>  at line 199
> --
> It looks like MPI_INIT failed for some reason; your parallel process is
> likely to abort.  There are many reasons that a parallel process can
> fail during MPI_INIT; some of which are due to configuration or 
> environment
> problems.  This failure appears to be an internal failure; here's some
> additional information (which may only be relevant to an Open MPI
> developer):
> 
>   ompi_mpi_init: ompi_rte_init failed
>   --> Returned "(null)" (-43) instead of "Success" (0)
> --
> *** An error occurred in MPI_Init
> *** on a NULL communicator
> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
> ***and potentially your MPI job)
> [pcp-d-4:25078] Local abort before MPI_INIT completed completed 
> successfully, but am not able to aggregate error messages, and not able 
> to guarantee that all other processes were killed!
> ---
> Primary job  terminated normally, but 1 process returned
> a non-zero exit code.. Per user-direction, the job has been aborted.
> ---
> --
> mpirun detected that one or more processes exited with non-zero status, 
> thus causing
> the job to be terminated. The first process to do so was:
> 
>   Process name: 

Re: [OMPI devel] [2.0.1.rc1] Solaris MPIX failure

2016-08-24 Thread Paul Hargrove
I am afraid it might take a day or two before I can get to testing that
patch.

-Paul

On Tue, Aug 23, 2016 at 10:16 PM, Gilles Gouaillardet 
wrote:

> Paul,
>
>
> you can download a patch at https://patch-diff.githubusercontent.com/raw/open-mpi/ompi-release/pull/1336.patch
>
> (note you need recent autotools in order to use it)
>
>
> Cheers,
>
>
> Gilles
>
> On 8/23/2016 10:40 PM, r...@open-mpi.org wrote:
>
> Looks like Solaris has a “getpeerucred” - can you take a look at it,
> Gilles? We’d have to add that to our AC_CHECK_FUNCS and update the native
> sec component.
>
>
> On Aug 23, 2016, at 6:32 AM, r...@open-mpi.org wrote:
>
> I took a quick glance at this one, and the only way I can see to get that
> error is from this block of code:
>
> #if defined(HAVE_STRUCT_UCRED_UID)
> euid = ucred.uid;
> gid = ucred.gid;
> #else
> euid = ucred.cr_uid;
> gid = ucred.cr_gid;
> #endif
>
> #elif defined(HAVE_GETPEEREID)
> pmix_output_verbose(2, pmix_globals.debug_output,
> "sec:native checking getpeereid for peer
> credentials");
> if (0 != getpeereid(peer->sd, &euid, &gid)) {
> pmix_output_verbose(2, pmix_globals.debug_output,
> "sec: getsockopt getpeereid failed: %s",
> strerror (pmix_socket_errno));
> return PMIX_ERR_INVALID_CRED;
> }
> #else
> return PMIX_ERR_NOT_SUPPORTED;
> #endif
>
>
> I can only surmise, therefore, that Solaris doesn’t pass either of the two
> #if define’d tests. Is there a Solaris alternative?
>
>
> On Aug 23, 2016, at 5:55 AM, r...@open-mpi.org wrote:
>
> Thanks Gilles!
>
> On Aug 23, 2016, at 3:42 AM, Gilles Gouaillardet <
> gilles.gouaillar...@gmail.com> wrote:
>
> Thanks Paul,
>
> at first glance, something is going wrong in the sec module under solaris.
> I will keep digging tomorrow
>
> Cheers,
>
> Gilles
>
> On Tuesday, August 23, 2016, Paul Hargrove  wrote:
>
>> On Solaris 11.3 on x86-64:
>>
>> $ mpirun -mca btl sm,self,openib -np 2 -host pcp-d-3,pcp-d-4
>> examples/ring_c'
>> [pcp-d-4:25075] PMIX ERROR: NOT-SUPPORTED in file
>> /shared/OMPI/openmpi-2.0.1rc1-solaris11-x86-ib-gcc/openmpi-2
>> .0.1rc1/opal/mca/pmix/pmix112/pmix/src/server/pmix_server_listener.c at
>> line 529
>> [pcp-d-4:25078] PMIX ERROR: UNREACHABLE in file
>> /shared/OMPI/openmpi-2.0.1rc1-solaris11-x86-ib-gcc/openmpi-2
>> .0.1rc1/opal/mca/pmix/pmix112/pmix/src/client/pmix_client.c at line 983
>> [pcp-d-4:25078] PMIX ERROR: UNREACHABLE in file
>> /shared/OMPI/openmpi-2.0.1rc1-solaris11-x86-ib-gcc/openmpi-2
>> .0.1rc1/opal/mca/pmix/pmix112/pmix/src/client/pmix_client.c at line 199
>> 
>> --
>> It looks like MPI_INIT failed for some reason; your parallel process is
>> likely to abort.  There are many reasons that a parallel process can
>> fail during MPI_INIT; some of which are due to configuration or
>> environment
>> problems.  This failure appears to be an internal failure; here's some
>> additional information (which may only be relevant to an Open MPI
>> developer):
>>
>>   ompi_mpi_init: ompi_rte_init failed
>>   --> Returned "(null)" (-43) instead of "Success" (0)
>> 
>> --
>> *** An error occurred in MPI_Init
>> *** on a NULL communicator
>> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
>> ***and potentially your MPI job)
>> [pcp-d-4:25078] Local abort before MPI_INIT completed completed
>> successfully, but am not able to aggregate error messages, and not able to
>> guarantee that all other processes were killed!
>> ---
>> Primary job  terminated normally, but 1 process returned
>> a non-zero exit code.. Per user-direction, the job has been aborted.
>> ---
>> 
>> --
>> mpirun detected that one or more processes exited with non-zero status,
>> thus causing
>> the job to be terminated. The first process to do so was:
>>
>>   Process name: [[25599,1],1]
>>   Exit code:1
>> 
>> --
>>
>> -Paul
>>
>> --
>> Paul H. Hargrove  phhargr...@lbl.gov
>> Computer Languages & Systems Software (CLaSS) Group
>> Computer Science Department   Tel: +1-510-495-2352
>> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
>>
> ___
> devel mailing list
> devel@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/devel
>
>
> ___
> devel mailing list
> devel@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/devel
>
>
>
>
>
> ___
> devel 

Re: [OMPI devel] [2.0.1.rc1] Solaris MPIX failure

2016-08-23 Thread Gilles Gouaillardet

Paul,


you can download a patch at 
https://patch-diff.githubusercontent.com/raw/open-mpi/ompi-release/pull/1336.patch


(note you need recent autotools in order to use it)


Cheers,


Gilles


On 8/23/2016 10:40 PM, r...@open-mpi.org wrote:
Looks like Solaris has a “getpeerucred” - can you take a look at it,
Gilles? We’d have to add that to our AC_CHECK_FUNCS and update the 
native sec component.



On Aug 23, 2016, at 6:32 AM, r...@open-mpi.org 
 wrote:


I took a quick glance at this one, and the only way I can see to get 
that error is from this block of code:


#if defined(HAVE_STRUCT_UCRED_UID)
euid = ucred.uid;
gid = ucred.gid;
#else
euid = ucred.cr_uid;
gid = ucred.cr_gid;
#endif

#elif defined(HAVE_GETPEEREID)
pmix_output_verbose(2, pmix_globals.debug_output,
"sec:native checking getpeereid for peer 
credentials");

if (0 != getpeereid(peer->sd, &euid, &gid)) {
pmix_output_verbose(2, pmix_globals.debug_output,
"sec: getsockopt getpeereid failed: %s",
strerror (pmix_socket_errno));
return PMIX_ERR_INVALID_CRED;
}
#else
return PMIX_ERR_NOT_SUPPORTED;
#endif


I can only surmise, therefore, that Solaris doesn’t pass either of 
the two #if define’d tests. Is there a Solaris alternative?



On Aug 23, 2016, at 5:55 AM, r...@open-mpi.org 
 wrote:


Thanks Gilles!

On Aug 23, 2016, at 3:42 AM, Gilles Gouaillardet wrote:


Thanks Paul,

at first glance, something is going wrong in the sec module under 
solaris.

I will keep digging tomorrow

Cheers,

Gilles

On Tuesday, August 23, 2016, Paul Hargrove wrote:


On Solaris 11.3 on x86-64:

$ mpirun -mca btl sm,self,openib -np 2 -host pcp-d-3,pcp-d-4
examples/ring_c'
[pcp-d-4:25075] PMIX ERROR: NOT-SUPPORTED in file

/shared/OMPI/openmpi-2.0.1rc1-solaris11-x86-ib-gcc/openmpi-2.0.1rc1/opal/mca/pmix/pmix112/pmix/src/server/pmix_server_listener.c
at line 529
[pcp-d-4:25078] PMIX ERROR: UNREACHABLE in file

/shared/OMPI/openmpi-2.0.1rc1-solaris11-x86-ib-gcc/openmpi-2.0.1rc1/opal/mca/pmix/pmix112/pmix/src/client/pmix_client.c
at line 983
[pcp-d-4:25078] PMIX ERROR: UNREACHABLE in file

/shared/OMPI/openmpi-2.0.1rc1-solaris11-x86-ib-gcc/openmpi-2.0.1rc1/opal/mca/pmix/pmix112/pmix/src/client/pmix_client.c
at line 199
--
It looks like MPI_INIT failed for some reason; your parallel
process is
likely to abort.  There are many reasons that a parallel
process can
fail during MPI_INIT; some of which are due to configuration or
environment
problems.  This failure appears to be an internal failure;
here's some
additional information (which may only be relevant to an Open MPI
developer):

ompi_mpi_init: ompi_rte_init failed
  --> Returned "(null)" (-43) instead of "Success" (0)
--
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will
now abort,
***and potentially your MPI job)
[pcp-d-4:25078] Local abort before MPI_INIT completed completed
successfully, but am not able to aggregate error messages, and
not able to guarantee that all other processes were killed!
---
Primary job  terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been
aborted.
---
--
mpirun detected that one or more processes exited with non-zero
status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[25599,1],1]
  Exit code:  1
--

-Paul

-- 
Paul H. Hargrove phhargr...@lbl.gov


Computer Languages & Systems Software (CLaSS) Group
Computer Science Department Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900

___
devel mailing list
devel@lists.open-mpi.org 
https://rfd.newmexicoconsortium.org/mailman/listinfo/devel


___
devel mailing list
devel@lists.open-mpi.org 
https://rfd.newmexicoconsortium.org/mailman/listinfo/devel






___
devel mailing list
devel@lists.open-mpi.org

Re: [OMPI devel] [2.0.1.rc1] Solaris MPIX failure

2016-08-23 Thread r...@open-mpi.org
Looks like Solaris has a “getpeerucred” - can you take a look at it, Gilles?
We’d have to add that to our AC_CHECK_FUNCS and update the native sec component.

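For illustration only, a minimal sketch of what such a Solaris branch might look like, assuming a configure check along the lines of AC_CHECK_FUNCS([getpeerucred]) would define a HAVE_GETPEERUCRED macro; the macro name and the helper below are hypothetical, not the committed fix:

#if defined(HAVE_GETPEERUCRED)
#include <sys/types.h>
#include <ucred.h>

/* hypothetical helper: fetch the peer's effective uid/gid via getpeerucred(3C) */
static int native_get_peer_cred(int sd, uid_t *euid, gid_t *egid)
{
    ucred_t *uc = NULL;             /* getpeerucred() allocates the ucred_t */

    if (0 != getpeerucred(sd, &uc)) {
        return -1;                  /* would map to PMIX_ERR_INVALID_CRED */
    }
    *euid = ucred_geteuid(uc);      /* effective uid of the connecting peer */
    *egid = ucred_getegid(uc);      /* effective gid of the connecting peer */
    ucred_free(uc);
    return 0;
}
#endif /* HAVE_GETPEERUCRED */

On systems without getpeerucred() the existing struct-ucred and getpeereid() branches would still apply.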

> On Aug 23, 2016, at 6:32 AM, r...@open-mpi.org wrote:
> 
> I took a quick glance at this one, and the only way I can see to get that 
> error is from this block of code:
> 
> #if defined(HAVE_STRUCT_UCRED_UID)
> euid = ucred.uid;
> gid = ucred.gid;
> #else
> euid = ucred.cr_uid;
> gid = ucred.cr_gid;
> #endif
> 
> #elif defined(HAVE_GETPEEREID)
> pmix_output_verbose(2, pmix_globals.debug_output,
> "sec:native checking getpeereid for peer 
> credentials");
> if (0 != getpeereid(peer->sd, &euid, &gid)) {
> pmix_output_verbose(2, pmix_globals.debug_output,
> "sec: getsockopt getpeereid failed: %s",
> strerror (pmix_socket_errno));
> return PMIX_ERR_INVALID_CRED;
> }
> #else
> return PMIX_ERR_NOT_SUPPORTED;
> #endif
> 
> 
> I can only surmise, therefore, that Solaris doesn’t pass either of the two 
> #if define’d tests. Is there a Solaris alternative?
> 
> 
>> On Aug 23, 2016, at 5:55 AM, r...@open-mpi.org  
>> wrote:
>> 
>> Thanks Gilles!
>> 
>>> On Aug 23, 2016, at 3:42 AM, Gilles Gouaillardet wrote:
>>> 
>>> Thanks Paul,
>>> 
>>> at first glance, something is going wrong in the sec module under solaris.
>>> I will keep digging tomorrow 
>>> 
>>> Cheers,
>>> 
>>> Gilles
>>> 
>>> On Tuesday, August 23, 2016, Paul Hargrove wrote:
>>> On Solaris 11.3 on x86-64:
>>> 
>>> $ mpirun -mca btl sm,self,openib -np 2 -host pcp-d-3,pcp-d-4 
>>> examples/ring_c'
>>> [pcp-d-4:25075] PMIX ERROR: NOT-SUPPORTED in file 
>>> /shared/OMPI/openmpi-2.0.1rc1-solaris11-x86-ib-gcc/openmpi-2.0.1rc1/opal/mca/pmix/pmix112/pmix/src/server/pmix_server_listener.c
>>>  at line 529
>>> [pcp-d-4:25078] PMIX ERROR: UNREACHABLE in file 
>>> /shared/OMPI/openmpi-2.0.1rc1-solaris11-x86-ib-gcc/openmpi-2.0.1rc1/opal/mca/pmix/pmix112/pmix/src/client/pmix_client.c
>>>  at line 983
>>> [pcp-d-4:25078] PMIX ERROR: UNREACHABLE in file 
>>> /shared/OMPI/openmpi-2.0.1rc1-solaris11-x86-ib-gcc/openmpi-2.0.1rc1/opal/mca/pmix/pmix112/pmix/src/client/pmix_client.c
>>>  at line 199
>>> --
>>> It looks like MPI_INIT failed for some reason; your parallel process is
>>> likely to abort.  There are many reasons that a parallel process can
>>> fail during MPI_INIT; some of which are due to configuration or environment
>>> problems.  This failure appears to be an internal failure; here's some
>>> additional information (which may only be relevant to an Open MPI
>>> developer):
>>> 
>>>   ompi_mpi_init: ompi_rte_init failed
>>>   --> Returned "(null)" (-43) instead of "Success" (0)
>>> --
>>> *** An error occurred in MPI_Init
>>> *** on a NULL communicator
>>> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
>>> ***and potentially your MPI job)
>>> [pcp-d-4:25078] Local abort before MPI_INIT completed completed 
>>> successfully, but am not able to aggregate error messages, and not able to 
>>> guarantee that all other processes were killed!
>>> ---
>>> Primary job  terminated normally, but 1 process returned
>>> a non-zero exit code.. Per user-direction, the job has been aborted.
>>> ---
>>> --
>>> mpirun detected that one or more processes exited with non-zero status, 
>>> thus causing
>>> the job to be terminated. The first process to do so was:
>>> 
>>>   Process name: [[25599,1],1]
>>>   Exit code:1
>>> --
>>> 
>>> -Paul
>>> 
>>> -- 
>>> Paul H. Hargrove  phhargr...@lbl.gov 
>>> 
>>> Computer Languages & Systems Software (CLaSS) Group
>>> Computer Science Department   Tel: +1-510-495-2352
>>> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
>>> ___
>>> devel mailing list
>>> devel@lists.open-mpi.org 
>>> https://rfd.newmexicoconsortium.org/mailman/listinfo/devel 
>>> 
>> ___
>> devel mailing list
>> devel@lists.open-mpi.org 
>> https://rfd.newmexicoconsortium.org/mailman/listinfo/devel
> 

___
devel mailing list
devel@lists.open-mpi.org

Re: [OMPI devel] [2.0.1.rc1] Solaris MPIX failure

2016-08-23 Thread r...@open-mpi.org
I took a quick glance at this one, and the only way I can see to get that error 
is from this block of code:

#if defined(HAVE_STRUCT_UCRED_UID)
euid = ucred.uid;
gid = ucred.gid;
#else
euid = ucred.cr_uid;
gid = ucred.cr_gid;
#endif

#elif defined(HAVE_GETPEEREID)
pmix_output_verbose(2, pmix_globals.debug_output,
"sec:native checking getpeereid for peer credentials");
if (0 != getpeereid(peer->sd, &euid, &gid)) {
pmix_output_verbose(2, pmix_globals.debug_output,
"sec: getsockopt getpeereid failed: %s",
strerror (pmix_socket_errno));
return PMIX_ERR_INVALID_CRED;
}
#else
return PMIX_ERR_NOT_SUPPORTED;
#endif


I can only surmise, therefore, that Solaris doesn’t pass either of the two #if 
define’d tests. Is there a Solaris alternative?

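To make the fall-through concrete, here is a tiny standalone sketch (not PMIx code) that mirrors the branch order of the excerpt above; it assumes the outer guard, which is not shown in the excerpt, is a Linux-style SO_PEERCRED test, and that HAVE_GETPEEREID is only defined by configure on BSD-like systems:

#include <stdio.h>
#include <sys/types.h>
#include <sys/socket.h>

int main(void)
{
#if defined(SO_PEERCRED)
    /* Linux-style path: getsockopt(SO_PEERCRED) fills in a struct ucred */
    puts("would use SO_PEERCRED / struct ucred");
#elif defined(HAVE_GETPEEREID)
    /* BSD/macOS-style path: getpeereid() returns the peer's euid/egid */
    puts("would use getpeereid()");
#else
    /* neither mechanism detected: corresponds to PMIX_ERR_NOT_SUPPORTED */
    puts("no peer-credential mechanism available");
#endif
    return 0;
}

On a stock Solaris build neither macro ends up defined, so the last branch is taken, which matches the NOT-SUPPORTED error reported from pmix_server_listener.c.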

> On Aug 23, 2016, at 5:55 AM, r...@open-mpi.org wrote:
> 
> Thanks Gilles!
> 
>> On Aug 23, 2016, at 3:42 AM, Gilles Gouaillardet wrote:
>> 
>> Thanks Paul,
>> 
>> at first glance, something is going wrong in the sec module under solaris.
>> I will keep digging tomorrow 
>> 
>> Cheers,
>> 
>> Gilles
>> 
>> On Tuesday, August 23, 2016, Paul Hargrove wrote:
>> 
>> $ mpirun -mca btl sm,self,openib -np 2 -host pcp-d-3,pcp-d-4 examples/ring_c'
>> [pcp-d-4:25075] PMIX ERROR: NOT-SUPPORTED in file 
>> /shared/OMPI/openmpi-2.0.1rc1-solaris11-x86-ib-gcc/openmpi-2.0.1rc1/opal/mca/pmix/pmix112/pmix/src/server/pmix_server_listener.c
>>  at line 529
>> [pcp-d-4:25078] PMIX ERROR: UNREACHABLE in file 
>> /shared/OMPI/openmpi-2.0.1rc1-solaris11-x86-ib-gcc/openmpi-2.0.1rc1/opal/mca/pmix/pmix112/pmix/src/client/pmix_client.c
>>  at line 983
>> [pcp-d-4:25078] PMIX ERROR: UNREACHABLE in file 
>> /shared/OMPI/openmpi-2.0.1rc1-solaris11-x86-ib-gcc/openmpi-2.0.1rc1/opal/mca/pmix/pmix112/pmix/src/client/pmix_client.c
>>  at line 199
>> --
>> It looks like MPI_INIT failed for some reason; your parallel process is
>> likely to abort.  There are many reasons that a parallel process can
>> fail during MPI_INIT; some of which are due to configuration or environment
>> problems.  This failure appears to be an internal failure; here's some
>> additional information (which may only be relevant to an Open MPI
>> developer):
>> 
>>   ompi_mpi_init: ompi_rte_init failed
>>   --> Returned "(null)" (-43) instead of "Success" (0)
>> --
>> *** An error occurred in MPI_Init
>> *** on a NULL communicator
>> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
>> ***and potentially your MPI job)
>> [pcp-d-4:25078] Local abort before MPI_INIT completed completed 
>> successfully, but am not able to aggregate error messages, and not able to 
>> guarantee that all other processes were killed!
>> ---
>> Primary job  terminated normally, but 1 process returned
>> a non-zero exit code.. Per user-direction, the job has been aborted.
>> ---
>> --
>> mpirun detected that one or more processes exited with non-zero status, thus 
>> causing
>> the job to be terminated. The first process to do so was:
>> 
>>   Process name: [[25599,1],1]
>>   Exit code:1
>> --
>> 
>> -Paul
>> 
>> -- 
>> Paul H. Hargrove  phhargr...@lbl.gov 
>> 
>> Computer Languages & Systems Software (CLaSS) Group
>> Computer Science Department   Tel: +1-510-495-2352
>> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
>> ___
>> devel mailing list
>> devel@lists.open-mpi.org 
>> https://rfd.newmexicoconsortium.org/mailman/listinfo/devel
> 
> ___
> devel mailing list
> devel@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/devel

___
devel mailing list
devel@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/devel

Re: [OMPI devel] [2.0.1.rc1] Solaris MPIX failure

2016-08-23 Thread r...@open-mpi.org
Thanks Gilles!

> On Aug 23, 2016, at 3:42 AM, Gilles Gouaillardet 
>  wrote:
> 
> Thanks Paul,
> 
> at first glance, something is going wrong in the sec module under solaris.
> I will keep digging tomorrow 
> 
> Cheers,
> 
> Gilles
> 
> On Tuesday, August 23, 2016, Paul Hargrove wrote:
> On Solaris 11.3 on x86-64:
> 
> $ mpirun -mca btl sm,self,openib -np 2 -host pcp-d-3,pcp-d-4 examples/ring_c'
> [pcp-d-4:25075] PMIX ERROR: NOT-SUPPORTED in file 
> /shared/OMPI/openmpi-2.0.1rc1-solaris11-x86-ib-gcc/openmpi-2.0.1rc1/opal/mca/pmix/pmix112/pmix/src/server/pmix_server_listener.c
>  at line 529
> [pcp-d-4:25078] PMIX ERROR: UNREACHABLE in file 
> /shared/OMPI/openmpi-2.0.1rc1-solaris11-x86-ib-gcc/openmpi-2.0.1rc1/opal/mca/pmix/pmix112/pmix/src/client/pmix_client.c
>  at line 983
> [pcp-d-4:25078] PMIX ERROR: UNREACHABLE in file 
> /shared/OMPI/openmpi-2.0.1rc1-solaris11-x86-ib-gcc/openmpi-2.0.1rc1/opal/mca/pmix/pmix112/pmix/src/client/pmix_client.c
>  at line 199
> --
> It looks like MPI_INIT failed for some reason; your parallel process is
> likely to abort.  There are many reasons that a parallel process can
> fail during MPI_INIT; some of which are due to configuration or environment
> problems.  This failure appears to be an internal failure; here's some
> additional information (which may only be relevant to an Open MPI
> developer):
> 
>   ompi_mpi_init: ompi_rte_init failed
>   --> Returned "(null)" (-43) instead of "Success" (0)
> --
> *** An error occurred in MPI_Init
> *** on a NULL communicator
> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
> ***and potentially your MPI job)
> [pcp-d-4:25078] Local abort before MPI_INIT completed completed successfully, 
> but am not able to aggregate error messages, and not able to guarantee that 
> all other processes were killed!
> ---
> Primary job  terminated normally, but 1 process returned
> a non-zero exit code.. Per user-direction, the job has been aborted.
> ---
> --
> mpirun detected that one or more processes exited with non-zero status, thus 
> causing
> the job to be terminated. The first process to do so was:
> 
>   Process name: [[25599,1],1]
>   Exit code:1
> --
> 
> -Paul
> 
> -- 
> Paul H. Hargrove  phhargr...@lbl.gov 
> 
> Computer Languages & Systems Software (CLaSS) Group
> Computer Science Department   Tel: +1-510-495-2352
> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
> ___
> devel mailing list
> devel@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/devel

___
devel mailing list
devel@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/devel

Re: [OMPI devel] [2.0.1.rc1] Solaris MPIX failure

2016-08-23 Thread Gilles Gouaillardet
Thanks Paul,

at first glance, something is going wrong in the sec module under solaris.
I will keep digging tomorrow

Cheers,

Gilles

On Tuesday, August 23, 2016, Paul Hargrove  wrote:

> On Solaris 11.3 on x86-64:
>
> $ mpirun -mca btl sm,self,openib -np 2 -host pcp-d-3,pcp-d-4
> examples/ring_c'
> [pcp-d-4:25075] PMIX ERROR: NOT-SUPPORTED in file
> /shared/OMPI/openmpi-2.0.1rc1-solaris11-x86-ib-gcc/openmpi-
> 2.0.1rc1/opal/mca/pmix/pmix112/pmix/src/server/pmix_server_listener.c at
> line 529
> [pcp-d-4:25078] PMIX ERROR: UNREACHABLE in file
> /shared/OMPI/openmpi-2.0.1rc1-solaris11-x86-ib-gcc/openmpi-
> 2.0.1rc1/opal/mca/pmix/pmix112/pmix/src/client/pmix_client.c at line 983
> [pcp-d-4:25078] PMIX ERROR: UNREACHABLE in file
> /shared/OMPI/openmpi-2.0.1rc1-solaris11-x86-ib-gcc/openmpi-
> 2.0.1rc1/opal/mca/pmix/pmix112/pmix/src/client/pmix_client.c at line 199
> --
> It looks like MPI_INIT failed for some reason; your parallel process is
> likely to abort.  There are many reasons that a parallel process can
> fail during MPI_INIT; some of which are due to configuration or environment
> problems.  This failure appears to be an internal failure; here's some
> additional information (which may only be relevant to an Open MPI
> developer):
>
>   ompi_mpi_init: ompi_rte_init failed
>   --> Returned "(null)" (-43) instead of "Success" (0)
> --
> *** An error occurred in MPI_Init
> *** on a NULL communicator
> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
> ***and potentially your MPI job)
> [pcp-d-4:25078] Local abort before MPI_INIT completed completed
> successfully, but am not able to aggregate error messages, and not able to
> guarantee that all other processes were killed!
> ---
> Primary job  terminated normally, but 1 process returned
> a non-zero exit code.. Per user-direction, the job has been aborted.
> ---
> --
> mpirun detected that one or more processes exited with non-zero status,
> thus causing
> the job to be terminated. The first process to do so was:
>
>   Process name: [[25599,1],1]
>   Exit code:1
> --
>
> -Paul
>
> --
> Paul H. Hargrove  phhargr...@lbl.gov
> 
> Computer Languages & Systems Software (CLaSS) Group
> Computer Science Department   Tel: +1-510-495-2352
> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
>
___
devel mailing list
devel@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/devel