Re: [OMPI devel] Help needed debugging openmpi 3.1 builds for Fedora

2018-12-17 Thread Jeff Squyres (jsquyres) via devel
Orion --

Thanks for the bug report; I filed it here:

https://github.com/open-mpi/ompi/issues/6200


> On Dec 16, 2018, at 9:55 PM, Orion Poplawski wrote:
> 
> On 12/15/18 1:13 PM, devel@lists.open-mpi.org wrote:
>> I'm testing out rebuilding Fedora packages with openmpi 3.1 in Fedora COPR:
>> https://copr.fedorainfracloud.org/coprs/g/scitech/openmpi3.1/builds/
>> A number of packages are failing running tests only on Fedora Rawhide x86_64 
>> with processes killed with signal 4 (Illegal instruction).  For example:
>> + PYTHONPATH=/builddir/build/BUILDROOT/mpi4py-3.0.0-6.git39ca78422646.fc30.x86_64/usr/lib64/python2.7/site-packages/openmpi
>> + mpiexec -n 1 python2 test/runtests.py -v --no-builddir --thread-level=serialized -e spawn
>> BUILDSTDERR:
>> --
>> BUILDSTDERR: Primary job  terminated normally, but 1 process returned
>> BUILDSTDERR: a non-zero exit code. Per user-direction, the job has been aborted.
>> BUILDSTDERR:
>> --
>> BUILDSTDERR:
>> --
>> BUILDSTDERR: mpiexec noticed that process rank 0 with PID 0 on node 656ae442c6bf45fe9b45c5481f41bc45 exited on signal 4 (Illegal instruction).
>> BUILDSTDERR:
>> --
>> Unfortunately I have been unable to reproduce this in any local mock builds. 
>>  So I'm left wondering if this is some kind of peculiarity with the COPR 
>> builders or if there is a real problem with openmpi.  Any suggestions for 
>> how to further debug this would be greatly appreciated.
>> (PID 0 seems very odd - that seems to be the same in the different failures)
>> - Orion
> 
> I believe I have tracked this down to the libpsm2 library.  If anyone here is 
> interested, I've filed a bug here:
> https://bugzilla.redhat.com/show_bug.cgi?id=1659852
> 
> -- 
> Orion Poplawski
> Manager of NWRA Technical Systems  720-772-5637
> NWRA, Boulder/CoRA Office FAX: 303-415-9702
> 3380 Mitchell Lane   or...@nwra.com
> Boulder, CO 80301 https://www.nwra.com/
> ___
> devel mailing list
> devel@lists.open-mpi.org
> https://lists.open-mpi.org/mailman/listinfo/devel


-- 
Jeff Squyres
jsquy...@cisco.com



Re: [OMPI devel] Help needed debugging openmpi 3.1 builds for Fedora

2018-12-16 Thread Orion Poplawski

On 12/15/18 1:13 PM, devel@lists.open-mpi.org wrote:
> I'm testing out rebuilding Fedora packages with openmpi 3.1 in Fedora COPR:
>
> https://copr.fedorainfracloud.org/coprs/g/scitech/openmpi3.1/builds/
>
> A number of packages are failing running tests only on Fedora Rawhide
> x86_64 with processes killed with signal 4 (Illegal instruction).  For
> example:
>
> + PYTHONPATH=/builddir/build/BUILDROOT/mpi4py-3.0.0-6.git39ca78422646.fc30.x86_64/usr/lib64/python2.7/site-packages/openmpi
> + mpiexec -n 1 python2 test/runtests.py -v --no-builddir --thread-level=serialized -e spawn
> BUILDSTDERR:
> --
> BUILDSTDERR: Primary job  terminated normally, but 1 process returned
> BUILDSTDERR: a non-zero exit code. Per user-direction, the job has been aborted.
> BUILDSTDERR:
> --
> BUILDSTDERR:
> --
> BUILDSTDERR: mpiexec noticed that process rank 0 with PID 0 on node 656ae442c6bf45fe9b45c5481f41bc45 exited on signal 4 (Illegal instruction).
> BUILDSTDERR:
> --
>
> Unfortunately I have been unable to reproduce this in any local mock
> builds.  So I'm left wondering if this is some kind of peculiarity with
> the COPR builders or if there is a real problem with openmpi.  Any
> suggestions for how to further debug this would be greatly appreciated.
>
> (PID 0 seems very odd - that seems to be the same in the different
> failures)
>
> - Orion


I believe I have tracked this down to the libpsm2 library.  If anyone 
here is interested, I've filed a bug here:

https://bugzilla.redhat.com/show_bug.cgi?id=1659852

--
Orion Poplawski
Manager of NWRA Technical Systems  720-772-5637
NWRA, Boulder/CoRA Office FAX: 303-415-9702
3380 Mitchell Lane   or...@nwra.com
Boulder, CO 80301 https://www.nwra.com/

[OMPI devel] Help needed debugging openmpi 3.1 builds for Fedora

2018-12-15 Thread Orion Poplawski

I'm testing out rebuilding Fedora packages with openmpi 3.1 in Fedora COPR:

https://copr.fedorainfracloud.org/coprs/g/scitech/openmpi3.1/builds/

A number of packages are failing running tests only on Fedora Rawhide 
x86_64 with processes killed with signal 4 (Illegal instruction).  For 
example:


+ PYTHONPATH=/builddir/build/BUILDROOT/mpi4py-3.0.0-6.git39ca78422646.fc30.x86_64/usr/lib64/python2.7/site-packages/openmpi
+ mpiexec -n 1 python2 test/runtests.py -v --no-builddir --thread-level=serialized -e spawn
BUILDSTDERR:
--
BUILDSTDERR: Primary job  terminated normally, but 1 process returned
BUILDSTDERR: a non-zero exit code. Per user-direction, the job has been aborted.
BUILDSTDERR:
--
BUILDSTDERR:
--
BUILDSTDERR: mpiexec noticed that process rank 0 with PID 0 on node 656ae442c6bf45fe9b45c5481f41bc45 exited on signal 4 (Illegal instruction).
BUILDSTDERR:
--


Unfortunately I have been unable to reproduce this in any local mock 
builds.  So I'm left wondering if this is some kind of peculiarity with 
the COPR builders or if there is a real problem with openmpi.  Any 
suggestions for how to further debug this would be greatly appreciated.


(PID 0 seems very odd - that seems to be the same in the different failures)

- Orion

--
Orion Poplawski
Manager of NWRA Technical Systems  720-772-5637
NWRA, Boulder/CoRA Office FAX: 303-415-9702
3380 Mitchell Lane   or...@nwra.com
Boulder, CO 80301 https://www.nwra.com/
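
One possible way to approach the "how to further debug this" question above is to check whether the failing builder's CPU lacks instructions that a local machine supports, since signal 4 (SIGILL) is often raised when a binary uses an instruction the CPU does not implement. The following is a minimal Python sketch, not something taken from the thread: the function names (describe_exit, parse_cpu_flags, flags_missing_on) are hypothetical, and the idea that a CPU feature mismatch explains these failures is only an assumption to test.

```python
import signal


def describe_exit(returncode):
    """Turn a subprocess return code into a readable status string."""
    if returncode < 0:
        # On POSIX, the subprocess module reports death-by-signal as -signum.
        return "killed by " + signal.Signals(-returncode).name
    return "exited with status %d" % returncode


def parse_cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags from /proc/cpuinfo-style text.

    Only the first "flags" line is used; this assumes all cores report
    the same feature set.
    """
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def flags_missing_on(builder_text, local_text):
    """List flags the local machine reports but the builder does not."""
    missing = parse_cpu_flags(local_text) - parse_cpu_flags(builder_text)
    return sorted(missing)
```

For instance, printing /proc/cpuinfo in a build on the failing builder and in a local mock chroot, then passing both texts to flags_missing_on, would list features (such as avx or avx2) that exist locally but not on the builder. describe_exit is meant for a test launched directly through subprocess; when the test runs under mpiexec, the signal shows up in mpiexec's own message, as in the log above.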