Re: [OMPI users] OpenMPI v4.0.0 signal 11 (Segmentation fault)

2019-02-28 Thread Adam LeBlanc
Hello all, Thank you all for the suggestions. Takahiro's suggestion has gotten me to a point where all of the tests will run, but as soon as it gets to the cleanup step IMB will segfault again. I opened an issue on IMB's GitHub, but I guess I am not going to be able to get much help from them. So I

2019-02-21 Thread Peter Kjellström
On Wed, 20 Feb 2019 10:46:10 -0500 Adam LeBlanc wrote: > Hello, > > When I do a run with OpenMPI v4.0.0 on Infiniband with this command: > mpirun --mca btl_openib_warn_no_device_params_found 0 --map-by node > --mca orte_base_help_aggregate 0 --mca btl openib,vader,self --mca > pml ob1 --mca

2019-02-20 Thread Llolsten Kaonga
OpenMPI v4.0.0 signal 11 (Segmentation fault) Hello Howard, Thanks for all of the help and suggestions; I will look into them. I also realized that my Ansible wasn't set up properly for handling tar files, so the nightly build didn't even install, but I will do it by hand and will give you an up

2019-02-20 Thread Kawashima, Takahiro
Hello Adam, IMB had a bug related to Reduce_scatter. https://github.com/intel/mpi-benchmarks/pull/11 I'm not sure this bug is the cause, but you can try the patch. https://github.com/intel/mpi-benchmarks/commit/841446d8cf4ca1f607c0f24b9a424ee39ee1f569 Thanks, Takahiro Kawashima, Fujitsu

2019-02-20 Thread George Bosilca
I was not able to reproduce the issue with openib on 4.0; instead, I get random segfaults in MPI_Finalize during the grdma cleanup. I could, however, reproduce the TCP timeout part with both 4.0 and master, on a pretty sane cluster (only 3 interfaces: lo, eth0 and virbr0). Unsurprisingly, the

2019-02-20 Thread Adam LeBlanc
Hello Howard, Thanks for all of the help and suggestions; I will look into them. I also realized that my Ansible wasn't set up properly for handling tar files, so the nightly build didn't even install, but I will do it by hand and will give you an update tomorrow sometime in the afternoon. Thanks,

2019-02-20 Thread Howard Pritchard
Hello Adam, This helps some. Could you post the first 20 lines of your config.log? This will help in trying to reproduce. The content of your host file (you can use generic names for the nodes if that's an issue to publicize) would also help, as the number of nodes and number of MPI processes/node
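
A minimal sketch of gathering what Howard asks for, assuming an autotools-generated config.log (autoconf records the package version and the ./configure command line near its top) and a hostfile with the host name as the first field on each line, e.g. "pandora slots=3". The function names config_head and anonymize_hostfile are hypothetical helpers, not part of Open MPI; the awk rewrite replaces each host name with node01, node02, and so on, in line with Howard's note that generic names are fine.

```bash
#!/usr/bin/env bash
# Sketch only: these helper names are hypothetical, not Open MPI tools.

# Print the first 20 lines of config.log, where autoconf records the
# package version and the ./configure invocation.
config_head() {
  head -n 20 "$1"
}

# Print a hostfile with real host names replaced by node01, node02, ...
# Assumes the host name is the first whitespace-separated field;
# blank lines and '#' comment lines are printed unchanged.
anonymize_hostfile() {
  awk '
    NF == 0 || $1 ~ /^#/ { print; next }
    { n++; $1 = sprintf("node%02d", n); print }
  ' "$1"
}
```

Usage: `config_head config.log` and `anonymize_hostfile /path/to/hostfile`. Assigning to `$1` makes awk rebuild the line with single spaces, so any column alignment in the original hostfile is not preserved.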

2019-02-20 Thread Adam LeBlanc
On the TCP side it doesn't segfault anymore but times out on some tests; on the openib side it still segfaults. Here is the output:
[pandora:19256] *** Process received signal ***
[pandora:19256] Signal: Segmentation fault (11)
[pandora:19256] Signal code: Address not mapped (1)

2019-02-20 Thread Jeff Squyres (jsquyres) via users
Can you try the latest 4.0.x nightly snapshot and see if the problem still occurs? https://www.open-mpi.org/nightly/v4.0.x/ > On Feb 20, 2019, at 1:40 PM, Adam LeBlanc wrote: > > I do here is the output: > > 2 total processes killed (some possibly by mpirun during cleanup) >

2019-02-20 Thread Adam LeBlanc
I do; here is the output:
2 total processes killed (some possibly by mpirun during cleanup)
[pandora:12238] *** Process received signal ***
[pandora:12238] Signal: Segmentation fault (11)
[pandora:12238] Signal code: Invalid permissions (2)
[pandora:12238] Failing at address: 0x7f5c8e31fff0
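
The openib crashes posted in this thread report different signal codes. Open MPI's handler prints the kernel's si_code: "Address not mapped (1)" corresponds to SEGV_MAPERR and "Invalid permissions (2)" to SEGV_ACCERR. When comparing several runs, a small awk filter can reduce each crash to one line per process. This is a sketch: summarize_segv is a hypothetical helper, and it assumes the saved log keeps one message per line in the "[host:pid] Signal code: ..." and "[host:pid] Failing at address: ..." form seen in this output.

```bash
#!/usr/bin/env bash
# Sketch only: summarize_segv is a hypothetical helper.
# Prints "[host:pid] <signal code> <failing address>" for each process
# whose crash report appears in the given log file.
summarize_segv() {
  awk '
    /Signal code:/ {
      c = $0
      sub(/^.*Signal code: /, "", c)   # keep e.g. "Invalid permissions (2)"
      code[$1] = c                     # key by "[host:pid]" so interleaved
    }                                  # reports from several ranks still pair up
    /Failing at address:/ {
      print $1, code[$1], $NF          # $NF is the address, e.g. 0x7f...
    }
  ' "$1"
}
```

Usage: `summarize_segv run.log`, where run.log is the captured mpirun output.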

2019-02-20 Thread Howard Pritchard
Hi Adam, As a sanity check, if you try to use --mca btl self,vader,tcp, do you still see the segmentation fault? Howard On Wed, Feb 20, 2019 at 08:50, Adam LeBlanc <alebl...@iol.unh.edu> wrote: > Hello, > > When I do a run with OpenMPI v4.0.0 on Infiniband with this command: > mpirun
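
Howard's check can be scripted so that the BTL list is the only thing that differs between two runs. This is a sketch under stated assumptions: HOSTFILE, BENCH, NP and DRY_RUN are hypothetical variables, and ./IMB-MPI1 is only a placeholder for whichever IMB binary was actually launched (Adam's command line is cut off after the hostfile). The two btl_openib_* parameters from Adam's command are passed only to the openib run, since they apply only to that BTL. Setting DRY_RUN prints the command instead of running it.

```bash
#!/usr/bin/env bash
# Sketch only: HOSTFILE, BENCH, NP and DRY_RUN are hypothetical knobs.
HOSTFILE=${HOSTFILE:-./hosts}   # placeholder hostfile path
BENCH=${BENCH:-./IMB-MPI1}      # placeholder benchmark binary
NP=${NP:-6}

# Flags shared by both runs (taken from Adam's command):
#   --map-by node                     place ranks round-robin across nodes
#   --mca orte_base_help_aggregate 0  print every help message, not one summary
#   --mca pml ob1                     use the ob1 PML, which sends over the BTLs
run_with_btl() {
  local btl=$1; shift
  local cmd=(mpirun --map-by node
             --mca orte_base_help_aggregate 0
             --mca pml ob1
             --mca btl "$btl"
             "$@"
             -np "$NP" -hostfile "$HOSTFILE" "$BENCH")
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "${cmd[*]}"   # show the command without running it
  else
    "${cmd[@]}"
  fi
}

# openib run: btl_openib_allow_ib 1 lets the openib BTL use InfiniBand
# ports in v4.0; the other flag silences the missing-device-params warning.
run_openib() {
  run_with_btl openib,vader,self \
    --mca btl_openib_warn_no_device_params_found 0 \
    --mca btl_openib_allow_ib 1
}

# TCP run, as suggested in the thread.
run_tcp() {
  run_with_btl self,vader,tcp
}
```

Usage: `DRY_RUN=1 run_tcp` to inspect the command, then `run_openib` and `run_tcp` for the real comparison. In this thread, the TCP run no longer segfaulted but timed out on some tests, while the openib run still crashed.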

[OMPI users] OpenMPI v4.0.0 signal 11 (Segmentation fault)

2019-02-20 Thread Adam LeBlanc
Hello, When I do a run with OpenMPI v4.0.0 on Infiniband with this command: mpirun --mca btl_openib_warn_no_device_params_found 0 --map-by node --mca orte_base_help_aggregate 0 --mca btl openib,vader,self --mca pml ob1 --mca btl_openib_allow_ib 1 -np 6 -hostfile /home/aleblanc/ib-mpi-hosts