Re: [OMPI users] Segfault with OpenMPI 4 and dynamic window
Thanks Bart, I opened https://github.com/open-mpi/ompi/issues/6394 to track this issue, and we should follow up there from now on. FWIW, I added a more minimal example and a possible fix.

Cheers,

Gilles

On 2/18/2019 12:43 AM, Bart Janssens wrote:
> I just tried on master (commit 91d05f91e28d3614d8b5da707df2505d8564ecd3); the same crash still happens there.
>
> On 16 Feb 2019, 17:15 +0100, Open MPI Users wrote:
>> Probably not. I think this is now fixed. Might be worth trying master to verify.
Re: [OMPI users] HDF5 1.10.4 "make check" problems w/OpenMPI 3.1.3
I will also run our test suite and the HDF5 test suite on GPFS; I recently got access to a GPFS file system and will report back on that, but it will take a few days.

Thanks
Edgar

> On Sunday, February 17, 2019 2:37 AM, Ryan Novosielski wrote:
>
> This is on GPFS. I'll try it on XFS to see if it makes any difference.
>
> On 2/16/19 11:57 PM, Gilles Gouaillardet wrote:
> > Ryan,
> >
> > What filesystem are you running on?
> >
> > Open MPI defaults to the ompio component, except on Lustre filesystems,
> > where ROMIO is used. If the issue is related to ROMIO, that could explain
> > why you did not see any difference; in that case, you might want to try
> > another filesystem (a local filesystem or NFS, for example).
> >
> > Cheers,
> >
> > Gilles
> >
> > On Sun, Feb 17, 2019 at 3:08 AM Ryan Novosielski wrote:
> >>
> >> I verified that it makes it through to a bash prompt, but I’m a little
> >> less confident that something "make test" does doesn’t clear it. Any
> >> recommendation for a way to verify?
> >>
> >> In any case, no change, unfortunately.
> >>
> >>> On Feb 16, 2019, at 08:13, Gabriel, Edgar wrote:
> >>>
> >>> What file system are you running on?
> >>>
> >>> I will look into this, but it might be later next week. I just wanted
> >>> to emphasize that we regularly run the parallel HDF5 tests with ompio,
> >>> and I am not aware of any outstanding items that do not work (and are
> >>> supposed to work). That being said, I run the tests manually, not the
> >>> "make test" commands; I will have to check which tests are run by that.
> >>>
> >>> Edgar
> >>>
> >>>> On Saturday, February 16, 2019 1:49 AM, Gilles Gouaillardet wrote:
> >>>>
> >>>> Ryan,
> >>>>
> >>>> Can you
> >>>>
> >>>> export OMPI_MCA_io=^ompio
> >>>>
> >>>> and try again after you have made sure this environment variable is
> >>>> passed by srun to the MPI tasks?
> >>>>
> >>>> We have identified and fixed several issues specific to the (default)
> >>>> ompio component, so that could be a valid workaround until the next
> >>>> release.
> >>>>
> >>>> Cheers,
> >>>>
> >>>> Gilles
> >>>>
> >>>> Ryan Novosielski wrote:
> >>>>> Hi there,
> >>>>>
> >>>>> I honestly don’t know which piece of this puzzle to look at or how to
> >>>>> get more information for troubleshooting. I successfully built HDF5
> >>>>> 1.10.4 with the RHEL system GCC 4.8.5 and Open MPI 3.1.3. Running
> >>>>> "make check" in HDF5 fails at the point below; I am using
> >>>>> RUNPARALLEL='srun --mpi=pmi2 -p main -t 1:00:00 -n6 -N1' and have a
> >>>>> Slurm installation that is otherwise properly configured.
> >>>>>
> >>>>> Thanks for any help you can provide.
> >>>>>
> >>>>> make[4]: Entering directory `/scratch/novosirj/install-files/hdf5-1.10.4-build-gcc-4.8-openmpi-3.1.3/testpar'
> >>>>> Testing t_mpi
> >>>>> t_mpi Test Log
> >>>>> srun: job 84126610 queued and waiting for resources
> >>>>> srun: job 84126610 has been allocated resources
> >>>>> srun: error: slepner023: tasks 0-5: Alarm clock
> >>>>> 0.01user 0.00system 20:03.95elapsed 0%CPU (0avgtext+0avgdata 5152maxresident)k
> >>>>> 0inputs+0outputs (0major+1529minor)pagefaults 0swaps
> >>>>> make[4]: *** [t_mpi.chkexe_] Error 1
> >>>>> make[4]: Leaving directory `/scratch/novosirj/install-files/hdf5-1.10.4-build-gcc-4.8-openmpi-3.1.3/testpar'
> >>>>> make[3]: *** [build-check-p] Error 1
> >>>>> make[3]: Leaving directory `/scratch/novosirj/install-files/hdf5-1.10.4-build-gcc-4.8-openmpi-3.1.3/testpar'
> >>>>> make[2]: *** [test] Error 2
> >>>>> make[2]: Leaving directory `/scratch/novosirj/install-files/hdf5-1.10.4-build-gcc-4.8-openmpi-3.1.3/testpar'
> >>>>> make[1]: *** [check-am] Error 2
> >>>>> make[1]: Leaving directory `/scratch/novosirj/install-files/hdf5-1.10.4-build-gcc-4.8-openmpi-3.1.3/testpar'
> >>>>> make: *** [check-recursive] Error 1
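Gilles's suggested workaround, plus a way to answer Ryan's question about verifying that the variable actually reaches the MPI tasks, can be sketched as the shell steps below. The partition name and the `srun` options are illustrative placeholders taken from the thread, not prescriptions.

```shell
#!/bin/sh
# Sketch of the ompio workaround discussed in the thread.

# 1. Check the filesystem type of the test directory; Open MPI selects
#    ROMIO on Lustre and defaults to ompio elsewhere (e.g. GPFS, XFS).
stat -f -c 'filesystem type: %T' .

# 2. Disable the ompio component for this shell and its children
#    ("^ompio" means "everything except ompio").
export OMPI_MCA_io=^ompio

# 3. Before re-running "make check", confirm srun propagates the variable
#    to the tasks (srun exports the submission environment by default):
#      srun -p main -n1 env | grep OMPI_MCA_io
#    Expected output: OMPI_MCA_io=^ompio
echo "OMPI_MCA_io=$OMPI_MCA_io"
```

The equivalent one-off form, without touching the environment, would be `mpirun --mca io ^ompio ...`; under `srun` the environment-variable form is the practical route.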