This is what I did for my build — not much going on there:

    ../openmpi-3.1.3/configure --prefix=/opt/sw/packages/gcc-4_8/openmpi/3.1.3 --with-pmi && \
    make -j32
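In case it is useful, this is roughly how I sanity-check what a given build actually picked up; just a sketch, and the grep patterns are only what I would expect a stock 3.1.3 install to report:

    /opt/sw/packages/gcc-4_8/openmpi/3.1.3/bin/ompi_info | grep -i pmi     # confirm PMI/PMIx support made it in
    /opt/sw/packages/gcc-4_8/openmpi/3.1.3/bin/ompi_info | grep "MCA btl"  # list the byte transfer layers (openib, tcp, self, ...)
    /opt/sw/packages/gcc-4_8/openmpi/3.1.3/bin/ompi_info | grep "MCA io"   # list the MPI-IO components (ompio and the bundled ROMIO)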
We have a mixture of types of Infiniband, using the RHEL-supplied Infiniband packages.

--
 ____
|| \\UTGERS,     |---------------------------*O*---------------------------
||_// the State  |         Ryan Novosielski - novos...@rutgers.edu
|| \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
||  \\    of NJ  | Office of Advanced Research Computing - MSB C630, Newark
     `'

> On Feb 20, 2019, at 1:46 PM, Gabriel, Edgar <egabr...@central.uh.edu> wrote:
>
> Well, the way you describe it, it sounds to me like maybe an atomic issue with this compiler version. What was your configure line for Open MPI, and what network interconnect are you using?
>
> An easy way to test this theory would be to force Open MPI to use the tcp interfaces (everything will be slow, however). You can do that by creating a directory called .openmpi in your home directory, and adding there a file called mca-params.conf.
>
> The file should look something like this:
>
> btl = tcp,self
>
> Thanks
> Edgar
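If I read that right, the whole workaround on my end would just be the following (a sketch of what I plan to set up; the btl value is taken verbatim from the suggestion above):

    mkdir -p ~/.openmpi
    printf 'btl = tcp,self\n' > ~/.openmpi/mca-params.conf

and then rerunning the same "make check".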
>> -----Original Message-----
>> From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Ryan Novosielski
>> Sent: Wednesday, February 20, 2019 12:02 PM
>> To: Open MPI Users <users@lists.open-mpi.org>
>> Subject: Re: [OMPI users] HDF5 1.10.4 "make check" problems w/OpenMPI 3.1.3
>>
>> Does it make any sense that it seems to work fine when OpenMPI and HDF5 are built with GCC 7.4 and GCC 8.2, but /not/ when they are built with the RHEL-supplied GCC 4.8.5? That appears to be the scenario. For the GCC 4.8.5 build, I did try an XFS filesystem and it didn't help. GPFS works fine for either of the 7.4 and 8.2 builds.
>>
>> Just as a reminder, since it was reasonably far back in the thread, what I'm doing is running the "make check" tests in HDF5 1.10.4, in part because users use it, but also because it seems to have a good test suite and I can therefore verify the compiler and MPI stack installs. I get very little information, apart from it not working and getting that "Alarm clock" message.
>>
>> I originally suspected I'd somehow built some component of this with a host-specific optimization that wasn't working on some compute nodes. But I controlled for that and it didn't seem to make any difference.
>>
>>> On Feb 18, 2019, at 1:34 PM, Ryan Novosielski <novos...@rutgers.edu> wrote:
>>>
>>> It didn't work any better with XFS, as it happens. Must be something else. I'm going to test some more and see if I can narrow it down any, as it seems to me that it did work with a different compiler.
>>>
>>>> On Feb 18, 2019, at 12:23 PM, Gabriel, Edgar <egabr...@central.uh.edu> wrote:
>>>>
>>>> While I was working on something else, I let the tests run with Open MPI master (which is, for parallel I/O, equivalent to the upcoming v4.0.1 release), and here is what I found for the HDF5 1.10.4 tests on my local desktop:
>>>>
>>>> In the testpar directory, there is in fact one test that fails for both ompio and romio321, in exactly the same manner.
>>>> I used 6 processes as you did (although I used mpirun directly instead of srun...). Of the 13 tests in the testpar directory, 12 pass correctly (t_bigio, t_cache, t_cache_image, testphdf5, t_filters_parallel, t_init_term, t_mpi, t_pflush2, t_pread, t_prestart, t_pshutdown, t_shapesame).
>>>>
>>>> The one test that officially fails (t_pflush1) actually reports that it passed, but then throws a message indicating that MPI_Abort has been called, for both ompio and romio. I will try to investigate this test to see what is going on.
>>>>
>>>> That being said, your report shows an issue in t_mpi, which passes without problems for me. This was, however, not on GPFS; this was an XFS local file system. Running the tests on GPFS is on my todo list as well.
>>>>
>>>> Thanks
>>>> Edgar
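For whatever it is worth, running the individual test binaries by hand, outside of "make check", seems like the quickest way for me to compare against your results; roughly like this, reusing the srun line I hand to HDF5 via RUNPARALLEL and the testpar directory from the log further down (so treat the exact path as illustrative):

    cd /scratch/novosirj/install-files/hdf5-1.10.4-build-gcc-4.8-openmpi-3.1.3/testpar
    srun --mpi=pmi2 -p main -t 1:00:00 -n6 -N1 ./t_mpi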
>>>>> -----Original Message-----
>>>>> From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Gabriel, Edgar
>>>>> Sent: Sunday, February 17, 2019 10:34 AM
>>>>> To: Open MPI Users <users@lists.open-mpi.org>
>>>>> Subject: Re: [OMPI users] HDF5 1.10.4 "make check" problems w/OpenMPI 3.1.3
>>>>>
>>>>> I will also run our test suite and the HDF5 test suite on GPFS; I recently got access to a GPFS file system and will report back on that, but it will take a few days.
>>>>>
>>>>> Thanks
>>>>> Edgar
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Ryan Novosielski
>>>>>> Sent: Sunday, February 17, 2019 2:37 AM
>>>>>> To: users@lists.open-mpi.org
>>>>>> Subject: Re: [OMPI users] HDF5 1.10.4 "make check" problems w/OpenMPI 3.1.3
>>>>>>
>>>>>> This is on GPFS. I'll try it on XFS to see if it makes any difference.
>>>>>>
>>>>>> On 2/16/19 11:57 PM, Gilles Gouaillardet wrote:
>>>>>>> Ryan,
>>>>>>>
>>>>>>> What filesystem are you running on?
>>>>>>>
>>>>>>> Open MPI defaults to the ompio component, except on Lustre filesystems, where ROMIO is used. If the issue is related to ROMIO, that could explain why you did not see any difference; in that case, you might want to try another filesystem (a local filesystem or NFS, for example).
>>>>>>>
>>>>>>> Cheers,
>>>>>>>
>>>>>>> Gilles
>>>>>>>
>>>>>>> On Sun, Feb 17, 2019 at 3:08 AM Ryan Novosielski <novos...@rutgers.edu> wrote:
>>>>>>>>
>>>>>>>> I verified that it makes it through to a bash prompt, but I'm a little less confident that something "make test" does doesn't clear it. Any recommendation for a way to verify?
>>>>>>>>
>>>>>>>> In any case, no change, unfortunately.
>>>>>>>>
>>>>>>>> Sent from my iPhone
>>>>>>>>
>>>>>>>>> On Feb 16, 2019, at 08:13, Gabriel, Edgar <egabr...@central.uh.edu> wrote:
>>>>>>>>>
>>>>>>>>> What file system are you running on?
>>>>>>>>>
>>>>>>>>> I will look into this, but it might be later next week. I just wanted to emphasize that we are regularly running the parallel hdf5 tests with ompio, and I am not aware of any outstanding items that do not work (and are supposed to work). That being said, I run the tests manually, and not the 'make test' commands. I will have to check which tests are being run by that.
>>>>>>>>>
>>>>>>>>> Edgar
>>>>>>>>>
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Gilles Gouaillardet
>>>>>>>>>> Sent: Saturday, February 16, 2019 1:49 AM
>>>>>>>>>> To: Open MPI Users <users@lists.open-mpi.org>
>>>>>>>>>> Subject: Re: [OMPI users] HDF5 1.10.4 "make check" problems w/OpenMPI 3.1.3
>>>>>>>>>>
>>>>>>>>>> Ryan,
>>>>>>>>>>
>>>>>>>>>> Can you
>>>>>>>>>>
>>>>>>>>>> export OMPI_MCA_io=^ompio
>>>>>>>>>>
>>>>>>>>>> and try again after you make sure this environment variable is passed by srun to the MPI tasks?
>>>>>>>>>>
>>>>>>>>>> We have identified and fixed several issues specific to the (default) ompio component, so that could be a valid workaround until the next release.
>>>>>>>>>>
>>>>>>>>>> Cheers,
>>>>>>>>>>
>>>>>>>>>> Gilles
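As for making sure the variable reaches the MPI tasks: the simplest check I know of is just to look at the remote environment. A rough sketch, assuming srun forwards the submitting shell's environment (which is the default behavior here):

    export OMPI_MCA_io=^ompio
    srun --mpi=pmi2 -p main -n1 env | grep OMPI_MCA_io

If that prints OMPI_MCA_io=^ompio from the compute node, the setting is getting through.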
>>>>>>>>>> Ryan Novosielski <novos...@rutgers.edu> wrote:
>>>>>>>>>>> Hi there,
>>>>>>>>>>>
>>>>>>>>>>> I honestly don't know which piece of this puzzle to look at or how to get more information for troubleshooting. I successfully built HDF5 1.10.4 with the RHEL system GCC 4.8.5 and OpenMPI 3.1.3. Running the "make check" in HDF5 is failing at the point below; I am using a value of RUNPARALLEL='srun --mpi=pmi2 -p main -t 1:00:00 -n6 -N1' and have a SLURM setup that's otherwise properly configured.
>>>>>>>>>>>
>>>>>>>>>>> Thanks for any help you can provide.
>>>>>>>>>>>
>>>>>>>>>>> make[4]: Entering directory `/scratch/novosirj/install-files/hdf5-1.10.4-build-gcc-4.8-openmpi-3.1.3/testpar'
>>>>>>>>>>> ============================
>>>>>>>>>>> Testing  t_mpi
>>>>>>>>>>> ============================
>>>>>>>>>>>  t_mpi  Test Log
>>>>>>>>>>> ============================
>>>>>>>>>>> srun: job 84126610 queued and waiting for resources
>>>>>>>>>>> srun: job 84126610 has been allocated resources
>>>>>>>>>>> srun: error: slepner023: tasks 0-5: Alarm clock
>>>>>>>>>>> 0.01user 0.00system 20:03.95elapsed 0%CPU (0avgtext+0avgdata 5152maxresident)k
>>>>>>>>>>> 0inputs+0outputs (0major+1529minor)pagefaults 0swaps
>>>>>>>>>>> make[4]: *** [t_mpi.chkexe_] Error 1
>>>>>>>>>>> make[4]: Leaving directory `/scratch/novosirj/install-files/hdf5-1.10.4-build-gcc-4.8-openmpi-3.1.3/testpar'
>>>>>>>>>>> make[3]: *** [build-check-p] Error 1
>>>>>>>>>>> make[3]: Leaving directory `/scratch/novosirj/install-files/hdf5-1.10.4-build-gcc-4.8-openmpi-3.1.3/testpar'
>>>>>>>>>>> make[2]: *** [test] Error 2
>>>>>>>>>>> make[2]: Leaving directory `/scratch/novosirj/install-files/hdf5-1.10.4-build-gcc-4.8-openmpi-3.1.3/testpar'
>>>>>>>>>>> make[1]: *** [check-am] Error 2
>>>>>>>>>>> make[1]: Leaving directory `/scratch/novosirj/install-files/hdf5-1.10.4-build-gcc-4.8-openmpi-3.1.3/testpar'
>>>>>>>>>>> make: *** [check-recursive] Error 1