This is what I did for my build — not much going on there:

../openmpi-3.1.3/configure --prefix=/opt/sw/packages/gcc-4_8/openmpi/3.1.3 \
    --with-pmi && \
    make -j32
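
For what it's worth, a quick way to double-check what a build like that actually picked up is ompi_info from the resulting install. A minimal sketch (the path is just the prefix from the configure line above, and the grep pattern is only illustrative):

    /opt/sw/packages/gcc-4_8/openmpi/3.1.3/bin/ompi_info | grep -E "MCA (btl|io|pml)"
    # the "MCA btl" lines show which transports were built (e.g. openib, tcp, vader, self)
    # the "MCA io" lines show which MPI-IO components were built (ompio and a romio variant)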

We have a mixture of InfiniBand hardware types, using the RHEL-supplied 
InfiniBand packages.

--
____
|| \\UTGERS,     |---------------------------*O*---------------------------
||_// the State  |         Ryan Novosielski - novos...@rutgers.edu
|| \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
||  \\    of NJ  | Office of Advanced Research Computing - MSB C630, Newark
     `'

> On Feb 20, 2019, at 1:46 PM, Gabriel, Edgar <egabr...@central.uh.edu> wrote:
> 
> Well, the way you describe it, it sounds to me like maybe an atomics issue 
> with this compiler version. What was your Open MPI configure line, and 
> what network interconnect are you using?
> 
> An easy way to test this theory would be to force Open MPI to use the TCP 
> interfaces (everything will be slow, however). You can do that by creating a 
> directory called .openmpi in your home directory and adding a file called 
> mca-params.conf there.
> 
> The file should look something like this:
> 
> btl = tcp,self
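> 
> For example, from a shell (a minimal sketch; the heredoc is just one way to write the file):
> 
>     mkdir -p ~/.openmpi
>     cat > ~/.openmpi/mca-params.conf <<'EOF'
>     btl = tcp,self
>     EOF
> 
> The same selection can also be made for a single run with "mpirun --mca btl tcp,self ...", 
> or, since you launch with srun, by exporting OMPI_MCA_btl=tcp,self in the job environment.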
> 
> 
> 
> Thanks
> Edgar
> 
> 
> 
>> -----Original Message-----
>> From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Ryan Novosielski
>> Sent: Wednesday, February 20, 2019 12:02 PM
>> To: Open MPI Users <users@lists.open-mpi.org>
>> Subject: Re: [OMPI users] HDF5 1.10.4 "make check" problems w/OpenMPI 3.1.3
>> 
>> Does it make any sense that it seems to work fine when OpenMPI and HDF5
>> are built with GCC 7.4 and GCC 8.2, but /not/ when they are built with
>> RHEL-supplied GCC 4.8.5? That appears to be the scenario. For the GCC 4.8.5
>> build, I did try an XFS filesystem and it didn’t help. GPFS works fine for
>> either of the 7.4 and 8.2 builds.
>> 
>> Just as a reminder, since it was reasonably far back in the thread, what I’m
>> doing is running the “make check” tests in HDF5 1.10.4, in part because users
>> use it, but also because it seems to have a good test suite and I can
>> therefore verify the compiler and MPI stack installs. I get very little
>> information, apart from it not working and getting that “Alarm clock” message.
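>>
>> (One way to see more than the make wrapper reports, sketched under the assumption
>> that the test binaries are already built in the tree shown in the log further down
>> this thread: run the failing test directly with the same launcher line as RUNPARALLEL,
>>
>>     cd /scratch/novosirj/install-files/hdf5-1.10.4-build-gcc-4.8-openmpi-3.1.3/testpar
>>     srun --mpi=pmi2 -p main -t 1:00:00 -n6 -N1 ./t_mpi
>>
>> and watch how far it gets. The “Alarm clock” after roughly 20 minutes looks like a
>> test-harness timeout, i.e. SIGALRM, which would suggest the test hangs rather than
>> crashes.)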
>> 
>> I originally suspected I’d somehow built some component of this with a
>> host-specific optimization that wasn’t working on some compute nodes. But I
>> controlled for that and it didn’t seem to make any difference.
>> 
>> 
>>> On Feb 18, 2019, at 1:34 PM, Ryan Novosielski <novos...@rutgers.edu> wrote:
>>> 
>>> It didn’t work any better with XFS, as it happens. Must be something else.
>>> I’m going to test some more and see if I can narrow it down any, as it seems
>>> to me that it did work with a different compiler.
>>> 
>>> 
>>>> On Feb 18, 2019, at 12:23 PM, Gabriel, Edgar <egabr...@central.uh.edu> wrote:
>>>> 
>>>> While I was working on something else, I let the tests run with Open MPI
>>>> master (which for parallel I/O is equivalent to the upcoming v4.0.1 release),
>>>> and here is what I found for the HDF5 1.10.4 tests on my local desktop:
>>>> 
>>>> In the testpar directory, there is in fact one test that fails for both ompio
>>>> and romio321, in exactly the same manner. I used 6 processes as you did
>>>> (although I used mpirun directly instead of srun...). Of the 13 tests in the
>>>> testpar directory, 12 pass correctly (t_bigio, t_cache, t_cache_image,
>>>> testphdf5, t_filters_parallel, t_init_term, t_mpi, t_pflush2, t_pread,
>>>> t_prestart, t_pshutdown, t_shapesame).
>>>> 
>>>> The one test that officially fails (t_pflush1) actually reports that it
>>>> passed, but then throws a message indicating that MPI_Abort has been called,
>>>> for both ompio and romio. I will try to investigate this test to see what is
>>>> going on.
>>>> 
>>>> That being said, your report shows an issue in t_mpi, which passes without
>>>> problems for me. This was, however, not on GPFS but on a local XFS file
>>>> system; running the tests on GPFS is on my todo list as well.
>>>> 
>>>> Thanks
>>>> Edgar
>>>> 
>>>> 
>>>> 
>>>>> -----Original Message-----
>>>>> From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Gabriel, Edgar
>>>>> Sent: Sunday, February 17, 2019 10:34 AM
>>>>> To: Open MPI Users <users@lists.open-mpi.org>
>>>>> Subject: Re: [OMPI users] HDF5 1.10.4 "make check" problems w/OpenMPI 3.1.3
>>>>> 
>>>>> I will also run our test suite and the HDF5 test suite on GPFS; I recently
>>>>> got access to a GPFS file system and will report back on that, but it will
>>>>> take a few days.
>>>>> 
>>>>> Thanks
>>>>> Edgar
>>>>> 
>>>>>> -----Original Message-----
>>>>>> From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Ryan Novosielski
>>>>>> Sent: Sunday, February 17, 2019 2:37 AM
>>>>>> To: users@lists.open-mpi.org
>>>>>> Subject: Re: [OMPI users] HDF5 1.10.4 "make check" problems w/OpenMPI 3.1.3
>>>>>> 
>>>>>> This is on GPFS. I'll try it on XFS to see if it makes any difference.
>>>>>> 
>>>>>> On 2/16/19 11:57 PM, Gilles Gouaillardet wrote:
>>>>>>> Ryan,
>>>>>>> 
>>>>>>> What filesystem are you running on ?
>>>>>>> 
>>>>>>> Open MPI defaults to the ompio component, except on Lustre filesystems,
>>>>>>> where ROMIO is used. If the issue is related to ROMIO, that could explain
>>>>>>> why you did not see any difference; in that case, you might want to try
>>>>>>> another filesystem (a local filesystem or NFS, for example).
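>>>>>>>
>>>>>>> A quick way to see what is available and to pin one component explicitly
>>>>>>> (a sketch; check the exact ROMIO component name with ompi_info, since it
>>>>>>> varies by release, e.g. romio314 in the 3.1.x series vs romio321 on master):
>>>>>>>
>>>>>>>     ompi_info | grep "MCA io"      # lists the io components in this install
>>>>>>>     export OMPI_MCA_io=ompio       # force ompio for jobs launched via srun
>>>>>>>     export OMPI_MCA_io=romio314    # or force the ROMIO component instead,
>>>>>>>                                    # using the name ompi_info reported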
>>>>>>> 
>>>>>>> 
>>>>>>> Cheers,
>>>>>>> 
>>>>>>> Gilles
>>>>>>> 
>>>>>>> On Sun, Feb 17, 2019 at 3:08 AM Ryan Novosielski <novos...@rutgers.edu> wrote:
>>>>>>>> 
>>>>>>>> I verified that it makes it through to a bash prompt, but I’m a little
>>>>>>>> less confident that nothing "make test" does clears it. Any
>>>>>>>> recommendation for a way to verify?
>>>>>>>> 
>>>>>>>> In any case, no change, unfortunately.
>>>>>>>> 
>>>>>>>> Sent from my iPhone
>>>>>>>> 
>>>>>>>>> On Feb 16, 2019, at 08:13, Gabriel, Edgar <egabr...@central.uh.edu> wrote:
>>>>>>>>> 
>>>>>>>>> What file system are you running on?
>>>>>>>>> 
>>>>>>>>> I will look into this, but it might be later next week. I just
>>>>>>>>> wanted to emphasize that we are regularly running the parallel
>>>>>>>>> HDF5 tests with ompio, and I am not aware of any outstanding
>>>>>>>>> items that do not work (and are supposed to work). That being
>>>>>>>>> said, I run the tests manually, and not via the 'make test'
>>>>>>>>> commands; I will have to check which tests are being run by that.
>>>>>>>>> 
>>>>>>>>> Edgar
>>>>>>>>> 
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Gilles Gouaillardet
>>>>>>>>>> Sent: Saturday, February 16, 2019 1:49 AM
>>>>>>>>>> To: Open MPI Users <users@lists.open-mpi.org>
>>>>>>>>>> Subject: Re: [OMPI users] HDF5 1.10.4 "make check" problems w/OpenMPI 3.1.3
>>>>>>>>>> 
>>>>>>>>>> Ryan,
>>>>>>>>>> 
>>>>>>>>>> Can you
>>>>>>>>>> 
>>>>>>>>>> export OMPI_MCA_io=^ompio
>>>>>>>>>> 
>>>>>>>>>> and try again after making sure this environment variable is
>>>>>>>>>> passed by srun to the MPI tasks?
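>>>>>>>>>>
>>>>>>>>>> (A quick way to confirm the variable actually reaches the tasks, sketched
>>>>>>>>>> with the same srun options as your RUNPARALLEL line; srun exports the
>>>>>>>>>> submission environment by default, this just makes it visible:
>>>>>>>>>>
>>>>>>>>>>     export OMPI_MCA_io=^ompio
>>>>>>>>>>     srun --mpi=pmi2 -p main -t 1:00:00 -n6 -N1 env | grep OMPI_MCA_io
>>>>>>>>>>
>>>>>>>>>> which should print OMPI_MCA_io=^ompio once per task.)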
>>>>>>>>>> 
>>>>>>>>>> We have identified and fixed several issues specific to the
>>>>>>>>>> (default) ompio component, so that could be a valid workaround
>>>>>>>>>> until the next release.
>>>>>>>>>> 
>>>>>>>>>> Cheers,
>>>>>>>>>> 
>>>>>>>>>> Gilles
>>>>>>>>>> 
>>>>>>>>>> Ryan Novosielski <novos...@rutgers.edu> wrote:
>>>>>>>>>>> Hi there,
>>>>>>>>>>> 
>>>>>>>>>>> Honestly don’t know which piece of this puzzle to look at or how to
>>>>>>>>>>> get more information for troubleshooting. I successfully built HDF5
>>>>>>>>>>> 1.10.4 with the RHEL system GCC 4.8.5 and OpenMPI 3.1.3. Running the
>>>>>>>>>>> “make check” in HDF5 is failing at the point below; I am using a value
>>>>>>>>>>> of RUNPARALLEL='srun --mpi=pmi2 -p main -t 1:00:00 -n6 -N1' and have a
>>>>>>>>>>> Slurm setup that’s otherwise properly configured.
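>>>>>>>>>>>
>>>>>>>>>>> In full, the sequence looks roughly like the sketch below; the source
>>>>>>>>>>> path and the compiler-wrapper choice are illustrative rather than an
>>>>>>>>>>> exact copy of my setup:
>>>>>>>>>>>
>>>>>>>>>>>     export CC=mpicc
>>>>>>>>>>>     export RUNPARALLEL='srun --mpi=pmi2 -p main -t 1:00:00 -n6 -N1'
>>>>>>>>>>>     ../hdf5-1.10.4/configure --enable-parallel && make -j && make check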
>>>>>>>>>>> 
>>>>>>>>>>> Thanks for any help you can provide.
>>>>>>>>>>> 
>>>>>>>>>>> make[4]: Entering directory `/scratch/novosirj/install-files/hdf5-1.10.4-build-gcc-4.8-openmpi-3.1.3/testpar'
>>>>>>>>>>> ============================
>>>>>>>>>>> Testing  t_mpi
>>>>>>>>>>> ============================
>>>>>>>>>>> t_mpi  Test Log
>>>>>>>>>>> ============================
>>>>>>>>>>> srun: job 84126610 queued and waiting for resources
>>>>>>>>>>> srun: job 84126610 has been allocated resources
>>>>>>>>>>> srun: error: slepner023: tasks 0-5: Alarm clock
>>>>>>>>>>> 0.01user 0.00system 20:03.95elapsed 0%CPU (0avgtext+0avgdata 5152maxresident)k
>>>>>>>>>>> 0inputs+0outputs (0major+1529minor)pagefaults 0swaps
>>>>>>>>>>> make[4]: *** [t_mpi.chkexe_] Error 1
>>>>>>>>>>> make[4]: Leaving directory `/scratch/novosirj/install-files/hdf5-1.10.4-build-gcc-4.8-openmpi-3.1.3/testpar'
>>>>>>>>>>> make[3]: *** [build-check-p] Error 1
>>>>>>>>>>> make[3]: Leaving directory `/scratch/novosirj/install-files/hdf5-1.10.4-build-gcc-4.8-openmpi-3.1.3/testpar'
>>>>>>>>>>> make[2]: *** [test] Error 2
>>>>>>>>>>> make[2]: Leaving directory `/scratch/novosirj/install-files/hdf5-1.10.4-build-gcc-4.8-openmpi-3.1.3/testpar'
>>>>>>>>>>> make[1]: *** [check-am] Error 2
>>>>>>>>>>> make[1]: Leaving directory `/scratch/novosirj/install-files/hdf5-1.10.4-build-gcc-4.8-openmpi-3.1.3/testpar'
>>>>>>>>>>> make: *** [check-recursive] Error 1
>>>>>>>>>>> 
