Re: [OMPI users] HDF5 1.10.4 "make check" problems w/OpenMPI 3.1.3

2019-02-25 Thread Peter Kjellström
FYI, just noticed this post from the HDF Group:

https://forum.hdfgroup.org/t/hdf5-and-openmpi/5437

/Peter K



Re: [OMPI users] HDF5 1.10.4 "make check" problems w/OpenMPI 3.1.3

2019-02-21 Thread Ryan Novosielski
> On Feb 20, 2019, at 7:14 PM, Gilles Gouaillardet  wrote:
> 
> Ryan,
> 
> That being said, the "Alarm clock" message looks a bit suspicious.
> 
> Does it always occur at 20+ minutes elapsed ?
> 
> Is there some mechanism that automatically kills a job if it does not write 
> anything to stdout for some time ?
> 
> A quick way to rule that out is to
> 
> srun --mpi=pmi2 -p main -t 1:00:00 -n6 -N1 sleep 1800
> 
> and see if that completes or gets killed with the same error message.

FWIW, the “sleep” completes just fine:

[novosirj@amarel-test2 testpar]$ sacct -j 84173276 -M perceval -o jobid,jobname,start,end,node,state
       JobID    JobName               Start                 End NodeList      State
------------ ---------- ------------------- ------------------- -------- ----------
84173276          sleep 2019-02-21T14:46:03 2019-02-21T15:16:03  node077  COMPLETED
84173276.ex+     extern 2019-02-21T14:46:03 2019-02-21T15:16:03  node077  COMPLETED
84173276.0        sleep 2019-02-21T14:46:03 2019-02-21T15:16:03  node077  COMPLETED

--

|| \\UTGERS, |---*O*---
||_// the State  | Ryan Novosielski - novos...@rutgers.edu
|| \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
||  \\of NJ  | Office of Advanced Research Computing - MSB C630, Newark
 `'



signature.asc
Description: Message signed with OpenPGP
___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users

Re: [OMPI users] HDF5 1.10.4 "make check" problems w/OpenMPI 3.1.3

2019-02-21 Thread Ryan Novosielski
Related to this or not, I also get a hang on MVAPICH2 2.3 compiled with GCC 
8.2, but on t_filters_parallel, not t_mpi. With that combo, though, I get a 
segfault, or at least a message about one. It’s only “Alarm clock” on the GCC 
4.8 with OpenMPI 3.1.3 combo. It also happens at the ~20 minute mark, FWIW.

Testing  t_filters_parallel

 t_filters_parallel  Test Log

srun: job 84117363 queued and waiting for resources
srun: job 84117363 has been allocated resources
[slepner063.amarel.rutgers.edu:mpi_rank_0][error_sighandler] Caught error: 
Segmentation fault (signal 11)
srun: error: slepner063: task 0: Segmentation fault
srun: error: slepner063: tasks 1-3: Alarm clock
0.01user 0.01system 20:01.44elapsed 0%CPU (0avgtext+0avgdata 5144maxresident)k
0inputs+0outputs (0major+1524minor)pagefaults 0swaps
make[4]: *** [t_filters_parallel.chkexe_] Error 1
make[4]: Leaving directory 
`/scratch/novosirj/install-files/hdf5-1.10.4-build-gcc-4.8-mvapich2-2.3/testpar'
make[3]: *** [build-check-p] Error 1
make[3]: Leaving directory 
`/scratch/novosirj/install-files/hdf5-1.10.4-build-gcc-4.8-mvapich2-2.3/testpar'
make[2]: *** [test] Error 2
make[2]: Leaving directory 
`/scratch/novosirj/install-files/hdf5-1.10.4-build-gcc-4.8-mvapich2-2.3/testpar'
make[1]: *** [check-am] Error 2
make[1]: Leaving directory 
`/scratch/novosirj/install-files/hdf5-1.10.4-build-gcc-4.8-mvapich2-2.3/testpar'
make: *** [check-recursive] Error 1
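
In case it helps narrow that down, a rough sketch of grabbing a backtrace from the segfault (assuming core dumps are permitted on the compute nodes and land in the working directory with the default naming; rebuilding the test with -g would give better symbols):

  ulimit -c unlimited
  srun --mpi=pmi2 -p main -t 1:00:00 -n6 -N1 ./t_filters_parallel
  gdb ./t_filters_parallel core*        # then "bt" at the gdb prompt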

> On Feb 21, 2019, at 3:03 PM, Gabriel, Edgar  wrote:
> 
> Yes, I was talking about the same thing, although for me it was not t_mpi, 
> but t_shapesame that was hanging. It might be an indication of the same issue 
> however.
> 
>> -Original Message-
>> From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Ryan
>> Novosielski
>> Sent: Thursday, February 21, 2019 1:59 PM
>> To: Open MPI Users 
>> Subject: Re: [OMPI users] HDF5 1.10.4 "make check" problems w/OpenMPI
>> 3.1.3
>> 
>> 
>>> On Feb 21, 2019, at 2:52 PM, Gabriel, Edgar 
>> wrote:
>>> 
>>>> -Original Message-
>>>>> Does it always occur at 20+ minutes elapsed ?
>>>> 
>>>> Aha! Yes, you are right: every time it fails, it’s at the 20 minute
>>>> and a couple of seconds mark. For comparison, every time it runs, it
>>>> runs for 2-3 seconds total. So it seems like what might actually be
>>>> happening here is a hang, and not a failure of the test per se.
>>>> 
>>> 
>>> I *think* I can confirm that. I compiled 3.1.3 yesterday with gcc 4.8
>> (although this was OpenSuSE, not Redhat), and it looked to me like one of the
>> tests was hanging, but I didn't have time to investigate it further.
>> 
>> Just to be clear, the hanging test I have is t_mpi from HDF5 1.10.4. The
>> OpenMPI 3.1.3 make check passes just fine on all of our builds. But I don’t
>> believe it ever launches any jobs or anything like that.
>> 
>> --
>> 
>> || \\UTGERS,  
>> |---*O*---
>> ||_// the State   | Ryan Novosielski - novos...@rutgers.edu
>> || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
>> ||  \\of NJ   | Office of Advanced Research Computing - MSB C630,
>> Newark
>> `'
> 




Re: [OMPI users] HDF5 1.10.4 "make check" problems w/OpenMPI 3.1.3

2019-02-21 Thread Gabriel, Edgar
Yes, I was talking about the same thing, although for me it was not t_mpi, but 
t_shapesame that was hanging. It might be an indication of the same issue 
however.

> -Original Message-
> From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Ryan
> Novosielski
> Sent: Thursday, February 21, 2019 1:59 PM
> To: Open MPI Users 
> Subject: Re: [OMPI users] HDF5 1.10.4 "make check" problems w/OpenMPI
> 3.1.3
> 
> 
> > On Feb 21, 2019, at 2:52 PM, Gabriel, Edgar 
> wrote:
> >
> >> -Original Message-
> >>> Does it always occur at 20+ minutes elapsed ?
> >>
> >> Aha! Yes, you are right: every time it fails, it’s at the 20 minute
> >> and a couple of seconds mark. For comparison, every time it runs, it
> >> runs for 2-3 seconds total. So it seems like what might actually be
> >> happening here is a hang, and not a failure of the test per se.
> >>
> >
> > I *think* I can confirm that. I compiled 3.1.3 yesterday with gcc 4.8
> (although this was OpenSuSE, not Redhat), and it looked to me like one of the
> tests was hanging, but I didn't have time to investigate it further.
> 
> Just to be clear, the hanging test I have is t_mpi from HDF5 1.10.4. The
> OpenMPI 3.1.3 make check passes just fine on all of our builds. But I don’t
> believe it ever launches any jobs or anything like that.
> 
> --
> 
> || \\UTGERS,   
> |---*O*---
> ||_// the State| Ryan Novosielski - novos...@rutgers.edu
> || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
> ||  \\of NJ| Office of Advanced Research Computing - MSB C630,
> Newark
>  `'


Re: [OMPI users] HDF5 1.10.4 "make check" problems w/OpenMPI 3.1.3

2019-02-21 Thread Ryan Novosielski

> On Feb 21, 2019, at 2:52 PM, Gabriel, Edgar  wrote:
> 
>> -Original Message-
>>> Does it always occur at 20+ minutes elapsed ?
>> 
>> Aha! Yes, you are right: every time it fails, it’s at the 20 minute and a 
>> couple
>> of seconds mark. For comparison, every time it runs, it runs for 2-3 seconds
>> total. So it seems like what might actually be happening here is a hang, and
>> not a failure of the test per se.
>> 
> 
> I *think* I can confirm that. I compiled 3.1.3 yesterday with gcc 4.8 
> (although this was OpenSuSE, not Redhat), and it looked to me like one of the
> tests was hanging, but I didn't have time to investigate it further.

Just to be clear, the hanging test I have is t_mpi from HDF5 1.10.4. The 
OpenMPI 3.1.3 make check passes just fine on all of our builds. But I don’t 
believe it ever launches any jobs or anything like that.

--

|| \\UTGERS, |---*O*---
||_// the State  | Ryan Novosielski - novos...@rutgers.edu
|| \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
||  \\of NJ  | Office of Advanced Research Computing - MSB C630, Newark
 `'




Re: [OMPI users] HDF5 1.10.4 "make check" problems w/OpenMPI 3.1.3

2019-02-21 Thread Gabriel, Edgar
icially fails ( t_pflush1) actually reports that it
> passed, but then throws message that indicates that MPI_Abort has been
> called, for both ompio and romio. I will try to investigate this test to see 
> what
> is going on.
> >>>>
> >>>> That being said, your report shows an issue in t_mpi, which passes
> without problems for me. This is however not GPFS, this was an XFS local file
> system. Running the tests on GPFS are on my todo list as well.
> >>>>
> >>>> Thanks
> >>>> Edgar
> >>>>
> >>>>
> >>>>
> >>>>> -Original Message-----
> >>>>> From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of
> >>>>> Gabriel, Edgar
> >>>>> Sent: Sunday, February 17, 2019 10:34 AM
> >>>>> To: Open MPI Users 
> >>>>> Subject: Re: [OMPI users] HDF5 1.10.4 "make check" problems
> >>>>> w/OpenMPI
> >>>>> 3.1.3
> >>>>>
> >>>>> I will also run our testsuite and the HDF5 testsuite on GPFS, I
> >>>>> have access to a GPFS file system since recently, and will report
> >>>>> back on that, but it will take a few days.
> >>>>>
> >>>>> Thanks
> >>>>> Edgar
> >>>>>
> >>>>>> -Original Message-
> >>>>>> From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf
> >>>>>> Of Ryan Novosielski
> >>>>>> Sent: Sunday, February 17, 2019 2:37 AM
> >>>>>> To: users@lists.open-mpi.org
> >>>>>> Subject: Re: [OMPI users] HDF5 1.10.4 "make check" problems
> >>>>>> w/OpenMPI
> >>>>>> 3.1.3
> >>>>>>
> >>>>>> -BEGIN PGP SIGNED MESSAGE-
> >>>>>> Hash: SHA1
> >>>>>>
> >>>>>> This is on GPFS. I'll try it on XFS to see if it makes any difference.
> >>>>>>
> >>>>>> On 2/16/19 11:57 PM, Gilles Gouaillardet wrote:
> >>>>>>> Ryan,
> >>>>>>>
> >>>>>>> What filesystem are you running on ?
> >>>>>>>
> >>>>>>> Open MPI defaults to the ompio component, except on Lustre
> >>>>>>> filesystem where ROMIO is used. (if the issue is related to
> >>>>>>> ROMIO, that can explain why you did not see any difference, in
> >>>>>>> that case, you might want to try an other filesystem (local
> >>>>>>> filesystem or NFS for example)\
> >>>>>>>
> >>>>>>>
> >>>>>>> Cheers,
> >>>>>>>
> >>>>>>> Gilles
> >>>>>>>
> >>>>>>> On Sun, Feb 17, 2019 at 3:08 AM Ryan Novosielski
> >>>>>>>  wrote:
> >>>>>>>> I verified that it makes it through to a bash prompt, but I’m a
> >>>>>>>> little less confident that something make test does doesn’t clear it.
> >>>>>>>> Any recommendation for a way to verify?
> >>>>>>>>
> >>>>>>>> In any case, no change, unfortunately.
> >>>>>>>>
> >>>>>>>> Sent from my iPhone
> >>>>>>>>
> >>>>>>>>> On Feb 16, 2019, at 08:13, Gabriel, Edgar
> >>>>>>>>> 
> >>>>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>> What file system are you running on?
> >>>>>>>>>
> >>>>>>>>> I will look into this, but it might be later next week. I just
> >>>>>>>>> wanted to emphasize that we are regularly running the parallel
> >>>>>>>>> hdf5 tests with ompio, and I am not aware of any outstanding
> >>>>>>>>> items that do not work (and are supposed to work). That being
> >>>>>>>>> said, I run the tests manually, and not the 'make test'
> >>>>>>>>> commands. Will have to check which tests are being run by that.
> >>>>>>>>>
> >>>>>>>>> Edgar
> >>>>>>>>>
>

Re: [OMPI users] HDF5 1.10.4 "make check" problems w/OpenMPI 3.1.3

2019-02-21 Thread Ryan Novosielski
 srun...) From the 13 tests in the testpar directory, 12 pass correctly 
>>>> (t_bigio, t_cache, t_cache_image, testphdf5, t_filters_parallel, 
>>>> t_init_term, t_mpi, t_pflush2, t_pread, t_prestart, t_pshutdown, 
>>>> t_shapesame).
>>>> 
>>>> The one tests that officially fails ( t_pflush1) actually reports that it 
>>>> passed, but then throws message that indicates that MPI_Abort has been 
>>>> called, for both ompio and romio. I will try to investigate this test to 
>>>> see what is going on.
>>>> 
>>>> That being said, your report shows an issue in t_mpi, which passes without 
>>>> problems for me. This is however not GPFS, this was an XFS local file 
>>>> system. Running the tests on GPFS are on my todo list as well.
>>>> 
>>>> Thanks
>>>> Edgar
>>>> 
>>>> 
>>>> 
>>>>> -----Original Message-
>>>>> From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of
>>>>> Gabriel, Edgar
>>>>> Sent: Sunday, February 17, 2019 10:34 AM
>>>>> To: Open MPI Users 
>>>>> Subject: Re: [OMPI users] HDF5 1.10.4 "make check" problems w/OpenMPI
>>>>> 3.1.3
>>>>> 
>>>>> I will also run our testsuite and the HDF5 testsuite on GPFS, I have 
>>>>> access to a
>>>>> GPFS file system since recently, and will report back on that, but it 
>>>>> will take a
>>>>> few days.
>>>>> 
>>>>> Thanks
>>>>> Edgar
>>>>> 
>>>>>> -Original Message-
>>>>>> From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of
>>>>>> Ryan Novosielski
>>>>>> Sent: Sunday, February 17, 2019 2:37 AM
>>>>>> To: users@lists.open-mpi.org
>>>>>> Subject: Re: [OMPI users] HDF5 1.10.4 "make check" problems w/OpenMPI
>>>>>> 3.1.3
>>>>>> 
>>>>>> -BEGIN PGP SIGNED MESSAGE-
>>>>>> Hash: SHA1
>>>>>> 
>>>>>> This is on GPFS. I'll try it on XFS to see if it makes any difference.
>>>>>> 
>>>>>> On 2/16/19 11:57 PM, Gilles Gouaillardet wrote:
>>>>>>> Ryan,
>>>>>>> 
>>>>>>> What filesystem are you running on ?
>>>>>>> 
>>>>>>> Open MPI defaults to the ompio component, except on Lustre
>>>>>>> filesystem where ROMIO is used. (if the issue is related to ROMIO,
>>>>>>> that can explain why you did not see any difference, in that case,
>>>>>>> you might want to try an other filesystem (local filesystem or NFS
>>>>>>> for example)\
>>>>>>> 
>>>>>>> 
>>>>>>> Cheers,
>>>>>>> 
>>>>>>> Gilles
>>>>>>> 
>>>>>>> On Sun, Feb 17, 2019 at 3:08 AM Ryan Novosielski
>>>>>>>  wrote:
>>>>>>>> I verified that it makes it through to a bash prompt, but I’m a
>>>>>>>> little less confident that something make test does doesn’t clear it.
>>>>>>>> Any recommendation for a way to verify?
>>>>>>>> 
>>>>>>>> In any case, no change, unfortunately.
>>>>>>>> 
>>>>>>>> Sent from my iPhone
>>>>>>>> 
>>>>>>>>> On Feb 16, 2019, at 08:13, Gabriel, Edgar
>>>>>>>>> 
>>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>> What file system are you running on?
>>>>>>>>> 
>>>>>>>>> I will look into this, but it might be later next week. I just
>>>>>>>>> wanted to emphasize that we are regularly running the parallel
>>>>>>>>> hdf5 tests with ompio, and I am not aware of any outstanding items
>>>>>>>>> that do not work (and are supposed to work). That being said, I
>>>>>>>>> run the tests manually, and not the 'make test'
>>>>>>>>> commands. Will have to check which tests are being run by that.
>>>>>>>>> 
>>>>>>>>> Edgar
>>>>>>>>> 
>>&

Re: [OMPI users] HDF5 1.10.4 "make check" problems w/OpenMPI 3.1.3

2019-02-20 Thread Gilles Gouaillardet

Ryan,


as Edgar explained, that could be a compiler issue (fwiw, I am unable to 
reproduce the bug)


You can build Open MPI again and pass --disable-builtin-atomics to the 
configure command line.
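
For example, a minimal sketch of that rebuild, reusing the prefix and options Ryan posted elsewhere in this thread (adjust paths and -j to your site):

  ../openmpi-3.1.3/configure --prefix=/opt/sw/packages/gcc-4_8/openmpi/3.1.3 \
      --with-pmi --disable-builtin-atomics && \
  make -j32 && make install

Disabling the builtin atomics should make Open MPI fall back to its own assembly implementations instead of the gcc 4.8 __atomic/__sync builtins, which is the suspected problem area here.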



That being said, the "Alarm clock" message looks a bit suspicious.

Does it always occur at 20+ minutes elapsed ?

Is there some mechanism that automatically kills a job if it does not 
write anything to stdout for some time ?


A quick way to rule that out is to

srun --mpi=pmi2 -p main -t 1:00:00 -n6 -N1 sleep 1800

and see if that completes or gets killed with the same error message.


You can also use mpirun instead of srun, and even run mpirun outside of slurm


(if your cluster policy allows it, you can for example use mpirun and 
run on the frontend node)
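
For example, a quick sketch of the same check through mpirun on a frontend node (rank count matching the failing srun line; purely illustrative):

  mpirun -np 6 sleep 1800 ; echo "exit status: $?"

If that sits for the full 30 minutes and exits with status 0, an external watchdog killing quiet jobs looks unlikely.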



Cheers,


Gilles

On 2/21/2019 3:01 AM, Ryan Novosielski wrote:

Does it make any sense that it seems to work fine when OpenMPI and HDF5 are 
built with GCC 7.4 and GCC 8.2, but /not/ when they are built with 
RHEL-supplied GCC 4.8.5? That appears to be the scenario. For the GCC 4.8.5 
build, I did try an XFS filesystem and it didn’t help. GPFS works fine for 
either of the 7.4 and 8.2 builds.

Just as a reminder, since it was reasonably far back in the thread, what I’m 
doing is running the “make check” tests in HDF5 1.10.4, in part because users 
use it, but also because it seems to have a good test suite and I can therefore 
verify the compiler and MPI stack installs. I get very little information, 
apart from it not working and getting that “Alarm clock” message.

I originally suspected I’d somehow built some component of this with a 
host-specific optimization that wasn’t working on some compute nodes. But I 
controlled for that and it didn’t seem to make any difference.

--

|| \\UTGERS, |---*O*---
||_// the State  | Ryan Novosielski - novos...@rutgers.edu
|| \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
||  \\of NJ  | Office of Advanced Research Computing - MSB C630, Newark
  `'


On Feb 18, 2019, at 1:34 PM, Ryan Novosielski  wrote:

It didn’t work any better with XFS, as it happens. Must be something else. I’m 
going to test some more and see if I can narrow it down any, as it seems to me 
that it did work with a different compiler.

--

|| \\UTGERS, |---*O*---
||_// the State  | Ryan Novosielski - novos...@rutgers.edu
|| \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
||  \\of NJ  | Office of Advanced Research Computing - MSB C630, Newark
 `'


On Feb 18, 2019, at 12:23 PM, Gabriel, Edgar  wrote:

While I was working on something else, I let the tests run with Open MPI master 
(which is for parallel I/O equivalent to the upcoming v4.0.1  release), and 
here is what I found for the HDF5 1.10.4 tests on my local desktop:

In the testpar directory, there is in fact one test that fails for both ompio 
and romio321 in exactly the same manner.
I used 6 processes as you did (although I used mpirun directly  instead of 
srun...) From the 13 tests in the testpar directory, 12 pass correctly 
(t_bigio, t_cache, t_cache_image, testphdf5, t_filters_parallel, t_init_term, 
t_mpi, t_pflush2, t_pread, t_prestart, t_pshutdown, t_shapesame).

The one tests that officially fails ( t_pflush1) actually reports that it 
passed, but then throws message that indicates that MPI_Abort has been called, 
for both ompio and romio. I will try to investigate this test to see what is 
going on.

That being said, your report shows an issue in t_mpi, which passes without 
problems for me. This is however not GPFS, this was an XFS local file system. 
Running the tests on GPFS are on my todo list as well.

Thanks
Edgar




-Original Message-
From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of
Gabriel, Edgar
Sent: Sunday, February 17, 2019 10:34 AM
To: Open MPI Users 
Subject: Re: [OMPI users] HDF5 1.10.4 "make check" problems w/OpenMPI
3.1.3

I will also run our testsuite and the HDF5 testsuite on GPFS, I have access to a
GPFS file system since recently, and will report back on that, but it will take 
a
few days.

Thanks
Edgar


-Original Message-
From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of
Ryan Novosielski
Sent: Sunday, February 17, 2019 2:37 AM
To: users@lists.open-mpi.org
Subject: Re: [OMPI users] HDF5 1.10.4 "make check" problems w/OpenMPI
3.1.3

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

This is on GPFS. I'll try it on XFS to see if it makes any difference.

On 2/16/19 11:57 PM, Gilles Gouaillardet wrote:

Ryan,

What filesystem are you running on ?

Open MPI defaults to the ompio component, except on Lustre
filesystem where ROMIO is used. (if the issue is related to ROMIO,
that can explain why you did not see any difference, in that case,
you might want to try an other files

Re: [OMPI users] HDF5 1.10.4 "make check" problems w/OpenMPI 3.1.3

2019-02-20 Thread Ryan Novosielski
This is what I did for my build — not much going on there:

../openmpi-3.1.3/configure --prefix=/opt/sw/packages/gcc-4_8/openmpi/3.1.3 
--with-pmi && \
make -j32

We have a mixture of types of Infiniband, using the RHEL-supplied Infiniband 
packages.
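
In case it is useful, one way to double-check what a given install was actually built with is ompi_info from that install (a sketch; I believe the "Configure command line" field is printed by default):

  /opt/sw/packages/gcc-4_8/openmpi/3.1.3/bin/ompi_info | grep -i 'command line'

That should also make it easy to confirm whether a rebuild with --disable-builtin-atomics really took effect.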

--

|| \\UTGERS, |---*O*---
||_// the State  | Ryan Novosielski - novos...@rutgers.edu
|| \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
||  \\of NJ  | Office of Advanced Research Computing - MSB C630, Newark
 `'

> On Feb 20, 2019, at 1:46 PM, Gabriel, Edgar  wrote:
> 
> Well, the way you describe it, it sounds to me like maybe an atomic issue 
> with this compiler version. What was your configure line of Open MPI, and 
> what network interconnect are you using?
> 
> An easy way to test this theory would be to force OpenMPI to use the tcp 
> interfaces (everything will be slow however). You can do that by creating in 
> your home directory a directory called .openmpi, and add there a file called 
> mca-params.conf
> 
> The file should look something like this:
> 
> btl = tcp,self
> 
> 
> 
> Thanks
> Edgar
> 
> 
> 
>> -Original Message-
>> From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Ryan
>> Novosielski
>> Sent: Wednesday, February 20, 2019 12:02 PM
>> To: Open MPI Users 
>> Subject: Re: [OMPI users] HDF5 1.10.4 "make check" problems w/OpenMPI
>> 3.1.3
>> 
>> Does it make any sense that it seems to work fine when OpenMPI and HDF5
>> are built with GCC 7.4 and GCC 8.2, but /not/ when they are built with RHEL-
>> supplied GCC 4.8.5? That appears to be the scenario. For the GCC 4.8.5 build,
>> I did try an XFS filesystem and it didn’t help. GPFS works fine for either 
>> of the
>> 7.4 and 8.2 builds.
>> 
>> Just as a reminder, since it was reasonably far back in the thread, what I’m
>> doing is running the “make check” tests in HDF5 1.10.4, in part because users
>> use it, but also because it seems to have a good test suite and I can 
>> therefore
>> verify the compiler and MPI stack installs. I get very little information, 
>> apart
>> from it not working and getting that “Alarm clock” message.
>> 
>> I originally suspected I’d somehow built some component of this with a host-
>> specific optimization that wasn’t working on some compute nodes. But I
>> controlled for that and it didn’t seem to make any difference.
>> 
>> --
>> 
>> || \\UTGERS,  
>> |---*O*---
>> ||_// the State   | Ryan Novosielski - novos...@rutgers.edu
>> || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
>> ||  \\of NJ   | Office of Advanced Research Computing - MSB C630,
>> Newark
>> `'
>> 
>>> On Feb 18, 2019, at 1:34 PM, Ryan Novosielski 
>> wrote:
>>> 
>>> It didn’t work any better with XFS, as it happens. Must be something else.
>> I’m going to test some more and see if I can narrow it down any, as it seems
>> to me that it did work with a different compiler.
>>> 
>>> --
>>> 
>>> || \\UTGERS, 
>>> |---*O*---
>>> ||_// the State  | Ryan Novosielski - novos...@rutgers.edu
>>> || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS
>> Campus
>>> ||  \\of NJ  | Office of Advanced Research Computing - MSB C630,
>> Newark
>>>`'
>>> 
>>>> On Feb 18, 2019, at 12:23 PM, Gabriel, Edgar 
>> wrote:
>>>> 
>>>> While I was working on something else, I let the tests run with Open MPI
>> master (which is for parallel I/O equivalent to the upcoming v4.0.1  
>> release),
>> and here is what I found for the HDF5 1.10.4 tests on my local desktop:
>>>> 
>>>> In the testpar directory, there is in fact one test that fails for both 
>>>> ompio
>> and romio321 in exactly the same manner.
>>>> I used 6 processes as you did (although I used mpirun directly  instead of
>> srun...) From the 13 tests in the testpar directory, 12 pass correctly 
>> (t_bigio,
>> t_cache, t_cache_image, testphdf5, t_filters_parallel, t_init_term, t_mpi,
>> t_pflush2, t_pread, t_prestart, t_pshutdown, t_shapesame).
>>>> 
>>>> The one tests that officially fails ( t_pflush1) actually reports that it 
>>>> passed,
>> but then throws message that indicates that MPI_Abort has been

Re: [OMPI users] HDF5 1.10.4 "make check" problems w/OpenMPI 3.1.3

2019-02-20 Thread Gabriel, Edgar
Well, the way you describe it, it sounds to me like maybe an atomics issue with 
this compiler version. What was your Open MPI configure line, and what network 
interconnect are you using?

An easy way to test this theory would be to force Open MPI to use the tcp 
interfaces (everything will be slow, however). You can do that by creating a 
directory called .openmpi in your home directory and adding a file called 
mca-params.conf there.

The file should look something like this:

btl = tcp,self
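
For reference, a rough sketch of setting that up from the shell, plus the equivalent per-run forms (standard MCA parameter handling; t_mpi is just an example target):

  mkdir -p ~/.openmpi
  printf 'btl = tcp,self\n' >> ~/.openmpi/mca-params.conf

  # or, without touching any files:
  export OMPI_MCA_btl=tcp,self              # picked up by ranks launched via srun
  mpirun --mca btl tcp,self -np 6 ./t_mpi   # when launching with mpirun directly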



Thanks
Edgar



> -Original Message-
> From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Ryan
> Novosielski
> Sent: Wednesday, February 20, 2019 12:02 PM
> To: Open MPI Users 
> Subject: Re: [OMPI users] HDF5 1.10.4 "make check" problems w/OpenMPI
> 3.1.3
> 
> Does it make any sense that it seems to work fine when OpenMPI and HDF5
> are built with GCC 7.4 and GCC 8.2, but /not/ when they are built with RHEL-
> supplied GCC 4.8.5? That appears to be the scenario. For the GCC 4.8.5 build,
> I did try an XFS filesystem and it didn’t help. GPFS works fine for either of 
> the
> 7.4 and 8.2 builds.
> 
> Just as a reminder, since it was reasonably far back in the thread, what I’m
> doing is running the “make check” tests in HDF5 1.10.4, in part because users
> use it, but also because it seems to have a good test suite and I can 
> therefore
> verify the compiler and MPI stack installs. I get very little information, 
> apart
> from it not working and getting that “Alarm clock” message.
> 
> I originally suspected I’d somehow built some component of this with a host-
> specific optimization that wasn’t working on some compute nodes. But I
> controlled for that and it didn’t seem to make any difference.
> 
> --
> 
> || \\UTGERS,   
> |---*O*---
> ||_// the State| Ryan Novosielski - novos...@rutgers.edu
> || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
> ||  \\of NJ| Office of Advanced Research Computing - MSB C630,
> Newark
>  `'
> 
> > On Feb 18, 2019, at 1:34 PM, Ryan Novosielski 
> wrote:
> >
> > It didn’t work any better with XFS, as it happens. Must be something else.
> I’m going to test some more and see if I can narrow it down any, as it seems
> to me that it did work with a different compiler.
> >
> > --
> > 
> > || \\UTGERS, 
> > |---*O*---
> > ||_// the State  | Ryan Novosielski - novos...@rutgers.edu
> > || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS
> Campus
> > ||  \\of NJ  | Office of Advanced Research Computing - MSB C630,
> Newark
> > `'
> >
> >> On Feb 18, 2019, at 12:23 PM, Gabriel, Edgar 
> wrote:
> >>
> >> While I was working on something else, I let the tests run with Open MPI
> master (which is for parallel I/O equivalent to the upcoming v4.0.1  release),
> and here is what I found for the HDF5 1.10.4 tests on my local desktop:
> >>
> >> In the testpar directory, there is in fact one test that fails for both 
> >> ompio
> and romio321 in exactly the same manner.
> >> I used 6 processes as you did (although I used mpirun directly  instead of
> srun...) From the 13 tests in the testpar directory, 12 pass correctly 
> (t_bigio,
> t_cache, t_cache_image, testphdf5, t_filters_parallel, t_init_term, t_mpi,
> t_pflush2, t_pread, t_prestart, t_pshutdown, t_shapesame).
> >>
> >> The one tests that officially fails ( t_pflush1) actually reports that it 
> >> passed,
> but then throws message that indicates that MPI_Abort has been called, for
> both ompio and romio. I will try to investigate this test to see what is going
> on.
> >>
> >> That being said, your report shows an issue in t_mpi, which passes
> without problems for me. This is however not GPFS, this was an XFS local file
> system. Running the tests on GPFS are on my todo list as well.
> >>
> >> Thanks
> >> Edgar
> >>
> >>
> >>
> >>> -Original Message-
> >>> From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of
> >>> Gabriel, Edgar
> >>> Sent: Sunday, February 17, 2019 10:34 AM
> >>> To: Open MPI Users 
> >>> Subject: Re: [OMPI users] HDF5 1.10.4 "make check" problems
> >>> w/OpenMPI
> >>> 3.1.3
> >>>
> >>> I will also run our testsuite and the HDF5 testsuite on GPFS, I have
> >>> access to a GPFS file system since recently, and wil

Re: [OMPI users] HDF5 1.10.4 "make check" problems w/OpenMPI 3.1.3

2019-02-20 Thread Ryan Novosielski
Does it make any sense that it seems to work fine when OpenMPI and HDF5 are 
built with GCC 7.4 and GCC 8.2, but /not/ when they are built with 
RHEL-supplied GCC 4.8.5? That appears to be the scenario. For the GCC 4.8.5 
build, I did try an XFS filesystem and it didn’t help. GPFS works fine for 
either of the 7.4 and 8.2 builds.

Just as a reminder, since it was reasonably far back in the thread, what I’m 
doing is running the “make check” tests in HDF5 1.10.4, in part because users 
use it, but also because it seems to have a good test suite and I can therefore 
verify the compiler and MPI stack installs. I get very little information, 
apart from it not working and getting that “Alarm clock” message.

I originally suspected I’d somehow built some component of this with a 
host-specific optimization that wasn’t working on some compute nodes. But I 
controlled for that and it didn’t seem to make any difference.
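
One more data point on the wording: "Alarm clock" is exactly what gets printed for a process terminated by SIGALRM, so the 20-minute kill may well come from an alarm()-style watchdog inside the test itself rather than from SLURM or OpenMPI (I believe the HDF5 test harness arms such an alarm, though I have not checked the 1.10.4 source). A tiny sketch that reproduces the message in a shell:

  sleep 60 &
  kill -ALRM $!
  wait $!      # the job is reported as "Alarm clock" (exit status 142 = 128 + SIGALRM)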

--

|| \\UTGERS, |---*O*---
||_// the State  | Ryan Novosielski - novos...@rutgers.edu
|| \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
||  \\of NJ  | Office of Advanced Research Computing - MSB C630, Newark
 `'

> On Feb 18, 2019, at 1:34 PM, Ryan Novosielski  wrote:
> 
> It didn’t work any better with XFS, as it happens. Must be something else. 
> I’m going to test some more and see if I can narrow it down any, as it seems 
> to me that it did work with a different compiler.
> 
> --
> 
> || \\UTGERS,   
> |---*O*---
> ||_// the State| Ryan Novosielski - novos...@rutgers.edu
> || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
> ||  \\of NJ| Office of Advanced Research Computing - MSB C630, 
> Newark
> `'
> 
>> On Feb 18, 2019, at 12:23 PM, Gabriel, Edgar  wrote:
>> 
>> While I was working on something else, I let the tests run with Open MPI 
>> master (which is for parallel I/O equivalent to the upcoming v4.0.1  
>> release), and here is what I found for the HDF5 1.10.4 tests on my local 
>> desktop:
>> 
>> In the testpar directory, there is in fact one test that fails for both 
>> ompio and romio321 in exactly the same manner.
>> I used 6 processes as you did (although I used mpirun directly  instead of 
>> srun...) From the 13 tests in the testpar directory, 12 pass correctly 
>> (t_bigio, t_cache, t_cache_image, testphdf5, t_filters_parallel, 
>> t_init_term, t_mpi, t_pflush2, t_pread, t_prestart, t_pshutdown, 
>> t_shapesame).
>> 
>> The one tests that officially fails ( t_pflush1) actually reports that it 
>> passed, but then throws message that indicates that MPI_Abort has been 
>> called, for both ompio and romio. I will try to investigate this test to see 
>> what is going on.
>> 
>> That being said, your report shows an issue in t_mpi, which passes without 
>> problems for me. This is however not GPFS, this was an XFS local file 
>> system. Running the tests on GPFS are on my todo list as well.
>> 
>> Thanks
>> Edgar
>> 
>> 
>> 
>>> -----Original Message-
>>> From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of
>>> Gabriel, Edgar
>>> Sent: Sunday, February 17, 2019 10:34 AM
>>> To: Open MPI Users 
>>> Subject: Re: [OMPI users] HDF5 1.10.4 "make check" problems w/OpenMPI
>>> 3.1.3
>>> 
>>> I will also run our testsuite and the HDF5 testsuite on GPFS, I have access 
>>> to a
>>> GPFS file system since recently, and will report back on that, but it will 
>>> take a
>>> few days.
>>> 
>>> Thanks
>>> Edgar
>>> 
>>>> -Original Message-
>>>> From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of
>>>> Ryan Novosielski
>>>> Sent: Sunday, February 17, 2019 2:37 AM
>>>> To: users@lists.open-mpi.org
>>>> Subject: Re: [OMPI users] HDF5 1.10.4 "make check" problems w/OpenMPI
>>>> 3.1.3
>>>> 
>>>> -BEGIN PGP SIGNED MESSAGE-
>>>> Hash: SHA1
>>>> 
>>>> This is on GPFS. I'll try it on XFS to see if it makes any difference.
>>>> 
>>>> On 2/16/19 11:57 PM, Gilles Gouaillardet wrote:
>>>>> Ryan,
>>>>> 
>>>>> What filesystem are you running on ?
>>>>> 
>>>>> Open MPI defaults to the ompio component, except on Lustre
>>>>> filesystem where ROMIO is used. (if the issue is related to ROMIO,
>>&

Re: [OMPI users] HDF5 1.10.4 "make check" problems w/OpenMPI 3.1.3

2019-02-18 Thread Gilles Gouaillardet

Edgar,


t_pflush1 does not call MPI_Finalize(); that is why there is an error 
message regardless of whether ompio or romio is used.


I naively tried to call MPI_Finalize(), but it causes the program to hang.


Cheers,


Gilles

On 2/19/2019 2:23 AM, Gabriel, Edgar wrote:

While I was working on something else, I let the tests run with Open MPI master 
(which is for parallel I/O equivalent to the upcoming v4.0.1  release), and 
here is what I found for the HDF5 1.10.4 tests on my local desktop:

In the testpar directory, there is in fact one test that fails for both ompio 
and romio321 in exactly the same manner.
I used 6 processes as you did (although I used mpirun directly  instead of 
srun...) From the 13 tests in the testpar directory, 12 pass correctly 
(t_bigio, t_cache, t_cache_image, testphdf5, t_filters_parallel, t_init_term, 
t_mpi, t_pflush2, t_pread, t_prestart, t_pshutdown, t_shapesame).

The one test that officially fails (t_pflush1) actually reports that it passed, 
but then throws a message indicating that MPI_Abort has been called, for both 
ompio and romio. I will try to investigate this test to see what is going on.

That being said, your report shows an issue in t_mpi, which passes without 
problems for me. This was not GPFS, however, but a local XFS file system. 
Running the tests on GPFS is on my todo list as well.

Thanks
Edgar




-Original Message-
From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of
Gabriel, Edgar
Sent: Sunday, February 17, 2019 10:34 AM
To: Open MPI Users 
Subject: Re: [OMPI users] HDF5 1.10.4 "make check" problems w/OpenMPI
3.1.3

I will also run our testsuite and the HDF5 testsuite on GPFS, I have access to a
GPFS file system since recently, and will report back on that, but it will take 
a
few days.

Thanks
Edgar


-Original Message-
From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of
Ryan Novosielski
Sent: Sunday, February 17, 2019 2:37 AM
To: users@lists.open-mpi.org
Subject: Re: [OMPI users] HDF5 1.10.4 "make check" problems w/OpenMPI
3.1.3

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

This is on GPFS. I'll try it on XFS to see if it makes any difference.

On 2/16/19 11:57 PM, Gilles Gouaillardet wrote:

Ryan,

What filesystem are you running on ?

Open MPI defaults to the ompio component, except on Lustre
filesystem where ROMIO is used. (if the issue is related to ROMIO,
that can explain why you did not see any difference, in that case,
you might want to try an other filesystem (local filesystem or NFS
for example)\


Cheers,

Gilles

On Sun, Feb 17, 2019 at 3:08 AM Ryan Novosielski
 wrote:

I verified that it makes it through to a bash prompt, but I’m a
little less confident that something make test does doesn’t clear it.
Any recommendation for a way to verify?

In any case, no change, unfortunately.

Sent from my iPhone


On Feb 16, 2019, at 08:13, Gabriel, Edgar

wrote:

What file system are you running on?

I will look into this, but it might be later next week. I just
wanted to emphasize that we are regularly running the parallel
hdf5 tests with ompio, and I am not aware of any outstanding items
that do not work (and are supposed to work). That being said, I
run the tests manually, and not the 'make test'
commands. Will have to check which tests are being run by that.

Edgar


-Original Message- From: users
[mailto:users-boun...@lists.open-mpi.org] On Behalf Of Gilles
Gouaillardet Sent: Saturday, February 16, 2019 1:49 AM To: Open
MPI Users  Subject: Re:
[OMPI users] HDF5 1.10.4 "make check" problems w/OpenMPI
3.1.3

Ryan,

Can you

export OMPI_MCA_io=^ompio

and try again after you made sure this environment variable is
passed by srun to the MPI tasks ?

We have identified and fixed several issues specific to the
(default) ompio component, so that could be a valid workaround
until the next release.

Cheers,

Gilles

Ryan Novosielski  wrote:

Hi there,

Honestly don’t know which piece of this puzzle to look at or how
to get more

information for troubleshooting. I successfully built HDF5
1.10.4 with RHEL system GCC 4.8.5 and OpenMPI 3.1.3. Running the
“make check” in HDF5 is failing at the below point; I am using a
value of RUNPARALLEL='srun -- mpi=pmi2 -p main -t
1:00:00 -n6 -N1’ and have a SLURM that’s otherwise properly
configured.

Thanks for any help you can provide.

make[4]: Entering directory
`/scratch/novosirj/install-files/hdf5-1.10.4-build-

gcc-4.8-openmpi-3.1.3/testpar'

 Testing  t_mpi
 t_mpi  Test Log
 srun: job 84126610 queued and

waiting

for resources srun: job 84126610 has been allocated resources
srun: error: slepner023: tasks 0-5: Alarm clock 0.01user
0.00system 20:03.95elapsed 0%CPU (0avgtext+0avgdata
5152maxresident)k 0inputs+0outputs

(0major+1529minor)pagefaults

0swaps make[4]: *** [t_mpi.chkexe_] Error 1 make[4]: Leaving
directory
`/scratch/novosirj/i

Re: [OMPI users] HDF5 1.10.4 "make check" problems w/OpenMPI 3.1.3

2019-02-18 Thread Ryan Novosielski
It didn’t work any better with XFS, as it happens. Must be something else. I’m 
going to test some more and see if I can narrow it down any, as it seems to me 
that it did work with a different compiler.

--

|| \\UTGERS, |---*O*---
||_// the State  | Ryan Novosielski - novos...@rutgers.edu
|| \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
||  \\of NJ  | Office of Advanced Research Computing - MSB C630, Newark
 `'

> On Feb 18, 2019, at 12:23 PM, Gabriel, Edgar  wrote:
> 
> While I was working on something else, I let the tests run with Open MPI 
> master (which is for parallel I/O equivalent to the upcoming v4.0.1  
> release), and here is what I found for the HDF5 1.10.4 tests on my local 
> desktop:
> 
> In the testpar directory, there is in fact one test that fails for both ompio 
> and romio321 in exactly the same manner.
> I used 6 processes as you did (although I used mpirun directly  instead of 
> srun...) From the 13 tests in the testpar directory, 12 pass correctly 
> (t_bigio, t_cache, t_cache_image, testphdf5, t_filters_parallel, t_init_term, 
> t_mpi, t_pflush2, t_pread, t_prestart, t_pshutdown, t_shapesame).
> 
> The one tests that officially fails ( t_pflush1) actually reports that it 
> passed, but then throws message that indicates that MPI_Abort has been 
> called, for both ompio and romio. I will try to investigate this test to see 
> what is going on.
> 
> That being said, your report shows an issue in t_mpi, which passes without 
> problems for me. This is however not GPFS, this was an XFS local file system. 
> Running the tests on GPFS are on my todo list as well.
> 
> Thanks
> Edgar
> 
> 
> 
>> -Original Message-
>> From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of
>> Gabriel, Edgar
>> Sent: Sunday, February 17, 2019 10:34 AM
>> To: Open MPI Users 
>> Subject: Re: [OMPI users] HDF5 1.10.4 "make check" problems w/OpenMPI
>> 3.1.3
>> 
>> I will also run our testsuite and the HDF5 testsuite on GPFS, I have access 
>> to a
>> GPFS file system since recently, and will report back on that, but it will 
>> take a
>> few days.
>> 
>> Thanks
>> Edgar
>> 
>>> -Original Message-
>>> From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of
>>> Ryan Novosielski
>>> Sent: Sunday, February 17, 2019 2:37 AM
>>> To: users@lists.open-mpi.org
>>> Subject: Re: [OMPI users] HDF5 1.10.4 "make check" problems w/OpenMPI
>>> 3.1.3
>>> 
>>> -BEGIN PGP SIGNED MESSAGE-
>>> Hash: SHA1
>>> 
>>> This is on GPFS. I'll try it on XFS to see if it makes any difference.
>>> 
>>> On 2/16/19 11:57 PM, Gilles Gouaillardet wrote:
>>>> Ryan,
>>>> 
>>>> What filesystem are you running on ?
>>>> 
>>>> Open MPI defaults to the ompio component, except on Lustre
>>>> filesystem where ROMIO is used. (if the issue is related to ROMIO,
>>>> that can explain why you did not see any difference, in that case,
>>>> you might want to try an other filesystem (local filesystem or NFS
>>>> for example)\
>>>> 
>>>> 
>>>> Cheers,
>>>> 
>>>> Gilles
>>>> 
>>>> On Sun, Feb 17, 2019 at 3:08 AM Ryan Novosielski
>>>>  wrote:
>>>>> 
>>>>> I verified that it makes it through to a bash prompt, but I’m a
>>>>> little less confident that something make test does doesn’t clear it.
>>>>> Any recommendation for a way to verify?
>>>>> 
>>>>> In any case, no change, unfortunately.
>>>>> 
>>>>> Sent from my iPhone
>>>>> 
>>>>>> On Feb 16, 2019, at 08:13, Gabriel, Edgar
>>>>>> 
>>>>>> wrote:
>>>>>> 
>>>>>> What file system are you running on?
>>>>>> 
>>>>>> I will look into this, but it might be later next week. I just
>>>>>> wanted to emphasize that we are regularly running the parallel
>>>>>> hdf5 tests with ompio, and I am not aware of any outstanding items
>>>>>> that do not work (and are supposed to work). That being said, I
>>>>>> run the tests manually, and not the 'make test'
>>>>>> commands. Will have to check which tests are being run by that.
>>>>>> 
>>>>>> Edgar
>>>>>> 
>&g

Re: [OMPI users] HDF5 1.10.4 "make check" problems w/OpenMPI 3.1.3

2019-02-18 Thread Gabriel, Edgar
While I was working on something else, I let the tests run with Open MPI master 
(which is for parallel I/O equivalent to the upcoming v4.0.1  release), and 
here is what I found for the HDF5 1.10.4 tests on my local desktop:

In the testpar directory, there is in fact one test that fails for both ompio 
and romio321 in exactly the same manner.
I used 6 processes as you did (although I used mpirun directly  instead of 
srun...) From the 13 tests in the testpar directory, 12 pass correctly 
(t_bigio, t_cache, t_cache_image, testphdf5, t_filters_parallel, t_init_term, 
t_mpi, t_pflush2, t_pread, t_prestart, t_pshutdown, t_shapesame). 

The one test that officially fails (t_pflush1) actually reports that it 
passed, but then throws a message indicating that MPI_Abort has been called, 
for both ompio and romio. I will try to investigate this test to see what is 
going on.

That being said, your report shows an issue in t_mpi, which passes without 
problems for me. This was not GPFS, however, but a local XFS file system. 
Running the tests on GPFS is on my todo list as well.

Thanks
Edgar



> -Original Message-
> From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of
> Gabriel, Edgar
> Sent: Sunday, February 17, 2019 10:34 AM
> To: Open MPI Users 
> Subject: Re: [OMPI users] HDF5 1.10.4 "make check" problems w/OpenMPI
> 3.1.3
> 
> I will also run our testsuite and the HDF5 testsuite on GPFS, I have access 
> to a
> GPFS file system since recently, and will report back on that, but it will 
> take a
> few days.
> 
> Thanks
> Edgar
> 
> > -Original Message-
> > From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of
> > Ryan Novosielski
> > Sent: Sunday, February 17, 2019 2:37 AM
> > To: users@lists.open-mpi.org
> > Subject: Re: [OMPI users] HDF5 1.10.4 "make check" problems w/OpenMPI
> > 3.1.3
> >
> > -BEGIN PGP SIGNED MESSAGE-
> > Hash: SHA1
> >
> > This is on GPFS. I'll try it on XFS to see if it makes any difference.
> >
> > On 2/16/19 11:57 PM, Gilles Gouaillardet wrote:
> > > Ryan,
> > >
> > > What filesystem are you running on ?
> > >
> > > Open MPI defaults to the ompio component, except on Lustre
> > > filesystem where ROMIO is used. (if the issue is related to ROMIO,
> > > that can explain why you did not see any difference, in that case,
> > > you might want to try an other filesystem (local filesystem or NFS
> > > for example)\
> > >
> > >
> > > Cheers,
> > >
> > > Gilles
> > >
> > > On Sun, Feb 17, 2019 at 3:08 AM Ryan Novosielski
> > >  wrote:
> > >>
> > >> I verified that it makes it through to a bash prompt, but I’m a
> > >> little less confident that something make test does doesn’t clear it.
> > >> Any recommendation for a way to verify?
> > >>
> > >> In any case, no change, unfortunately.
> > >>
> > >> Sent from my iPhone
> > >>
> > >>> On Feb 16, 2019, at 08:13, Gabriel, Edgar
> > >>> 
> > >>> wrote:
> > >>>
> > >>> What file system are you running on?
> > >>>
> > >>> I will look into this, but it might be later next week. I just
> > >>> wanted to emphasize that we are regularly running the parallel
> > >>> hdf5 tests with ompio, and I am not aware of any outstanding items
> > >>> that do not work (and are supposed to work). That being said, I
> > >>> run the tests manually, and not the 'make test'
> > >>> commands. Will have to check which tests are being run by that.
> > >>>
> > >>> Edgar
> > >>>
> > >>>> -Original Message- From: users
> > >>>> [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Gilles
> > >>>> Gouaillardet Sent: Saturday, February 16, 2019 1:49 AM To: Open
> > >>>> MPI Users  Subject: Re:
> > >>>> [OMPI users] HDF5 1.10.4 "make check" problems w/OpenMPI
> > >>>> 3.1.3
> > >>>>
> > >>>> Ryan,
> > >>>>
> > >>>> Can you
> > >>>>
> > >>>> export OMPI_MCA_io=^ompio
> > >>>>
> > >>>> and try again after you made sure this environment variable is
> > >>>> passed by srun to the MPI tasks ?
> > >>>>
> > >>>> We have identified and fixed several issues specific t

Re: [OMPI users] HDF5 1.10.4 "make check" problems w/OpenMPI 3.1.3

2019-02-17 Thread Gabriel, Edgar
I will also run our testsuite and the HDF5 testsuite on GPFS; I recently got 
access to a GPFS file system and will report back on that, but it will take a 
few days.

Thanks
Edgar

> -Original Message-
> From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Ryan
> Novosielski
> Sent: Sunday, February 17, 2019 2:37 AM
> To: users@lists.open-mpi.org
> Subject: Re: [OMPI users] HDF5 1.10.4 "make check" problems w/OpenMPI
> 3.1.3
> 
> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA1
> 
> This is on GPFS. I'll try it on XFS to see if it makes any difference.
> 
> On 2/16/19 11:57 PM, Gilles Gouaillardet wrote:
> > Ryan,
> >
> > What filesystem are you running on ?
> >
> > Open MPI defaults to the ompio component, except on Lustre filesystem
> > where ROMIO is used. (if the issue is related to ROMIO, that can
> > explain why you did not see any difference, in that case, you might
> > want to try an other filesystem (local filesystem or NFS for example)\
> >
> >
> > Cheers,
> >
> > Gilles
> >
> > On Sun, Feb 17, 2019 at 3:08 AM Ryan Novosielski
> >  wrote:
> >>
> >> I verified that it makes it through to a bash prompt, but I’m a
> >> little less confident that something make test does doesn’t clear it.
> >> Any recommendation for a way to verify?
> >>
> >> In any case, no change, unfortunately.
> >>
> >> Sent from my iPhone
> >>
> >>> On Feb 16, 2019, at 08:13, Gabriel, Edgar 
> >>> wrote:
> >>>
> >>> What file system are you running on?
> >>>
> >>> I will look into this, but it might be later next week. I just
> >>> wanted to emphasize that we are regularly running the parallel
> >>> hdf5 tests with ompio, and I am not aware of any outstanding items
> >>> that do not work (and are supposed to work). That being said, I run
> >>> the tests manually, and not the 'make test'
> >>> commands. Will have to check which tests are being run by that.
> >>>
> >>> Edgar
> >>>
> >>>> -Original Message- From: users
> >>>> [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Gilles
> >>>> Gouaillardet Sent: Saturday, February 16, 2019 1:49 AM To: Open MPI
> >>>> Users  Subject: Re:
> >>>> [OMPI users] HDF5 1.10.4 "make check" problems w/OpenMPI
> >>>> 3.1.3
> >>>>
> >>>> Ryan,
> >>>>
> >>>> Can you
> >>>>
> >>>> export OMPI_MCA_io=^ompio
> >>>>
> >>>> and try again after you made sure this environment variable is
> >>>> passed by srun to the MPI tasks ?
> >>>>
> >>>> We have identified and fixed several issues specific to the
> >>>> (default) ompio component, so that could be a valid workaround
> >>>> until the next release.
> >>>>
> >>>> Cheers,
> >>>>
> >>>> Gilles
> >>>>
> >>>> Ryan Novosielski  wrote:
> >>>>> Hi there,
> >>>>>
> >>>>> Honestly don’t know which piece of this puzzle to look at or how
> >>>>> to get more
> >>>> information for troubleshooting. I successfully built HDF5
> >>>> 1.10.4 with RHEL system GCC 4.8.5 and OpenMPI 3.1.3. Running the
> >>>> “make check” in HDF5 is failing at the below point; I am using a
> >>>> value of RUNPARALLEL='srun -- mpi=pmi2 -p main -t
> >>>> 1:00:00 -n6 -N1’ and have a SLURM that’s otherwise properly
> >>>> configured.
> >>>>>
> >>>>> Thanks for any help you can provide.
> >>>>>
> >>>>> make[4]: Entering directory
> >>>>> `/scratch/novosirj/install-files/hdf5-1.10.4-build-
> >>>> gcc-4.8-openmpi-3.1.3/testpar'
> >>>>>  Testing  t_mpi
> >>>>>  t_mpi  Test Log
> >>>>>  srun: job 84126610 queued and
> waiting
> >>>>> for resources srun: job 84126610 has been allocated resources
> >>>>> srun: error: slepner023: tasks 0-5: Alarm clock 0.01user
> >>>>> 0.00system 20:03.95elapsed 0%CPU (0avgtext+0avgdata
> >>>>> 5152maxresident)k 0inputs+0outputs (0major+1529minor)pagefaults
> >>>>> 0swap

Re: [OMPI users] HDF5 1.10.4 "make check" problems w/OpenMPI 3.1.3

2019-02-17 Thread Ryan Novosielski
This is on GPFS. I'll try it on XFS to see if it makes any difference.

On 2/16/19 11:57 PM, Gilles Gouaillardet wrote:
> Ryan,
> 
> What filesystem are you running on ?
> 
> Open MPI defaults to the ompio component, except on Lustre
> filesystem where ROMIO is used. (if the issue is related to ROMIO,
> that can explain why you did not see any difference, in that case,
> you might want to try an other filesystem (local filesystem or NFS
> for example)\
> 
> 
> Cheers,
> 
> Gilles
> 
> On Sun, Feb 17, 2019 at 3:08 AM Ryan Novosielski
>  wrote:
>> 
>> I verified that it makes it through to a bash prompt, but I’m a
>> little less confident that something make test does doesn’t clear
>> it. Any recommendation for a way to verify?
>> 
>> In any case, no change, unfortunately.
>> 
>> Sent from my iPhone
>> 
>>> On Feb 16, 2019, at 08:13, Gabriel, Edgar
>>>  wrote:
>>> 
>>> What file system are you running on?
>>> 
>>> I will look into this, but it might be later next week. I just
>>> wanted to emphasize that we are regularly running the parallel
>>> hdf5 tests with ompio, and I am not aware of any outstanding
>>> items that do not work (and are supposed to work). That being
>>> said, I run the tests manually, and not the 'make test'
>>> commands. Will have to check which tests are being run by
>>> that.
>>> 
>>> Edgar
>>> 
>>>> -----Original Message- From: users
>>>> [mailto:users-boun...@lists.open-mpi.org] On Behalf Of
>>>> Gilles Gouaillardet Sent: Saturday, February 16, 2019 1:49
>>>> AM To: Open MPI Users  Subject: Re:
>>>> [OMPI users] HDF5 1.10.4 "make check" problems w/OpenMPI 
>>>> 3.1.3
>>>> 
>>>> Ryan,
>>>> 
>>>> Can you
>>>> 
>>>> export OMPI_MCA_io=^ompio
>>>> 
>>>> and try again after you made sure this environment variable
>>>> is passed by srun to the MPI tasks ?
>>>> 
>>>> We have identified and fixed several issues specific to the
>>>> (default) ompio component, so that could be a valid
>>>> workaround until the next release.
>>>> 
>>>> Cheers,
>>>> 
>>>> Gilles
>>>> 
>>>> Ryan Novosielski  wrote:
>>>>> Hi there,
>>>>> 
>>>>> Honestly don’t know which piece of this puzzle to look at
>>>>> or how to get more
>>>> information for troubleshooting. I successfully built HDF5
>>>> 1.10.4 with RHEL system GCC 4.8.5 and OpenMPI 3.1.3. Running
>>>> the “make check” in HDF5 is failing at the below point; I am
>>>> using a value of RUNPARALLEL='srun -- mpi=pmi2 -p main -t
>>>> 1:00:00 -n6 -N1’ and have a SLURM that’s otherwise properly
>>>> configured.
>>>>> 
>>>>> Thanks for any help you can provide.
>>>>> 
>>>>> make[4]: Entering directory
>>>>> `/scratch/novosirj/install-files/hdf5-1.10.4-build-
>>>> gcc-4.8-openmpi-3.1.3/testpar'
>>>>>  Testing  t_mpi 
>>>>>  t_mpi  Test Log 
>>>>>  srun: job 84126610 queued and
>>>>> waiting for resources srun: job 84126610 has been allocated
>>>>> resources srun: error: slepner023: tasks 0-5: Alarm clock
>>>>> 0.01user 0.00system 20:03.95elapsed 0%CPU
>>>>> (0avgtext+0avgdata 5152maxresident)k 0inputs+0outputs
>>>>> (0major+1529minor)pagefaults 0swaps make[4]: ***
>>>>> [t_mpi.chkexe_] Error 1 make[4]: Leaving directory
>>>>> `/scratch/novosirj/install-files/hdf5-1.10.4-build-
>>>> gcc-4.8-openmpi-3.1.3/testpar'
>>>>> make[3]: *** [build-check-p] Error 1 make[3]: Leaving
>>>>> directory
>>>>> `/scratch/novosirj/install-files/hdf5-1.10.4-build-
>>>> gcc-4.8-openmpi-3.1.3/testpar'
>>>>> make[2]: *** [test] Error 2 make[2]: Leaving directory
>>>>> `/scratch/novosirj/install-files/hdf5-1.10.4-build-
>>>> gcc-4.8-openmpi-3.1.3/testpar'
>>>>> make[1]: *** [check-am] Error 2 make[1]: Leaving directory
>>>>> `/scratch/novosirj/install-files/hdf5-1.10.4-build-
>>>> gcc-4.8-openmpi-3.1.3/testpar'
>>>>> make: *** [check-recursive] Error 1
>>>>> 
>>>

Re: [OMPI users] HDF5 1.10.4 "make check" problems w/OpenMPI 3.1.3

2019-02-16 Thread Gilles Gouaillardet
Ryan,

What filesystem are you running on?

Open MPI defaults to the ompio component, except on Lustre filesystems, where
ROMIO is used. If the issue is related to ROMIO, that can explain why you did
not see any difference; in that case, you might want to try another filesystem
(a local filesystem or NFS, for example).
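
For completeness, a quick sketch of checking which io components a build actually provides and forcing the choice for a single run (the ^ompio form is the same workaround suggested earlier in the thread):

  ompi_info | grep ' io:'          # lists the available io components (ompio, romio, ...)
  export OMPI_MCA_io=^ompio        # exclude ompio so ROMIO gets selected
  srun --mpi=pmi2 -n6 -N1 ./t_mpi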


Cheers,

Gilles

On Sun, Feb 17, 2019 at 3:08 AM Ryan Novosielski  wrote:
>
> I verified that it makes it through to a bash prompt, but I’m a little less 
> confident that something make test does doesn’t clear it. Any recommendation 
> for a way to verify?
>
> In any case, no change, unfortunately.
>
> Sent from my iPhone
>
> > On Feb 16, 2019, at 08:13, Gabriel, Edgar  wrote:
> >
> > What file system are you running on?
> >
> > I will look into this, but it might be later next week. I just wanted to 
> > emphasize that we are regularly running the parallel hdf5 tests with ompio, 
> > and I am not aware of any outstanding items that do not work (and are 
> > supposed to work). That being said, I run the tests manually, and not the 
> > 'make test' commands. Will have to check which tests are being run by that.
> >
> > Edgar
> >
> >> -Original Message-
> >> From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Gilles
> >> Gouaillardet
> >> Sent: Saturday, February 16, 2019 1:49 AM
> >> To: Open MPI Users 
> >> Subject: Re: [OMPI users] HDF5 1.10.4 "make check" problems w/OpenMPI
> >> 3.1.3
> >>
> >> Ryan,
> >>
> >> Can you
> >>
> >> export OMPI_MCA_io=^ompio
> >>
> >> and try again after you made sure this environment variable is passed by 
> >> srun
> >> to the MPI tasks ?
> >>
> >> We have identified and fixed several issues specific to the (default) ompio
> >> component, so that could be a valid workaround until the next release.
> >>
> >> Cheers,
> >>
> >> Gilles
> >>
> >> Ryan Novosielski  wrote:
> >>> Hi there,
> >>>
> >>> Honestly don’t know which piece of this puzzle to look at or how to get 
> >>> more
> >> information for troubleshooting. I successfully built HDF5 1.10.4 with RHEL
> >> system GCC 4.8.5 and OpenMPI 3.1.3. Running the “make check” in HDF5 is
> >> failing at the below point; I am using a value of RUNPARALLEL='srun --
> >> mpi=pmi2 -p main -t 1:00:00 -n6 -N1’ and have a SLURM that’s otherwise
> >> properly configured.
> >>>
> >>> Thanks for any help you can provide.
> >>>
> >>> make[4]: Entering directory 
> >>> `/scratch/novosirj/install-files/hdf5-1.10.4-build-
> >> gcc-4.8-openmpi-3.1.3/testpar'
> >>> 
> >>> Testing  t_mpi
> >>> 
> >>> t_mpi  Test Log
> >>> 
> >>> srun: job 84126610 queued and waiting for resources
> >>> srun: job 84126610 has been allocated resources
> >>> srun: error: slepner023: tasks 0-5: Alarm clock 0.01user 0.00system
> >>> 20:03.95elapsed 0%CPU (0avgtext+0avgdata 5152maxresident)k
> >>> 0inputs+0outputs (0major+1529minor)pagefaults 0swaps
> >>> make[4]: *** [t_mpi.chkexe_] Error 1
> >>> make[4]: Leaving directory
> >>> `/scratch/novosirj/install-files/hdf5-1.10.4-build-gcc-4.8-openmpi-3.1.3/testpar'
> >>> make[3]: *** [build-check-p] Error 1
> >>> make[3]: Leaving directory
> >>> `/scratch/novosirj/install-files/hdf5-1.10.4-build-gcc-4.8-openmpi-3.1.3/testpar'
> >>> make[2]: *** [test] Error 2
> >>> make[2]: Leaving directory
> >>> `/scratch/novosirj/install-files/hdf5-1.10.4-build-gcc-4.8-openmpi-3.1.3/testpar'
> >>> make[1]: *** [check-am] Error 2
> >>> make[1]: Leaving directory
> >>> `/scratch/novosirj/install-files/hdf5-1.10.4-build-gcc-4.8-openmpi-3.1.3/testpar'
> >>> make: *** [check-recursive] Error 1
> >>>
> >>> --
> >>>
> >>> || \\UTGERS,     |---*O*---
> >>> ||_// the State  | Ryan Novosielski - novos...@rutgers.edu
> >>> || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
> >>> ||  \\of NJ      | Office of Advanced Research Computing - MSB C630, Newark
> >>>      `'
> >> ___
> >> users mailing list
> >> users@lists.open-mpi.org
> >> https://lists.open-mpi.org/mailman/listinfo/users
> > ___
> > users mailing list
> > users@lists.open-mpi.org
> > https://lists.open-mpi.org/mailman/listinfo/users
> ___
> users mailing list
> users@lists.open-mpi.org
> https://lists.open-mpi.org/mailman/listinfo/users
___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users

Re: [OMPI users] HDF5 1.10.4 "make check" problems w/OpenMPI 3.1.3

2019-02-16 Thread Ryan Novosielski
I verified that it makes it through to a bash prompt, but I’m a little less 
confident that nothing 'make test' does clears it along the way. Any 
recommendation for a way to verify?
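
(For instance, would a quick check along these lines be reasonable? Just
a sketch, assuming srun's default behaviour of exporting the caller's
environment to the tasks:

  export OMPI_MCA_io=^ompio
  srun --mpi=pmi2 -p main -t 0:05:00 -n6 -N1 printenv OMPI_MCA_io

If all six tasks print "^ompio", the variable is at least reaching them.)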

In any case, no change, unfortunately. 

Sent from my iPhone

> On Feb 16, 2019, at 08:13, Gabriel, Edgar  wrote:
> 
> What file system are you running on?
> 
> I will look into this, but it might be later next week. I just wanted to 
> emphasize that we are regularly running the parallel hdf5 tests with ompio, 
> and I am not aware of any outstanding items that do not work (and are 
> supposed to work). That being said, I run the tests manually, and not the 
> 'make test' commands. Will have to check which tests are being run by that.
> 
> Edgar
> 
>> -Original Message-
>> From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Gilles
>> Gouaillardet
>> Sent: Saturday, February 16, 2019 1:49 AM
>> To: Open MPI Users 
>> Subject: Re: [OMPI users] HDF5 1.10.4 "make check" problems w/OpenMPI
>> 3.1.3
>> 
>> Ryan,
>> 
>> Can you
>> 
>> export OMPI_MCA_io=^ompio
>> 
>> and try again after you made sure this environment variable is passed by srun
>> to the MPI tasks ?
>> 
>> We have identified and fixed several issues specific to the (default) ompio
>> component, so that could be a valid workaround until the next release.
>> 
>> Cheers,
>> 
>> Gilles
>> 
>> Ryan Novosielski  wrote:
>>> Hi there,
>>> 
>>> Honestly don’t know which piece of this puzzle to look at or how to get more
>>> information for troubleshooting. I successfully built HDF5 1.10.4 with RHEL
>>> system GCC 4.8.5 and OpenMPI 3.1.3. Running the “make check” in HDF5 is
>>> failing at the below point; I am using a value of
>>> RUNPARALLEL='srun --mpi=pmi2 -p main -t 1:00:00 -n6 -N1’ and have a SLURM
>>> that’s otherwise properly configured.
>>> 
>>> Thanks for any help you can provide.
>>> 
>>> make[4]: Entering directory
>>> `/scratch/novosirj/install-files/hdf5-1.10.4-build-gcc-4.8-openmpi-3.1.3/testpar'
>>> 
>>> Testing  t_mpi
>>> 
>>> t_mpi  Test Log
>>> 
>>> srun: job 84126610 queued and waiting for resources
>>> srun: job 84126610 has been allocated resources
>>> srun: error: slepner023: tasks 0-5: Alarm clock 0.01user 0.00system
>>> 20:03.95elapsed 0%CPU (0avgtext+0avgdata 5152maxresident)k
>>> 0inputs+0outputs (0major+1529minor)pagefaults 0swaps
>>> make[4]: *** [t_mpi.chkexe_] Error 1
>>> make[4]: Leaving directory
>>> `/scratch/novosirj/install-files/hdf5-1.10.4-build-gcc-4.8-openmpi-3.1.3/testpar'
>>> make[3]: *** [build-check-p] Error 1
>>> make[3]: Leaving directory
>>> `/scratch/novosirj/install-files/hdf5-1.10.4-build-gcc-4.8-openmpi-3.1.3/testpar'
>>> make[2]: *** [test] Error 2
>>> make[2]: Leaving directory
>>> `/scratch/novosirj/install-files/hdf5-1.10.4-build-gcc-4.8-openmpi-3.1.3/testpar'
>>> make[1]: *** [check-am] Error 2
>>> make[1]: Leaving directory
>>> `/scratch/novosirj/install-files/hdf5-1.10.4-build-gcc-4.8-openmpi-3.1.3/testpar'
>>> make: *** [check-recursive] Error 1
>>> 
>>> --
>>>
>>> || \\UTGERS,     |---*O*---
>>> ||_// the State  | Ryan Novosielski - novos...@rutgers.edu
>>> || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
>>> ||  \\of NJ      | Office of Advanced Research Computing - MSB C630, Newark
>>>      `'
>> ___
>> users mailing list
>> users@lists.open-mpi.org
>> https://lists.open-mpi.org/mailman/listinfo/users
> ___
> users mailing list
> users@lists.open-mpi.org
> https://lists.open-mpi.org/mailman/listinfo/users
___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users

Re: [OMPI users] HDF5 1.10.4 "make check" problems w/OpenMPI 3.1.3

2019-02-16 Thread Gabriel, Edgar
What file system are you running on?

I will look into this, but it might be later next week. I just wanted to 
emphasize that we regularly run the parallel HDF5 tests with ompio, and I am 
not aware of any outstanding items that do not work (and are supposed to 
work). That being said, I run the tests manually rather than through the 
'make test' target; I will have to check which tests are being run by that.
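
In your tree that would be something along these lines (a sketch only;
adjust the srun options to whatever your site needs, and note this is not
the exact sequence that 'make check' drives):

  cd /scratch/novosirj/install-files/hdf5-1.10.4-build-gcc-4.8-openmpi-3.1.3/testpar
  srun --mpi=pmi2 -p main -t 1:00:00 -n6 -N1 ./t_mpi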

Edgar

> -Original Message-
> From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Gilles
> Gouaillardet
> Sent: Saturday, February 16, 2019 1:49 AM
> To: Open MPI Users 
> Subject: Re: [OMPI users] HDF5 1.10.4 "make check" problems w/OpenMPI
> 3.1.3
> 
> Ryan,
> 
> Can you
> 
> export OMPI_MCA_io=^ompio
> 
> and try again after you made sure this environment variable is passed by srun
> to the MPI tasks ?
> 
> We have identified and fixed several issues specific to the (default) ompio
> component, so that could be a valid workaround until the next release.
> 
> Cheers,
> 
> Gilles
> 
> Ryan Novosielski  wrote:
> >Hi there,
> >
> >Honestly don’t know which piece of this puzzle to look at or how to get more
> >information for troubleshooting. I successfully built HDF5 1.10.4 with RHEL
> >system GCC 4.8.5 and OpenMPI 3.1.3. Running the “make check” in HDF5 is
> >failing at the below point; I am using a value of
> >RUNPARALLEL='srun --mpi=pmi2 -p main -t 1:00:00 -n6 -N1’ and have a SLURM
> >that’s otherwise properly configured.
> >
> >Thanks for any help you can provide.
> >
> >make[4]: Entering directory
> >`/scratch/novosirj/install-files/hdf5-1.10.4-build-gcc-4.8-openmpi-3.1.3/testpar'
> >
> >Testing  t_mpi
> >
> >t_mpi  Test Log
> >
> >srun: job 84126610 queued and waiting for resources
> >srun: job 84126610 has been allocated resources
> >srun: error: slepner023: tasks 0-5: Alarm clock 0.01user 0.00system
> >20:03.95elapsed 0%CPU (0avgtext+0avgdata 5152maxresident)k
> >0inputs+0outputs (0major+1529minor)pagefaults 0swaps
> >make[4]: *** [t_mpi.chkexe_] Error 1
> >make[4]: Leaving directory
> >`/scratch/novosirj/install-files/hdf5-1.10.4-build-gcc-4.8-openmpi-3.1.3/testpar'
> >make[3]: *** [build-check-p] Error 1
> >make[3]: Leaving directory
> >`/scratch/novosirj/install-files/hdf5-1.10.4-build-gcc-4.8-openmpi-3.1.3/testpar'
> >make[2]: *** [test] Error 2
> >make[2]: Leaving directory
> >`/scratch/novosirj/install-files/hdf5-1.10.4-build-gcc-4.8-openmpi-3.1.3/testpar'
> >make[1]: *** [check-am] Error 2
> >make[1]: Leaving directory
> >`/scratch/novosirj/install-files/hdf5-1.10.4-build-gcc-4.8-openmpi-3.1.3/testpar'
> >make: *** [check-recursive] Error 1
> >
> >--
> >
> >|| \\UTGERS,     |---*O*---
> >||_// the State  | Ryan Novosielski - novos...@rutgers.edu
> >|| \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
> >||  \\of NJ      | Office of Advanced Research Computing - MSB C630, Newark
> >     `'
> ___
> users mailing list
> users@lists.open-mpi.org
> https://lists.open-mpi.org/mailman/listinfo/users
___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users

Re: [OMPI users] HDF5 1.10.4 "make check" problems w/OpenMPI 3.1.3

2019-02-15 Thread Gilles Gouaillardet
Ryan,

Can you

export OMPI_MCA_io=^ompio

and try again after you made sure this environment variable is passed by srun 
to the MPI tasks ?
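
For example (a sketch; srun normally forwards the caller's environment,
but an explicit --export makes that unambiguous):

  export OMPI_MCA_io=^ompio
  make check RUNPARALLEL='srun --export=ALL --mpi=pmi2 -p main -t 1:00:00 -n6 -N1'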

We have identified and fixed several issues specific to the (default) ompio 
component, so that could be a valid workaround until the next release.

Cheers,

Gilles

Ryan Novosielski  wrote:
>Hi there,
>
>Honestly don’t know which piece of this puzzle to look at or how to get more 
>information for troubleshooting. I successfully built HDF5 1.10.4 with RHEL 
>system GCC 4.8.5 and OpenMPI 3.1.3. Running the “make check” in HDF5 is 
>failing at the below point; I am using a value of RUNPARALLEL='srun --mpi=pmi2 
>-p main -t 1:00:00 -n6 -N1’ and have a SLURM that’s otherwise properly 
>configured.
>
>Thanks for any help you can provide.
>
>make[4]: Entering directory 
>`/scratch/novosirj/install-files/hdf5-1.10.4-build-gcc-4.8-openmpi-3.1.3/testpar'
>
>Testing  t_mpi
>
>t_mpi  Test Log
>
>srun: job 84126610 queued and waiting for resources
>srun: job 84126610 has been allocated resources
>srun: error: slepner023: tasks 0-5: Alarm clock
>0.01user 0.00system 20:03.95elapsed 0%CPU (0avgtext+0avgdata 5152maxresident)k
>0inputs+0outputs (0major+1529minor)pagefaults 0swaps
>make[4]: *** [t_mpi.chkexe_] Error 1
>make[4]: Leaving directory 
>`/scratch/novosirj/install-files/hdf5-1.10.4-build-gcc-4.8-openmpi-3.1.3/testpar'
>make[3]: *** [build-check-p] Error 1
>make[3]: Leaving directory 
>`/scratch/novosirj/install-files/hdf5-1.10.4-build-gcc-4.8-openmpi-3.1.3/testpar'
>make[2]: *** [test] Error 2
>make[2]: Leaving directory 
>`/scratch/novosirj/install-files/hdf5-1.10.4-build-gcc-4.8-openmpi-3.1.3/testpar'
>make[1]: *** [check-am] Error 2
>make[1]: Leaving directory 
>`/scratch/novosirj/install-files/hdf5-1.10.4-build-gcc-4.8-openmpi-3.1.3/testpar'
>make: *** [check-recursive] Error 1
>
>--
>
>|| \\UTGERS,|---*O*---
>||_// the State | Ryan Novosielski - novos...@rutgers.edu
>|| \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
>||  \\of NJ | Office of Advanced Research Computing - MSB C630, Newark
>   `'
___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users

[OMPI users] HDF5 1.10.4 "make check" problems w/OpenMPI 3.1.3

2019-02-15 Thread Ryan Novosielski
Hi there,

I honestly don’t know which piece of this puzzle to look at, or how to get more 
information for troubleshooting. I successfully built HDF5 1.10.4 with the RHEL 
system GCC 4.8.5 and OpenMPI 3.1.3. Running “make check” in HDF5 fails at the 
point below; I am using a value of RUNPARALLEL='srun --mpi=pmi2 -p main 
-t 1:00:00 -n6 -N1’ and have a Slurm setup that is otherwise properly configured.
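
For reference, the build and check sequence was roughly along these lines (a
sketch showing only the flag that matters for parallel HDF5; other configure
options are omitted and may have differed, and RUNPARALLEL can be given either
at configure time or on the make command line):

  CC=mpicc ./configure --enable-parallel
  make
  make check RUNPARALLEL='srun --mpi=pmi2 -p main -t 1:00:00 -n6 -N1'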

Thanks for any help you can provide.

make[4]: Entering directory 
`/scratch/novosirj/install-files/hdf5-1.10.4-build-gcc-4.8-openmpi-3.1.3/testpar'

Testing  t_mpi

t_mpi  Test Log

srun: job 84126610 queued and waiting for resources
srun: job 84126610 has been allocated resources
srun: error: slepner023: tasks 0-5: Alarm clock
0.01user 0.00system 20:03.95elapsed 0%CPU (0avgtext+0avgdata 5152maxresident)k
0inputs+0outputs (0major+1529minor)pagefaults 0swaps
make[4]: *** [t_mpi.chkexe_] Error 1
make[4]: Leaving directory 
`/scratch/novosirj/install-files/hdf5-1.10.4-build-gcc-4.8-openmpi-3.1.3/testpar'
make[3]: *** [build-check-p] Error 1
make[3]: Leaving directory 
`/scratch/novosirj/install-files/hdf5-1.10.4-build-gcc-4.8-openmpi-3.1.3/testpar'
make[2]: *** [test] Error 2
make[2]: Leaving directory 
`/scratch/novosirj/install-files/hdf5-1.10.4-build-gcc-4.8-openmpi-3.1.3/testpar'
make[1]: *** [check-am] Error 2
make[1]: Leaving directory 
`/scratch/novosirj/install-files/hdf5-1.10.4-build-gcc-4.8-openmpi-3.1.3/testpar'
make: *** [check-recursive] Error 1

--

|| \\UTGERS, |---*O*---
||_// the State  | Ryan Novosielski - novos...@rutgers.edu
|| \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
||  \\of NJ  | Office of Advanced Research Computing - MSB C630, Newark
   `'


signature.asc
Description: Message signed with OpenPGP
___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users