Edgar,

t_pflush1 does not call MPI_Finalize(); that is why there is an error message regardless of whether ompio or romio is used.

I naively tried to add a call to MPI_Finalize(), but that causes the program to hang.
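
For what it's worth, here is a minimal sketch of that pattern (not the
actual t_pflush1 source, just an illustration): a program that returns
from main() without calling MPI_Finalize() produces exactly this kind of
error message from the launcher, even though the test body itself passed.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        /* ... test body runs and reports that it passed ... */
        printf("PASSED\n");

        /* returning without MPI_Finalize(): mpirun/srun will complain
         * about abnormal termination even though the test succeeded */
        return 0;
    }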


Cheers,


Gilles

On 2/19/2019 2:23 AM, Gabriel, Edgar wrote:
While I was working on something else, I let the tests run with Open MPI master 
(which, for parallel I/O, is equivalent to the upcoming v4.0.1 release), and 
here is what I found for the HDF5 1.10.4 tests on my local desktop:

In the testpar directory, there is in fact one test that fails for both ompio 
and romio321 in exactly the same manner.
I used 6 processes as you did (although I used mpirun directly instead of 
srun...). Of the 13 tests in the testpar directory, 12 pass correctly 
(t_bigio, t_cache, t_cache_image, testphdf5, t_filters_parallel, t_init_term, 
t_mpi, t_pflush2, t_pread, t_prestart, t_pshutdown, t_shapesame).

The one test that officially fails (t_pflush1) actually reports that it 
passed, but then throws a message indicating that MPI_Abort has been called, 
for both ompio and romio. I will try to investigate this test to see what is 
going on.

That being said, your report shows an issue in t_mpi, which passes without 
problems for me. This was, however, not on GPFS; it was a local XFS file system. 
Running the tests on GPFS is on my todo list as well.

Thanks
Edgar



-----Original Message-----
From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Gabriel, Edgar
Sent: Sunday, February 17, 2019 10:34 AM
To: Open MPI Users <users@lists.open-mpi.org>
Subject: Re: [OMPI users] HDF5 1.10.4 "make check" problems w/OpenMPI 3.1.3

I will also run our testsuite and the HDF5 testsuite on GPFS; I recently got 
access to a GPFS file system, and will report back on that, but it will take a 
few days.

Thanks
Edgar

-----Original Message-----
From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Ryan Novosielski
Sent: Sunday, February 17, 2019 2:37 AM
To: users@lists.open-mpi.org
Subject: Re: [OMPI users] HDF5 1.10.4 "make check" problems w/OpenMPI 3.1.3

This is on GPFS. I'll try it on XFS to see if it makes any difference.

On 2/16/19 11:57 PM, Gilles Gouaillardet wrote:
Ryan,

What filesystem are you running on?

Open MPI defaults to the ompio component, except on Lustre
filesystems, where ROMIO is used. If the issue is related to ROMIO,
that could explain why you did not see any difference; in that case,
you might want to try another filesystem (a local filesystem or NFS,
for example).


Cheers,

Gilles

On Sun, Feb 17, 2019 at 3:08 AM Ryan Novosielski <novos...@rutgers.edu> wrote:
I verified that it makes it through to a bash prompt, but I’m a
little less confident that nothing 'make test' does clears it.
Any recommendation for a way to verify?

In any case, no change, unfortunately.

Sent from my iPhone

On Feb 16, 2019, at 08:13, Gabriel, Edgar <egabr...@central.uh.edu> wrote:

What file system are you running on?

I will look into this, but it might be later next week. I just
wanted to emphasize that we regularly run the parallel HDF5 tests
with ompio, and I am not aware of any outstanding items that do not
work (and are supposed to work). That being said, I run the tests
manually rather than through the 'make test' commands, so I will
have to check which tests are run by that.

Edgar

-----Original Message-----
From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Gilles Gouaillardet
Sent: Saturday, February 16, 2019 1:49 AM
To: Open MPI Users <users@lists.open-mpi.org>
Subject: Re: [OMPI users] HDF5 1.10.4 "make check" problems w/OpenMPI 3.1.3

Ryan,

Can you

export OMPI_MCA_io=^ompio

and try again after making sure this environment variable is
passed by srun to the MPI tasks?
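
If in doubt, a quick way to check is a tiny MPI program (a throwaway
sketch, not part of the HDF5 suite) that prints the variable from every
task; each rank should report ^ompio when launched the same way the
tests are launched (srun --mpi=pmi2 ...):

    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        int rank;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* print what each MPI task sees for the MCA override */
        const char *io = getenv("OMPI_MCA_io");
        printf("rank %d: OMPI_MCA_io=%s\n", rank, io ? io : "(unset)");

        MPI_Finalize();
        return 0;
    }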

We have identified and fixed several issues specific to the
(default) ompio component, so that could be a valid workaround
until the next release.

Cheers,

Gilles

Ryan Novosielski <novos...@rutgers.edu> wrote:
Hi there,

Honestly, I don’t know which piece of this puzzle to look at or how
to get more information for troubleshooting. I successfully built HDF5
1.10.4 with the RHEL system GCC 4.8.5 and OpenMPI 3.1.3. Running
“make check” in HDF5 fails at the point below; I am using a
value of RUNPARALLEL='srun --mpi=pmi2 -p main -t
1:00:00 -n6 -N1' and have a SLURM setup that’s otherwise properly
configured.
Thanks for any help you can provide.

make[4]: Entering directory `/scratch/novosirj/install-files/hdf5-1.10.4-build-gcc-4.8-openmpi-3.1.3/testpar'
============================
Testing  t_mpi
============================
t_mpi  Test Log
============================
srun: job 84126610 queued and waiting for resources
srun: job 84126610 has been allocated resources
srun: error: slepner023: tasks 0-5: Alarm clock
0.01user 0.00system 20:03.95elapsed 0%CPU (0avgtext+0avgdata 5152maxresident)k
0inputs+0outputs (0major+1529minor)pagefaults 0swaps
make[4]: *** [t_mpi.chkexe_] Error 1
make[4]: Leaving directory `/scratch/novosirj/install-files/hdf5-1.10.4-build-gcc-4.8-openmpi-3.1.3/testpar'
make[3]: *** [build-check-p] Error 1
make[3]: Leaving directory `/scratch/novosirj/install-files/hdf5-1.10.4-build-gcc-4.8-openmpi-3.1.3/testpar'
make[2]: *** [test] Error 2
make[2]: Leaving directory `/scratch/novosirj/install-files/hdf5-1.10.4-build-gcc-4.8-openmpi-3.1.3/testpar'
make[1]: *** [check-am] Error 2
make[1]: Leaving directory `/scratch/novosirj/install-files/hdf5-1.10.4-build-gcc-4.8-openmpi-3.1.3/testpar'
make: *** [check-recursive] Error 1


--
  ____
  || \\UTGERS,     |----------------------*O*------------------------
  ||_// the State  |    Ryan Novosielski - novos...@rutgers.edu
  || \\ University | Sr. Technologist - 973/972.0922 ~*~ RBHS Campus
  ||  \\    of NJ  | Office of Advanced Res. Comp. - MSB C630, Newark
       `'
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users
