As a check of mpiP, I ran HDF5 testpar/t_bigio under it. This was on one node with four ranks (interactively) on Lustre with its default of a single 1 MB stripe, using ompi-4.0.5 + ucx-1.9 and hdf5-1.10.7 with MCA defaults.
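For reference, this is roughly how such a run can be launched. It is only a sketch: it assumes mpiP is picked up via LD_PRELOAD rather than linked in, and the libmpiP.so path and test location below are placeholders, not the exact invocation used.

  # Run the test under mpiP once per io component
  # (requires mpiP built as a shared library; path is a placeholder).
  cd hdf5-1.10.7/testpar
  mpirun -n 4 -x LD_PRELOAD=/path/to/libmpiP.so --mca io romio321 ./t_bigio
  mpirun -n 4 -x LD_PRELOAD=/path/to/libmpiP.so --mca io ompio    ./t_bigio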
I don't know how useful it is, but here's the summary:

romio:

@--- Aggregate Time (top twenty, descending, milliseconds) ----------------
---------------------------------------------------------------------------
Call                 Site       Time    App%    MPI%     Count  COV
File_write_at_all      26   2.58e+04   47.50   50.24        16 0.00
File_read_at_all       14   2.42e+04   44.47   47.03        16 0.00
File_set_view          29        515    0.95    1.00        16 0.09
File_set_view           3        382    0.70    0.74        16 0.00

ompio:

@--- Aggregate Time (top twenty, descending, milliseconds) ----------------
---------------------------------------------------------------------------
Call                 Site       Time    App%    MPI%     Count  COV
File_read_at_all       14   3.32e+06   82.83   82.90        16 0.00
File_write_at_all      26   6.72e+05   16.77   16.78        16 0.02
File_set_view          11   1.14e+04    0.28    0.28        16 0.91
File_set_view          29        340    0.01    0.01        16 0.35

with call sites:

ID Lev File/Address        Line Parent_Funct      MPI_Call
11   0 H5FDmpio.c          1651 H5FD_mpio_write   File_set_view
14   0 H5FDmpio.c          1436 H5FD_mpio_read    File_read_at_all
26   0 H5FDmpio.c          1636 H5FD_mpio_write   File_write_at_all

I also looked at the romio hang in testphdf5. In the absence of a parallel debugger, strace and kill show an endless loop of read(..., "", 0) under this:

[login2:115045] [ 2] .../mca_io_romio321.so(ADIOI_LUSTRE_ReadContig+0xa8)[0x20003d1cab88]
[login2:115045] [ 3] .../mca_io_romio321.so(ADIOI_GEN_ReadStrided+0x528)[0x20003d1e4f08]
[login2:115045] [ 4] .../mca_io_romio321.so(ADIOI_GEN_ReadStridedColl+0x1084)[0x20003d1e4514]
[login2:115045] [ 5] .../mca_io_romio321.so(MPIOI_File_read_all+0x124)[0x20003d1c37c4]
[login2:115045] [ 6] .../mca_io_romio321.so(mca_io_romio_dist_MPI_File_read_at_all+0x34)[0x20003d1c41d4]
[login2:115045] [ 7] .../mca_io_romio321.so(mca_io_romio321_file_read_at_all+0x3c)[0x20003d1bdabc]
[login2:115045] [ 8] .../libmpi.so.40(PMPI_File_read_at_all+0x13c)[0x20000078de4c]
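In case anyone wants to reproduce the diagnosis, this is the kind of thing I mean by "strace and kill"; the pid is whatever the stuck rank's process reports (115045 in the trace above), and getting the backtrace assumes Open MPI's default signal handling is still in effect so that killing the rank prints the frames quoted above.

  # Attach to a spinning rank and watch the zero-byte read() loop.
  strace -f -e trace=read -p 115045

  # Make the rank dump a backtrace via Open MPI's signal handler
  # (assumes the default opal signal handling has not been disabled).
  kill -ABRT 115045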