As a check of mpiP, I ran HDF5 testpar/t_bigio under it.  This was on
one node with four ranks (interactively), on Lustre with its default of
one 1MB stripe, using ompi-4.0.5 + ucx-1.9, hdf5-1.10.7, and MCA defaults.

I don't know how useful it is, but here's the summary:

romio:

  @--- Aggregate Time (top twenty, descending, milliseconds) ----------------
  ---------------------------------------------------------------------------
  Call                 Site       Time    App%    MPI%      Count    COV
  File_write_at_all      26   2.58e+04   47.50   50.24         16   0.00
  File_read_at_all       14   2.42e+04   44.47   47.03         16   0.00
  File_set_view          29        515    0.95    1.00         16   0.09
  File_set_view           3        382    0.70    0.74         16   0.00

ompio:

  @--- Aggregate Time (top twenty, descending, milliseconds) ----------------
  ---------------------------------------------------------------------------
  Call                 Site       Time    App%    MPI%      Count    COV
  File_read_at_all       14   3.32e+06   82.83   82.90         16   0.00
  File_write_at_all      26   6.72e+05   16.77   16.78         16   0.02
  File_set_view          11   1.14e+04    0.28    0.28         16   0.91
  File_set_view          29        340    0.01    0.01         16   0.35

with call sites

   ID Lev File/Address        Line Parent_Funct                    MPI_Call

   11   0 H5FDmpio.c          1651 H5FD_mpio_write                 File_set_view
   14   0 H5FDmpio.c          1436 H5FD_mpio_read                  File_read_at_all
   26   0 H5FDmpio.c          1636 H5FD_mpio_write                 File_write_at_all
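
To put a number on the difference, here is a rough sketch that compares
the aggregate times for the calls and sites that appear in both tables.
The rows are copied from the mpiP output above; the helper names
(parse_rows, slowdown) are made up for illustration and are not part of
mpiP.

```python
# Rows copied from the two mpiP "Aggregate Time" tables above
# (columns: Call, Site, Time[ms], App%, MPI%, Count, COV).
ROMIO = """\
File_write_at_all      26   2.58e+04   47.50   50.24         16   0.00
File_read_at_all       14   2.42e+04   44.47   47.03         16   0.00
File_set_view          29        515    0.95    1.00         16   0.09
File_set_view           3        382    0.70    0.74         16   0.00
"""

OMPIO = """\
File_read_at_all       14   3.32e+06   82.83   82.90         16   0.00
File_write_at_all      26   6.72e+05   16.77   16.78         16   0.02
File_set_view          11   1.14e+04    0.28    0.28         16   0.91
File_set_view          29        340    0.01    0.01         16   0.35
"""


def parse_rows(text):
    """Map (call, site) -> aggregate time in ms (hypothetical helper)."""
    times = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 7:
            continue  # skip anything that is not a data row
        call, site, time_ms = fields[0], int(fields[1]), float(fields[2])
        times[(call, site)] = time_ms
    return times


def slowdown(romio, ompio):
    """Ratio ompio/romio for every (call, site) present in both runs."""
    return {key: ompio[key] / romio[key] for key in romio if key in ompio}


ratios = slowdown(parse_rows(ROMIO), parse_rows(OMPIO))
for (call, site), ratio in sorted(ratios.items()):
    print(f"{call:<20} site {site:>3}  ompio/romio = {ratio:.2f}")
```

On these figures the collective read at site 14 takes roughly 137 times
longer under ompio than under romio, and the collective write at site 26
roughly 26 times longer.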

I also looked at the romio hang in testphdf5.  In the absence of a
parallel debugger, strace shows an endless loop of read(...,"",0), and
the backtrace printed on kill puts it under this:

  [login2:115045] [ 2] .../mca_io_romio321.so(ADIOI_LUSTRE_ReadContig+0xa8)[0x20003d1cab88]
  [login2:115045] [ 3] .../mca_io_romio321.so(ADIOI_GEN_ReadStrided+0x528)[0x20003d1e4f08]
  [login2:115045] [ 4] .../mca_io_romio321.so(ADIOI_GEN_ReadStridedColl+0x1084)[0x20003d1e4514]
  [login2:115045] [ 5] .../mca_io_romio321.so(MPIOI_File_read_all+0x124)[0x20003d1c37c4]
  [login2:115045] [ 6] .../mca_io_romio321.so(mca_io_romio_dist_MPI_File_read_at_all+0x34)[0x20003d1c41d4]
  [login2:115045] [ 7] .../mca_io_romio321.so(mca_io_romio321_file_read_at_all+0x3c)[0x20003d1bdabc]
  [login2:115045] [ 8] .../libmpi.so.40(PMPI_File_read_at_all+0x13c)[0x20000078de4c]
