Mark Dixon via users <users@lists.open-mpi.org> writes:

> Surely I cannot be the only one who cares about using a recent openmpi
> with hdf5 on lustre?

I generally have similar concerns.  I dug out the romio tests, on the
assumption that something more basic would be useful, and ran them with
Open MPI 4.0.5 + UCX on Mark's Lustre system (similar to a few nodes of
Summit apart from the filesystem, but with quad-rail IB, which doesn't
give the bandwidth I expected).

The perf test says romio performs a bit better, most clearly for reads
after file sync.  Also, judging by overall time, it's faster on IMB-IO
(which I haven't looked at in detail, and which I ran with suboptimal
striping).

  Test: perf
  romio321
    Access size per process = 4194304 bytes, ntimes = 5
    Write bandwidth without file sync = 19317.372354 Mbytes/sec
    Read bandwidth without prior file sync = 35033.325451 Mbytes/sec
    Write bandwidth including file sync = 1081.096713 Mbytes/sec
    Read bandwidth after file sync = 47135.349155 Mbytes/sec
  ompio
    Access size per process = 4194304 bytes, ntimes = 5
    Write bandwidth without file sync = 18442.698536 Mbytes/sec
    Read bandwidth without prior file sync = 31958.198676 Mbytes/sec
    Write bandwidth including file sync = 1081.058583 Mbytes/sec
    Read bandwidth after file sync = 31506.854710 Mbytes/sec
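
To put rough numbers on the comparison, here is a minimal sketch that
computes how far romio321 is ahead of ompio on each of the four perf
measurements.  The bandwidths are copied from the output above; the
dictionary keys and the helper name romio_advantage are only
illustrative, not anything from the test suite.  This is a single run,
so small differences may well be noise.

```python
# Bandwidths in Mbytes/sec, copied from the perf output above.
# The short labels are illustrative names, not output of the perf test.
romio321 = {
    "write, no sync": 19317.372354,
    "read, no prior sync": 35033.325451,
    "write, incl. sync": 1081.096713,
    "read, after sync": 47135.349155,
}
ompio = {
    "write, no sync": 18442.698536,
    "read, no prior sync": 31958.198676,
    "write, incl. sync": 1081.058583,
    "read, after sync": 31506.854710,
}


def romio_advantage(romio, other):
    """Return {measurement: percent by which romio exceeds other}."""
    return {name: 100.0 * (romio[name] / other[name] - 1.0) for name in romio}


advantage = romio_advantage(romio321, ompio)
for name, pct in advantage.items():
    print(f"{name:20s} romio321 vs ompio: {pct:+6.1f}%")
```

On these figures the write bandwidth including file sync is essentially
identical for the two, and the only large gap is in reads after file
sync.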

However, coll_perf segfaults with romio as follows, whereas it runs
with ompio.  Isn't there any MPI-IO regression testing?

  [gpu025:89063:0:89063] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x1fffbc000010)
  ==== backtrace (tid:  89063) ====
   0 0x000000000005453c ucs_debug_print_backtrace()  /tmp/***/spack-stage/spack-stage-ucx-1.9.0-wqtizxmjw66cklwpuq3zcrae2g33b6el/spack-src/src/ucs/debug/debug.c:656
   1 0x0000000000041b04 ucp_rndv_pack_data()  /tmp/***/spack-stage/spack-stage-ucx-1.9.0-wqtizxmjw66cklwpuq3zcrae2g33b6el/spack-src/src/ucp/tag/rndv.c:1335
   2 0x000000000001c814 uct_self_ep_am_bcopy()  /tmp/***/spack-stage/spack-stage-ucx-1.9.0-wqtizxmjw66cklwpuq3zcrae2g33b6el/spack-src/src/uct/sm/self/self.c:278
   3 0x000000000003f7ac uct_ep_am_bcopy()  /tmp/***/spack-stage/spack-stage-ucx-1.9.0-wqtizxmjw66cklwpuq3zcrae2g33b6el/spack-src/src/uct/api/uct.h:2561
   4 0x000000000003f7ac ucp_do_am_bcopy_multi()  /tmp/***/spack-stage/spack-stage-ucx-1.9.0-wqtizxmjw66cklwpuq3zcrae2g33b6el/spack-src/src/ucp/proto/proto_am.inl:79
   5 0x000000000003f7ac ucp_rndv_progress_am_bcopy()  /tmp/***/spack-stage/spack-stage-ucx-1.9.0-wqtizxmjw66cklwpuq3zcrae2g33b6el/spack-src/src/ucp/tag/rndv.c:1352
   6 0x0000000000041cb8 ucp_request_try_send()  /tmp/***/spack-stage/spack-stage-ucx-1.9.0-wqtizxmjw66cklwpuq3zcrae2g33b6el/spack-src/src/ucp/core/ucp_request.inl:223
   7 0x0000000000041cb8 ucp_request_send()  /tmp/***/spack-stage/spack-stage-ucx-1.9.0-wqtizxmjw66cklwpuq3zcrae2g33b6el/spack-src/src/ucp/core/ucp_request.inl:258
   8 0x0000000000041cb8 ucp_rndv_rtr_handler()  /tmp/***/spack-stage/spack-stage-ucx-1.9.0-wqtizxmjw66cklwpuq3zcrae2g33b6el/spack-src/src/ucp/tag/rndv.c:1754
   9 0x000000000001c984 uct_iface_invoke_am()  /tmp/***/spack-stage/spack-stage-ucx-1.9.0-wqtizxmjw66cklwpuq3zcrae2g33b6el/spack-src/src/uct/base/uct_iface.h:635
  10 0x000000000001c984 uct_self_iface_sendrecv_am()  /tmp/***/spack-stage/spack-stage-ucx-1.9.0-wqtizxmjw66cklwpuq3zcrae2g33b6el/spack-src/src/uct/sm/self/self.c:149
  11 0x000000000001c984 uct_self_ep_am_short()  /tmp/***/spack-stage/spack-stage-ucx-1.9.0-wqtizxmjw66cklwpuq3zcrae2g33b6el/spack-src/src/uct/sm/self/self.c:262
  12 0x000000000002ee30 uct_ep_am_short()  /tmp/***/spack-stage/spack-stage-ucx-1.9.0-wqtizxmjw66cklwpuq3zcrae2g33b6el/spack-src/src/uct/api/uct.h:2549
  13 0x000000000002ee30 ucp_do_am_single()  /tmp/***/spack-stage/spack-stage-ucx-1.9.0-wqtizxmjw66cklwpuq3zcrae2g33b6el/spack-src/src/ucp/proto/proto_am.c:68
  14 0x0000000000042908 ucp_proto_progress_rndv_rtr()  /tmp/***/spack-stage/spack-stage-ucx-1.9.0-wqtizxmjw66cklwpuq3zcrae2g33b6el/spack-src/src/ucp/tag/rndv.c:172
  15 0x000000000003f4c4 ucp_request_try_send()  /tmp/***/spack-stage/spack-stage-ucx-1.9.0-wqtizxmjw66cklwpuq3zcrae2g33b6el/spack-src/src/ucp/core/ucp_request.inl:223
  16 0x000000000003f4c4 ucp_request_send()  /tmp/***/spack-stage/spack-stage-ucx-1.9.0-wqtizxmjw66cklwpuq3zcrae2g33b6el/spack-src/src/ucp/core/ucp_request.inl:258
  17 0x000000000003f4c4 ucp_rndv_req_send_rtr()  /tmp/***/spack-stage/spack-stage-ucx-1.9.0-wqtizxmjw66cklwpuq3zcrae2g33b6el/spack-src/src/ucp/tag/rndv.c:423
  18 0x0000000000045214 ucp_rndv_matched()  /tmp/***/spack-stage/spack-stage-ucx-1.9.0-wqtizxmjw66cklwpuq3zcrae2g33b6el/spack-src/src/ucp/tag/rndv.c:1262
  19 0x0000000000046158 ucp_rndv_process_rts()  /tmp/***/spack-stage/spack-stage-ucx-1.9.0-wqtizxmjw66cklwpuq3zcrae2g33b6el/spack-src/src/ucp/tag/rndv.c:1280
  20 0x0000000000046268 ucp_rndv_rts_handler()  /tmp/***/spack-stage/spack-stage-ucx-1.9.0-wqtizxmjw66cklwpuq3zcrae2g33b6el/spack-src/src/ucp/tag/rndv.c:1304
  21 0x000000000001c984 uct_iface_invoke_am()  /tmp/***/spack-stage/spack-stage-ucx-1.9.0-wqtizxmjw66cklwpuq3zcrae2g33b6el/spack-src/src/uct/base/uct_iface.h:635
  22 0x000000000001c984 uct_self_iface_sendrecv_am()  /tmp/***/spack-stage/spack-stage-ucx-1.9.0-wqtizxmjw66cklwpuq3zcrae2g33b6el/spack-src/src/uct/sm/self/self.c:149
  23 0x000000000001c984 uct_self_ep_am_short()  /tmp/***/spack-stage/spack-stage-ucx-1.9.0-wqtizxmjw66cklwpuq3zcrae2g33b6el/spack-src/src/uct/sm/self/self.c:262
  24 0x000000000002ee30 uct_ep_am_short()  /tmp/***/spack-stage/spack-stage-ucx-1.9.0-wqtizxmjw66cklwpuq3zcrae2g33b6el/spack-src/src/uct/api/uct.h:2549
  25 0x000000000002ee30 ucp_do_am_single()  /tmp/***/spack-stage/spack-stage-ucx-1.9.0-wqtizxmjw66cklwpuq3zcrae2g33b6el/spack-src/src/ucp/proto/proto_am.c:68
  26 0x000000000003f430 ucp_proto_progress_rndv_rts()  /tmp/***/spack-stage/spack-stage-ucx-1.9.0-wqtizxmjw66cklwpuq3zcrae2g33b6el/spack-src/src/ucp/tag/rndv.c:125
  27 0x0000000000049df4 ucp_request_try_send()  /tmp/***/spack-stage/spack-stage-ucx-1.9.0-wqtizxmjw66cklwpuq3zcrae2g33b6el/spack-src/src/ucp/core/ucp_request.inl:223
  28 0x0000000000049df4 ucp_request_send()  /tmp/***/spack-stage/spack-stage-ucx-1.9.0-wqtizxmjw66cklwpuq3zcrae2g33b6el/spack-src/src/ucp/core/ucp_request.inl:258
  29 0x0000000000049df4 ucp_tag_send_req()  /tmp/***/spack-stage/spack-stage-ucx-1.9.0-wqtizxmjw66cklwpuq3zcrae2g33b6el/spack-src/src/ucp/tag/tag_send.c:124
  30 0x0000000000049df4 ucp_tag_send_nbx()  /tmp/***/spack-stage/spack-stage-ucx-1.9.0-wqtizxmjw66cklwpuq3zcrae2g33b6el/spack-src/src/ucp/tag/tag_send.c:289
  31 0x0000000000007170 mca_pml_ucx_isend()  ???:0
  32 0x00000000000adf94 MPI_Isend()  ???:0
  33 0x000000000001cfbc ADIOI_LUSTRE_WriteStridedColl()  ???:0
  34 0x0000000000018318 MPIOI_File_write_all()  ???:0
  35 0x00000000000184c8 mca_io_romio_dist_MPI_File_write_all()  ???:0
  36 0x000000000000f1fc mca_io_romio321_file_write_all()  ???:0
  37 0x00000000000a0e7c MPI_File_write_all()  ???:0
  38 0x00000000100017a0 main()  /users/***/lustre/openmpi-4.0.5/ompi/mca/io/romio321/romio/test/coll_perf.c:97
  39 0x0000000000025200 generic_start_main.isra.0()  libc-start.c:0
  40 0x00000000000253f4 __libc_start_main()  ???:0
