Hello, when we run a fairly complicated weather code with Open MPI (4.0.3) and UCX (1.9.0, built from git), we get the following error:
mpirun --np 2 --mca pml ucx --mca btl ^vader,tcp,openib,uct -x UCX_NET_DEVICES=mlx4_0:2 MASTERODB

SetMasterThreadsStackSizeBeforeMain() : Master thread's stack size = 307187712 bytes [setrlimit() was not called]
SetMasterThreadsStackSizeBeforeMain() : Master thread's stack size = 307191808 bytes [setrlimit() was not called]
[forman:44802] pmix: init called
[forman:44803] pmix: init called
[forman:44803] pmix: executing put for key pmix.hname type 3
[forman:44802] pmix: executing put for key pmix.hname type 3
--------------------------------------------------------------------------
No components were able to be opened in the pml framework.

This typically means that either no components of this type were installed, or none of the installed components can be loaded. Sometimes this means that shared libraries required by these components are unable to be found/loaded.

  Host:      forman
  Framework: pml
--------------------------------------------------------------------------
[forman:44803] PML ucx cannot be selected
[forman:44802] PML ucx cannot be selected
[forman:44803] pmix:client abort called
[forman:44802] pmix:client abort called
[forman:44803] pmix:client wait_cbfunc received
[forman:44802] pmix:client wait_cbfunc received
[forman:44792] 1 more process has sent help message help-mca-base.txt / find-available:none found
[forman:44792] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages

The most bizarre thing is that if I compile a simple Fortran mpipong code in the same environment (same modules, same compiler), everything works as expected. I have no idea where to go from here. Any hint would be highly appreciated.
Thanks,
Jure

-----------------------------------------------------------------------
Configure command line:
'F77=ifort' 'FC=ifort' '--enable-mca-no-build=btl-uct' '--prefix=/sw/local/openmpi-4.0.3-intel' '--with-ucx=/sw/local/forman/ucx-1.9.0' '--with-tm=/opt/pbs' '--with-libfabric=/sw/local/forman/libfabric-1.9.0' '--without-verbs' '--with-ofi=no'

ucx_info -v
# UCT version=1.9.0 revision 66be5c3
# configured with: --disable-logging --disable-debug --disable-assertions --disable-params-check --prefix=/sw/local/forman/ucx-1.9.0 --with-verbs