Hello,

When we run a fairly complicated weather code with Open MPI (4.0.3) and UCX
(1.9.0, built from git), we get the following error:

mpirun --np 2 --mca pml ucx --mca btl ^vader,tcp,openib,uct -x UCX_NET_DEVICES=mlx4_0:2 MASTERODB
SetMasterThreadsStackSizeBeforeMain() : Master thread's stack size = 307187712 bytes [setrlimit() was not called]
SetMasterThreadsStackSizeBeforeMain() : Master thread's stack size = 307191808 bytes [setrlimit() was not called]
[forman:44802] pmix: init called
[forman:44803] pmix: init called
[forman:44803] pmix: executing put for key pmix.hname type 3
[forman:44802] pmix: executing put for key pmix.hname type 3
--------------------------------------------------------------------------
No components were able to be opened in the pml framework.

This typically means that either no components of this type were
installed, or none of the installed components can be loaded.
Sometimes this means that shared libraries required by these
components are unable to be found/loaded.

  Host:      forman
  Framework: pml
--------------------------------------------------------------------------
[forman:44803] PML ucx cannot be selected
[forman:44802] PML ucx cannot be selected
[forman:44803] pmix:client abort called
[forman:44802] pmix:client abort called
[forman:44803] pmix:client wait_cbfunc received
[forman:44802] pmix:client wait_cbfunc received
[forman:44792] 1 more process has sent help message help-mca-base.txt / find-available:none found
[forman:44792] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages

The most bizarre thing is that if I compile a simple Fortran mpipong code
in the same environment (same modules, same compiler), everything works
as expected.

I have no idea where to go from here.

Any hint would be highly appreciated.

Thanks, Jure

-----------------------------------------------------------------------
  Configure command line: 'F77=ifort' 'FC=ifort'
                          '--enable-mca-no-build=btl-uct'
                          '--prefix=/sw/local/openmpi-4.0.3-intel'
                          '--with-ucx=/sw/local/forman/ucx-1.9.0'
                          '--with-tm=/opt/pbs'
                          '--with-libfabric=/sw/local/forman/libfabric-1.9.0'
                          '--without-verbs' '--with-ofi=no'

ucx_info -v
# UCT version=1.9.0 revision 66be5c3
# configured with: --disable-logging --disable-debug --disable-assertions
--disable-params-check --prefix=/sw/local/forman/ucx-1.9.0 --with-verbs
