We are seeing a gaping memory leak when running OpenMPI 3.1.x (or 2.1.2, for
that matter) built with UCX support. The leak shows up whether the "ucx" PML
is specified for the run or not. The applications in question are arepo and
gizmo, but I have no reason to believe other applications are unaffected.
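For clarity, "specified or not" means forcing or excluding the PML via the
standard MCA run-time options, roughly like this (the application arguments
are placeholders):

# force the ucx PML -- leaks
mpirun --mca pml ucx ./arepo <args>
# exclude it and let Open MPI pick another PML -- also leaks
mpirun --mca pml ^ucx ./arepo <args>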

Basically, the MPI processes grow without bound until SLURM kills the job or
host memory is exhausted.
If I configure and build with "--without-ucx", the problem goes away.
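Nothing fancy is needed to see it; the resident set of the ranks on a compute
node just climbs steadily (the process name here is only an example):

# sample the ranks' RSS every 30 seconds
watch -n 30 'ps -C arepo -o pid,rss,vsz,comm'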

I didn't see anything about this on the UCX GitHub site, so I thought I'd ask
here. Is anyone else seeing the same or similar behavior?

What version of UCX is OpenMPI 3.1.x tested against?

Regards,

Charlie Taylor
UF Research Computing

Details:
—————————————
RHEL7.5
OpenMPI 3.1.2 (and every other version I've tried)
ucx 1.2.2-1.el7 (RH native; version check shown below)
RH native IB stack
Mellanox FDR/EDR IB fabric
Intel Parallel Studio 2018.1.163
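
The UCX version listed above is what the RH package reports; it can be
cross-checked with the utility that ships with UCX:

rpm -q ucx      # package version
ucx_info -v     # version and build info from the library itself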

Configuration Options:
—————————————————
CFG_OPTS=""
CFG_OPTS="$CFG_OPTS C=icc CXX=icpc FC=ifort FFLAGS=\"-O2 -g -warn -m64\" 
LDFLAGS=\"\" "
CFG_OPTS="$CFG_OPTS --enable-static"
CFG_OPTS="$CFG_OPTS --enable-orterun-prefix-by-default"
CFG_OPTS="$CFG_OPTS --with-slurm=/opt/slurm"
CFG_OPTS="$CFG_OPTS --with-pmix=/opt/pmix/2.1.1"
CFG_OPTS="$CFG_OPTS --with-pmi=/opt/slurm"
CFG_OPTS="$CFG_OPTS --with-libevent=external"
CFG_OPTS="$CFG_OPTS --with-hwloc=external"
CFG_OPTS="$CFG_OPTS --with-verbs=/usr"
CFG_OPTS="$CFG_OPTS --with-libfabric=/usr"
CFG_OPTS="$CFG_OPTS --with-ucx=/usr"
CFG_OPTS="$CFG_OPTS --with-verbs-libdir=/usr/lib64"
CFG_OPTS="$CFG_OPTS --with-mxm=no"
CFG_OPTS="$CFG_OPTS --with-cuda=${HPC_CUDA_DIR}"
CFG_OPTS="$CFG_OPTS --enable-openib-udcm"
CFG_OPTS="$CFG_OPTS --enable-openib-rdmacm"
CFG_OPTS="$CFG_OPTS --disable-pmix-dstore"

rpmbuild --ba \
         --define '_name openmpi' \
         --define "_version $OMPI_VER" \
         --define "_release ${RELEASE}" \
         --define "_prefix $PREFIX" \
         --define '_mandir %{_prefix}/share/man' \
         --define '_defaultdocdir %{_prefix}' \
         --define 'mflags -j 8' \
         --define 'use_default_rpm_opt_flags 1' \
         --define 'use_check_files 0' \
         --define 'install_shell_scripts 1' \
         --define 'shell_scripts_basename mpivars' \
         --define "configure_options $CFG_OPTS " \
         openmpi-${OMPI_VER}.spec 2>&1 | tee rpmbuild.log
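
After installing, a quick sanity check on whether a given build actually
picked up UCX (standard Open MPI tooling):

ompi_info | grep -i ucx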
