Hi everyone! We’re observing output such as the following when running non-trivial MPI software through SLURM’s srun:

    [cn-11:52778] unrecognized payload type 255
    [cn-11:52778] base = 0x9ce2c0, proto = 0x9ce2c0, hdr = 0x9ce300
    [cn-11:52778]   0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    [cn-11:52778]  10: 00 00 00 00 00 00 06 02 ff 0c 1f c2 06 02 ff 0c
    [cn-11:52778]  20: b9 8f 08 00 45 00 00 3c 00 00 40 00 08 11 5d 5d
    [cn-11:52778]  30: 0a 95 00 16 0a 95 00 15 e5 05 e8 d9 00 28 7c 8c
    [cn-11:52778]  40: 01 00 00 00 00 00 31 b6 00 00 8f e3 00 00 00 00
    [cn-11:52778]  50: 00 00 00 00 00 00 06 02 ff 0c d3 25 06 02 ff 0c
    [cn-11:52778]  60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    [cn-11:52778]  70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

The message does not depend on which software we run, BUT it is NOT observable when running with mpiexec/mpirun. When switching to the TCP or vader BTL, the output is clean and the message does not appear. It is printed by different ranks on various nodes, so it is not reproducibly tied to the same nodes.

The message seems to originate here [1].

Any idea how to get rid of this, or what the root cause might be? Hints on what to check would be greatly appreciated!

TIA!
Petar

Environment:

1.4.0-cisco-1.0.531.1-RHEL7U3
SLURM 17.02.7
OpenMPI 2.0.2 configured with libfabric, usnic, SLURM, and SLURM’s PMI library:

    ./configure --prefix=/software/171020/software/openmpi/2.0.2-gcc-6.3.0-2.27 \
        --enable-shared --enable-mpi-thread-multiple \
        --with-libfabric=/opt/cisco/libfabric --without-memory-manager \
        --enable-mpirun-prefix-by-default --enable-mpirun-prefix-by-default \
        --with-hwloc=$EBROOTHWLOC --with-usnic --with-verbs-usnic \
        --with-slurm --with-pmi=/cm/shared/apps/slurm/current --enable-dlopen \
        LDFLAGS="-Wl,-rpath -Wl,/opt/cisco/libfabric/lib -Wl,--enable-new-dtags"

NIC UCSC-MLOM-C40Q-03 [VIC 1387]
VIC Firmware 4.1(3a)

[1] https://github.com/open-mpi/ompi/blob/9c3ae64297e034b30cb65298908014764216c616/opal/mca/btl/usnic/btl_usnic_recv.c#L354
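
As a reading aid for the hex dump in this message, here is a minimal Python sketch that decodes what look like Ethernet, IPv4 and UDP headers inside the dumped buffer. The starting offset 0x16 and the framing interpretation are assumptions inferred from the bytes themselves (EtherType 08 00 at offset 0x22, IPv4 version/IHL byte 45 at offset 0x24); they are not confirmed by the usnic BTL source. The helper names `parse_dump`, `ipv4_checksum_ok` and `decode` are hypothetical.

```python
import struct

# First five rows of the dump, copied from the log output in this message.
DUMP = """
 0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
10: 00 00 00 00 00 00 06 02 ff 0c 1f c2 06 02 ff 0c
20: b9 8f 08 00 45 00 00 3c 00 00 40 00 08 11 5d 5d
30: 0a 95 00 16 0a 95 00 15 e5 05 e8 d9 00 28 7c 8c
40: 01 00 00 00 00 00 31 b6 00 00 8f e3 00 00 00 00
"""

ETH_OFFSET = 0x16  # assumption: the Ethernet header starts at this offset


def parse_dump(text):
    """Turn 'offset: xx xx ...' lines into a bytes object (hypothetical helper)."""
    out = bytearray()
    for line in text.strip().splitlines():
        _, _, hexpart = line.partition(":")
        out += bytes.fromhex(hexpart)
    return bytes(out)


def ipv4_checksum_ok(header):
    """A valid IPv4 header sums (one's complement, 16-bit words) to 0xffff."""
    total = sum(struct.unpack("!%dH" % (len(header) // 2), header))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total == 0xFFFF


def decode(buf):
    """Decode Ethernet (14 bytes), IPv4 (20 bytes, no options) and UDP (8 bytes)."""
    eth = buf[ETH_OFFSET:ETH_OFFSET + 14]
    ip = buf[ETH_OFFSET + 14:ETH_OFFSET + 34]
    udp = buf[ETH_OFFSET + 34:ETH_OFFSET + 42]

    _ver_ihl, _tos, total_len, _ident, _frag, ttl, proto, _csum = struct.unpack(
        "!BBHHHBBH", ip[:12]
    )
    src_port, dst_port, udp_len, _udp_csum = struct.unpack("!HHHH", udp)

    return {
        "dst_mac": eth[0:6].hex(":") if hasattr(bytes, "hex") else None,
        "src_mac": eth[6:12].hex(":") if hasattr(bytes, "hex") else None,
        "ethertype": struct.unpack("!H", eth[12:14])[0],
        "ip_total_len": total_len,
        "ttl": ttl,
        "ip_proto": proto,  # 17 would be UDP
        "src_ip": ".".join(str(b) for b in ip[12:16]),
        "dst_ip": ".".join(str(b) for b in ip[16:20]),
        "ip_checksum_ok": ipv4_checksum_ok(ip),
        "src_port": src_port,
        "dst_port": dst_port,
        "udp_len": udp_len,
    }


if __name__ == "__main__":
    for key, value in decode(parse_dump(DUMP)).items():
        print(key, "=", value)
```

Under these assumptions the bytes decode to a UDP datagram from 10.149.0.22 to 10.149.0.21 with a valid IPv4 header checksum, which suggests the frame around the unrecognized header is otherwise well formed. Note that `bytes.hex(":")` with a separator requires Python 3.8 or newer.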