Hi everyone!

We’re observing output such as the following when running non-trivial MPI 
software through SLURM’s srun:

[cn-11:52778] unrecognized payload type 255
[cn-11:52778] base = 0x9ce2c0, proto = 0x9ce2c0, hdr = 0x9ce300
[cn-11:52778]    0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[cn-11:52778]   10: 00 00 00 00 00 00 06 02 ff 0c 1f c2 06 02 ff 0c
[cn-11:52778]   20: b9 8f 08 00 45 00 00 3c 00 00 40 00 08 11 5d 5d
[cn-11:52778]   30: 0a 95 00 16 0a 95 00 15 e5 05 e8 d9 00 28 7c 8c
[cn-11:52778]   40: 01 00 00 00 00 00 31 b6 00 00 8f e3 00 00 00 00
[cn-11:52778]   50: 00 00 00 00 00 00 06 02 ff 0c d3 25 06 02 ff 0c
[cn-11:52778]   60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[cn-11:52778]   70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00


The message is independent of the application, but it does NOT appear when 
launching with mpiexec/mpirun instead of srun. When we switch to the TCP or 
vader BTL, the output is likewise clean and the message is gone. It is printed 
by different ranks on varying nodes, so it is not reproducibly tied to the 
same nodes.
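
For reference, this is roughly how we run the comparisons; the launcher flags 
here are illustrative, not copied verbatim from our job scripts:

  # usnic BTL (the default on our nodes) -- warning appears:
  srun --mpi=pmi2 ./app

  # force TCP + shared memory instead -- clean output:
  OMPI_MCA_btl=tcp,vader,self srun --mpi=pmi2 ./app

  # same binary through the Open MPI launcher, usnic BTL -- also clean:
  mpirun --mca btl usnic,vader,self ./app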

The message seems to originate here [1].
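
As far as we understand the code at [1], the receive path switches on the 
payload type carried in the usNIC BTL header and emits this warning (plus the 
hex dump) from the default branch. The following is only our paraphrase, with 
approximate names, not the verbatim source:

  /* sketch of the check in btl_usnic_recv.c -- names approximate */
  switch (seg->rs_base.us_btl_header->payload_type) {
  case OPAL_BTL_USNIC_PAYLOAD_TYPE_FRAG:   /* whole message in one packet */
  case OPAL_BTL_USNIC_PAYLOAD_TYPE_CHUNK:  /* piece of a larger message */
  case OPAL_BTL_USNIC_PAYLOAD_TYPE_ACK:    /* acknowledgment */
      /* ... normal handling ... */
      break;
  default:
      opal_output(0, "unrecognized payload type %d",
                  seg->rs_base.us_btl_header->payload_type);
      /* followed by a hex dump of the raw frame, as shown above */
      break;
  }

If that reading is right, "255" is simply whatever byte happened to sit at the 
payload-type offset of a frame the BTL did not recognize.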

Any idea how to get rid of this, or what the root cause might be? Hints on 
what to check would be greatly appreciated!
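
If more output would help, we can rerun with extra debugging, e.g. (to the 
best of our knowledge these are the relevant knobs):

  mpirun --mca btl_base_verbose 100 ./app
  FI_LOG_LEVEL=debug srun --mpi=pmi2 ./app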
 
TIA!

Petar


Environment:
1.4.0-cisco-1.0.531.1-RHEL7U3
SLURM 17.02.7
Open MPI 2.0.2, configured with libfabric, usnic, SLURM, and SLURM’s PMI 
library:

./configure --prefix=/software/171020/software/openmpi/2.0.2-gcc-6.3.0-2.27 \
  --enable-shared --enable-mpi-thread-multiple \
  --with-libfabric=/opt/cisco/libfabric --without-memory-manager \
  --enable-mpirun-prefix-by-default \
  --with-hwloc=$EBROOTHWLOC --with-usnic --with-verbs-usnic --with-slurm \
  --with-pmi=/cm/shared/apps/slurm/current --enable-dlopen \
  LDFLAGS="-Wl,-rpath -Wl,/opt/cisco/libfabric/lib -Wl,--enable-new-dtags"

NIC: UCSC-MLOM-C40Q-03 [VIC 1387]
VIC firmware: 4.1(3a)


[1] https://github.com/open-mpi/ompi/blob/9c3ae64297e034b30cb65298908014764216c616/opal/mca/btl/usnic/btl_usnic_recv.c#L354


