On 14/10/20 14:32, Jeff Squyres (jsquyres) wrote:

>> The version is 3.1.3, as packaged in Debian Buster.
> The 3.1.x series is pretty old. If you want to stay in the 3.1.x
> series, you might try upgrading to the latest -- 3.1.6. That has a
> bunch of bug fixes compared to v3.1.3.

I'm tied to the distro packages... I don't have the resources to also
compile from source and debug interactions between different packages
(OMPI, Slurm, OFED... just to start, and each one would require an
expert).

>> I don't know OpenMPI (or even MPI in general) much. Some time ago, I've
>> had to add a
>> mtl = psm2
>> line to /etc/openmpi/openmpi-mca-params.conf .
> This implies that you have Infinipath networking on your cluster.

Actually we have InfiniBand on most of the nodes. All Mellanox cards
(I've been warned about bad interactions between different vendors):
some ConnectX-3 cards (connected to a 40Gbps switch) and some ConnectX-5
ones (connected to a 100Gbps switch, linked to the first). The link
between the two switches is mostly unused, except for the traffic to the
Gluster servers, over IPoIB.

> I can't imagine what installing gdb would do to mask the problem.

Strange. Imagine my face when the program started working under gdb,
then kept working even when launched directly, with no changes to the
binary... :)

-- 
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786