On 14/10/20 14:32, Jeff Squyres (jsquyres) wrote:

>> The version is 3.1.3, as packaged in Debian Buster.
> The 3.1.x series is pretty old.  If you want to stay in the 3.1.x
> series, you might try upgrading to the latest -- 3.1.6.  That has a
> bunch of bug fixes compared to v3.1.3.
I'm bound to using distro packages...
I don't have the resources to also compile from source and debug
interactions between different packages (OMPI, Slurm, OFED... just to
start, and each one would require an expert).

>> I don't know OpenMPI (or even MPI in general) much. Some time ago, I
>> had to add a
>> mtl = psm2
>> line to /etc/openmpi/openmpi-mca-params.conf .
> This implies that you have Infinipath networking on your cluster.
Actually, we have InfiniBand on most of the nodes. All cards are Mellanox
(I've been warned about bad interactions between different vendors):
some ConnectX-3 cards (connected to a 40Gbps switch) and some ConnectX-5
ones (connected to a 100Gbps switch, which is linked to the first). The link
between the two switches is mostly unused, except for traffic to the
Gluster servers over IPoIB.

> I can't imagine what installing gdb would do to mask the problem.  Strange.
Imagine my face when the program started working under gdb, then kept
working even when launched directly, with no changes to the binary... :)

-- 
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786
