On Oct 15, 2020, at 3:27 AM, Diego Zuccato <diego.zucc...@unibo.it> wrote:
> 
>>> The version is 3.1.3 , as packaged in Debian Buster.
>> The 3.1.x series is pretty old.  If you want to stay in the 3.1.x
>> series, you might try upgrading to the latest -- 3.1.6.  That has a
>> bunch of bug fixes compared to v3.1.3.
> I'm bound to using distro packages...

That's going to be a bit limiting, I'm afraid.  There definitely were bug fixes 
after 3.1.3; it's possible that you're running into some things that were fixed 
later in the v3.1.x series.

> I don't have the resources to also compile from sources and debug
> interactions between different packages (OMPI, Slurm, OFED... just to
> start, and every one would require an expert).

You're right that it is a bit daunting.  Sorry about that; it's the nature of 
HPC that there is a large, complicated software stack.

FWIW, one [slightly] simpler method may well be to take your distro's Open MPI 
3.1.3 source package and just tweak it to use the Open MPI 3.1.6 tarball 
instead.  I.e., let it use the same build dependencies that are already baked 
into the source package.  That would at least get you an Open MPI install that 
is configured/built/installed exactly the same way as your existing 3.1.3 
package.
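
If you go that route on a Debian-ish system, the rough shape of it would be 
something like this (a from-memory sketch, not a tested recipe; the source 
package name and the download URL are what I'd expect them to be, so 
double-check both, and you'll need deb-src lines enabled plus the devscripts 
package for uupdate):

    $ sudo apt-get build-dep openmpi         # install the same build deps the 3.1.3 package uses
    $ apt-get source openmpi                 # unpack the 3.1.3 Debian source package
    $ wget https://download.open-mpi.org/release/open-mpi/v3.1/openmpi-3.1.6.tar.bz2
    $ cd openmpi-3.1.3
    $ uupdate ../openmpi-3.1.6.tar.bz2 3.1.6 # graft the Debian packaging onto the 3.1.6 tarball
    $ cd ../openmpi-3.1.6
    $ dpkg-buildpackage -us -uc              # rebuild with the distro's existing configure flags

The resulting .debs should then install in place of the distro 3.1.3 packages, 
with the same configuration.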

>>> I don't know OpenMPI (or even MPI in general) much. Some time ago, I've
>>> had to add a
>>> mtl = psm2
>>> line to /etc/openmpi/openmpi-mca-params.conf .
>> This implies that you have Infinipath networking on your cluster.

Sidenote that doesn't actually matter, but just to clarify: I should have said 
s/Infinipath/Omnipath/.  :-)

> Actually we have InfiniBand on most of the nodes. All Mellanox cards
> (I've been warned about bad interactions between different vendors),

It definitely is simpler to stick with a single type of networking.

> some ConnectX-3 cards (connected to a 40Gbps switch) and some ConnetX-5
> ones (connected to a 100Gbps switch, linked to the first). The link
> between the two switches is mostly unused, unless for the traffic to the
> Gluster servers, over IPoIB.

In that case, you should probably remove the "mtl = psm2" line; see the 
config-file sketch after these two points.

1. With Open MPI v3.1.x, it's harmless, but misleading.
2. With Open MPI v4.x, it might cause the wrong type of networking plugin to 
be used on your InfiniBand network (which will just result in your MPI jobs 
failing).
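
Concretely, I'd leave nothing transport-related in that file at all for 
v3.1.x.  If you later want to be explicit once you're on v4.x with UCX, the 
file would look roughly like this (a sketch; "pml = ucx" is the standard MCA 
parameter name, but only set it if your build actually has UCX support):

    # /etc/openmpi/openmpi-mca-params.conf
    # v3.1.x: no transport selection needed; the openib BTL is picked
    #         automatically on InfiniBand
    # v4.x (with UCX built in): optionally force the UCX PML:
    pml = ucx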

Open MPI has a few "MPI engines" for point-to-point communication under the 
covers: "ob1", "cm", and "ucx" are the most notable (in the Open MPI v4.0.x 
series).  I explained this stuff in a series of presentations that we recently 
gave to the community.
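
If you want to poke at those engines directly, the usual knobs are the "pml" 
and "btl" MCA parameters; in the examples below, ./my_mpi_app is just a 
placeholder for your own binary:

    $ ompi_info | grep -E 'pml|btl'            # list the PML/BTL components your build contains
    $ mpirun --mca pml ucx -np 2 ./my_mpi_app  # v4.x: insist on the UCX engine (aborts if UCX can't be used)
    $ mpirun --mca pml ob1 --mca btl openib,vader,self -np 2 ./my_mpi_app
                                               # v3.1.x-style: ob1 over the openib BTL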

Check out "The ABCs of Open MPI: Decoding the Alphabet Soup of the Modern HPC 
Ecosystem (Part 2)" 
(https://www.open-mpi.org/video/?category=general#abcs-of-open-mpi-part-2).  
See slides 28-41 in the PDF, or starting at about 47 minutes in 
https://www.youtube.com/watch?v=C4XfxUoSYQs.

The slides are about Open MPI v4.x (where UCX is the preferred IB transport), 
but most of what is discussed is also applicable to the v3.1.x series.  If I 
recall correctly, the one notable difference is that the "openib" BTL is used 
by default for InfiniBand networks in the v3.1.x series (vs. the UCX PML).
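
If you want to confirm which path a given job actually took, cranking up the 
framework verbosity is a cheap way to check (again, ./my_mpi_app is a 
placeholder, and the exact messages differ between versions):

    $ mpirun --mca pml_base_verbose 10 --mca btl_base_verbose 10 -np 2 ./my_mpi_app
      # v3.1.x on IB: you should see the openib BTL being selected under ob1
      # v4.x with UCX: you should see the ucx PML winning the selection instead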

>> I can't imagine what installing gdb would do to mask the problem.  Strange.
> Imagine my face when the program started working under gdb, then
> continued even when launched directly with no binary changes... :)

There must be some kind of strange side effect happening here.  Weird.  :-)

-- 
Jeff Squyres
jsquy...@cisco.com
