Re: [OMPI users] Hardware topology influence

2022-09-14 Thread Jeff Squyres (jsquyres) via users
It was pointed out to me off-list that I should update my worldview on HPC in 
VMs.  :-)

So let me clarify my remarks about VMs: yes, many organizations run bare-metal 
HPC environments.  However, it is no longer unusual to run HPC in VMs.  Using 
modern VM technology, especially when tuned for HPC workloads (e.g., bind each 
vCPU to a physical CPU), VMs can achieve quite low overhead these days.  There 
are many benefits to running virtualized environments, and those are no longer 
off-limits to HPC workloads.  Indeed, VM overheads may be outweighed by other 
benefits of running in VM-based environments.

That being said, I'm not encouraging you to run 96 VMs on a single host, for 
example.  I have not done any VM testing myself, but I imagine that the same 
adage that applies to HPC bare metal environments also applies to HPC VM 
environments: let Open MPI use shared memory to communicate (vs. a network) 
whenever possible.  In your environment, this likely translates to having a 
single VM per host (encompassing all the physical CPUs that you want to use on 
that host) and launching N_x MPI processes in each VM (where N_x is the number 
of vCPU/physical CPUs available in VM x).  This will allow the MPI processes to 
use shared memory for on-node communication.
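
As a rough sketch (not something I have tested myself), assuming one 96-vCPU VM 
per physical host and two such VMs named vm1 and vm2 (placeholder hostnames, and 
./my_app is a placeholder binary), the launch could look like:

    # one VM per host, 96 slots each; ranks on the same VM use shared memory
    mpirun --host vm1:96,vm2:96 -np 192 --bind-to core --report-bindings ./my_app

Only traffic between ranks on different VMs would then go over the network.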

--
Jeff Squyres
jsquy...@cisco.com

From: Jeff Squyres (jsquyres) 
Sent: Tuesday, September 13, 2022 10:08 AM
To: Open MPI Users 
Cc: Gilles Gouaillardet 
Subject: Re: [OMPI users] Hardware topology influence

Let me add a little more color on what Gilles stated.

First, you should probably upgrade to the latest v4.1.x release: v4.1.4.  It 
has a bunch of bug fixes compared to v4.1.0.

Second, you should know that it is relatively uncommon to run HPC/MPI apps 
inside VMs because the virtualization infrastructure will -- by definition -- 
decrease your overall performance.  This is usually counter to the goal of 
writing/running HPC applications.  If you do run HPC/MPI applications in VMs, 
it is strongly recommended that you bind the cores in the VM to physical cores 
to attempt to minimize the performance loss.
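
How you do that pinning depends on the hypervisor.  As one hedged example, if 
your hosts happen to run KVM/libvirt (an assumption about your stack; "guest96" 
is a placeholder domain name), vCPUs can be pinned roughly like this:

    # pin vCPU 0 of the guest to physical CPU 0, vCPU 1 to physical CPU 1, ...
    virsh vcpupin guest96 0 0
    virsh vcpupin guest96 1 1
    # (or equivalently via <cputune>/<vcpupin> entries in the domain XML)

Other hypervisors have their own equivalents.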

By default, Open MPI maps MPI processes by core when deciding how many 
processes to place on each machine (and also deciding how to bind them).  For 
example, Open MPI looks at a machine and sees that it has N cores, and (by 
default) maps N MPI processes to that machine.  You can change Open MPI's 
defaults to map by hardware thread ("Hyper-Threading" in Intel parlance) instead 
of by core, but conventional wisdom is that math-heavy processes don't 
perform well with the limited resources of a single hardware thread, and 
benefit from the full resources of the core (this depends on your specific app, 
of course -- YMMV).  Intel's and AMD's hardware threads have gotten better over 
the years, but I think they still represent a division of resources in the 
core, and will likely still be performance-detrimental to at least some classes 
of HPC applications.  It's a surprisingly complicated topic.
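
For completeness, here is a minimal sketch of how the mapping policy is changed 
on the mpirun command line in the v4.x series (./my_app is a placeholder):

    # default: map and bind by core
    mpirun --map-by core --bind-to core ./my_app
    # treat hardware threads as slots and map/bind by hwthread instead
    mpirun --use-hwthread-cpus --map-by hwthread --bind-to hwthread ./my_app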

In the v4.x series, note that you can use "mpirun --report-bindings ..." to see 
exactly where Open MPI thinks it has bound each process.  Note that this 
binding occurs before each MPI process starts; it's nothing that the 
application itself needs to do.

--
Jeff Squyres
jsquy...@cisco.com

From: users  on behalf of Gilles Gouaillardet 
via users 
Sent: Tuesday, September 13, 2022 9:07 AM
To: Open MPI Users 
Cc: Gilles Gouaillardet 
Subject: Re: [OMPI users] Hardware topology influence

Lucas,

the number of MPI tasks started by mpirun is either
 - explicitly passed via the command line (e.g. mpirun -np 2306 ...)
 - equal to the number of available slots, and this value is either
 a) retrieved from the resource manager (such as a SLURM allocation)
 b) explicitly set in a machine file (e.g. mpirun -machinefile ...) or on the
 command line (e.g. mpirun --host host0:96,host1:96 ...); see the example below
 c) if none of the above is set, the number of detected cores on the system
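
To make (b) concrete, a hedged sketch (hostnames, slot counts and ./my_app are 
placeholders):

    # machine file: one line per node, "slots" = processes allowed on that node
    host0 slots=96
    host1 slots=96

    mpirun -machinefile my_machinefile ./my_app   # 192 slots -> 192 MPI tasks by default
    mpirun --host host0:96,host1:96 ./my_app      # the command-line equivalent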

Cheers,

Gilles

On Tue, Sep 13, 2022 at 9:23 PM Lucas Chaloyard via users 
<users@lists.open-mpi.org> wrote:
Hello,

I'm working as a research intern in a lab where we're studying virtualization.
I've been working with several benchmarks using Open MPI 4.1.0 (ASKAP, GPAW 
and Incompact3d from the Phoronix Test Suite).

To briefly explain my experiments, I'm running those benchmarks on several 
virtual machines using different topologies.
During one experiment I've been comparing these two topologies:
- Topology1: 96 vCPUs divided into 96 sockets with 1 thread each
- Topology2: 96 vCPUs divided into 48 sockets with 2 threads each (hyperthreading 
enabled)

For the ASKAP benchmark:
- While using Topology2, 2306 processes will be created by the application to 
do its work.
- While using Topology1, 4612 processes will be created by the application to 
do its work.
This is also happening when running

Re: [OMPI users] MPI_THREAD_MULTIPLE question

2022-09-14 Thread Barrett, Brian via users
Yes, this is the case for Open MPI 4.x and earlier, due to various bugs.  When 
Open MPI 5.0 ships, we will resolve this issue.

Brian

On 9/9/22, 9:58 PM, "users on behalf of mrlong336 via users" 
<users-boun...@lists.open-mpi.org on behalf of users@lists.open-mpi.org> wrote:


mpirun reports the following error:
The OSC pt2pt component does not support MPI_THREAD_MULTIPLE in this release.
Workarounds are to run on a single node, or to use a system with an RDMA
capable network such as Infiniband.

Does this error mean that the network must support RDMA in order to run across 
multiple nodes? Will Gigabit/10 Gigabit Ethernet work?



Best regards,

Timesir

mrlong...@gmail.com