Paul --

The help message describes four ways to set the number of available slots on
your machine:

1. Hostfile, via "slots=N" clauses (N defaults to number of
   processor cores if not provided)
2. The --host command line parameter, via a ":N" suffix on the
   hostname (N defaults to 1 if not provided)
3. Resource manager (e.g., SLURM, PBS/Torque, LSF, etc.)
4. If none of a hostfile, the --host command line parameter, or an
   RM is present, Open MPI defaults to the number of processor cores

Here are some examples:

1. Hostfile

$ cat myhosts.txt
localhost slots=12
$ mpirun --hostfile myhosts.txt -np 12 ...

2. Using --host

$ mpirun --host localhost:12 -np 12 ...

3. Set up and run a resource manager for this machine (e.g., SLURM).  This is
almost certainly not worth it for a single machine, so I won't show an example
here.

4. If slots were not specified by #1-#3, Open MPI will set the max number of 
slots to the number of hardware resources available.  In this case, it will 
count **cores** -- so if you have 8 cores, the max number of slots will be set 
to 8.  See below for more on this.
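
If you want to double-check how many slots a hostfile declares before
launching, a small shell helper along these lines can sum the "slots=N"
clauses.  This is only a sketch: count_hostfile_slots is a made-up name, not
an Open MPI tool, and it merely counts hosts that omit "slots=N" (Open MPI
would default those to the host's core count, which this helper does not
look up).

```shell
# Hypothetical helper, not part of Open MPI: sum the explicit "slots=N"
# values in a hostfile and count the hosts that have no slots= clause.
count_hostfile_slots() {
    awk '
        /^[ \t]*#/ { next }    # skip comment lines
        NF == 0    { next }    # skip blank lines
        {
            found = 0
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^slots=[0-9]+$/) {
                    split($i, kv, "=")
                    total += kv[2]
                    found = 1
                }
            }
            if (!found) unspecified++
        }
        END {
            printf "explicit_slots=%d hosts_without_slots=%d\n", total, unspecified
        }
    ' "$1"
}

# Same hostfile as in example 1 above.
printf 'localhost slots=12\n' > myhosts.txt
count_hostfile_slots myhosts.txt
```

For the hostfile from example 1, this should report 12 explicit slots and no
hosts without a slots= clause.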

Answers to your specific questions are below.


On Nov 8, 2020, at 12:24 AM, Paul Cizmas via users
<users@lists.open-mpi.org> wrote:

Gilles:

Thank you for your reply.  Unfortunately, it did not quite help me.

As I said in my e-mail, I can run this on a Mac by only specifying

$mympirun -np 12  $exe input1

without worrying about “slots”.

So, my questions are:

1. Why do I need “slot” on the Linux?

The Open MPI concept of slots -- and its default values -- is the same on macOS
and Linux.  However, the exact mechanism has changed over different versions of
Open MPI.

What version of Open MPI are you running on your Mac?

2. Is there a relation between slots, sockets, cores and threads?  The 
workstation has 1 socket, 8 cores per socket and 2 threads per core, or 16 
CPUs.  How many slots are there?

There is no fundamental, direct relationship between slots and available
hardware resources.  They answer two different questions:

1. What is the max number of processes that you can run?
2. How many hardware resources do you have?

It is common to run exactly as many MPI processes as you have hardware 
resources (e.g., cores), but that's not required.

That being said, there is an indirect relationship: if the max number of
slots is not otherwise specified, Open MPI will set the max number of slots to
be the number of hardware resources.

Specifically, Open MPI tries to set the max number of slots several different
ways.  Per the original help message, if you list hosts in a hostfile but
don't specify a "slots=N" clause, Open MPI defaults to the number of
*cores* (not hyperthreads) on that host.

Similarly, the last way that Open MPI tries to set the max number of slots -- 
if no other mechanism was used -- is to take the number of *cores* (not 
hyperthreads) as the default max number of slots.
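
To make the default concrete for your workstation (1 socket, 8 cores per
socket, 2 threads per core), here is a toy model of the arithmetic.  It is
not Open MPI code; default_slots and its arguments are names I made up for
illustration.

```shell
# Toy model, not Open MPI code: the default max number of slots when no
# hostfile "slots=N", --host ":N" suffix, or resource manager sets it.
# Arguments: sockets, cores per socket, threads per core, and the unit to
# count: "cores" (the default) or "hwthreads".
default_slots() {
    sockets=$1
    cores_per_socket=$2
    threads_per_core=$3
    unit=${4:-cores}

    cores=$((sockets * cores_per_socket))
    if [ "$unit" = "hwthreads" ]; then
        echo $((cores * threads_per_core))
    else
        echo "$cores"
    fi
}

default_slots 1 8 2             # counting cores
default_slots 1 8 2 hwthreads   # counting hyperthreads
```

Counting cores gives 8 and counting hyperthreads gives 16 for this machine.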

More below.

3. If I need to specify “slot”, what is the syntax?

I tried:

$mympirun -np 12 slots=12 $exe input1

See examples above.

and got:
======================================================
No protocol specified
--------------------------------------------------------------------------
There are not enough slots available in the system to satisfy the 12
slots that were requested by the application:

slots=12

Either request fewer slots for your application, or make more slots
available for use.
======================================================
Finally, I made it work by using

$mympirun -np 12 --use-hwthread-cpus $exe input1

The --use-hwthread-cpus option does a few things.

If you have not specified a max number of slots, and Open MPI therefore uses 
the number of hardware resources as the max number of slots for a host, the 
--use-hwthread-cpus option tells Open MPI to count **hyperthreads** (instead of 
**cores**) as the max number of slots.

Since you have 16 hyperthreads on your host, "mpirun -np 12 --use-hwthread-cpus
..." works: the max number of slots becomes 16, so 12 processes fit with 4
slots to spare.
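
The comparison mpirun is effectively making can be sketched as below.  Again,
this is an illustration rather than Open MPI's actual logic, and check_slots
is a hypothetical name.

```shell
# Toy check, not Open MPI code: does a request for np processes fit
# within the max number of slots?
check_slots() {
    np=$1
    slots=$2
    if [ "$np" -le "$slots" ]; then
        echo "ok: $((slots - np)) slots left"
    else
        echo "not enough slots: requested $np, have $slots"
    fi
}

check_slots 12 8    # default on your host: 8 cores
check_slots 12 16   # with --use-hwthread-cpus: 16 hyperthreads
```

The first call corresponds to the error you saw; the second corresponds to
the run that worked.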

and ignored all the slot options, so I missed the chance to learn about slots.

I did not find an example on how to specify the “slot” although the message 
lists four options - four options but zero examples.

FWIW, I have actually just submitted a PR to update the mpirun(1) man page to 
have a lengthy, detailed explanation of "slots" and other things.  Check out 
this PR:

    https://github.com/open-mpi/ompi/pull/8099

This is currently a PR against v4.0.x, but it will also go into v4.1.x when 
final.

--
Jeff Squyres
jsquy...@cisco.com
