There is a typo in your command line: you should use --mca (two ASCII hyphens) instead of –mca (an en dash). Because of the en dash, mpirun does not recognize the option and instead treats "–mca" as the name of the application to launch, which is why the error message below reports it as the application that requested two slots.
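
To make the difference visible (the two characters are easy to confuse in many fonts):

  –mca    <- en dash (U+2013), not recognized as an option
  --mca   <- two ASCII hyphens, parsed as an option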

Also, you can try --machinefile instead of -machinefile.
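
Putting both together, the command should look something like this (same
hostfile and verbose flags as in your run):

  mpirun -n 2 --machinefile hosts --mca rmaps_base_verbose 100 \
      --mca ras_base_verbose 100 which mpirun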

Cheers,

Gilles

On Mon, Nov 14, 2022 at 11:04 AM timesir via users <users@lists.open-mpi.org>
wrote:

> *(py3.9) ➜  /share  mpirun -n 2 -machinefile hosts –mca rmaps_base_verbose
> 100 --mca ras_base_verbose 100  which mpirun*
> [computer01:04570] mca: base: component_find: searching NULL for ras
> components
> [computer01:04570] mca: base: find_dyn_components: checking NULL for ras
> components
> [computer01:04570] pmix:mca: base: components_register: registering
> framework ras components
> [computer01:04570] pmix:mca: base: components_register: found loaded
> component simulator
> [computer01:04570] pmix:mca: base: components_register: component
> simulator register function successful
> [computer01:04570] pmix:mca: base: components_register: found loaded
> component pbs
> [computer01:04570] pmix:mca: base: components_register: component pbs
> register function successful
> [computer01:04570] pmix:mca: base: components_register: found loaded
> component slurm
> [computer01:04570] pmix:mca: base: components_register: component slurm
> register function successful
> [computer01:04570] mca: base: components_open: opening ras components
> [computer01:04570] mca: base: components_open: found loaded component
> simulator
> [computer01:04570] mca: base: components_open: found loaded component pbs
> [computer01:04570] mca: base: components_open: component pbs open function
> successful
> [computer01:04570] mca: base: components_open: found loaded component slurm
> [computer01:04570] mca: base: components_open: component slurm open
> function successful
> [computer01:04570] mca:base:select: Auto-selecting ras components
> [computer01:04570] mca:base:select:(  ras) Querying component [simulator]
> [computer01:04570] mca:base:select:(  ras) Querying component [pbs]
> [computer01:04570] mca:base:select:(  ras) Querying component [slurm]
> [computer01:04570] mca:base:select:(  ras) No component selected!
>
> ======================   ALLOCATED NODES   ======================
>     computer01: slots=1 max_slots=0 slots_inuse=0 state=UP
>         Flags: DAEMON_LAUNCHED:LOCATION_VERIFIED:SLOTS_GIVEN
>         aliases: 192.168.180.48
>     192.168.60.203: slots=1 max_slots=0 slots_inuse=0 state=UNKNOWN
>         Flags: SLOTS_GIVEN
>         aliases: NONE
> =================================================================
>
> ======================   ALLOCATED NODES   ======================
>     computer01: slots=1 max_slots=0 slots_inuse=0 state=UP
>         Flags: DAEMON_LAUNCHED:LOCATION_VERIFIED:SLOTS_GIVEN
>         aliases: 192.168.180.48
>     hepslustretest03: slots=1 max_slots=0 slots_inuse=0 state=UP
>         Flags: DAEMON_LAUNCHED:LOCATION_VERIFIED:SLOTS_GIVEN
>         aliases: 192.168.60.203,172.17.180.203,172.168.10.23,172.168.10.143
> =================================================================
> --------------------------------------------------------------------------
> There are not enough slots available in the system to satisfy the 2
> slots that were requested by the application:
>
>   –mca
>
> Either request fewer procs for your application, or make more slots
> available for use.
>
> A "slot" is the PRRTE term for an allocatable unit where we can
> launch a process.  The number of slots available are defined by the
> environment in which PRRTE processes are run:
>
>   1. Hostfile, via "slots=N" clauses (N defaults to number of
>      processor cores if not provided)
>   2. The --host command line parameter, via a ":N" suffix on the
>      hostname (N defaults to 1 if not provided)
>   3. Resource manager (e.g., SLURM, PBS/Torque, LSF, etc.)
>   4. If none of a hostfile, the --host command line parameter, or an
>      RM is present, PRRTE defaults to the number of processor cores
>
> In all the above cases, if you want PRRTE to default to the number
> of hardware threads instead of the number of processor cores, use the
> --use-hwthread-cpus option.
>
> Alternatively, you can use the --map-by :OVERSUBSCRIBE option to ignore the
> number of available slots when deciding the number of processes to
> launch.
> --------------------------------------------------------------------------
>
>
>
> On 2022/11/13 23:42, Jeff Squyres (jsquyres) wrote:
>
> Interesting.  It says:
>
> [computer01:106117] AVAILABLE NODES FOR MAPPING:
> [computer01:106117] node: computer01 daemon: 0 slots_available: 1
>
> This is why it tells you you're out of slots: you're asking for 2, but it
> only found 1.  This means it's not seeing your hostfile somehow.
>
> I should have asked you to run with *2*​ variables last time -- can you
> re-run with "mpirun --mca rmaps_base_verbose 100 --mca ras_base_verbose 100
> ..."?
>
> Turning on the RAS verbosity should show us what the hostfile component is
> doing.
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> ------------------------------
> *From:* 龙龙 <mrlong...@gmail.com>
> *Sent:* Sunday, November 13, 2022 3:13 AM
> *To:* Jeff Squyres (jsquyres) <jsquy...@cisco.com>;
> Open MPI Users <users@lists.open-mpi.org>
> *Subject:* Re: [OMPI devel] There are not enough slots available in the
> system to satisfy the 2, slots that were requested by the application
>
>
> *(py3.9) ➜ /share mpirun –version*
>
> mpirun (Open MPI) 5.0.0rc9
>
> Report bugs to https://www.open-mpi.org/community/help/
>
> *(py3.9) ➜ /share cat hosts*
>
> 192.168.180.48 slots=1
> 192.168.60.203 slots=1
>
> *(py3.9) ➜ /share mpirun -n 2 -machinefile hosts –mca rmaps_base_verbose
> 100 which mpirun*
>
> [computer01:106117] mca: base: component_find: searching NULL for rmaps
> components
> [computer01:106117] mca: base: find_dyn_components: checking NULL for
> rmaps components
> [computer01:106117] pmix:mca: base: components_register: registering
> framework rmaps components
> [computer01:106117] pmix:mca: base: components_register: found loaded
> component ppr
> [computer01:106117] pmix:mca: base: components_register: component ppr
> register function successful
> [computer01:106117] pmix:mca: base: components_register: found loaded
> component rank_file
> [computer01:106117] pmix:mca: base: components_register: component
> rank_file has no register or open function
> [computer01:106117] pmix:mca: base: components_register: found loaded
> component round_robin
> [computer01:106117] pmix:mca: base: components_register: component
> round_robin register function successful
> [computer01:106117] pmix:mca: base: components_register: found loaded
> component seq
> [computer01:106117] pmix:mca: base: components_register: component seq
> register function successful
> [computer01:106117] mca: base: components_open: opening rmaps components
> [computer01:106117] mca: base: components_open: found loaded component ppr
> [computer01:106117] mca: base: components_open: component ppr open
> function successful
> [computer01:106117] mca: base: components_open: found loaded component
> rank_file
> [computer01:106117] mca: base: components_open: found loaded component
> round_robin
> [computer01:106117] mca: base: components_open: component round_robin open
> function successful
> [computer01:106117] mca: base: components_open: found loaded component seq
> [computer01:106117] mca: base: components_open: component seq open
> function successful
> [computer01:106117] mca:rmaps:select: checking available component ppr
> [computer01:106117] mca:rmaps:select: Querying component [ppr]
> [computer01:106117] mca:rmaps:select: checking available component
> rank_file
> [computer01:106117] mca:rmaps:select: Querying component [rank_file]
> [computer01:106117] mca:rmaps:select: checking available component
> round_robin
> [computer01:106117] mca:rmaps:select: Querying component [round_robin]
> [computer01:106117] mca:rmaps:select: checking available component seq
> [computer01:106117] mca:rmaps:select: Querying component [seq]
> [computer01:106117] [prterun-computer01-106117@0,0]: Final mapper
> priorities
> [computer01:106117] Mapper: ppr Priority: 90
> [computer01:106117] Mapper: seq Priority: 60
> [computer01:106117] Mapper: round_robin Priority: 10
> [computer01:106117] Mapper: rank_file Priority: 0
> [computer01:106117] mca:rmaps: mapping job prterun-computer01-106117@1
>
> [computer01:106117] mca:rmaps: setting mapping policies for job
> prterun-computer01-106117@1 inherit TRUE hwtcpus FALSE
> [computer01:106117] mca:rmaps[358] mapping not given - using bycore
> [computer01:106117] setdefaultbinding[365] binding not given - using bycore
> [computer01:106117] mca:rmaps:ppr: job prterun-computer01-106117@1 not
> using ppr mapper PPR NULL policy PPR NOTSET
> [computer01:106117] mca:rmaps:seq: job prterun-computer01-106117@1 not
> using seq mapper
> [computer01:106117] mca:rmaps:rr: mapping job prterun-computer01-106117@1
> [computer01:106117] AVAILABLE NODES FOR MAPPING:
> [computer01:106117] node: computer01 daemon: 0 slots_available: 1
> [computer01:106117] mca:rmaps:rr: mapping by Core for job
> prterun-computer01-106117@1 slots 1 num_procs 2
> ------------------------------
>
> There are not enough slots available in the system to satisfy the 2
> slots that were requested by the application:
>
> which
>
> Either request fewer procs for your application, or make more slots
> available for use.
>
> A "slot" is the PRRTE term for an allocatable unit where we can
> launch a process. The number of slots available are defined by the
> environment in which PRRTE processes are run:
>
>    1. Hostfile, via "slots=N" clauses (N defaults to number of
>    processor cores if not provided)
>    2. The --host command line parameter, via a ":N" suffix on the
>    hostname (N defaults to 1 if not provided)
>    3. Resource manager (e.g., SLURM, PBS/Torque, LSF, etc.)
>    4. If none of a hostfile, the --host command line parameter, or an
>    RM is present, PRRTE defaults to the number of processor cores
>
> In all the above cases, if you want PRRTE to default to the number
> of hardware threads instead of the number of processor cores, use the
> --use-hwthread-cpus option.
>
> Alternatively, you can use the --map-by :OVERSUBSCRIBE option to ignore the
> number of available slots when deciding the number of processes to
> launch.
> ------------------------------
> On 2022/11/8 05:46, Jeff Squyres (jsquyres) wrote:
>
> In the future, can you please just mail one of the lists?  This particular
> question is probably more of a users type of question (since we're not
> talking about the internals of Open MPI itself), so I'll reply just on the
> users list.
>
> For what it's worth, I'm unable to replicate your error:
>
> $ mpirun --version
>
> mpirun (Open MPI) 5.0.0rc9
>
>
> Report bugs to https://www.open-mpi.org/community/help/
> $ cat hostfile
>
> mpi002 slots=1
>
> mpi005 slots=1
>
> $ mpirun -n 2 --machinefile hostfile hostname
>
> mpi002
>
> mpi005
>
> Can you try running with "--mca rmaps_base_verbose 100" so that we can get
> some debugging output and see why the slots aren't working for you?  Show
> the full output, like I did above (e.g., cat the hostfile, and then mpirun
> with the MCA param and all the output).  Thanks!
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> ------------------------------
> *From:* devel <devel-boun...@lists.open-mpi.org> on behalf of mrlong via devel
> <de...@lists.open-mpi.org>
> *Sent:* Monday, November 7, 2022 3:37 AM
> *To:* de...@lists.open-mpi.org; Open MPI Users <users@lists.open-mpi.org>
> *Cc:* mrlong <mrlong...@gmail.com>
> *Subject:* [OMPI devel] There are not enough slots available in the
> system to satisfy the 2, slots that were requested by the application
>
>
> *Two machines, each with 64 cores. The contents of the hosts file are:*
>
> 192.168.180.48 slots=1
> 192.168.60.203 slots=1
> *Why does the following error occur when running with Open MPI 5.0.0rc9?*
>
> (py3.9) [user@machine01 share]$ mpirun -n 2
> --machinefile hosts hostname
> --------------------------------------------------------------------------
> There are not enough slots available in the system to satisfy the 2
> slots that were requested by the application:
>
>   hostname
>
> Either request fewer procs for your application, or make more slots
> available for use.
>
> A "slot" is the PRRTE term for an allocatable unit where we can
> launch a process.  The number of slots available are defined by the
> environment in which PRRTE processes are run:
>
>   1. Hostfile, via "slots=N" clauses (N defaults to number of
>      processor cores if not provided)
>   2. The --host command line parameter, via a ":N" suffix on the
>      hostname (N defaults to 1 if not provided)
>   3. Resource manager (e.g., SLURM, PBS/Torque, LSF, etc.)
>   4. If none of a hostfile, the --host command line parameter, or an
>      RM is present, PRRTE defaults to the number of processor cores
>
> In all the above cases, if you want PRRTE to default to the number
> of hardware threads instead of the number of processor cores, use the
> --use-hwthread-cpus option.
>
> Alternatively, you can use the --map-by :OVERSUBSCRIBE option to ignore the
> number of available slots when deciding the number of processes to
> launch.
>
>
>
