[OMPI users] There are not enough slots available in the system to satisfy the 2 slots that were requested by the application
*Two machines, each with 64 cores. The contents of the hosts file are:*

192.168.180.48 slots=1
192.168.60.203 slots=1

*Why do I get the following error when running with Open MPI 5.0.0rc9?*

(py3.9) [user@machine01 share]$ mpirun -n 2 --machinefile hosts hostname
--------------------------------------------------------------------------
There are not enough slots available in the system to satisfy the 2
slots that were requested by the application:

  hostname

Either request fewer procs for your application, or make more slots
available for use.

A "slot" is the PRRTE term for an allocatable unit where we can launch
a process. The number of slots available are defined by the environment
in which PRRTE processes are run:

  1. Hostfile, via "slots=N" clauses (N defaults to number of processor
     cores if not provided)
  2. The --host command line parameter, via a ":N" suffix on the
     hostname (N defaults to 1 if not provided)
  3. Resource manager (e.g., SLURM, PBS/Torque, LSF, etc.)
  4. If none of a hostfile, the --host command line parameter, or an RM
     is present, PRRTE defaults to the number of processor cores

In all the above cases, if you want PRRTE to default to the number of
hardware threads instead of the number of processor cores, use the
--use-hwthread-cpus option.

Alternatively, you can use the --map-by :OVERSUBSCRIBE option to ignore
the number of available slots when deciding the number of processes to
launch.
--------------------------------------------------------------------------
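For illustration, the slot-counting rule quoted in the error text ("slots=N" clauses, with N defaulting to the number of processor cores when omitted) can be sketched in Python. This is a toy parser, not PRRTE's actual code; the 64-core default is just this poster's machine:

```python
def count_slots(hostfile_text, cores_per_node=64):
    """Toy sketch of PRRTE's hostfile slot counting, per the help text
    above: each host contributes its slots=N value, defaulting to the
    number of processor cores when no slots clause is given."""
    total = 0
    for line in hostfile_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        tokens = line.split()
        slots = cores_per_node  # default: number of processor cores
        for token in tokens[1:]:
            if token.startswith("slots="):
                slots = int(token.split("=", 1)[1])
        total += slots
    return total

hosts = "192.168.180.48 slots=1\n192.168.60.203 slots=1"
print(count_slots(hosts))  # 2
```

By this rule the hostfile above does advertise 2 slots, so `mpirun -n 2` ought to fit; that is what makes the reported error surprising.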
[OMPI users] [LOG_CAT_ML] component basesmuma is not available but requested in hierarchy: basesmuma, basesmuma, ucx_p2p:basesmsocket, basesmuma, p2p
Running with Open MPI 5.0.0rc9 produces the following:

(py3.9) [user@machine01 share]$ mpirun -n 2 python test.py
[LOG_CAT_ML] component basesmuma is not available but requested in hierarchy: basesmuma,basesmuma,ucx_p2p:basesmsocket,basesmuma,p2p
[LOG_CAT_ML] ml_discover_hierarchy exited with error
[LOG_CAT_ML] component basesmuma is not available but requested in hierarchy: basesmuma,basesmuma,ucx_p2p:basesmsocket,basesmuma,p2p
[LOG_CAT_ML] ml_discover_hierarchy exited with error

Why is this message printed?
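One hedged suggestion (an assumption, not a confirmed diagnosis): [LOG_CAT_ML] messages typically come from the ML collective layer inside Mellanox HCOLL rather than from Open MPI itself, so disabling the hcoll coll component is a quick way to test whether HCOLL is the source:

```shell
# If the [LOG_CAT_ML] warnings originate in HCOLL (an assumption),
# turning off Open MPI's hcoll coll component should make them go away:
mpirun --mca coll_hcoll_enable 0 -n 2 python test.py
```

If the messages disappear, the hierarchy warning is an HCOLL configuration issue on that node, not an Open MPI bug.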
Re: [OMPI users] [EXTERNAL] OFI, destroy_vni_context(1137).......: OFI domain close failed (ofi_init.c:1137:destroy_vni_context:Device or resource busy)
Thanks, what you said seems to be right; I just checked and solved it. It might be caused by a conflict between the Open MPI and MPICH libraries.

On 2022/11/2 02:06, Pritchard Jr., Howard wrote:

Hi, You are using MPICH or a vendor derivative of MPICH. You probably want to resend this email to the mpich users/help mail list.

Howard

*From: *users on behalf of mrlong via users
*Reply-To: *Open MPI Users
*Date: *Tuesday, November 1, 2022 at 11:26 AM
*To: *"de...@lists.open-mpi.org", "users@lists.open-mpi.org"
*Cc: *mrlong
*Subject: *[EXTERNAL] [OMPI users] OFI, destroy_vni_context(1137)...: OFI domain close failed (ofi_init.c:1137:destroy_vni_context:Device or resource busy)

Hi, teachers

code:

import mpi4py
import time
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
print("rank", rank)

if __name__ == '__main__':
    if rank == 0:
        mem = np.array([0], dtype='i')
        win = MPI.Win.Create(mem, comm=comm)
    else:
        win = MPI.Win.Create(None, comm=comm)
    print(rank, "end")

(py3.6.8) ➜ ~ mpirun -n 2 python -u test.py
rank 0
rank 1
0 end
1 end
Abort(806449679): Fatal error in internal_Finalize: Other MPI error, error stack:
internal_Finalize(50)...: MPI_Finalize failed
MPII_Finalize(345)..:
MPID_Finalize(511)..:
MPIDI_OFI_mpi_finalize_hook(895):
destroy_vni_context(1137)...: OFI domain close failed (ofi_init.c:1137:destroy_vni_context:Device or resource busy)

*Why is this happening? How to debug? This error is not reported on the other machine.*
[OMPI users] OFI, destroy_vni_context(1137).......: OFI domain close failed (ofi_init.c:1137:destroy_vni_context:Device or resource busy)
Hi, teachers

code:

import mpi4py
import time
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
print("rank", rank)

if __name__ == '__main__':
    if rank == 0:
        mem = np.array([0], dtype='i')
        win = MPI.Win.Create(mem, comm=comm)
    else:
        win = MPI.Win.Create(None, comm=comm)
    print(rank, "end")

(py3.6.8) ➜ ~ mpirun -n 2 python -u test.py
rank 0
rank 1
0 end
1 end
Abort(806449679): Fatal error in internal_Finalize: Other MPI error, error stack:
internal_Finalize(50)...: MPI_Finalize failed
MPII_Finalize(345)..:
MPID_Finalize(511)..:
MPIDI_OFI_mpi_finalize_hook(895):
destroy_vni_context(1137)...: OFI domain close failed (ofi_init.c:1137:destroy_vni_context:Device or resource busy)

*Why is this happening? How to debug? This error is not reported on the other machine.*
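One thing worth checking (a hedged suggestion, not a confirmed fix): the program never frees the RMA window, and MPI implementations generally expect all windows to be released before MPI_Finalize runs. An unfreed window can leave the OFI provider's domain in use at finalize time, which matches the "Device or resource busy" failure. A minimal variant of the same program that frees the window explicitly (this requires mpi4py and an MPI launcher, so it cannot run standalone):

```python
# Same program as above, but with an explicit Win.Free() before the
# interpreter exits, so MPI_Finalize does not find the window still open.
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

if __name__ == '__main__':
    if rank == 0:
        mem = np.array([0], dtype='i')
        win = MPI.Win.Create(mem, comm=comm)
    else:
        win = MPI.Win.Create(None, comm=comm)

    win.Free()  # release the window (and its network resources) explicitly
    print(rank, "end")
```

Win.Free is a collective call, so every rank must reach it; if the error persists even with the window freed, the problem is more likely in the OFI provider or the MPICH build itself, as the reply in this thread suggests.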
[OMPI users] --mca btl_base_verbose 30 not working in version 5.0
mpirun --mca btl self,sm,tcp --mca btl_base_verbose 30 -np 2 --machinefile hostfile hostname

Why does this command not print the TCP BTL's messages about which IP addresses are routable in Open MPI 5.0.0rc9?
[OMPI users] Open MPI 5.0.0rc8 fails but version 4.1.4 works well
Two machines. A: 192.168.180.48, B: 192.168.60.203. The hostfile content is:

192.168.60.203 slots=2

1. Using Open MPI 4.1.4, execute "mpirun -n 2 --machinefile hostfile hostname" on machine A. The hostname of B is printed correctly.

2. However, using Open MPI 5.0.0rc8, the result on machine A is:

$ mpirun -n 2 --machinefile hostfile hostname
--------------------------------------------------------------------------
All nodes which are allocated for this job are already filled.
--------------------------------------------------------------------------

Why is this happening?