Other debugging: I saw this <https://github.com/open-mpi/ompi/issues/6130> and tried adding the following to $prefix/etc/openmpi-mca-params.conf on all nodes, with no luck:

rmaps_base_oversubscribe = 1
rmaps_base_inherit = 1
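As a side note, the same MCA parameters can also be set per-run through the environment rather than the config file (Open MPI reads any MCA parameter from an OMPI_MCA_-prefixed environment variable), which makes it easier to rule out a stale or unread config file. A minimal sketch:

```shell
# Per-run alternative to editing $prefix/etc/openmpi-mca-params.conf:
# Open MPI picks up any MCA parameter from an environment variable
# named OMPI_MCA_<param>.
export OMPI_MCA_rmaps_base_oversubscribe=1
export OMPI_MCA_rmaps_base_inherit=1
```

These must be exported in the environment that launches mpirun so they propagate to the job.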
I think my problem is somehow related to this PR: https://github.com/open-mpi/ompi/pull/5968 but I couldn't find any documentation on 'SUBSCRIBE_GIVEN'.

I've also tried versions 3.0.0 and 4.0.3 with no success. Version 4.0.3, though, got me another error (see https://github.com/open-mpi/ompi/issues/4555):

--------------------------------------------------------------------------
Process 2 ([[42756,1],0]) is on host: unknown!
  BTLs attempted: self tcp

Your MPI job is now going to abort; sorry.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems. This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  ompi_dpm_dyn_init() failed
  --> Returned "Unreachable" (-12) instead of "Success" (0)
--------------------------------------------------------------------------
[houprg118069:17128] *** An error occurred in MPI_Init_thread
[houprg118069:17128] *** reported by process [46912434864130,140733193388032]
[houprg118069:17128] *** on a NULL communicator
[houprg118069:17128] *** Unknown error
[houprg118069:17128] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[houprg118069:17128] ***    and potentially your MPI job)
[houprg118069:17128] [[42756,2],0] ORTE_ERROR_LOG: Unreachable in file dpm/dpm.c at line 433

Other than this, the error is always:

--------------------------------------------------------------------------
All nodes which are allocated for this job are already filled.
--------------------------------------------------------------------------

On Wed, Apr 29, 2020 at 2:20 AM carlos aguni <aguni...@gmail.com> wrote:

> Hi all,
>
> I'm trying to MPI_Spawn processes with no success.
> I'm facing the following error:
>
> =================
> All nodes which are allocated for this job are already filled.
> ==================
>
> I'm setting the hostname as follows:
>
> MPI_Info_set(minfo, "host", hostname);
>
> I'm already running with the `--oversubscribe` flag and I've already tried
> these hostfiles:
>
> controller max-slots=1
> client max-slots=3
> gateway max-slots=3
> server1 max-slots=41
> server2 max-slots=41
> server3 max-slots=41
> server4 max-slots=41
>
> controller slots=1
> client slots=3
> gateway slots=3
> server1 slots=41
> server2 slots=41
> server3 slots=41
> server4 slots=41
>
> Can anyone help me? Is there a way to force/bypass/disable it?
> I'm running openmpi3/3.1.4 gnu8/8.3.0 from openhpc.
>
> Regards,
> Carlos.
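For reference, here is a minimal sketch of the spawn pattern being discussed: one process calling MPI_Comm_spawn with the "host" info key. The worker binary name "./worker" and the host name "server1" are placeholders, not from the original mail, and running this requires an MPI install and mpirun, so it is illustrative only. The "host" key requires free slots on that host within the job's existing allocation; Open MPI also supports an "add-host" info key to extend the allocation instead, which may be worth trying here.

```c
/* Sketch: spawn one worker on a specific host via MPI_Comm_spawn.
 * "./worker" and "server1" are hypothetical placeholders.
 * Build with mpicc; launch with mpirun. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    MPI_Info minfo;
    MPI_Info_create(&minfo);

    /* Request placement on a specific host. The host must still have
     * free slots in the allocation, otherwise the spawn fails with
     * "All nodes which are allocated for this job are already filled." */
    MPI_Info_set(minfo, "host", "server1");
    /* Alternative (Open MPI extension): add the host to the allocation.
     * MPI_Info_set(minfo, "add-host", "server1"); */

    MPI_Comm intercomm;
    int errcodes[1];
    MPI_Comm_spawn("./worker", MPI_ARGV_NULL, 1, minfo,
                   0, MPI_COMM_SELF, &intercomm, errcodes);

    if (errcodes[0] == MPI_SUCCESS)
        printf("spawn succeeded\n");

    MPI_Comm_disconnect(&intercomm);
    MPI_Info_free(&minfo);
    MPI_Finalize();
    return 0;
}
```

If the "host" value names a node outside the current allocation (or one whose slots are exhausted), "add-host" is the usual way to grow the job, but behavior differs across the 3.x/4.x series mentioned above.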