Other debugging attempts:

I saw <https://github.com/open-mpi/ompi/issues/6130> and tried setting:

rmaps_base_oversubscribe = 1
rmaps_base_inherit = 1

in $prefix/etc/openmpi-mca-params.conf on all nodes, with no luck.
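To rule out the params file simply not being picked up, one can query the values Open MPI actually sees with ompi_info (a sketch; the exact parameter names assume the rmaps framework of the 3.x/4.x series):

```shell
# Ask Open MPI which rmaps MCA parameters are in effect and where
# each value came from (default, file, or environment).
ompi_info --param rmaps base --level 9 | grep -i oversubscribe
```

If the oversubscribe parameter does not show up as set from the file, the conf file is not being read on that node.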

I suspect my problem is related to
https://github.com/open-mpi/ompi/pull/5968,
but I couldn't find any documentation on 'SUBSCRIBE_GIVEN'.

I've also tried versions 3.0.0 and 4.0.3 with no success.

Version 4.0.3, though, gave me a different error, which looks like
https://github.com/open-mpi/ompi/issues/4555:

Process 2 ([[42756,1],0]) is on host: unknown!
  BTLs attempted: self tcp

Your MPI job is now going to abort; sorry.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  ompi_dpm_dyn_init() failed
  --> Returned "Unreachable" (-12) instead of "Success" (0)
--------------------------------------------------------------------------
[houprg118069:17128] *** An error occurred in MPI_Init_thread
[houprg118069:17128] *** reported by process [46912434864130,140733193388032]
[houprg118069:17128] *** on a NULL communicator
[houprg118069:17128] *** Unknown error
[houprg118069:17128] *** MPI_ERRORS_ARE_FATAL (processes in this
communicator will now abort,
[houprg118069:17128] ***    and potentially your MPI job)
[houprg118069:17128] [[42756,2],0] ORTE_ERROR_LOG: Unreachable in file
dpm/dpm.c at line 433

Other than that, the error is always:
--------------------------------------------------------------------------
All nodes which are allocated for this job are already filled.
--------------------------------------------------------------------------
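For context, my spawn code boils down to roughly the sketch below (hostname and worker binary are placeholders, not my real values). The commented-out "add-host" key is an Open MPI-specific alternative to "host" that I've seen suggested, which is supposed to add the node to the allocation rather than consume existing slots; I haven't confirmed whether it helps here:

```c
/* Minimal MPI_Comm_spawn sketch. Assumptions: a "./worker" binary
 * exists and "server1" is a reachable host. */
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    MPI_Info info;
    MPI_Info_create(&info);

    /* Pin the spawned process to a specific host; this is the key
     * that runs into the "already filled" slot accounting. */
    MPI_Info_set(info, "host", "server1");

    /* Open MPI-specific alternative (untested here):
     * MPI_Info_set(info, "add-host", "server1"); */

    MPI_Comm intercomm;
    int errcode;
    MPI_Comm_spawn("./worker", MPI_ARGV_NULL, 1, info, 0,
                   MPI_COMM_SELF, &intercomm, &errcode);

    MPI_Info_free(&info);
    MPI_Comm_disconnect(&intercomm);
    MPI_Finalize();
    return 0;
}
```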

On Wed, Apr 29, 2020 at 2:20 AM carlos aguni <aguni...@gmail.com> wrote:

> Hi all,
>
> I'm trying to MPI_Spawn processes with no success.
> I'm facing the following error:
> =================
> All nodes which are allocated for this job are already filled.
> ==================
>
> I'm setting the hostname as follows:
> MPI_Info_set(minfo, "host", hostname);
>
> I'm already running with `--oversubscribe` flag and I've already tried
> these hostfiles:
>
> controller max-slots=1
> client     max-slots=3
> gateway    max-slots=3
> server1    max-slots=41
> server2    max-slots=41
> server3    max-slots=41
> server4    max-slots=41
>
> controller slots=1
> client     slots=3
> gateway    slots=3
> server1    slots=41
> server2    slots=41
> server3    slots=41
> server4    slots=41
>
>  Can anyone help me? Is there a way to force/bypass/disable it?
>  I'm running openmpi3/3.1.4 gnu8/8.3.0 from openhpc.
>
> Regards,
> Carlos.
>
