Hello Ralph,

Is there any update on this?

Thanks,
Adam LeBlanc

On Fri, Nov 2, 2018 at 11:06 AM Adam LeBlanc <alebl...@iol.unh.edu> wrote:

> Hello Ralph,
>
> When I run with -np 7 it still fails with "There are not enough slots
> available in the system to satisfy the 7 slots that were requested by the
> application". With -np 2 it does run from a machine that was previously
> failing, but it only uses one other machine; in this case it ran from a
> machine with 2 processors to a machine with only 1 processor. Anything
> higher than -np 2 also fails.
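>
> As an illustration (a sketch on my part, assuming farbauti is the
> 2-processor node here; adjust to match the real core counts), declaring
> the second processor explicitly in the hostfile would look like:
>
> farbauti-ce.ofa.iol.unh.edu slots=2
> hyperion-ce.ofa.iol.unh.edu slots=1
> io-ce.ofa.iol.unh.edu slots=1
> jarnsaxa-ce.ofa.iol.unh.edu slots=1
> rhea-ce.ofa.iol.unh.edu slots=1
> tarqeq-ce.ofa.iol.unh.edu slots=1
> tarvos-ce.ofa.iol.unh.edu slots=1
>
> That declares 8 slots in total, so "-np 7" would no longer exceed the
> declared slot count.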
>
> -Adam LeBlanc
>
> On Thu, Nov 1, 2018 at 3:53 PM Ralph H Castain <r...@open-mpi.org> wrote:
>
>> Hmmm - try adding a value for nprocs instead of leaving it blank. Say
>> "-np 7"
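>>
>> i.e. something along the lines of (trimmed to the relevant options here;
>> keep your existing --mca flags):
>>
>> mpirun -np 7 -hostfile /home/soesterreich/ce-mpi-hosts IMB-MPI1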
>>
>> Sent from my iPhone
>>
>> On Nov 1, 2018, at 11:56 AM, Adam LeBlanc <alebl...@iol.unh.edu> wrote:
>>
>> Hello Ralph,
>>
>> Here is the output for a failing machine:
>>
>> [130_02:44:13_aleblanc@farbauti]{~}$ > mpirun --mca
>> btl_openib_warn_no_device_params_found 0 --mca orte_base_help_aggregate 0
>> --mca btl openib,vader,self --mca pml ob1 --mca btl_openib_receive_queues
>> P,65536,120,64,32 -hostfile /home/soesterreich/ce-mpi-hosts --mca
>> ras_base_verbose 5 IMB-MPI1
>>
>> ======================   ALLOCATED NODES   ======================
>> farbauti: flags=0x11 slots=1 max_slots=0 slots_inuse=0 state=UP
>> hyperion-ce: flags=0x10 slots=1 max_slots=0 slots_inuse=0 state=UNKNOWN
>> io-ce: flags=0x10 slots=1 max_slots=0 slots_inuse=0 state=UNKNOWN
>> jarnsaxa-ce: flags=0x10 slots=1 max_slots=0 slots_inuse=0 state=UNKNOWN
>> rhea-ce: flags=0x10 slots=1 max_slots=0 slots_inuse=0 state=UNKNOWN
>> tarqeq-ce: flags=0x10 slots=1 max_slots=0 slots_inuse=0 state=UNKNOWN
>> tarvos-ce: flags=0x10 slots=1 max_slots=0 slots_inuse=0 state=UNKNOWN
>> =================================================================
>> --------------------------------------------------------------------------
>> There are not enough slots available in the system to satisfy the 7 slots
>> that were requested by the application:
>>   10
>>
>> Either request fewer slots for your application, or make more slots
>> available
>> for use.
>> --------------------------------------------------------------------------
>>
>>
>> Here is an output of a passing machine:
>>
>> [1_02:54:26_aleblanc@hyperion]{~}$ > mpirun --mca
>> btl_openib_warn_no_device_params_found 0 --mca orte_base_help_aggregate 0
>> --mca btl openib,vader,self --mca pml ob1 --mca btl_openib_receive_queues
>> P,65536,120,64,32 -hostfile /home/soesterreich/ce-mpi-hosts --mca
>> ras_base_verbose 5 IMB-MPI1
>>
>> ======================   ALLOCATED NODES   ======================
>> hyperion: flags=0x11 slots=1 max_slots=0 slots_inuse=0 state=UP
>> farbauti-ce: flags=0x10 slots=1 max_slots=0 slots_inuse=0 state=UNKNOWN
>> io-ce: flags=0x10 slots=1 max_slots=0 slots_inuse=0 state=UNKNOWN
>> jarnsaxa-ce: flags=0x10 slots=1 max_slots=0 slots_inuse=0 state=UNKNOWN
>> rhea-ce: flags=0x10 slots=1 max_slots=0 slots_inuse=0 state=UNKNOWN
>> tarqeq-ce: flags=0x10 slots=1 max_slots=0 slots_inuse=0 state=UNKNOWN
>> tarvos-ce: flags=0x10 slots=1 max_slots=0 slots_inuse=0 state=UNKNOWN
>> =================================================================
>>
>>
>> Yes, the hostfile is available on all nodes through an NFS mount of all
>> of our home directories.
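>>
>> (A quick way to double-check that from any one node, as a sketch using
>> ssh and the first column of the hostfile:
>>
>> for h in $(awk '{print $1}' /home/soesterreich/ce-mpi-hosts); do
>>     ssh "$h" ls /home/soesterreich/ce-mpi-hosts
>> done
>>
>> which should print the path once per reachable host.)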
>>
>> On Thu, Nov 1, 2018 at 2:44 PM Adam LeBlanc <alebl...@iol.unh.edu> wrote:
>>
>>>
>>>
>>> ---------- Forwarded message ---------
>>> From: Ralph H Castain <r...@open-mpi.org>
>>> Date: Thu, Nov 1, 2018 at 2:34 PM
>>> Subject: Re: [OMPI users] Bug with Open-MPI Processor Count
>>> To: Open MPI Users <users@lists.open-mpi.org>
>>>
>>>
>>> I'm a little under the weather and so will only be able to help a bit at
>>> a time. However, a couple of things to check:
>>>
>>> * add -mca ras_base_verbose 5 to the cmd line to see what mpirun thought
>>> the allocation was
>>>
>>> * is the hostfile available on every node?
>>>
>>> Ralph
>>>
>>> On Nov 1, 2018, at 10:55 AM, Adam LeBlanc <alebl...@iol.unh.edu> wrote:
>>>
>>> Hello Ralph,
>>>
>>> Attached below is the verbose output for a failing machine and a passing
>>> machine.
>>>
>>> Thanks,
>>> Adam LeBlanc
>>>
>>> On Thu, Nov 1, 2018 at 1:41 PM Adam LeBlanc <alebl...@iol.unh.edu>
>>> wrote:
>>>
>>>>
>>>>
>>>> ---------- Forwarded message ---------
>>>> From: Ralph H Castain <r...@open-mpi.org>
>>>> Date: Thu, Nov 1, 2018 at 1:07 PM
>>>> Subject: Re: [OMPI users] Bug with Open-MPI Processor Count
>>>> To: Open MPI Users <users@lists.open-mpi.org>
>>>>
>>>>
>>>> Set rmaps_base_verbose=10 for debugging output
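>>>>
>>>> For example, added to the existing command line:
>>>>
>>>> mpirun --mca rmaps_base_verbose 10 -hostfile /home/soesterreich/ce-mpi-hosts IMB-MPI1
>>>>
>>>> or, equivalently, exported as an environment variable before the run:
>>>>
>>>> export OMPI_MCA_rmaps_base_verbose=10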
>>>>
>>>> Sent from my iPhone
>>>>
>>>> On Nov 1, 2018, at 9:31 AM, Adam LeBlanc <alebl...@iol.unh.edu> wrote:
>>>>
>>>> By the way, the Open MPI version is 3.1.2.
>>>>
>>>> -Adam LeBlanc
>>>>
>>>> On Thu, Nov 1, 2018 at 12:05 PM Adam LeBlanc <alebl...@iol.unh.edu>
>>>> wrote:
>>>>
>>>>> Hello, I am an employee of the UNH InterOperability Lab, and we are in
>>>>> the process of testing OFED-4.17-RC1 for the OpenFabrics Alliance. We
>>>>> have purchased some new hardware that has one processor, and we noticed
>>>>> an issue when running MPI jobs across nodes that do not have similar
>>>>> processor counts. If we launch the MPI job from a node that has 2
>>>>> processors, it fails, stating there are not enough resources, and will
>>>>> not start the run, like so:
>>>>>
>>>>> --------------------------------------------------------------------------
>>>>> There are not enough slots available in the system to satisfy the 14 slots
>>>>> that were requested by the application:
>>>>>   IMB-MPI1
>>>>> Either request fewer slots for your application, or make more slots
>>>>> available for use.
>>>>> --------------------------------------------------------------------------
>>>>>
>>>>> If we launch the MPI job from the node with one processor, without
>>>>> changing the mpirun command at all, it runs as expected. Here is the
>>>>> command being run:
>>>>>
>>>>> mpirun --mca btl_openib_warn_no_device_params_found 0 --mca
>>>>> orte_base_help_aggregate 0 --mca btl openib,vader,self --mca pml ob1 --mca
>>>>> btl_openib_receive_queues P,65536,120,64,32 -hostfile
>>>>> /home/soesterreich/ce-mpi-hosts IMB-MPI1
>>>>>
>>>>> Here is the hostfile being used:
>>>>>
>>>>> farbauti-ce.ofa.iol.unh.edu slots=1
>>>>> hyperion-ce.ofa.iol.unh.edu slots=1
>>>>> io-ce.ofa.iol.unh.edu slots=1
>>>>> jarnsaxa-ce.ofa.iol.unh.edu slots=1
>>>>> rhea-ce.ofa.iol.unh.edu slots=1
>>>>> tarqeq-ce.ofa.iol.unh.edu slots=1
>>>>> tarvos-ce.ofa.iol.unh.edu slots=1
>>>>>
>>>>> This seems like a bug, and we would like some help to explain and fix
>>>>> what is happening. The IBTA plugfest saw similar behaviors, so this
>>>>> should be reproducible.
>>>>>
>>>>> Thanks,
>>>>> Adam LeBlanc
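>>>>>
>>>>> P.S. A minimal way to check the slot accounting without IMB (a sketch;
>>>>> hostname is just a stand-in for any executable):
>>>>>
>>>>> mpirun -hostfile /home/soesterreich/ce-mpi-hosts hostname
>>>>>
>>>>> Run once from the 2-processor node and once from the 1-processor node to
>>>>> see whether the reported slot count differs.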
>>>>>
>>> [Attachments: passing_verbose_output.txt, failing_verbose_output.txt]
>
>
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users
