On 17/04/2019 16.51, Jeff Squyres (jsquyres) via users wrote:
> On Apr 17, 2019, at 3:38 AM, Steffen Christgau <christ...@cs.uni-potsdam.de> 
> wrote:
>> was configured with nothing more than a --prefix and
>> --enable-mpi-fortran. I checked for updates and it appears that there
>> was an issue until 4.0.1 with oversubscription. The changelog states
>>
>>> - Fix a problem with the ORTE rmaps_base_oversubscribe MCA paramater.
>>
>> Using --mca rmaps_base_oversubscribe 1 on the command line works with
>> the 4.0.1 version. I added an entry in the openmpi-mca-params.conf file
>> and I can now use over-subscribing calls of mpirun without any
>> additional arguments aside from the number of processes.
> 
> Excellent!  Glad that v4.0.1 fixed the issue for you.

Thanks for fixing the mca param issue in 4.0.1. ;-)
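
For the archives: besides the command line, the parameter can also be
set through the environment or the params file. A minimal sketch,
assuming the usual "name = value" file syntax and the default file
locations (the exact file entry is an assumption, it is not quoted
above):

```shell
# Environment variable form: Open MPI reads MCA parameters from
# variables named OMPI_MCA_<param>.
export OMPI_MCA_rmaps_base_oversubscribe=1

# Params file form (assumed entry). System-wide file:
#   <prefix>/etc/openmpi-mca-params.conf
# Per-user file:
#   $HOME/.openmpi/mca-params.conf
# Entry:
#   rmaps_base_oversubscribe = 1
```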

However, I observed two things while struggling with that problem:

1) FAQ #24 on "Running MPI Jobs"
(https://www.open-mpi.org/faq/?category=running#oversubscribing) appears
to be incorrect on the oversubscribing matter. If I follow the first
example without any MCA parameter configured, I still get the "There
are not enough slots available in the system..." error message with
4.0.1. The entry indicates that this should work. Maybe a hint on the
rmaps MCA param (which one? see below) would be helpful to other users.

2) The MCA parameters rmaps_base_no_oversubscribe and
rmaps_base_oversubscribe have nearly the same meaning. At least to me,
it appears that one is the negation of the other (from ompi_info,
emphasis mine):

rmaps_base_no_oversubscribe: "If true, then _do not allow_
oversubscription of nodes - mpirun will return an error if there aren't
enough nodes to launch all processes without oversubscribing"

rmaps_base_oversubscribe: "If true, then _allow_ oversubscription of
nodes and overloading of processing elements"

So if I have base_oversubscribe set to true ("I want nodes to be
oversubscribed") and no_oversubscribe set to true as well ("No, please
don't oversubscribe nodes") then I won't get oversubscription. In fact,
4.0.1's mpirun tells me with the above setting:

$ mpirun -n 8 --mca rmaps_base_no_oversubscribe 1 \
    --mca rmaps_base_oversubscribe 1 hostname
--------------------------------------------------------------------------
Conflicting directives for mapping policy are causing the policy
to be redefined:

  New policy:   oversubscribe
  Prior policy:  BYSOCKET:NOOVERSUBSCRIBE

Please check that only one policy is defined.
--------------------------------------------------------------------------

The message makes sense, since the MCA param values are conflicting.

The other way around:

$ mpirun -n 8 --mca rmaps_base_no_oversubscribe 0 \
    --mca rmaps_base_oversubscribe 0 hostname
--------------------------------------------------------------------------
There are not enough slots available in the system to satisfy the 8 slots
....

Still makes sense, since the oversubscribe MCA value prevents
oversubscription.

Does that mean, in the end, that one has to configure both variables
with the correct meanings (one must be the inverse of the other) to
achieve the intended effect? What is the rationale behind this (if
any)? Is rmaps_base_no_oversubscribe something like a safeguard?

Regards, Steffen