On 17/04/2019 16.51, Jeff Squyres (jsquyres) via users wrote:
> On Apr 17, 2019, at 3:38 AM, Steffen Christgau <christ...@cs.uni-potsdam.de>
> wrote:
>> was configured with nothing more than a --prefix and
>> --enable-mpi-fortran. I checked for updates and it appears that there
>> was an issue until 4.0.1 with oversubscription. The changelog states
>>
>>> - Fix a problem with the ORTE rmaps_base_oversubscribe MCA paramater.
>>
>> Using --mca rmaps_base_oversubscribe 1 on the command line works with
>> the 4.0.1 version. I added an entry in the openmpi-mca-params.conf file
>> and I can now use over-subscribing calls of mpirun without any
>> additional arguments aside from the number of processes.
>
> Excellent! Glad that v4.0.1 fixed the issue for you.
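
For the archives, the entry mentioned in the quoted text is a single line. What follows is only a sketch: it assumes the usual "name = value" syntax of MCA parameter files and a value of 1, matching the command-line form quoted above. The file normally lives under the etc directory of the installation prefix, so the exact path depends on the --prefix used at configure time.

```ini
# openmpi-mca-params.conf
# Assumed entry: same parameter and value as "--mca rmaps_base_oversubscribe 1"
rmaps_base_oversubscribe = 1
```

With such a line in place, a plain "mpirun -n 8 hostname" should pick up the parameter without any extra --mca argument.
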
Thanks for fixing the mca param issue in 4.0.1. ;-)

However, I observed two things while struggling with that problem:

1) FAQ #24 on "Running MPI Jobs"
(https://www.open-mpi.org/faq/?category=running#oversubscribing) appears
to be incorrect on the oversubscribing matter. If I follow the first
example without any MCA parameter configured, I still get the "There
are not enough slots available in the system..." error message with
4.0.1. The entry indicates that this should work. Maybe a hint on the
rmaps MCA param (which one? see below) would be helpful to other users.

2) The MCA parameters rmaps_base_no_oversubscribe and
rmaps_base_oversubscribe have nearly the same meaning. At least to me,
it appears that one is the negation of the other (from ompi_info,
emphasis mine):

rmaps_base_no_oversubscribe:
"If true, then _do not allow_ oversubscription of nodes - mpirun will
return an error if there aren't enough nodes to launch all processes
without oversubscribing"

rmaps_base_oversubscribe:
"If true, then _allow_ oversubscription of nodes and overloading of
processing elements"

So if I have base_oversubscribe set to true ("I want nodes to be
oversubscribed") and no_oversubscribe set to true as well ("No, please
don't oversubscribe nodes") then I won't get oversubscription. In fact,
4.0.1's mpirun tells me with the above setting:

$ mpirun -n 8 --mca rmaps_base_no_oversubscribe 1 --mca rmaps_base_oversubscribe 1 hostname
--------------------------------------------------------------------------
Conflicting directives for mapping policy are causing the policy
to be redefined:

  New policy:   oversubscribe
  Prior policy:  BYSOCKET:NOOVERSUBSCRIBE

Please check that only one policy is defined.
--------------------------------------------------------------------------

The message makes sense, since the MCA param values are conflicting.

The other way around:

$ mpirun -n 8 --mca rmaps_base_no_oversubscribe 0 --mca rmaps_base_oversubscribe 0 hostname
--------------------------------------------------------------------------
There are not enough slots available in the system to satisfy the 8 slots
....

Still makes sense, since the oversubscribe MCA value prevents
oversubscription.

Does that mean, in the end, one has to configure both variables with the
correct meanings (one must be the inverse of the other) to achieve the
intended effect? What is the rationale behind this (if any)? Is
rmaps_base_no_oversubscribe something like a safeguard?

Regards, Steffen