Sorry for the delay - got tied up. This looks sane to me and should work.
FWIW: I haven't seen any problem with comm_spawn_multiple. That code will only
execute if the specific limit is hit, so I suspect that is an issue of scale
and race conditions.
On Jul 12, 2011, at 6:44 PM, Eugene Loh wrote:
Well, yes and no.
Yes - the value definitely needs to be computed in all cases since it is used
later on.
No - the way it is computed is only correct for that one usage. The later usage
(in the block that starts with 0 < opal_sys_limits.num_procs) needs a
completely different value.
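
To make the "yes and no" concrete, here is a rough sketch in C of one way to keep the two usages apart: compute a separate count for each consumer, and compute both unconditionally. Everything in it is hypothetical (proc_counts_t, compute_counts, exceeds_limit, and both formulas are placeholders, not the ORTE code). It only borrows the shape of the "0 < limit" guard mentioned above and assumes that a limit of 0 means no limit is known.

```c
#include <stdbool.h>

/* Hypothetical container: one count per consumer, so neither check
 * reuses a value that was computed for the other. */
typedef struct {
    int for_oversubscribe_test;  /* assumed: procs already running locally */
    int for_limit_check;         /* assumed: running procs plus new ones   */
} proc_counts_t;

/* Compute both counts unconditionally, before any branch that might
 * skip an assignment. The formulas are placeholders, not ORTE's. */
static proc_counts_t compute_counts(int running, int to_launch)
{
    proc_counts_t c;
    c.for_oversubscribe_test = running;
    c.for_limit_check = running + to_launch;
    return c;
}

/* Same shape as the "0 < limit" guard: in this sketch a limit of 0 is
 * assumed to mean "no limit known", so the check is skipped. */
static bool exceeds_limit(int limit, const proc_counts_t *c)
{
    return 0 < limit && limit < c->for_limit_check;
}
```

With 3 procs running and 2 to launch, the limit check sees 5 while the oversubscription test sees 3, so a limit of 4 would trip the guard and a limit of 5 would not.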
The function orte_odls_base_default_launch_local() has a variable
num_procs_alive that is basically initialized like this:
    if ( oversubscribed ) {
        ...
    } else {
        num_procs_alive = ...;
    }
Specifically, if the "oversubscribed" test passes, the variable is not