Looks to me like the warning message says it all - the problem is in
openib.
The reason we took this action was to force the problems to the surface
across the code base so that people would address them. We've tried before
to just ask people to set the right flags to enable async progress and
tried with vader - same crash
*14:14:22* [vegas12:32068] 7 more processes have sent help message help-mca-var.txt / deprecated-mca-env
*14:14:22* [vegas12:32068] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
*14:14:22* +
will do and update shortly.
On Wed, Jun 25, 2014 at 9:11 AM, Gilles Gouaillardet <
gilles.gouaillar...@iferc.org> wrote:
> Mike,
>
> could you try again with
>
> OMPI_MCA_btl=vader,self,openib
>
> it seems the sm module causes a hang
> (which later causes the timeout sending a SIGSEGV)
>
>
We should have given more of a "heads up" here. We recognize that the trunk
may well become unstable as we can't test all the variations, and clearly
some timing issues are going to arise with this change. Our hope is that we
can iron them out quickly. If not, then we'll revert and try again.
You
I see your point, but I don't know how to make that happen. The problem is
that spawn really should fail under certain conditions because you asked us
to do something we couldn't do - i.e., you asked that we launch and bind
more processes than we could. Increasing the number of available resources
Saw it and will review - thanks!
On Tue, Jun 24, 2014 at 9:51 PM, Gilles Gouaillardet <
gilles.gouaillar...@iferc.org> wrote:
> Ralph,
>
> i pushed the change (r32079) and updated the wiki.
>
> the RFC can be now closed and the consensus is semantic of
> opal_hwloc_base_get_relative_locality
Mike,
could you try again with
OMPI_MCA_btl=vader,self,openib
it seems the sm module causes a hang
(which later causes the timeout sending a SIGSEGV)
Cheers,
Gilles
On 2014/06/25 14:22, Mike Dubman wrote:
> Hi,
> The following commit broke trunk in jenkins:
>
Per the OMPI developer
Hi,
The following commit broke trunk in jenkins:
>>>Per the OMPI developer conference, remove the last vestiges of
OMPI_USE_PROGRESS_THREADS
*22:15:09* + LD_LIBRARY_PATH=/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/oshm_install2/lib
*22:15:09* +
Hi Ralph,
On 2014/06/25 2:51, Ralph Castain wrote:
> Had a chance to review this with folks here, and we think that having
> oversubscribe automatically set overload makes some sense. However, we do
> want to retain the ability to separately specify oversubscribe and overload
> as well since