Re: [OMPI devel] trunk broken

2014-06-25 Thread Ralph Castain
Looks to me like the warning message says it all - the problem is in openib. The reason we took this action was to force the problems to the surface across the code base so that people would address them. We've tried before to just ask people to set the right flags to enable async progress and

Re: [OMPI devel] trunk broken

2014-06-25 Thread Mike Dubman
tried with vader - same crash
*14:14:22* [vegas12:32068] 7 more processes have sent help message help-mca-var.txt / deprecated-mca-env
*14:14:22* [vegas12:32068] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
*14:14:22* +

Re: [OMPI devel] trunk broken

2014-06-25 Thread Mike Dubman
will do and update shortly.
On Wed, Jun 25, 2014 at 9:11 AM, Gilles Gouaillardet <gilles.gouaillar...@iferc.org> wrote:
> Mike,
>
> could you try again with
>
> OMPI_MCA_btl=vader,self,openib
>
> it seems the sm module causes a hang
> (which later causes the timeout sending a SIGSEGV)
>
>

Re: [OMPI devel] trunk broken

2014-06-25 Thread Ralph Castain
We should have given more of a "heads up" here. We recognize that the trunk may well become unstable as we can't test all the variations, and clearly some timing issues are going to arise with this change. Our hope is that we can iron them out quickly. If not, then we'll revert and try again. You

Re: [OMPI devel] MPI_Comm_spawn fails under certain conditions

2014-06-25 Thread Ralph Castain
I see your point, but I don't know how to make that happen. The problem is that spawn really should fail under certain conditions because you asked us to do something we couldn't do - i.e., you asked that we launch and bind more processes than we could. Increasing the number of available resources

Re: [OMPI devel] RFC: semantic change of opal_hwloc_base_get_relative_locality

2014-06-25 Thread Ralph Castain
Saw it and will review - thanks!
On Tue, Jun 24, 2014 at 9:51 PM, Gilles Gouaillardet <gilles.gouaillar...@iferc.org> wrote:
> Ralph,
>
> i pushed the change (r32079) and updated the wiki.
>
> the RFC can be now closed and the consensus is semantic of
> opal_hwloc_base_get_relative_locality

Re: [OMPI devel] trunk broken

2014-06-25 Thread Gilles Gouaillardet
Mike,
could you try again with
OMPI_MCA_btl=vader,self,openib
it seems the sm module causes a hang (which later causes the timeout sending a SIGSEGV)
Cheers,
Gilles
On 2014/06/25 14:22, Mike Dubman wrote:
> Hi,
> The following commit broke trunk in jenkins:
> Per the OMPI developer

[OMPI devel] trunk broken

2014-06-25 Thread Mike Dubman
Hi,
The following commit broke trunk in jenkins:
>>>Per the OMPI developer conference, remove the last vestiges of OMPI_USE_PROGRESS_THREADS
*22:15:09* + LD_LIBRARY_PATH=/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/oshm_install2/lib
*22:15:09* +

Re: [OMPI devel] MPI_Comm_spawn fails under certain conditions

2014-06-25 Thread Gilles Gouaillardet
Hi Ralph,
On 2014/06/25 2:51, Ralph Castain wrote:
> Had a chance to review this with folks here, and we think that having
> oversubscribe automatically set overload makes some sense. However, we do
> want to retain the ability to separately specify oversubscribe and overload
> as well since