Re: [OMPI devel] MPI_Comm_spawn under Torque

2014-02-21 Thread Suraj Prabhakaran
Hmm... but in fact the MPI_Comm_spawn in the parents and the MPI_Init in the children never returned! I configured Open MPI with ./configure --prefix=/dir/ --enable-debug --with-tm=/usr/local/ On Feb 22, 2014, at 12:53 AM, Ralph Castain wrote: > Strange - it all looks just fine. How was OMPI configured? >

Re: [OMPI devel] MPI_Comm_spawn under Torque

2014-02-21 Thread Ralph Castain
Strange - it all looks just fine. How was OMPI configured? On Feb 21, 2014, at 3:31 PM, Suraj Prabhakaran wrote: > Ok, I figured out that it was not a problem with the node grsacc04 because I > now conducted the same on totally different set of nodes. > > I must

Re: [OMPI devel] MPI_Comm_spawn under Torque

2014-02-21 Thread Suraj Prabhakaran
Ok, I figured out that it was not a problem with the node grsacc04, because I have now run the same test on a totally different set of nodes. I must say that with the --bind-to none option, the program completed far more often than before, but it still hangs sometimes! Now attaching the

Re: [OMPI devel] 1.7.5 status

2014-02-21 Thread Paul Hargrove
On Fri, Feb 21, 2014 at 1:18 PM, Ralph Castain wrote: > Still on the table: > [...] > * SGI xpmem support > To the best of my knowledge I am the only one with platform access to test this. Nathan hasn't sent me anything new recently. -Paul -- Paul H. Hargrove

[OMPI devel] 1.7.5 status

2014-02-21 Thread Ralph Castain
Hi folks, just an end-of-week status update on the 1.7.5 branch. With most CMRs applied, it doesn't look too bad. We still have failures in the following MPI functions: * intercomm_create - was supposed to be fixed by the coll/ml CMR, but apparently was not * datatype/idx_null *

Re: [OMPI devel] startup sstore orte/mca/ess/base/ess_base_std_tool.c

2014-02-21 Thread Josh Hursey
+1 On Fri, Feb 21, 2014 at 10:04 AM, Ralph Castain wrote: > looks fine to me > > > On Feb 21, 2014, at 6:23 AM, Adrian Reber wrote: > > > To restart a process using orte-restart I need sstore initialized when > > running as a tool. This is currently

Re: [OMPI devel] mca_base_component_distill_checkpoint_ready variable

2014-02-21 Thread Nathan Hjelm
On Fri, Feb 21, 2014 at 05:21:10PM +0100, Adrian Reber wrote: > There is a variable in the FT code which is not defined and therefore > currently #ifdef'd out. > > #if (OPAL_ENABLE_FT == 1) && (OPAL_ENABLE_FT_CR == 1) > #ifdef ENABLE_FT_FIXED > /* FIXME_FT > * > * the variable

[OMPI devel] mca_base_component_distill_checkpoint_ready variable

2014-02-21 Thread Adrian Reber
There is a variable in the FT code which is not defined and therefore currently #ifdef'd out. #if (OPAL_ENABLE_FT == 1) && (OPAL_ENABLE_FT_CR == 1) #ifdef ENABLE_FT_FIXED /* FIXME_FT * * the variable mca_base_component_distill_checkpoint_ready * was removed by commit

Re: [OMPI devel] startup sstore orte/mca/ess/base/ess_base_std_tool.c

2014-02-21 Thread Ralph Castain
looks fine to me On Feb 21, 2014, at 6:23 AM, Adrian Reber wrote: > To restart a process using orte-restart I need sstore initialized when > running as a tool. This is currently missing. The new code is > > #if OPAL_ENABLE_FT_CR == 1 > > and should only affect --with-ft

Re: [OMPI devel] MPI_Comm_spawn under Torque

2014-02-21 Thread Ralph Castain
Well, that all looks fine. However, I note that the procs on grsacc04 all stopped making progress at the same point, which is why the job hung. All the procs on the other nodes were just fine. So let's try a couple of things: 1. add "--bind-to none" to your cmd line so we avoid any contention

Re: [OMPI devel] MPI_Comm_spawn under Torque

2014-02-21 Thread Suraj Prabhakaran
Right, so I have the output here. Same case, mpiexec -mca plm_base_verbose 5 -mca ess_base_verbose 5 -mca grpcomm_base_verbose 5 -np 3 ./simple_spawn Output attached. Best, Suraj On Feb 21, 2014, at 5:30 AM, Ralph Castain wrote: > > On Feb 20, 2014,