On Thu, 2013-01-03 at 06:18 -0800, Ralph Castain wrote:
> On Jan 3, 2013, at 3:01 AM, Ake Sandgren <ake.sandg...@hpc2n.umu.se> wrote:
> 
> > On Thu, 2013-01-03 at 11:54 +0100, Ake Sandgren wrote:
> >> On Thu, 2013-01-03 at 11:15 +0100, Ake Sandgren wrote:
> >>> Hi!
> >>> 
> >>> The grpcomm component hier seems to have vanished between 1.6.1 and
> >>> 1.6.3.
> >>> Why?
> >>> It seems that the version of slurm we are using (not the latest at the
> >>> moment) is using it for startup.
> 
> It should be using PMI if you are directly launching processes via srun, and 
> should not be using hier any more.

Shouldn't the grpcomm pmi component be turned on by default then, if it
is needed?

> >>> 
> >> 
> >> Hmm it seems it is the ess_slurmd_module.c that is using grpcomm=hier.
> 
> Yes - that is the *only* scenario (a direct launch of procs via srun) that 
> should use hier

What I have in my submit file is:
#SBATCH -n x

srun some-mpi-binary

This fails since hier is missing.
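
One way to confirm which grpcomm components a given install was built with is to filter the ompi_info listing. This is a minimal sketch: the list_grpcomm helper is a made-up name, and it assumes the usual "MCA <framework>: <component> (...)" line format of ompi_info.

```shell
#!/bin/sh
# Print the names of the grpcomm components found in ompi_info-style
# output read from stdin. Assumes lines of the form
#   "MCA grpcomm: hier (MCA v2.0, API v2.0, Component v1.6.1)"
# list_grpcomm is a hypothetical helper, not part of Open MPI.
list_grpcomm() {
    awk '$1 == "MCA" && $2 == "grpcomm:" { print $3 }'
}

# Only query the installation if ompi_info is actually on the PATH.
if command -v ompi_info >/dev/null 2>&1; then
    ompi_info | list_grpcomm
fi
```

If hier is not in that list for 1.6.3 but pmi is, that would match the failure above.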

The reason to use srun rather than mpirun is to get Slurm's cgroup
containment.

> > 
> > orte/mca/plm/base/plm_base_rsh_support.c also tries to use the hier
> > grpcomm
> 
> Something is very wrong if that is true. How was this configured, and how are 
> you starting this job?

I'm not sure whether it actually tries to use hier at runtime; I just
noticed that the code contains a setenv of OMPI_MCA_grpcomm=hier.

So what is the real problem here?

The configure line is:
./configure --enable-orterun-prefix-by-default --enable-cxx-exceptions
