On Thu, 2013-01-03 at 06:18 -0800, Ralph Castain wrote:
> On Jan 3, 2013, at 3:01 AM, Ake Sandgren <ake.sandg...@hpc2n.umu.se> wrote:
>
> > On Thu, 2013-01-03 at 11:54 +0100, Ake Sandgren wrote:
> >> On Thu, 2013-01-03 at 11:15 +0100, Ake Sandgren wrote:
> >>> Hi!
> >>>
> >>> The grpcomm component hier seems to have vanished between 1.6.1 and
> >>> 1.6.3.
> >>> Why?
> >>> It seems that the version of slurm we are using (not the latest at the
> >>> moment) is using it for startup.
>
> It should be using PMI if you are directly launching processes via srun, and
> should not be using hier any more.

Shouldn't the grpcomm pmi component be turned on by default then, if it
is needed?

> >>>
> >>
> >> Hmm it seems it is the ess_slurmd_module.c that is using grpcomm=hier.
>
> Yes - that is the *only* scenario (a direct launch of procs via srun) that
> should use hier

What I have in my submit file is:

  #SBATCH -n x
  srun some-mpi-binary

This fails since hier is missing.
The reason one wants to use srun rather than mpirun is to get Slurm's
cgroup containment.

> >
> > orte/mca/plm/base/plm_base_rsh_support.c also tries to use the hier
> > grpcomm
>
> Something is very wrong if that is true. How was this configured, and how are
> you starting this job?

Not sure if it actually tries to use hier at runtime; I just noticed that
it has a setenv of OMPI_MCA_grpcomm=hier in the code.

So what is the real problem here?

The configure line is:

  ./configure --enable-orterun-prefix-by-default --enable-cxx-exceptions
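
To see which grpcomm components a given build actually contains, something
like the sketch below should work. It assumes ompi_info prints one
"MCA grpcomm: <name> (...)" line per component, which is what I believe the
1.6 series does; the helper name list_grpcomm is made up for this mail.

```shell
#!/bin/sh
# Hypothetical helper: print the grpcomm component names found in
# ompi_info-style output read from stdin (assumes "MCA grpcomm: <name> (...)").
list_grpcomm() {
    sed -n 's/^ *MCA grpcomm: \([^ ]*\) .*/\1/p'
}

# Only query the installation if ompi_info is actually on the PATH.
if command -v ompi_info >/dev/null 2>&1; then
    ompi_info | list_grpcomm
fi
```

If neither hier nor pmi shows up in that list, then a direct srun launch has
no grpcomm component to use. With the configure line above I would guess pmi
is absent, since no PMI option was passed, but that is an assumption and not
something I have verified.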