Re: [OMPI users] grpcomm component hier gone...
On Thu, 2013-01-03 at 07:14 -0800, Ralph Castain wrote:
> > Well, it isn't :-)
> > configure says:
> > --- MCA component grpcomm:pmi (m4 configuration macro)
> > checking for MCA component grpcomm:pmi compile mode... dso
> > checking if user requested PMI support... no
> > checking if MCA component grpcomm:pmi can compile... no
>
> Ah - that is the problem. You need to configure --with-pmi=

Ahh thanks. Was assuming i needed something like that.

> > Not sure what you mean here. slurm's pmi module is available (and Intel
> > MPI can use it if i point it to it).
>
> Yeah, we need to be pointed to it just like Intel.

Doh :-(

> > Anyway, I think that if there is code that tries to use the hier
> > component it shouldn't have been removed.
>
> Agreed - it looks like something picked up an unintended change. Just trying
> to help you work with it as I don't know when a 1.6.4 will occur.

I pulled the 1.6.1 hier component and reran autogen so i have it working
but it's good to know what's to be expected in later releases.
Re: [OMPI users] grpcomm component hier gone...
On Jan 3, 2013, at 7:07 AM, Ake Sandgren wrote:

> On Thu, 2013-01-03 at 07:00 -0800, Ralph Castain wrote:
>> On Jan 3, 2013, at 6:52 AM, Ake Sandgren wrote:
>>
>>> On Thu, 2013-01-03 at 06:18 -0800, Ralph Castain wrote:
>>>> On Jan 3, 2013, at 3:01 AM, Ake Sandgren wrote:
>>>>
>>>>> On Thu, 2013-01-03 at 11:54 +0100, Ake Sandgren wrote:
>>>>>> On Thu, 2013-01-03 at 11:15 +0100, Ake Sandgren wrote:
>>>>>>> Hi!
>>>>>>>
>>>>>>> The grpcomm component hier seems to have vanished between 1.6.1 and
>>>>>>> 1.6.3.
>>>>>>> Why?
>>>>>>> It seems that the version of slurm we are using (not the latest at the
>>>>>>> moment) is using it for startup.
>>>>
>>>> It should be using PMI if you are directly launching processes via srun, and should not be using hier any more.
>>>
>>> Shouldn't the grpcomm pmi component be turned on by default then, if it
>>> is needed?
>>
>> It should be
>
> Well, it isn't :-)
> configure says:
> --- MCA component grpcomm:pmi (m4 configuration macro)
> checking for MCA component grpcomm:pmi compile mode... dso
> checking if user requested PMI support... no
> checking if MCA component grpcomm:pmi can compile... no

Ah - that is the problem. You need to configure --with-pmi=

>
>>> So what is the real problem here?
>>
>> Do you have PMI installed and running on your system? I think that is the
>> source of the trouble - if PMI isn't running, then this will fail.
>
> Not sure what you mean here. slurm's pmi module is available (and Intel
> MPI can use it if i point it to it).

Yeah, we need to be pointed to it just like Intel.

>
> Anyway, I think that if there is code that tries to use the hier
> component it shouldn't have been removed.

Agreed - it looks like something picked up an unintended change. Just trying to help you work with it as I don't know when a 1.6.4 will occur.
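For anyone hitting the same thing, here is a minimal sketch of how one might confirm from a saved configure log that the PMI component will actually be built after adding --with-pmi. The helper name grpcomm_pmi_can_compile, the log file name, and the PMI_PREFIX variable are all assumptions made for illustration; the pattern matched is the check line from the configure output quoted above, with "yes" assumed to be what a successful check prints.

```shell
#!/bin/sh
# Succeeds if the given configure log reports that grpcomm:pmi can be compiled.
# Usage: grpcomm_pmi_can_compile configure.log
grpcomm_pmi_can_compile() {
    # Same check line as in the quoted configure output, but ending in "yes".
    grep -q 'checking if MCA component grpcomm:pmi can compile\.\.\. yes' "$1"
}

# Hypothetical usage. PMI_PREFIX is an assumed variable that you set to
# wherever Slurm's PMI is installed on your system:
#   ./configure --with-pmi="$PMI_PREFIX" 2>&1 | tee configure.log
#   grpcomm_pmi_can_compile configure.log || echo "grpcomm:pmi still not built"
```

Checking the log right after configure is cheaper than finding out at srun time that the component is missing.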
Re: [OMPI users] grpcomm component hier gone...
On Thu, 2013-01-03 at 07:00 -0800, Ralph Castain wrote:
> On Jan 3, 2013, at 6:52 AM, Ake Sandgren wrote:
>
> > On Thu, 2013-01-03 at 06:18 -0800, Ralph Castain wrote:
> >> On Jan 3, 2013, at 3:01 AM, Ake Sandgren wrote:
> >>
> >>> On Thu, 2013-01-03 at 11:54 +0100, Ake Sandgren wrote:
> >>>> On Thu, 2013-01-03 at 11:15 +0100, Ake Sandgren wrote:
> >>>>> Hi!
> >>>>>
> >>>>> The grpcomm component hier seems to have vanished between 1.6.1 and
> >>>>> 1.6.3.
> >>>>> Why?
> >>>>> It seems that the version of slurm we are using (not the latest at the
> >>>>> moment) is using it for startup.
> >>
> >> It should be using PMI if you are directly launching processes via srun,
> >> and should not be using hier any more.
> >
> > Shouldn't the grpcomm pmi component be turned on by default then, if it
> > is needed?
>
> It should be

Well, it isn't :-)
configure says:
--- MCA component grpcomm:pmi (m4 configuration macro)
checking for MCA component grpcomm:pmi compile mode... dso
checking if user requested PMI support... no
checking if MCA component grpcomm:pmi can compile... no

> > So what is the real problem here?
>
> Do you have PMI installed and running on your system? I think that is the
> source of the trouble - if PMI isn't running, then this will fail.

Not sure what you mean here. slurm's pmi module is available (and Intel
MPI can use it if i point it to it).

Anyway, I think that if there is code that tries to use the hier
component it shouldn't have been removed.
Re: [OMPI users] grpcomm component hier gone...
On Jan 3, 2013, at 6:52 AM, Ake Sandgren wrote:

> On Thu, 2013-01-03 at 06:18 -0800, Ralph Castain wrote:
>> On Jan 3, 2013, at 3:01 AM, Ake Sandgren wrote:
>>
>>> On Thu, 2013-01-03 at 11:54 +0100, Ake Sandgren wrote:
>>>> On Thu, 2013-01-03 at 11:15 +0100, Ake Sandgren wrote:
>>>>> Hi!
>>>>>
>>>>> The grpcomm component hier seems to have vanished between 1.6.1 and
>>>>> 1.6.3.
>>>>> Why?
>>>>> It seems that the version of slurm we are using (not the latest at the
>>>>> moment) is using it for startup.
>>
>> It should be using PMI if you are directly launching processes via srun, and
>> should not be using hier any more.
>
> Shouldn't the grpcomm pmi component be turned on by default then, if it
> is needed?

It should be

>>>>
>>>> Hmm it seems it is the ess_slurmd_module.c that is using grpcomm=hier.
>>
>> Yes - that is the *only* scenario (a direct launch of procs via srun) that
>> should use hier
>
> What i have in my submit file is:
> #SBATCH -n x
>
> srun some-mpi-binary
>
> This fails since hier is missing.
>
> The reason one wants to use srun and not mpirun is getting slurm's cgroup
> containment.
>
>>>
>>> orte/mca/plm/base/plm_base_rsh_support.c also tries to use the hier
>>> grpcomm
>>
>> Something is very wrong if that is true. How was this configured, and how
>> are you starting this job?
>
> Not sure if it actually tries to use hier at runtime, i just noticed
> that it had a setenv OMPI_MCA_grpcomm=hier in the code.
>
> So what is the real problem here?

Do you have PMI installed and running on your system? I think that is the source of the trouble - if PMI isn't running, then this will fail.

>
> configure line is:
> ./configure --enable-orterun-prefix-by-default --enable-cxx-exceptions
Re: [OMPI users] grpcomm component hier gone...
On Thu, 2013-01-03 at 06:18 -0800, Ralph Castain wrote:
> On Jan 3, 2013, at 3:01 AM, Ake Sandgren wrote:
>
> > On Thu, 2013-01-03 at 11:54 +0100, Ake Sandgren wrote:
> >> On Thu, 2013-01-03 at 11:15 +0100, Ake Sandgren wrote:
> >>> Hi!
> >>>
> >>> The grpcomm component hier seems to have vanished between 1.6.1 and
> >>> 1.6.3.
> >>> Why?
> >>> It seems that the version of slurm we are using (not the latest at the
> >>> moment) is using it for startup.
>
> It should be using PMI if you are directly launching processes via srun, and
> should not be using hier any more.

Shouldn't the grpcomm pmi component be turned on by default then, if it
is needed?

> >>>
> >>
> >> Hmm it seems it is the ess_slurmd_module.c that is using grpcomm=hier.
>
> Yes - that is the *only* scenario (a direct launch of procs via srun) that
> should use hier

What i have in my submit file is:
#SBATCH -n x

srun some-mpi-binary

This fails since hier is missing.

The reason one wants to use srun and not mpirun is getting slurm's cgroup
containment.

> >
> > orte/mca/plm/base/plm_base_rsh_support.c also tries to use the hier
> > grpcomm
>
> Something is very wrong if that is true. How was this configured, and how are
> you starting this job?

Not sure if it actually tries to use hier at runtime, i just noticed
that it had a setenv OMPI_MCA_grpcomm=hier in the code.

So what is the real problem here?

configure line is:
./configure --enable-orterun-prefix-by-default --enable-cxx-exceptions
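Since the srun failure comes down to which grpcomm components the installed build actually contains, a quick check before submitting could look like the sketch below. It assumes ompi_info lists each component on a line of the form "MCA grpcomm: NAME (...)", which is how I recall its output looking but is not something shown in this thread. The helper name has_grpcomm_component is made up for illustration.

```shell
#!/bin/sh
# Succeeds if ompi_info-style text on stdin lists the named grpcomm component.
# Assumes lines shaped like "MCA grpcomm: NAME (MCA v..., ...)".
has_grpcomm_component() {
    grep -Eq "MCA grpcomm: +$1( |\$)"
}

# Hypothetical usage on a node with this Open MPI build in PATH:
#   ompi_info | has_grpcomm_component pmi  || echo "no grpcomm:pmi in this build"
#   ompi_info | has_grpcomm_component hier || echo "no grpcomm:hier in this build"
```

If neither pmi nor hier shows up, a direct launch via srun would be expected to fail the way described above.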
Re: [OMPI users] grpcomm component hier gone...
On Jan 3, 2013, at 3:01 AM, Ake Sandgren wrote:

> On Thu, 2013-01-03 at 11:54 +0100, Ake Sandgren wrote:
>> On Thu, 2013-01-03 at 11:15 +0100, Ake Sandgren wrote:
>>> Hi!
>>>
>>> The grpcomm component hier seems to have vanished between 1.6.1 and
>>> 1.6.3.
>>> Why?
>>> It seems that the version of slurm we are using (not the latest at the
>>> moment) is using it for startup.

It should be using PMI if you are directly launching processes via srun, and should not be using hier any more.

>>>
>>
>> Hmm it seems it is the ess_slurmd_module.c that is using grpcomm=hier.

Yes - that is the *only* scenario (a direct launch of procs via srun) that should use hier

>
> orte/mca/plm/base/plm_base_rsh_support.c also tries to use the hier
> grpcomm

Something is very wrong if that is true. How was this configured, and how are you starting this job?
Re: [OMPI users] grpcomm component hier gone...
On Thu, 2013-01-03 at 11:54 +0100, Ake Sandgren wrote:
> On Thu, 2013-01-03 at 11:15 +0100, Ake Sandgren wrote:
> > Hi!
> >
> > The grpcomm component hier seems to have vanished between 1.6.1 and
> > 1.6.3.
> > Why?
> > It seems that the version of slurm we are using (not the latest at the
> > moment) is using it for startup.
> >
>
> Hmm it seems it is the ess_slurmd_module.c that is using grpcomm=hier.

orte/mca/plm/base/plm_base_rsh_support.c also tries to use the hier
grpcomm
Re: [OMPI users] grpcomm component hier gone...
On Thu, 2013-01-03 at 11:15 +0100, Ake Sandgren wrote:
> Hi!
>
> The grpcomm component hier seems to have vanished between 1.6.1 and
> 1.6.3.
> Why?
> It seems that the version of slurm we are using (not the latest at the
> moment) is using it for startup.
>

Hmm it seems it is the ess_slurmd_module.c that is using grpcomm=hier.

Please fix :-)