Re: [OMPI users] grpcomm component hier gone...

2013-01-03 Thread Ake Sandgren
On Thu, 2013-01-03 at 07:14 -0800, Ralph Castain wrote:
> > Well, it isn't :-)
> > configure says:
> > --- MCA component grpcomm:pmi (m4 configuration macro)
> > checking for MCA component grpcomm:pmi compile mode... dso
> > checking if user requested PMI support... no
> > checking if MCA component grpcomm:pmi can compile... no
> 
> Ah - that is the problem. You need to configure 
> --with-pmi=

Ahh, thanks. I was assuming I needed something like that.

> > Not sure what you mean here. Slurm's PMI module is available (and Intel
> > MPI can use it if I point it to it).
> 
> Yeah, we need to be pointed to it just like Intel.

Doh :-(

> > 
> > Anyway, I think that if there is code that tries to use the hier
> > component it shouldn't have been removed.
> 
> Agreed - it looks like something picked up an unintended change. Just trying 
> to help you work with it as I don't know when a 1.6.4 will occur.

I pulled the 1.6.1 hier component and reran autogen, so I have it working,
but it's good to know what to expect in later releases.
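
For anyone following along, here is a minimal sketch of how one might confirm from a saved configure log whether the grpcomm:pmi component was enabled. The function name pmi_component_status, the log file name configure.log, and the PMI_DIR variable are illustrative assumptions, not part of Open MPI; the matched text is the configure line quoted above, and the prefix given to --with-pmi is left for you to supply.

```shell
#!/bin/sh
# Print the result ("yes" or "no") of the configure check for grpcomm:pmi.
# $1: path to a saved configure log (hypothetical name, e.g. configure.log).
pmi_component_status() {
    sed -n 's/^checking if MCA component grpcomm:pmi can compile\.\.\. //p' "$1" | tail -n 1
}

# Possible use, assuming PMI_DIR holds your site's Slurm PMI install prefix:
#   ./configure --with-pmi="$PMI_DIR" 2>&1 | tee configure.log
#   pmi_component_status configure.log
```

With the configure output quoted in this thread, the function would print "no", which matches the failure being discussed.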



Re: [OMPI users] grpcomm component hier gone...

2013-01-03 Thread Ralph Castain

On Jan 3, 2013, at 7:07 AM, Ake Sandgren  wrote:

> On Thu, 2013-01-03 at 07:00 -0800, Ralph Castain wrote:
>> On Jan 3, 2013, at 6:52 AM, Ake Sandgren  wrote:
>> 
>>> On Thu, 2013-01-03 at 06:18 -0800, Ralph Castain wrote:
>>>> On Jan 3, 2013, at 3:01 AM, Ake Sandgren  wrote:
>>>> 
>>>>> On Thu, 2013-01-03 at 11:54 +0100, Ake Sandgren wrote:
>>>>>> On Thu, 2013-01-03 at 11:15 +0100, Ake Sandgren wrote:
>>>>>>> Hi!
>>>>>>> 
>>>>>>> The grpcomm component hier seems to have vanished between 1.6.1 and
>>>>>>> 1.6.3.
>>>>>>> Why?
>>>>>>> It seems that the version of slurm we are using (not the latest at the
>>>>>>> moment) is using it for startup.
>>>> 
>>>> It should be using PMI if you are directly launching processes via srun, 
>>>> and should not be using hier any more.
>>> 
>>> Shouldn't the grpcomm pmi component be turned on by default then, if it
>>> is needed?
>> 
>> It should be
> 
> Well, it isn't :-)
> configure says:
> --- MCA component grpcomm:pmi (m4 configuration macro)
> checking for MCA component grpcomm:pmi compile mode... dso
> checking if user requested PMI support... no
> checking if MCA component grpcomm:pmi can compile... no

Ah - that is the problem. You need to configure 
--with-pmi=

> 
>>> So what is the real problem here?
>> 
>> Do you have PMI installed and running on your system? I think that is the 
>> source of the trouble - if PMI isn't running, then this will fail.
> 
> Not sure what you mean here. Slurm's PMI module is available (and Intel
> MPI can use it if I point it to it).

Yeah, we need to be pointed to it just like Intel.

> 
> Anyway, I think that if there is code that tries to use the hier
> component it shouldn't have been removed.

Agreed - it looks like something picked up an unintended change. Just trying to 
help you work with it as I don't know when a 1.6.4 will occur.


> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users




Re: [OMPI users] grpcomm component hier gone...

2013-01-03 Thread Ake Sandgren
On Thu, 2013-01-03 at 07:00 -0800, Ralph Castain wrote:
> On Jan 3, 2013, at 6:52 AM, Ake Sandgren  wrote:
> 
> > On Thu, 2013-01-03 at 06:18 -0800, Ralph Castain wrote:
> >> On Jan 3, 2013, at 3:01 AM, Ake Sandgren  wrote:
> >> 
> >>> On Thu, 2013-01-03 at 11:54 +0100, Ake Sandgren wrote:
> >>>> On Thu, 2013-01-03 at 11:15 +0100, Ake Sandgren wrote:
> >>>>> Hi!
> >>>>> 
> >>>>> The grpcomm component hier seems to have vanished between 1.6.1 and
> >>>>> 1.6.3.
> >>>>> Why?
> >>>>> It seems that the version of slurm we are using (not the latest at the
> >>>>> moment) is using it for startup.
> >> 
> >> It should be using PMI if you are directly launching processes via srun, 
> >> and should not be using hier any more.
> > 
> > Shouldn't the grpcomm pmi component be turned on by default then, if it
> > is needed?
> 
> It should be

Well, it isn't :-)
configure says:
--- MCA component grpcomm:pmi (m4 configuration macro)
checking for MCA component grpcomm:pmi compile mode... dso
checking if user requested PMI support... no
checking if MCA component grpcomm:pmi can compile... no

> > So what is the real problem here?
> 
> Do you have PMI installed and running on your system? I think that is the 
> source of the trouble - if PMI isn't running, then this will fail.

Not sure what you mean here. Slurm's PMI module is available (and Intel
MPI can use it if I point it to it).

Anyway, I think that if there is code that tries to use the hier
component it shouldn't have been removed.



Re: [OMPI users] grpcomm component hier gone...

2013-01-03 Thread Ralph Castain

On Jan 3, 2013, at 6:52 AM, Ake Sandgren  wrote:

> On Thu, 2013-01-03 at 06:18 -0800, Ralph Castain wrote:
>> On Jan 3, 2013, at 3:01 AM, Ake Sandgren  wrote:
>> 
>>> On Thu, 2013-01-03 at 11:54 +0100, Ake Sandgren wrote:
>>>> On Thu, 2013-01-03 at 11:15 +0100, Ake Sandgren wrote:
>>>>> Hi!
>>>>> 
>>>>> The grpcomm component hier seems to have vanished between 1.6.1 and
>>>>> 1.6.3.
>>>>> Why?
>>>>> It seems that the version of slurm we are using (not the latest at the
>>>>> moment) is using it for startup.
>> 
>> It should be using PMI if you are directly launching processes via srun, and 
>> should not be using hier any more.
> 
> Shouldn't the grpcomm pmi component be turned on by default then, if it
> is needed?

It should be

> 
> 
>>>> 
>>>> Hmm it seems it is the ess_slurmd_module.c that is using grpcomm=hier.
>> 
>> Yes - that is the *only* scenario (a direct launch of procs via srun) that 
>> should use hier
> 
> What I have in my submit file is:
> #SBATCH -n x
> 
> srun some-mpi-binary
> 
> This fails since hier is missing.
> 
> The reason one wants to use srun and not mpirun is to get Slurm's cgroup
> containment.
> 
>>> 
>>> orte/mca/plm/base/plm_base_rsh_support.c also tries to use the hier
>>> grpcomm
>> 
>> Something is very wrong if that is true. How was this configured, and how 
>> are you starting this job?
> 
> Not sure if it actually tries to use hier at runtime; I just noticed
> that it had a setenv OMPI_MCA_grpcomm=hier in the code.
> 
> So what is the real problem here?

Do you have PMI installed and running on your system? I think that is the 
source of the trouble - if PMI isn't running, then this will fail.


> 
> configure line is:
> ./configure --enable-orterun-prefix-by-default --enable-cxx-exceptions
> 




Re: [OMPI users] grpcomm component hier gone...

2013-01-03 Thread Ake Sandgren
On Thu, 2013-01-03 at 06:18 -0800, Ralph Castain wrote:
> On Jan 3, 2013, at 3:01 AM, Ake Sandgren  wrote:
> 
> > On Thu, 2013-01-03 at 11:54 +0100, Ake Sandgren wrote:
> >> On Thu, 2013-01-03 at 11:15 +0100, Ake Sandgren wrote:
> >>> Hi!
> >>> 
> >>> The grpcomm component hier seems to have vanished between 1.6.1 and
> >>> 1.6.3.
> >>> Why?
> >>> It seems that the version of slurm we are using (not the latest at the
> >>> moment) is using it for startup.
> 
> It should be using PMI if you are directly launching processes via srun, and 
> should not be using hier any more.

Shouldn't the grpcomm pmi component be turned on by default then, if it
is needed?

> >>> 
> >> 
> >> Hmm it seems it is the ess_slurmd_module.c that is using grpcomm=hier.
> 
> Yes - that is the *only* scenario (a direct launch of procs via srun) that 
> should use hier

What I have in my submit file is:
#SBATCH -n x

srun some-mpi-binary

This fails since hier is missing.

The reason one wants to use srun and not mpirun is to get Slurm's cgroup
containment.

> > 
> > orte/mca/plm/base/plm_base_rsh_support.c also tries to use the hier
> > grpcomm
> 
> Something is very wrong if that is true. How was this configured, and how are 
> you starting this job?

Not sure if it actually tries to use hier at runtime; I just noticed
that it had a setenv OMPI_MCA_grpcomm=hier in the code.

So what is the real problem here?

configure line is:
./configure --enable-orterun-prefix-by-default --enable-cxx-exceptions
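
The kind of source check described above (spotting places that still set grpcomm to hier) could be sketched roughly as follows. The function name find_hier_refs is a hypothetical helper, the pattern is only a guess at how such references are spelled, and it assumes a grep that supports -r (GNU and BSD grep both do).

```shell
#!/bin/sh
# List lines under a directory where "grpcomm" is followed by "hier",
# separated only by punctuation (e.g. OMPI_MCA_grpcomm=hier).
# $1: root of the tree to search, for example the orte/ directory of a checkout.
find_hier_refs() {
    grep -rnE 'grpcomm[^[:alnum:]_]+hier' "$1"
}

# Possible use from the top of an Open MPI source tree:
#   find_hier_refs orte/
```

A hit only shows that the string is present in the source; as noted above, it does not prove that the code path is taken at runtime.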



Re: [OMPI users] grpcomm component hier gone...

2013-01-03 Thread Ralph Castain

On Jan 3, 2013, at 3:01 AM, Ake Sandgren  wrote:

> On Thu, 2013-01-03 at 11:54 +0100, Ake Sandgren wrote:
>> On Thu, 2013-01-03 at 11:15 +0100, Ake Sandgren wrote:
>>> Hi!
>>> 
>>> The grpcomm component hier seems to have vanished between 1.6.1 and
>>> 1.6.3.
>>> Why?
>>> It seems that the version of slurm we are using (not the latest at the
>>> moment) is using it for startup.

It should be using PMI if you are directly launching processes via srun, and 
should not be using hier any more.

>>> 
>> 
>> Hmm it seems it is the ess_slurmd_module.c that is using grpcomm=hier.

Yes - that is the *only* scenario (a direct launch of procs via srun) that 
should use hier

> 
> orte/mca/plm/base/plm_base_rsh_support.c also tries to use the hier
> grpcomm

Something is very wrong if that is true. How was this configured, and how are 
you starting this job?


> 




Re: [OMPI users] grpcomm component hier gone...

2013-01-03 Thread Ake Sandgren
On Thu, 2013-01-03 at 11:54 +0100, Ake Sandgren wrote:
> On Thu, 2013-01-03 at 11:15 +0100, Ake Sandgren wrote:
> > Hi!
> > 
> > The grpcomm component hier seems to have vanished between 1.6.1 and
> > 1.6.3.
> > Why?
> > It seems that the version of slurm we are using (not the latest at the
> > moment) is using it for startup.
> > 
> 
> Hmm it seems it is the ess_slurmd_module.c that is using grpcomm=hier.

orte/mca/plm/base/plm_base_rsh_support.c also tries to use the hier
grpcomm



Re: [OMPI users] grpcomm component hier gone...

2013-01-03 Thread Ake Sandgren
On Thu, 2013-01-03 at 11:15 +0100, Ake Sandgren wrote:
> Hi!
> 
> The grpcomm component hier seems to have vanished between 1.6.1 and
> 1.6.3.
> Why?
> It seems that the version of slurm we are using (not the latest at the
> moment) is using it for startup.
> 

Hmm it seems it is the ess_slurmd_module.c that is using grpcomm=hier.

Please fix :-)
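
A quick way to see whether a given installation still ships the hier component is to look at the ompi_info listing. The sketch below assumes the listing contains lines of the form "MCA grpcomm: <name> (...)"; the helper name has_grpcomm_component is made up for illustration.

```shell
#!/bin/sh
# Succeed if the ompi_info listing read from stdin includes the named
# grpcomm component.
# $1: component name, e.g. hier or pmi.
has_grpcomm_component() {
    grep -Eq "MCA grpcomm: $1( |\$)"
}

# Possible use on a machine where ompi_info is on PATH:
#   ompi_info | has_grpcomm_component hier && echo "hier is built"
```

Running this against both a 1.6.1 and a 1.6.3 install would be one way to confirm the difference reported here.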