Re: [OMPI users] Qlogic & openmpi

2011-12-05 Thread Bogdan Costescu
On Mon, Dec 5, 2011 at 16:12, Ralph Castain  wrote:
> Sounds like we should be setting this value when starting the process - yes?
> If so, what is the "good" value, and how do we compute it?

I've also been looking at this for the past few days. What I came up
with is a small script, psm_shctx, which sets the environment variable
and then execs the MPI binary; it is inserted between mpirun and the
MPI binary:

mpirun psm_shctx my_mpi_app
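
A minimal sketch of what such a wrapper can look like (assuming a maximum
sharing ratio of 4 ranks per context, and that Open MPI exports
OMPI_COMM_WORLD_LOCAL_SIZE into the environment of each launched process):

#!/bin/sh
# psm_shctx (sketch): compute PSM_SHAREDCONTEXTS_MAX from the number of
# MPI ranks on this node, export it, then exec the real MPI binary.
ratio=4                                      # assumed max ranks per context
local_size=${OMPI_COMM_WORLD_LOCAL_SIZE:-1}  # ranks on this node
contexts=$(( (local_size + ratio - 1) / ratio ))  # ceil division, >= 1
export PSM_SHAREDCONTEXTS_MAX=$contexts
exec "$@"                                    # run the real MPI binary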

Of course, the same effect could be obtained if orted set the
environment variable before starting the process. There is, however, a
problem: deciding how many contexts to use. For maximum performance, one
should use a 1:1 ratio between MPI ranks and contexts; the highest
sharing ratio possible (but with the lowest performance) is 4 MPI ranks
per context; a further restriction is that each job needs at least 1 context.

E.g., on AMD cluster nodes with four 12-core CPUs (48 cores total) one
gets 16 contexts; assigning all 16 contexts to 48 ranks would mean a
ratio of 1:3, but this can only work if cores are allocated in multiples
of 4; with a less favorable allocation strategy, more contexts are lost
to rounding up. At the extreme, if there's only one rank per job, there
can be at most 16 jobs (using all 16 contexts), and the remaining 32
cores have to stay idle or be used for other jobs that don't require
communication over InfiniPath.

There is a further issue, though: MPI-2 dynamic process creation. If
it's not known in advance how many ranks there will be, I guess one
should use the highest context-sharing ratio (1:4) to be on the safe side.

I've found a mention of this environment variable being handled in the
changelog for MVAPICH2 1.4.1; maybe that can serve as a source of
inspiration? (But I haven't looked at it...)

Hope this helps,
Bogdan


Re: [OMPI users] Qlogic & openmpi

2011-12-05 Thread arnaud Heritier
-
Arnaud HERITIER
Meteo France International
+33 561432940
arnaud.herit...@mfi.fr
--


On Mon, Dec 5, 2011 at 6:12 PM, Ralph Castain  wrote:

>
> [arnaud's quoted solution trimmed; see his message of Dec 5 below in this digest]
>
> Sounds like we should be setting this value when starting the process -
> yes? If so, what is the "good" value, and how do we compute it?
>

The good value is roundup($OMPI_COMM_WORLD_LOCAL_SIZE / context-sharing
ratio) (the ratio is at most 4 on my HCA).
QLogic provided me with a simple script to compute this value; I just
changed my mpirun script to call this script, set PSM_SHAREDCONTEXTS_MAX
to the returned value, and then call the MPI binary.
Script attached.
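
The attachment isn't reproduced here, but a sketch of the computation it
performs, based on the formula above, might look like this (the script
name and the hard-coded ratio of 4 are assumptions, not the actual
attachment):

#!/bin/sh
# psm_max.sh (hypothetical reconstruction): print
# roundup(processes_per_node / sharing_ratio) for use as
# PSM_SHAREDCONTEXTS_MAX.  Usage: psm_max.sh <processes-per-node>
nprocs=${1:?usage: psm_max.sh <processes-per-node>}
ratio=4    # assumed maximum ranks per hardware context on this HCA
echo $(( (nprocs + ratio - 1) / ratio ))

The wrapper can then do something like
mpirun -x PSM_SHAREDCONTEXTS_MAX=$(./psm_max.sh "$NSLOTS") my_mpi_app
using mpirun's -x option to export the variable to all ranks (treating
$NSLOTS as the per-node rank count assumes a single-node SGE job).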

Arnaud


> [remainder of the quoted thread trimmed; the full messages appear below in this digest]

Re: [OMPI users] Qlogic & openmpi

2011-12-05 Thread Ralph Castain

On Dec 5, 2011, at 5:49 AM, arnaud Heritier wrote:

> Hello,
> 
> I found the solution, thanks to QLogic support.
> 
> The "can't open /dev/ipath, network down (err=26)" message from the ipath 
> driver is really misleading.
> 
> Actually, this is a hardware context problem in QLogic's PSM. PSM can't 
> allocate any hardware context for the job because other MPI job(s) have 
> already used all available contexts. To avoid this problem, every MPI job 
> has to set the PSM_SHAREDCONTEXTS_MAX variable to the right value, 
> according to the number of processes that will run on the node. If this 
> variable is not used, PSM will "greedily" use all contexts for the first 
> MPI job spawned on the node.

Sounds like we should be setting this value when starting the process - yes? If 
so, what is the "good" value, and how do we compute it?

> 
> Regards,
> 
> Arnaud
> 
> [remainder of the quoted thread trimmed; the full messages appear below in this digest]



Re: [OMPI users] Qlogic & openmpi

2011-12-05 Thread arnaud Heritier
Hello,

I found the solution, thanks to QLogic support.

The "can't open /dev/ipath, network down (err=26)" message from the ipath
driver is really misleading.

Actually, this is a hardware context problem in QLogic's PSM. PSM can't
allocate any hardware context for the job because other MPI job(s) have
already used all available contexts. To avoid this problem, every MPI job
has to set the PSM_SHAREDCONTEXTS_MAX variable to the right value,
according to the number of processes that will run on the node. If this
variable is not used, PSM will "greedily" use all contexts for the first
MPI job spawned on the node.

Regards,

Arnaud


On Tue, Nov 29, 2011 at 6:44 PM, Jeff Squyres  wrote:

> [quoted text trimmed; see Jeff Squyres's message of Nov 29 below in this digest]


Re: [OMPI users] Qlogic & openmpi

2011-11-29 Thread Jeff Squyres
On Nov 28, 2011, at 11:53 PM, arnaud Heritier wrote:

> I do have a contract and I tried to open a case, but their support is ...

What happens if you put a delay between the two jobs?  E.g., if you just delay 
a few seconds before the 2nd job starts?  Perhaps the ipath device just needs a 
little time before it will be available...?  (that's a total guess)

I suggest this because the PSM device will definitely give you better overall 
performance than the QLogic verbs support.  Their verbs support basically 
barely works -- PSM is their primary device and the one that we always 
recommend.

> [remainder of the quoted thread trimmed; the full messages appear below in this digest]


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI users] Qlogic & openmpi

2011-11-28 Thread arnaud Heritier
I do have a contract and I tried to open a case, but their support is ...
Anyway, I'm still working on the strange error message from mpirun saying
it can't allocate memory when at the same time it also reports that the
memory limit is unlimited ...


Arnaud

On Tue, Nov 29, 2011 at 4:23 AM, Jeff Squyres  wrote:

> I'm afraid we don't have any contacts left at QLogic to ask them any
> more... do you have a support contract, perchance?
>
> [original problem report and footers trimmed; see the first message of this thread below]


[OMPI users] Qlogic & openmpi

2011-11-27 Thread Arnaud Heritier
Hello,

I ran into a strange problem with QLogic OFED and Open MPI. When I submit
(through SGE) two jobs on the same node, the second job ends up with:

(ipath/PSM)[10292]: can't open /dev/ipath, network down (err=26)

I'm pretty sure the InfiniBand fabric is working well, as the other job runs fine.

Here are the details of the configuration:

QLogic HCA: InfiniPath_QMH7342 (2 ports, but only one connected to a switch)
qlogic_ofed-1.5.3-7.0.0.0.35 (rocks cluster roll)
openmpi 1.5.4 (./configure --with-psm --with-openib --with-sge)

-

To work around this problem I recompiled Open MPI without PSM support,
but I ran into another problem:

The OpenFabrics (openib) BTL failed to initialize while trying to
allocate some locked memory.  This typically can indicate that the
memlock limits are set too low.  For most HPC installations, the
memlock limits should be set to "unlimited".  The failure occurred
here:

  Local host:    compute-0-6.local
  OMPI source:   btl_openib.c:329
  Function:      ibv_create_srq()
  Device:        qib0
  Memlock limit: unlimited
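
For what it's worth, a quick way to verify the memlock limit a batch job
actually sees is a sketch like the following (the script name and qsub
flags are illustrative, not from this thread):

#!/bin/sh
# check_memlock.sh (sketch): print locked-memory limits inside the job.
# Submit e.g. with: qsub -b y -cwd ./check_memlock.sh
echo "soft memlock: $(ulimit -l)"      # "unlimited" expected on HPC nodes
echo "hard memlock: $(ulimit -H -l)"
# Typical /etc/security/limits.conf entries on HPC nodes (illustrative):
#   * soft memlock unlimited
#   * hard memlock unlimited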