On Nov 28, 2011, at 11:53 PM, arnaud Heritier wrote:

> I do have a contract and I tried to open a case, but their support is ...

What happens if you put a delay between the two jobs?  E.g., if you just delay 
a few seconds before the 2nd job starts?  Perhaps the ipath device just needs a 
little time before it will be available...?  (that's a total guess)
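
If a fixed "sleep 5" in the SGE job script turns out to help, a slightly more
robust variant is to retry the launch a few times with a pause in between.
Here is a minimal sketch of that idea, using only the Python standard library.
The function name, the retry count, and the delay are all made up for
illustration, and whether a retry actually gets around the /dev/ipath error is
exactly as much of a guess as the delay itself:

```python
import subprocess
import sys
import time


def run_with_retry(cmd, attempts=3, delay_s=5.0):
    """Run cmd; if it exits non-zero, wait delay_s seconds and try again.

    cmd is an argument list such as ["mpirun", "-np", "8", "./my_app"]
    (a hypothetical command line). Returns the exit code of the last attempt.
    """
    returncode = 1
    for attempt in range(1, attempts + 1):
        returncode = subprocess.run(cmd).returncode
        if returncode == 0:
            break  # the job ran; no need to retry
        if attempt < attempts:
            time.sleep(delay_s)  # give the device time to become available
    return returncode
```

Note that this retries on any non-zero exit, so a genuine application failure
would be re-run as well; it is only meant as a quick experiment.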

I suggest this because the PSM device will definitely give you better overall 
performance than the QLogic verbs support.  Their verbs support barely 
works -- PSM is their primary device and the one that we always recommend.

> Anyway. I'm still working on the strange error message from mpirun saying it 
> can't allocate memory when at the same time it also reports that the memory 
> is unlimited ...
> 
> 
> Arnaud
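
On the memlock message: one thing that may be worth ruling out is that the
limit seen inside the SGE job differs from the one in your login shell, since
a job inherits its limits from whatever started it. The sketch below (Python
standard library only; the function name is made up, and running it as a small
SGE job on the compute node is my assumption about how you would use it)
prints the locked-memory limit as the job sees it. Open MPI already reports
"unlimited" on that host, so this may simply confirm it, in which case the
ibv_create_srq() failure would have to have some other cause.

```python
import resource


def memlock_limits():
    """Return the (soft, hard) RLIMIT_MEMLOCK values as readable strings."""
    def fmt(value):
        # RLIM_INFINITY is how the kernel reports "unlimited"
        return "unlimited" if value == resource.RLIM_INFINITY else f"{value} bytes"

    soft, hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    return fmt(soft), fmt(hard)


soft, hard = memlock_limits()
print(f"memlock soft={soft} hard={hard}")
```

Comparing the output from an interactive shell with the output from a job
submitted through SGE would show whether the two environments agree.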
> 
> On Tue, Nov 29, 2011 at 4:23 AM, Jeff Squyres <jsquy...@cisco.com> wrote:
> I'm afraid we don't have any contacts left at QLogic to ask them any more... 
> do you have a support contract, perchance?
> 
> On Nov 27, 2011, at 3:11 PM, Arnaud Heritier wrote:
> 
> > Hello,
> >
> > I ran into a strange problem with QLogic OFED and Open MPI. When I submit 
> > (through SGE) 2 jobs on the same node, the second job ends up with:
> >
> > (ipath/PSM)[10292]: can't open /dev/ipath, network down (err=26)
> >
> > I'm pretty sure InfiniBand is working well, as the other job runs fine.
> >
> > Here are details about the configuration:
> >
> > Qlogic HCA: InfiniPath_QMH7342 (2 ports but only one connected to a switch)
> > qlogic_ofed-1.5.3-7.0.0.0.35 (rocks cluster roll)
> > openmpi 1.5.4 (./configure --with-psm --with-openib --with-sge)
> >
> > -------------
> >
> > In order to fix this problem I recompiled Open MPI without PSM support, but 
> > I faced another problem:
> >
> > The OpenFabrics (openib) BTL failed to initialize while trying to
> > allocate some locked memory.  This typically can indicate that the
> > memlock limits are set too low.  For most HPC installations, the
> > memlock limits should be set to "unlimited".  The failure occured
> > here:
> >
> >   Local host:    compute-0-6.local
> >   OMPI source:   btl_openib.c:329
> >   Function:      ibv_create_srq()
> >   Device:        qib0
> >   Memlock limit: unlimited
> >
> >
> > _______________________________________________
> > users mailing list
> > us...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
> 
> 
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> 


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/

