ryone!
From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Gus Correa
Sent: Wednesday, June 11, 2014 7:13 PM
To: Open MPI Users
Subject: Re: [OMPI users] openib segfaults with Torque
If that could help Greg,
on the compute nodes I normally add this to /etc/security/limits
> -----Original Message-----
> From: Fischer, Greg A.
> Sent: Tuesday, June 10, 2014 2:59 PM
> To: Nathan Hjelm
> Cc: Open MPI Users; Fischer, Greg A.
> Subject: RE: [OMPI users] openib segfaults with Torque
>
> [binf316:fischega] $ ulim
quyres)
> >> > wrote:
> >> >
> >> > Mellanox --
> >> >
> >> > What would cause a CQ to fail to be created?
> >> >
> >> > On Jun 11, 2014, at 3:42 PM, "Fischer, Greg A."
>
ed, Jun 11, 2014 at 4:04 PM, Jeff Squyres (jsquyres)
>> > wrote:
>> >
>> > Mellanox --
>> >
>> > What would cause a CQ to fail to be created?
>> >
>> > On Jun 11, 2014, at 3:42 PM, "Fischer, Greg A."
>> > wr
g A."
> > wrote:
> >
> > > Is there any other work around that I might try? Something that
> > avoids UDCM?
> > >
> > > -----Original Message-----
> > > From: Fischer, Greg A.
> > > Sent: Tuesday, June 1
OMPI users] openib segfaults with Torque
> > >
> > > [binf316:fischega] $ ulimit -m
> > > unlimited
> > >
> > > Greg
> > >
> > > -----Original Message-----
> > > From: Nathan Hjelm [mailto:h
PM, "Fischer, Greg A."
> wrote:
>
> > Is there any other work around that I might try? Something that
> avoids UDCM?
> >
> > -----Original Message-----
> > From: Fischer, Greg A.
> > Sent: Tue
mething that avoids
> UDCM?
> >
> > -----Original Message-----
> > From: Fischer, Greg A.
> > Sent: Tuesday, June 10, 2014 2:59 PM
> > To: Nathan Hjelm
> > Cc: Open MPI Users; Fischer, Greg A.
> > Subject: RE: [OMPI users] openib segfaults with Torqu
une 10, 2014 2:59 PM
> To: Nathan Hjelm
> Cc: Open MPI Users; Fischer, Greg A.
> Subject: RE: [OMPI users] openib segfaults with Torque
>
> [binf316:fischega] $ ulimit -m
> unlimited
>
> Greg
>
> -----Original Message-----
> From: Nathan Hjelm [mailto:hje...@lanl
Is there any other work around that I might try? Something that avoids UDCM?
-----Original Message-----
From: Fischer, Greg A.
Sent: Tuesday, June 10, 2014 2:59 PM
To: Nathan Hjelm
Cc: Open MPI Users; Fischer, Greg A.
Subject: RE: [OMPI users] openib segfaults with Torque
[binf316:fischega
[binf316:fischega] $ ulimit -m
unlimited
Greg
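A note on the limit being discussed: in bash, `ulimit -m` reports the maximum resident set size, while the locked-memory ("mlock") limit is reported by `ulimit -l`. The short Python sketch below is one way to print the locked-memory limit of the current process. It is an illustration only and is not from the thread; the helper names `memlock_limits` and `format_limit` are hypothetical. It would presumably need to be run inside the Torque job itself to show the limit that MPI processes launched under Torque actually inherit, which may differ from an interactive shell.

```python
import resource


def format_limit(value):
    """Render an rlimit value as 'unlimited' or a byte count."""
    return "unlimited" if value == resource.RLIM_INFINITY else f"{value} bytes"


def memlock_limits():
    """Return (soft, hard) RLIMIT_MEMLOCK for the current process, in bytes."""
    return resource.getrlimit(resource.RLIMIT_MEMLOCK)


if __name__ == "__main__":
    soft, hard = memlock_limits()
    print(f"locked-memory soft limit: {format_limit(soft)}")
    print(f"locked-memory hard limit: {format_limit(hard)}")
```

Python reports the limit in bytes, whereas bash's `ulimit -l` reports it in kilobytes, so the two numbers differ by a factor of 1024 unless the limit is unlimited.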
-----Original Message-----
From: Nathan Hjelm [mailto:hje...@lanl.gov]
Sent: Tuesday, June 10, 2014 2:58 PM
To: Fischer, Greg A.
Cc: Open MPI Users
Subject: Re: [OMPI users] openib segfaults with Torque
Out of curiosity what is the mlock limit on
reg A.
> Cc: Open MPI Users
> Subject: Re: [OMPI users] openib segfaults with Torque
>
>
> Well, that's interesting. The output shows that ibv_create_cq is failing.
> Strange since an identical call had just succeeded (udcm creates two
> completion queues). Some questions th
Greg A.
Cc: Open MPI Users
Subject: Re: [OMPI users] openib segfaults with Torque
Well, that's interesting. The output shows that ibv_create_cq is failing.
Strange since an identical call had just succeeded (udcm creates two completion
queues). Some questions that might indicate where the failur
m: users [mailto:users-boun...@open-mpi.org] On Behalf Of Jeff Squyres
> (jsquyres)
> Sent: Tuesday, June 10, 2014 10:31 AM
> To: Nathan Hjelm
> Cc: Open MPI Users
> Subject: Re: [OMPI users] openib segfaults with Torque
>
> Greg:
>
> Can you run with "--mca btl
me know if I can provide anything else.
Thanks for looking into this,
Greg
-----Original Message-----
From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Jeff Squyres
(jsquyres)
Sent: Tuesday, June 10, 2014 10:31 AM
To: Nathan Hjelm
Cc: Open MPI Users
Subject: Re: [OMPI users] openib segf
Greg:
Can you run with "--mca btl_base_verbose 100" on your debug build so that we
can get some additional output to see why UDCM is failing to set up properly?
On Jun 10, 2014, at 10:25 AM, Nathan Hjelm wrote:
> On Tue, Jun 10, 2014 at 12:10:28AM +, Jeff Squyres (jsquyres) wrote:
>> I s
On Tue, Jun 10, 2014 at 12:10:28AM +, Jeff Squyres (jsquyres) wrote:
> I seem to recall that you have an IB-based cluster, right?
>
> From a *very quick* glance at the code, it looks like this might be a simple
> incorrect-finalization issue. That is:
>
> - you run the job on a single serve
I seem to recall that you have an IB-based cluster, right?
From a *very quick* glance at the code, it looks like this might be a simple
incorrect-finalization issue. That is:
- you run the job on a single server
- openib disqualifies itself because you're running on a single server
- openib t
Process 0 exiting
> Process 1 exiting
>
> From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Ralph Castain
> Sent: Friday, June 06, 2014 10:34 AM
> To: Open MPI Users
> Subject: Re: [OMPI users] openib segfaults with Torque
>
> Huh - how strange. I can't imag
Ralph Castain
Sent: Friday, June 06, 2014 10:34 AM
To: Open MPI Users
Subject: Re: [OMPI users] openib segfaults with Torque
Huh - how strange. I can't imagine what it has to do with Torque vs rsh - this
is failing when the openib BTL is trying to create the connection, which comes
way afte
nf316:21583] [16] /lib64/libc.so.6(__libc_start_main+0xe6)[0x7f3b58301c36]
> [binf316:21583] [17] ring_c[0x400889]
> [binf316:21583] *** End of error message ***
> --
> mpirun noticed that process rank 0 with PID 21583 on node 316 exited on
> signal 6 (Abo
From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Ralph Castain
Sent: Thursday, June 05, 2014 7:57 PM
To: Open MPI Users
Subject: Re: [OMPI users] openib segfaults with Torque
Hmmm...I'm not sure how that is going to run with only one proc (I don't know
if the program is pro
Hmmm...I'm not sure how that is going to run with only one proc (I don't know
if the program is protected against that scenario). If you run with -np 2 -mca
btl openib,sm,self, is it happy?
On Jun 5, 2014, at 2:16 PM, Fischer, Greg A. wrote:
> Here's the command I'm invoking and the terminal
Here's the command I'm invoking and the terminal output. (Some of this
information doesn't appear to be captured in the backtrace.)
[binf316:fischega] $ mpirun -np 1 -mca btl openib,self ring_c
ring_c:
../../../../../openmpi-1.8.1/ompi/mca/btl/openib/connect/btl_openib_connect_udcm.c:734:
udcm
OpenMPI Users,
After encountering difficulty with the Intel compilers (see the "intermittent
segfaults with openib on ring_c.c" thread), I installed GCC-4.8.3 and
recompiled OpenMPI. I ran the simple examples (ring, etc.) with the openib BTL
in a typical BASH environment. Everything appeared to