Re: [OMPI users] intermittent segfaults with openib on ring_c.c

2014-06-10 Thread Fischer, Greg A.
, June 09, 2014 8:24 PM To: Open MPI Users Subject: Re: [OMPI users] intermittent segfaults with openib on ring_c.c I'm digging out from mail backlog from being at the MPI Forum last week... Yes, from looking at the stack traces, it's segv'ing inside the memory allocator, which typically means some

Re: [OMPI users] intermittent segfaults with openib on ring_c.c

2014-06-09 Thread Jeff Squyres (jsquyres)
>> Greg >> >> -----Original Message----- >> From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Ralph Castain >> Sent: Wednesday, June 04, 2014 4:48 PM >> To: Open MPI Users >> Subject: Re: [OMPI users] intermittent segfaults with openib on ring_

Re: [OMPI users] intermittent segfaults with openib on ring_c.c

2014-06-04 Thread Ralph Castain
Original Message----- > From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Ralph Castain > Sent: Wednesday, June 04, 2014 4:48 PM > To: Open MPI Users > Subject: Re: [OMPI users] intermittent segfaults with openib on ring_c.c > > Urggg...unfortunately, the people who

Re: [OMPI users] intermittent segfaults with openib on ring_c.c

2014-06-04 Thread Fischer, Greg A.
: users [mailto:users-boun...@open-mpi.org] On Behalf Of Ralph Castain Sent: Wednesday, June 04, 2014 4:48 PM To: Open MPI Users Subject: Re: [OMPI users] intermittent segfaults with openib on ring_c.c Urggg...unfortunately, the people who know the most about that code are all at the MPI Forum

Re: [OMPI users] intermittent segfaults with openib on ring_c.c

2014-06-04 Thread Ralph Castain
/libc.so.6 > #13 0x in ?? () > > Greg > > -----Original Message----- > From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Ralph Castain > Sent: Wednesday, June 04, 2014 3:49 PM > To: Open MPI Users > Subject: Re: [OMPI users] intermittent segfaults

Re: [OMPI users] intermittent segfaults with openib on ring_c.c

2014-06-04 Thread Fischer, Greg A.
.c:146 >>>> #11 0x2b48f26935ab in orte_init (pargc=0x2b48f6300020, >>>> pargv=0x2b48f63000b8, flags=8) at >>>> ../../openmpi-1.8.1/orte/runtime/orte_init.c:148 >>>> #12 0x2b48f1739d38 in ompi_mpi_init (argc=1, >>>> argv=0x7fffebf0d

Re: [OMPI users] intermittent segfaults with openib on ring_c.c

2014-06-04 Thread Ralph Castain
r, Greg A. >>>> <fisch...@westinghouse.com <mailto:fisch...@westinghouse.com>> wrote: >>>> >>>> >>>> Oops, ulimit was set improperly. I generated a core file, loaded it in >>>> GDB, and ran a backtrace: >>>> C

Re: [OMPI users] intermittent segfaults with openib on ring_c.c

2014-06-04 Thread Ralph Castain
ime/orte_init.c:148 >>> #12 0x2b48f1739d38 in ompi_mpi_init (argc=1, argv=0x7fffebf0d1f8, >>> requested=8, provided=0x0) at >>> ../../openmpi-1.8.1/ompi/runtime/ompi_mpi_init.c:464 >>> #13 0x2b48f1760a37 in PMPI_Init (argc=0x2b48f6300020, >>>

Re: [OMPI users] intermittent segfaults with openib on ring_c.c

2014-06-04 Thread Gus Correa
0x2b8e4fd00020, bytes=47890224382136) at ../../../../../openmpi-1.8.1/opal/mca/memory/linux/malloc.c:4098 #1 0x in ?? () Is that helpful? Greg *From:*Fischer, Greg A. *Sent:*Wednesday, June 04, 2014 10:17 AM *To:*'Open MPI Users' *Cc:*Fischer, Greg A. *Subject:*RE: [OMPI users] intermittent segfaults

Re: [OMPI users] intermittent segfaults with openib on ring_c.c

2014-06-04 Thread Ralph Castain
> [binf112:05845] [16] ring_c[0x4024ef] > [binf112:05845] [17] /lib64/libc.so.6(__libc_start_main+0xe6)[0x2b2fa4906c36] > [binf112:05845] [18] ring_c[0x4023f9] > [binf112:05845] *** End of error message *** > ------ > mpirun noticed that process rank

Re: [OMPI users] intermittent segfaults with openib on ring_c.c

2014-06-04 Thread Fischer, Greg A.
tes=47890224382136) at ../../../../../openmpi-1.8.1/opal/mca/memory/linux/malloc.c:4098 #1 0x in ?? () Is that helpful? Greg From: Fischer, Greg A. Sent: Wednesday, June 04, 2014 10:17 AM To: 'Open MPI Users' Cc: Fischer, Greg A. Subject: RE: [OMPI users] intermittent segfaults with ope

Re: [OMPI users] intermittent segfaults with openib on ring_c.c

2014-06-04 Thread Ralph Castain
> ../../../../../openmpi-1.8.1/opal/mca/memory/linux/malloc.c:4098 > #1 0x in ?? () > > Is that helpful? > > Greg > > From: Fischer, Greg A. > Sent: Wednesday, June 04, 2014 10:17 AM > To: 'Open MPI Users' > Cc: Fischer, Greg A. > Subject: R

Re: [OMPI users] intermittent segfaults with openib on ring_c.c

2014-06-04 Thread Fischer, Greg A.
rom: Fischer, Greg A. Sent: Wednesday, June 04, 2014 10:17 AM To: 'Open MPI Users' Cc: Fischer, Greg A. Subject: RE: [OMPI users] intermittent segfaults with openib on ring_c.c I recompiled with "-enable-debug" but it doesn't seem to be providing any more information or a core dump.

Re: [OMPI users] intermittent segfaults with openib on ring_c.c

2014-06-04 Thread Fischer, Greg A.
On Behalf Of Ralph Castain Sent: Tuesday, June 03, 2014 11:54 PM To: Open MPI Users Subject: Re: [OMPI users] intermittent segfaults with openib on ring_c.c Sounds odd - can you configure OMPI --enable-debug and run it again? If it fails and you can get a core dump, could you tell us the line nu

Re: [OMPI users] intermittent segfaults with openib on ring_c.c

2014-06-04 Thread Ralph Castain
Sounds odd - can you configure OMPI --enable-debug and run it again? If it fails and you can get a core dump, could you tell us the line number where it is failing? On Jun 3, 2014, at 9:58 AM, Fischer, Greg A. wrote: > Apologies – I forgot to add some of the

Re: [OMPI users] intermittent segfaults with openib on ring_c.c

2014-06-03 Thread Fischer, Greg A.
Apologies - I forgot to add some of the information requested by the FAQ: 1. OpenFabrics is provided by the Linux distribution: [binf102:fischega] $ rpm -qa | grep ofed ofed-kmp-default-1.5.4.1_3.0.76_0.11-0.11.5 ofed-1.5.4.1-0.11.5 ofed-doc-1.5.4.1-0.11.5 2. Linux Distro /

[OMPI users] intermittent segfaults with openib on ring_c.c

2014-06-03 Thread Fischer, Greg A.
Hello openmpi-users, I'm running into a perplexing problem on a new system, whereby I'm experiencing intermittent segmentation faults when I run the ring_c.c example and use the openib BTL. See an example below. Approximately 50% of the time it provides the expected output, but the other 50%