Re: [OMPI devel] [2.1.2rc3] libevent SEGV on FreeBSD/amd64

2017-08-30 Thread Paul Hargrove
A gcc-based build is fine.

So, I think this is similar to issue #3992, in which we seem to have
decided that /usr/bin/cc (clang) is not to be trusted on this platform.

-Paul


Re: [OMPI devel] [2.1.2rc3] libevent SEGV on FreeBSD/amd64

2017-08-30 Thread Paul Hargrove
Ralph,

See my response to Larry.  The impossibly large value was a figment of
gdb's imagination.

This system has worked for Open MPI when it was still at 11.0.
I cannot say if the current problem is w/ FreeBSD-11.1 (e.g. its compiler)
or with Open MPI.

I am trying a gcc-based build now.

-Paul



Re: [OMPI devel] [2.1.2rc3] libevent SEGV on FreeBSD/amd64

2017-08-30 Thread Paul Hargrove
Larry,

Thanks for the suggestions.

The system logs show only
   Aug 30 14:16:06 freebsd-amd64 kernel: pid 95624 (orterun), uid 19214: exited on signal 11 (core dumped)

However, while "nactivequeues" is a 4-byte integer it *does* appear to be
misaligned (suggesting to me that "base" is bogus).

(gdb) print sizeof(base->nactivequeues)
$1 = 4
(gdb) print &base->nactivequeues
$2 = (int *) 0x100f7
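
The misalignment can be read off the address itself: 0x100f7 is odd, so it cannot be a multiple of the alignment of a 4-byte int. A minimal sketch of that check in C follows; the helper name is_aligned is made up for illustration and is not part of gdb, libevent, or Open MPI.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true if addr is a multiple of the given alignment. */
static bool is_aligned(uintptr_t addr, size_t alignment)
{
    return alignment != 0 && (addr % alignment) == 0;
}
```

Applied to the two addresses in this message, is_aligned((uintptr_t)0x100f7, alignof(int)) is false, while the ev_base value seen one frame up, 0x2ec9500, passes the same check.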

Digging deeper it looks like the "base" being used by gdb is bogus.
Going up one stack frame to the caller, we see a totally different base
being passed:

(gdb) up
#1  0x0008062e1fd2 in pmix_start_progress_thread ()
    at /home/phargrov/OMPI/openmpi-2.1.2rc3-freebsd11-amd64/openmpi-2.1.2rc3/opal/mca/pmix/pmix112/pmix/src/util/progress_threads.c:83
83          event_assign(&block_ev, ev_base, block_pipe[0],
(gdb) print ev_base
$6 = (pmix_event_base_t *) 0x2ec9500

I do not trust gdb on this system.

Here is what lldb says:

(lldb) bt
* thread #1, name = 'orterun', stop reason = signal SIGSEGV
  * frame #0: libopen-pal.so.19`opal_libevent2022_event_assign(ev=0x0008065482c0, base=, fd=, events=2, callback=, arg=0x) at event.c:1779
    frame #1: mca_pmix_pmix112.so`pmix_start_progress_thread at progress_threads.c:83
    frame #2: mca_pmix_pmix112.so`PMIx_server_init(module=0x000806545be8, info=0x000802e16a00, ninfo=2) at pmix_server.c:310
    frame #3: mca_pmix_pmix112.so`pmix1_server_init(module=0x000800b106a0, info=0x7fffe290) at pmix1_server_south.c:140
    frame #4: libopen-rte.so.19`pmix_server_init at pmix_server.c:261
    frame #5: mca_ess_hnp.so`rte_init at ess_hnp_module.c:666
    frame #6: libopen-rte.so.19`orte_init(pargc=0x7fffe988, pargv=0x7fffe980, flags=4) at orte_init.c:226
    frame #7: orterun`orterun(argc=7, argv=0x7fffea18) at orterun.c:831
    frame #8: orterun`main(argc=7, argv=0x7fffea18) at main.c:13
    frame #9: 0x00403a9f orterun`_start + 383
(lldb) up
frame #1: mca_pmix_pmix112.so`pmix_start_progress_thread at progress_threads.c:83
   80           event_base_free(ev_base);
   81           return NULL;
   82       }
-> 83       event_assign(&block_ev, ev_base, block_pipe[0],
   84                    EV_READ, wakeup, NULL);
   85       event_add(&block_ev, 0);
   86       evlib_active = true;
(lldb) print ev_base
(pmix_event_base_t *) $2 = 0x02ec9500
(lldb) print *ev_base
error: Couldn't apply expression side effects : Couldn't dematerialize a result variable: couldn't read its memory

So, it looks like the SEGV is due to a bad 2nd argument to event_assign().
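
To make the failure mode concrete, here is a simplified, self-contained sketch of the statement at event.c:1779. The mock_ types and the function are stand-ins invented for illustration; they are not libevent's real definitions, and the NULL check is an assumption added for the example, not something taken from the libevent source.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins; the real libevent structs have many more fields. */
struct mock_event_base { int nactivequeues; };
struct mock_event {
    struct mock_event_base *ev_base;
    unsigned char ev_pri;   /* assumed 8-bit, as gdb's "0 '\0'" suggests */
};

/* Mirrors only the priority assignment. Returns -1 rather than
 * dereferencing a NULL pointer. */
static int mock_event_assign(struct mock_event *ev, struct mock_event_base *base)
{
    if (ev == NULL || base == NULL)
        return -1;
    ev->ev_base = base;
    /* This read of base->nactivequeues is where a bogus base faults. */
    ev->ev_pri = (unsigned char)(base->nactivequeues / 2);
    return 0;
}
```

A NULL check like this cannot catch a non-NULL pointer to unmapped memory such as the 0x2ec9500 seen above, so the read of base->nactivequeues would still fault in that case.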

-Paul


Re: [OMPI devel] [2.1.2rc3] libevent SEGV on FreeBSD/amd64

2017-08-30 Thread r...@open-mpi.org
Yeah, that caught my eye too as that is impossibly large. We only have a 
handful of active queues - looks to me like there is some kind of alignment 
issue.

Paul - has this configuration worked with prior versions of OMPI? Or is this 
something new?

Ralph


Re: [OMPI devel] [2.1.2rc3] libevent SEGV on FreeBSD/amd64

2017-08-30 Thread Larry Baker
Paul,

> (gdb) print base->nactivequeues
> $3 = 106201992

seems like an extraordinarily large number to me.  I don't know what the
implications of the --enable-debug clang option are.  Any chance the
SEGFAULT is a debugging trap when an uninitialized value is encountered?

The other thought I had is an alignment trap if, for example, nactivequeues is 
a 64-bit int but is not 64-bit aligned.  As far as I can tell, nactivequeues is 
a plain int.  But, what that is on FreeBSD/amd64, I do not know.

Should there be more information in dmesg or a system log file with the trap 
code so you can identify whether it is an instruction fetch (VERY unlikely), an 
operand fetch, or a store that caused the trap?
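
The size and alignment question can be answered on the machine itself with a few lines of C. This is only a sketch and not something from the thread; the helper name print_int_layout is made up for illustration. On an LP64 platform such as FreeBSD/amd64, a plain int is expected to be 4 bytes with 4-byte alignment.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: print the size and required alignment of the
 * integer types discussed above. */
static void print_int_layout(void)
{
    printf("int:     size=%zu align=%zu\n",
           sizeof(int), (size_t)alignof(int));
    printf("int64_t: size=%zu align=%zu\n",
           sizeof(int64_t), (size_t)alignof(int64_t));
}
```

Calling print_int_layout() from main() reports the values for the compiler in use. As far as I know, x86-64 does not normally trap on misaligned integer loads unless alignment checking has been explicitly enabled, so a misaligned int on its own would not be expected to raise a SEGV there.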

Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov




___
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel

[OMPI devel] [2.1.2rc3] libevent SEGV on FreeBSD/amd64

2017-08-30 Thread Paul Hargrove
I am testing the 2.1.2rc3 tarball on FreeBSD-11.1, configured with
   --prefix=[...] --enable-debug CC=clang CXX=clang++ --disable-mpi-fortran --with-hwloc=/usr/local

The CC/CXX settings are to use the system default compilers (rather than
gcc/g++ in /usr/local/bin).
The --with-hwloc is to avoid issue #3992 (though I have not
determined if that impacts this RC).

When running ring_c I get a SEGV from orterun, for which a gdb backtrace is
given below.
The one surprising thing (highlighted) in the backtrace is that both the
RHS and LHS of the assignment appear to be valid memory locations.
So, if the backtrace is accurate then I am at a loss as to why a SEGV
occurs.

-Paul


Program terminated with signal 11, Segmentation fault.
[...]
#0  opal_libevent2022_event_assign (ev=0x8065482c0, base=<value optimized out>, fd=<value optimized out>,
    events=2, callback=<value optimized out>, arg=0x0)
    at /home/phargrov/OMPI/openmpi-2.1.2rc3-freebsd11-amd64/openmpi-2.1.2rc3/opal/mca/event/libevent2022/libevent/event.c:1779
1779        ev->ev_pri = base->nactivequeues / 2;
(gdb) print base->nactivequeues
$3 = 106201992
(gdb) print ev->ev_pri
$4 = 0 '\0'
(gdb) where
#0  opal_libevent2022_event_assign (ev=0x8065482c0, base=<value optimized out>, fd=<value optimized out>,
    events=2, callback=<value optimized out>, arg=0x0)
    at /home/phargrov/OMPI/openmpi-2.1.2rc3-freebsd11-amd64/openmpi-2.1.2rc3/opal/mca/event/libevent2022/libevent/event.c:1779
#1  0x0008062e1fd2 in pmix_start_progress_thread ()
    at /home/phargrov/OMPI/openmpi-2.1.2rc3-freebsd11-amd64/openmpi-2.1.2rc3/opal/mca/pmix/pmix112/pmix/src/util/progress_threads.c:83
#2  0x0008063047e4 in PMIx_server_init (module=0x806545be8,
    info=0x802e16a00, ninfo=2)
    at /home/phargrov/OMPI/openmpi-2.1.2rc3-freebsd11-amd64/openmpi-2.1.2rc3/opal/mca/pmix/pmix112/pmix/src/server/pmix_server.c:310
#3  0x0008062c12f6 in pmix1_server_init (module=0x800b106a0,
    info=0x7fffe290)
    at /home/phargrov/OMPI/openmpi-2.1.2rc3-freebsd11-amd64/openmpi-2.1.2rc3/opal/mca/pmix/pmix112/pmix1_server_south.c:140
#4  0x000800889f43 in pmix_server_init ()
    at /home/phargrov/OMPI/openmpi-2.1.2rc3-freebsd11-amd64/openmpi-2.1.2rc3/orte/orted/pmix/pmix_server.c:261
#5  0x000803e22d87 in rte_init ()
    at /home/phargrov/OMPI/openmpi-2.1.2rc3-freebsd11-amd64/openmpi-2.1.2rc3/orte/mca/ess/hnp/ess_hnp_module.c:666
#6  0x00080084a45e in orte_init (pargc=0x7fffe988,
    pargv=0x7fffe980, flags=4)
    at /home/phargrov/OMPI/openmpi-2.1.2rc3-freebsd11-amd64/openmpi-2.1.2rc3/orte/runtime/orte_init.c:226
#7  0x004046a4 in orterun (argc=7, argv=0x7fffea18)
    at /home/phargrov/OMPI/openmpi-2.1.2rc3-freebsd11-amd64/openmpi-2.1.2rc3/orte/tools/orterun/orterun.c:831
#8  0x00403bc2 in main (argc=7, argv=0x7fffea18)
    at /home/phargrov/OMPI/openmpi-2.1.2rc3-freebsd11-amd64/openmpi-2.1.2rc3/orte/tools/orterun/main.c:13



-- 
Paul H. Hargrove  phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900