Re: [OMPI devel] opal_fifo SEGV from master

2015-07-01 Thread Paul Hargrove
Nathan, Last night's master tarball is still producing a SEGV in opal_fifo on the same Scientific Linux 7.x x86-64 VM as I reported in Feb. Reproducing the SEGV under gdb yields: Program received signal SIGSEGV, Segmentation fault. [Switching to Thread 0x75bb1700 (LWP 16242)] 0x00401

Re: [OMPI devel] opal_fifo SEGV from master

2015-02-12 Thread Paul Hargrove
Nathan, Just FYI: Both systems where I've seen this failure are VMs on a well-loaded server. So, the instruction interleaving (for reproducing races) is likely a bit different than what you would see on one's own laptop or workstation. Also, I don't see the SEGV in every run, but to reproduce it i

Re: [OMPI devel] opal_fifo SEGV from master

2015-02-12 Thread Nathan Hjelm
Yes, seriously. This code is still undergoing testing, which is part of the reason it is on master. Once I am confident in the code I will be updating some of my code to use a fifo instead of an opal_list_t and a lock. I don't know if the barrier will make a difference but it is the only place I c

Re: [OMPI devel] opal_fifo SEGV from master

2015-02-12 Thread George Bosilca
Seriously? George. On Thu, Feb 12, 2015 at 1:00 PM, Nathan Hjelm wrote: > > I think I see the issue. Looks like there is a missing memory barrier > after the head consistency code. I will add one and see if that fixes > your problem. > > BTW, I can't reproduce the issue on any of my systems

Re: [OMPI devel] opal_fifo SEGV from master

2015-02-12 Thread Nathan Hjelm
I think I see the issue. Looks like there is a missing memory barrier after the head consistency code. I will add one and see if that fixes your problem. BTW, I can't reproduce the issue on any of my systems :-/. -Nathan On Thu, Feb 12, 2015 at 02:07:08AM -0800, Paul Hargrove wrote: >Just e
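
For readers unfamiliar with the kind of fix being proposed, here is a minimal sketch of what a memory barrier buys you when one thread publishes data for another. It is not the opal_fifo code and does not use Open MPI's own atomics; it assumes C11 <stdatomic.h> and POSIX threads, and every name in it (payload, ready, run_handoff) is hypothetical. Without the two fences, the compiler or CPU would be free to reorder the payload access relative to the flag access, which is the general class of bug a missing barrier produces.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical names for illustration; none of this is taken from opal_fifo. */
static int payload;        /* ordinary data, written before it is published */
static atomic_int ready;   /* publication flag, zero until the data is valid */

static void *producer(void *arg)
{
    (void)arg;
    payload = 42;
    /* Release fence: the payload write may not be reordered after the
     * flag store that follows. */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&ready, 1, memory_order_relaxed);
    return NULL;
}

static void *consumer(void *arg)
{
    int *out = arg;
    /* Spin until the producer raises the flag. */
    while (atomic_load_explicit(&ready, memory_order_relaxed) == 0) {
    }
    /* Acquire fence: the payload read may not be reordered before the
     * flag load that preceded it. */
    atomic_thread_fence(memory_order_acquire);
    *out = payload;
    return NULL;
}

/* Runs one producer/consumer hand-off and returns the value the consumer
 * observed, or -1 if a thread could not be created. */
int run_handoff(void)
{
    pthread_t prod, cons;
    int seen = -1;

    payload = 0;
    atomic_store(&ready, 0);

    if (pthread_create(&cons, NULL, consumer, &seen) != 0) {
        return -1;
    }
    if (pthread_create(&prod, NULL, producer, NULL) != 0) {
        /* Release the spinning consumer before bailing out. */
        atomic_store(&ready, 1);
        pthread_join(cons, NULL);
        return -1;
    }
    pthread_join(prod, NULL);
    pthread_join(cons, NULL);
    return seen;
}
```

Calling run_handoff() from a program linked with -pthread performs one hand-off. Whether a fence of this kind is what the head consistency code actually needs is a question for the thread itself; the sketch only shows the general pattern.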

Re: [OMPI devel] opal_fifo SEGV from master

2015-02-12 Thread Paul Hargrove
Just experienced the same failure as below with openmpi-dev-904-g08dceda built with "gcc (GCC) 4.8.2 20140120 (Red Hat 4.8.2-16)" on Scientific Linux 7.x (a RHEL 7 clone). gdb says: Program received signal SIGSEGV, Segmentation fault. [Switching to Thread 0x753b0700 (LWP 19685)] 0x004

[OMPI devel] opal_fifo SEGV from master

2015-02-06 Thread Paul Hargrove
Yes, this time I really mean "fifo", not "lifo". ;-) With last night's master tarball (Open MPI dev-845-ga3275aa) configured with only --prefix and --enable-debug A Linux-86-64 system running Debian Wheezy and compiler = "gcc (Debian 4.7.2-5) 4.7.2" Failure from "make check": /home/phargrov/OMP