Nathan,
Last night's master tarball is still producing a SEGV in opal_fifo on the
same Scientific Linux 7.x x86-64 VM as I reported in Feb.
Reproducing the SEGV under gdb yields:
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x75bb1700 (LWP 16242)]
0x00401
Nathan,
Just FYI: Both systems where I've seen this failure are VMs on a
well-loaded server.
So, the instruction interleaving (for reproducing races) is likely a bit
different from what you would see on one's own laptop or workstation. Also,
I don't see the SEGV in every run, but to reproduce it i
Yes, seriously. This code is still undergoing testing which is part of
the reason it is on master. Once I am confident in the code I will be
updating some of my code to use a fifo instead of an opal_list_t and a
lock.
I don't know if the barrier will make a difference but it is the only
place I c
Seriously?
George.
On Thu, Feb 12, 2015 at 1:00 PM, Nathan Hjelm wrote:
>
> I think I see the issue. Looks like there is a missing memory barrier
> after the head consistency code. I will add one and see if that fixes
> your problem.
>
> BTW, I can't reproduce the issue on any of my systems
I think I see the issue. Looks like there is a missing memory barrier
after the head consistency code. I will add one and see if that fixes
your problem.
BTW, I can't reproduce the issue on any of my systems :-/.
-Nathan
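
For readers following the thread, here is a rough sketch of the kind of
ordering problem a missing memory barrier can cause. This is NOT the
opal_fifo code and not the actual fix; mailbox_t and the mailbox_*
functions are hypothetical names made up for illustration, and the sketch
assumes a C11 compiler with <stdatomic.h>. It shows a one-shot handoff in
which the producer fences before publishing and the consumer fences after
observing the publication:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical one-shot mailbox, for illustration only.
 * It is not Open MPI's opal_fifo. */
typedef struct {
    int payload;        /* plain data, written before publication */
    atomic_bool ready;  /* flag that publishes the payload */
} mailbox_t;

static void mailbox_init(mailbox_t *m)
{
    m->payload = 0;
    atomic_init(&m->ready, false);
}

/* Producer side: write the payload, then issue a release fence, then
 * set the flag. Without the fence, the flag store could become visible
 * to another thread before the payload store. */
static void mailbox_publish(mailbox_t *m, int value)
{
    m->payload = value;
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&m->ready, true, memory_order_relaxed);
}

/* Consumer side: read the flag, then issue an acquire fence, then read
 * the payload. Returns false if nothing has been published yet. */
static bool mailbox_try_consume(mailbox_t *m, int *out)
{
    if (!atomic_load_explicit(&m->ready, memory_order_relaxed)) {
        return false;
    }
    atomic_thread_fence(memory_order_acquire);
    *out = m->payload;
    return true;
}
```

If either fence is dropped, a consumer on another thread may see the flag
set but still read a stale payload. A race like that would tend to show up
only intermittently, and more readily under unusual instruction
interleaving.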
On Thu, Feb 12, 2015 at 02:07:08AM -0800, Paul Hargrove wrote:
>Just e
Just experienced the same failure as below with openmpi-dev-904-g08dceda
built with "gcc (GCC) 4.8.2 20140120 (Red Hat 4.8.2-16)" on Scientific
Linux 7.x (a RHEL 7 clone).
gdb says:
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x753b0700 (LWP 19685)]
0x004
Yes, this time I really mean "fifo", not "lifo". ;-)
With last night's master tarball (Open MPI dev-845-ga3275aa) configured
with only --prefix and --enable-debug
A Linux x86-64 system running Debian Wheezy, compiler = "gcc (Debian
4.7.2-5) 4.7.2"
Failure from "make check":
/home/phargrov/OMP