Bug#881756: swi-prolog: FTBFS on mips: Build killed with signal TERM
Hi James,

Fri 17 Nov 2017 @ 17:15 James Cowgill:

> IMO the best solution is to remove all the ATOMIC_GENERATION_HACK code
> and use libatomic, but this will take some porting work because
> swi-prolog uses the old __sync primitives everywhere.
>
> I have attached a hack which marks _generation and _last_generation as
> volatile. This seems to work but isn't a long term solution.

Thanks for your input! I've informed upstream about the issue you found
and your suggestions.

Regards,
Lev
Bug#881756: Re: Bug#881756: swi-prolog: FTBFS on mips: Build killed with signal TERM
Hi,

On 15/11/17 10:35, Adrian Bunk wrote:
> On Wed, Nov 15, 2017 at 02:30:54PM +0500, Lev Lamberov wrote:
>> Wed 15 Nov 2017 @ 08:06 Adrian Bunk:
>> ...
>>> Same randomness on powerpcspe, and both have it already with 7.6.1+dfsg-1:
>>> https://buildd.debian.org/status/logs.php?pkg=swi-prolog&arch=powerpc
>>> https://buildd.debian.org/status/logs.php?pkg=swi-prolog&arch=powerpcspe
>> ...
>> At least, I cannot think of any other reason why 7.6.1-1
>> built successfully on mips and 7.6.1-2 failed, when the only
>> change is disabling the Java tests (due to CVE-2017-1000364). I uploaded
>> 7.6.1-2 the day after the 7.6.1-1 upload.
>
> you already quoted the reason:
>
>> The main issue
>> seems to be weaker read/write ordering constraints that break our
>> lock-free data structures, resulting in more or less random bugs.
>
> Based on the mips/powerpc/powerpcspe results one could say that there
> is a 50% chance that a build attempt fails.

I had a look at this, and indeed the build fails randomly on mips, usually
hanging in the test_cgc test (as mentioned earlier). One thread always
hangs in the global_generation function, which is only compiled in if
ATOMIC_GENERATION_HACK is defined (only on arches without 64-bit atomics).

Excerpt from pl-inline.h:465

> static inline gen_t
> global_generation(void)
> { gen_t g;
>   gen_t last;
>
>   do
>   { last = GD->_last_generation;
>     g = (gen_t)GD->_generation.gen_u<<32 | GD->_generation.gen_l;
>   } while ( unlikely(g < last) );
>
>   if ( unlikely(last != g) )
>     GD->_last_generation = g;
>
>   return g;
> }

In the above loop, GD->_generation and GD->_last_generation are not
modified within the loop, nor are they declared volatile. Therefore the
"last" and "g" assignments are loop-invariant and GCC will hoist them out
of the loop (into something like what is shown below). This causes the
hangs on mips: if g < last on entry for some reason, nothing is ever
re-read and the loop spins forever.

  last = GD->_last_generation;
  g = (gen_t)GD->_generation.gen_u<<32 | GD->_generation.gen_l;
  do {} while (g < last);

As a side note, I also notice GD->_generation is loaded without any
memory barriers on all architectures (pl-incl.h:999), which looks a bit
dodgy (although I don't know if it actually breaks anything).

IMO the best solution is to remove all the ATOMIC_GENERATION_HACK code
and use libatomic, but this will take some porting work because
swi-prolog uses the old __sync primitives everywhere.

I have attached a hack which marks _generation and _last_generation as
volatile. This seems to work but isn't a long term solution.

James

--- a/src/pl-global.h
+++ b/src/pl-global.h
@@ -105,9 +105,9 @@ struct PL_global_data
   } signals;
 #endif
 #ifdef O_LOGICAL_UPDATE
-  ggen_t _generation;	/* generation of the database */
+  volatile ggen_t _generation;	/* generation of the database */
 #ifdef ATOMIC_GENERATION_HACK
-  gen_t _last_generation; /* see pl-inline.h, global_generation() */
+  volatile gen_t _last_generation; /* see pl-inline.h, global_generation() */
 #endif
 #endif
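As a rough illustration of the libatomic route suggested above (a minimal sketch, not the attached patch and not upstream's code): if the generation is kept in a single 64-bit word and accessed through the GCC __atomic builtins, the split gen_u/gen_l read and the retry loop are no longer needed. The variable demo_generation and the helper next_generation() are hypothetical stand-ins for GD->_generation and whatever code increments it. On targets without native 64-bit atomics the builtins turn into libatomic calls, so linking with -latomic is assumed.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t gen_t;

/* Hypothetical stand-in for GD->_generation; not the real PL_global_data. */
static gen_t demo_generation;

/* Read the generation with one atomic 64-bit load.  Being an atomic
 * access, it is performed on every call; in practice compilers do not
 * hoist it out of a surrounding loop the way they may hoist a plain load. */
static inline gen_t
global_generation(void)
{ return __atomic_load_n(&demo_generation, __ATOMIC_ACQUIRE);
}

/* Rough counterpart of a __sync_add_and_fetch() style increment. */
static inline gen_t
next_generation(void)
{ gen_t g = __atomic_add_fetch(&demo_generation, 1, __ATOMIC_ACQ_REL);

  assert(g != 0);			/* the counter must never wrap around */
  return g;
}
```

Whether acquire/release is the right ordering for every user of the generation is a separate question that would need review against the real code; the sketch only shows that the read no longer depends on the compiler re-loading plain fields.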
Bug#881756: swi-prolog: FTBFS on mips: Build killed with signal TERM
On Wed, Nov 15, 2017 at 02:30:54PM +0500, Lev Lamberov wrote:
> Hi Adrian,

Hi Lev,

> Wed 15 Nov 2017 @ 08:06 Adrian Bunk:
>...
> > Same randomness on powerpcspe, and both have it already with 7.6.1+dfsg-1:
> > https://buildd.debian.org/status/logs.php?pkg=swi-prolog&arch=powerpc
> > https://buildd.debian.org/status/logs.php?pkg=swi-prolog&arch=powerpcspe
>...
> At least, I cannot think of any other reason why 7.6.1-1
> built successfully on mips and 7.6.1-2 failed, when the only
> change is disabling the Java tests (due to CVE-2017-1000364). I uploaded
> 7.6.1-2 the day after the 7.6.1-1 upload.

you already quoted the reason:

> The main issue
> seems to be weaker read/write ordering constraints that break our
> lock-free data structures, resulting in more or less random bugs.

Based on the mips/powerpc/powerpcspe results one could say that there
is a 50% chance that a build attempt fails.

That would give a 37.5% probability for 2 builds failing and one
succeeding when trying 3 times.

Considering the older mips failure and the more frequent
powerpc/powerpcspe failures, the 7.6.1-1 success might just have been
"luck".

> Cheers!
> Lev

cu
Adrian

--
"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed
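The 37.5% figure above is the binomial probability of exactly two failures in three attempts, 3 x 0.5^2 x 0.5. A small sketch to check it, assuming the builds are independent and each fails with the same fixed probability (the function name is made up for illustration):

```c
#include <assert.h>

/* Probability of exactly k failures in n independent build attempts,
 * each failing with probability p (binomial distribution). */
static double
exactly_k_failures(unsigned n, unsigned k, double p)
{ double result = 1.0;
  unsigned i;

  assert(k <= n && p >= 0.0 && p <= 1.0);

  for (i = 1; i <= k; i++)		/* binomial coefficient C(n, k) */
    result = result * (double)(n - k + i) / (double)i;
  for (i = 0; i < k; i++)		/* p^k: the failing builds */
    result *= p;
  for (i = 0; i < n - k; i++)		/* (1-p)^(n-k): the passing builds */
    result *= 1.0 - p;

  return result;
}
```

Under these assumptions exactly_k_failures(3, 2, 0.5) evaluates to 0.375.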
Bug#881756: swi-prolog: FTBFS on mips: Build killed with signal TERM
Hi Adrian,

Wed 15 Nov 2017 @ 08:06 Adrian Bunk:

> On Wed, Nov 15, 2017 at 12:55:12AM +0500, Lev Lamberov wrote:
>>...
>> The strangest thing is that 7.6.1-1 built successfully on mips. The
>> only difference between 7.6.1-1 and 7.6.1-2 is that the Java tests (only
>> tests) are now disabled (via debian/rules).
>
> 7.3.33+dfsg-1 failed the same way a year ago:
> https://buildd.debian.org/status/logs.php?pkg=swi-prolog&arch=mips
>
> I would not rule out that this might be an old bug causing random build
> failures, that either just happened twice in a row or became more
> likely due to some change somewhere.

In the past there were some weird build problems from time to time. You
can find the logs and build history in the usual place. These build
problems were unreproducible and were typically resolved by rebuilding.
Not this time.

>> Note that the 7.6.1-2
>> version builds successfully on mipsel and mips64el (little-endian), but
>> fails on mips (big-endian).
>
>> A similar problem occurs on powerpc [1][2], which also runs in
>> big-endian mode:
>>...
>
> Same randomness on powerpcspe, and both have it already with 7.6.1+dfsg-1:
> https://buildd.debian.org/status/logs.php?pkg=swi-prolog&arch=powerpc
> https://buildd.debian.org/status/logs.php?pkg=swi-prolog&arch=powerpcspe

True. And as far as I can see, powerpcspe also runs in big-endian mode.

I've informed upstream about the issue. Their answer is as follows:

> Interesting. I doubt this is due to big/little endian. The main issue
> seems to be weaker read/write ordering constraints that break our
> lock-free data structures, resulting in more or less random bugs. A
> number of the tests stress these parts of the system. The test_cgc
> is one of them, while I'm pretty sure there are no endian issues in
> that code.
>
> Keri and I did a lot of stress-testing and code reviewing for this after
> we discovered this was the reason for a crash on ARM. The same problem
> easily reproduced on powerpc. After the fixes for 7.6.1, a couple of
> runs of the test suite passed ok on powerpc. I only ran many iterations
> for tests that caused problems before.

Upstream will try to run these stress tests on powerpc and mips again,
but they say they were not able to reproduce issues with these tests in
7.6.1. I guess the issue may be related to the Debian build environment.
At least, I cannot think of any other reason why 7.6.1-1 built
successfully on mips and 7.6.1-2 failed, when the only change is
disabling the Java tests (due to CVE-2017-1000364). I uploaded 7.6.1-2
the day after the 7.6.1-1 upload.

Cheers!
Lev
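To illustrate the kind of ordering problem upstream describes (a generic sketch, not SWI-Prolog code): on weakly ordered CPUs a writer's data store and its "ready" flag store may become visible to another thread in the opposite order unless the two are tied together with release/acquire semantics. The names payload, ready, publish() and try_consume() are hypothetical and exist only for this example.

```c
#include <assert.h>

/* Hypothetical single-slot mailbox, not taken from SWI-Prolog. */
static int payload;
static int ready;

/* Writer: fill in the data, then publish it with a release store so the
 * payload write cannot be reordered after the flag write. */
static void
publish(int value)
{ assert(!__atomic_load_n(&ready, __ATOMIC_RELAXED));	/* single-shot slot */
  payload = value;
  __atomic_store_n(&ready, 1, __ATOMIC_RELEASE);
}

/* Reader: the acquire load pairs with the release store above, so a
 * reader that observes ready == 1 also observes the payload. */
static int
try_consume(int *out)
{ if ( !__atomic_load_n(&ready, __ATOMIC_ACQUIRE) )
    return 0;
  *out = payload;
  return 1;
}
```

With plain loads and stores the same code tends to work on x86, whose hardware ordering is comparatively strong, which would be consistent with such bugs showing up only on some architectures.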
Bug#881756: swi-prolog: FTBFS on mips: Build killed with signal TERM
On Wed, Nov 15, 2017 at 12:55:12AM +0500, Lev Lamberov wrote:
>...
> The strangest thing is that 7.6.1-1 built successfully on mips. The
> only difference between 7.6.1-1 and 7.6.1-2 is that the Java tests (only
> tests) are now disabled (via debian/rules).

7.3.33+dfsg-1 failed the same way a year ago:
https://buildd.debian.org/status/logs.php?pkg=swi-prolog&arch=mips

I would not rule out that this might be an old bug causing random build
failures, that either just happened twice in a row or became more
likely due to some change somewhere.

> Note that the 7.6.1-2
> version builds successfully on mipsel and mips64el (little-endian), but
> fails on mips (big-endian).

> A similar problem occurs on powerpc [1][2], which also runs in
> big-endian mode:
>...

Same randomness on powerpcspe, and both have it already with 7.6.1+dfsg-1:
https://buildd.debian.org/status/logs.php?pkg=swi-prolog&arch=powerpc
https://buildd.debian.org/status/logs.php?pkg=swi-prolog&arch=powerpcspe

cu
Adrian
Bug#881756: swi-prolog: FTBFS on mips: Build killed with signal TERM
Package: swi-prolog
Version: 7.4.2+dfsg-2
Severity: serious
Justification: fails to build from source

The mips build of swi-prolog failed:

  Running scripts from core ...
  E: Build killed with signal TERM after 360 minutes of inactivity

The bug can be reproduced on the mips porterbox. Hitting Ctrl+X gives:

  Interrupted test cgc:shift_cgc at /home/dogsleg/swi-prolog-7.6.1+dfsg/src/Tests/core/test_cgc.pl:102

The strangest thing is that 7.6.1-1 built successfully on mips. The
only difference between 7.6.1-1 and 7.6.1-2 is that the Java tests (only
tests) are now disabled (via debian/rules).

Note that the 7.6.1-2 version builds successfully on mipsel and
mips64el (little-endian), but fails on mips (big-endian).

A similar problem occurs on powerpc [1][2], which also runs in
big-endian mode:

  Running scripts from core
  Makefile:418: recipe for target 'check' failed
  make[2]: *** [check] Terminated

[1] https://buildd.debian.org/status/fetch.php?pkg=swi-prolog&arch=powerpc&ver=7.6.1%2Bdfsg-2&stamp=1510047696&raw=0
[2] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=869701

-- System Information:
Debian Release: buster/sid
  APT prefers testing
  APT policy: (990, 'testing'), (500, 'unstable'), (1, 'experimental')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 4.13.0-1-amd64 (SMP w/4 CPU cores)
Locale: LANG=ru_RU.UTF-8, LC_CTYPE=ru_RU.UTF-8 (charmap=UTF-8), LANGUAGE=ru_RU.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)

Versions of packages swi-prolog depends on:
ii  swi-prolog-nox  7.4.2+dfsg-2
ii  swi-prolog-x    7.4.2+dfsg-2

swi-prolog recommends no packages.

swi-prolog suggests no packages.

-- no debconf information