Bug#881756: swi-prolog: FTBFS on mips: Build killed with signal TERM

2017-11-18 Thread Lev Lamberov
Hi James,

Fri 17 Nov 2017 @ 17:15 James Cowgill:
> IMO the best solution is to remove all the ATOMIC_GENERATION_HACK code
> and use libatomic, but this will take some porting work because
> swi-prolog uses the old __sync primitives everywhere.
>
> I have attached a hack which marks _generation and _last_generation as
> volatile. This seems to work but isn't a long term solution.

Thanks for your input! I've informed upstream about the issue you found
and your suggestions.

Regards,
Lev



Bug#881756: Re: Bug#881756: swi-prolog: FTBFS on mips: Build killed with signal TERM

2017-11-17 Thread James Cowgill
Hi,

On 15/11/17 10:35, Adrian Bunk wrote:
> On Wed, Nov 15, 2017 at 02:30:54PM +0500, Lev Lamberov wrote:
>> Wed 15 Nov 2017 @ 08:06 Adrian Bunk:
>> ...
>>> Same randomness on powerpcspe, and both have it already with 7.6.1+dfsg-1:
>>> https://buildd.debian.org/status/logs.php?pkg=swi-prolog&arch=powerpc
>>> https://buildd.debian.org/status/logs.php?pkg=swi-prolog&arch=powerpcspe
>> ...
>> At least, I cannot think of any other reason why 7.6.1-1
>> built successfully on mips and 7.6.1-2 failed, when the only
>> change is disabling the Java tests (due to CVE-2017-1000364). I uploaded
>> 7.6.1-2 the day after the 7.6.1-1 upload.
> 
> you already quoted the reason:
> 
>> The main issue
>> seems to be weaker read/write ordering constraints that break our
>> lock-free data structures, resulting in more or less random bugs.
> 
> Based on the mips/powerpc/powerpcspe results one could say that there
> is a 50% chance that a build attempt fails.

I had a look at this and indeed the build fails randomly on mips, usually
hanging in the test_cgc test (as mentioned earlier).

One thread always hangs in the global_generation function which is only
enabled if ATOMIC_GENERATION_HACK is enabled (only on arches without
64-bit atomics).

Excerpt from pl-inline.h:465
> static inline gen_t
> global_generation(void)
> { gen_t g;
>   gen_t last;
> 
>   do
>   { last = GD->_last_generation;
>     g = (gen_t)GD->_generation.gen_u<<32 | GD->_generation.gen_l;
>   } while ( unlikely(g < last) );
> 
>   if ( unlikely(last != g) )
>     GD->_last_generation = g;
> 
>   return g;
> }

In the above loop, GD->_generation and GD->_last_generation are not
modified within the loop, nor are they declared volatile. Therefore the
"last" and "g" assignments are invariant and GCC will hoist them out of
the loop (into something like what is shown below). This causes the
hangs on mips if g < last for some reason.

last = GD->_last_generation;
g = (gen_t)GD->_generation.gen_u<<32 | GD->_generation.gen_l;
do {} while (g < last);
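
To illustrate the hoisting problem, here is a minimal sketch of the same loop with every shared read going through a volatile lvalue, so the compiler has to reload on each iteration. The struct gen_data, the READ_ONCE macro and the function name are hypothetical stand-ins, not swi-prolog code, and the sketch relies on the GCC behaviour of honouring volatile accesses made through a cast (plus the __typeof__ extension). It only stops the hoisting; it adds no memory barriers and does not make the 64-bit access atomic.

```c
#include <stdint.h>

typedef uint64_t gen_t;

/* Hypothetical stand-in for the relevant fields of PL_global_data. */
struct gen_data
{ struct { uint32_t gen_u; uint32_t gen_l; } _generation;
  gen_t _last_generation;
};

/* Read x through a volatile lvalue so each evaluation is a real load
 * (GCC behaviour; the macro name is an assumption for this sketch). */
#define READ_ONCE(x) (*(volatile __typeof__(x) *)&(x))

static inline gen_t
global_generation_reload(struct gen_data *gd)
{ gen_t g;
  gen_t last;

  do
  { /* Both values are re-read on every pass, so another thread's
     * update can end the loop. On 32-bit targets the 64-bit read of
     * _last_generation is still two loads and may be torn. */
    last = READ_ONCE(gd->_last_generation);
    g = (gen_t)READ_ONCE(gd->_generation.gen_u)<<32 |
        READ_ONCE(gd->_generation.gen_l);
  } while ( g < last );

  if ( last != g )
    *(volatile gen_t *)&gd->_last_generation = g;

  return g;
}
```

This has the same effect as declaring the fields volatile, just applied at the point of use.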

As a side note, I also notice GD->_generation is loaded without any
memory barriers on all architectures (pl-incl.h:999) which looks a bit
dodgy (although I don't know if it actually breaks anything).

IMO the best solution is to remove all the ATOMIC_GENERATION_HACK code
and use libatomic, but this will take some porting work because
swi-prolog uses the old __sync primitives everywhere.
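
As a rough sketch of what the libatomic approach could look like, the generation could be a single 64-bit counter accessed only through the GCC __atomic builtins. The variable and function names below are hypothetical and this is not the upstream code. On targets without native 64-bit atomics GCC turns these builtins into calls into libatomic, so the link step would need -latomic there.

```c
#include <stdint.h>

typedef uint64_t gen_t;

/* Hypothetical replacement for the split gen_u/gen_l counter. */
static gen_t generation;

/* Read the current generation. The acquire load cannot be hoisted out
 * of a loop and is never torn, so no retry loop is needed. */
static inline gen_t
global_generation_atomic(void)
{ return __atomic_load_n(&generation, __ATOMIC_ACQUIRE);
}

/* Advance the generation and return the new value. */
static inline gen_t
next_global_generation_atomic(void)
{ return __atomic_add_fetch(&generation, 1, __ATOMIC_ACQ_REL);
}
```

The memory orders here are a conservative guess; picking the right ones would need a review of how the generation is used elsewhere.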

I have attached a hack which marks _generation and _last_generation as
volatile. This seems to work but isn't a long term solution.

James
--- a/src/pl-global.h
+++ b/src/pl-global.h
@@ -105,9 +105,9 @@ struct PL_global_data
   } signals;
 #endif
 #ifdef O_LOGICAL_UPDATE
-  ggen_t   _generation;   /* generation of the database */
+  volatile ggen_t  _generation;   /* generation of the database */
 #ifdef ATOMIC_GENERATION_HACK
-  gen_t    _last_generation;   /* see pl-inline.h, global_generation() */
+  volatile gen_t   _last_generation;   /* see pl-inline.h, global_generation() */
 #endif
 #endif





Bug#881756: swi-prolog: FTBFS on mips: Build killed with signal TERM

2017-11-15 Thread Adrian Bunk
On Wed, Nov 15, 2017 at 02:30:54PM +0500, Lev Lamberov wrote:
> Hi Adrian,

Hi Lev,

> Wed 15 Nov 2017 @ 08:06 Adrian Bunk:
>...
> > Same randomness on powerpcspe, and both have it already with 7.6.1+dfsg-1:
> > https://buildd.debian.org/status/logs.php?pkg=swi-prolog&arch=powerpc
> > https://buildd.debian.org/status/logs.php?pkg=swi-prolog&arch=powerpcspe
>...
> At least, I cannot think of any other reason why 7.6.1-1
> built successfully on mips and 7.6.1-2 failed, when the only
> change is disabling the Java tests (due to CVE-2017-1000364). I uploaded
> 7.6.1-2 the day after the 7.6.1-1 upload.

you already quoted the reason:

> The main issue
> seems to be weaker read/write ordering constraints that break our
> lock-free data structures, resulting in more or less random bugs.

Based on the mips/powerpc/powerpcspe results one could say that there
is a 50% chance that a build attempt fails.

That would give a 37.5% probability for 2 builds failing and one 
succeeding when trying 3 times.
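
The 37.5% figure is the binomial probability of exactly 2 failures in 3 independent attempts, each failing with probability 0.5: 3 * 0.5^2 * 0.5 = 0.375. A small sketch of that arithmetic follows; the function name is made up, and independence between build attempts is an assumption.

```c
/* Probability of exactly k failures in n independent build attempts,
 * each failing with probability p (binomial distribution). */
static double
prob_exact_failures(int n, int k, double p)
{ double r = 1.0;
  int i;

  for (i = 1; i <= k; i++)        /* binomial coefficient C(n, k) */
    r = r * (n - k + i) / i;
  for (i = 0; i < k; i++)         /* k failures */
    r *= p;
  for (i = 0; i < n - k; i++)     /* n - k successes */
    r *= 1.0 - p;

  return r;
}
```

Calling prob_exact_failures(3, 2, 0.5) gives 0.375.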

Considering the older mips failure and the more frequent 
powerpc/powerpcspe failures, the 7.6.1-1 success might
just have been "luck".

> Cheers!
> Lev

cu
Adrian

-- 

   "Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
   "Only a promise," Lao Er said.
   Pearl S. Buck - Dragon Seed



Bug#881756: swi-prolog: FTBFS on mips: Build killed with signal TERM

2017-11-15 Thread Lev Lamberov
Hi Adrian,

Wed 15 Nov 2017 @ 08:06 Adrian Bunk:
> On Wed, Nov 15, 2017 at 12:55:12AM +0500, Lev Lamberov wrote:
>>...
>> The strangest thing is that 7.6.1-1 built successfully on mips. The
>> only difference between 7.6.1-1 and 7.6.1-2 is that java tests (only
>> tests) are disabled now (via debian/rules).
>
> 7.3.33+dfsg-1 failed the same way a year ago:
> https://buildd.debian.org/status/logs.php?pkg=swi-prolog&arch=mips
>
> I would not rule out that this might be an old bug causing random build
> failures, that either just happened twice in a row or became more
> likely due to some change somewhere.

In the past there were some weird build problems from time to time. You
can find the logs and build history in the usual place. These build
problems were unreproducible and were typically resolved by rebuilding.
Not this time.

>> Note that the 7.6.1-2
>> version builds successfully on mipsel and mips64el (little-endian), but
>> fails on mips (big-endian).
>
>> A similar problem occurs on powerpc [1][2], which also runs in
>> big-endian mode:
>>...
>
> Same randomness on powerpcspe, and both have it already with 7.6.1+dfsg-1:
> https://buildd.debian.org/status/logs.php?pkg=swi-prolog&arch=powerpc
> https://buildd.debian.org/status/logs.php?pkg=swi-prolog&arch=powerpcspe

True. And as far as I can see, powerpcspe also runs in big-endian mode.

I've informed upstream about the issue. Their answer is as follows:

> Interesting. I doubt this is due to big/little endian. The main issue
> seems to be weaker read/write ordering constraints that break our
> lock-free data structures, resulting in more or less random bugs. A
> number of the tests stress these parts of the system.  The test_cgc
> is one of them, while I'm pretty sure there are no endian issues in
> that code.
>
> Keri and I did a lot of stress-testing and code reviewing for this after
> we discovered this was the reason for a crash on ARM. The same problem
> easily reproduced on powerpc. After the fixes for 7.6.1, a couple of
> runs of the test suite passed ok on powerpc. I only ran many iterations
> for tests that caused problems before.

Upstream will try to run these stress tests on powerpc and mips again,
but they say they were not able to reproduce any issues with these tests
in 7.6.1. I guess the issue may be related to the Debian build
environment. At least, I cannot think of any other reason why 7.6.1-1
built successfully on mips and 7.6.1-2 failed, when the only change is
disabling the Java tests (due to CVE-2017-1000364). I uploaded 7.6.1-2
the day after the 7.6.1-1 upload.

Cheers!
Lev



Bug#881756: swi-prolog: FTBFS on mips: Build killed with signal TERM

2017-11-15 Thread Adrian Bunk
On Wed, Nov 15, 2017 at 12:55:12AM +0500, Lev Lamberov wrote:
>...
> The strangest thing is that 7.6.1-1 built successfully on mips. The
> only difference between 7.6.1-1 and 7.6.1-2 is that java tests (only
> tests) are disabled now (via debian/rules).

7.3.33+dfsg-1 failed the same way a year ago:
https://buildd.debian.org/status/logs.php?pkg=swi-prolog&arch=mips

I would not rule out that this might be an old bug causing random build 
failures, that either just happened twice in a row or became more
likely due to some change somewhere.

> Note that the 7.6.1-2
> version builds successfully on mipsel and mips64el (little-endian), but
> fails on mips (big-endian).

> A similar problem occurs on powerpc [1][2], which also runs in
> big-endian mode:
>...

Same randomness on powerpcspe, and both have it already with 7.6.1+dfsg-1:
https://buildd.debian.org/status/logs.php?pkg=swi-prolog&arch=powerpc
https://buildd.debian.org/status/logs.php?pkg=swi-prolog&arch=powerpcspe

cu
Adrian




Bug#881756: swi-prolog: FTBFS on mips: Build killed with signal TERM

2017-11-14 Thread Lev Lamberov
Package: swi-prolog
Version: 7.4.2+dfsg-2
Severity: serious
Justification: fails to build from source

The mips build of swi-prolog failed:

Running scripts from core ...
E: Build killed with signal TERM after 360 minutes of inactivity

The bug can be reproduced on mips porterbox. Hitting Ctrl+X gives:

Interrupted test cgc:shift_cgc at /home/dogsleg/swi-prolog-7.6.1+dfsg/src/Tests/core/test_cgc.pl:102

The strangest thing is that 7.6.1-1 built successfully on mips. The
only difference between 7.6.1-1 and 7.6.1-2 is that java tests (only
tests) are disabled now (via debian/rules). Note that the 7.6.1-2
version builds successfully on mipsel and mips64el (little-endian), but
fails on mips (big-endian).

A similar problem occurs on powerpc [1][2], which also runs in
big-endian mode:

Running scripts from core
Makefile:418: recipe for target 'check' failed
make[2]: *** [check] Terminated

[1] https://buildd.debian.org/status/fetch.php?pkg=swi-prolog&arch=powerpc&ver=7.6.1%2Bdfsg-2&stamp=1510047696&raw=0

[2] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=869701


-- System Information:
Debian Release: buster/sid
  APT prefers testing
  APT policy: (990, 'testing'), (500, 'unstable'), (1, 'experimental')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 4.13.0-1-amd64 (SMP w/4 CPU cores)
Locale: LANG=ru_RU.UTF-8, LC_CTYPE=ru_RU.UTF-8 (charmap=UTF-8), 
LANGUAGE=ru_RU.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)

Versions of packages swi-prolog depends on:
ii  swi-prolog-nox  7.4.2+dfsg-2
ii  swi-prolog-x    7.4.2+dfsg-2

swi-prolog recommends no packages.

swi-prolog suggests no packages.

-- no debconf information