Re: libc recently more aggressive about pthread locks in stable ?

2016-11-13 Thread Lucas Nussbaum
On 12/11/16 at 18:51 -0200, Henrique de Moraes Holschuh wrote:
> Lucas,
> 
> Thanks for trying a build run with TSX enabled.
> 
> On Sat, 12 Nov 2016, Lucas Nussbaum wrote:
> > I did an archive rebuild on Amazon EC2 using m4.16xlarge instances, that
> > use a CPU with TSX enabled.
> 
> What microcode revision is that Xeon E5-2686 running?

microcode: CPU0 sig=0x406f1, pf=0x1, revision=0xb14

(That's just on one node. I'm assuming that all nodes had the same
microcode revision, which is probably a reasonable bet)

Lucas



Re: libc recently more aggressive about pthread locks in stable ?

2016-11-12 Thread Henrique de Moraes Holschuh
Lucas,

Thanks for trying a build run with TSX enabled.

On Sat, 12 Nov 2016, Lucas Nussbaum wrote:
> I did an archive rebuild on Amazon EC2 using m4.16xlarge instances, that
> use a CPU with TSX enabled.

What microcode revision is that Xeon E5-2686 running?

> I've filed bugs for the packages that failed during that rebuild, but
> don't fail on m4.large instances:
> https://bugs.debian.org/cgi-bin/pkgreport.cgi?tag=qa-ftbfs-2016;users=debian...@lists.debian.org

We still need that instrumented libc if one is to test applications,
though, as most packages have little in the way of automated regression
test suites.  And people need to test the packages (using the
applications) with such an instrumented libc installed (or running on a
box with TSX active).

-- 
  Henrique Holschuh



Re: libc recently more aggressive about pthread locks in stable ?

2016-11-12 Thread Lucas Nussbaum
On 07/11/16 at 21:52 +0100, Lucas Nussbaum wrote:
> Hi,
> 
> On 06/11/16 at 17:41 -0200, Henrique de Moraes Holschuh wrote:
> > On Sun, 06 Nov 2016, Ben Hutchings wrote:
> > > It's worth noting that TSX is broken in 'Haswell' processors and is
> > > supposed to be disabled via a microcode update.  I don't know whether
> > > glibc avoids using it on these processors if the microcode update is
> > > not applied.  (Linux doesn't appear to hide the feature flags.)
> > 
> > It does avoid it.  For glibc libpthreads, Debian has blacklisted Intel
> > TSX use [in libpthreads] on all of Haswell and much of Broadwell.
> > 
> > But anything else *will* attempt to use it, people query cpuid directly
> > for these things.  You need a hypervisor that filters cpuid().
> 
> How can one know what glibc does on a given CPU? (preferably without
> access to the hardware)
> 
> I could try to run an archive rebuild on hardware where glibc leverages
> TSX to see what happens.

I did an archive rebuild on Amazon EC2 using m4.16xlarge instances, which
use a CPU with TSX enabled.

I've filed bugs for the packages that failed during that rebuild, but
don't fail on m4.large instances:
https://bugs.debian.org/cgi-bin/pkgreport.cgi?tag=qa-ftbfs-2016;users=debian...@lists.debian.org

It's not impossible that some of them are caused by problems with
building in parallel, unrelated to TSX.

L.



Re: libc recently more aggressive about pthread locks in stable ?

2016-11-08 Thread Henrique de Moraes Holschuh
On Mon, 07 Nov 2016, Lucas Nussbaum wrote:
> On 06/11/16 at 17:41 -0200, Henrique de Moraes Holschuh wrote:
> > On Sun, 06 Nov 2016, Ben Hutchings wrote:
> > > It's worth noting that TSX is broken in 'Haswell' processors and is
> > > supposed to be disabled via a microcode update.  I don't know whether
> > > glibc avoids using it on these processors if the microcode update is
> > > not applied.  (Linux doesn't appear to hide the feature flags.)
> > 
> > It does avoid it.  For glibc libpthreads, Debian has blacklisted Intel
> > TSX use [in libpthreads] on all of Haswell and much of Broadwell.
> > 
> > But anything else *will* attempt to use it, people query cpuid directly
> > for these things.  You need a hypervisor that filters cpuid().
> 
> How can one know what glibc does on a given CPU? (preferably without
> access to the hardware)
> 
> I could try to run an archive rebuild on hardware where glibc leverages
> TSX to see what happens.

IMHO it would be better to instrument the locks in glibc with asserts,
instead.  You could use anything to test for pthread API violations,
then.

That said, if you are going to test Intel TSX for real, you need a
Desktop Skylake-based Core i5/i7 or Xeon E3v5 that reports "RTM" in
/proc/cpuinfo.  Some won't.

Not every Skylake model will have it enabled in the first place, and
apparently the firmware can (and some _do_) disable it, especially on
the mobile side.

Please ensure the Skylake firmware has microcode 0x9d/0x9e or later, or
install the latest version of the non-free intel-microcode package.  The
risk of unpredictable behaviour is quite real otherwise, and could mess
up the test results (and corrupt data).

Skylake errata are a nightmare. Note the AVX, AVX2, eDRAM (L4?), and TSX
ones, as well as the power-management ones:

http://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/xeon-e3-1200v5-spec-update.pdf
http://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/desktop-6th-gen-core-family-spec-update.pdf

Don't attempt to test TSX with perf or Intel PT running.  perf is likely
to cause too many aborts, and Intel PT is an errata hell.

As for Broadwell, I don't know which processors would still have TSX
enabled in the first place when running the latest microcode, and we
blacklist most of them in glibc anyway (because almost all Broadwell-*
specification updates list it as either unavailable or unusable), so
they're not a very viable option to test this.

-- 
  Henrique Holschuh



Re: libc recently more aggressive about pthread locks in stable ?

2016-11-08 Thread Lucas Nussbaum
On 07/11/16 at 21:52 +0100, Lucas Nussbaum wrote:
> Hi,
> 
> On 06/11/16 at 17:41 -0200, Henrique de Moraes Holschuh wrote:
> > On Sun, 06 Nov 2016, Ben Hutchings wrote:
> > > It's worth noting that TSX is broken in 'Haswell' processors and is
> > > supposed to be disabled via a microcode update.  I don't know whether
> > > glibc avoids using it on these processors if the microcode update is
> > > not applied.  (Linux doesn't appear to hide the feature flags.)
> > 
> > It does avoid it.  For glibc libpthreads, Debian has blacklisted Intel
> > TSX use [in libpthreads] on all of Haswell and much of Broadwell.
> > 
> > But anything else *will* attempt to use it, people query cpuid directly
> > for these things.  You need a hypervisor that filters cpuid().
> 
> How can one know what glibc does on a given CPU? (preferably without
> access to the hardware)

Answering myself, the relevant patch is
https://sources.debian.net/src/glibc/2.24-5/debian/patches/amd64/local-blacklist-for-Intel-TSX.diff/

Lucas



Re: libc recently more aggressive about pthread locks in stable ?

2016-11-07 Thread Lucas Nussbaum
Hi,

On 06/11/16 at 17:41 -0200, Henrique de Moraes Holschuh wrote:
> On Sun, 06 Nov 2016, Ben Hutchings wrote:
> > It's worth noting that TSX is broken in 'Haswell' processors and is
> > supposed to be disabled via a microcode update.  I don't know whether
> > glibc avoids using it on these processors if the microcode update is
> > not applied.  (Linux doesn't appear to hide the feature flags.)
> 
> It does avoid it.  For glibc libpthreads, Debian has blacklisted Intel
> TSX use [in libpthreads] on all of Haswell and much of Broadwell.
> 
> But anything else *will* attempt to use it, people query cpuid directly
> for these things.  You need a hypervisor that filters cpuid().

How can one know what glibc does on a given CPU? (preferably without
access to the hardware)

I could try to run an archive rebuild on hardware where glibc leverages
TSX to see what happens.

Lucas



Re: libc recently more aggressive about pthread locks in stable ?

2016-11-06 Thread Henrique de Moraes Holschuh
On Sun, 06 Nov 2016, Adrian Bunk wrote:
> On Sun, Nov 06, 2016 at 05:41:34PM -0200, Henrique de Moraes Holschuh wrote:
> > On Sun, 06 Nov 2016, Ben Hutchings wrote:
> > > It's worth noting that TSX is broken in 'Haswell' processors and is
> > > supposed to be disabled via a microcode update.  I don't know whether
> > > glibc avoids using it on these processors if the microcode update is
> > > not applied.  (Linux doesn't appear to hide the feature flags.)
> > 
> > It does avoid it.  For glibc libpthreads, Debian has blacklisted Intel
> > TSX use [in libpthreads] on all of Haswell and much of Broadwell.
> > 
> > But anything else *will* attempt to use it, people query cpuid directly
> > for these things.  You need a hypervisor that filters cpuid().
> 
> All users who are using intel-microcode from non-free instead of running 
> outdated microcode with known errata should be OK here?

Last time I checked, it looked like a yes for Skylake as far as Intel
TSX is concerned.

I don't know about the other processors, such as Broadwell-E.

-- 
  Henrique Holschuh



Re: libc recently more aggressive about pthread locks in stable ?

2016-11-06 Thread Aurelien Jarno
On 2016-11-06 01:12, Henrique de Moraes Holschuh wrote:
> On Sat, 05 Nov 2016, Ian Jackson wrote:
> > Looking at the code, I think that gs in jessie is plainly violating
> > the rules about the use of pthread locks.  On my partner's machine,
> 
> Per logs from message #15 on bug #842796:
> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=842796#15
> 
> SIGSEGV on __lll_unlock_elision is a signature (IME with very high
> confidence) of an attempt to unlock an already unlocked lock while
> running under hardware lock elision.
> 
> 
> Well, unlocking an already unlocked lock is a pthreads API rule
> violation, and it is going to crash the process on something that
> implements hardware lock elision.
> 
> These would be Intel x86 processors with TSX enabled[1] for Debian
> 8/jessie.  For Debian 9/stretch and for unstable, I believe it also
> includes IBM Power8, and s390x systems -- AFAIK they won't forgive an
> attempt to unlock an unlocked lock any more than Intel TSX does.
> 
> [1] Broadwell-E, Skylake, and later processors, as well as Xeon *v5
> processors.  I am not sure if we blacklisted any of the Xeon *v4
> or not, and too tired to look their model numbers up right now.
> 
> Unfortunately, when hardware lock elision support was added to glibc
> upstream, libpthreads was *not* changed to properly assert() this
> forbidden condition on the non-hardware-elision codepaths.  Such an
> assert() would have given us consistent behavior, thus flushing the bugs
> out in the open... at the cost of a performance hit (I have no idea how
> severe), and much screaming.

This has not been done as it would have a severe performance hit. That
said, error-checking mutexes also exist in glibc, and have been designed
exactly for that, i.e. they trade performance for correctness.

> To be fair: it is likely nobody upstream had any idea of just how much
> code got libpthreads usage wrong... and we certainly didn't know better
> in Debian, either.  Well, now we're going to find out :-(
> 
> BTW, AFAIK libpthreads still doesn't have any such assert(), so there's
> likely a lot of such buggy code in unstable still.  This is going to
> cause trouble for Debian stretch, too.

I don't expect it to be worse than jessie, actually probably better as
some of the bugs have been fixed by the various upstreams in the
meantime. Also remember that TSX is just making the bug more visible. It
means that users without TSX might experience hangs instead. There are
actually two "hang bugs" reported against ghostscript that could be
fixed by fixing the TSX bug.

[...]

> If the problem is too widespread and too hard to fix on a large number
> of packages, I suppose we could ask the glibc maintainers to consider
> disabling hardware lock elision support in stable through a stable
> update.
> 
> Such a change to glibc would likely requires some patches to ensure it
> *really* disabled Intel TSX opcode/instruction insertion, but I think we
> already ship all of them as part of the Intel TSX blacklist.  The result
> would need real-world testing on an up-to-date Skylake box as well as
> objdump inspection to ensure *no* TSX-related instructions leaked into
> the binaries.

We can disable lock elision by not passing "--enable-lock-elision" to
configure. There is no risk of the instructions leaking into other
binaries, except of course for static binaries. That said, so far we are
talking about a few packages only. A lot of bugs have already been fixed
during the jessie release cycle; I remember sending patches for that.

> And what should we do about Debian stretch, then?

As said above, disabling TSX in glibc just hides issues from users. We
should instead try to detect as many bugs as possible (possibly fixing
the corresponding bugs in jessie). One way would be to get a box with
TSX instructions and use it for the reproducible builds and/or the
autopkgtests.

Aurelien

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net



Re: libc recently more aggressive about pthread locks in stable ?

2016-11-06 Thread Adrian Bunk
On Sun, Nov 06, 2016 at 05:41:34PM -0200, Henrique de Moraes Holschuh wrote:
> On Sun, 06 Nov 2016, Ben Hutchings wrote:
> > It's worth noting that TSX is broken in 'Haswell' processors and is
> > supposed to be disabled via a microcode update.  I don't know whether
> > glibc avoids using it on these processors if the microcode update is
> > not applied.  (Linux doesn't appear to hide the feature flags.)
> 
> It does avoid it.  For glibc libpthreads, Debian has blacklisted Intel
> TSX use [in libpthreads] on all of Haswell and much of Broadwell.
> 
> But anything else *will* attempt to use it, people query cpuid directly
> for these things.  You need a hypervisor that filters cpuid().

All users who are using intel-microcode from non-free instead of running 
outdated microcode with known errata should be OK here?

Running outdated microcode is a bad idea, and no one is making
Debian-specific workarounds for all the other CPU errata.

cu
Adrian

-- 

   "Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
   "Only a promise," Lao Er said.
   Pearl S. Buck - Dragon Seed



Re: libc recently more aggressive about pthread locks in stable ?

2016-11-06 Thread Ian Jackson
Henrique de Moraes Holschuh writes ("Re: libc recently more aggressive about 
pthread locks in stable ?"):
> Per logs from message #15 on bug #842796:
> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=842796#15
> 
> SIGSEGV on __lll_unlock_elision is a signature (IME with very high
> confidence) of an attempt to unlock an already unlocked lock while
> running under hardware lock elision.

I don't know anything about hardware lock elision...

> Well, unlocking an already unlocked lock is a pthreads API rule
> violation, and it is going to crash the process on something that
> implements hardware lock elision.

... but you are of course correct about this.  I debugged the problem
with ghostscript, and it was indeed violating the pthreads rules.  I
have filed #843324 with a patch for Debian to backport the
corresponding upstream fix.  I don't understand the wider logic in
ghostscript; the bug was in the colour space management code and
occurred when a function was called with two pointer arguments which
were actually aliases of the same colourspace-related data structure.
Converting ghostscript to use recursive mutexes was IMO clearly
correct and fixed the bug.

> If the problem is too widespread and too hard to fix on a large number
> of packages, I suppose we could ask the glibc maintainers to consider
> disabling hardware lock elision support in stable through a stable
> update.

I think this would be a good idea.

ogg123 and ghostscript are hardly obscure programs.  It's difficult to
know how bad this problem is, but we would like stable to be useful
even on recent hardware.

> And what should we do about Debian stretch, then?

Perhaps we could add the assert you suggest, on non-lock-elision
hardware.  Whether to do that would depend on its performance impact.

TBH I wonder whether we really want to be giving an evidently shonky
codebase boobytrapped mutexes by default.  We could change the default
mutex type to recursive and make all of these bugs go away.

Ian.

-- 
Ian Jackson <ijack...@chiark.greenend.org.uk>   These opinions are my own.

If I emailed you from an address @fyvzl.net or @evade.org.uk, that is
a private address which bypasses my fierce spamfilter.



Re: libc recently more aggressive about pthread locks in stable ?

2016-11-06 Thread Henrique de Moraes Holschuh
On Sun, 06 Nov 2016, Ben Hutchings wrote:
> It's worth noting that TSX is broken in 'Haswell' processors and is
> supposed to be disabled via a microcode update.  I don't know whether
> glibc avoids using it on these processors if the microcode update is
> not applied.  (Linux doesn't appear to hide the feature flags.)

It does avoid it.  For glibc libpthreads, Debian has blacklisted Intel
TSX use [in libpthreads] on all of Haswell and much of Broadwell.

But anything else *will* attempt to use it, people query cpuid directly
for these things.  You need a hypervisor that filters cpuid().

-- 
  Henrique Holschuh



Re: libc recently more aggressive about pthread locks in stable ?

2016-11-06 Thread Jeff Epler
[resending with correct Cc:]

I believe that similar bugs have been afflicting the hurd and kfreebsd Debian ports
for some time.  In retrospect, it's too bad these reports weren't given more
attention, because it could have made things better for Linux platforms as well.
:-/

see e.g.,
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=671785#48

Jeff



Re: libc recently more aggressive about pthread locks in stable ?

2016-11-06 Thread Ben Hutchings
On Sat, 2016-11-05 at 20:32 +0100, Christian Seiler wrote:
> On 11/05/2016 08:13 PM, Ian Jackson wrote:
> > I have just been debugging a ghostscript segfault on jessie amd64.
> > 
> > Looking at the code, I think that gs in jessie is plainly violating
> > the rules about the use of pthread locks.  On my partner's machine,
> > this makes it segfault on termination (with some input files, at
> > least).  On my machine it works just fine.  The code in sid is better.
> > 
> > I recently encountered what seems to be a similar bug in ogg123 in
> > stable.  #842796.
> > 
> > Has something changed in jessie's libc recently ?  I find it difficult
> > to imagine that these bugs would have been missed earlier during the
> > life of jessie.
> 
> Recently Frank Fegert discovered a problem with locking in open-iscsi
> that only occurs on new hardware. The code previously was wrong, but
> earlier CPUs were more forgiving when it came to this error and it
> couldn't be triggered.
> 
> Frank wrote about the problem in his blog in great detail:
> http://www.bityard.org/blog/2016/08/05/debugging_segfaults_open-iscsi_iscsiuio_intel_broadwell
[...]

This is not really a case of older CPUs being 'more forgiving'; they
had no locking operations[*] and nothing to forgive.  However, glibc
uses transactional memory (TSX) on the newer CPUs that implement it,
and that new code does result in the CPU detecting some locking errors.

It's worth noting that TSX is broken in 'Haswell' processors and is
supposed to be disabled via a microcode update.  I don't know whether
glibc avoids using it on these processors if the microcode update is
not applied.  (Linux doesn't appear to hide the feature flags.)

* The LOCK prefix is for 'bus locking' during a single instruction,
i.e. making it atomic.  The CPU can't know what higher-level operation
it's being used for.

Ben.

-- 
Ben Hutchings
The world is coming to an end.  Please log off.



Re: libc recently more aggressive about pthread locks in stable ?

2016-11-05 Thread Henrique de Moraes Holschuh
On Sat, 05 Nov 2016, Ian Jackson wrote:
> Looking at the code, I think that gs in jessie is plainly violating
> the rules about the use of pthread locks.  On my partner's machine,

Per logs from message #15 on bug #842796:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=842796#15

SIGSEGV on __lll_unlock_elision is a signature (IME with very high
confidence) of an attempt to unlock an already unlocked lock while
running under hardware lock elision.


Well, unlocking an already unlocked lock is a pthreads API rule
violation, and it is going to crash the process on something that
implements hardware lock elision.

These would be Intel x86 processors with TSX enabled[1] for Debian
8/jessie.  For Debian 9/stretch and for unstable, I believe it also
includes IBM Power8, and s390x systems -- AFAIK they won't forgive an
attempt to unlock an unlocked lock any more than Intel TSX does.

[1] Broadwell-E, Skylake, and later processors, as well as Xeon *v5
processors.  I am not sure if we blacklisted any of the Xeon *v4
or not, and too tired to look their model numbers up right now.

Unfortunately, when hardware lock elision support was added to glibc
upstream, libpthreads was *not* changed to properly assert() this
forbidden condition on the non-hardware-elision codepaths.  Such an
assert() would have given us consistent behavior, thus flushing the bugs
out in the open... at the cost of a performance hit (I have no idea how
severe), and much screaming.

To be fair: it is likely nobody upstream had any idea of just how much
code got libpthreads usage wrong... and we certainly didn't know better
in Debian, either.  Well, now we're going to find out :-(

BTW, AFAIK libpthreads still doesn't have any such assert(), so there's
likely a lot of such buggy code in unstable still.  This is going to
cause trouble for Debian stretch, too.

> Has something changed in jessie's libc recently ?  I find it difficult
> to imagine that these bugs would have been missed earlier during the
> life of jessie.

The required hardware was not widely available at the time, the
knowledge of how hardware lock elision would really behave was sparse
outside of Intel and IBM -- so people either didn't know, or did not
grasp the importance of the fact that the hardware would be utterly
intolerant to something that the old code was too lenient about -- and
libpthreads was not instrumented to compensate for that.

I actually recommended that it would be safer to disable lock elision
for jessie[2]: the sharp corners nature of the code in glibc 2.19 scared
me, as well as just how messed up the implementation on Intel processors
was at the time.  Unfortunately, I didn't push for it at all: I didn't
know how correct I was at the time[3].

[2] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=762195#50

The hard truth is that nobody in Debian knew how deep those murky waters
were at the time[3], and I don't think glibc upstream developers did
either.  So, we limited ourselves in Debian to blacklisting the
processors where Intel (either for sure, or highly likely) screwed it up
beyond repair.

[3] A number of subtle Intel TSX errata were fixed by Skylake and
Broadwell microcode updates, and the latest ones are quite recent.
The until-then latent (or subtle) broken locking bugs in
applications/libs are becoming frequent crashers as more users get
newer computers, etc.

Anyway, any library or application that hits this issue has broken
locking, plain and simple.

A package crashing from this issue very likely requires a stable update
to fix the locking (which won't always be a trivial fix, either), even
if we changed libpthreads to disable lock elision support and it stopped
the crashes -- even if it wouldn't crash anymore, the locking would
still be broken and therefore suspect of not being as effective as it
would have to be to ensure correct operation at all times.

> I will try to make a patch to fix ghostscript, or at least file a
> proper bug.  But, if there was a libc change, would it be possible to
> revert it or make some kind of workaround ?

If the problem is too widespread and too hard to fix on a large number
of packages, I suppose we could ask the glibc maintainers to consider
disabling hardware lock elision support in stable through a stable
update.

Such a change to glibc would likely require some patches to ensure it
*really* disabled Intel TSX opcode/instruction insertion, but I think we
already ship all of them as part of the Intel TSX blacklist.  The result
would need real-world testing on an up-to-date Skylake box as well as
objdump inspection to ensure *no* TSX-related instructions leaked into
the binaries.

And what should we do about Debian stretch, then?

Some references:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=824191
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=800574
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=762195

-- 
  Henrique Holschuh



Re: libc recently more aggressive about pthread locks in stable ?

2016-11-05 Thread Aurelien Jarno
On 2016-11-05 19:13, Ian Jackson wrote:
> I have just been debugging a ghostscript segfault on jessie amd64.
> 
> Looking at the code, I think that gs in jessie is plainly violating
> the rules about the use of pthread locks.  On my partner's machine,
> this makes it segfault on termination (with some input files, at
> least).  On my machine it works just fine.  The code in sid is better.
> 
> I recently encountered what seems to be a similar bug in ogg123 in
> stable.  #842796.
> 
> Has something changed in jessie's libc recently ?  I find it difficult
> to imagine that these bugs would have been missed earlier during the
> life of jessie.

I think you just got a new machine with a CPU supporting the TSX
instructions, which are more picky about following the pthreads
semantics.

Unfortunately, given Intel's fuck-up on the TSX implementation in Haswell
and some Broadwell CPUs, they had to disable TSX instructions through
firmware updates, which in turn means we haven't got all packages in
Jessie tested by a wide set of people.

Aurelien

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net



Re: libc recently more aggressive about pthread locks in stable ?

2016-11-05 Thread Ian Jackson
Ian Jackson writes ("libc recently more aggressive about pthread locks in 
stable ?"):
> I have just been debugging a ghostscript segfault on jessie amd64.
...
> I recently encountered what seems to be a similar bug in ogg123 in
> stable.  #842796.
> 
> Has something changed in jessie's libc recently ?  I find it difficult
> to imagine that these bugs would have been missed earlier during the
> life of jessie.
> 
> I will try to make a patch to fix ghostscript, or at least file a
> proper bug.  But, if there was a libc change, would it be possible to
> revert it or make some kind of workaround ?

FYI, the ghostscript bug, with patch for jessie, is #843324.
sid's ghostscript is fine and I think stretch's is too.

Ian.

-- 
Ian Jackson    These opinions are my own.

If I emailed you from an address @fyvzl.net or @evade.org.uk, that is
a private address which bypasses my fierce spamfilter.



Re: libc recently more aggressive about pthread locks in stable ?

2016-11-05 Thread Christian Seiler
On 11/05/2016 08:13 PM, Ian Jackson wrote:
> I have just been debugging a ghostscript segfault on jessie amd64.
> 
> Looking at the code, I think that gs in jessie is plainly violating
> the rules about the use of pthread locks.  On my partner's machine,
> this makes it segfault on termination (with some input files, at
> least).  On my machine it works just fine.  The code in sid is better.
> 
> I recently encountered what seems to be a similar bug in ogg123 in
> stable.  #842796.
> 
> Has something changed in jessie's libc recently ?  I find it difficult
> to imagine that these bugs would have been missed earlier during the
> life of jessie.

Recently Frank Fegert discovered a problem with locking in open-iscsi
that only occurs on new hardware. The code previously was wrong, but
earlier CPUs were more forgiving when it came to this error and it
couldn't be triggered.

Frank wrote about the problem in his blog in great detail:
http://www.bityard.org/blog/2016/08/05/debugging_segfaults_open-iscsi_iscsiuio_intel_broadwell

I haven't looked in detail at your problem, but I could easily
imagine that the problem you're experiencing with other packages is
similar, especially since you mentioned migrating to new hardware.

Hope that helps.

Regards,
Christian

PS: In case someone was wondering: the specific problem with
open-iscsi is now fixed in sid, testing and jessie-backports; jessie
is not affected because we didn't yet build the component with the
issue there.