Re: Handling s390 libc ABI change in Debian

2014-07-15 Thread Carlos O'Donell
On Tue, Jul 15, 2014 at 1:18 AM, Aurelien Jarno aure...@debian.org wrote:
 On Mon, Jul 14, 2014 at 11:14:42PM -0400, Carlos O'Donell wrote:
 On Mon, Jul 14, 2014 at 4:36 PM, Aurelien Jarno aure...@debian.org wrote:
  glibc 2.19 has changed the libc ABI on s390, more specifically the
  setjmp/longjmp functions [1] [2]. Symbol versioning is used to handle
  some cases, but it doesn't work when a jmp_buf variable is embedded
  into a structure, as it changes the size of the structure. The result
  is that mixing programs or libraries built with 2.18 with ones built
  with 2.19 does not work anymore; usually they end up with a segmentation
  fault. Some people on this list have experienced that with perl.
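
To make the embedded-jmp_buf case concrete, here is a minimal sketch in C. The struct and function names are hypothetical (this is not perl's or libpng's actual layout); the only point is that the size of the struct and the offset of any member placed after the jmp_buf are fixed at compile time from whatever sizeof(jmp_buf) the libc headers provided.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical public struct of some library: a jmp_buf embedded by value. */
struct interp_state {
    int depth;
    jmp_buf on_error;   /* size comes from the libc headers used at build time */
    int error_code;     /* its offset therefore depends on sizeof(jmp_buf) */
};

/* Holds for any libc: the member after the jmp_buf lies beyond it. */
static_assert(offsetof(struct interp_state, error_code) >= sizeof(jmp_buf),
              "error_code lies beyond the embedded jmp_buf");

/* Layout facts that end up hard-coded in every object using the struct. */
size_t interp_state_size(void)
{
    return sizeof(struct interp_state);
}

size_t interp_error_code_offset(void)
{
    return offsetof(struct interp_state, error_code);
}

/* Print the layout this translation unit was compiled against. */
void print_layout(void)
{
    printf("sizeof(jmp_buf)=%zu sizeof(struct)=%zu offsetof(error_code)=%zu\n",
           sizeof(jmp_buf), interp_state_size(), interp_error_code_offset());
}
```

If a library compiled against the old headers and a program compiled against the new ones both touch such a struct, they disagree on where error_code lives and how much memory to allocate. Symbol versioning of setjmp/longjmp cannot repair that, because the disagreement is in the callers' own code.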

 That is not true. This is an overgeneralization of the problem. You
 can use libraries built with 2.18 and 2.19 and they work just fine.

 I agree I probably exaggerated a bit here, but the problem is real,
 breakages do happen, and some people on this mailing list have already
 experienced them.

 The extent of the problem in correct language is listed here:
 https://sourceware.org/glibc/wiki/Release/2.19#Packaging_Changes

 This seems to minimize the problem, listing only perl. In practice we
 have seen many more breakages, some of them due to the change of
 the __pthread_unwind_buf_t struct.

That is a change that nobody reported. You're the first to mention it
and that does make it more serious. We have discussed this upstream
and I agree that we need more versioning of the interfaces there to
support the change fully.

  We first thought it was limited to a few packages (even if all perl is
  already more than that), but as time goes on, more and more issues are
  found. libpng and gauche are also affected; the issue with mono is
  also likely due to this ABI change.

 That is new information, and it is important for distributions to
 relay this information back upstream where the decision for a SO bump
 can be made.

 I can follow up with a list of affected packages, but we are slowly
 discovering them one by one, so it might take time. So far we have:

 * Mixing modules/libraries built with pre-2.19 and 2.19 libc
 - perl
 - libpng

You can never support a mixed-ABI environment with versioning.

You must update all of those packages at once.

The best we could do is warn the user of the incompatibility at
runtime and refuse to load the module via dlopen, or refuse to start
the application at startup.
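
A rough sketch of what such a runtime check could look like, in C. Everything here is an assumption for illustration: glibc has no `abi_tag` mechanism, and the names `abi_tag`, `ABI_TAG_INIT`, `abi_tag_matches` and `load_module_checked` are invented. The idea is only that a module records sizeof(jmp_buf) when it is built, and the loader compares that against its own value before using the module.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical descriptor a module would export, filled in at build time. */
struct abi_tag {
    size_t jmp_buf_size;
};

/* Initializer a module would use so the value is captured by its compiler. */
#define ABI_TAG_INIT { sizeof(jmp_buf) }

/* Returns 1 if the module was built with the same jmp_buf size as the host. */
int abi_tag_matches(const struct abi_tag *tag)
{
    assert(tag != NULL);
    return tag->jmp_buf_size == sizeof(jmp_buf);
}

/* Returns 0 if the module may be used, -1 if it must be refused. */
int load_module_checked(const char *name, const struct abi_tag *tag)
{
    if (!abi_tag_matches(tag)) {
        fprintf(stderr,
                "%s: built with sizeof(jmp_buf)=%zu, host has %zu; refusing to load\n",
                name, tag->jmp_buf_size, sizeof(jmp_buf));
        return -1;
    }
    return 0;
}
```

In a real loader the tag would presumably be fetched from the module with dlsym() right after dlopen(), and the handle closed again on mismatch. That part is left out here so the sketch stays self-contained.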

 * Using libc 2.19 without rebuilding anything:
 - gauche
 - mono

This we believe to be pthread issues.

  According to upstream [3], the problem is that Debian doesn't do a mass
  rebuild, which is the strategy chosen by Red Hat to handle^Wworkaround
  this issue. This means some programs might segfault during the upgrade,
  or on partially upgraded systems.

 I apologize if you took what I wrote to mean that. I did not mean it
 was Debian's problem, but rather that Debian suffered the most because
 they don't do rebuilds. The two are orthogonal. You face a situation
 that is unique to the framework used to build the distribution.

 Please engage upstream to champion a SO name bump for libc for

 I think that would be the correct solution. That said as it is not
 something trivial and thus not done often, it's an opportunity to push
 for more ABI changes if some others are envisaged in the future.

The problems are worse.

I just tried to simulate this on x86-64 and there are serious problems.

In most libraries you can load multiple different copies and it won't conflict.

Here libc.so.6 and libc.so.7 or libc.so.6.1 all conflict in the same
namespace and, worse, control aspects of the implementation like TLS. It
doesn't work to bump the SONAME.

We would have to implement a coordination framework amongst all the
SONAME bumped libc's for all of the basic functionality that had to
keep working. That would force future libcs to stay compatible
internally with other libcs and that would be very difficult to
maintain.

I am starting to think that a tooling option to fail to load mixed-ABI
objects is the only option, with user rebuilds happening after that.

  Now we have to choose a strategy for Debian. I see multiple options:
 
  1) Ignore the issue and just rebuild (binNMU) the packages that seem
  affected when we discover them. This means partial upgrades will likely
  be broken, and that we might discover some broken packages only after
  the jessie release.
 
  2) Rebuild (binNMU) all packages. This means partial upgrades will
  likely be broken.
 
  3) Bump the soname of affected packages and rebuild their reverse
  dependencies. It is the solution that is currently being implemented for
  perl. It clearly won't scale if more broken packages (and even for
  libpng) are discovered as it requires a source upload and a transition
  handled by the release team. It also means breaking the ABI compatibility
  with other distributions.
 
  4) Bump the libc soname to libc.so.6.1 and do a libc

Re: Current and upcoming toolchain changes for jessie

2013-06-19 Thread Carlos O'Donell
On Tue, Jun 18, 2013 at 8:02 PM, John David Anglin dave.ang...@bell.net wrote:
 Hi Aurelien,


 On 18-Jun-13, at 6:05 PM, Aurelien Jarno wrote:

 This is true that they have recently contacted me through another email
 address, but I haven't found time to work on that. Just stay tuned.


 That's great news.

 Helge and I have been working away as best we can to maintain the port.  I
 know everybody is busy and this is a significant effort.

... and I'm incredibly busy with upstream glibc, but I'm trying my
best to ensure hppa keeps building upstream glibc.

Cheers,
Carlos.


-- 
To UNSUBSCRIBE, email to debian-s390-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
Archive: 
http://lists.debian.org/CAE2sS1g+63BWgBfwV8pHzRAQrpzv2V_AhZ9AmPsm4Y-h8=g...@mail.gmail.com



Re: Bug#479952: libc6/s390 - __pthread_mutex_lock: Assertion `mutex->__data.__owner == 0' failed.

2008-10-27 Thread Carlos O'Donell
On Sat, Oct 25, 2008 at 1:21 PM, Julien Danjou [EMAIL PROTECTED] wrote:
 Is there anything from an outsider that could help?

I've seen this on-and-off again on the hppa-linux port. The issue has,
in my experience, been a compiler problem. My standard operating
procedure is to methodically add volatile to the atomic.h operations
until it goes away, and then work out the compiler mis-optimization.

The bug is almost always a situation where the lll_unlock is scheduled
before owner = 0, and the assert catches the race condition where you
unlock but have not yet cleared the owner.

$0.02.

Cheers,
Carlos.





Re: Bug#479952: libc6/s390 - __pthread_mutex_lock: Assertion `mutex->__data.__owner == 0' failed.

2008-10-27 Thread Carlos O'Donell
On Mon, Oct 27, 2008 at 10:05 AM, Andrew Haley [EMAIL PROTECTED] wrote:
 I've seen this on-and-off again on the hppa-linux port. The issue has,
 in my experience, been a compiler problem. My standard operating
 procedure is to methodically add volatile to the atomic.h operations
 until it goes away, and then work out the compiler mis-optimization.

 The bug is almost always a situation where the lll_unlock is scheduled
 before owner = 0, and the assert catches the race condition where you
 unlock but have not yet cleared the owner.

 Are you sure this is a compiler problem?  Unless you use explicit atomic
 memory accesses or volatile the compiler is supposed to re-order memory
 access.  Perhaps I'm misunderstanding you.

Sorry, parsing the above statement requires knowing something about
how lll_unlock is implemented in glibc.

The lll_unlock function is supposed to be a memory barrier.

The function is usually an explicit atomic operation, or a volatile
asm implementing the futex syscall i.e. INTERNAL_SYSCALL macro.
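
The ordering requirement can be sketched with a toy lock in C11. This is not glibc's code: `toy_mutex`, `toy_init`, `toy_lock` and `toy_unlock` are made-up names, and a spin on a C11 atomic stands in for the futex-based lll_lock/lll_unlock. The sketch only shows why the store that clears the owner has to stay ahead of the releasing store.

```c
#include <assert.h>
#include <stdatomic.h>

/* Toy lock for illustration only; not glibc's pthread_mutex_t. */
struct toy_mutex {
    atomic_int lock;   /* 0 = free, 1 = held */
    int owner;         /* plain field, protected by the lock */
};

void toy_init(struct toy_mutex *m)
{
    atomic_init(&m->lock, 0);
    m->owner = 0;
}

void toy_lock(struct toy_mutex *m, int tid)
{
    int expected = 0;
    /* Spin until 0 -> 1 succeeds; acquire pairs with the release in toy_unlock. */
    while (!atomic_compare_exchange_weak_explicit(&m->lock, &expected, 1,
                                                  memory_order_acquire,
                                                  memory_order_relaxed)) {
        expected = 0;  /* a failed exchange overwrote it with the current value */
    }
    /* Same shape as the failing assertion: a new owner must find owner == 0. */
    assert(m->owner == 0);
    m->owner = tid;
}

void toy_unlock(struct toy_mutex *m)
{
    /* Clear the owner first ... */
    m->owner = 0;
    /* ... then release. The release store may not be moved above the line before. */
    atomic_store_explicit(&m->lock, 0, memory_order_release);
}
```

If the compiler (or a barrier that is not really a barrier) let the store to `lock` happen before `owner = 0`, another thread could win the lock in that window and trip the assertion in toy_lock, which is the race described above.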

Cheers,
Carlos.





Re: Bug#479952: libc6/s390 - __pthread_mutex_lock: Assertion `mutex->__data.__owner == 0' failed.

2008-10-27 Thread Carlos O'Donell
On Mon, Oct 27, 2008 at 11:27 AM, Andrew Haley [EMAIL PROTECTED] wrote:
 I understand all that, but the question still stands: is the compiler
 really moving a memory write past a memory barrier?  ISTR we did have
 a discussion on gcc-list about that, but it was a while ago and should
 now be fixed.

This issue no longer affects the PA port, but I can't speak for s390.

The PA port is the only port for which I do regular gcc / glibc testing.

Cheers,
Carlos.





Re: a small C program to test xdm's /dev/mem reading on your architecture

2002-08-26 Thread Carlos O'Donell

Branden,

 The long story, for those interested:
 http://lists.debian.org/debian-x/2002/debian-x-200208/msg00091.html
 (and read the whole thread)
 The short story:
 I need people with root on machines of your given architecture to
 compile and run the attached C program.  It consists of code borrowed
 from xdm's genauth.c program.
 

Done. I've submitted the output for HPPA boxes running 32 and 64-bit
kernels. Looks like they pass without any problem. I'll pass on the 
test results for an older 715/50 box (the other boxes were an A500 and
C3K).

Secondly, I will open a big can of whoopass on anyone else I see
complaining about /dev/mem reading without providing a patch :)

Branden is merely asking you to _test_ something.

c.