Re: nature of GOT bugs (was Re: Please reenable GCJ on mips)

2005-10-09 Thread Nathanael Nerode
  * or a bug in ld.so -- inability to handle correctly specified multiple GOTs
  for more than 16k global symbols
Thiemo wrote:
 
 That (it shouldn't segfault), and/or potentially also a bug in ld which
 leads to failure for large MultiGOT binaries.

Rocking.  It looks like most people involved (Daniel, Andreas, ...) weren't 
aware that the main problem was ld.so segfaulting until very recently -- when 
I asked, nobody pointed to the dynamic linker until you did.  Thanks.

I'll try to report this to glibc bugzilla if nobody beats me to it (it clearly 
seems to be upstream, not Debian-specific); that will make a good place for 
coordinating information on this situation, and also something to point to 
next time someone like me asks "Why the *?% isn't gcj built on MIPS?".  Many 
eyes make light work and all that.

Again, thanks, and apologies to you, Matthias, and Daniel for my tone.  It 
gets very frustrating when nobody's written anything down, everyone points in 
different directions, and most of them are confused.  All better now.  :-)


-- 
To UNSUBSCRIBE, email to [EMAIL PROTECTED]
with a subject of unsubscribe. Trouble? Contact [EMAIL PROTECTED]



nature of GOT bugs (was Re: Please reenable GCJ on mips)

2005-10-08 Thread Nathanael Nerode
Thanks *very much* for your help explaining this mess.

Thiemo Seufer wrote:
   - A too large object file can overflow plain GOT. This is not only
 MIPS-specific, it affects several architectures' toolchains,
Right, it would affect any architecture which does silly things like having a 
16-bit limit for GOT indices.

 and 
 was exposed pre-sarge (IIRC most virulently on sparc) by a
 broken/deficient libtool which relinked things into a single huge
 object file.
 libtool was fixed, and the remaining cases (like a huge blob of
 generated C code for python bindings) learned to split the C
 source to some smaller pieces, which also helped link speed.
 For MIPS, and if the need arises, this could be worked around by
 using XGOT, but see below. The real fix would be a MultiGOT
 extension to the object format, which is possible in a downward
 compatible way but not implemented yet.
From your description, I take it this does not apply to shared libraries or 
executables, only to individual .o files?  So there is a MultiGOT extension 
to the specification for shared libraries and executables, but not for 
intermediate object files?  :-O  (Or... see below for my other hypothesis.)

- MultiGOT works fine, until the limit of 16k _dynamic_ symbols is 
 hit. An executable/library with larger exported GOT will build
 without warning but will cause ld.so to segfault. This is the main
 bug, and hard to debug (a statically built gdb may help here).
Okay.

Is this considered
* a bug in the MultiGOT specification -- no specification for how to handle 
more than 16k global symbols properly on MIPS
* or a bug in ld.so -- inability to handle correctly specified multiple GOTs 
for more than 16k global symbols
From your description I'm guessing this one is the case.  In which case it's a 
bug in *glibc* which isn't in glibc Bugzilla.  Which is understandable 
considering how new glibc Bugzilla is.

* Or is this actually an artifact of the first problem?  Perhaps MultiGOT uses 
trickery to allow symbols within a single executable or shared library to 
work -- because they aren't externally visible, they can use whatever 
convention ld sees fit -- but it can't be used at an interface boundary, 
because there's not actually a real specification for it.  In this case an 
actual MultiGOT extension to the executable/library format would solve the 
problem.  But wait -- that doesn't make sense.  *This* bug does not appear to 
hit anyone but MIPS.  That means that everyone else knows how to advertise 
more than 16k of exported symbols in a library.  (Or that there's something 
funny about the MIPS ABI which causes it to require the export of a lot more 
symbols than anyone else requires.)

  - XGOT and MultiGOT are mutually exclusive, because the MultiGOT
handling ignores XGOT relocations. This is arguably not a bug,
since MultiGOT is supposed to supersede XGOT.
It is arguably a bug, I guess, that MultiGOT does not in actual fact supersede 
XGOT, failing in significant cases where XGOT works.  :-P



Re: Please reenable GCJ on mips

2005-10-08 Thread Daniel Jacobowitz
On Fri, Oct 07, 2005 at 05:39:26PM +0200, Thiemo Seufer wrote:
 We have for MIPS:
 
   - The plain GOT mode, where a GOT has a maximum size of 2^16 bytes,
 with 16k symbols.
 
   - The XGOT mode, with unlimited (2^32 bytes) size, which increases
 code size by 15-20%, and reduces performance accordingly.
 
   - MultiGOT, which creates several GOTs in a single binary for the
 final link, but uses only one GOT for imports/exports. The object
 files still have only plain GOT.
 
 MultiGOT is supposed to be the current best solution, and XGOT is
 supposed to be obsoleted by it. Normally plain GOT is used, and the
 linker switches to MultiGOT in the final link if the GOT grows too
 big.

Yes.

   - MultiGOT works fine, until the limit of 16k _dynamic_ symbols is
 hit. An executable/library with larger exported GOT will build
 without warning but will cause ld.so to segfault. This is the main
 bug, and hard to debug (a statically built gdb may help here).
 This hits currently (at least) the gcj shared library runtime,
 the ghc executable, and libgklayout.so in mozilla*. A workaround
 involving XGOT is possible in some cases, and was done for the
 mozillae (and some others, grepping for -xgot in build logs seems
 to be the most reliable way to find them all). Dynamically linked
 executables/shared libraries with any of the different internal GOT
 models are freely mixable.

If you'll give me an explicit testcase, I will volunteer to debug this
for you; I have lots of practice debugging ld.so.  Is this really the
main bug at this point?  I.E. multigot binaries not working rather than
not linking?

-- 
Daniel Jacobowitz
CodeSourcery, LLC





Re: Please reenable GCJ on mips

2005-10-08 Thread Thiemo Seufer
Daniel Jacobowitz wrote:
[snip]
- MultiGOT works fine, until the limit of 16k _dynamic_ symbols is
  hit. An executable/library with larger exported GOT will build
  without warning but will cause ld.so to segfault. This is the main
  bug, and hard to debug (a statically built gdb may help here).
  This hits currently (at least) the gcj shared library runtime,
  the ghc executable, and libgklayout.so in mozilla*. A workaround
  involving XGOT is possible in some cases, and was done for the
  mozillae (and some others, grepping for -xgot in build logs seems
  to be the most reliable way to find them all). Dynamically linked
  executables/shared libraries with any of the different internal GOT
  models are freely mixable.
 
 If you'll give me an explicit testcase, I will volunteer to debug this
 for you; I have lots of practice debugging ld.so.

Unfortunately the testcase is mozilla's libgklayout.so, which isn't
exactly handy. I'll try to come up with something better in the next few days.

 Is this really the
 main bug at this point?  I.E. multigot binaries not working rather than
 not linking?

It was that way the last time I looked.


Thiemo





Re: Please reenable GCJ on mips

2005-10-08 Thread Daniel Jacobowitz
On Sat, Oct 08, 2005 at 09:11:05PM +0200, Thiemo Seufer wrote:
 Daniel Jacobowitz wrote:
 [snip]
 - MultiGOT works fine, until the limit of 16k _dynamic_ symbols is
   hit. An executable/library with larger exported GOT will build
   without warning but will cause ld.so to segfault. This is the main
   bug, and hard to debug (a statically built gdb may help here).
   This hits currently (at least) the gcj shared library runtime,
   the ghc executable, and libgklayout.so in mozilla*. A workaround
   involving XGOT is possible in some cases, and was done for the
   mozillae (and some others, grepping for -xgot in build logs seems
   to be the most reliable way to find them all). Dynamically linked
   executables/shared libraries with any of the different internal GOT
   models are freely mixable.
  
  If you'll give me an explicit testcase, I will volunteer to debug this
  for you; I have lots of practice debugging ld.so.
 
 Unfortunately the testcase is mozilla's libgklayout.so, which isn't
 exactly handy. I'll try to come up with something better in the next few days.

It'll do if you can tell me exactly how to reproduce the problem; I
never volunteered to look at this before because I didn't have a
Debian/MIPS setup, but now I do.

-- 
Daniel Jacobowitz
CodeSourcery, LLC





Re: Please reenable GCJ on mips

2005-10-08 Thread Daniel Jacobowitz
On Sat, Oct 08, 2005 at 03:18:23PM -0400, Nathanael Nerode wrote:
 What I keep hearing is that no one has reported the bug(s), and nobody except 
 Thiemo Seufer has even described it/them adequately.  This is a bug or bugs 
 which is not documented in the documentation or bug databases for glibc, 
 binutils, gcc, Debian, or anywhere else.  It's apparently a substantial and 
 reproducible bug which hits any library or executable with really large 
 numbers of exported symbols.  The GCC documentation suggests a fix (xgot) 
 which doesn't actually work.

No, Nathanael, this is not what you keep _hearing_.  It's what you keep
_saying_.  I'm aware that you seem to spend a lot of time listening to
yourself and I've gotten quite tired of hearing you repeat yourself.

It doesn't have a clear entry in any bugzilla because there's a lot of
confusion over various bits of (A) whether particular things are bugs,
or (B) whose bugs they are.  But the people who have encountered it,
which is not limited strictly to Thiemo obviously, are familiar with
the problem.

 Now, I understand this sort of stuff not being dealt with for a while.  But 
 the nature of the problem has supposedly been known for a year or more, and 
 so a little documentation of known limitations is really the least I'd 
 expect.

It's free software.  You're welcome to figure out the problem,
preferably with less insulting the reviewers, and submit a patch to the
documentation.

 m68k is known to be in a situation where serious toolchain bugs are not 
 reported upstream.  I thought previously that it was the only such 
 architecture.

I'm pretty sure this has been reported upstream.  It's not in the bug
tracking system upstream.  That's not the same thing.  Who do you think
would fix it?  Hint, probably me or Thiemo.  No one else has been
interested in working on this stuff in the past.

-- 
Daniel Jacobowitz
CodeSourcery, LLC





Re: Please reenable GCJ on mips

2005-10-08 Thread Nathanael Nerode
Daniel Jacobowitz wrote:
 It's a lot of work to fix and no one has done it.  That's not the same
 thing at all.

That's nice, but there's still a real problem unrelated to that.

An example of a relatively healthy bug which is a lot of work to fix and no 
one has done it is http://gcc.gnu.org/bugzilla/show_bug.cgi?id=6257, the 
problem of distinguishing between #include <cstdio> and #include <stdio.h> in 
C++ programs and getting the collection of included symbols correct for both 
cases.  There's a fairly substantial amount of information on the problems 
and attempted solutions.

Another example is http://gcc.gnu.org/bugzilla/show_bug.cgi?id=5911 -- the 
screwball way Ada is built in GCC -- where the work to fix it is totally 
straightforward, but very large and very tedious.

What I keep hearing is that no one has reported the bug(s), and nobody except 
Thiemo Seufer has even described it/them adequately.  This is a bug or bugs 
which is not documented in the documentation or bug databases for glibc, 
binutils, gcc, Debian, or anywhere else.  It's apparently a substantial and 
reproducible bug which hits any library or executable with really large 
numbers of exported symbols.  The GCC documentation suggests a fix (xgot) 
which doesn't actually work.

That is bad.
* Either ld or gcc (or both) should note in its documentation that xgot is 
incompatible with multigot.  Alternatively, there should be a bug report 
against ld because of this.  I haven't determined which is considered correct 
yet.

* The failure of multigot to support more than 16K exported symbols is a bug 
somewhere, but I'm still not clear whether it's an ABI limitation or a bug 
in the dynamic linker.  If the latter, it needs to be reported.  If the 
former, it needs to be documented.

Now, I understand this sort of stuff not being dealt with for a while.  But 
the nature of the problem has supposedly been known for a year or more, and 
so a little documentation of known limitations is really the least I'd 
expect.

m68k is known to be in a situation where serious toolchain bugs are not 
reported upstream.  I thought previously that it was the only such 
architecture.





Re: nature of GOT bugs (was Re: Please reenable GCJ on mips)

2005-10-08 Thread Thiemo Seufer
Nathanael Nerode wrote:
 Thanks *very much* for your help explaining this mess.
 
 Thiemo Seufer wrote:
- A too large object file can overflow plain GOT. This is not only
  MIPS-specific, it affects several architectures' toolchains,
 Right, it would affect any architecture which does silly things like having a 
 16-bit limit for GOT indices.
 
  and 
  was exposed pre-sarge (IIRC most virulently on sparc) by a
  broken/deficient libtool which relinked things into a single huge
  object file.
  libtool was fixed, and the remaining cases (like a huge blob of
  generated C code for python bindings) learned to split the C
  source to some smaller pieces, which also helped link speed.
  For MIPS, and if the need arises, this could be worked around by
  using XGOT, but see below. The real fix would be a MultiGOT
  extension to the object format, which is possible in a downward
  compatible way but not implemented yet.
 From your description, I take it this does not apply to shared libraries or 
 executables, only to individual .o files?  So there is a MultiGOT extension 
 to the specification for shared libraries and executables, but not for 
 intermediate object files?

Yes.

 :-O  (Or... see below for my other hypothesis.)

No need for hypotheses.

 - MultiGOT works fine, until the limit of 16k _dynamic_ symbols is 
  hit. An executable/library with larger exported GOT will build
  without warning but will cause ld.so to segfault. This is the main
  bug, and hard to debug (a statically built gdb may help here).
 Okay.
 
 Is this considered
 * a bug in the MultiGOT specification -- no specification for how to handle 
 more than 16k global symbols properly on MIPS

No.

 * or a bug in ld.so -- inability to handle correctly specified multiple GOTs 
 for more than 16k global symbols

That (it shouldn't segfault), and/or potentially also a bug in ld which
leads to failure for large MultiGOT binaries.

 From your description I'm guessing this one is the case.  In which case it's 
 a 
 bug in *glibc* which isn't in glibc Bugzilla.  Which is understandable 
 considering how new glibc Bugzilla is.
 
 * Or is this actually an artifact of the first problem?

No. It is limited to dynamic symbols.
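
Since the limit concerns dynamic symbols, one rough way to see whether a given binary is anywhere near it is to count the entries in its dynamic symbol table. The following is only a sketch in Python: it handles 32-bit ELF files only, the function name and the 16384 threshold are assumptions taken from the figures in this thread, and the count is an upper bound, because on MIPS only the dynamic symbols from index DT_MIPS_GOTSYM onwards get a global GOT entry.

```python
import struct
import sys

SHT_DYNSYM = 11          # ELF section type of the dynamic symbol table
PLAIN_GOT_ENTRIES = 16384  # assumed limit: 2^16 bytes / 4-byte entries


def count_dynamic_symbols(data: bytes) -> int:
    """Return the number of .dynsym entries in a 32-bit ELF image."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if data[4] != 1:
        raise ValueError("this sketch only handles 32-bit ELF")
    endian = "<" if data[5] == 1 else ">"   # EI_DATA: 1 = little, 2 = big
    (e_shoff,) = struct.unpack_from(endian + "I", data, 32)
    e_shentsize, e_shnum = struct.unpack_from(endian + "HH", data, 46)
    for index in range(e_shnum):
        # An ELF32 section header consists of ten 32-bit words.
        fields = struct.unpack_from(endian + "10I", data, e_shoff + index * e_shentsize)
        sh_type, sh_size, sh_entsize = fields[1], fields[5], fields[9]
        if sh_type == SHT_DYNSYM and sh_entsize:
            return sh_size // sh_entsize
    return 0  # no dynamic symbol table (e.g. a static binary)


if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1], "rb") as handle:
        count = count_dynamic_symbols(handle.read())
    verdict = "above" if count > PLAIN_GOT_ENTRIES else "within"
    print(f"{count} dynamic symbols, {verdict} the assumed 16k limit")
```

Run against a shared library, this prints the count and whether it exceeds the assumed threshold; it says nothing about whether ld.so will actually fail on that file.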


Thiemo





Re: Please reenable GCJ on mips

2005-10-07 Thread Wolfgang Baer

Hi Nathanael,

Nathanael Nerode wrote:

And for pkg-java-maintainers:
* Why was kaffe deliberately broken on mips and mipsel?  


It was never _deliberately_ broken on mips and mipsel. On mipsel jikes
went broken and as there is no gcj available we have no option to compile
kaffe there at the moment.

For mips there was one upload which switched to ecj as we have done
for arm to reenable this architecture. But this was reverted at once
in the next upload to jikes (as jikes on mips currently still works).

* If this was being done with the intention of removing kaffe on those 
architectures, why isn't there a bug against ftp.debian.org requesting the 
removal of the obsolete binaries?  For mipsel, at least, this is still 
needed.


You are right that this bug has not yet been filed. Done now.

Wolfgang





Re: Please reenable GCJ on mips

2005-10-07 Thread Andreas Barth
Hi,

* Nathanael Nerode ([EMAIL PROTECTED]) [051007 04:42]:
 Matthias Klose wrote:
   If
  you think that availability of compilers on some architectures
  should be a release criterion, please bring that up with the release
  team first.
 That's not at all what I think.
 
 I think that if there are known binutils bugs for your architecture, which 
 supposedly prevent the build of multiple packages --
 /either/ forwarding them upstream
 /or/ fixing them if they're Debian-specific
 /or/ closing them if they're bogus
 within a reasonable amount of time (less than a year)
 should be a requirement for a port to be considered.

Actually, there is one criterion missing: Does this bug really hurt us
bad (enough)? And my current answer to this is no, but of course, you
might want to persuade me. :)

My current understanding of this bug is that mips Inc. is working on a
new abi that will fix this (and other) issues way better than the
current xgot-vs-multigot-way allows us to do. There was some discussion
between Thiemo, someone from mips, Matthias Klose (doko) and me on the
porters meeting in Oldenburg, and I'm quite optimistic that it's on
a good way. However, making such a massive change is not something that
can be done very fast.

So, I think we can say that this bug is even forwarded to upstream, as
mips Inc is aware of it and working on a fix.


Cheers,
Andi





Re: Please reenable GCJ on mips

2005-10-07 Thread Nathanael Nerode

Andreas Barth wrote:

Actually, there is one criterion missing: Does this bug really hurt us
bad (enough)? And my current answer to this is no, but of course, you
might want to persuade me. :)

...


So, I think we can say that this bug is even forwarded to upstream, as
mips Inc is aware of it and working on a fix.


I begin to get the picture.

Apparently the MIPS ABI is just plain broken.  It contains some sort of 
impassable hard limit on relocation table size, breaking random packages at 
random times with no possible fix.  Nobody can fix this without changing the 
ABI.


Lovely.  Good grief, I would not want to support this architecture under 
those circumstances, but as long as it doesn't interfere with supporting 
other architectures, if you think you can do it, that's fine.


It seems to me that at a minimum, whenever this bug gets hit any fallout 
should be prevented from interfering
with any other architectures.  In other words, a GOT table overflow on MIPS 
should immediately mean ignoring MIPS for purposes of testing propagation of 
that package and all indirectly dependent packages.


It's a pity there isn't an automated way to do this (except to ignore MIPS 
for all testing progression).  Perhaps the MIPS porters could file the 
appropriate requests for removal of obsolete binaries promptly, or 
something.


What the kaffe maintainers did -- uploading a package which was deliberately 
unbuildable on mips, without requesting removal of old mips binaries, and 
without restricting the architecture list for kaffe -- was really bizarre 
and misleading, and I can't think of a single good reason to do it that way.


It would also be a kindness if the reason why GCJ was disabled on MIPS 
(MIPS ABI is broken) was listed clearly somewhere.






Re: Please reenable GCJ on mips

2005-10-07 Thread Thiemo Seufer
Nathanael Nerode wrote:
 Matthias Klose wrote:
   If
  you think that availability of compilers on some architectures
  should be a release criterion, please bring that up with the release
  team first.
 That's not at all what I think.
 
 I think that if there are known binutils bugs for your architecture, which 
 supposedly prevent the build of multiple packages --
 /either/ forwarding them upstream
 /or/ fixing them if they're Debian-specific
 /or/ closing them if they're bogus
 within a reasonable amount of time (less than a year)
 should be a requirement for a port to be considered.
 
 Does the release team agree or disagree?
 
 According to Thiemo Seufer, MIPS has failed this criterion.

You are mistaken (since I'm also upstream). I notice you seem to
triage pre-sarge bug reports, maybe you want to ask the participants
of the bugs discussion first before jumping to conclusions.

 He said that GCJ is not present and does not build due to an ld bug which 
 also 
 affected ghc (http://lists.debian.org/debian-gcc/2005/10/msg00051.html).  
 However, contrary to his claim, there are no bug reports filed regarding this 
 for ghc.  The only such bug I could find was 
 http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=274738.  This bug is *not* 
 reported upstream.  It has had no activity since November 2004.  According to 
 David Daney (http://lists.debian.org/debian-mips/2004/10/msg00016.html) and 
 indeed Matthias Klose 
 (http://lists.debian.org/debian-mips/2004/10/msg00020.html) it is 
 unreproducible.

I asked David at that time for which configuration he got actually
working large executables/libraries which don't segfault on startup.
I got no response.


Thiemo





Re: Please reenable GCJ on mips

2005-10-07 Thread Thiemo Seufer
Andreas Barth wrote:
 Hi,
 
 * Nathanael Nerode ([EMAIL PROTECTED]) [051007 04:42]:
  Matthias Klose wrote:
If
   you think that availability of compilers on some architectures
   should be a release criterion, please bring that up with the release
   team first.
  That's not at all what I think.
  
  I think that if there are known binutils bugs for your architecture, which 
  supposedly prevent the build of multiple packages --
  /either/ forwarding them upstream
  /or/ fixing them if they're Debian-specific
  /or/ closing them if they're bogus
  within a reasonable amount of time (less than a year)
  should be a requirement for a port to be considered.
 
 Actually, there is one criterion missing: Does this bug really hurt us
 bad (enough)? And my current answer to this is no, but of course, you
 might want to persuade me. :)
 
 My current understanding of this bug is that mips Inc. is working on a
 new abi that will fix this (and other) issues way better than the
 current xgot-vs-multigot-way allows us to do. There was some discussion
 between Thiemo, someone from mips, Matthias Klose (doko) and me on the
 porters meeting in Oldenburg, and I'm quite optimistic that it's on
 a good way. However, making such a massive change is not something that
 can be done very fast.

Actually, the new ABI isn't more than a plan by now, and its scope
and eventual implementation aren't decided upon yet. The bug we discuss
here may go away with an ABI change, but it surely is fixable in the
current one as well.


Thiemo





Re: Please reenable GCJ on mips

2005-10-07 Thread Thiemo Seufer
Nathanael Nerode wrote:
 Andreas Barth wrote:
 Actually, there is one criterion missing: Does this bug really hurt us
 bad (enough)? And my current answer to this is no, but of course, you
 might want to persuade me. :)
 ...
 
 So, I think we can say that this bug is even forwarded to upstream, as
 mips Inc is aware of it and working on a fix.
 
 I begin to get the picture.
 
 Apparently the MIPS ABI is just plain broken.  It contains some sort of 
 impassable hard limit on relocation table size, breaking random packages at 
 random times with no possible fix.  Nobody can fix this without changing 
 the ABI.

That's wrong.

 Lovely.  Good grief, I would not want to support this architecture under 
 those circumstances, but as long as it doesn't interfere with supporting 
 other architectures, if you think you can do it, that's fine.
 
 It seems to me that at a minimum, whenever this bug gets hit any fallout 
 should be prevented from interfering
 with any other architectures.  In other words, a GOT table overflow on MIPS 
 should immediately mean ignoring MIPS for purposes of testing propagation 
 of that package and all indirectly dependent packages.

Which is what happened before sarge by removing the affected packages
for mips/mipsel.


Thiemo





Re: Please reenable GCJ on mips

2005-10-07 Thread Daniel Jacobowitz
On Fri, Oct 07, 2005 at 05:16:28AM -0400, Nathanael Nerode wrote:
 I begin to get the picture.
 
 Apparently the MIPS ABI is just plain broken.  It contains some sort of 
 impassable hard limit on relocation table size, breaking random packages at 
 random times with no possible fix.  Nobody can fix this without changing 
 the ABI.
 
 Lovely.  Good grief, I would not want to support this architecture under 
 those circumstances, but as long as it doesn't interfere with supporting 
 other architectures, if you think you can do it, that's fine.

You don't get the picture.  In fact the above is completely wrong.  I
recall explaining this to you yesterday.

It's a lot of work to fix and no one has done it.  That's not the same
thing at all.

-- 
Daniel Jacobowitz
CodeSourcery, LLC





Re: Please reenable GCJ on mips

2005-10-07 Thread Nathanael Nerode
 Apparently the MIPS ABI is just plain broken.  It contains some sort of 
 impassable hard limit on relocation table size, breaking random packages at 
 random times with no possible fix.  Nobody can fix this without changing 
 the ABI.


Thiemo Seufer wrote:
That's wrong.

OK.  Can somebody *describe* the damned bug?  Is it a hard limit on GOT size 
which can't be exceeded (but breaks a predictable collection of packages)?  
Is it a bug in the way ld constructs the GOT (say, not subdividing it 
properly into multiple GOTs for 'multigot')?  Is it a bug in the way the gcj 
Makefile *uses* ld, preventing ld from having the right information to 
construct the GOT?  (If it's the latter, I can almost certainly fix it; I'm a 
configury maintainer upstream for GCC.)

Or is it a  mystical bug which nobody can actually describe which causes GOT 
overflow in mysterious cases for mysterious reasons not to be questioned by 
mortal men?  Because that's what it seems like, what with all these vague and 
meaningless comments I'm hearing.  Does anyone actually know what the problem 
is?





Re: Please reenable GCJ on mips

2005-10-07 Thread Thiemo Seufer
Nathanael Nerode wrote:
  Apparently the MIPS ABI is just plain broken.  It contains some sort of 
  impassable hard limit on relocation table size, breaking random packages 
  at 
  random times with no possible fix.  Nobody can fix this without changing 
  the ABI.
 
 
 Thiemo Seufer wrote:
 That's wrong.
 
 OK.  Can somebody *describe* the damned bug?  Is it a hard limit on GOT size 
 which can't be exceeded (but breaks a predictable collection of packages)?  
 Is it a bug in the way ld constructs the GOT (say, not subdividing it 
 properly into multiple GOTs for 'multigot')?  Is it a bug in the way the gcj 
 Makefile *uses* ld, preventing ld from having the right information to 
 construct the GOT?  (If it's the latter, I can almost certainly fix it; I'm a 
 configury maintainer upstream for GCC.)
 
 Or is it a  mystical bug which nobody can actually describe which causes GOT 
 overflow in mysterious cases for mysterious reasons not to be questioned by 
 mortal men?  Because that's what it seems like, what with all these vague and 
 meaningless comments I'm hearing.  Does anyone actually know what the problem 
 is?

As far as I debugged it before sarge, it is a collection of bugs, or
insufficiencies which may qualify as bugs depending on the context.
Btw, I haven't seen anyone mentioning this problem except for Debian
and Gentoo, it seems the embedded mips vendors have not much use for
insanely large binaries. Debian is pushing the envelope here.

We have for MIPS:

  - The plain GOT mode, where a GOT has a maximum size of 2^16 bytes,
    with 16k symbols.

  - The XGOT mode, with unlimited (2^32 bytes) size, which increases
    code size by 15-20%, and reduces performance accordingly.

  - MultiGOT, which creates several GOTs in a single binary for the
    final link, but uses only one GOT for imports/exports. The object
    files still have only plain GOT.

MultiGOT is supposed to be the current best solution, and XGOT is
supposed to be obsoleted by it. Normally plain GOT is used, and the
linker switches to MultiGOT in the final link if the GOT grows too
big.
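
The 16k figure for plain GOT follows from simple arithmetic. A minimal sketch in Python, assuming 4-byte GOT entries (32-bit pointers) and a 16-bit offset that reaches the whole table; the function name is made up for illustration:

```python
def plain_got_capacity(offset_bits: int = 16, entry_bytes: int = 4) -> int:
    """Number of GOT entries reachable with an offset of `offset_bits` bits."""
    table_bytes = 1 << offset_bits       # 2^16 = 65536 bytes
    return table_bytes // entry_bytes    # one pointer-sized slot per entry


if __name__ == "__main__":
    print(plain_got_capacity())          # 32-bit pointers
    print(plain_got_capacity(16, 8))     # 64-bit pointers, for comparison
```

With the defaults this gives 16384 entries, which is the "16k symbols" quoted above; with 8-byte entries the same table would hold half as many.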

The problems, as far as I know about them:

  - A too large object file can overflow plain GOT. This is not only
    MIPS-specific, it affects several architectures' toolchains, and
    was exposed pre-sarge (IIRC most virulently on sparc) by a
    broken/deficient libtool which relinked things into a single huge
    object file.
    libtool was fixed, and the remaining cases (like a huge blob of
    generated C code for Python bindings) learned to split the C
    source into smaller pieces, which also helped link speed.
    For MIPS, and if the need arises, this could be worked around by
    using XGOT, but see below. The real fix would be a MultiGOT
    extension to the object format, which is possible in a downward
    compatible way but not implemented yet.

  - MultiGOT works fine, until the limit of 16k _dynamic_ symbols is
    hit. An executable/library with larger exported GOT will build
    without warning but will cause ld.so to segfault. This is the main
    bug, and hard to debug (a statically built gdb may help here).
    This hits currently (at least) the gcj shared library runtime,
    the ghc executable, and libgklayout.so in mozilla*. A workaround
    involving XGOT is possible in some cases, and was done for the
    mozillae (and some others, grepping for -xgot in build logs seems
    to be the most reliable way to find them all). Dynamically linked
    executables/shared libraries with any of the different internal GOT
    models are freely mixable.

  - XGOT and MultiGOT are mutually exclusive, because the MultiGOT
    handling ignores XGOT relocations. This is arguably not a bug,
    since MultiGOT is supposed to supersede XGOT. The resulting binary
    will crash at the first XGOT relocation outside the plain GOT
    limit. The fact that the linker accepts XGOT while in MultiGOT
    mode is a bug. Worse, upstream binutils always invokes MultiGOT
    linking, even when all object files are XGOT.
    Fixing it requires the addition of an XGOT flag to the object file.
    Currently there is an XGOT ELF header flag, but it was used by SGI
    for an unknown purpose, so reusing it may not be advisable.
    Obsoleting XGOT altogether may be more advisable...
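
For the second problem, a test case smaller than libgklayout.so could perhaps be generated. The sketch below, in Python, writes out C sources that together define more than 16k exported functions, split across several files so that no single object overflows plain GOT (the first problem). All names are hypothetical, and whether a library built from these files actually triggers the ld.so segfault has not been verified here.

```python
from typing import Dict


def generate_sources(total: int, per_file: int, prefix: str = "sym") -> Dict[str, str]:
    """Map file name -> C source; `total` exported functions, `per_file` each."""
    if total <= 0 or per_file <= 0:
        raise ValueError("total and per_file must be positive")
    sources: Dict[str, str] = {}
    for start in range(0, total, per_file):
        end = min(start + per_file, total)
        index = start // per_file
        # One tiny exported function per symbol.
        lines = [f"int {prefix}_{i}(void) {{ return {i}; }}" for i in range(start, end)]
        # A caller that references every function defined in this file.
        lines.append("")
        lines.append(f"int {prefix}_sum_{index}(void)")
        lines.append("{")
        lines.append("    int total = 0;")
        lines.extend(f"    total += {prefix}_{i}();" for i in range(start, end))
        lines.append("    return total;")
        lines.append("}")
        sources[f"{prefix}_{index}.c"] = "\n".join(lines) + "\n"
    return sources


if __name__ == "__main__":
    # 20000 symbols in files of 4000 each: above 16k overall, below per object.
    for name, text in generate_sources(20000, 4000).items():
        with open(name, "w") as handle:
            handle.write(text)
        print(name)
```

The generated files could then be compiled with gcc -fPIC -c and linked with gcc -shared on a MIPS system, and the result loaded from a small test program.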

The partial XGOT workaround involves:

  - switching ld from MultiGOT to XGOT linking once it sees more than
16k exported symbols. This is, in the general picture, a step
backwards, and thus a Debian-specific binutils patch. It needs to
be removed once ld.so is fixed, and all -xgot users (see below)
will need to be changed at that time.

  - adding -xgot (or -Wa,-xgot) to the CFLAGS for objects which go in the
big binary. Make sure everything else has _no_ -xgot flag, since
a stray -xgot may break a MultiGOT link below the critical size. Make sure
the same is true for static library objects.
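
Since all -xgot users will have to be found again once ld.so is fixed, the "grep the build logs" approach mentioned above can be sketched as follows. This is only an illustration: the directory layout and the `*.log` naming are hypothetical, and the pattern covers just the two spellings named in this mail (-xgot and -Wa,-xgot).

```python
import re
from pathlib import Path

# Matches "-xgot" or "-Wa,-xgot" as a whole option, not as part of a
# longer word or option name.
XGOT_FLAG = re.compile(r"(?<![\w-])(?:-Wa,)?-xgot(?![\w-])")


def xgot_lines(log_text: str) -> list[tuple[int, str]]:
    """Return (line number, stripped line) for every line using -xgot."""
    hits = []
    for lineno, line in enumerate(log_text.splitlines(), start=1):
        if XGOT_FLAG.search(line):
            hits.append((lineno, line.strip()))
    return hits


def scan_logs(log_dir: str) -> dict[str, list[tuple[int, str]]]:
    """Scan every *.log file in log_dir (hypothetical layout) for -xgot."""
    results = {}
    for path in sorted(Path(log_dir).glob("*.log")):
        hits = xgot_lines(path.read_text(errors="replace"))
        if hits:
            results[path.name] = hits
    return results
```

Calling `scan_logs("buildlogs")` on a directory of saved build logs would return only the logs that mention the flag, together with the matching command lines.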

The workaround fails as soon as it needs to link in non-xgot files,

Please reenable GCJ on mips

2005-10-06 Thread Nathanael Nerode
I notice that GCJ (and company) are not built on mips or mipsel.
What I can't figure out is why.

GCJ is supported for mips*-*-linux* (except for mips64*-*-linux*, which
is not supported) upstream in the 4.0 series, and I couldn't find any
reported bugs on problems specific to it.

GCJ was originally disabled for mips way back when gcc 3.0 was uploaded in
2001 because libffi was not ported to mips.  It was ported to mips/mipsel
long ago (and is in fact supplied on mips in the current packages).

Matthias, could you please reenable building of GCJ et al on mips, unless you
know something I don't?

This would probably make kaffe work on mipsel automatically as well (and
allow it to work with ecj on mips).  Cc:ing pkg-java-maintainers in case they'd
like to switch kaffe back to ecj on mips.

-- 
Nathanael Nerode  [EMAIL PROTECTED]

A thousand reasons. http://www.thousandreasons.org/
Lies, theft, war, kidnapping, torture, rape, murder...
Get me out of this fascist nightmare!





Re: Please reenable GCJ on mips

2005-10-06 Thread Thiemo Seufer
Nathanael Nerode wrote:
 I notice that GCJ (and company) are not built on mips or mipsel.
 What I can't figure out is why.
 
 GCJ is supported for mips*-*-linux* (except for mips64*-*-linux*, which
 is not supported) upstream in the 4.0 series, and I couldn't find any
 reported bugs on problems specific to it.

The toolchain (more specifically: ld) can't handle the gcj runtime
build yet. Btw, ghc triggers the same problem, there were several
reports filed for the various instances of this bug.


Thiemo





Re: Please reenable GCJ on mips

2005-10-06 Thread Nathanael Nerode
Thiemo Seufer wrote:
 The toolchain (more specifically: ld) can't handle the gcj runtime
 build yet. Btw, ghc triggers the same problem, there were several
 reports filed for the various instances of this bug.

Ah, I see.  So it's binutils which can't handle it.  No wonder I couldn't find 
any bug reports.  :-P

Is this the GOT handling problem noted at 
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=274738 ?
If so, http://lists.debian.org/debian-mips/2004/10/msg00016.html
and http://lists.debian.org/debian-mips/2004/10/msg00020.html
indicate that this is not reproducible.  Furthermore, it doesn't appear to 
have been tested since then.

I can't find any reference to this bug in the ghc6 or ghc5 bug reports, 
current or archived.  There is also no report for this bug in binutils 
Bugzilla (http://sourceware.org/bugzilla) or on the old bug-binutils mailing 
list (http://lists.gnu.org/archive/html/bug-binutils/ ).

This is no way to get a bug fixed.  If this is seriously the level of 
attention to mips and mipsel, Debian support for them should be dropped.





Re: Please reenable GCJ on mips

2005-10-06 Thread Matthias Klose
Nathanael Nerode writes:
 This is no way to get a bug fixed.  If this is seriously the level of 
 attention to mips and mipsel, Debian support for them should be dropped.

sorry, this attitude has nothing to do with release management, it's
just ranting.  The problem is addressed, known to the right people.
Just ask if you cannot find some information.  There are other
architectures where java builds, but isn't usable, same for Ada.  If
you think that availability of compilers on some architectures
should be a release criterion, please bring that up with the release
team first.

  Matthias






Re: Please reenable GCJ on mips

2005-10-06 Thread Nathanael Nerode
Nathanael Nerode writes:
 This is no way to get a bug fixed.  If this is seriously the level of 
 attention to mips and mipsel, Debian support for them should be dropped.

Matthias Klose wrote:
 sorry, this attitude has nothing to do with release management, it's
 just ranting.

 The problem is addressed, known to the right people.
Sure doesn't look like it; at the very least, there's a failure of openness in 
the processes here.  This really is no way to get a bug fixed.  The failure 
to report the bug upstream was what really got to me.

 Just ask if you cannot find some information.
All right.
* What's wrong with ld on mips/mipsel?
* What's the last time a gcj build was tested on mips/mipsel, what version of 
ld was it tested with, and where are the results?
* Why isn't the problem reported upstream to binutils?  I know it's not, since 
I checked.
* If it's Debian-specific, has it been tracked to a particular part of 
Debian's configuration of binutils?  If not, which mips porter is working on 
that?

And for pkg-java-maintainers:
* Why was kaffe deliberately broken on mips and mipsel?  
* If this was being done with the intention of removing kaffe on those 
architectures, why isn't there a bug against ftp.debian.org requesting the 
removal of the obsolete binaries?  For mipsel, at least, this is still 
needed.

-- 
Nathanael Nerode  [EMAIL PROTECTED]

This space intentionally left blank.





Re: Please reenable GCJ on mips

2005-10-06 Thread Matthias Klose
Nathanael Nerode writes:
 Nathanael Nerode writes:
  This is no way to get a bug fixed.  If this is seriously the level of 
  attention to mips and mipsel, Debian support for them should be dropped.
 
 Matthias Klose wrote:
 sorry, this attitude has nothing to do with release management, it's
 just ranting.
 
 The problem is addressed, known to the right people.
 Sure doesn't look like it; at the very least, there's a failure of openness
 in the processes here.  This really is no way to get a bug fixed.  The failure
 to report the bug upstream was what really got to me.
 
 Just ask if you cannot find some information.
 All right.
 * What's wrong with ld on mips/mipsel?
 * What's the last time a gcj build was tested on mips/mipsel, what version of 
 ld was it tested with, and where are the results?

current gcc-4.0 and gcc-snapshot packages, using current binutils packages.

 * Why isn't the problem reported upstream to binutils?  I know it's not,
 since I checked.

AFAIK it's not just a binutils problem.

 * If it's Debian-specific, has it been tracked to a particular part of 
 Debian's configuration of binutils?  If not, which mips porter is working on 
 that?

it's not Debian specific. 

 And for pkg-java-maintainers:
 * Why was kaffe deliberately broken on mips and mipsel?  
 * If this was being done with the intention of removing kaffe on those 
 architectures, why isn't there a bug against ftp.debian.org requesting the 
 removal of the obsolete binaries?  For mipsel, at least, this is still 
 needed.
 
 -- 
 Nathanael Nerode  [EMAIL PROTECTED]
 
 This space intentionally left blank.





Re: Please reenable GCJ on mips

2005-10-06 Thread Nathanael Nerode
Matthias Klose wrote:
  If
 you think that availability of compilers on some architectures
 should be a release criterion, please bring that up with the release
 team first.
That's not at all what I think.

I think that if there are known binutils bugs for your architecture, which 
supposedly prevent the build of multiple packages --
/either/ forwarding them upstream
/or/ fixing them if they're Debian-specific
/or/ closing them if they're bogus
within a reasonable amount of time (less than a year)
should be a requirement for a port to be considered.

Does the release team agree or disagree?

According to Thiemo Seufer, MIPS has failed this criterion.

He said that GCJ is not present and does not build due to an ld bug which also 
affected ghc (http://lists.debian.org/debian-gcc/2005/10/msg00051.html).  
However, contrary to his claim, there are no bug reports filed regarding this 
for ghc.  The only such bug I could find was 
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=274738.  This bug is *not* 
reported upstream.  It has had no activity since November 2004.  According to 
David Daney (http://lists.debian.org/debian-mips/2004/10/msg00016.html) and 
indeed Matthias Klose 
(http://lists.debian.org/debian-mips/2004/10/msg00020.html) it is 
unreproducible.

(The ChangeLog for Debian's GCC package mentions that GCJ was disabled for 
MIPS way back when gcc-3.0 was uploaded because libffi was not ported at that 
time.  It has been ported for a long time by now.  There is no other mention 
in the GCC changelog as to why GCJ is disabled for mips and mipsel.  The only 
other explanation I have found is Thiemo's.)

-- 
Nathanael Nerode  [EMAIL PROTECTED]

A thousand reasons. http://www.thousandreasons.org/
Lies, theft, war, kidnapping, torture, rape, murder...
Get me out of this fascist nightmare!

