[Bug ld/22831] ld causes massive thrashing if object files are not fully memory-resident: new algorithm needed

2022-07-23 Thread lkcl at lkcl dot net
https://sourceware.org/bugzilla/show_bug.cgi?id=22831

--- Comment #35 from Luke Kenneth Casson Leighton  ---
On Sat, Jul 23, 2022 at 3:04 PM amodra at gmail dot com
 wrote:

> And "new algorithm needed" is really saying "rewrite the linker".

i mention this very early on in this bugreport: back in the early 90s
it was indeed rewritten,
to remove Dr Stallman's algorithms, on the flawed assumption

  "640k^H^H^H^H 4GB should be enough for anybody".

> That's low
> priority.  Also, there are other linkers, eg. gold and lld, that are much
> newer than ld.bfd.

gold suffers from similar problems - i was able to make it keel over just as
easily.

i've not heard of lld before: if it likewise makes the same flawed assumption
that going into swap is acceptable, it will likewise result in the exact same
problem.

>  They don't do much better at memory usage, do they?

if Dr Stallman's carefully-crafted original algorithms had been
left in place, which, just as in gcc, made *really certain* to only use
*resident* RAM,  we would not be having this conversation as
this bugreport would not need to be raised.

the fundamental flawed assumption is that it's "ok to use swap".

the sheer overwhelming amount of cross-referencing required in a
linker *100% guarantees* that even 10 kbytes over resident RAM
will result in thrashing.

any rewrite or redesign that does not take that into account is 100%
guaranteed to be problematic. this is just how it is: it's basic fundamental
computer science that a linker *has* to jump around across the
entirety of *all* of the objects it's trying to link.  this makes the
"Working Set" *equal* to 100% of the available Swap, which is
unfortunately the very definition of "thrash conditions".
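
A minimal sketch of why a working set only slightly larger than resident RAM can be so costly. It assumes strict LRU page replacement and a linker that sweeps repeatedly over every page of its inputs; both are assumptions made for illustration, not claims about any particular kernel or linker. The name lru_faults and the page counts are hypothetical.

```python
from collections import OrderedDict


def lru_faults(num_pages, capacity, passes):
    """Count page faults for `passes` cyclic sweeps over `num_pages` pages,
    with `capacity` resident frames and strict LRU replacement (assumed)."""
    resident = OrderedDict()  # page -> True, oldest entry first
    faults = 0
    for _ in range(passes):
        for page in range(num_pages):
            if page in resident:
                # Hit: mark the page as most recently used.
                resident.move_to_end(page)
            else:
                # Miss: evict the least recently used page if memory is full.
                faults += 1
                if len(resident) >= capacity:
                    resident.popitem(last=False)
                resident[page] = True
    return faults


if __name__ == "__main__":
    print(lru_faults(100, 100, 10))  # working set fits exactly
    print(lru_faults(101, 100, 10))  # working set is one page too large
```

Under these assumptions, 100 pages in 100 frames fault only during the first sweep, while 101 pages in 100 frames fault on every single access. Real kernels do not use strict LRU and real linkers do not access memory in a pure cycle, so this shows the shape of the cliff rather than measured behaviour.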

-- 
You are receiving this mail because:
You are on the CC list for the bug.


[Bug ld/22831] ld causes massive thrashing if object files are not fully memory-resident: new algorithm needed

2019-01-26 Thread lkcl at lkcl dot net
https://sourceware.org/bugzilla/show_bug.cgi?id=22831

--- Comment #30 from Luke Kenneth Casson Leighton  ---
cross-reference here, raised priority critical bug in the debian bugtracker
as well:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=919882

___
bug-binutils mailing list
bug-binutils@gnu.org
https://lists.gnu.org/mailman/listinfo/bug-binutils


[Bug ld/22831] ld causes massive thrashing if object files are not fully memory-resident: new algorithm needed

2019-01-16 Thread lkcl at lkcl dot net
https://sourceware.org/bugzilla/show_bug.cgi?id=22831

--- Comment #29 from Luke Kenneth Casson Leighton  ---
i tried the same massive 6GB link as was carried out under an i386 (32-bit)
chroot.  this time both of them succeeded.  ld-bfd with --no-keep-memory
succeeded as before with a warning, using only 280mb during the linker
phase (the number of functions called had been increased:
python evil_linker_torture.py 800 400 20 800).

ld-gold *also* succeeded, once again requiring 6.5 GB of resident RAM
to carry out the link [on a 64-bit system].

it would appear that the options recommended in comment #25 do
not prevent ld-gold from allocating memory equal to the full size
of the target executable.

consequently, attempting to link a 6 GB executable on a 32-bit system
(with an obvious limit of 4GB) is guaranteed to fail.



[Bug ld/22831] ld causes massive thrashing if object files are not fully memory-resident: new algorithm needed

2019-01-16 Thread lkcl at lkcl dot net
https://sourceware.org/bugzilla/show_bug.cgi?id=22831

--- Comment #28 from Luke Kenneth Casson Leighton  ---
(In reply to Luke Kenneth Casson Leighton from comment #27)

> ld-bfd - with "--no-keep-memory" - only requires 750 MB of resident RAM, to
> link the exact same 6GB executable.

(and aside from a warning "i686-linux-gnu-ld.bfd: warning: cannot find entry
symbol _start; defaulting to 08048094", succeeded.)

correction: i had added one too many "0s" onto the evil python command.
however after correction, the results are exactly the same:

debian-i386-chroot$ python evil_linker_torture.py 80 40 20 8000

* make maing FAILs
* make main SUCCEEDs (except only using 85mb for the linker phase)

so this is far more complex and involved than it first appears.



[Bug ld/22831] ld causes massive thrashing if object files are not fully memory-resident: new algorithm needed

2019-01-16 Thread lkcl at lkcl dot net
https://sourceware.org/bugzilla/show_bug.cgi?id=22831

Luke Kenneth Casson Leighton  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #11522|0                           |1
        is obsolete|                            |

--- Comment #27 from Luke Kenneth Casson Leighton  ---
Created attachment 11540
  --> https://sourceware.org/bugzilla/attachment.cgi?id=11540&action=edit
updated version of linker torturer

i decided to run an i386 debian chroot, using a variant of
evil_linker_torture.py
and running these options:
$ python evil_linker_torture.py 80 40 20 8

the results of the link (an error) are below:

$ make -j8 maing
i686-linux-gnu-ld.gold src0.o src1.o src10.o src11.o src12.o src13.o src14.o
src15.o src16.o src17.o src18.o src19.o src2.o src20.o src21.o src22.o src23.o
src24.o src25.o src26.o src27.o src28.o src29.o src3.o src30.o src31.o src32.o
src33.o src34.o src35.o src36.o src37.o src38.o src39.o src4.o src40.o src41.o
src42.o src43.o src44.o src45.o src46.o src47.o src48.o src49.o src5.o src50.o
src51.o src52.o src53.o src54.o src55.o src56.o src57.o src58.o src59.o src6.o
src60.o src61.o src62.o src63.o src64.o src65.o src66.o src67.o src68.o src69.o
src7.o src70.o src71.o src72.o src73.o src74.o src75.o src76.o src77.o src78.o
src79.o src8.o src9.o -g -g -g --no-mmap-output-file --no-map-whole-files
--no-keep-files-mapped --no-keep-memory -o maing
i686-linux-gnu-ld.gold: internal error in convert_types, at
../../gold/gold.h:192

this is with the following version:
$ gold --version
GNU gold (GNU Binutils for Debian 2.28) 1.14

the most likely reason is that the size of the executable is over 6GB, and a
32-bit version of gold cannot cope.

when run on 64-bit it does fine, the only strange thing being that it still
requires 7GB of resident RAM to link a 6GB executable.

ld-bfd - with "--no-keep-memory" - only requires 750 MB of resident RAM, to
link the exact same 6GB executable.



[Bug ld/22831] ld causes massive thrashing if object files are not fully memory-resident: new algorithm needed

2019-01-13 Thread lkcl at lkcl dot net
https://sourceware.org/bugzilla/show_bug.cgi?id=22831

--- Comment #26 from Luke Kenneth Casson Leighton  ---
(In reply to Ian Lance Taylor from comment #25)
> When using gold the key options are --no-mmap-output-file
> --no-map-whole-files --no-keep-files-mapped.  Can you confirm that those
> options--all of them together--were tried with gold?

hi ian,

as i mentioned to hj, i personally do not have safe resources
(resources that would not be damaged by running such tests).  i am acting as a
go-between, alerting various people to the nature of this bug.

i will contact them and alert them to the options that you describe.



[Bug ld/22831] ld causes massive thrashing if object files are not fully memory-resident: new algorithm needed

2019-01-09 Thread lkcl at lkcl dot net
https://sourceware.org/bugzilla/show_bug.cgi?id=22831

--- Comment #24 from Luke Kenneth Casson Leighton  ---
hiya nick, thanks for trying out the torture program.  basically the
parameters there generate a 6.1mb object file (with gcc 7.3), and 3000x
that equals an 18 gbytes executable.

so, it's possible to work out what needs to be done: increase the 2nd
or 3rd parameter directly proportionately so as to ensure that the object
file increases to where the available RAM will be exceeded.
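
A small worked version of the arithmetic above, as a sketch only. It assumes decimal units (1 GB = 1,000,000 KB) and takes at face value that the output size is roughly the object size times the number of objects. The function names are hypothetical, and the mapping of script parameters to object size is not modelled.

```python
def estimated_output_gb(object_kb, num_objects):
    """Rough size of the linked output in GB, assuming it is simply the
    sum of the object file sizes (decimal units: 1 GB = 1,000,000 KB)."""
    return object_kb * num_objects / 1_000_000


def scale_factor_to_exceed(ram_kb, object_kb, num_objects):
    """Factor by which the per-object size must grow before the estimated
    output reaches `ram_kb`; anything above this factor exceeds it."""
    return ram_kb / (object_kb * num_objects)


if __name__ == "__main__":
    # 6.1 MB objects (6100 KB), 3000 of them, as described above.
    print(estimated_output_gb(6100, 3000))
    # How much larger each object must be to reach a 36.6 GB machine.
    print(scale_factor_to_exceed(36_600_000, 6100, 3000))
```

If the 2nd or 3rd script parameter really does scale the object size proportionately, multiplying it by more than the returned factor should push the link past the chosen amount of RAM.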

regarding ld-gold:
https://lists.debian.org/debian-devel/2019/01/msg00069.html

so no, it doesn't work. mike hommey tried gnu gold for firefox
on debian 32-bit: everything he's tried has failed.  that leaves
cross-compiling using a 64-bit system as literally *the* only
option (which is completely unacceptable as a band-aid "solution")

regarding "-g -g -g": it increases the amount of debug information,
and consequently is a quick-hack way to increase the size of the
output binary.


regarding the evil idea of letting the limit be hit and weeding out
applications that try it, on the basis that it's pretty insane to
have such massive static executables: i really like it :)

... except... the first casualty is already being hit, and that's
*all* 32-bit hardware.  armhf, armel, i686, MIPS32 and a few more
besides.  all distros supporting 32-bit hardware are currently
going through hell, and/or are *DROPPING* 32-bit support entirely,
whilst 64-bit hardware continues to "accept" the insane inexorable
increase in static executable size.

so, perfectly good 32-bit hardware is being thrown into landfill
because there's absolutely no way they can get hold of a modern
distro that works on it...

... all because of this one bug that dates back to a short-sighted
decision from the late 1990s.

hence why i raised this to priority one critical level a couple of
days ago.



[Bug ld/22831] ld causes massive thrashing if object files are not fully memory-resident: new algorithm needed

2019-01-08 Thread lkcl at lkcl dot net
https://sourceware.org/bugzilla/show_bug.cgi?id=22831

--- Comment #22 from Luke Kenneth Casson Leighton  ---
Created attachment 11522
  --> https://sourceware.org/bugzilla/attachment.cgi?id=11522&action=edit
repro test case

attached is a test file that can generate a Makefile and associated
header and c files that will easily exceed the capacity of a 64-bit
system to cope with.

here are arguments to the script that will cause GNU ld to attempt
to create an EIGHTEEN GIGABYTE executable.

$ python evil_linker_torture.py 3000 400 200 50

a 32-bit system will be completely unable to cope with this, as at 18GB it
is hopelessly over the 4GB resident limit (450% of it).  when compiled on
a 64-bit system it was necessary to terminate it with prejudice, as
by the time it got to 9.5GB resident memory it was in danger of putting
the compile host into severe and irrecoverable swap thrashing.


if the Makefile is modified to include the option
"-Wl,--no-keep-memory", the following output is
generated and the errors result in the link phase terminating
unsuccessfully.


ld: warning: cannot find entry symbol _start; defaulting to 00401000
ld: src9.o: in function `fn_9_0':
/home/lkcl/src/ld_torture/src9.c:3006:(.text+0x27): relocation
truncated to fit: R_X86_64_PLT32 against symbol `fn_1149_322' defined
in .text section in src1149.o
ld: /home/lkcl/src/ld_torture/src9.c:3008:(.text+0x41): relocation
truncated to fit: R_X86_64_PLT32 against symbol `fn_1387_379' defined
in .text section in src1387.o
ld: /home/lkcl/src/ld_torture/src9.c:3014:(.text+0x8f): relocation
truncated to fit: R_X86_64_PLT32 against symbol `fn_1821_295' defined
in .text section in src1821.o
ld: /home/lkcl/src/ld_torture/src9.c:3015:(.text+0x9c): relocation
truncated to fit: R_X86_64_PLT32 against symbol `fn_1082_189' defined
in .text section in src1082.o
ld: /home/lkcl/src/ld_torture/src9.c:3016:(.text+0xa9): relocation
truncated to fit: R_X86_64_PLT32 against symbol `fn_183_330' defined
in .text section in src183.o
ld: /home/lkcl/src/ld_torture/src9.c:3024:(.text+0x111): relocation
truncated to fit: R_X86_64_PLT32 against symbol `fn_162_394' defined
in .text section in src162.o
ld: /home/lkcl/src/ld_torture/src9.c:3026:(.text+0x12b): relocation
truncated to fit: R_X86_64_PLT32 against symbol `fn_132_235' defined
in .text section in src132.o
ld: /home/lkcl/src/ld_torture/src9.c:3028:(.text+0x145): relocation
truncated to fit: R_X86_64_PLT32 against symbol `fn_1528_316' defined
in .text section in src1528.o
ld: /home/lkcl/src/ld_torture/src9.c:3029:(.text+0x152): relocation
truncated to fit: R_X86_64_PLT32 against symbol `fn_1178_357' defined
in .text section in src1178.o
ld: /home/lkcl/src/ld_torture/src9.c:3031:(.text+0x16c): relocation
truncated to fit: R_X86_64_PLT32 against symbol `fn_1180_278' defined
in .text section in src1180.o
ld: /home/lkcl/src/ld_torture/src9.c:3035:(.text+0x1a0): additional
relocation overflows omitted from the output
^Cmake: *** Deleting file `main'
make: *** [main] Interrupt



[Bug ld/22831] ld causes massive thrashing if object files are not fully memory-resident: new algorithm needed

2019-01-08 Thread lkcl at lkcl dot net
https://sourceware.org/bugzilla/show_bug.cgi?id=22831

Luke Kenneth Casson Leighton  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P2                          |P1
           Severity|normal                      |critical



[Bug ld/22831] ld causes massive thrashing if object files are not fully memory-resident: new algorithm needed

2019-01-08 Thread lkcl at lkcl dot net
https://sourceware.org/bugzilla/show_bug.cgi?id=22831

--- Comment #21 from Luke Kenneth Casson Leighton  ---
to emphasise that this is strategically becoming an absolutely critical bug:
https://lists.debian.org/debian-devel/2019/01/msg00081.html

here it has been reported that even when using -Wl,--no-keep-memory, firefox
completely fails to build on a 32-bit system.

the debian developers are presently testing cross-compiling 32-bit packages
from 64-bit hosts.

they report that ubuntu has *already* moved over to this procedure.

32 bit distributions are no longer self-hosting.

this bug is now a priority 1 critical bug.



[Bug ld/22831] ld causes massive thrashing if object files are not fully memory-resident: new algorithm needed

2019-01-08 Thread lkcl at lkcl dot net
https://sourceware.org/bugzilla/show_bug.cgi?id=22831

--- Comment #20 from Luke Kenneth Casson Leighton  ---
ok so i spoke to dr stallman a couple of weeks ago, and he confirmed that code
that is near-identical to that which i described in the very first comment of
this bugreport was REMOVED some time in the late 1990s, by persons not familiar
with the type of issues that linking has to deal with.

the original code that dr stallman wrote did two things:

(1) checked to make absolutely sure that it stayed within the bounds of
RESIDENT available memory, if it could.
(2) ONLY loaded into memory the maximum number of object files that
would ensure that it remained within bounds of resident available
memory, if it could.

it is essential to research this code and restore its functionality.  this is
NOT a 32-bit-only problem.
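
The two points above can be sketched as a greedy batching plan. This is a hypothetical illustration, not the original code (which is not shown in this thread): plan_batches, its parameters and the sizes are all invented for the example, and resolving symbols that cross batch boundaries is deliberately left out.

```python
def plan_batches(object_sizes, resident_budget):
    """Group object files (given by size, in input order) into batches whose
    total size stays within `resident_budget`.

    An object larger than the whole budget still gets a batch of its own,
    matching the "if it could" caveat: the limit is respected when possible.
    Returns a list of batches, each a list of indexes into `object_sizes`.
    """
    batches = []
    current, used = [], 0
    for index, size in enumerate(object_sizes):
        # Start a new batch when adding this object would exceed the budget.
        if current and used + size > resident_budget:
            batches.append(current)
            current, used = [], 0
        current.append(index)
        used += size
    if current:
        batches.append(current)
    return batches


if __name__ == "__main__":
    # Four 3-unit objects with room for 7 units resident at a time.
    print(plan_batches([3, 3, 3, 3], 7))
```

The budget here is a fixed number purely to keep the sketch short; a linker following point (1) would need to re-measure resident memory as it goes.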



[Bug ld/22831] ld causes massive thrashing if object files are not fully memory-resident: new algorithm needed

2018-12-21 Thread lkcl at lkcl dot net
https://sourceware.org/bugzilla/show_bug.cgi?id=22831

--- Comment #19 from Luke Kenneth Casson Leighton  ---
(In reply to H.J. Lu from comment #18)
> (In reply to Luke Kenneth Casson Leighton from comment #17)
> > https://issues.guix.info/issue/33676
> > 
> > so we have a successful report that the advised option helps.
> > 
> 
> Have you tried my users/hjl/pr18028 branch?

 as i mentioned before, i (personally) do not have the resources
 to try anything out: i am acting as a go-between, to find people
 who *can* try out different branches.

 i took a look at the diffs:


https://github.com/hjl-tools/binutils-gdb/compare/users/hjl/pr18028#diff-e65a96fc956244cba3a031705b7b737aR3484

 some comments:

 bfd/linker.c line 3492 - i see what's going on.  this is great,
 it *in principle* makes sure that the amount of memory used is
 not exceeded.

 bfd/linker.c line 3484 - this is completely arbitrary.  this is
 NOT repeat NOT, as i have already said, and repeat, NOT limited
 to 32 bit.  64-bit systems ALSO HAVE THE EXACT SAME PROBLEM.
 this test needs to be removed.

 ld/ldmain.c: line 275 - specifying half the memory is arbitrary.

 so, as i said: it is not enough.  what if the amount of memory
 used by other programs exceeds half the available memory?

 conditions where that will occur immediately: make -j2.

 one ld process will take half the memory

 the other ld process will take half the memory.

 now BOTH processes will enter thrashing.

 as i said, right in the original report: it is necessary to DYNAMICALLY
 check the amount of available memory, just like gcc does.

 in that way, ld will remain DYNAMICALLY under the limit, it will STAY
 in resident memory.
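
 A minimal sketch of what a dynamic check could look like on Linux, reading the MemAvailable field of /proc/meminfo at the moment it is needed rather than fixing a fraction at startup. This is an illustration only: it is not how gcc or ld do it, and the function names and the reserve_percent parameter are hypothetical.

```python
def parse_mem_available(meminfo_text):
    """Return MemAvailable in bytes from /proc/meminfo-style text, or None
    if the field is missing (e.g. very old kernels or non-Linux systems)."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            # Format is "MemAvailable:   <number> kB"; kB here means KiB.
            return int(line.split()[1]) * 1024
    return None


def link_budget(meminfo_text, reserve_percent=10):
    """Bytes a linker could use right now, leaving `reserve_percent` of the
    currently available memory untouched as a safety margin."""
    available = parse_mem_available(meminfo_text)
    if available is None:
        return None
    return available * (100 - reserve_percent) // 100


def current_link_budget(path="/proc/meminfo", reserve_percent=10):
    """Re-read the kernel's estimate; meant to be called repeatedly during
    a link so the budget follows what other processes are doing."""
    with open(path) as meminfo:
        return link_budget(meminfo.read(), reserve_percent)
```

 With make -j2, each ld process calling current_link_budget() would see the memory already taken by the other, which a fixed "half of total" limit cannot do.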


 ld must be prevented from going into swap space, at all costs, basically.



[Bug ld/22831] ld causes massive thrashing if object files are not fully memory-resident: new algorithm needed

2018-12-21 Thread lkcl at lkcl dot net
https://sourceware.org/bugzilla/show_bug.cgi?id=22831

--- Comment #17 from Luke Kenneth Casson Leighton  ---
https://issues.guix.info/issue/33676

so we have a successful report that the advised option helps.

please note: the advised option is **NOT** repeat **NOT** a solution.
destroying all of the memory and throwing away useful information
cannot possibly be called a "solution".

ld *really does* need to make *optimal* use of memory, by restoring
the techniques that were used decades ago, and to use dynamic 
analysis of the amount of available *RESIDENT* memory, so as to
very very specifically avoid swapping.



[Bug ld/22831] ld causes massive thrashing if object files are not fully memory-resident: new algorithm needed

2018-06-29 Thread lkcl at lkcl dot net
https://sourceware.org/bugzilla/show_bug.cgi?id=22831

--- Comment #16 from Luke Kenneth Casson Leighton  ---
the following came up in a debian discussion and is copied here:

From: Florian Weimer
To: Luke, Steve, ARM, debian-release, debian-admin, team, debian-gcc,
debian-glibc

* Luke Kenneth Casson Leighton:

>  that is not a surprise to hear: the massive thrashing caused by the
> linker phase not being possible to be RAM-resident will be absolutely
> hammering the drives beyond reasonable wear-and-tear limits.  which is
> why i'm recommending people try "-Wl,--no-keep-memory".

Note that ld will sometimes stuff everything into a single RWX segment
as a result, which is not desirable.

Unfortunately, without significant investment into historic linker
technologies (with external sorting and that kind of stuff), I don't
think it is viable to build 32-bit software natively in the near
future.  Maybe next year only a few packages will need exceptions, but
the number will grow with each month.  Building on 64-bit kernels will
delay the inevitable because more address space is available to user
space, but that's probably 12 to 18 month extended life-time for
native building.
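
For readers unfamiliar with the "external sorting" mentioned above, here is a minimal sketch of the general technique: sort more records than fit in memory by spilling sorted runs to disk and merging them. It is a generic illustration and not taken from any linker; external_sort, max_in_memory and the sample symbol names are hypothetical, and records are assumed to be newline-free strings.

```python
import heapq
import tempfile


def _read_run(run_file):
    # Stream one sorted run back from disk, one record at a time.
    for line in run_file:
        yield line.rstrip("\n")


def external_sort(records, max_in_memory):
    """Yield `records` in sorted order while holding at most `max_in_memory`
    unsorted records in RAM; sorted runs are spilled to temporary files."""
    runs = []

    def spill(chunk):
        # Sort one memory-sized chunk and write it out as a run.
        run_file = tempfile.TemporaryFile(mode="w+")
        run_file.writelines(record + "\n" for record in sorted(chunk))
        run_file.seek(0)
        runs.append(run_file)

    try:
        chunk = []
        for record in records:
            chunk.append(record)
            if len(chunk) >= max_in_memory:
                spill(chunk)
                chunk = []
        if chunk:
            spill(chunk)
        # k-way merge: one pending record per run (plus file buffers).
        yield from heapq.merge(*(_read_run(run) for run in runs))
    finally:
        for run_file in runs:
            run_file.close()


if __name__ == "__main__":
    symbols = ["fn_9_0", "fn_1149_322", "fn_183_330", "fn_132_235"]
    print(list(external_sort(symbols, max_in_memory=2)))
```

The point of the design is that memory use is bounded by the chunk size and the number of runs, not by the total number of records.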



[Bug ld/22831] ld causes massive thrashing if object files are not fully memory-resident: new algorithm needed

2018-03-15 Thread lkcl at lkcl dot net
https://sourceware.org/bugzilla/show_bug.cgi?id=22831

--- Comment #15 from Luke Kenneth Casson Leighton  ---
(In reply to H.J. Lu from comment #14)
> (In reply to Luke Kenneth Casson Leighton from comment #13)
> >  i have 16 GB of DDR4 2400 mhz RAM on my laptop... and because when
> > that system goes into swap (it has an NVMe) its loadavg goes over 120
> > and it is absolutely guaranteed to crash about 30 seconds later,
> > adding more RAM is *not* the solution.
> > 
> >  however much more RAM is added, there *will* be a piece of software
> > within 1-5 years which requires more RAM for the linker phase than any
> > system provides.
> > 
> 
> Please try if "-Wl,--no-keep-memory" works.

i'll alert some people and see if they are in a position to try that.



[Bug ld/22831] ld causes massive thrashing if object files are not fully memory-resident: new algorithm needed

2018-03-14 Thread lkcl at lkcl dot net
https://sourceware.org/bugzilla/show_bug.cgi?id=22831

--- Comment #13 from Luke Kenneth Casson Leighton  ---
On Wed, Mar 14, 2018 at 12:26 PM, hjl.tools at gmail dot com
 wrote:
> https://sourceware.org/bugzilla/show_bug.cgi?id=22831
>
> --- Comment #12 from H.J. Lu  ---
> (In reply to Luke Kenneth Casson Leighton from comment #11)
>> (In reply to H.J. Lu from comment #10)
>>  there are two issues:
>>
>>  1.  32-bit system
>>  2.  64-bit system
>>
>>  both 32-bit and 64-bit are affected by this issue.
>>
>>  the patch that you wrote however looks like it only addresses
>>  32-bit.
>
> True.  My patch is a starting point.  I'd like to know if it helps
> 32-bit system or not.  If it doesn't address the issue for 32-bit
> system, my approach won't for 64-bit system.

 unfortunately i cannot risk damaging my system by carrying out any
tests (because any tests will result in a loadavg over 120 and 30
seconds later it is guaranteed to hard crash).  so we will have to
wait for someone else to test the patch.

>>  that leaves 64-bit systems still affected.
>
> You can always and should get more RAM for 64-bit system.

 i have 16 GB of DDR4 2400 mhz RAM on my laptop... and because when
that system goes into swap (it has an NVMe) its loadavg goes over 120
and it is absolutely guaranteed to crash about 30 seconds later,
adding more RAM is *not* the solution.

 however much more RAM is added, there *will* be a piece of software
within 1-5 years which requires more RAM for the linker phase than any
system provides.

 how does gcc do compilation?  how does it stay within the bounds of
available memory?

l.



[Bug ld/22831] ld causes massive thrashing if object files are not fully memory-resident: new algorithm needed

2018-03-14 Thread lkcl at lkcl dot net
https://sourceware.org/bugzilla/show_bug.cgi?id=22831

--- Comment #11 from Luke Kenneth Casson Leighton  ---
(In reply to H.J. Lu from comment #10)

> What is your main issue?

 i do not (personally) have an issue, hj.  this is a flaw that is
 independent of me (personally).

 do you mean to ask, "what is THE main issue?"


> 32-bit system or 64-bit system?

 there are two issues:

 1.  32-bit system
 2.  64-bit system

 both 32-bit and 64-bit are affected by this issue.

 the patch that you wrote however looks like it only addresses
 32-bit.

 that leaves 64-bit systems still affected.



[Bug ld/22831] ld causes massive thrashing if object files are not fully memory-resident: new algorithm needed

2018-03-14 Thread lkcl at lkcl dot net
https://sourceware.org/bugzilla/show_bug.cgi?id=22831

--- Comment #9 from Luke Kenneth Casson Leighton  ---
(In reply to H.J. Lu from comment #8)

> Have you tried users/hjl/pr18028 branch?

no, hj, i have not, because it is a fix for a 32-bit system,
not a 64-bit system.  what do you need to know to make it
clear that this is a problem that occurs on a 64-bit system
as well as a 32-bit system?

is there a particular reason why you are not answering my
questions?  in particular can i refer you to my questions
at the end of comment #6?  i am trying to understand if
there is anything unclear about my questions.



[Bug ld/22831] ld causes massive thrashing if object files are not fully memory-resident: new algorithm needed

2018-03-13 Thread lkcl at lkcl dot net
https://sourceware.org/bugzilla/show_bug.cgi?id=22831

--- Comment #7 from Luke Kenneth Casson Leighton  ---
hi hjl,

so how are you getting on with analysing this problem? is there anything
that is unclear that i can assist you with understanding?



[Bug ld/22831] ld causes massive thrashing if object files are not fully memory-resident: new algorithm needed

2018-02-28 Thread lkcl at lkcl dot net
https://sourceware.org/bugzilla/show_bug.cgi?id=22831

--- Comment #6 from Luke Kenneth Casson Leighton  ---
(In reply to H.J. Lu from comment #5)

> Please read my suggestion again and follow it to the letter.

sorry, hjl, i appreciate you're busy so are providing extremely
short responses: please read again what i wrote.  i am *not* the
person installing or running this.  i am acting merely as a
*messenger* after seeing and experiencing reports from at least
FIVE separate teams over the past SIX years of increasingly
difficult build problems due to this increasingly-important bug.

i am NOT the person who will be running any of the suggestions
that you are giving (because my laptop will potentially be damaged
by doing so and i cannot risk that), i will be RELAYING the suggestions
to various people across the internet, making them AWARE that you
are willing to tackle this particular problem.

therefore i require and seek CLARITY on EXACTLY what it is
that i am going to tell people, BEFORE suggesting to them that
they come and look at this bugreport.

is there anything that is unreasonable about that?

if so, please let me know.

https://github.com/hjl-tools/binutils-gdb/commit/de060bbcc7cca9dce213dc6593887a8e
 

ok so after re-reading twice, i eventually spotted the (misordered)
branch name.  can i suggest in future, rather than refer to the
main branch, to instead post people the link *directly* to the
branch, like this?

https://github.com/hjl-tools/binutils-gdb/tree/users/hjl/pr18028

it was a simple mistake, much more helpful to say "you missed that
i suggested trying a branch named xyz".

now, i took a quick look, and there is an assumption in the
patch, that the problem will *exclusively* occur on 32-bit systems.

this is not the case.

there are actually *two* inter-related problems.

the FIRST is that the amount of memory used for linking e.g. firefox
is so insane that it now requires 7 GIGABYTES of resident memory in
order to avoid thrashing... this is simply impossible to do on a
32-bit system.

the SECOND is that the linker phase GOES INTO THRASHING IN THE FIRST
PLACE and has done for many many years now INCLUDING on 64-BIT SYSTEMS.

if you read the original bug-report you will see that i said that
one 64-bit x86 laptop that i had, 6 years ago, only had 2GB of RAM.

the one that i have now has *16* GB of RAM but because it has an NVMe
SSD and is an ultra-expensive laptop (USD $2500) i cannot risk the NVMe
drive getting damaged so swap is *DISABLED*.  despite this, it still
goes into total meltdown (loadavg over 100) whenever memory usage
approaches 16GB.

for both these systems - both of them *64-bit* systems *NOT* 32-bit
systems - going into swap-space is an absolute unmitigated disaster,
but this is now considered to be NORMAL that a build should go from
taking about 1 hour to link if it is below the 100% resident memory
usage threshold to taking SEVERAL DAYS in some cases if it goes even
the TINIEST FRACTION above the available resident memory...
because distros *do not have any choice in the matter*.

this is why i suggested the algorithm above, because the algorithm
above was part of an exercise set by extremely competent lecturers
at Imperial College University during an era where available memory
was a tiny fraction of what it is now, and running in virtual memory
was simply flat-out inconceivable because most systems were still
16-bit let alone 32-bit.

so.

questions:

1) would the proposed patch - which reduces virtual memory usage for
32-bit systems to half that of the available memory - *actually*
fix the problem as described on a *64-bit* system?

2) what would happen if *more than half* of the available virtual
memory is taken up by programs that happen to be running at the
same time as the linker phase?  consider the cases where, in
complex builds, there may be a REALLY LARGE chain of applications
that have spawned any given usage of the ld executable, such that
the expectation that there will *be* half of the total amount of
virtual memory space even *available* is not actually true.



[Bug ld/22831] ld causes massive thrashing if object files are not fully memory-resident: new algorithm needed

2018-02-28 Thread lkcl at lkcl dot net
https://sourceware.org/bugzilla/show_bug.cgi?id=22831

--- Comment #4 from Luke Kenneth Casson Leighton  ---
(In reply to H.J. Lu from comment #3)
> Please try users/hjl/pr18028 branch at
> 
> https://github.com/hjl-tools/binutils-gdb

hi hjl, i will point some people at this, it may be some time
as one of them is the debian-riscv team, they may be maintaining
special patches so it might not be as straightforward as just
cloning the above branch.

others severely affected include armhf systems (max 2GB RAM, 32-bit)
and i note that the patch from a couple of days ago mentions
"enabled by default on x86", is that correct?  what options would
be needed to try this out?

also i note from the patch commit message it says "change maximum
page size", how would that stop severe / critical thrashing?  how
would it reduce memory usage to only that which is available on
the actual system (like when gcc performs compiles, it only uses
available memory)?

did i miss something?  is there another patch in that branch which
i did not see?



[Bug ld/22831] ld causes massive thrashing if object files are not fully memory-resident: new algorithm needed

2018-02-11 Thread lkcl at lkcl dot net
https://sourceware.org/bugzilla/show_bug.cgi?id=22831

--- Comment #2 from Luke Kenneth Casson Leighton  ---
hi HJ, thanks for that advice - bear in mind that i am not actually
directly involved in any of the projects that are experiencing these
insane levels of thrashing.  we (you and i) are therefore talking
"to the general wider internet".

so, for anyone *finding* this bugreport, HJ is recommending that,
if you are a build maintainer and the linker phase is going into
insane thrashing, you try 2.30 and the option HJ recommends.

now.

here's the thing, HJ: distros have to "fix" the version of binutils
and make it the default / standard for sometimes up to 18 months.
also, there's no *guarantee* that they will ever hear about this option.

can i recommend, if reports start coming in that it works, that this
option be either enabled *by default*... or... that instead, there be
an "auto-resident-memory detect" option similar to that used in gcc,
where it detects available free resident RAM for compiling and uses
that and that alone?

... and then when *that* is stable... make *that* the default.

what do you think?
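
a minimal sketch of what such an "auto-resident-memory detect" step could
look like on linux.  assumptions, none of which come from ld or gcc: the
MemAvailable field of /proc/meminfo is taken as a proxy for "free resident
RAM", and the function names and the reserve_fraction policy are purely
hypothetical.

```python
def mem_available_bytes(meminfo_text):
    """Return the MemAvailable value from /proc/meminfo-style text, in bytes.

    Returns None if the field is missing (e.g. very old kernels).
    """
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            fields = line.split()
            # The line looks like "MemAvailable:  1000000 kB"; the unit is KiB.
            return int(fields[1]) * 1024
    return None


def link_budget_bytes(meminfo_text, reserve_fraction=0.25):
    """Hypothetical policy: keep a fraction of available RAM for other processes."""
    available = mem_available_bytes(meminfo_text)
    if available is None:
        return None
    return int(available * (1.0 - reserve_fraction))


# Sample input in the same format as /proc/meminfo (values are made up).
SAMPLE = (
    "MemTotal:        2048000 kB\n"
    "MemFree:          100000 kB\n"
    "MemAvailable:    1000000 kB\n"
)
```

on a real system the text would come from open("/proc/meminfo").read(), and
the value would need re-reading during the link, since available RAM changes
with system load.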



[Bug ld/22831] New: ld causes massive thrashing if object files are not fully memory-resident: new algorithm needed

2018-02-10 Thread lkcl at lkcl dot net
https://sourceware.org/bugzilla/show_bug.cgi?id=22831

            Bug ID: 22831
           Summary: ld causes massive thrashing if object files are not
                    fully memory-resident: new algorithm needed
           Product: binutils
           Version: unspecified
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: ld
          Assignee: unassigned at sourceware dot org
          Reporter: lkcl at lkcl dot net
  Target Milestone: ---

ok so this is quite complex / comprehensive, as it's a performance-related
bug that is slowly getting more and more critical as software such as
firefox and chromium (particularly when compiled with debug symbols enabled)
get inexorably larger and larger.

back in 2008 i decided to add gobject bindings to webkit.  had a really nice
laptop, 2gb of RAM, blah blah processor, blah blah disk space, took an hour
to do the compile phase, couldn't care less, plodded along, did the job.
switched off -g because it just took far too long, and didn't need it.

one day i needed to debug something particularly tricky, and i accidentally
did a recompile and ended up with like 100mb object files instead of the
usual 5mb or whatever they are.

when it came to the linker phase all hell broke loose.  the laptop went into
total meltdown, loadavg shot up to 50 and above, the X server became completely
unresponsive, i had to hold my fingers down on the ctrl-alt-f1 key combination
for over a minute to get to console and recover the machine.

turns out that it had gone into complete and utter swap-space thrashing.

on further investigation and thought i realised that the huge amount of
cross-referencing that goes on at the linker phase basically means that if
*EVEN ONE* object file is not permanently ram-resident the entire system
goes into total thrashing meltdown.

some of debian's smaller build systems now spend SEVERAL DAYS performing the
linker phase for these insanely-large binaries.

now, i completely ignored this because it's not my problem, not my
responsibility, nothing to do with me, blah blah, you're the maintainers
of binutils, you're doing a superb job of coordinating things.

however i don't believe anyone has informed you quite how out-of-hand things
are getting.  firefox now requires SEVEN GIGABYTES of RESIDENT MEMORY to
perform the final linker phase.

that's beyond the memory address space of every single 32-bit processor
on the planet, which is very very serious.  

why is it serious?  well, it's because it makes perfectly good
32-bit hardware look completely useless, and HUNDREDS OF MILLIONS OF
COMPUTERS will END UP IN LANDFILL very very quickly over the next 1-2
years unless you respond and do something about this bug.

in looking at this:
https://sourceware.org/bugzilla/show_bug.cgi?id=12682

whatever it is, it's basically... not enough.  trying to *reduce* memory
overhead is simply not radical enough.  why?  because what happens is,
in 2-3 years' time, as the inexorable and insane increase in complexity
of software continues, yet another limit will be hit, and another, and another.

so the entire linker algorithm and the impact on the system needs to be
looked at.

back in 1989 our year was given a very interesting problem to solve.  we
were asked, "if you had to do an ultra-large matrix multiply where the
disk storage capacity far exceeded the available RAM, how would you go about
it?"

whilst most people did a "best guess" algorithm, i analysed the problem and
realised that the time spent *pre-analysing* the problem was entirely
compute-bound and that the actual multiply would, by virtue of being
I/O bound, be far and away the most time-consuming part.

so i took a brute-force approach.

the algorithm basically said, "ok, if you allow N object files to be
resident at a time until all of their functions are linked with all
other object files, and you allow M object files to be brought in
temporarily to link with the N-sized group, brute-force go through
ALL possible combinations of values of N and M to give the best
estimate for the linker time".

it's clearly going to be a little bit more complex than that, as the
amount of available resident RAM is going to dynamically change depending
on system load.

also, it's not quite N "object files": it's "a size of RAM where object
files happen to sit, resident permanently until all functions are linked".
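
as a rough illustration of the brute-force search over N and M, here is a
minimal sketch.  it is not the original 1989 code, and the cost model is an
assumption: sizes are in arbitrary units, every load pays a fixed seek_cost
plus its size, the resident (N) and temporary (M) groups must fit together
in a RAM budget, and each N group needs one full pass over all M groups.

```python
import math


def estimate_io_cost(total, n, m, seek_cost):
    """Estimated I/O cost for resident size n and temporary size m (assumed model)."""
    rows = math.ceil(total / n)      # number of resident (N) groups
    cols = math.ceil(total / m)      # number of temporary (M) groups
    loads = rows + rows * cols       # one load per N group, plus every M group per row
    units_read = total + rows * total
    return units_read + seek_cost * loads


def best_split(total, budget, seek_cost=1.0):
    """Try every (n, m) with n + m <= budget and return (cost, n, m) of the cheapest."""
    best = None
    for n in range(1, budget):
        for m in range(1, budget - n + 1):
            cost = estimate_io_cost(total, n, m, seek_cost)
            if best is None or cost < best[0]:
                best = (cost, n, m)
    return best
```

the pre-analysis is pure computation, so trying every combination is cheap
next to the I/O it is trying to avoid.  a real version would have to re-run
it whenever the available resident RAM changes.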

but you get the general idea, the important thing being to remember that
at the end of the "row" (when any one "N" group is completed) you don't
throw the M group out immediately, you bring in the *new* N group and
link with the current (resident) M group... *then* you throw out the
current M group and bring in a new one.
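
the end-of-row trick can be sketched as a back-and-forth ("serpentine")
walk over the M groups.  in this sketch n_groups and m_groups are the
*number* of groups on each side, not their sizes, and the function names
are hypothetical.

```python
def serpentine_schedule(n_groups, m_groups):
    """Yield (n, m, load_m) for every pairing of an N group with an M group.

    load_m is False when the M group needed is the one already resident,
    which happens on the first step after each change of N group.
    """
    resident_m = None
    for n in range(n_groups):
        cols = range(m_groups)
        if n % 2 == 1:
            # Odd rows walk the M groups backwards, so the row starts with
            # the M group left resident at the end of the previous row.
            cols = reversed(cols)
        for m in cols:
            load_m = m != resident_m
            resident_m = m
            yield n, m, load_m


def count_m_loads(n_groups, m_groups):
    """Number of M-group loads the serpentine schedule actually performs."""
    return sum(1 for _, _, load in serpentine_schedule(n_groups, m_groups) if load)
```

with 3 N groups and 4 M groups this performs 10 M-group loads instead of the
12 a restart-from-the-beginning loop would need: one load is saved at each
change of N group.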

this saves O(M) time on an O(N*M) algorithm where you have absolutely no
idea if making M or N large is significant or not... hence the O(M)
time saving cannot be