Hi,
I'm posting this to rust-dev as well to solicit help from anyone who
might smell a "familiar bug" lurking. We've been hunting a somewhat
mysterious win32-specific crash lately: when monomorphization is turned
on, the stage2 librustc.dll we generate seems to have ... something
broken in its exports. Client code that links against it just randomly
jumps into heap poison (0xbaadf00d) rather than the correct target
symbols. It seems to have something to do with binary size. Maybe.
Marijn left me with this today, so this is a summary of what I've found
so far:
-- snip --
It appears that the windows crash we're seeing is -- or may be! --
related to binary size, as you suggested. Somehow. I started playing
around with synthetic tests (10,000 functions each of which does
#error("hi")) and found I could reproduce the bug in isolation from the
rustc build process. Here is what I have discovered:
   text    data  bss  filename   result
2093961  412701  116  large.dll  ok  (3000 syms)
2790301  550701  116  large.dll  ok  (4000 syms)
3138473  619701  116  large.dll  ok  (4500 syms)
3216465  635157  116  large.dll  ok  (4612 syms)
3255461  642885  116  large.dll  ok  (4668 syms)
3260332  643847  116  large.dll  ok  (4675 syms)
3262419  644265  116  large.dll  ok  (4678 syms)
3116452  644399  116  large.dll  bad (4679 syms) <-- whoops
3117153  644541  116  large.dll  bad (4680 syms)
3118543  644817  116  large.dll  bad (4682 syms)
3128285  646749  116  large.dll  bad (4696 syms)
3148482  650747  116  large.dll  bad (4725 syms)
3339977  688701  116  large.dll  bad (5000 syms)
That is, there's a threshold around the 3.2 MB text segment mark where
"something goes wrong" and things stop working. I checked this with -O
to confirm it's "size" and not "number of symbols": with -O the
symbol-count threshold is a fair bit higher.
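
The symbol counts in the table look like the trail of a bisection, and that search can be sketched as follows. This is only an illustration: `fake_build_and_test` is a hypothetical stand-in for "build large.dll with n functions and run a client against it", and it assumes the outcome is monotonic (ok below the threshold, bad above it), which the measurements suggest but do not prove.

```python
def find_threshold(is_ok, lo, hi):
    """Return the largest n in [lo, hi) for which is_ok(n) holds.

    Assumes is_ok(lo) is True, is_ok(hi) is False, and that the
    result flips exactly once somewhere in between.
    """
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_ok(mid):
            lo = mid   # mid still works, threshold is at or above it
        else:
            hi = mid   # mid is broken, threshold is below it
    return lo


# Hypothetical predicate: a real one would generate the source, build
# the DLL and run a client. The constant mirrors the table above
# (4678 symbols ok, 4679 bad).
def fake_build_and_test(n):
    return n <= 4678


if __name__ == "__main__":
    print(find_threshold(fake_build_and_test, 3000, 5000))
```

With a real build behind the predicate, each probe costs one full link, so the roughly eleven probes needed for a 3000..5000 range are much cheaper than a linear scan.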
What happens here is odd: the DLL _shrinks_, because it's losing
valuable material. Its .rdata section (and relocs) gets eviscerated:
ok (4678 syms):
section            size
.text           2923576
.data             18852
.note.rustc      624221
.rdata            56452  <-- a bunch of data
.eh_frame           156
.bss                116
.edata           191407
.idata             1136
.CRT                 24
.tls                 32
.reloc            90828  <-- a bunch of relocs
.debug_aranges      192
.debug_pubnames     709
.debug_pubtypes    1034
.debug_info        8451
.debug_abbrev      1838
.debug_line        1402
.debug_frame        900
.debug_str          192
.debug_loc         2285
.debug_ranges        48
Total           3923851
bad (4679 syms):
section            size
.text           2924200
.data             18852
.note.rustc      624355
.rdata              316  <-- mostly gone
.eh_frame           156
.bss                116
.edata           191448
.idata             1136
.CRT                 24
.tls                 32
.reloc              332  <-- likewise
.debug_aranges      192
.debug_pubnames     709
.debug_pubtypes    1034
.debug_info        8451
.debug_abbrev      1838
.debug_line        1402
.debug_frame        900
.debug_str          192
.debug_loc         2285
.debug_ranges        48
Total           3778018
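
The comparison between the two listings can also be done mechanically. The sketch below is a minimal illustration, not part of the original investigation: the sizes are copied from the listings above (non-debug sections only), while the function name and the 50% cutoff are arbitrary choices made for this example.

```python
# Section sizes copied from the "ok" and "bad" listings above.
OK_SECTIONS = {
    ".text": 2923576, ".data": 18852, ".note.rustc": 624221,
    ".rdata": 56452, ".eh_frame": 156, ".bss": 116,
    ".edata": 191407, ".idata": 1136, ".CRT": 24, ".tls": 32,
    ".reloc": 90828,
}
BAD_SECTIONS = {
    ".text": 2924200, ".data": 18852, ".note.rustc": 624355,
    ".rdata": 316, ".eh_frame": 156, ".bss": 116,
    ".edata": 191448, ".idata": 1136, ".CRT": 24, ".tls": 32,
    ".reloc": 332,
}


def shrunken_sections(ok, bad, ratio=0.5):
    """Names of sections whose size in `bad` is below ratio * size in `ok`.

    The ratio is an arbitrary cutoff chosen for this sketch; a section
    missing from `bad` counts as size 0.
    """
    return sorted(name for name, size in ok.items()
                  if bad.get(name, 0) < ratio * size)


if __name__ == "__main__":
    print(shrunken_sections(OK_SECTIONS, BAD_SECTIONS))
```

On these numbers only .rdata and .reloc are flagged; every other section is the same size or slightly larger in the bad DLL.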
Further unfortunate details:
 - Microsoft's link.exe can't process the DLLs we're making.
   - Even the non-broken ones.
   - It can't process the DLLs gcc makes either.
   - It claims they're corrupt. #llvm hackers think this is common
     for stuff generated by gnu tools on windows, and that clang
     _might_ do better, but since it uses gnu ld for the final link,
     it might do the same anyways.
Attempting to reduce this to "not even rust's fault", I tried to
reproduce using straight C files of unusual size. Here I did run into a
bug -- a limit of 65535 symbols beyond which gcc starts silently
mis-assigning DLL-import ordinals -- but on further investigation I
_think_ that's just a design limit of the DLL import/export scheme.
Link.exe refuses to touch such a file, complaining. So I _think_ that's
an unrelated bug. Aside from that I haven't managed to reproduce it
outside of "stuff generated from rust" yet. I'll have a go with objcopy
tomorrow.
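
For anyone who wants to try the "straight C" experiment, a generator for such oversized C files might look like the sketch below. It is an assumption-laden illustration rather than the script actually used: the function names, the `puts("hi")` body and the use of `__declspec(dllexport)` are all choices made for this example.

```python
def make_c_source(n, prefix="f"):
    """Return C source text defining n trivial exported functions.

    The names (prefix + index) and the function body are hypothetical;
    the original test files are not shown in this post.
    """
    lines = ["#include <stdio.h>", ""]
    for i in range(n):
        lines.append(
            f'__declspec(dllexport) void {prefix}{i}(void) {{ puts("hi"); }}'
        )
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Write a file with 5000 functions, roughly the size range where
    # the Rust-generated DLL goes bad.
    with open("large.c", "w") as out:
        out.write(make_c_source(5000))
```

The resulting file could then be built with something like `gcc -shared -o large.dll large.c` under mingw, though whether that matches the original setup is a guess.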
Unfortunately in most other respects, the "good" and "bad" DLLs I have
sitting here look ... reasonably well-formed. They both have valid
PE/COFF headers and reasonably well-structured section tables. I've run
them through a number of diagnostic tools and even objdumped them and
fed that into kdiff3 for comparison. The bad one just, for unclear
reasons, has lost a big chunk of its midsection. The rest seems ok.
I have a _hunch_ that the problem lies in bfd or ld, the gnu side of the
equation, and not in the stuff coming out of llvm-mc. The reason I say
this is that the .o files coming out of llvm-mc (if I use --save-temps
on rustc) both have "reasonable" sizes:
large-ok.o :
section                                 size  addr
.text                                2920967     0
.data                                  18844     0
.bss$linkonce__ZN5large9loglevel2E         4     0
.note.rustc                           624221     0
Total                                3564036

large-bad.o :
section                                 size  addr
.text                                2921591     0
.data                                  18844     0
.bss$linkonce__ZN5large9loglevel2E         4     0
.note.rustc                           624355     0
Total                                3564794
That is, things only seem to go bad once we pass these through gcc (and
collect2, ld) for linkage into a DLL. Where _exactly_ it's going wrong,
however, remains a mystery to me. I'll look more tomorrow if you haven't
found it in the meantime. Suggestions welcome.
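
To put numbers on that hunch, the arithmetic can be spelled out as below, using only the totals quoted in this post (the dictionary and function names are made up for the example).

```python
# Totals copied from the listings in this post.
OBJ_TOTAL = {"ok": 3564036, "bad": 3564794}   # .o files from llvm-mc
DLL_TOTAL = {"ok": 3923851, "bad": 3778018}   # linked large.dll


def delta(totals):
    """Size change going from the ok build to the bad build, in bytes."""
    return totals["bad"] - totals["ok"]


if __name__ == "__main__":
    print("object file delta:", delta(OBJ_TOTAL))  # 758
    print("DLL delta:", delta(DLL_TOTAL))          # -145833
```

The linker input grows by 758 bytes for the one extra function, while the linked output shrinks by 145833 bytes, which is consistent with the loss happening at link time rather than in llvm-mc.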
-Graydon