Hi,

I'm posting this to rust-dev as well to solicit help from anyone who might smell a "familiar bug" lurking. We've been hunting a somewhat mysterious win32-specific crash lately: when monomorphization is turned on, it seems that the stage2 librustc.dll we generate has ... something broken in its exports. Client code that links against it just randomly jumps into heap poison (0xbaadf00d) rather than the correct target symbols. It seems to have something to do with binary size. Maybe. Marijn left me with this today, so this is a summary of what I've found so far:

-- snip --

It appears that the windows crash we're seeing is -- or may be! -- related to binary size, as you suggested. Somehow. I started playing around with synthetic tests (10,000 functions each of which does #error("hi")) and found I could reproduce the bug in isolation from the rustc build process. Here is what I have discovered:

   text    data     bss  filename
2093961  412701     116  large.dll    ok (3000 syms)
2790301  550701     116  large.dll    ok (4000 syms)
3138473  619701     116  large.dll    ok (4500 syms)
3216465  635157     116  large.dll    ok (4612 syms)
3255461  642885     116  large.dll    ok (4668 syms)
3260332  643847     116  large.dll    ok (4675 syms)
3262419  644265     116  large.dll    ok (4678 syms)
3116452  644399     116  large.dll   bad (4679 syms)  <-- whoops
3117153  644541     116  large.dll   bad (4680 syms)
3118543  644817     116  large.dll   bad (4682 syms)
3128285  646749     116  large.dll   bad (4696 syms)
3148482  650747     116  large.dll   bad (4725 syms)
3339977  688701     116  large.dll   bad (5000 syms)

That is, there's a threshold around the 3.2 MB text segment mark where "something goes wrong" and things stop working. I checked this with -O to confirm that it's "size" and not "number of symbols" that matters; with -O the symbol-count threshold is a fair bit higher.
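
Because every build in the table is either ok or bad and the flip happens at a single point, a threshold like this can be narrowed down by bisecting on the symbol count. A minimal sketch in Python, assuming a hypothetical is_ok(n) callable that generates a test with n functions, builds the DLL and reports whether a client runs cleanly against it. That build step is not shown; the stand-in below simply hard-codes the threshold from the table:

```python
from typing import Callable


def last_good(is_ok: Callable[[int], bool], lo: int, hi: int) -> int:
    """Return the largest n in [lo, hi) for which is_ok(n) holds.

    Assumes is_ok(lo) is True, is_ok(hi) is False, and that the result
    flips exactly once in between (ok below the threshold, bad above).
    """
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_ok(mid):
            lo = mid   # mid still works, threshold is above it
        else:
            hi = mid   # mid is broken, threshold is below it
    return lo


# Hypothetical stand-in for "generate n functions, build large.dll, run a client".
def fake_is_ok(n: int) -> bool:
    return n <= 4678


threshold = last_good(fake_is_ok, 3000, 5000)
print(threshold)  # 4678
```

Starting from the 3000 and 5000 rows, this needs about eleven builds rather than one per candidate count.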

What happens here is odd: the DLL _shrinks_, because it's losing valuable material. Its .rdata section (and relocs) gets eviscerated:

ok (4678 syms):

section              size
.text             2923576
.data               18852
.note.rustc        624221
.rdata              56452 <-- a bunch of data
.eh_frame             156
.bss                  116
.edata             191407
.idata               1136
.CRT                   24
.tls                   32
.reloc              90828 <-- a bunch of relocs
.debug_aranges        192
.debug_pubnames       709
.debug_pubtypes      1034
.debug_info          8451
.debug_abbrev        1838
.debug_line          1402
.debug_frame          900
.debug_str            192
.debug_loc           2285
.debug_ranges          48
Total             3923851


bad (4679 syms):

section              size
.text             2924200
.data               18852
.note.rustc        624355
.rdata                316 <--- mostly gone
.eh_frame             156
.bss                  116
.edata             191448
.idata               1136
.CRT                   24
.tls                   32
.reloc                332 <-- likewise
.debug_aranges        192
.debug_pubnames       709
.debug_pubtypes      1034
.debug_info          8451
.debug_abbrev        1838
.debug_line          1402
.debug_frame          900
.debug_str            192
.debug_loc           2285
.debug_ranges          48
Total             3778018
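
The comparison between the two listings can be automated. The sketch below parses two listings in the "section size" layout shown above and reports the sections that lost more than half their size. The rows are an abbreviated copy of the tables above; the helper names and the factor of 2 are arbitrary choices for illustration:

```python
OK_LISTING = """
section              size
.text             2923576
.data               18852
.note.rustc        624221
.rdata              56452 <-- a bunch of data
.edata             191407
.idata               1136
.reloc              90828 <-- a bunch of relocs
"""

BAD_LISTING = """
section              size
.text             2924200
.data               18852
.note.rustc        624355
.rdata                316 <--- mostly gone
.edata             191448
.idata               1136
.reloc                332 <-- likewise
"""


def parse_sizes(listing):
    """Map section name -> size, skipping header, Total and blank lines."""
    sizes = {}
    for line in listing.splitlines():
        fields = line.split()
        # Section rows start with a dotted name followed by a decimal size;
        # anything after the size (such as the "<--" notes) is ignored.
        if len(fields) >= 2 and fields[0].startswith(".") and fields[1].isdigit():
            sizes[fields[0]] = int(fields[1])
    return sizes


def shrunk_sections(ok, bad, factor=2):
    """Names of sections whose size in `bad` is less than 1/factor of `ok`."""
    return [name for name, size in ok.items() if bad.get(name, 0) * factor < size]


ok = parse_sizes(OK_LISTING)
bad = parse_sizes(BAD_LISTING)
for name in shrunk_sections(ok, bad):
    print(name, ok[name], "->", bad[name])
```

On these rows only .rdata and .reloc are reported; every other section is unchanged or slightly larger in the bad DLL.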

Further unfortunate details:

  - Microsoft's link.exe can't process the DLLs we're making.
  - Even the non-broken ones.
  - It can't process the DLLs gcc makes either.
  - It claims they're corrupt. #llvm hackers think this is common
    for stuff generated by gnu tools on windows, and that clang
    _might_ do better, but since it uses gnu ld for the final link,
    it might do the same anyways.

Attempting to reduce this to "not even rust's fault", I tried to reproduce using straight C files of unusual size. Here I did run into a bug -- a limit of 65535 symbols beyond which gcc starts silently mis-assigning DLL-import ordinals -- but on further investigation I _think_ that's just a design limit of the DLL import/export scheme. Link.exe refuses to touch such a file, complaining. So I _think_ that's an unrelated bug. Aside from that I haven't managed to reproduce it outside of "stuff generated from rust" yet. I'll have a go with objcopy tomorrow.
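
To illustrate why 65535 looks like a design limit rather than a bug: if an ordinal is stored in a 16-bit unsigned field, there is no way to represent symbol number 65536. A small Python sketch of both possible behaviours follows. How gcc actually mis-assigns the ordinals is not established here; the truncation shown is only an assumption about what a silent overflow would look like:

```python
import struct

ORDINAL_MAX = 0xFFFF  # largest value a 16-bit unsigned field can hold


def pack_ordinal(n: int) -> bytes:
    """Pack n as a little-endian 16-bit unsigned value.

    Raises struct.error if n does not fit, which is the "refuse and
    complain" behaviour.
    """
    return struct.pack("<H", n)


def truncated(n: int) -> int:
    """What a silent 16-bit truncation would turn n into (assumed behaviour)."""
    return n & ORDINAL_MAX


print(pack_ordinal(65535))  # b'\xff\xff'
print(truncated(65536))     # 0, aliases the first ordinal
print(truncated(65537))     # 1

try:
    pack_ordinal(65536)
except struct.error as exc:
    print("does not fit:", exc)
```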

Unfortunately in most other respects, the "good" and "bad" DLLs I have sitting here look ... reasonably well-formed. They both have valid PE/COFF headers and reasonably well-structured section tables. I've run them through a number of diagnostic tools and even objdumped them and fed that into kdiff3 for comparison. The bad one just, for unclear reasons, has lost a big chunk of its midsection. The rest seems ok.

I have a _hunch_ that the problem lies in bfd or ld, the gnu side of the equation, and not in the stuff coming out of llvm-mc. The reason I say this is that the .o files coming out of llvm-mc (if I use --save-temps on rustc) both have "reasonable" sizes:

large-ok.o  :
section                                 size   addr
.text                                2920967      0
.data                                  18844      0
.bss$linkonce__ZN5large9loglevel2E         4      0
.note.rustc                           624221      0
Total                                3564036

large-bad.o  :
section                                 size   addr
.text                                2921591      0
.data                                  18844      0
.bss$linkonce__ZN5large9loglevel2E         4      0
.note.rustc                           624355      0
Total                                3564794
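
As a quick arithmetic check on the numbers above, the growth from the ok build to the bad build can be compared between the .o files and the DLLs. A short Python sketch using only the sizes quoted in this message (the dictionary layout and helper name are just for illustration):

```python
# Sizes copied from the listings in this message.
ok_o = {".text": 2920967, ".note.rustc": 624221}
bad_o = {".text": 2921591, ".note.rustc": 624355}

ok_dll = {".text": 2923576, ".note.rustc": 624221, ".rdata": 56452, ".reloc": 90828}
bad_dll = {".text": 2924200, ".note.rustc": 624355, ".rdata": 316, ".reloc": 332}


def deltas(ok, bad):
    """Per-section size change going from the ok build to the bad build."""
    return {name: bad[name] - ok[name] for name in ok}


print(deltas(ok_o, bad_o))
# {'.text': 624, '.note.rustc': 134}
print(deltas(ok_dll, bad_dll))
# {'.text': 624, '.note.rustc': 134, '.rdata': -56136, '.reloc': -90496}
```

The extra symbol adds the same 624 bytes of .text and 134 bytes of .note.rustc in the object file and in the DLL, so the linker input grows smoothly across the threshold; only .rdata and .reloc in the linked output collapse.
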

That is, things only seem to go bad once we pass these through gcc (and collect2, ld) for linkage into a DLL. Where _exactly_ it's going wrong, however, remains a mystery to me. I'll look more tomorrow if you haven't found it in the meantime. Suggestions welcome.

-Graydon