[rust-dev] Linking

Rafael Ávila de Espíndola Fri, 24 Dec 2010 21:30:39 -0800

I have spent some time thinking about how we should implement linking. This is a request for comments on what I have found (don't worry, will be back to the parser in one sec :-) ).


* Introduction

There are two Rust features that make it "strange" from the linker's perspective.

*) No global namespace. Every crate has a local namespace, but there is no global one.
*) Flexible way to select crates. The user can ask for a version, a name or any other metadata.

Not having a global namespace is not a big problem for exporting symbols. We just mangle them. For example, a crate could export the function "mod1.foo" and the constant "mod2.foo".
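As a rough illustration of the mangling step, the sketch below folds a crate name and an item path into one linker-level symbol. The `mangle` function, the `_R` marker and the length-prefixed layout are all invented for this example; the message does not specify a scheme.

```rust
/// Hypothetical mangling: length-prefix every component so that distinct
/// (crate, path) pairs can never produce the same symbol.
fn mangle(crate_name: &str, path: &[&str]) -> String {
    let mut out = String::from("_R");
    for part in std::iter::once(&crate_name).chain(path.iter()) {
        // Each component is written as "<length><text>".
        out.push_str(&part.len().to_string());
        out.push_str(part);
    }
    out
}
```

Under these assumptions, `mangle("crate1", &["mod1", "foo"])` and `mangle("crate1", &["mod2", "foo"])` yield different symbols even though both items are called foo.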


* Handling undefined references on the shared libraries/executables.

Not having a global namespace is a problem for declaring dependencies. It requires that the symbol table map each undefined reference to a symbol in a particular crate.

No current linker (static or dynamic) supports all this. I propose that we design a system that can be implemented by providing our own linkers, but that can degrade somewhat gracefully to use the system linkers when possible.

Consider first the namespace problem. A program that uses the function foo from crate1 and the function foo from crate2 will have two undefined references to foo. Both Mach-O and PE/COFF should be able to express this. It is possible to implement direct binding on ELF, but I think only Solaris did it.

I tested that this works on OS X by creating a C program that references symbols from two libraries and then using a hex editor to rename the references :-)


The original C program:

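#include <stdio.h>

/* Assumed declarations: foobar1/foozed1 live in libf1, foobar2/foozed2 in libf2. */
void foobar1(void), foobar2(void);
extern int foozed1, foozed2;

int main(void) {
  foobar1();
  foobar2();
  printf("%d %d\n", foozed1, foozed2);
}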

after editing it:

$ dyldinfo -lazy_bind main
lazy binding information (from lazy_bind part of dyld info):
segment section          address    index  dylib            symbol
__DATA  __la_symbol_ptr  0x100001048 0x0000 libSystem        _exit
__DATA  __la_symbol_ptr  0x100001050 0x000C libf1            _foobar2
__DATA  __la_symbol_ptr  0x100001058 0x001B libf2            _foobar2
__DATA  __la_symbol_ptr  0x100001060 0x002A libSystem        _printf
$ dyldinfo -bind main
bind information:

segment section          address     type    weak addend dylib            symbol
__DATA  __nl_symbol_ptr  0x100001028 pointer           0 libf1            _foozed2
__DATA  __nl_symbol_ptr  0x100001030 pointer           0 libf2            _foozed2
__DATA  __nl_symbol_ptr  0x100001038 pointer           0 libSystem        dyld_stub_binder

Note how we now have two undefined references to foobar2 and two to foozed2, but each points to a different library. The dynamic linker does the expected thing and two different functions and variables are used. A similar trick should work on Windows but I haven't tested it.
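The difference between a flat lookup and the per-library binding seen in the dyldinfo output can be modelled with a small sketch. It is a toy model, not dyld's actual algorithm: `Library`, `resolve_flat` and `resolve_direct` are hypothetical names, and the addresses are plain integers.

```rust
use std::collections::HashMap;

/// One loaded library: its name and the symbols it exports (name -> address).
struct Library {
    name: &'static str,
    exports: HashMap<&'static str, usize>,
}

/// Convenience constructor for the toy model.
fn library(name: &'static str, exports: &[(&'static str, usize)]) -> Library {
    Library { name, exports: exports.iter().copied().collect() }
}

/// Flat namespace: the first library that exports the symbol wins.
fn resolve_flat(libs: &[Library], symbol: &str) -> Option<usize> {
    libs.iter().find_map(|lib| lib.exports.get(symbol).copied())
}

/// Per-library binding: the reference itself names the library to search.
fn resolve_direct(libs: &[Library], lib_name: &str, symbol: &str) -> Option<usize> {
    libs.iter()
        .find(|lib| lib.name == lib_name)
        .and_then(|lib| lib.exports.get(symbol).copied())
}
```

With libf1 and libf2 both exporting `_foobar2`, `resolve_flat` can only ever return the copy from whichever library comes first, while `resolve_direct` can reach either one.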

On ELF things are a bit harder. Hopefully we will get direct binding on Linux some day, but we have to decide what to do before that.

One way to do it is to write a table mapping undefined references to DT_NEEDED entries, just like it is done to implement direct binding. The startup code can then patch up the GOT and PLT. It is wasteful, but probably better than creating a new format. Using the regular symbol table also lets the user run nm, objdump, readelf, etc.
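A minimal sketch of what such a table and the startup patching could look like, written as a simulation rather than real loader code. The `DirectBinding` layout and `patch_got` are assumptions made for illustration; the GOT is modelled as a slice of addresses and each DT_NEEDED library as a map of its exports.

```rust
use std::collections::HashMap;

/// Hypothetical side-table entry: GOT slot `got_index` must hold `symbol`
/// as exported by the `needed_index`-th DT_NEEDED library.
struct DirectBinding {
    got_index: usize,
    needed_index: usize,
    symbol: &'static str,
}

/// Models the startup code: fill each GOT slot from the library its entry names.
/// `needed[i]` stands for the export table of the i-th DT_NEEDED library.
/// On failure, returns the symbol of the first entry that could not be applied.
fn patch_got(
    got: &mut [usize],
    needed: &[HashMap<&'static str, usize>],
    table: &[DirectBinding],
) -> Result<(), &'static str> {
    for entry in table {
        let address = needed
            .get(entry.needed_index)
            .and_then(|exports| exports.get(entry.symbol))
            .copied()
            .ok_or(entry.symbol)?;
        let slot = got.get_mut(entry.got_index).ok_or(entry.symbol)?;
        *slot = address;
    }
    Ok(())
}
```

The cost mentioned above shows up here as one extra table entry and one extra lookup per undefined reference, on top of the regular symbol table.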


* How to create the shared libraries/executables.

Using a hex editor is probably not something we want in the build pipeline :-) We have to figure out how to generate these strange libraries.


The normal pipeline is *.rs -> .bc -> .s -> .o -> .so/.dylib/.dll

While Rust and the shared objects are able to represent the dependencies we want, the middle stages are not, since those dependencies are normally computed by the static linker.

There was a bit of discussion about making LLVM able to produce libraries and executables directly. To get there the IL would have to be extended a bit so that a declaration could say what module it resolves to. While this would probably be the perfect solution for us, we need something before that.

We could extend the IL, the assembly files and the static linker, but that doesn't look a lot easier than implementing direct .so emission anyway.

A possible hack is to mangle the undefined references, link normally and then edit the shared library. In the example of a crate using a function 'foo' from crate 'bar' and another function 'foo' from crate 'zed', we would first produce a .o that had undefined references to bar.foo and zed.foo (the prefixes can be arbitrary). The linker will propagate these undefined references to the .so/.dylib/.dll and we can edit them.
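The editing pass could be sketched as follows, operating on a list of names instead of a real object file. The `Import` type, both function names and the choice of '.' as the separator are assumptions for this example; as noted above, the actual prefixes can be arbitrary.

```rust
/// An import as the final library should record it: which crate provides
/// the symbol, and the symbol's plain name inside that crate.
#[derive(Debug, PartialEq)]
struct Import {
    krate: String,
    symbol: String,
}

/// Splits a temporarily mangled reference such as "bar.foo" at the first '.'.
/// Names without a usable prefix (e.g. plain C symbols) yield None.
fn unmangle_reference(mangled: &str) -> Option<Import> {
    let (krate, symbol) = mangled.split_once('.')?;
    if krate.is_empty() || symbol.is_empty() {
        return None;
    }
    Some(Import { krate: krate.to_string(), symbol: symbol.to_string() })
}

/// Models the pass over the undefined references of an already linked library,
/// keeping only the references that carry a crate prefix.
fn rewrite_imports(undefined: &[&str]) -> Vec<Import> {
    undefined.iter().filter_map(|name| unmangle_reference(name)).collect()
}
```

Given the undefined references bar.foo, zed.foo and printf, this yields two imports that are both named foo but bound to different crates, and leaves printf to the system linker's normal handling.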


* How to handle the selection of crates

During static linking our linker can do any search we want, but we don't always control the dynamic linker. Even if we do implement a new dynamic linker, having support for searching every system library for the one that declares the metadata item 'color = "blue"' is probably a bit too expensive.

We should define a small set of metadata items (name and version most likely) that a crate must define. These can then be added to the name of the file (crate-foobar-2.3.dylib) and are the only metadata that we search for at runtime. The startup code could still check that 'color = "blue"' and produce an error if not.
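A short sketch of the two halves of this idea: deriving the file name from the mandatory items, and checking the remaining metadata at startup. The function names and the `extension` parameter are assumptions; only the crate-foobar-2.3.dylib layout comes from the example above.

```rust
use std::collections::HashMap;

/// Builds the file name searched for at run time from the two mandatory items.
/// `extension` is an assumed parameter to cover .so/.dylib/.dll.
fn crate_file_name(name: &str, version: &str, extension: &str) -> String {
    format!("crate-{}-{}.{}", name, version, extension)
}

/// Models the startup check: every extra item the user asked for must match
/// what the loaded crate declares. Returns the first mismatching key.
fn check_metadata<'a>(
    declared: &HashMap<&str, &str>,
    required: &[(&'a str, &str)],
) -> Result<(), &'a str> {
    for &(key, value) in required {
        if declared.get(key).copied() != Some(value) {
            return Err(key);
        }
    }
    Ok(())
}
```

The run-time search then only ever compares file names, and the more expensive metadata comparison happens once, on the single library that was found.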


* A Hack to avoid most of this for now.

We can just add a global_prefix item to the crate files. If not declared it could default to a sha1, for example. This would already be a lot better than what we have in Java since the user would still say


use foo;
import foo.bar;

and just have to add a

global_prefix "org.mozilla....."

once to foo's crate.
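How the prefix might feed into symbol names can be sketched like this. Everything here is hypothetical: the function names, the "org.example" prefix used in the usage note, and in particular the hash, where the standard library's `DefaultHasher` stands in for the sha1 mentioned above because Rust's standard library has no SHA-1.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Returns the prefix for a crate: the declared global_prefix if present,
/// otherwise a hash of the crate contents (DefaultHasher as a stand-in for sha1).
fn effective_prefix(declared: Option<&str>, crate_contents: &[u8]) -> String {
    match declared {
        Some(prefix) => prefix.to_string(),
        None => {
            let mut hasher = DefaultHasher::new();
            crate_contents.hash(&mut hasher);
            // 64-bit hash rendered as 16 hex digits.
            format!("{:016x}", hasher.finish())
        }
    }
}

/// Joins the prefix and the crate-local path into one globally unique name.
fn global_symbol(prefix: &str, local_path: &str) -> String {
    format!("{}.{}", prefix, local_path)
}
```

For instance, `global_symbol("org.example", "foo.bar")` gives a name that fits in the system linker's single global namespace, while user code keeps writing foo.bar.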

Opinions?

Cheers,
Rafael
