I have spent some time thinking about how we should implement linking. This is a request for comments on what I have found (don't worry, will be back to the parser in one sec :-) ).

* Introduction

Two Rust features make it "strange" from the linker's perspective.

*) No global namespace. Every crate has a local namespace, but there is no global one.
*) Flexible way to select crates. The user can ask for a version, a name or any other metadata.

Not having a global namespace is not a big problem for exporting symbols. We just mangle them. For example, a crate could export the function "mod1.foo" and the constant "mod2.foo".
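To make the mangling step concrete, here is a minimal sketch in C. The scheme itself (a "_R" prefix followed by length-prefixed path components) and the name mangle_path are assumptions invented for this illustration, not a proposal for the real scheme.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical scheme: "_R" followed by each path component prefixed
 * by its length, so {"mod1", "foo"} and {"mod2", "foo"} get distinct
 * flat names. Returns 0 on success, -1 if the buffer is too small. */
int mangle_path(char *out, size_t out_size,
                const char *const *parts, size_t n_parts)
{
    assert(out != NULL && parts != NULL);

    int n = snprintf(out, out_size, "_R");
    if (n < 0 || (size_t)n >= out_size)
        return -1;
    size_t used = (size_t)n;

    for (size_t i = 0; i < n_parts; i++) {
        n = snprintf(out + used, out_size - used, "%zu%s",
                     strlen(parts[i]), parts[i]);
        if (n < 0 || (size_t)n >= out_size - used)
            return -1;
        used += (size_t)n;
    }
    return 0;
}
```

With this scheme the two exports above would get different symbol names even though both items are called foo.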

* Handling undefined references on the shared libraries/executables.

Not having a global namespace is a problem for declaring dependencies. It requires that the symbol table map each undefined reference to a symbol in a particular crate.

No current linker (static or dynamic) supports all this. I propose that we design a system that can be implemented by providing our own linkers, but that can degrade somewhat gracefully to use the system linkers when possible.

Consider first the namespace problem. A program that uses the function foo from crate1 and the function foo from crate2 will have two undefined references to foo. Both Mach-O and PE/COFF should be able to handle this, since their import information records which library each undefined reference binds to. It is possible to implement direct binding on ELF, but I think only Solaris did it.

I tested that this works on OS X by creating a C program that references symbols from two libraries and then using a hex editor to rename the references :-)

The original C program:

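#include <stdio.h>
/* Declarations assumed; the symbols live in libf1 and libf2. */
void foobar1(void);
void foobar2(void);
extern int foozed1, foozed2;
int main(void) {
  foobar1();
  foobar2();
  printf("%d %d\n", foozed1, foozed2);
}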

After editing the resulting binary:

$ dyldinfo -lazy_bind main
lazy binding information (from lazy_bind part of dyld info):
segment section          address    index  dylib            symbol
__DATA  __la_symbol_ptr  0x100001048 0x0000 libSystem        _exit
__DATA  __la_symbol_ptr  0x100001050 0x000C libf1            _foobar2
__DATA  __la_symbol_ptr  0x100001058 0x001B libf2            _foobar2
__DATA  __la_symbol_ptr  0x100001060 0x002A libSystem        _printf
$ dyldinfo -bind main
bind information:
segment section          address     type    weak addend dylib            symbol
__DATA  __nl_symbol_ptr  0x100001028 pointer      0      libf1            _foozed2
__DATA  __nl_symbol_ptr  0x100001030 pointer      0      libf2            _foozed2
__DATA  __nl_symbol_ptr  0x100001038 pointer      0      libSystem        dyld_stub_binder

Note how we now have two undefined references to foobar2 and two to foozed2, but each points to a different library. The dynamic linker does the expected thing, and two different functions and two different variables are used. A similar trick should work on Windows, but I haven't tested it.
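The binding model this relies on can be sketched in a few lines of C. This is only a simulation under stated assumptions: the struct and function names are made up, and an int value stands in for a symbol's address. It is not how dyld is actually implemented.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One exported symbol of a simulated library. */
struct export_entry {
    const char *name;
    int value;              /* stands in for the symbol's address */
};

/* A simulated loaded library: a name and its export table. */
struct library {
    const char *name;
    const struct export_entry *exports;
    size_t n_exports;
};

/* An undefined reference that remembers which library it binds to. */
struct import_ref {
    const char *symbol;
    size_t lib_index;       /* index into the image's library list */
};

/* Look the symbol up only in the library the reference names.
 * Returns NULL if the index is out of range or the symbol is missing. */
const struct export_entry *resolve(const struct library *libs, size_t n_libs,
                                   const struct import_ref *ref)
{
    assert(libs != NULL && ref != NULL);
    if (ref->lib_index >= n_libs)
        return NULL;

    const struct library *lib = &libs[ref->lib_index];
    for (size_t i = 0; i < lib->n_exports; i++) {
        if (strcmp(lib->exports[i].name, ref->symbol) == 0)
            return &lib->exports[i];
    }
    return NULL;
}
```

Because the lookup never consults any other library, two references with the same symbol name but different library indices resolve to different definitions.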

On ELF things are a bit harder. Hopefully we will get direct binding on Linux some day, but we have to decide what to do before that.

One way to do it is to write a table mapping undefined references to DT_NEEDED entries, just as is done to implement direct binding. The startup code can then patch up the GOT and PLT. It is wasteful, but probably better than creating a new format. Using the regular symbol table also lets the user run nm, objdump, readelf, etc.
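A rough sketch of what that startup patching could look like, in C. Everything here is hypothetical: the table layout, the names direct_bind and apply_direct_binds, and the lookup callback. The "GOT" is just an array of pointers, and demo_lookup fakes two libraries so the sketch is self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One row of the hypothetical side table. */
struct direct_bind {
    size_t got_slot;        /* which GOT entry to patch */
    size_t needed_index;    /* which DT_NEEDED entry must supply the symbol */
    const char *symbol;
};

/* Looks `symbol` up only in the library behind DT_NEEDED entry
 * `needed_index`. Returns NULL if it is not found there. */
typedef void *(*lookup_fn)(size_t needed_index, const char *symbol);

/* Walk the table and patch each slot. Returns the number of rows
 * that could not be bound; their slots are left untouched. */
size_t apply_direct_binds(void **got, size_t got_len,
                          const struct direct_bind *table, size_t n,
                          lookup_fn lookup)
{
    assert(got != NULL && table != NULL && lookup != NULL);
    size_t failures = 0;

    for (size_t i = 0; i < n; i++) {
        void *addr = NULL;
        if (table[i].got_slot < got_len)
            addr = lookup(table[i].needed_index, table[i].symbol);
        if (addr == NULL) {
            failures++;
            continue;
        }
        got[table[i].got_slot] = addr;
    }
    return failures;
}

/* Stand-ins for a symbol "foo" defined in two different libraries. */
static int f1_foo = 1;
static int f2_foo = 2;

/* Fake lookup used only for demonstration. */
void *demo_lookup(size_t needed_index, const char *symbol)
{
    if (strcmp(symbol, "foo") != 0)
        return NULL;
    if (needed_index == 0)
        return &f1_foo;
    if (needed_index == 1)
        return &f2_foo;
    return NULL;
}
```

In a real implementation the lookup would search the symbol table of the already loaded library, and the same walk would cover PLT entries; both are left out here.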

* How to create the shared libraries/executables.

Using a hex editor is probably not something we want in the build pipeline :-) We have to figure out how to generate these strange libraries.

The normal pipeline is *.rs -> .bc -> .s -> .o -> .so/.dylib/.dll

While Rust and the shared objects are able to represent the dependencies we want, the middle stages are not, since those dependencies are normally computed by the static linker.

There was a bit of discussion about making LLVM able to produce libraries and executables directly. To get there, the IL would have to be extended a bit so that a declaration could say what module it resolves to. While this would probably be the perfect solution for us, we need something before that.

We could extend the IL, the assembly files and the static linker, but that doesn't look a lot easier than implementing direct .so emission anyway.

A possible hack is to mangle the undefined references, link normally and then edit the shared library. In the example of a crate using a function 'foo' from crate 'bar' and another function 'foo' from crate 'zed', we would first produce a .o that had undefined references to bar.foo and zed.foo (the prefixes can be arbitrary). The linker will propagate these undefined references to the .so/.dylib/.dll and we can edit them.
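The editing step needs to recover the crate and the real symbol name from each mangled reference. A minimal C sketch of that split follows, assuming the prefix is separated from the symbol by the first '.' as in bar.foo. The separator and the name split_crate_prefix are assumptions for this example, since the prefixes can be arbitrary.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Splits "bar.foo" at the first '.'. Copies the crate prefix ("bar")
 * into `crate` and returns a pointer to the unprefixed symbol ("foo")
 * inside `mangled`. Returns NULL if there is no separator, if either
 * part is empty, or if the prefix does not fit in `crate`. */
const char *split_crate_prefix(const char *mangled,
                               char *crate, size_t crate_size)
{
    assert(mangled != NULL && crate != NULL);

    const char *dot = strchr(mangled, '.');
    if (dot == NULL || dot == mangled || dot[1] == '\0')
        return NULL;

    size_t len = (size_t)(dot - mangled);
    if (len >= crate_size)
        return NULL;

    memcpy(crate, mangled, len);
    crate[len] = '\0';
    return dot + 1;
}
```

The post-link tool would then rewrite the reference to the returned symbol name and point it at the library that corresponds to the crate prefix.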

* How to handle the selection of crates

During static linking our linker can do any search we want, but we don't always control the dynamic linker. Even if we do implement a new dynamic linker, having support for searching every system library for the one that declares the metadata item 'color = "blue"' is probably a bit too expensive.

We should define a small set of metadata items (name and version most likely), that a crate must define. These can then be added to the name of the file (crate-foobar-2.3.dylib) and are the only metadata that we search for at runtime. The startup code could still check that 'color = "blue"' and produce an error if not.
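As an illustration of the naming convention, here is a small C sketch that builds the file name from the two mandatory items. The function name crate_file_name and the exact "crate-NAME-VERSION.EXT" pattern are assumptions modelled on the crate-foobar-2.3.dylib example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Builds "crate-<name>-<version>.<ext>" into `out`.
 * Returns 0 on success, -1 if name or version is empty or the
 * result does not fit in the buffer. */
int crate_file_name(char *out, size_t out_size,
                    const char *name, const char *version, const char *ext)
{
    assert(out != NULL && name != NULL && version != NULL && ext != NULL);

    if (strlen(name) == 0 || strlen(version) == 0)
        return -1;

    int n = snprintf(out, out_size, "crate-%s-%s.%s", name, version, ext);
    if (n < 0 || (size_t)n >= out_size)
        return -1;
    return 0;
}
```

At runtime only this file name would be searched for; any other metadata, such as color = "blue", would be checked by the startup code after the library is found.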

* A Hack to avoid most of this for now.

We can just add a global_prefix item to the crate files. If not declared, it could default to a SHA-1 hash, for example. This would already be a lot better than what we have in Java, since the user would still say

use foo;
import foo.bar;

and just have to add a

global_prefix "org.mozilla....."

once to foo's crate.

Opinions?

Cheers,
Rafael

_______________________________________________
Rust-dev mailing list
[email protected]
https://mail.mozilla.org/listinfo/rust-dev
