I have spent some time thinking about how we should implement linking.
This is a request for comments on what I have found (don't worry, I will
be back to the parser in a second :-) ).
* Introduction
There are two Rust features that make it "strange" from the linker's
perspective:
*) No global namespace. Every crate has a local namespace, but there is
no global one.
*) Flexible way to select crates. The user can ask for a version, a name
or any other metadata.
Not having a global namespace is not a big problem for exporting
symbols. We just mangle them. For example, a crate could export the
function "mod1.foo" and the constant "mod2.foo".
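To make "we just mangle them" concrete, here is a minimal sketch of one possible scheme. It is an assumption for illustration, not what rustc does: each component of a dotted path is length-prefixed, borrowing the nested-name shape of the Itanium C++ ABI (_ZN...E) without any type encoding. With it, "mod1.foo" and "mod2.foo" become the distinct flat symbols _ZN4mod13fooE and _ZN4mod23fooE. The function name mangle_path is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical mangler: turns a dotted path such as "mod1.foo" into a
 * flat linker symbol "_ZN4mod13fooE" by length-prefixing each component
 * (the nested-name shape of the Itanium C++ ABI, without type encoding).
 * Returns 0 on success, -1 if the output buffer is too small. */
int mangle_path(const char *path, char *out, size_t outlen) {
    int n = snprintf(out, outlen, "_ZN");
    if (n < 0 || (size_t)n >= outlen) return -1;
    size_t used = (size_t)n;
    const char *p = path;
    while (*p) {
        size_t len = strcspn(p, ".");   /* length of the next component */
        n = snprintf(out + used, outlen - used, "%zu%.*s", len, (int)len, p);
        if (n < 0 || (size_t)n >= outlen - used) return -1;
        used += (size_t)n;
        p += len;
        if (*p == '.') p++;             /* skip the separator */
    }
    n = snprintf(out + used, outlen - used, "E");
    if (n < 0 || (size_t)n >= outlen - used) return -1;
    return 0;
}
```

Length prefixes (rather than keeping the dots) avoid any ambiguity when a component itself contains an unusual character, and the result is a plain identifier that every object format accepts.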
* Handling undefined references in the shared libraries/executables.
Not having a global namespace is a problem for declaring dependencies.
It requires that the symbol table map each undefined reference to a
symbol in a particular crate.
No current linker (static or dynamic) supports all this. I propose that
we design a system that can be implemented by providing our own linkers,
but that can degrade somewhat gracefully to use the system linkers when
possible.
Consider first the namespace problem. A program that uses the function
foo from crate1 and the function foo from crate2 will have two undefined
references to foo. Both Mach-O (with its two-level namespace) and PE/COFF
(whose imports are listed per DLL) should handle this. It is possible
to implement direct binding on ELF, but I think only Solaris has done it.
I tested that this works on OS X by creating a C program that references
symbols from two libraries and then using a hex editor to rename the
references :-)
The original C program:
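#include <stdio.h>
/* Defined in the two test libraries (libf1, libf2). */
extern void foobar1(void), foobar2(void);
extern int foozed1, foozed2;
int main(void) {
    foobar1();
    foobar2();
    printf("%d %d\n", foozed1, foozed2);
    return 0;
}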
After editing the binary:
$ dyldinfo -lazy_bind main
lazy binding information (from lazy_bind part of dyld info):
segment section          address     index  dylib      symbol
__DATA  __la_symbol_ptr  0x100001048 0x0000 libSystem  _exit
__DATA  __la_symbol_ptr  0x100001050 0x000C libf1      _foobar2
__DATA  __la_symbol_ptr  0x100001058 0x001B libf2      _foobar2
__DATA  __la_symbol_ptr  0x100001060 0x002A libSystem  _printf
$ dyldinfo -bind main
bind information:
segment section          address     type    weak addend dylib      symbol
__DATA  __nl_symbol_ptr  0x100001028 pointer           0 libf1      _foozed2
__DATA  __nl_symbol_ptr  0x100001030 pointer           0 libf2      _foozed2
__DATA  __nl_symbol_ptr  0x100001038 pointer           0 libSystem  dyld_stub_binder
Note that we now have two undefined references each to foobar2 and
foozed2, but each one points to a different library. The dynamic linker
does the expected thing, and two different functions and variables are
used. A similar trick should work on Windows, but I haven't tested it.
On ELF things are a bit harder. Hopefully we will get direct binding on
Linux some day, but we have to decide what to do before that.
One way to do it is to write a table mapping each undefined reference to
a DT_NEEDED entry, just as is done to implement direct binding. The
startup code can then patch up the GOT and PLT. It is wasteful, but
probably better than creating a new format. Using the regular symbol
table also lets the user run nm, objdump, readelf, etc.
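A minimal, self-contained simulation of that idea, under stated assumptions: each undefined reference carries the index of the DT_NEEDED library it must bind to, and a startup routine fills the corresponding GOT-like slot by looking the name up only in that library. Real startup code would walk the ELF dynamic section and the loaded objects' symbol tables; the in-memory struct library export lists, the slots holding plain data addresses, and the names patch_got, struct binding and struct sym_export are all hypothetical stand-ins.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for a loaded shared object: a name plus its exported symbols. */
struct sym_export { const char *name; void *addr; };
struct library {
    const char *soname;
    const struct sym_export *exports;
    size_t nexports;
};

/* One entry of the hypothetical side table: undefined symbol name plus the
 * index of the DT_NEEDED library it must be resolved against. */
struct binding { const char *symbol; size_t needed_index; };

static void *lookup_in(const struct library *lib, const char *name) {
    for (size_t i = 0; i < lib->nexports; i++)
        if (strcmp(lib->exports[i].name, name) == 0)
            return lib->exports[i].addr;
    return NULL;
}

/* Patch each GOT-like slot got[i] using bindings[i]. Unlike a flat ELF
 * lookup, only the library named by the table is searched, so two
 * references to the same name can bind to different libraries.
 * Returns the number of unresolved references (their slots are NULL). */
size_t patch_got(void **got, const struct binding *bindings, size_t n,
                 const struct library *needed, size_t nneeded) {
    size_t unresolved = 0;
    for (size_t i = 0; i < n; i++) {
        void *addr = NULL;
        if (bindings[i].needed_index < nneeded)
            addr = lookup_in(&needed[bindings[i].needed_index],
                             bindings[i].symbol);
        got[i] = addr;
        if (addr == NULL) unresolved++;
    }
    return unresolved;
}
```

With two libraries that both export "foo", a table of { "foo", 0 } and { "foo", 1 } fills the two slots with two different addresses, which is exactly the behavior the Mach-O experiment above showed.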
* How to create the shared libraries/executables.
Using a hex editor is probably not something we want in the build
pipeline :-) We have to figure out how to generate these strange libraries.
The normal pipeline is .rs -> .bc -> .s -> .o -> .so/.dylib/.dll
While the Rust source and the final shared objects can represent the
dependencies we want, the intermediate stages cannot, since those
dependencies are normally computed by the static linker.
There was a bit of discussion about making LLVM able to produce
libraries and executables directly. To get there, the IL would have to be
extended a bit so that a declaration could say which module it resolves
to. While this would probably be the perfect solution for us, we need
something before that.
We could extend the IL, the assembly files and the static linker, but
that doesn't look much easier than implementing direct .so emission anyway.
A possible hack is to mangle the undefined references, link normally and
then edit the shared library. In the example of a crate using a function
'foo' from crate 'bar' and another function 'foo' from crate 'zed', we
would first produce a .o that had undefined references to bar.foo and
zed.foo (the prefixes can be arbitrary). The linker will propagate these
undefined references to the .so/.dylib/.dll, and we can then edit them.
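The post-link editing step has to recover, from each prefixed undefined reference, which crate it belongs to and which symbol name to bind. A minimal sketch, assuming the hypothetical convention that everything before the first '.' is the crate name (the text only says the prefix can be arbitrary); the helper name split_reference is made up for illustration. A real tool would then rewrite the symbol table entry to the bare name and record which library it must bind to.

```c
#include <stddef.h>
#include <string.h>

/* Split a prefixed undefined reference such as "bar.foo" into the crate
 * ("bar") and the symbol to bind ("foo"). Assumes the hypothetical
 * convention that the crate prefix ends at the first '.'.
 * Returns 0 on success, -1 if there is no prefix or crate is too small. */
int split_reference(const char *ref, char *crate, size_t crate_len,
                    const char **symbol) {
    const char *dot = strchr(ref, '.');
    if (dot == NULL || dot == ref || dot[1] == '\0')
        return -1;                      /* not a prefixed reference */
    size_t n = (size_t)(dot - ref);
    if (n + 1 > crate_len)
        return -1;                      /* crate name does not fit */
    memcpy(crate, ref, n);
    crate[n] = '\0';
    *symbol = dot + 1;                  /* points into ref, no copy */
    return 0;
}
```

Splitting at the first '.' keeps crate-local paths intact: "zed.mod1.foo" yields crate "zed" and symbol "mod1.foo".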
* How to handle the selection of crates
During static linking our linker can do any search we want, but we don't
always control the dynamic linker. Even if we do implement a new dynamic
linker, having support for searching every system library for the one
that declares the metadata item 'color = "blue"' is probably a bit too
expensive.
We should define a small set of metadata items (most likely name and
version) that every crate must define. These can then be added to the
file name (crate-foobar-2.3.dylib) and are the only metadata that we
search for at runtime. The startup code could still check that
'color = "blue"' holds and produce an error if it does not.
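A minimal sketch of that split, assuming the file-name pattern crate-<name>-<version><suffix> from the example above: only name and version go into the path the dynamic linker searches for, and any other metadata is checked once the crate is loaded. The function names and the key/value array used to store a crate's metadata are assumptions for illustration.

```c
#include <stdio.h>
#include <string.h>

struct meta_item { const char *key; const char *value; };

/* Build the file name searched for at runtime, e.g. "crate-foobar-2.3.dylib".
 * suffix is the platform extension (".so", ".dylib", ".dll").
 * Returns 0 on success, -1 if the name does not fit. */
int crate_file_name(char *out, size_t outlen, const char *name,
                    const char *version, const char *suffix) {
    int n = snprintf(out, outlen, "crate-%s-%s%s", name, version, suffix);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}

/* Startup-time check for metadata not encoded in the file name, such as
 * color = "blue". Returns 1 if the crate declares key = value, else 0. */
int crate_has_meta(const struct meta_item *items, size_t n,
                   const char *key, const char *value) {
    for (size_t i = 0; i < n; i++)
        if (strcmp(items[i].key, key) == 0 &&
            strcmp(items[i].value, value) == 0)
            return 1;
    return 0;
}
```

The cost stays bounded: the dynamic linker does one ordinary file lookup, and the extra metadata check touches only the crate that was actually loaded rather than every library on the system.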
* A hack to avoid most of this for now.
We can just add a global_prefix item to the crate files. If it is not
declared, it could default to a SHA-1 hash, for example. This would
already be a lot better than what we have in Java, since the user would
still say
use foo;
import foo.bar;
and just have to add a
global_prefix "org.mozilla....."
once to foo's crate.
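A minimal sketch of how the exported symbol name could be formed under this hack. It assumes the global prefix is simply prepended to the crate-local path with a '.' separator and, to stay short, substitutes a 64-bit FNV-1a hash for the SHA-1 default mentioned above; what exactly would be hashed is not specified in the text, so the crate_id input is an assumption. A real implementation would use an actual SHA-1. global_symbol and default_prefix are hypothetical names.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for the SHA-1 default: 64-bit FNV-1a over some crate identifier,
 * printed as 16 hex digits. Not collision resistant; illustration only. */
void default_prefix(const char *crate_id, char out[17]) {
    uint64_t h = 0xcbf29ce484222325ULL;         /* FNV offset basis */
    for (const unsigned char *p = (const unsigned char *)crate_id; *p; p++) {
        h ^= *p;
        h *= 0x100000001b3ULL;                  /* FNV prime */
    }
    snprintf(out, 17, "%016llx", (unsigned long long)h);
}

/* Exported name = prefix + "." + crate-local path,
 * e.g. "org.example" + "mod1.foo" -> "org.example.mod1.foo".
 * Returns 0 on success, -1 if the result does not fit. */
int global_symbol(char *out, size_t outlen, const char *prefix,
                  const char *path) {
    size_t lp = strlen(prefix), lq = strlen(path);
    if (lp + 1 + lq + 1 > outlen) return -1;
    memcpy(out, prefix, lp);
    out[lp] = '.';
    memcpy(out + lp + 1, path, lq + 1);         /* copies the terminating NUL */
    return 0;
}
```

Because the prefix is attached only when the crate exports symbols, users keep writing short names in their code; only the crate author chooses (or defaults) the long prefix once.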
Opinions?
Cheers,
Rafael
_______________________________________________
Rust-dev mailing list
https://mail.mozilla.org/listinfo/rust-dev