Ok. Let's develop a workaround for ELF and then maybe someday petition
ELF's linker-authors to support direct binding. Michael Meeks appears to
have already authored a patch, though Ulrich doesn't like it.

http://sourceware.org/ml/binutils/2005-10/msg00436.html

I would advise against any plan that involves Ulrich. We need a workaround for ELF for now, but it is a useful thought experiment to know that the workaround can be dropped one day, maybe when LLVM's support for object files develops. They intend to support a static linker, and a dynamic one is not that much harder. Another option is eglibc, which might be more open to it.

I tested that this works on OS X by creating a C program that references
symbols from two libraries and then using a hex editor to rename the
references :-)

Can't you coax their linker into doing so with some kind of
direct-binding flag?

The static linker? How do you even get the information to it? Let's say a file has calls to two functions called foo in two different crates. Without crate-dependent mangling, the .s file will have two "call foo" statements.


Using a hex editor is probably not something we want in the build
pipeline :-) We have to figure out how to generate these strange
libraries.

The normal pipeline is *.rs -> .bc -> .s -> .o -> .so/.dylib/.dll

His patch encoded a new .direct section, I think. But we could approach
it differently. I don't particularly want to *depend* on a solution that
requires glibc changes. I think you're right that we'll need to work
around for ELF, at least. Maybe everything, since LLVM itself has
similar limitations.

Michael's patch adds that functionality to .so files, so they become as good as .dll or .dylib. The problem is that the information put in there is created by the static linker, not provided by the compiler. The compiler has no channel to pass it.

A simple way to look at it is that the "compile, link" stages can be looked at as "compile, resolve, link". In Rust we want the compiler to do the first two. In a traditional ELF system most of the resolve is done at runtime, after the link; on Darwin and Windows it is done at static link time.

A possible hack is to mangle the undefined references, link normally and
then edit the shared library. In the example of a crate using a function
'foo' from crate 'bar' and another function 'foo' from crate 'zed', we
would first produce a .o that had undefined references to bar.foo and
zed.foo (the prefixes can be arbitrary). The linker will propagate these
undefined references to the .so/.dylib/.dll and we can edit them.

Nah, too painful.

Agreed.

During static linking our linker can do any search we want, but we don't
always control the dynamic linker. Even if we do implement a new dynamic
linker, having support for searching every system library for the one
that declares the metadata item 'color = "blue"' is probably a bit too
expensive.

Agreed, sigh. This is the part where it's actually useful to have a
'-Bdirect' flag supported in the linker, eh?

More or less. That flag would bind a name to a library; it would not control exactly which runtime file provides that library.

* A Hack to avoid most of this for now.

We can just add a global_prefix item to the crate files. If not declared
it could default to a sha1 for example. This would already be a lot
better than what we have in Java since the user would still say

use foo;
import foo.bar;

and just have to add a

global_prefix "org.mozilla....."

once to foo's crate.

Opinions?

Going down this road makes me further question whether we are just
fighting a losing battle against baked-in assumptions in the system tools.

I think it is just a very slow victory :-)

(We discussed this today on IRC, so I'm just writing up what we got to)

Let's propose this instead: "ideally no global namespace, but we're
going to fake it just a bit on this toolchain": IOW provide only
"vanishingly small chance of collision in the top-level global
namespace". Assume CHF is some cryptographic hash, sha1 or sha3 or
whatever. Perhaps truncated to something non-horrible like 16 hex digits
(64-bits). Then:

1. Define two crate metadata tags as mandatory: name + version. Call
   these $CNAME and $CVERS.
2. Define $CMETA as CHF(all non-$CNAME, non-$CVERS metadata in a crate).
3. Define $CMH as CHF($CMETA).
4. Compile a crate down to $CNAME-$CMH-$CVERS.dylib

Each crate is "almost globally uniquely named" and the search done by
the compiler will almost-always (with vanishingly small odds against) be
repeated precisely by the runtime linker. If the linker supports
versioned symbols, it can use that logic to handle type-compatible
version upgrades on the same-named crate.
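
To make the four naming steps concrete, here is a minimal sketch in Python. It is not what the compiler does; it only follows the steps as written. Assumptions on my side: CHF is sha1 truncated to 16 hex digits, the metadata is a flat dict with keys "name" and "vers", and the remaining items are serialized as sorted "key=value" lines before hashing. The helper names chf and crate_file_name are made up for the example.

```python
import hashlib


def chf(data: bytes, hex_digits: int = 16) -> str:
    """CHF stand-in: sha1, truncated to `hex_digits` hex characters (16 = 64 bits)."""
    return hashlib.sha1(data).hexdigest()[:hex_digits]


def crate_file_name(metadata: dict, ext: str = "dylib") -> str:
    """Build $CNAME-$CMH-$CVERS.<ext> from a crate's metadata items."""
    cname = metadata["name"]   # $CNAME (mandatory)
    cvers = metadata["vers"]   # $CVERS (mandatory)

    # Everything except name and version feeds the hash. Sorting makes the
    # result independent of declaration order (an assumption of this sketch).
    rest = sorted((k, v) for k, v in metadata.items() if k not in ("name", "vers"))
    encoded = "\n".join(f"{k}={v}" for k, v in rest).encode("utf-8")

    cmeta = chf(encoded)               # step 2: $CMETA
    cmh = chf(cmeta.encode("ascii"))   # step 3: $CMH = CHF($CMETA)
    return f"{cname}-{cmh}-{cvers}.{ext}"  # step 4


if __name__ == "__main__":
    print(crate_file_name({"name": "foo", "vers": "0.1", "color": "blue"}))
```

Two crates that differ only in a metadata item such as color end up with different file names, which is the property the runtime search relies on.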

Then for symbols:

1. Define typeof(sym) to be a canonical encoding of a type, with nominal
   types expanding out to a mangled name as below.
2. Define STH(sym) as CHF($CNAME,$CMH,typeof(sym))
3. Flatten names by module path, so
       mod M { fn x() { .. } }
   gets flattened to:
       M.x
4. Define mangled(M.x) as $sth....@$cvers
5. Emit the contents of symbol M.x named by global symbol mangled(M.x)
6. Emit a string typeof(M.x) into local symbol "M.x", put in a
   non-mapped section that is only made use of during compilation and/or
   debugging.
7. .ll, .s, .o and .dylib don't need to learn anything new. Debuggers
   need to learn how to work with the local symbols to extract type info
   and how to demangle the global names.
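
A rough Python sketch of the STH and flattening steps, under stated assumptions: CHF is sha1 truncated to 16 hex digits, the fields are length-prefixed before hashing so they cannot run into each other, and the type encodings are invented strings standing in for whatever canonical encoding typeof(sym) ends up being. The sketch stops at STH because the exact layout of mangled(M.x) is elided above. Function names are hypothetical.

```python
import hashlib


def chf(*parts: str, hex_digits: int = 16) -> str:
    """CHF stand-in: sha1 over length-prefixed fields, truncated to 64 bits."""
    h = hashlib.sha1()
    for part in parts:
        raw = part.encode("utf-8")
        # Length prefix keeps ("ab", "c") distinct from ("a", "bc").
        h.update(len(raw).to_bytes(8, "big"))
        h.update(raw)
    return h.hexdigest()[:hex_digits]


def flatten(module_path: list, name: str) -> str:
    """mod M { fn x() { .. } }  ->  "M.x"."""
    return ".".join(list(module_path) + [name])


def sth(cname: str, cmh: str, type_encoding: str) -> str:
    """STH(sym) = CHF($CNAME, $CMH, typeof(sym))."""
    return chf(cname, cmh, type_encoding)


if __name__ == "__main__":
    cmh = "0123456789abcdef"       # placeholder $CMH, not a real hash
    ty = "fn(int) -> int"          # invented type encoding
    print(flatten(["M"], "x"), sth("bar", cmh, ty))
    print(flatten(["M"], "x"), sth("zed", cmh, ty))
```

As defined, STH depends only on the crate name, the metadata hash and the type, so the same-typed foo in crate bar and in crate zed get different hashes.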

I assumed debuggers would just use DWARF, as they do now.


This means that the linker, if it's clever enough to do per-symbol
versioning, can do so and support an upgrade on a type-compatible
function. Each function winds up prefixed by the crate metadata and its
type signature compressed to a fixed size, but the full type info can be
recovered (eg. by the compiler or debugger) by reading the local symbols.

Sound ok? Is this what we were agreeing to on IRC?

I think so. Just one suggestion and two observations.

It might be better to define CMETA with a whitelist. As a user I would probably be surprised if fixing a typo in the my_random_note metadata item required relinking all programs using this crate.
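
A small Python sketch of what I mean, with assumptions spelled out: the whitelist contents ("prefix" only) are hypothetical, CHF is sha1 truncated to 16 hex digits, and the metadata is a flat dict serialized as sorted "key=value" lines.

```python
import hashlib

# Hypothetical whitelist: only these metadata items influence CMETA.
HASHED_KEYS = frozenset({"prefix"})


def cmeta(metadata: dict, whitelist=HASHED_KEYS, hex_digits: int = 16) -> str:
    """Hash only the whitelisted metadata items (sha1 truncated to 64 bits)."""
    items = sorted((k, v) for k, v in metadata.items() if k in whitelist)
    encoded = "\n".join(f"{k}={v}" for k, v in items).encode("utf-8")
    return hashlib.sha1(encoded).hexdigest()[:hex_digits]


if __name__ == "__main__":
    before = {"prefix": "org.example", "my_random_note": "teh note"}
    after = {"prefix": "org.example", "my_random_note": "the note"}
    # Fixing the typo in the note leaves the hash, and so the symbol names, alone.
    print(cmeta(before) == cmeta(after))
```

With this, editing a free-form note does not force a relink, while changing a whitelisted item still yields a new hash.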

The reason we need some user-visible metadata to be hashed is that two identically typed and named functions in two crates can collide. The user can then add a "prefix" metadata item (the name is not important) to at least one of the crates to avoid the collision.

I do hope that we will have host tools that support multiple namespaces one day (not just the dynamic linker) :-)

-Graydon

Cheers,
Rafael
_______________________________________________
Rust-dev mailing list
[email protected]
https://mail.mozilla.org/listinfo/rust-dev