On 10-12-24 09:30 PM, Rafael Ávila de Espíndola wrote:

> Not having a global namespace is not a big problem for exporting
> symbols. We just mangle them. For example, a crate could export the
> function "mod1.foo" and the constant "mod2.foo".

Agreed.
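
For concreteness, a minimal sketch (in today's Rust, with made-up
names) of the kind of path-based mangling described above: two items
both named foo stay distinct once their module paths are folded into
the exported names.

    // Minimal sketch: flatten a module path into an exported symbol
    // name. The dotted "module.item" scheme is illustrative only.
    fn mangle(path: &[&str]) -> String {
        path.join(".")
    }

    fn main() {
        assert_eq!(mangle(&["mod1", "foo"]), "mod1.foo");
        assert_eq!(mangle(&["mod2", "foo"]), "mod2.foo");
    }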

> Consider first the namespace problem. A program that uses the function
> foo from crate1 and the function foo from crate2 will have two undefined
> references to foo. Both Mach-O and PE/COFF should work. It is possible
> to implement direct binding on ELF, but I think only Solaris did it.

Ok. Let's develop a workaround for ELF and then maybe someday petition the ELF linker authors to support direct binding. Michael Meeks appears to have already authored a patch, though Ulrich doesn't like it:

http://sourceware.org/ml/binutils/2005-10/msg00436.html

> I tested that this works on OS X by creating a C program that references
> symbols from two libraries and then using a hex editor to rename the
> references :-)

Can't you coax their linker into doing so with some kind of direct-binding flag?


> On ELF things are a bit harder. Hopefully we will get direct binding on
> Linux some day, but we have to decide what to do before that.
>
> One way to do it is to write a table mapping undefined references to
> DT_NEEDED, just like how it is done to implement direct binding. The
> startup code can then patch up the GOT and PLT. It is wasteful, but
> probably better than creating a new format. Using the regular symbol
> table also lets the user run nm, objdump, readelf, etc.

Yeah. That's what Michael's patch does, I think.
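
To make that concrete, here is a hypothetical sketch (in today's Rust)
of what such a side table might hold; a real implementation would key
off actual relocations, and the startup patching itself is elided.

    // Hypothetical side table: each undefined reference is paired with
    // the DT_NEEDED entry it must bind to, so startup code can patch
    // the corresponding GOT/PLT slots. Slot numbers are made up.
    struct DirectBinding {
        got_slot: usize,        // which GOT slot to patch
        symbol: &'static str,   // undefined reference in this object
        needed: &'static str,   // library that must supply it
    }

    const BINDINGS: &[DirectBinding] = &[
        DirectBinding { got_slot: 0, symbol: "foo", needed: "libcrate1.so" },
        DirectBinding { got_slot: 1, symbol: "foo", needed: "libcrate2.so" },
    ];

    fn main() {
        for b in BINDINGS {
            println!("slot {}: bind {} -> {}", b.got_slot, b.symbol, b.needed);
        }
    }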

> * How to create the shared libraries/executables.

> Using a hex editor is probably not something we want in the build
> pipeline :-) We have to figure out how to generate these strange libraries.
>
> The normal pipeline is *.rs -> .bc -> .s -> .o -> .so/.dylib/.dll

His patch encoded a new .direct section, I think. But we could approach it differently. I don't particularly want to *depend* on a solution that requires glibc changes. I think you're right that we'll need a workaround for ELF, at least. Maybe for everything, since LLVM itself has similar limitations.

> A possible hack is to mangle the undefined references, link normally and
> then edit the shared library. In the example of a crate using a function
> 'foo' from crate 'bar' and another function 'foo' from crate 'zed', we
> would first produce a .o that had undefined references to bar.foo and
> zed.foo (the prefixes can be arbitrary). The linker will propagate these
> undefined references to the .so/.dylib/.dll and we can edit them.

Nah, too painful.

> During static linking our linker can do any search we want, but we don't
> always control the dynamic linker. Even if we do implement a new dynamic
> linker, having support for searching every system library for the one
> that declares the metadata item 'color = "blue"' is probably a bit too
> expensive.

Agreed, sigh. This is the part where it's actually useful to have a '-Bdirect' flag supported in the linker, eh?

> We should define a small set of metadata items (name and version most
> likely) that a crate must define. These can then be added to the name
> of the file (crate-foobar-2.3.dylib) and are the only metadata that we
> search for at runtime. The startup code could still check that 'color =
> "blue"' and produce an error if not.

> * A Hack to avoid most of this for now.

> We can just add a global_prefix item to the crate files. If not
> declared it could default to a sha1, for example. This would already
> be a lot better than what we have in Java, since the user would still
> say
>
>     use foo;
>     import foo.bar;
>
> and just have to add a
>
>     global_prefix "org.mozilla....."
>
> once to foo's crate.
>
> Opinions?
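
For illustration only, the defaulted prefix could be derived along
these lines (std's DefaultHasher stands in for the suggested sha1,
which would need an external crate; it is not cryptographic):

    use std::collections::hash_map::DefaultHasher;
    use std::hash::{Hash, Hasher};

    // Sketch: if the crate declares no global_prefix, fall back to a
    // hash of its contents. DefaultHasher is a non-cryptographic
    // stand-in for sha1.
    fn default_prefix(crate_contents: &str) -> String {
        let mut h = DefaultHasher::new();
        crate_contents.hash(&mut h);
        format!("{:016x}", h.finish())
    }

    fn global_prefix(declared: Option<&str>, contents: &str) -> String {
        declared
            .map(|p| p.to_string())
            .unwrap_or_else(|| default_prefix(contents))
    }

    fn main() {
        println!("{}", global_prefix(Some("org.mozilla.foo"), ""));
        println!("{}", global_prefix(None, "contents of crate foo"));
    }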

Going down this road makes me further question whether we are just fighting a losing battle against baked-in assumptions in the system tools.

(We discussed this today on IRC, so I'm just writing up what we got to)

Let's propose this instead: "ideally no global namespace, but we're going to fake it just a bit on this toolchain". IOW, provide only a "vanishingly small chance of collision in the top-level global namespace". Assume CHF is some cryptographic hash, sha1 or sha3 or whatever, perhaps truncated to something non-horrible like 16 hex digits (64 bits). Then:

1. Define two crate metadata tags as mandatory: name and version. Call
   these CNAME and CVERS.
2. Define $CMETA as all the non-$CNAME, non-$CVERS metadata in a crate.
3. Define $CMH as CHF($CMETA).
4. Compile a crate down to $CNAME-$CMH-$CVERS.dylib (a sketch follows
   this list).
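
A sketch of steps 2-4 in today's Rust, purely illustrative: std's
DefaultHasher stands in for CHF (it is not cryptographic; a real build
would use sha1/sha3), with output rendered as 16 hex digits (64 bits)
as suggested above.

    use std::collections::hash_map::DefaultHasher;
    use std::hash::{Hash, Hasher};

    // Stand-in for CHF: non-cryptographic, but shows the shape of the
    // scheme. Output is rendered as 16 hex digits (64 bits).
    fn chf(input: &str) -> String {
        let mut h = DefaultHasher::new();
        input.hash(&mut h);
        format!("{:016x}", h.finish())
    }

    // $CMETA is the remaining crate metadata; $CMH = CHF($CMETA).
    fn crate_filename(cname: &str, cvers: &str, cmeta: &str) -> String {
        let cmh = chf(cmeta);
        format!("{}-{}-{}.dylib", cname, cmh, cvers)
    }

    fn main() {
        // Prints something like "foobar-<16 hex digits>-2.3.dylib".
        println!("{}", crate_filename("foobar", "2.3", "color = \"blue\""));
    }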

Each crate is "almost globally uniquely named" and the search done by the compiler will almost always (with vanishingly small odds against) be repeated precisely by the runtime linker. If the linker supports versioned symbols, it can use that logic to handle type-compatible version upgrades on the same-named crate.

Then for symbols:

1. Define typeof(sym) to be a canonical encoding of a type, with nominal
   types expanding out to a mangled name as below.
2. Define STH(sym) as CHF($CNAME, $CMH, typeof(sym)).
3. Flatten names by module path, so
       mod M { fn x() { .. } }
   gets flattened to:
       M.x
4. Define mangled(M.x) as $STH....@$CVERS (a sketch follows this list).
5. Emit the contents of symbol M.x named by the global symbol
   mangled(M.x).
6. Emit the string typeof(M.x) into a local symbol "M.x", put in a
   non-mapped section that is only made use of during compilation and/or
   debugging.
7. .ll, .s, .o and .dylib don't need to learn anything new. Debuggers
   need to learn how to work with the local symbols to extract type info
   and how to demangle the global names.
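
And a sketch of the symbol side, with the same non-cryptographic chf
stand-in; the typeof() argument here is a placeholder string, not a
real canonical type encoding, and the comma-joining of the CHF inputs
is made up for illustration.

    use std::collections::hash_map::DefaultHasher;
    use std::hash::{Hash, Hasher};

    // Non-cryptographic stand-in for CHF, as in the crate-name sketch.
    fn chf(input: &str) -> String {
        let mut h = DefaultHasher::new();
        input.hash(&mut h);
        format!("{:016x}", h.finish())
    }

    // STH(sym) = CHF($CNAME, $CMH, typeof(sym)); the inputs are simply
    // comma-joined here, which is only illustrative.
    fn sth(cname: &str, cmh: &str, type_of: &str) -> String {
        chf(&format!("{},{},{}", cname, cmh, type_of))
    }

    // mangled(M.x) = $STH...@$CVERS.
    fn mangled(cname: &str, cmh: &str, cvers: &str, type_of: &str) -> String {
        format!("{}@{}", sth(cname, cmh, type_of), cvers)
    }

    fn main() {
        // Hypothetical flattened item M.x with a placeholder type string.
        let cmh = chf("color = \"blue\"");
        println!("M.x -> {}", mangled("foobar", &cmh, "2.3", "fn() -> ()"));
    }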

This means that the linker, if it's clever enough to do per-symbol versioning, can do so and support an upgrade on a type-compatible function. Each function winds up prefixed by the crate metadata and its type signature compressed to a fixed size, but the full type info can be recovered (e.g. by the compiler or debugger) by reading the local symbols.

Sound ok? Is this what we were agreeing to on IRC?

-Graydon

