On 19/03/2011 4:02 PM, Patrick Walton wrote:
> I'm looking at the generated assembly code for std.rc (which is now
> compiling, although it fails to link due to a strange mangling LLVM is
> performing on duplicate native symbols).
Wooo! That's incredible. You're a star. I'm totally impressed with the
dozens and dozens of fixes you've plowed through to get to this point.
It's inspiring to watch! I'm trying to keep up; sorry I've been slower.
> Even with all of LLVM's
> optimizations, our hash insertion code has 4x the instruction count of
> that of glib.
"Our hash-insert code is 4x the size of glib's hash-insert code"
or
"Our hash-insert code is 4x the size of all of glib combined"?
I'd believe either right now, and if we're only talking about the
former, I'd be really surprised / pleased! Low-hanging fruit is called
that for a reason, and we've hardly spent a minute fixing even the
simplest of systemic cost centers in the optimized code. Off the top of
my head, we'll at least get big wins out of:
- Fixing the tydesc crate-relative encoding issue.
- Opportunistic const-ifying of coincidentally const expressions.
- Static removal of a bunch of redundant refcount operations.
- Stripping out redundant size calculations, unused derived
tydescs, non-escaping allocations, and similar indirections.
- Teaching LLVM how to make C calls itself, as a calling convention,
not via the current cumbersome call-through-asm-glue path.
- Digging into the unique pointer issue.
And I'm sure a little profiling and code-inspection will make a number
of other issues jump right out. As you've found..
One major reason for this is that we have enormous overhead when calling
upcalls like get_type_desc() and size_of(). These calls are completely
opaque to LLVM. Even if we fixed the crate-relative encoding issues,
these calls would still be opaque to LLVM.
size_of? Hm. That's not an upcall. I thought we were generating the size
calculation code inline (on demand, from GEP_tup_like).
> Most upcalls are trivial (get_type_desc() is an exception; I don't know
> why it needs to exist, actually).
It needs to exist to acquire derived type descriptors, dynamically. They
are not static. Though we can probably do a little analysis and figure
out which cases are degenerate -- are static -- and dodge the upcall.
And/or consolidate multiple redundant upcalls occurring in the same
frame / execution context. We're doing everything as simply as possible now.
> For those, it would be great to inline
> them. To do that, we need LTO, which basically means that we compile
> rustrt with clang and link the resulting .bc together with the .bc that
> rustc yields before doing LLVM's optimization passes. I think this would
> be a huge win; we would remove all the upcall glue and make these
> low-level calls, of which there are quite a lot, no longer opaque to LLVM.
> Thoughts?
Yes. This is something we'll almost certainly wind up doing. Some
runtime support logic is called rarely enough to live in a shared
object; some is custom-enough to require compiler-generation on a
case-by-case basis, and some is somewhat generic (so can be written
once, in rust or C++) but reused-and-inlined all over a compilation
unit. That stuff will probably wind up migrating to glue.bc and get
LTO'ed into every compilation unit. Andreas has been anticipating this
kind of easy inlining between C++ support code and rust code since this
time last year; it's one of the reasons he was so keen on using LLVM :)
To get there we'll need to (at least) have completed the removal of the
asm glue bits and taught LLVM how to make native calls (stack-to-stack)
as a calling convention. Probably some other bits of LLVM hacking, and
lots of build-system hacking, and shoving things around in the runtime.
But I absolutely intend to get there.
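The bitcode-linking pipeline Patrick describes might look roughly like this. These commands are illustrative only: file names are hypothetical, exact flags vary by LLVM version, and the rustc bitcode-emission step is the part that would need to be built.

```shell
# 1. Compile the C++ runtime to LLVM bitcode instead of a native object.
clang++ -O2 -emit-llvm -c rustrt.cpp -o rustrt.bc

# 2. rustc emits its own bitcode for the crate (hypothetical step).
#    rustc crate.rs  ->  crate.bc

# 3. Link the two modules and run the optimizer over the merged IR,
#    so runtime upcalls become ordinary, inlinable calls.
llvm-link rustrt.bc crate.bc -o merged.bc
opt -O2 merged.bc -o merged.opt.bc
```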
-Graydon
___
Rust-dev mailing list
Rust-dev@mozilla.org
https://mail.mozilla.org/listinfo/rust-dev