[rust-dev] Statically linking rustrt

2011-03-19 Thread Patrick Walton
I'm looking at the generated assembly code for std.rc (which is now 
compiling, although it fails to link due to a strange mangling LLVM is 
performing on duplicate native symbols). Even with all of LLVM's 
optimizations, our hash insertion code has 4x the instruction count of 
that of glib.


One major reason for this is that we have enormous overhead when calling 
upcalls like get_type_desc() and size_of(). These calls are completely 
opaque to LLVM. Even if we fixed the crate-relative encoding issues, 
these calls would still be opaque to LLVM.


Most upcalls are trivial (get_type_desc() is an exception; I don't know 
why it needs to exist, actually). For those, it would be great to inline 
them. To do that, we need LTO, which basically means that we compile 
rustrt with clang and link the resulting .bc together with the .bc that 
rustc yields before doing LLVM's optimization passes. I think this would 
be a huge win; we would remove all the upcall glue and make these 
low-level calls, of which there are quite a lot, no longer opaque to LLVM.
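
To make the idea concrete, here's a minimal C++ sketch of the situation.
The names and the size_of-style upcall are hypothetical stand-ins, not
the actual rustrt source; the point is only that a trivial helper is
opaque while it lives in a separately compiled librustrt, but becomes a
candidate for inlining once its bitcode is linked into the module rustc
produces.

  // rt_sketch.cpp -- stand-in for a trivial rustrt upcall (hypothetical name).
  // Compiled normally into librustrt.so, callers only ever see an external
  // symbol; compiled with `clang -emit-llvm -c`, the definition survives as
  // bitcode that can be linked into the caller's module.
  #include <cstddef>

  extern "C" size_t upcall_size_of_sketch(const size_t *td) {
      // Trivial body: read the size field out of a (hypothetical) type
      // descriptor laid out as { align, size, ... }.
      return td[1];
  }

  // caller_sketch.cpp -- stand-in for the code rustc generates.
  #include <cstddef>

  extern "C" size_t upcall_size_of_sketch(const size_t *td);

  extern "C" size_t bytes_needed(const size_t *td, size_t n) {
      // Without LTO this is an opaque call that LLVM must leave in place.
      // If both files are linked as bitcode before the optimization passes
      // run, the inliner can replace it with the single load above.
      return upcall_size_of_sketch(td) * n;
  }

Linking the two .bc files with llvm-link and then running the
optimization passes over the combined module is one way to watch the
call fold away.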


Thoughts?

Patrick


Re: [rust-dev] Statically linking rustrt

2011-03-19 Thread Graydon Hoare

On 19/03/2011 4:02 PM, Patrick Walton wrote:

I'm looking at the generated assembly code for std.rc (which is now
compiling, although it fails to link due to a strange mangling LLVM is
performing on duplicate native symbols).


Wooo! That's incredible. You're a star. I'm totally impressed with the 
dozens and dozens of fixes you've plowed through to get to this point. 
It's inspiring to watch! I'm trying to keep up; sorry I've been slower.



Even with all of LLVM's
optimizations, our hash insertion code has 4x the instruction count of
that of glib.


Our hash-insert code is 4x the size of glib's hash-insert code
  or
Our hash-insert code is 4x the size of all of glib combined
?

I'd believe either right now, and if we're only talking about the 
former, I'd be really surprised / pleased! Low-hanging fruit is called 
that for a reason, and we've hardly spent a minute fixing even the 
simplest of systemic cost centers in the optimized code. Off the top of 
my head, we'll at least get big wins out of:


  - Fixing the tydesc crate-relative encoding issue.
  - Opportunistic const-ifying of coincidentally const expressions.
  - Static removal of a bunch of redundant refcount operations.
  - Stripping out redundant size calculations, unused derived
tydescs, non-escaping allocations, and similar indirections.
  - Teaching LLVM how to make C calls itself, as a calling convention,
not via the current cumbersome call-through-asm-glue path.
  - Digging into the unique pointer issue.

And I'm sure a little profiling and code inspection will make a number 
of other issues jump right out. As you've found...



One major reason for this is that we have enormous overhead when calling
upcalls like get_type_desc() and size_of(). These calls are completely
opaque to LLVM. Even if we fixed the crate-relative encoding issues,
these calls would still be opaque to LLVM.


size_of? Hm. That's not an upcall. I thought we were generating the size 
calculation code inline (on demand, from GEP_tup_like).
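
Roughly, the inline computation amounts to something like the following 
sketch for a tuple whose element sizes and alignments are only known at 
run time -- illustrative names only, not the code trans actually emits:

  #include <cstddef>

  // Round off up to a multiple of align (assumes power-of-two alignments).
  static size_t align_to(size_t off, size_t align) {
      return (off + align - 1) & ~(align - 1);
  }

  size_t tuple_size(const size_t *sizes, const size_t *aligns, size_t n) {
      size_t off = 0, max_align = 1;
      for (size_t i = 0; i < n; ++i) {
          off = align_to(off, aligns[i]) + sizes[i];   // lay out element i
          if (aligns[i] > max_align) max_align = aligns[i];
      }
      return align_to(off, max_align);   // pad to the tuple's overall alignment
  }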



Most upcalls are trivial (get_type_desc() is an exception; I don't know
why it needs to exist, actually).


It needs to exist to acquire derived type descriptors, dynamically. They 
are not static. Though we can probably do a little analysis and figure 
out which cases are degenerate -- are static -- and dodge the upcall. 
And/or consolidate multiple redundant upcalls occurring in the same 
frame / execution context. We're keeping everything as simple as 
possible for now.
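
Concretely, the consolidation I have in mind looks something like this 
sketch -- hypothetical names and signatures, written as the C++ 
equivalent of what the generated code would do:

  struct type_desc;                                    // opaque here
  extern "C" type_desc *upcall_get_type_desc_sketch(); // hypothetical upcall signature
  extern "C" void use_tydesc(type_desc *td);           // hypothetical consumer

  void frame_today() {
      // Today: every use re-derives the descriptor through the opaque upcall.
      use_tydesc(upcall_get_type_desc_sketch());
      use_tydesc(upcall_get_type_desc_sketch());
  }

  void frame_consolidated() {
      // Consolidated: derive it once per frame and reuse the cached pointer.
      type_desc *td = upcall_get_type_desc_sketch();
      use_tydesc(td);
      use_tydesc(td);
  }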



For those, it would be great to inline
them. To do that, we need LTO, which basically means that we compile
rustrt with clang and link the resulting .bc together with the .bc that
rustc yields before doing LLVM's optimization passes. I think this would
be a huge win; we would remove all the upcall glue and make these
low-level calls, of which there are quite a lot, no longer opaque to LLVM.

Thoughts?


Yes. This is something we'll almost certainly wind up doing. Some 
runtime support logic is called rarely enough to live in a shared 
object; some is custom enough to require compiler generation on a 
case-by-case basis; and some is somewhat generic (so it can be written 
once, in rust or C++) but reused and inlined all over a compilation 
unit. That last kind will probably wind up migrating to glue.bc and 
getting LTO'ed into every compilation unit. Andreas has been 
anticipating this kind of easy inlining between C++ support code and 
rust code since this time last year; it's one of the reasons he was so 
keen on using LLVM :)


To get there we'll need to (at least) have completed the removal of the 
asm glue bits and taught LLVM how to make native calls (stack-to-stack) 
as a calling convention. Probably some other bits of LLVM hacking, and 
lots of build-system hacking, and shoving things around in the runtime. 
But I absolutely intend to get there.


-Graydon