Greetings you delightful Rustic people,
Sound the death knell for segmented stacks.
The new runtime does not implement segmented stacks and it never will.
This decision has been brewing for some time, and now I'd like to make
it official and lay out the reasoning for posterity.
To recap, segmented stacks are a strategy for growing stacks
incrementally, starting from a single small stack segment, and appending
further segments as needed in a linked list as the stack grows. This
allows threads to be very compact so that many can be created
concurrently. The previous Rust runtime did implement segmented stacks,
and that experience makes us believe that the performance tradeoffs
required are not compatible with Rust's goals as a high performance
systems language.
The performance problems associated with segmented stacks can be
summerized as: switching stacks has a cost. Though that cost can be
minimized through extensive tuning and micro-optimization (much of which
we admittedly did not pursue), it will never be free, and we've
concluded that the effort and complexity of continuing down that route
is not justified.
We've seen the problem manifest primarily in three areas:
* "stack thrashing" - when out of stack a function call will force an
allocation of a new segment, which is later freed upon return. This is
expensive even when the new stack segmented is cached. The result is
unpredictable and severe drops in performance whenever a stack boundary
happens to fall inside a tight loop. ([Similar concerns][1] are pushing
Go away from a segmented stack scheme as well - they call it the "hot
split" problem).
[1]:
https://docs.google.com/document/d/1wAaf1rYoM4S4gtnPh0zOlGzWtrZFQ5suE8qr2sD8uWQ/pub
* FFI - foreign code typically expects to have large stacks, so using
the FFI often requires switching stacks. Avoiding this overhead would
require an elaborate and inherently unsafe system of annotation (#8822),
increasing a burden on the FFI interface.
* LLVM libc optimizations and intrinsic fallbacks - LLVM will transform
calls to some libc functions into intrinsics and some intrinsics into
runtime function calls, for reasons of performance and platform
compatibility. Obvious solutions to making these compatible with
segmented stacks impose a high minimum stack requirement, partially
defeating the point of a segmented stack.
Instead of segmented stacks we're going to rely on the OS and MMU to
help us map pages lazily. Although the details aren't clear yet, I
expect that on 64-bit platforms the number of concurrent tasks will be
comparable to using segmented stacks. On 32-bit platforms, with their
limited address space, the situation will not be as good, but this is a
calculated risk that we can live without the same amount of concurrency
there.
Regards,
Brian
_______________________________________________
Rust-dev mailing list
Rust-dev@mozilla.org
https://mail.mozilla.org/listinfo/rust-dev