Andy Trick and I had this conversation Thursday, and I thought I'd capture it 
here.

The Problem

It's a longstanding complaint that SIL uses drastically different code patterns 
for the same sequence of operations based on whether the types of the values 
involved are loadable or "address-only".  For example, suppose I have Swift 
code like this:
  let x, y : T
  let z = x + y

If T is a loadable type, this will generate SIL somewhat like this:
  // %x and %y are values of type $T
  %lhs = copy_value %x
  %rhs = copy_value %y
  %operator = function_ref T.+
  %result = apply %operator(%lhs, %rhs)
  %z = %result

(copy_value doesn't currently exist, but it's easier to understand, and as it 
happens we're thinking of adding it back.)

If T is an address-only type, this will generate SIL somewhat like this:
  // %x and %y are values of type $*T
  %z = alloc_stack $T
  %lhs = alloc_stack $T
  copy_addr %x to [initialization] %lhs
  %rhs = alloc_stack $T
  copy_addr %y to [initialization] %rhs
  %operator = function_ref T.+
  apply %operator(%z, %lhs, %rhs)
  dealloc_stack %rhs
  dealloc_stack %lhs

Notably, we're explicitly modeling the memory in which values are stored, which 
is both very verbose and — more importantly — loses any interesting SSA 
properties for tracking actual values around.  And this has a bunch of 
secondary effects where various high-level operations like dynamic casts end up 
needing two different instructions based on whether the value is stored in 
memory.  This is pretty dumb, and it's a major contributor to the reality that 
generic code is currently very poorly optimized.

It does, however, have some significant advantages: since the memory allocation 
is explicit, it's quite natural to express optimizations that e.g. hoist or 
sink those allocations, and the next level of lowering (IRGen) can be very 
simple.

Addresses and address-only types

Swift is an imperative language, and imperative languages make formal memory 
locations part of the high-level semantics.  DI and SSA formation allow us to 
eliminate formal memory locations for most local variables, but class 
properties, global variables, escaped local variables, and pointers are all 
fundamentally un-SSA-able.  Therefore we will always have some concept of an 
address.
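
To make the un-SSA-able cases concrete, here's a small made-up Swift example of 
an escaped local variable: because count is captured by an escaping closure, it 
has to live in a formal memory location that outlives the function, no matter 
how much SSA formation we do.

  func makeCounter() -> () -> Int {
    var count = 0          // escapes into the closure below
    return {
      count += 1           // mutates the captured (boxed) variable
      return count
    }
  }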

But most values of address-only type aren't really being stored in that sort of 
formal memory location; we're just representing them that way in SIL.  Why do 
we do this?  What is it about a type that makes it address-only?

Well, some types are inherently "memory-pinned": something about their 
representation only makes sense, or is only implementable, if the value is 
always kept in memory:
  - The representation of the value may involve interior pointers, as with 
LLVM's SmallVector.  This isn't currently a thing in Swift, but it's a 
possibility someday with the import of non-POD C++ types.
  - The address of the value may need to be registered elsewhere, as with weak 
references.
  - The value may allow internal mutation even from a shared reference, like a 
C++ class with a mutable field or a Rust atomic type; you can see weak 
references as analogous to this.
Such types are necessarily address-only at a low level.
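
As a concrete, made-up example of the second case: if I'm remembering the 
lowering correctly, a struct containing a weak reference is address-only today, 
because the weak field has to be registered with the runtime at a stable 
address.

  final class Delegate {}

  // The runtime tracks the weak field by its address, which pins
  // values of Observer to memory.
  struct Observer {
    weak var delegate: Delegate?
  }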

Other types are address-only by abstraction: their representation isn't (fully) 
known, and so they have to be kept in memory (1) in case that representation 
includes another address-only value and (2) because there's no way to store an 
unbounded amount of data except in memory anyway.
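
The classic example is an unspecialized generic function: in the body below, 
T's layout is unknown, so tmp is address-only purely by abstraction, even if 
every concrete T that's ever passed in is loadable.

  func swapped<T>(_ a: T, _ b: T) -> (T, T) {
    let tmp = a        // T's layout is unknown here, so this lives in memory
    return (b, tmp)
  }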

But this sense of address-only types that we're describing is really a property 
of the final generated code.  It's necessary for LLVM IR generation to know 
about it, but it's not obviously necessary for SIL to know about it beyond the 
implications of the inherently memory-pinned cases above:
  - it is not generally safe to assume that the stored properties of a non-POD 
C++ type remain invariant across moves
  - weak references *do* represent the same value across moves, but of course 
that value can change dynamically at any time anyway, per the rules of weak 
references
  - mutable fields can be modified even through a shared borrowed reference, and 
so (if they are modeled in SIL at all, rather than just leaving the type 
opaque) there must be some way to project a mutable address from a shared 
borrow, and so on.

Our current representation of address-only types arises directly from the 
low-level operations that are required, which does admit some interesting 
optimizations on its own, but the disadvantages of having to support wildly 
divergent code paths and of completely giving up SSA use/def chains are 
crippling.  What would SIL look like if we abandoned this entirely?

All types as SIL scalars

The fundamental issue here is that IR-level lowering does need to place 
memory-pinned values into memory.  Suppose we take a value, use it as an enum 
case payload, inject that enum into an Any, and pass that to a function.  In a 
future world where we consistently SSA all local values, this ideally looks 
something like this:

  // %value is a $T
  %enum = enum #MyEnum.foo, %value : $T
  %any = existential $Any, %enum
  %fn = function_ref @bar
  apply %fn(%any)
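
(For reference, the Swift source this is meant to correspond to is roughly the 
following; MyEnum and bar are made-up names matching the SIL above.)

  enum MyEnum<T> {
    case foo(T)
  }

  func bar(_ value: Any) {}

  func inject<T>(_ value: T) {
    let wrapped = MyEnum.foo(value)   // enum injection
    bar(wrapped)                      // injection into Any, then the call
  }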

If all of these types were loadable, lowering this to IR wouldn't impose any 
abstraction costs, because we could just manipulate the values as bit-patterns 
in memory, which LLVM (really, any reasonable compiler infrastructure) should be 
quite good at analyzing and optimizing.  But if all of these types are 
address-only, that's not obviously true.  Let's look at the details of lowering 
to memory.

Because all types are movable — even in a world with Rust-style ownership — 
there's a fairly straightforward lowering that works for all possible 
functions: every address-only SIL value gets its own allocation which is 
deallocated along the dominance frontier of the value.  Indirect arguments use 
the passed-in buffer.  Basic block arguments are copied from the allocation for 
the branch argument value to the allocation for the corresponding destination 
block parameter.  Indirect return values are copied from the allocation for the 
return value to the pointer passed in.

This admits some obvious optimizations.  If "owned" SIL values are 
pseudo-linear — i.e. along every path from their definition point, it is 
statically evident that they are (optionally) borrowed multiple times, consumed 
exactly once, and then never used again — then the copies can instead be moves. 
 Standard register-allocation algorithms can recognize when basic block 
parameters can use the same location as their arguments.  SIL only permits a 
single return instruction, so there's no reason not to allocate returned values 
directly into the return slot.  These optimizations will tend to eliminate a 
lot of moves.

However, there's another problem, which is injections.  Consider the example 
above.  %any is an opaque existential, and initializing it formally involves 
allocating a buffer within the existential and moving the argument into that 
buffer.  Ideally, we would allocate that buffer first and then simply allocate 
%enum in-place into that buffer.  In this simple example, that's easy.  But if 
the control flow were more complex, detecting that this is possible becomes 
significantly more challenging, as does ensuring that the buffer is properly 
cleaned up along all paths.  For example, suppose that %value were computed by 
calling a throwing function:

  try_apply %someFunction() normal %cont, unwind %handler
cont(%value: $T):
  %enum = enum #MyEnum.foo, %value : $T
  %any = existential $Any, %enum
  %fn = function_ref @bar
  apply %fn(%any)
handler(%error: $Error):
  throw %error

Naive allocation here is going to introduce a lot of moves.  Optimally, we 
would receive the return value from %someFunction directly in the payload of 
%enum, which we want to build directly into the allocated existential buffer of 
%any.  But to do this, we actually need to allocate that existential buffer 
before executing the try_apply; and if the try_apply throws, we need to 
deallocate that existential buffer in the handler block.  The need to 
retroactively insert this kind of clean-up code adds a lot of complexity to 
this allocation approach.  Moreover, it's quite possible that complex 
intermediate control flow — for example, if there's a loop somewhere between the 
definition of a value and its consuming use — will tend to block this kind of 
analysis and cause more unnecessary moves.

In contrast, the current SIL-generation scheme tends to be aware of at least 
the immediate local uses of values and therefore emits a lot of these kinds of 
initialization "in-place" already.  But that said, it does proceed from a 
simple recursive examination of the AST, which means it will miss examples that 
are just slightly more opaque, like binding a return value as a let and only 
then returning it.  That kind of thing is currently left to the optimizer for 
no real good reason.
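
Concretely, I mean patterns like this made-up function, where the result of the 
call is bound to a let and only then returned:

  func passThrough<T>(_ makeValue: () -> T) -> T {
    let result = makeValue()   // the call's indirect result lands in a temporary...
    return result              // ...which then has to be moved into the return slot
  }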

Summary

Overall, I think representing local address-only values as SIL scalars with 
full SSA use/def chains is really promising, and I do think we can write an 
allocation pass that does an acceptable job eliminating unnecessary moves.  In 
order to actually do this, though, I think we need two things:
  - We need SIL to be "pseudo-linear" as discussed above.  We really don't want 
the allocation pass to have to worry about keeping values alive past 
consumptions, and thus potentially having to insert copies instead of moves.
  - We need the allocation pass to handle exits before initialization (like 
with try_apply) and other sorts of interfering control flow.  It will not be 
acceptable for this to be a block-local analysis with only a trivial amount of 
cross-block argument merging.

John.
