Even if we don't do coroutines as a user feature in Swift 5, we want to make 
sure we implement coroutine-esque features (like materializeForSet) with an ABI 
that we're satisfied with.  Joe Groff and I were talking about the low-level 
structure of that ABI, and I wanted to make sure that it was available for 
general interest + debate.

Apologies for the format.  We'll write up a proper summary before implementing 
anything.

John.

rjmccall
3:47 jgroff: Hey, so I heard you were thinking that coroutines might not be 
necessary for ABI stability.
3:47 I'd be interested in hearing you out on that.

Joe Groff
3:57 it seems to me like we can get the ABI we want without the language 
feature. like, if we literally had no time we could in principle accept 
materializeForSet as is, and quickly hack a transition from get to materializeForGet
3:58 and we can pile on addressor-like one-offs for the stdlib to do what it 
needs to with dictionaries

rjmccall
3:59 Ah.  Yes, I agree that we could probably make coroutines conform to the 
materializeForSet ABI if we needed to.

Joe Groff
3:59 generators are maybe a bit more interesting; at minimum though you could 
use the interface of IteratorProtocol as their abi if we needed to
3:59 at least, for consuming generators
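
For reference, IteratorProtocol's entire interface is a single mutating
method, so "use its interface as the ABI" would mean a consuming generator
is simply whatever this one requirement lowers to:

    protocol IteratorProtocol {
        associatedtype Element
        mutating func next() -> Element?  // nil means the sequence is exhausted
    }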

rjmccall
4:01 Hmm.  Possibly true.

Joe Groff
4:01 it seems to me like we could also tweak the ABI of our existing interfaces 
to be what we want for generator coroutines there too

rjmccall
4:02 Which interfaces?

Joe Groff
4:02 like, slap an attribute on IteratorProtocol saying "this will always be a 
one-method protocol that looks like a coroutine"

rjmccall
4:03 Oh, I see, to special-case it in the ABI so that we can get the ideal 
layout.
4:06 Yeah, I guess I don't disagree that on a primitive ABI level we could make 
future coroutine work conform to whatever ABI we roll with in Swift 5.

Joe Groff
4:06 yeah, modulo borrowing concerns

rjmccall
4:06 Right, which are significant, of course.

Joe Groff
4:07 it's a fair question whether doing special-case work to get the ABIs right 
without the feature saves time over doing the feature properly

rjmccall
4:07 Yeah, that's what I was about to say.

jordan_rose
4:07 arclite borrowing? “if I am on Swift 5 stdlib then I can reach into Array 
like this”

Joe Groff
4:08 our existing accessor ABI seems to me like it's pretty close to what we 
want, and we could refine it within the existing model without too much churn

rjmccall
4:08 It feels like we'd end up where we did with addressors, iterating because 
we realize that we need more and more in the library.
4:09 Well, the big thing with accessors is that I think we want to replace a 
getter with a materializeForGet.

Joe Groff
4:09 right

rjmccall
4:09 And I'm really hesitant to roll out more stuff on the materializeForSet 
pattern.
4:09 But you're right that we could.

Joe Groff
4:09 i was trying to say that we don't necessarily need coroutines to do that; 
we could do compiler-driven mangling of the current source-level property model

rjmccall
4:10 Right, like we do with materializeForSet.

Joe Groff
4:21 from our previous discussions, i recall two general possible ABIs for 
accessors—could go with materializeForSet-style returning a callback, or pass 
down a nested function. we ought to be able to make materializeFor[GS]et work 
either way today
4:21 independent of language work
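
To make the two shapes concrete, here is a rough sketch of each as Swift
function types.  Every name below is made up for illustration; nothing here
is an actual declared ABI.

    // 1. materializeForSet-style: the accessor returns the value's address
    //    plus an optional callback the caller invokes when the access ends.
    typealias CallbackStyleAccessor = (
        _ scratch: UnsafeMutableRawPointer,   // caller-provided buffer
        _ base: UnsafeMutableRawPointer       // the containing value
    ) -> (valueAddress: UnsafeMutableRawPointer,
          finish: ((UnsafeMutableRawPointer) -> Void)?)

    // 2. Nested-function style: the caller passes the body of the access
    //    down, and the accessor runs it between its own setup and teardown.
    typealias NestedFunctionAccessor = (
        _ base: UnsafeMutableRawPointer,
        _ body: (_ valueAddress: UnsafeMutableRawPointer) -> Void
    ) -> Void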

rjmccall
4:22 Yeah.  I think the callback ABI doesn't work if we need to support 
accessor usage in other coroutines, though.

Joe Groff
4:23 yeah, certainly not compiler-inverted ones at least

rjmccall
4:23 Right, obviously it's fine if somehow we can avoid compiler inversion.
4:25 Unless you really think there's something promising there, though, I think 
we need to assume that something more like the current ABI is what we're going 
to roll with.

Joe Groff
4:25 i can't think of anything else we could realistically do this year

rjmccall
4:27 Well, if we can base async functions on it, we can do that later.  My 
assumption has been that there's not really anything we can do there at all 
that wouldn't be extremely invasive.
4:27 i.e. it's not just a because-we're-crammed-this-year problem.
4:27 Not sure that made any sense; let me rephrase.

Joe Groff
4:28 i think i get it

rjmccall
4:28 If there's an alternative way to lower async functions that doesn't 
require compiler inversion, and it's something we can pursue (presumably) next 
year, then we can assume we'll have that.
4:29 And we can base the accessor ABI around that assumption, e.g. by using 
coroutines.
4:29 But I haven't heard anything that makes me think that there's actually a 
feasible alternative there that we can pursue next year.
4:31 Okay.  If we accept that coroutines require some sort of control 
inversion, then we should design what we think that ABI should look like.

Joe Groff
4:32 well, given infinite time, we could keep a coroutine's stack in some 
relocatable buffer, and compile async functions to burn a couple context 
registers to track the base of the buffer and sp relative to it, and keep all 
"addresses" of stuff on that stack relative. beginAsync allocates the buffer; 
suspendAsync gives you a refcounted handle to the buffer; pushing onto the 
stack potentially reallocs
4:32 but i don't think that's realistic
4:34 if nothing else, we really wouldn't want to malloc a context for every 
opaque access
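
For the record, the offset-based addressing in that scheme would look
roughly like the sketch below.  The AsyncStack type and its methods are
invented here purely to show the moving parts, not a real runtime interface.

    // Frames live in a growable buffer, and everything that points into
    // the buffer holds an offset rather than an address, so a realloc
    // that moves the storage invalidates nothing.
    final class AsyncStack {
        private(set) var storage: UnsafeMutableRawBufferPointer
        private(set) var top = 0   // "sp", kept relative to the buffer base

        init(capacity: Int) {
            storage = .allocate(byteCount: capacity, alignment: 16)
        }
        deinit { storage.deallocate() }

        // Pushing a frame may realloc; callers hold only the returned offset.
        func push(byteCount: Int) -> Int {
            if top + byteCount > storage.count {
                let grown = UnsafeMutableRawBufferPointer.allocate(
                    byteCount: max(storage.count * 2, top + byteCount),
                    alignment: 16)
                grown.copyMemory(from: UnsafeRawBufferPointer(storage))
                storage.deallocate()
                storage = grown
            }
            defer { top += byteCount }
            return top
        }

        // An offset only becomes an address transiently, since any later
        // push can move the whole buffer.
        func address(at offset: Int) -> UnsafeMutableRawPointer {
            storage.baseAddress! + offset
        }
    }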

rjmccall
4:35 Yeah, so I definitely don't think we can convince the backend to generate 
code that way.

Joe Groff
4:35 right
4:35 that would be a year-long llvm project in itself

rjmccall
4:35 Yeah, at least.
4:36 We can maybe make something like that the basis of a higher-level 
lowering, though.
4:37 i.e. that's how you get local memory that persists across a yield point.
4:38 I think we'd probably want to ensure that a coroutine can save a certain 
amount of state without needing to interact with the runtime.
4:39 Just for the benefit of really small coroutines.

Joe Groff
4:39 something like the caller-provided buffer for materializeForSet?

rjmccall
4:39 Right.
4:39 Like maybe the stack always has at least 4 words of free space?

Joe Groff
4:40 yeah, it'd be nice to avoid going to the heap if continuations never 
escape, which would be the common case for accesses and for loops

rjmccall
4:42 And something starting a stack, e.g. calling materializeForSet, can just 
allocate some fixed amount of space within its own stack and do some cheap 
initialization, and that's good enough for both the fast path and the slow path.
4:43 Maybe… pass down two pointers, a scratch space and a stack context?
4:44 Scratch space is [K x i8*], stack context is some control structure which 
can just be zero-initialized.
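
A minimal sketch of that two-pointer convention, assuming K = 4 and a
placeholder layout for the control structure; none of this is a real
interface.

    // The stack context can be zero-initialized cheaply by the originator;
    // the runtime only fills it in if a coroutine hits the slow path.
    struct CoroutineStackContext {
        var pool: UnsafeMutableRawPointer? = nil   // nil until the slow path runs
    }

    func withAccessorBuffers(
        _ body: (UnsafeMutableRawPointer,
                 UnsafeMutablePointer<CoroutineStackContext>) -> Void
    ) {
        // Scratch space: [K x i8*] with K = 4, on the originator's own stack.
        var scratch: (UnsafeRawPointer?, UnsafeRawPointer?,
                      UnsafeRawPointer?, UnsafeRawPointer?) = (nil, nil, nil, nil)
        var context = CoroutineStackContext()      // cheap zero-initialization
        withUnsafeMutableBytes(of: &scratch) { bytes in
            body(bytes.baseAddress!, &context)
        }
        // If context.pool is non-nil here, the originator owes a cleanup
        // call (presumably into the runtime, elidable on the fast path).
    }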

Joe Groff
4:44 if the caller knows it's going to store the yielded context for later, can 
it emplace the scratch space in an escapable object?
4:44 what's the stack context for?

rjmccall
4:46 Yeah, we don't necessarily even need to promise that the scratch space 
stays at the same address.  If the coroutine needs stable memory, it can 
allocate it.

Joe Groff
4:46 so anything stored there would have to be memcpyable?
4:46 makes sense

rjmccall
4:47 I mean, we could go either way on that.
4:47 But yeah, I think it might be useful to allow it to be memcpy'ed.  
Allocate it locally and then move it aside if you need to.
4:48 RE: stack context: I was thinking basically a pool of memory, so that 
repeated requests to allocate could resolve faster.
4:49 It's a trade-off, though, because its existence means that the originator 
needs to clean it up.
4:49 Which is presumably a runtime call.
4:50 Maybe one that could be elided in a fast path, but still, code size.
4:50 If you don't have it, then coroutines need to malloc every time.
4:51 (Unless they fit in scratch.)

Joe Groff
4:53 did anything come of the alloca-into-caller talk? you could theoretically 
let the coroutine push as much context as it needs on the stack, and make the 
caller move it if it needs to

rjmccall
4:53 Another benefit of a scratch space: it could potentially be pre-allocated 
to the known-right size in the original caller to make the system dynamically 
allocation-free.

Joe Groff
4:54 how would you get the known-right size?

rjmccall
4:54 Experimentation.  It's not a great solution.
4:55 Or just picking a large-enough buffer in the same way that people pick 
large-enough stack sizes.
4:56 Alloca-into-caller still has the problem of requiring the memory to be 
movable, which gets into that big-LLVM-project space.
4:57 Well, okay, maybe less big.
4:57 We'd be filling this in some sort of coroutine lowering pass and deciding 
what to put there.

Joe Groff
4:58 yeah, seems a bit more tractable

rjmccall
4:58 Maybe tough at the LLVM level.

Joe Groff
4:58 across a yield point i think you have to reload all your local state out 
of a state union of some sort in any compiler transform

rjmccall
4:59 I guess we could add annotations.
4:59 Right.

Joe Groff
5:01 the state storage type you derive as part of the coroutine transform could 
be what you alloca-into-caller and leave for the caller to either move for 
later or immediately restart
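
That derived state type is just "everything live across a yield", e.g. for a
hypothetical generator with two yield points:

    // What a coroutine-splitting transform might derive: one struct per
    // coroutine holding the resume point plus every local that is live
    // across any yield.  Purely illustrative; no transform emits this today.
    struct DerivedGeneratorState {
        var resumePoint: Int                  // which yield to resume after
        var index: Int                        // a local live across the yields
        var base: UnsafeMutableRawPointer?    // another such local
    }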

rjmccall
5:02 Yeah.  So you only have to malloc if you actually need to persist 
something that has to stay at a fixed address.

Joe Groff
5:02 yeah
5:03 alternatively, we could also put an ABI cap on the size of the struct, so 
it always fits into a size the caller can preallocate

rjmccall
5:03 Now I guess you'd potentially be doing quadratic amounts of copying if you 
had coroutines forwarding to each other.

Joe Groff
5:03 yeah
5:04 so maybe it's better to let the caller emplace the state

rjmccall
5:05 Yeah.  I think we can assume that accessors will usually be shallow (depth 
1, overwhelmingly) and that async would need to get copied anyway.
5:06 Generators I'm not as sure about.

Joe Groff
5:07 it seems to me like generators would still primarily be relatively shallow 
and not escaped
5:07 either because you're digesting the whole thing in a for loop or in some 
slightly more complex, but still local, control flow

rjmccall
5:08 Yeah.  And relatively shallow probably still means depth=1.
5:09 Especially after inlining.
5:09 Okay.  So… some kind of fixed-size scratch space that can be memcpy'ed by 
the caller between invocations.
5:10 For accessors, ramp function returns value pointer + optional continuation 
function pointer.
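
As hypothetical Swift types (names invented), the accessor shape being
converged on here:

    // The ramp runs the first half of the access and yields the value's
    // address; the continuation, if present, runs the second half.
    typealias AccessorContinuation = (_ scratch: UnsafeMutableRawPointer) -> Void

    typealias AccessorRamp = (
        _ scratch: UnsafeMutableRawPointer,    // fixed-size, memcpy-able
        _ base: UnsafeMutableRawPointer
    ) -> (value: UnsafeMutableRawPointer, continuation: AccessorContinuation?)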

Joe Groff
5:11 is it better to return an optional pointer, or a pointer to a no-op 
function?

rjmccall
5:11 Optional?  Maybe better for code size to just return a non-optional 
pointer and… yeah.
5:11 Typical time/code size trade-off, I think.

Joe Groff
5:12 i guess the no-op is always two branches vs one or three with the null 
check, but the return branch ought to be predictable at least

rjmccall
5:13 Yeah.  And the same no-op function can be reused a lot.
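
The no-op variant amounts to a single shared symbol that every teardown-free
accessor can return, trading an unconditional indirect call for the caller's
nil check (again, the name is invented):

    // One shared trivial continuation, reused by every accessor whose
    // access needs no cleanup.
    let sharedNoopContinuation: (UnsafeMutableRawPointer) -> Void = { _ in }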

rjmccall
5:14 For generators, I think ramp + continuation functions return value/address 
+ optional continuation function pointer, with null meaning done + no value.
5:17 Async I guess would be the same way.

Joe Groff
5:17 yeah, generators and async ought to look mostly the same at the 
implementation level
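
The generator shape from 5:14, sketched as Swift types.  The struct exists
only because Swift typealiases can't be recursive; at the ABI level the
continuation is just a function pointer, and nil means done with no value.

    struct GeneratorContinuation {
        let resume: (_ scratch: UnsafeMutableRawPointer)
            -> (value: UnsafeMutableRawPointer, next: GeneratorContinuation)?
    }

    // How a caller would drive it: keep resuming until the generator
    // reports that it is done.
    func drain(
        ramp: (UnsafeMutableRawPointer)
            -> (value: UnsafeMutableRawPointer, next: GeneratorContinuation)?,
        scratch: UnsafeMutableRawPointer,
        visit: (UnsafeMutableRawPointer) -> Void
    ) {
        var step = ramp(scratch)
        while let (value, next) = step {
            visit(value)
            step = next.resume(scratch)
        }
    }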

spestov
5:17 I wonder how incredibly painful it would be to try to implement multi-shot 
continuations with reference counting

rjmccall
5:18 Er, actually, is that true?  Maybe async actually just returns void.

Joe Groff
5:18 an async function is in a sense a generator that yields void

rjmccall
5:18 No, I mean maybe the continuation is handled privately.

Joe Groff
5:18 or, potentially, some context information about what it's waiting on
5:19 ah, you mean, make it the callee's responsibility to schedule the 
continuation and just return void to the sync context?

rjmccall
5:19 Right.
5:20 Like, what would the caller do with the continuation function?
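
Under that model the async entry point returns nothing to its synchronous
caller and hands its own second half to whatever it is waiting on.  A
sketch, with invented names (only DispatchQueue is real):

    import Dispatch

    // `waitFor` stands in for whatever event source the function suspends
    // on; it calls back when the event fires.
    func hypotheticalAsyncEntry(
        queue: DispatchQueue,
        waitFor: @escaping (@escaping () -> Void) -> Void
    ) {
        // The first half runs synchronously up to the suspension point...
        waitFor {
            // ...then the callee, not the caller, schedules its own
            // continuation.
            queue.async {
                // second half of the function
            }
        }
        // The synchronous caller just gets void back here.
    }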

Joe Groff
5:20 sure, that might also be nice for retrofitting into async void-returning 
ABIs if needed
5:21 in the universe of possible designs, maybe async yields a list of event 
fds back to the caller's event loop for queuing, instead of queuing itself
5:21 with chris's proposal though it's up to the callee

rjmccall
5:21 Ah, sure, if you had a universal event system to wait on, that would work.
5:22 Just return an event handle + a continuation to notify upon that event.
5:22 But yeah, it's hard to retrofit that.
5:24 Okay, if async functions schedule themselves, they don't have scratch 
buffers, either.
5:25 Thoughts on scratch size for accessors/generators?  I guess they can vary 
separately.
5:26 Generators probably want more scratch space overall by default.
5:26 And there's some benefit to that, since it means generators can call 
accessors.
5:26 Maybe 4 pointers and 8 pointers?
5:26 16-byte aligned in both cases, just because.
5:27 Or maybe not, maybe just pointer-aligned is fine.

Joe Groff
5:28 how big is a typical lazy transformer struct?

rjmccall
5:28 You mean like a lazy collection?

Joe Groff
5:28 yeah
5:29 a closure is two words

rjmccall
5:29 There's going to at least be a transform function, and then I guess the 
underlying collection.

Joe Groff
5:29 an array might be one or two words

rjmccall
5:29 Yeah.

Joe Groff
5:30 8 pointers gives you a collection and three closures, i guess that's ok

rjmccall
5:31 Yeah, or a collection, a closure, and an accessor call.
5:31 Hmm, not an accessor call, because you also have to save the continuation.
5:33 Other than wanting accessors to fit cleanly within generators, I don't 
think we need to be stingy here.
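
The word counts above are easy to sanity-check empirically (a 64-bit
platform is assumed, and exact sizes can vary by stdlib version):

    // A lazy map's iterator carries the base iterator plus the transform
    // closure (two words: function pointer + context).
    let doubled = [1, 2, 3].lazy.map { $0 * 2 }
    print(MemoryLayout.size(ofValue: doubled.makeIterator()))

    // The lazy collection itself is the array (one word, a pointer to its
    // buffer) plus the same two-word closure.
    print(MemoryLayout.size(ofValue: doubled))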

Joe Groff
5:36 yeah, maybe a bit more space for generators, since it's going to be common 
to yield an access, and there's more interesting state that could accumulate

rjmccall
5:36 12?  16?
5:36 12 is probably fine.
