Re: [swift-evolution] Contextualizing async coroutines

Pierre Habouzit via swift-evolution Thu, 31 Aug 2017 19:51:07 -0700

> On Aug 31, 2017, at 11:35 AM, Joe Groff via swift-evolution 
> <swift-evolution@swift.org> wrote:
> 
> The coroutine proposal as it stands essentially exposes raw delimited 
> continuations. While this is a flexible and expressive feature in the 
> abstract, for the concrete purpose of representing asynchronous coroutines, 
> it provides weak user-level guarantees about where their code might be 
> running after being resumed from suspension, and puts a lot of pressure on 
> APIs to be well-behaved in this respect. And if we're building toward actors, 
> where async actor methods should be guaranteed to run "in the actor", I think 
> we'll *need* something more than the bare-bones delimited continuation 
> approach to get there. I think the proposal's desire to keep coroutines 
> independent of a specific runtime model is a good idea, but I also think 
> there are a couple possible modifications we could add to the design to make 
> it easier to reason about what context things run in for any runtime model 
> that benefits from async/await:
> 
> # Coroutine context
> 
> Associating a context value with a coroutine would let us thread useful 
> information through the execution of the coroutine. This is particularly 
> useful for GCD, so you could attach a queue, QoS, and other attributes to the 
> coroutine, since these aren't reliably available from the global environment. 
> It could be a performance improvement even for things like per-pthread 
> queues, since coroutine context should be cheaper to access than 
> pthread_self.


> [...]


YES!

We need that. You're very focused on performance and affinity and whatnot here,
but knowing up front where the completion will run is also critical for
priority inheritance purposes.

This is exactly the spirit of the mail I just wrote in reply to Chris a bit 
earlier tonight. Execution context matters to the OS, a lot.

The OS needs to know two things:
- where the precursor of this coroutine is (which work is preventing the
coroutine from executing)
- where the coroutine will go (which for GCD is critical because the OS assigns
threads lazily, so the usual OS primitives for raising an existing thread's
priority don't work)

In other words, a coroutine needs:
- various tags (QoS, logging context, ...)
- precursors / reverse dependencies
- where it will execute (though whether that is a dispatch queue or a run loop
is completely irrelevant).
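To make that list concrete, here is a minimal Swift sketch of what a per-coroutine context could carry, and how a runtime could use the precursor information for priority inheritance. Every name in it (`ExecutionContext`, `QoS`, `CoroutineContext`, `inheritedQoS`, `ManualContext`) is hypothetical and is not GCD or proposal API; `QoS` merely stands in for dispatch QoS classes, and the single-threaded `ManualContext` stands in for a queue or run loop.

```swift
// Hypothetical sketch: none of these types exist in GCD or the Swift proposal.

/// Where a coroutine will execute. A dispatch queue or a run loop could both
/// conform; the coroutine does not care which one it gets.
protocol ExecutionContext: AnyObject {
    func enqueue(_ job: @escaping () -> Void)
}

/// Stand-in for GCD QoS classes, ordered from lowest to highest.
enum QoS: Int, Comparable {
    case background, utility, userInitiated, userInteractive
    static func < (lhs: QoS, rhs: QoS) -> Bool { lhs.rawValue < rhs.rawValue }
}

/// What the OS needs to know about one coroutine.
struct CoroutineContext {
    // Tags that travel with the coroutine.
    var qos: QoS
    var loggingContext: String
    // Precursors: identifiers of the work preventing the coroutine from running.
    var precursors: Set<String>
    // Where the coroutine will run once its precursors fire.
    var target: any ExecutionContext
}

/// Priority inheritance: a precursor should run at no less than the highest
/// QoS among the coroutines waiting on it.
func inheritedQoS(for precursor: String, base: QoS,
                  waiters: [CoroutineContext]) -> QoS {
    waiters
        .filter { $0.precursors.contains(precursor) }
        .map { $0.qos }
        .reduce(base) { Swift.max($0, $1) }
}

/// Trivial single-threaded context, used only for the example.
final class ManualContext: ExecutionContext {
    private(set) var jobs: [() -> Void] = []
    func enqueue(_ job: @escaping () -> Void) { jobs.append(job) }
}
```

For example, if a `.userInteractive` coroutine waits on `"fetch"`, the work behind `"fetch"` gets boosted to `.userInteractive` even if it was submitted at `.utility`; that is only possible because the precursor edge is recorded in the context.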


If you do it that way, then when the precursor fires and allows your coroutine
to be scheduled, the runtime can schedule it right away on the right execution
context and minimize context switches (which hurt your performance far more
than shared mutable state does).
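A rough Swift sketch of that scheduling step, assuming a hypothetical `Precursor` object that records each dependent together with its target execution context. When the precursor fires, each dependent is enqueued directly on its own context, so no intermediate thread picks it up only to hop again. `ExecutionContext`, `SerialContext` and `Precursor` are illustrative names, not GCD API, and the model is single-threaded.

```swift
// Hypothetical sketch; not GCD API. Single-threaded model for illustration.

protocol ExecutionContext: AnyObject {
    func enqueue(_ job: @escaping () -> Void)
}

/// A serial context whose jobs run only when drained (stands in for a queue).
final class SerialContext: ExecutionContext {
    private var jobs: [() -> Void] = []
    func enqueue(_ job: @escaping () -> Void) { jobs.append(job) }

    /// Runs queued jobs in FIFO order and returns how many ran.
    @discardableResult
    func drain() -> Int {
        var count = 0
        while !jobs.isEmpty {
            let job = jobs.removeFirst()
            job()
            count += 1
        }
        return count
    }
}

/// Work that other coroutines depend on, plus where each dependent must run.
final class Precursor {
    private struct Dependent {
        let target: any ExecutionContext
        let body: () -> Void
    }
    private var dependents: [Dependent] = []
    private(set) var fired = false

    func addDependent(on target: any ExecutionContext,
                      _ body: @escaping () -> Void) {
        if fired {
            target.enqueue(body)
        } else {
            dependents.append(Dependent(target: target, body: body))
        }
    }

    /// Fires once: each dependent goes straight to its own execution context.
    func fire() {
        precondition(!fired, "a precursor fires only once")
        fired = true
        for dependent in dependents {
            dependent.target.enqueue(dependent.body)
        }
        dependents.removeAll()
    }
}
```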


> # `onResume` hooks
> 
> Relying on coroutine context alone still leaves responsibility wholly on 
> suspending APIs to pay attention to the coroutine context and schedule the 
> continuation correctly. You'd still have the expression problem when 
> coroutine-spawning APIs from one framework interact with suspending APIs from 
> another framework that doesn't understand the spawning framework's desired 
> scheduling policy. We could provide some defense against this by letting the 
> coroutine control its own resumption with an "onResume" hook, which would run 
> when a suspended continuation is invoked instead of immediately resuming the 
> coroutine. That would let the coroutine-aware dispatch_async example from 
> above do something like this, to ensure the continuation always ends up back 
> on the correct queue:
> 
> extension DispatchQueue {
>   func `async`(_ body: @escaping () async -> ()) {
>     dispatch_async(self, {
>       beginAsync(
>         context: self,
>         body: { await body() },
>         onResume: { continuation in
>           // Defensively hop to the right queue
>           dispatch_async(self, continuation)
>         }
>       )
>     })
>   }
> }
> 
> This would let spawning APIs provide a stronger guarantee that the spawned 
> coroutine is always executing as if scheduled by a specific queue/actor/event 
> loop/HWND/etc., even if later suspended by an async API working in a 
> different paradigm. This would also let you more strongly associate a 
> coroutine with a future object representing its completion:
> 
> class CoroutineFuture<T> {
>   enum State {
>     case busy                 // currently running
>     case suspended(() -> ())  // suspended, waiting to be polled
>     case success(T)           // completed with success
>     case failure(Error)       // completed with error
>   }
> 
>   var state: State = .busy
> 
>   init(_ body: @escaping () async throws -> T) {
>     beginAsync(
>       body: {
>         do {
>           self.state = .success(try await body())
>         } catch {
>           self.state = .failure(error)
>         }
>       },
>       onResume: { continuation in
>         // State has a closure payload, so it is not Equatable; pattern-match
>         guard case .busy = self.state else {
>           preconditionFailure("already running?!")
>         }
>         self.state = .suspended(continuation)
>       }
>     )
>   }
> 
>   // Return the result of the future, or try to make progress computing it
>   func poll() throws -> T? {
>     switch state {
>     case .busy:
>       return nil
>     case .suspended(let cont):
>       state = .busy  // mark running first, so a later onResume sees .busy
>       cont()
>       switch state {
>       case .success(let value):
>         return value
>       case .failure(let error):
>         throw error
>       case .busy, .suspended:
>         return nil
>       }
>     case .success(let value):
>       return value
>     case .failure(let error):
>       throw error
>     }
>   }
> }
> 
> 
> A downside of this design is that it incurs some cost from defensive 
> rescheduling on the continuation side, and also prevents writing APIs that 
> intentionally change context across an `await`, like a theoretical 
> "goToMainThread()" function (though you could do that by spawning a 
> semantically-independent coroutine associated with the main thread, which 
> might be a better design anyway).
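Since Swift has no `beginAsync(context:body:onResume:)`, here is a hypothetical, single-threaded emulation of the `onResume` idea, with the continuation modeled as a plain closure. Whoever resumes the coroutine goes through the hook, and the hook hops back to the coroutine's home context, mirroring the defensive `dispatch_async(self, continuation)` in the quoted extension. `ExecutionContext`, `SerialContext`, `HookedCoroutine` and `spawn(on:_:)` are all assumed names.

```swift
// Hypothetical emulation of the `onResume` hook; Swift has no
// `beginAsync(context:body:onResume:)`. Single-threaded for illustration.

protocol ExecutionContext: AnyObject {
    func enqueue(_ job: @escaping () -> Void)
}

/// A serial context whose jobs run only when drained (stands in for a queue).
final class SerialContext: ExecutionContext {
    private var jobs: [() -> Void] = []
    func enqueue(_ job: @escaping () -> Void) { jobs.append(job) }
    func drain() {
        while !jobs.isEmpty {
            let job = jobs.removeFirst()
            job()
        }
    }
}

/// A coroutine handle whose resumption always goes through `onResume`
/// instead of running the continuation on the resumer's context.
final class HookedCoroutine {
    private let onResume: (@escaping () -> Void) -> Void

    init(onResume: @escaping (@escaping () -> Void) -> Void) {
        self.onResume = onResume
    }

    /// Called by any suspending API, from any context.
    func resume(with continuation: @escaping () -> Void) {
        onResume(continuation)
    }
}

/// Analogue of the quoted `DispatchQueue.async` extension: start on `home`,
/// and defensively hop back to `home` on every resumption.
@discardableResult
func spawn(on home: any ExecutionContext,
           _ firstStep: @escaping (HookedCoroutine) -> Void) -> HookedCoroutine {
    let coroutine = HookedCoroutine(onResume: { continuation in
        home.enqueue(continuation)
    })
    home.enqueue { firstStep(coroutine) }
    return coroutine
}
```

Every resumption costs an extra enqueue even when the resumer is already on the home context, which is exactly the defensive-rescheduling cost described above.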

Given these limitations, I'm very skeptical. In general, suspending and resuming
work is also very difficult for a runtime to implement, has large memory costs,
and defeats priority inversion avoidance. dispatch_suspend()/dispatch_resume()
is one of the banes of my existence when it comes to the dispatch API surface.
It only makes sense for dispatch sources, where "I don't want to receive these
events for a while" is a perfectly valid thing to say. But suspending a queue or
a work item pulls the rug out from under the OS: you basically make all the work
that depends on the suspended one invisible and impossible to reason about.

The proper way to do something akin to suspension is really to "fail" your
operation with a "you need to redrive me later" result, or, in the subsystem
providing an Actor that wants suspension, to implement an event monitoring
system so that the client handles the redrive/monitoring itself. This way the
priority relationship is established and the OS can reason about it. Said
another way, the Actor should fail with an error that gives you some kind of
"resume token" that the requestor can hold and redrive according to his own
rules, in a way that makes it clear he is the waiter. Most of the time,
suspension is a waiting-on-behalf-of relationship, and that is a bad thing to
build (except in priority-homogeneous environments, which iOS/macOS are *not*).
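One way to picture the "resume token" shape in Swift, as a sketch only: the service never parks the caller; it throws an error carrying a token, and the client registers its own monitor and redrives the request itself, which makes the client the visible waiter. `ResumeToken`, `ServiceError`, `Service`, `monitor(_:_:)`, `complete(_:)` (a stand-in for the underlying event) and `fetchWithRedrive` are hypothetical, and there is no locking, so everything is assumed to run on one serial context.

```swift
// Hypothetical sketch of the "fail with a resume token" pattern; not a real
// API. No locking: assumes everything runs on one serial context.

struct ResumeToken: Hashable {
    let id: Int
}

enum ServiceError: Error {
    /// Not ready: the caller should hold the token and redrive later.
    case wouldBlock(ResumeToken)
}

final class Service {
    private var nextID = 0
    private var ready: Set<ResumeToken> = []
    private var monitors: [ResumeToken: () -> Void] = [:]

    /// Returns a value or fails; it never parks the caller.
    func fetch(resuming token: ResumeToken? = nil) throws -> String {
        guard let token = token else {
            nextID += 1
            throw ServiceError.wouldBlock(ResumeToken(id: nextID))
        }
        guard ready.remove(token) != nil else {
            throw ServiceError.wouldBlock(token)
        }
        return "payload-\(token.id)"
    }

    /// The client registers its own monitor, so it is visibly the waiter.
    func monitor(_ token: ResumeToken, _ handler: @escaping () -> Void) {
        if ready.contains(token) {
            handler()
        } else {
            monitors[token] = handler
        }
    }

    /// Stand-in for the underlying event that makes `token` ready.
    func complete(_ token: ResumeToken) {
        ready.insert(token)
        monitors.removeValue(forKey: token)?()
    }
}

/// Client side: hold the token and redrive on the client's own terms.
func fetchWithRedrive(_ service: Service, token: ResumeToken? = nil,
                      completion: @escaping (String) -> Void) {
    do {
        let value = try service.fetch(resuming: token)
        completion(value)
    } catch ServiceError.wouldBlock(let pending) {
        service.monitor(pending) {
            fetchWithRedrive(service, token: pending, completion: completion)
        }
    } catch {
        preconditionFailure("unexpected error: \(error)")
    }
}
```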

Also, implementing the state you described requires more synchronization than
you would want for it to be useful: if you take action after observing a state,
then you really really really don't want that state to change while you carry
out the consequences. The "on$Event" hook approach (which dispatch uses for
dispatch sources, for example) is much better because the ordering and
serialization are provided by the actor itself. The only states that are valid
to expose as a getter are states that you cannot go back from: success,
failure, error, and canceled are all perfectly fine to expose as getters because
they are entered only once. .suspended/.busy are not such states.
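A minimal sketch of that distinction, assuming a hypothetical `OneShot` type confined to one serial context: only terminal outcomes are exposed through the getter, each can be entered once, and everything else is delivered through an `onCompletion` callback whose ordering the object itself controls. None of these names are dispatch API.

```swift
// Hypothetical sketch; not dispatch API. Assumes all calls happen on the
// object's own serial context, which is what provides the ordering.

enum Outcome<Value> {
    case success(Value)
    case failure(any Error)
    case canceled
}

final class OneShot<Value> {
    /// Only a terminal outcome is exposed; `nil` means "not finished yet",
    /// which is not a state worth acting on.
    private(set) var outcome: Outcome<Value>?
    private var handlers: [(Outcome<Value>) -> Void] = []

    /// `on$Event`-style registration: the handler runs exactly once,
    /// whether it was registered before or after completion.
    func onCompletion(_ handler: @escaping (Outcome<Value>) -> Void) {
        if let outcome = outcome {
            handler(outcome)
        } else {
            handlers.append(handler)
        }
    }

    /// Enters a terminal state at most once; later calls return false.
    @discardableResult
    func complete(_ result: Outcome<Value>) -> Bool {
        guard outcome == nil else { return false }
        outcome = result
        let pending = handlers
        handlers.removeAll()
        for handler in pending {
            handler(result)
        }
        return true
    }
}
```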

FWIW, dispatch sources, and more importantly dispatch mach channels (the private
interface used to implement XPC connections), have a design that tries really
really really hard not to fall into any of these pitfalls: they are priority
inheritance friendly, execute on *distributed* execution contexts, and expose
their state machine through "on$Event" callbacks. We should benefit from the
many years of experience condensed in these implementations when thinking about
Actors and the primitives they provide.

-Pierre

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution
