> On Aug 31, 2017, at 7:50 PM, Pierre Habouzit <[email protected]> wrote:
> 
>> On Aug 31, 2017, at 11:35 AM, Joe Groff via swift-evolution 
>> <[email protected]> wrote:
>> 
>> The coroutine proposal as it stands essentially exposes raw delimited 
>> continuations. While this is a flexible and expressive feature in the 
>> abstract, for the concrete purpose of representing asynchronous coroutines, 
>> it provides weak user-level guarantees about where their code might be 
>> running after being resumed from suspension, and puts a lot of pressure on 
>> APIs to be well-behaved in this respect. And if we're building toward 
>> actors, where async actor methods should be guaranteed to run "in the 
>> actor", I think we'll *need* something more than the bare-bones delimited 
>> continuation approach to get there. I think the proposal's desire to keep 
>> coroutines independent of a specific runtime model is a good idea, but I 
>> also think there are a couple possible modifications we could add to the 
>> design to make it easier to reason about what context things run in for any 
>> runtime model that benefits from async/await:
>> 
>> # Coroutine context
>> 
>> Associating a context value with a coroutine would let us thread useful 
>> information through the execution of the coroutine. This is particularly 
>> useful for GCD, so you could attach a queue, QoS, and other attributes to 
>> the coroutine, since these aren't reliably available from the global 
>> environment. It could be a performance improvement even for things like 
>> per-pthread queues, since coroutine context should be cheaper to access than 
>> pthread_self. 
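>> As a rough sketch of how that might look (getCoroutineContext, suspendAsync,
>> and ioRead are illustrative names here, not settled API), a suspending call
>> could consult the attached context instead of the global environment:
>>
>> func read(_ path: String) async -> Data {
>>   // Pick up whatever context the spawning code attached via beginAsync(context:).
>>   let queue = (getCoroutineContext() as? DispatchQueue) ?? DispatchQueue.main
>>   return await suspendAsync { continuation in
>>     ioRead(path) { data in
>>       // Hop to the attached queue before resuming the coroutine.
>>       queue.async { continuation(data) }
>>     }
>>   }
>> }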
> 
>> [...]
> 
> 
> YES!
> 
> We need that. You're very focused on performance and affinity and whatnot 
> here, but knowing where the completion will run upfront is critical for 
> priority inheritance purposes.
> 
> This is exactly the spirit of the mail I just wrote in reply to Chris a bit 
> earlier tonight. Execution context matters to the OS, a lot.
> 
> The OS needs to know two things:
> - where is the precursor of this coroutine (which work is preventing the
> coroutine from executing)
> - where will the coroutine go (which for GCD is critical because the OS
> attributes threads lazily, so any typical OS primitive for raising an existing
> thread's priority doesn't work)
> 
> In other words, a coroutine needs:
> - various tags (QoS, logging context, ...)
> - precursors / reverse dependencies
> - where it will execute (whether it's a dispatch queue or a runloop is 
> completely irrelevant though).
> 
> 
> And if you do it that way, then when the precursor fires and allows your
> coroutine to be scheduled, the runtime can actually schedule it right away on
> the right execution context and minimize context switches (which are far worse
> for your performance than shared mutable state).
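>
> To make that concrete (purely illustrative, nothing here is real API), the
> per-coroutine record the OS would want looks roughly like:
>
> import Dispatch
>
> // Illustrative only: the pieces of information listed above, bundled together.
> struct CoroutineOSContext {
>   var qos: DispatchQoS                 // tags: priority / QoS class
>   var loggingContext: [String: String] // tags: logging / activity identifiers
>   var precursors: [DispatchWorkItem]   // which work is preventing us from running
>   var targetQueue: DispatchQueue       // where the continuation will execute
> }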
> 
> 
>> # `onResume` hooks
>> 
>> Relying on coroutine context alone still leaves responsibility wholly on 
>> suspending APIs to pay attention to the coroutine context and schedule the 
>> continuation correctly. You'd still have the expression problem when 
>> coroutine-spawning APIs from one framework interact with suspending APIs 
>> from another framework that doesn't understand the spawning framework's 
>> desired scheduling policy. We could provide some defense against this by 
>> letting the coroutine control its own resumption with an "onResume" hook, 
>> which would run when a suspended continuation is invoked instead of 
>> immediately resuming the coroutine. That would let the coroutine-aware 
>> dispatch_async example from above do something like this, to ensure the 
>> continuation always ends up back on the correct queue:
>> 
>> extension DispatchQueue {
>>   func `async`(_ body: () async -> ()) {
>>     dispatch_async(self, {
>>       beginAsync(
>>         context: self,
>>         body: { await body() },
>>         onResume: { continuation in
>>           // Defensively hop to the right queue
>>           dispatch_async(self, continuation)
>>         }
>>       )
>>     })
>>   }
>> }
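>>
>> For example (a sketch only; ThirdPartyHTTP is a made-up framework, and
>> suspendAsync is the proposal's strawman suspension primitive), even if a
>> suspending API resumes from some framework-internal thread:
>>
>> func fetch(_ url: URL) async -> Data {
>>   return await suspendAsync { continuation in
>>     ThirdPartyHTTP.get(url) { data in
>>       continuation(data) // called on ThirdPartyHTTP's own worker thread
>>     }
>>   }
>> }
>>
>> the onResume hook above would re-enqueue the rest of the coroutine on `self`,
>> so the code following the `await` still runs on the spawning queue.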
>> 
>> This would let spawning APIs provide a stronger guarantee that the spawned 
>> coroutine is always executing as if scheduled by a specific 
>> queue/actor/event loop/HWND/etc., even if later suspended by an async API 
>> working in a different paradigm. This would also let you more strongly 
>> associate a coroutine with a future object representing its completion:
>> 
>> class CoroutineFuture<T> {
>>   enum State {
>>     case busy                 // currently running
>>     case suspended(() -> ())  // suspended, holding the continuation
>>     case success(T)           // completed with success
>>     case failure(Error)       // completed with error
>>   }
>>
>>   var state: State = .busy
>>
>>   init(_ body: () async -> T) {
>>     beginAsync(
>>       body: {
>>         do {
>>           self.state = .success(await body())
>>         } catch {
>>           self.state = .failure(error)
>>         }
>>       },
>>       onResume: { continuation in
>>         guard case .busy = self.state else {
>>           fatalError("already running?!")
>>         }
>>         self.state = .suspended(continuation)
>>       }
>>     )
>>   }
>>
>>   // Return the result of the future, or try to make progress computing it
>>   func poll() throws -> T? {
>>     switch state {
>>     case .busy:
>>       return nil
>>     case .suspended(let cont):
>>       cont()
>>       switch state {
>>       case .success(let value):
>>         return value
>>       case .failure(let error):
>>         throw error
>>       case .busy, .suspended:
>>         return nil
>>       }
>>     case .success(let value):
>>       return value
>>     case .failure(let error):
>>       throw error
>>     }
>>   }
>> }
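>>
>> Usage might look like this (loadThumbnail, show, and showError are
>> placeholders):
>>
>> let future = CoroutineFuture { await loadThumbnail() }
>> // Later, from the same context that created it:
>> do {
>>   if let image = try future.poll() { show(image) }
>> } catch {
>>   showError(error)
>> }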
>> 
>> 
>> A downside of this design is that it incurs some cost from defensive 
>> rescheduling on the continuation side, and also prevents writing APIs that 
>> intentionally change context across an `await`, like a theoretical 
>> "goToMainThread()" function (though you could do that by spawning a 
>> semantically-independent coroutine associated with the main thread, which 
>> might be a better design anyway).
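>>
>> E.g. (a sketch reusing the `async` extension on DispatchQueue above; updateUI
>> and model are placeholders), rather than switching context mid-coroutine, the
>> main-thread work becomes its own coroutine:
>>
>> DispatchQueue.main.async {
>>   await updateUI(with: model)
>> }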
> 
> Given the limitations, I'm very skeptical. Also, in general,
> suspending/resuming work is very difficult for a runtime to handle
> (implementation-wise), has large memory costs, and breaks priority inversion
> avoidance. dispatch_suspend()/dispatch_resume() is one of the banes of my
> existence when it comes to the dispatch API surface. It only makes sense for
> dispatch sources, where "I don't want to receive these events anymore for a
> while" is a perfectly valid thing to say or do. But suspending a queue or a
> work item is pulling the rug out from under the OS, since you basically make
> all work that depends on the suspended one invisible and impossible to reason
> about.

Sorry, I was using the term 'suspend' somewhat imprecisely. I was specifically 
referring to an operation that semantically pauses the coroutine and gives you 
its continuation closure, to be handed off as a completion handler or something 
of that sort, not something that would block the thread or suspend the queue. 
Execution would return back up to the non-async layer at the point this happens.
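Concretely, something along these lines (using the proposal's `suspendAsync`
strawman; `legacyFetch` is a placeholder):

func fetchValue() async -> Int {
  return await suspendAsync { continuation in
    // The coroutine is paused here; its continuation is just a closure that can
    // be handed off as a completion handler. Nothing blocks, and control returns
    // to the non-async code that called beginAsync.
    legacyFetch(completion: continuation)
  }
}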

-Joe

> 
> The proper way to do something akin to suspension is really to "fail" your
> operation with a "you need to redrive me later" error, or to implement an
> event-monitoring system inside the subsystem providing the Actor that wants
> suspension, so that the client handles the redrive/monitoring; this way the
> priority relationship is established and the OS can reason about it. Said
> another way, the Actor should fail with an error that gives you some kind of
> "resume token" that the requestor can hold and redrive according to its own
> rules, in a way that makes it clear it is the waiter. Most of the time,
> suspension is a waiting-on-behalf-of relationship, and that is a bad thing to
> build (except in priority-homogeneous environments, which iOS/macOS are
> *not*).
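>
> As a rough sketch of that shape (every name here is invented for
> illustration):
>
> struct ResumeToken {}    // opaque handle the providing subsystem understands
>
> enum StorageError: Error {
>   case redriveLater(ResumeToken)   // "you need to redrive me later"
> }
>
> // The client holds the token and redrives on its own terms, so the
> // waiter/waited-on relationship stays visible to the OS:
> do {
>   let value = try await store.read(key)
>   use(value)
> } catch StorageError.redriveLater(let token) {
>   scheduleRedrive(token)   // the client's own redrive policy
> }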
> 
> Also, implementing the state machine you described requires more
> synchronization than you'd want for it to be useful: if you want to take
> action after observing a state, then you really, really don't want that state
> to change while you perform the consequence. The "on$Event" hook approach
> (which dispatch uses for dispatch sources, e.g.) is much better, because the
> ordering and serialization are provided by the actor itself. The only states
> that are valid to expose as getters are states that you cannot go back from:
> success, failure, error, and canceled are all perfectly fine states to expose
> as getters because they only change once. .suspended/.busy is not such a
> thing.
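>
> Sketched in code (names invented), that guideline would look like this:
> one-way states get getters, transient states only surface through a
> serialized event handler:
>
> enum MonitorEvent { case suspended, resumed }
>
> final class Monitor {
>   // One-way states: safe to read, they never change back.
>   private(set) var isCancelled = false
>   private(set) var result: Result<Int, Error>?
>
>   // Transient states are only delivered as events, serialized by the owner
>   // (e.g. the actor's internal queue), never exposed as a racy getter.
>   var onEvent: ((MonitorEvent) -> Void)?
> }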
> 
> FWIW, dispatch sources, and more importantly dispatch mach channels (the
> private interface used to implement XPC connections), have a design that
> tries really hard not to fall into any of these pitfalls: they are
> priority-inheritance friendly, execute on *distributed* execution contexts,
> and expose their state machine through "on$Event" callbacks. We should
> benefit from the many years of experience condensed in these implementations
> when thinking about Actors and the primitives they provide.
> 
> -Pierre
