On Sep 2, 2017, at 12:19 PM, Pierre Habouzit <pie...@habouzit.net> wrote:
>>> What do you mean by this?
>> 
>> My understanding is that GCD doesn’t currently scale to 1M concurrent queues 
>> / tasks.
> 
> It completely does provided these 1M queues / tasks are organized on several 
> well known independent contexts.

Ok, I stand corrected.  My understanding was that you could run into situations 
where you get stack explosions, fragment your VM and run out of space, but 
perhaps that is a relic of 32-bit systems.

>>> queues are serial/exclusive execution contexts, and if you're not modeling 
>>> actors as being serial queues, then these two concepts are just disjoint. 
>> 
>> AFAICT, the “one queue per actor” model is the only one that makes sense.  
>> It doesn’t have to be FIFO, but it needs to be some sort of queue.  If you 
>> allow servicing multiple requests within the actor at a time, then you lose 
>> the advantages of “no shared mutable state”.
> 
> I agree. I don't quite care about how the actor is implemented here; what I 
> care about is where it runs. My wording was poor; what I really meant is:
> 
> queues at the bottom of a queue hierarchy are serial/exclusive execution 
> contexts, and if you're not modeling actors as being such fully independent 
> serial queues, then these two concepts are just disjoint.
> 
> In GCD there's a very big difference between the one queue at the root of 
> your graph (just above the thread pool) and any other that is within. The 
> number that doesn't scale is the number of the former contexts, not the 
> latter.

I’m sorry, but I still don’t understand what you’re getting at here.

> The pushback I have here is that today Runloops and dispatch queues on 
> iOS/macOS are already systems that have huge impedance mismatches, and do not 
> share the resources either (in terms of OS physical threads). I would hate 
> for us to bring on ourselves the pain of creating a third, completely 
> different system that uses threads in yet another way. When these 3 
> worlds have to interoperate, this would cause a significant amount of context 
> switching just to move across the boundaries.

Agreed, to be clear, I have no objection to building actors on top of (perhaps 
enhanced) GCD queues.  In fact I *hope* that this can work, since it leads to a 
naturally more incremental path forward, which is therefore much more likely to 
actually happen.

> I'd like to dive and debunk this "GCD doesn't scale" point, that I'd almost 
> call a myth (and I'm relatively unhappy to see these words in your proposal 
> TBH because they send the wrong message).

I’m happy to revise the proposal, please let me know what you think makes sense.

> My currently not very well formed opinion on this subject is that GCD queues 
> are just what you need with these possibilities:
> - this Actor queue can be targeted to other queues by the developer when he 
> means for the actor to be executed in an existing execution context / 
> locking domain,
> - we disallow Actors to be directly targeted to GCD global concurrent queues 
> ever
> - for the other ones we create a new abstraction with stronger and better 
> guarantees (typically limiting the number of possible threads servicing 
> actors to a low number, not greater than NCPU).

Is there a specific important use case for being able to target an actor to an 
existing queue?  Are you looking for advanced patterns where multiple actors 
(each providing disjoint mutable state) share an underlying queue? Would this 
be for performance reasons, for compatibility with existing code, or something 
else?
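
For concreteness, this is roughly what “multiple contexts sharing one 
underlying execution / locking domain” looks like with plain GCD queue 
targeting today (the queue labels below are made up for illustration):

    import Dispatch

    // One serial queue owns the "database" locking domain.
    let databaseQueue = DispatchQueue(label: "com.example.database")

    // Queues targeted at it share that exclusion domain: blocks submitted to
    // either child ultimately run one at a time with respect to everything
    // else rooted at databaseQueue, without creating new root contexts.
    let cacheQueue = DispatchQueue(label: "com.example.database.cache",
                                   target: databaseQueue)
    let writerQueue = DispatchQueue(label: "com.example.database.writer",
                                    target: databaseQueue)

    cacheQueue.async {
        // Runs exclusively within the database locking domain.
    }

I assume the idea is that actor queues could participate in exactly this sort 
of hierarchy, but I’d like to understand the motivation better.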

I don’t see a problem with disallowing actors on the global concurrent queues 
in general, but I do think it makes sense to be able to provide an abstraction 
for homing code on the main thread/queue/actor somehow. 

> I think this aligns with your idea, in the sense that if you exhaust the 
> Swift Actor Thread Pool, then you're screwed forever. But given that the 
> pattern above can be hidden inside framework code that the developer has *no 
> control over*, it is fairly easy to write actors that, through said 
> framework, would eventually result in this synchronization pattern. Even 
> if we can build the amazing debugging tools that make these immediately 
> obvious to the developer (as in understanding what is happening), I don't 
> know how the developer can do *anything* to work around these. The only 
> solution is to fix the frameworks. However, the experience of the last few 
> years of maintaining GCD shows that the patterns above are not widely 
> perceived as a dramatic design issue, let alone a bug. It will be a very long 
> road before most of the framework code out there is Swift Actor async/await 
> safe.
> 
> What is your proposal to address this? That we annotate functions that are 
> unsafe? And then, assuming we succeed at this Herculean task, what can 
> developers do about it anyway if the only way to do a thing is async/await 
> unsafe?

I don’t think that annotations are the right way to go.  It should be an 
end-goal for the system to be almost completely actor safe, so the polarity of 
the annotation would have to be “this code is unsafe”.  Given that, I don’t see 
how we could audit the entire world, and I don’t think that an annotation 
explosion would be acceptable this late in Swift’s lifetime.  It would be like 
the IUO-everywhere problem in the Swift 1 betas.

My preferred solution is three-fold:
 - Make frameworks incrementally actor safe over time.  Ensure that new APIs 
are done right, and make sure that no existing APIs ever go from “actor safe” 
to “actor unsafe”.
 - Provide a mechanism that developers can use to address problematic APIs that 
they encounter in practice.  It should be something akin to “wrap your calls in 
a closure and pass it to a special GCD function”, or something else of similar 
complexity.
 - Continue to improve performance and debugging tools to help identify 
problematic cases that occur in practice.

This would ensure that developers can always “get their job done”, but also 
provide a path where we can incrementally improve the world over the course of 
years (if necessary).
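
To make the second point above concrete, here is a rough sketch of what such an 
escape hatch could look like.  The function name and shape are hypothetical 
(this is not an existing GCD API), but the complexity budget is about right:

    import Dispatch

    // Hypothetical escape hatch: run a known actor-unsafe (possibly blocking)
    // call on a queue outside the actor thread pool, then hand the result back.
    func runOutsideActors<T>(_ body: @escaping () -> T,
                             completion: @escaping (T) -> Void) {
        let escapeHatch = DispatchQueue(label: "com.example.actor-unsafe-calls")
        escapeHatch.async {
            let result = body()      // the problematic synchronous API
            completion(result)       // deliver the result back to the caller
        }
    }

    // Hypothetical usage from actor code:
    // runOutsideActors({ legacyFramework.blockingQuery() }) { rows in
    //     self.handle(rows)
    // }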

>>> Actors are the way you present the various tasks/operations/activities that 
>>> you schedule. These contexts are a way for the developer to explain which 
>>> things are related in a consistent system, and give them access to state 
>>> which is local to this context (whether it's TSD for threads, or queue 
>>> specific data, or any similar context),
>> 
>> Just MHO, but I don’t think you’d need or want the concept of “actor local 
>> data” in the sense of TLS (e.g. __thread).  All actor methods have a ‘self’ 
>> already, and having something like TLS strongly encourages breaking the 
>> model.  To me, the motivation for TLS is to provide an easier way to migrate 
>> single-threaded global variables, when introducing threading into legacy 
>> code.
> 
> I disagree. If you have an execution context that is "my database subsystem", 
> it probably has an object that knows about all your database handles. Or do 
> you suggest that your database subsystem is an actor too? I don't quite see 
> the database subsystem as an actor in the sense that it's a strongly 
> exclusive execution context (if your database is SQLite) and will really 
> receive actors to execute, providing them a home.

I haven’t spent a lot of thought on this, but I tend to think that “database as 
an actor” fits with the programming model that I’m advocating for.  The problem 
(as I think you’re getting at) is that you don’t want serialization at the API 
level of the database; you want to allow the database itself to have multiple 
threads running around in its process.

This is similar in some ways to your NIC example from before, where I think you 
want one instance of a NIC actor for each piece of hardware you have… but you 
want multithreaded access.

One plausible way to model this is to say that it is a “multithreaded actor” of 
some sort, where the innards of the actor allow an arbitrary number of client 
threads to call into it concurrently.  The onus would be on the implementor of 
the NIC or database to implement the proper synchronization on the mutable 
state within the actor.
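
As a very rough sketch (all of the names here are hypothetical, and today’s GCD 
plus a lock stand in for whatever runtime support we’d actually build), a 
multithreaded actor might look like an ordinary class that presents an async 
surface while doing its own synchronization internally:

    import Foundation
    import Dispatch

    // Hypothetical "multithreaded actor": clients see one message-like entry
    // point, but requests are serviced concurrently and the implementor takes
    // on the burden of protecting the mutable state inside.
    final class DatabaseActor {
        private let workers = DispatchQueue(label: "com.example.db.workers",
                                            attributes: .concurrent)
        private let stateLock = NSLock()
        private var queriesInFlight = 0          // example mutable state

        func query(_ sql: String, completion: @escaping ([String]) -> Void) {
            workers.async {
                // Any number of client requests may be running here at once,
                // so shared state must be explicitly synchronized.
                self.stateLock.lock()
                self.queriesInFlight += 1
                self.stateLock.unlock()

                let rows = ["row1", "row2"]      // stand-in for real DB work
                completion(rows)
            }
        }
    }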

I think that something like this is attractive because it provides key things 
that I value highly in a concurrency model:

- The programmer has a natural way to model things: “an instance of a database 
is a thing”, and therefore should be modeled as an instance of an actor.  The 
fact that the actor can handle multiple concurrent requests is an 
implementation detail that clients shouldn’t have to be rewritten to understand.

- Making this non-default would provide proper progressive disclosure of 
complexity.

- You’d still get improved safety and isolation of the system as a whole, even 
if individual actors are “optimized” in this way.

- When incrementally migrating code to the actor model, this would make it much 
easier to provide actor wrappers for existing subsystems built on shared 
mutable state.

- Something like this would also probably be the right abstraction for imported 
RPC services that allow for multiple concurrent synchronous requests.

I’m curious to know what you and others think about this concept.


> You can obviously model this as "the database is an actor itself", and have 
> the queue of other actors that only do database work target this database 
> actor queue, but while this looks very appealing, in practice this creates a 
> system which is hard to understand for developers. Actors talking to each 
> other/messaging each other is fine. Actors nesting their execution inside 
> each other is not because the next thing people will then ask from such a 
> system is a way to execute code from the outer actor when in the context of 
> the inner Actor, IOW what a recursive mutex is to a mutex, but for the Actor 
> queue. This obviously has all the terrible issues of recursive locks where you 
> think you hold the lock for the first time and expect your object invariants 
> to be valid, except that you're really in a nested execution and see broken 
> invariants from the outer call and this creates terribly hard bugs to find.

Yes, I understand the problem of recursive locks, but I don’t see how or why 
you’d want an outer actor to have an inner actor call back to it.  Ideally your 
actors are in the shape of a DAG.  Cycles would be properly modeled with (e.g.) 
weak references, but I think that synchronous/awaited cyclic calls should fail 
fast at runtime.  To be more explicit, I mean something like:

1. Actor A has a reference to actor B.  Actor B has a weak backref to actor A.
2. Actor A does an await on an async actor method on B.  As such, its queue is 
blocked: no messages can be run on its queue until the B method returns.
3. Actor B’s method turns around and calls a method on A, awaiting the result.  
Because there is a cyclic wait, we have a deadlock, one which ideally fails 
fast with a trap.

The solution for this is to change the callback to A to be a call to an async 
(void-returning) actor method.  Such a call would simply enqueue the request on 
A’s queue, and get serviced after the original A call returns.
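
Spelled out with plain dispatch queues standing in for the two actor queues 
(just a sketch of the scenario above, not proposed surface syntax):

    import Dispatch

    let actorAQueue = DispatchQueue(label: "actor.A")   // stand-in for actor A
    let actorBQueue = DispatchQueue(label: "actor.B")   // stand-in for actor B

    // Steps 2 and 3: A blocks its own queue waiting on B, B synchronously
    // calls back into A, and the cyclic wait deadlocks (GCD may even trap
    // here, which is the fail-fast behavior I'd want).
    func cyclicAwait() {
        actorAQueue.sync {
            actorBQueue.sync {
                actorAQueue.sync { /* never runs: A's queue is blocked */ }
            }
        }
    }

    // The fix: B's callback into A is async and void-returning, so it is
    // merely enqueued on A's queue and runs after the original A call returns.
    func asyncCallback() {
        actorAQueue.sync {
            actorBQueue.sync {
                actorAQueue.async { /* serviced later; no deadlock */ }
            }
        }
    }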

I wrote this out to make it clear what problem I think you’re talking about.  
If this isn’t what you’re trying to get at, please let me know :-)

>> 
>>> IMO, Swift as a runtime should define what an execution context is, and be 
>>> relatively oblivious of which context it is exactly as long it presents a 
>>> few common capabilities:
>>> - possibility to schedule work (async)
>>> - have a name
>>> - be an exclusion context
>>> - is an entity the kernel can reason about (if you want to be serious about 
>>> any integration on a real operating system with priority inheritance and 
>>> complex issues like this, which it is the OS responsibility to handle and 
>>> not the language)
>>> - ...
>>> 
>>> In that sense, whether your execution context is:
>>> - a dispatch serial queue
>>> - a CFRunloop
>>> - a libev/libevent/... event loop
>>> - your own hand rolled event loop
>> 
>> Generalizing the approach is completely possible, but it is also possible to 
>> introduce a language abstraction that is “underneath” the high level event 
>> loops.  That’s what I’m proposing.
> 
> I'm not sure I understand what you mean here, can you elaborate?

The concept of an event loop is IMO a library abstraction that could be built 
on top of the actor model.  The actor would represent the “context” for the 
concurrency, the event loop API would represent the other stuff.

-Chris

