> On Sep 2, 2017, at 11:15 AM, Chris Lattner <clatt...@nondot.org> wrote:
> 
> On Aug 31, 2017, at 7:24 PM, Pierre Habouzit <pie...@habouzit.net> wrote:
>> 
>> I failed to find the initial mail and am quite late to the party of 
>> commenters, but there are parts I don't understand or have questions about.
>> 
>> Scalable Runtime
>> 
>> [...]
>> 
>> The one problem I anticipate with GCD is that it doesn't scale well enough: 
>> server developers in particular will want to instantiate hundreds of 
>> thousands of actors in their application, at least one for every incoming 
>> network connection. The programming model is substantially harmed when you 
>> have to be afraid of creating too many actors: you have to start aggregating 
>> logically distinct stuff together to reduce # queues, which leads to 
>> complexity and loses some of the advantages of data isolation.
>> 
>> 
>> What do you mean by this?
> 
> My understanding is that GCD doesn’t currently scale to 1M concurrent queues 
> / tasks.

It completely does, provided these 1M queues / tasks are organized into 
several well-known independent contexts.
Where GCD "fails" is when you target your individual serial queues at the 
global concurrent queues (a.k.a. root queues), which means "please, pool, do 
your job": then yes, it doesn't scale, because we take these individual 
serial queues as proxies for OS threads.

If however you target these queues at either:
- new serial queues that segregate your actors per subsystem yourself,
- or some more constrained pool than what the current GCD runtime offers (one 
where we don't create threads to run your work nearly as eagerly),

then I don't see why the current implementation of GCD wouldn't scale.
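To make this concrete, here is roughly what the first option looks like in 
Swift today (the labels and counts are purely illustrative):

import Dispatch

// One serial queue per subsystem: this is the only "root" context
// sitting just above the thread pool.
let databaseSubsystem = DispatchQueue(label: "com.example.db-subsystem")

// Thousands of per-actor serial queues are cheap: each keeps its
// exclusivity, but they all funnel into the single subsystem context
// instead of each acting as a proxy for an OS thread.
let actorQueues = (0..<100_000).map { i in
    DispatchQueue(label: "com.example.actor.\(i)", target: databaseSubsystem)
}

actorQueues[0].async {
    // Runs exclusively with respect to its own queue, inside the
    // database subsystem's locking domain.
}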

> 
>> queues are serial/exclusive execution contexts, and if you're not modeling 
>> actors as being serial queues, then these two concepts are just disjoint. 
> 
> AFAICT, the “one queue per actor” model is the only one that makes sense.  It 
> doesn’t have to be FIFO, but it needs to be some sort of queue.  If you allow 
> servicing multiple requests within the actor at a time, then you lose the 
> advantages of “no shared mutable state”.

I agree. I don't quite care about how the actor is implemented here; what I 
care about is where it runs. My wording was poor; what I really meant is:

queues at the bottom of a queue hierarchy are serial/exclusive execution 
contexts, and if you're not modeling actors as such fully independent serial 
queues, then these two concepts are just disjoint.

In GCD there's a very big difference between the one queue at the root of 
your graph (just above the thread pool) and any other queue within the 
hierarchy. The number that doesn't scale is the number of the former 
contexts, not the latter.

The pushback I have here is that, today, runloops and dispatch queues on 
iOS/macOS are already two systems with a huge impedance mismatch, and they do 
not share their resources (in terms of physical OS threads) either. I would 
hate for us to bring upon ourselves the pain of creating a third, completely 
different system with yet another way of using threads. When these three 
worlds have to interoperate, it will cause a significant number of context 
switches just to move across the boundaries.

"GCD Doesn't scale so let's build something new" will only create pain, we need 
a way for actors to inherently run on a thread pool that is shared with 
dispatch and that dispatch can reason about and vice versa, and where the Swift 
runtime gives enough information for GCD for it to execute the right work at 
the right time.

I'd like to dive into and debunk this "GCD doesn't scale" point, which I'd 
almost call a myth (and I'm relatively unhappy to see these words in your 
proposal, TBH, because they send the wrong message).

Way before I started working on it, probably to ease adoption, the decision 
was made that it was OK to write code such as the following and have it run 
without problems (FSVO "without problems"):

dispatch_queue_t q = ...; // some queue targeting a root (global) queue
dispatch_semaphore_t sema = dispatch_semaphore_create(0);
dispatch_async(q, ^{ dispatch_semaphore_signal(sema); }); // schedule asynchronously...
dispatch_semaphore_wait(sema, DISPATCH_TIME_FOREVER);     // ...then block this thread on it


To accommodate this, when the caller of this code blocks a worker thread, the 
kernel notices that your level of concurrency has dropped and brings up a new 
thread for you. That thread will likely be the one that picks up `q`, which 
was woken up by the async, and will unblock the caller.

If you never write such horrible code, then GCD scales *just fine*. The real 
problem is that if you go async, you need to be async all the way; Node.js 
and other similar projects understood that a very long time ago. If you 
express dependencies between asynchronous execution contexts with a blocking 
relationship such as the one above, then you're just committing performance 
suicide. GCD handles this by adding more threads and overcommitting the 
system; my understanding is that your proposal would instead livelock.
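Being "async all the way" for the snippet above means expressing the 
dependency as a continuation instead of parking a thread. A sketch (the 
function and its names are mine, purely illustrative):

import Dispatch

// Instead of dispatch_async + semaphore_wait, hand the dependent work
// to the queue as a completion: nobody blocks, and the kernel never
// needs to bring up an extra thread to unblock anybody.
func produceValue(on q: DispatchQueue,
                  completion: @escaping (Int) -> Void) {
    q.async {
        completion(42) // deliver the result asynchronously
    }
}

produceValue(on: DispatchQueue(label: "com.example.worker")) { value in
    print("got \(value)") // runs once the value exists
}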



My currently not very well formed opinion on this subject is that GCD queues 
are just what you need, with these possibilities:
- an actor's queue can be targeted at other queues by the developer when they 
mean for the actor to be executed in an existing execution context / locking 
domain,
- we disallow actors from ever being targeted directly at GCD's global 
concurrent queues,
- for the other ones we create a new abstraction with stronger and better 
guarantees (typically limiting the number of threads servicing actors to a 
low number, no greater than NCPU); a sketch follows below.
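A very rough sketch of what I mean by that last abstraction (every name here 
is hypothetical, this is in no way a proposed API):

import Dispatch
import Foundation

// A fixed pool of NCPU serial root queues. Actor queues are targeted
// at the pool round-robin, so no more than NCPU threads ever service
// actors, no matter how many actors exist.
final class ActorQueuePool {
    private let roots: [DispatchQueue]
    private var next = 0
    private let lock = NSLock()

    init(width: Int = ProcessInfo.processInfo.activeProcessorCount) {
        roots = (0..<width).map { DispatchQueue(label: "actors.root.\($0)") }
    }

    // Each actor keeps its own serial queue for exclusivity, but
    // underneath it can only ever run on one of the `width` roots.
    func makeActorQueue(label: String) -> DispatchQueue {
        lock.lock(); defer { lock.unlock() }
        next = (next + 1) % roots.count
        return DispatchQueue(label: label, target: roots[next])
    }
}

let pool = ActorQueuePool()
let mailbox = pool.makeActorQueue(label: "actor.database.1")
mailbox.async { /* process one actor message */ }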

I think this aligns with your idea, in the sense that if you exhaust the 
Swift Actor Thread Pool, then you're screwed forever. But given that the 
pattern above can be hidden inside framework code that the developer has *no 
control over*, it is fairly easy to write actors that, through said 
framework, would eventually cause this synchronization pattern to happen. 
Even if we build amazing debugging tools that make these problems immediately 
obvious to the developer (as in, understanding what is happening), I don't 
know how the developer could do *anything* to work around them. The only 
solution is to fix the frameworks. However, the experience of the last few 
years of maintaining GCD shows that the patterns above are not widely 
perceived as a dramatic design issue, let alone a bug. It will be a very long 
road before most of the framework code out there is Swift actor / async/await 
safe.

What is your proposal to address this? That we annotate functions that are 
unsafe? And then, assuming we succeed at this Herculean task, what can 
developers do about it anyway, if the only way to do a thing is 
async/await-unsafe?

Note that synchronously waiting is not necessarily all bad. Any wait on 
something that is already happening and can make forward progress on its own 
(transitively so, through any such blocking relationship) is 100% fine. To 
take an example that is widely misunderstood: sync IPC is not bad, because it 
is about making another security domain perform work on your behalf; that 
other process is using the resources you're leaving free as a result of your 
blocking, and this is fine. The problematic blockings are the ones for which 
the system cannot identify the work you're waiting on, and which may require 
allocating constrained extra resources (such as a thread in the pool) before 
that work can even run.
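In dispatch terms, the difference is roughly the following (an illustrative 
sketch, not an exhaustive taxonomy):

import Dispatch

let q = DispatchQueue(label: "com.example.subsystem")

// Fine: dispatch knows exactly which queue this wait depends on, can
// apply priority inheritance to it, and will typically run the block
// on the calling thread rather than bring up a new pool thread.
q.sync {
    // touch state that belongs to q's locking domain
}

// Problematic: the semaphore pattern from earlier in this mail. The
// system cannot identify what the waiter depends on, so unblocking it
// requires allocating a constrained resource (a new worker thread).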


>> Actors are the way you present the various tasks/operations/activities that 
>> you schedule. These contexts are a way for the developer to explain which 
>> things are related in a consistent system, and give them access to state 
>> which is local to this context (whether it's TSD for threads, or queue 
>> specific data, or any similar context),
> 
> Just MHO, but I don’t think you’d need or want the concept of “actor local 
> data” in the sense of TLS (e.g. __thread).  All actor methods have a ‘self’ 
> already, and having something like TLS strongly encourages breaking the 
> model.  To me, the motivation for TLS is to provide an easier way to migrate 
> single-threaded global variables, when introducing threading into legacy code.

I disagree: if you have an execution context that is "my database subsystem", 
it probably has an object that knows about all your database handles. Or do 
you suggest that your database subsystem should be an actor too? I don't 
quite see the database subsystem as an actor: it is a strongly exclusive 
execution context (if your database is SQLite) that will really receive 
actors to execute, providing them a home.

You can obviously model this as "the database is an actor itself", and have 
the queues of other actors that only do database work target this database 
actor's queue. But while this looks very appealing, in practice it creates a 
system that is hard for developers to understand. Actors talking to / 
messaging each other is fine. Actors nesting their execution inside each 
other is not, because the next thing people will ask of such a system is a 
way to execute code from the outer actor while in the context of the inner 
actor, IOW what a recursive mutex is to a mutex, but for the actor queue. 
This obviously has all the terrible issues of recursive locks, where you 
think you hold the lock for the first time and expect your object invariants 
to be valid, except that you're really in a nested execution and see broken 
invariants from the outer call, which creates terribly hard bugs to find.
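To illustrate with the lock analogy (a contrived Swift sketch, nothing more):

import Foundation

final class Ledger {
    private let lock = NSRecursiveLock()
    private var debits = 0
    private var credits = 0 // invariant: debits == credits

    func record(_ amount: Int) {
        lock.lock(); defer { lock.unlock() }
        debits += amount  // invariant temporarily broken...
        audit()           // ...and a nested call re-enters here...
        credits += amount // ...before the invariant is restored
    }

    func audit() {
        lock.lock(); defer { lock.unlock() }
        // We "hold the lock", so we expect invariants to hold, but we
        // are really nested inside record() and observe broken state.
        if debits != credits {
            print("broken invariant observed from nested call")
        }
    }
}

Ledger().record(10) // prints: broken invariant observed from nested call

Nested actor execution recreates exactly this class of bug at the queue level.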

Do I make sense?

> This is not a problem we need or want to solve, given programmers would be 
> rewriting their algorithm anyway to get it into the actor model.
> 
>> IMO, Swift as a runtime should define what an execution context is, and be 
>> relatively oblivious of which context it is exactly as long it presents a 
>> few common capabilities:
>> - possibility to schedule work (async)
>> - have a name
>> - be an exclusion context
>> - is an entity the kernel can reason about (if you want to be serious about 
>> any integration on a real operating system with priority inheritance and 
>> complex issues like this, which it is the OS responsibility to handle and 
>> not the language)
>> - ...
>> 
>> In that sense, whether your execution context is:
>> - a dispatch serial queue
>> - a CFRunloop
>> - a libev/libevent/... event loop
>> - your own hand rolled event loop
> 
> Generalizing the approach is completely possible, but it is also possible to 
> introduce a language abstraction that is “underneath” the high level event 
> loops.  That’s what I’m proposing.

I'm not sure I understand what you mean here; can you elaborate?

>> Design sketch for interprocess and distributed compute
>> 
>> [...]
>> 
>> One of these principles is the concept of progressive disclosure of 
>> complexity <https://en.wikipedia.org/wiki/Progressive_disclosure>: a Swift 
>> developer shouldn't have to worry about IPC or distributed compute if they 
>> don't care about it.
>> 
>> 
>> While I agree with the sentiment, I don't think that anything useful can be 
>> done without "distributed" computation. I like the loadResourceFromTheWeb 
>> example, as we have something like this on our platform, which is the 
>> NSURLSession APIs, or the CloudKit API Surface, that are about fetching some 
>> resource from a server (URL or CloudKit database records). However, they 
>> don't have a single result, they have:
>> 
>> - progress notification callbacks
>> - broken down notifications for the results (e.g. headers first and body 
>> second, or per-record for CloudKit operations)
>> - various levels of error reporting.
> 
> I don’t understand the concern about this.  If you want low level control 
> like this, it is quite easy to express that.  However, it is also quite 
> common to just want to say “load a URL with this name”, which is super easy 
> and awesome with async/await.
> 
>> I expect most developers will have to use such a construct, and for these, 
>> having a single async pivot in your code that essentially fully serializes 
>> your state machine on getting a full result from the previous step to be 
>> lacking. 
> 
> Agreed, the examples are not trying to show that.  It is perfectly fine to 
> pass in additional callbacks (or delegates, etc) to async methods, which 
> would be a natural way to express this… just like the current APIs do.
> 
>> Delivering all these notifications on the context of the initiator would be 
>> quite inefficient as clearly there are in my example above two very 
>> different contexts, and having to hop through one to reach the other would 
>> make this really terrible for the operating system. I also don't understand 
>> how such operations would be modeled in the async/await world to be 
>> completely honest.
> 
> The proposal isn’t trying to address this problem, because Swift already has 
> ways to do it.
> 
> -Chris
