[Sorry I hit send too fast, let me fix two spots I didn't correct]

> On Sep 2, 2017, at 11:09 PM, Pierre Habouzit <phabou...@apple.com> wrote:
> 
> 
> -Pierre
> 
>> On Sep 2, 2017, at 9:59 PM, Chris Lattner <clatt...@nondot.org 
>> <mailto:clatt...@nondot.org>> wrote:
>> 
>> On Sep 2, 2017, at 12:19 PM, Pierre Habouzit <pie...@habouzit.net 
>> <mailto:pie...@habouzit.net>> wrote:
>>>>> What do you mean by this?
>>>> 
>>>> My understanding is that GCD doesn’t currently scale to 1M concurrent 
>>>> queues / tasks.
>>> 
>>> It completely does provided these 1M queues / tasks are organized on 
>>> several well known independent contexts.
>> 
>> Ok, I stand corrected.  My understanding was that you could run into 
>> situations where you get stack explosions, fragment your VM and run out of 
>> space, but perhaps that is a relic of 32-bit systems.
> 
> A queue on 64-bit systems is 128 bytes (nowadays). Provided you have that 
> amount of VM available to you (1M queues is 128 MB after all), you're good.
> If a large number of them fragments the VM beyond this, that is a malloc/VM 
> bug on 64-bit systems, which are supposed to have enough address space.
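
The arithmetic behind the "128M" figure can be checked with a few lines of Swift. This is only a back-of-the-envelope sketch: the 128-byte size is the number quoted above, not something measured here, and the variable names are made up for illustration.

```swift
// Back-of-the-envelope check of the figure quoted above.
let bytesPerQueue = 128        // per-queue size quoted above for 64-bit systems
let queueCount = 1_000_000     // "1M queues"

let totalBytes = bytesPerQueue * queueCount
let totalMB = Double(totalBytes) / 1_000_000        // decimal megabytes
let totalMiB = Double(totalBytes) / (1024 * 1024)   // binary mebibytes

print("\(totalBytes) bytes = \(totalMB) MB (about \(Int(totalMiB.rounded())) MiB)")
```

This counts only the queue objects themselves, not any work items enqueued on them.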
> 
>> 
>>>>> queues are serial/exclusive execution contexts, and if you're not 
>>>>> modeling actors as being serial queues, then these two concepts are just 
>>>>> disjoint. 
>>>> 
>>>> AFAICT, the “one queue per actor” model is the only one that makes sense.  
>>>> It doesn’t have to be FIFO, but it needs to be some sort of queue.  If you 
>>>> allow servicing multiple requests within the actor at a time, then you 
>>>> lose the advantages of “no shared mutable state”.
>>> 
>>> I agree, I don't quite care about how the actor is implemented here, what I 
>>> care about is where it runs. My wording was poor, what I really meant 
>>> is:
>>> 
>>> queues at the bottom of a queue hierarchy are serial/exclusive execution 
>>> contexts, and if you're not modeling actors as being such fully independent 
>>> serial queues, then these two concepts are just disjoint.
>>> 
>>> In GCD there's a very big difference between the one queue at the root of 
>>> your graph (just above the thread pool) and any other that is within. The 
>>> number that doesn't scale is the number of the former contexts, not the 
>>> latter.
>> 
>> I’m sorry, but I still don’t understand what you’re getting at here.
> 
> What doesn't scale is asking for threads, not having queues.
> 
>> 
>>> The pushback I have here is that today Runloops and dispatch queues on 
>>> iOS/macOS are already systems that have huge impedance mismatches, and do 
>>> not share the resources either (in terms of OS physical threads). I would 
>>> hate for us to bring on ourselves the pain of creating a third completely 
>>> different system that is using another way to use threads. When these 3 
>>> worlds would interoperate this would cause significant amount of context 
>>> switches just to move across the boundaries.
>> 
>> Agreed, to be clear, I have no objection to building actors on top of 
>> (perhaps enhanced) GCD queues.  In fact I *hope* that this can work, since 
>> it leads to a naturally more incremental path forward, which is therefore 
>> much more likely to actually happen.
> 
> Good :)
> 
>>> I'd like to dive and debunk this "GCD doesn't scale" point, that I'd almost 
>>> call a myth (and I'm relatively unhappy to see these words in your proposal 
>>> TBH because they send the wrong message).
>> 
>> I’m happy to revise the proposal, please let me know what you think makes 
>> sense.
> 
> What doesn't scale is the way GCD asks for threads, which is what the global 
> concurrent queues abstract.
> The way it works (or rather limps along) is what we should not reproduce for 
> Swift.
> 
> What you can write in your proposal, and is true, is "GCD's current 
> relationship with the system threads doesn't scale". It's below the queues 
> that the scalability has issues.
> Dave Z. explained it in a mail earlier today in very good words.
> 
>>> My currently not very well formed opinion on this subject is that GCD 
>>> queues are just what you need with these possibilities:
>>> - this Actor queue can be targeted to other queues by the developer when he 
>>> means for these actors to be executed in an existing execution context / 
>>> locking domain,
>>> - we disallow Actors to be directly targeted to GCD global concurrent 
>>> queues ever
>>> - for the other ones we create a new abstraction with stronger and better 
>>> guarantees (typically limiting the number of possible threads servicing 
>>> actors to a low number, not greater than NCPU).
>> 
>> Is there a specific important use case for being able to target an actor to 
>> an existing queue?  Are you looking for advanced patterns where multiple 
>> actors (each providing disjoint mutable state) share an underlying queue? 
>> Would this be for performance reasons, for compatibility with existing code, 
>> or something else?
> 
> Mostly for interaction with current designs where being on a given bottom 
> serial queue gives you the locking context for resources naturally attached 
> to it.
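
As a rough illustration of "being on a given bottom serial queue gives you the locking context", here is a minimal sketch using today's Dispatch API. The queue labels and the `HandleTable` class are hypothetical; the point is only that queues targeting the same bottom serial queue never run concurrently with each other, so state attached to that bottom queue needs no extra lock.

```swift
import Dispatch

// Hypothetical state "naturally attached" to the bottom queue.
// @unchecked Sendable: safety comes from the serial queue, not the compiler.
final class HandleTable: @unchecked Sendable {
    var openHandles = 0
}

// Bottom serial queue: the exclusion context / locking domain.
let databaseContext = DispatchQueue(label: "example.database")

// Two inner queues targeting it; their work is serialized by databaseContext.
let queryQueue = DispatchQueue(label: "example.database.query", target: databaseContext)
let writeQueue = DispatchQueue(label: "example.database.write", target: databaseContext)

let table = HandleTable()
let group = DispatchGroup()

for _ in 0..<1_000 {
    // No lock needed: both queues bottom out on the same serial queue.
    queryQueue.async(group: group) { table.openHandles += 1 }
    writeQueue.async(group: group) { table.openHandles += 1 }
}

group.wait()
print(table.openHandles)   // 2000
```

An actor targeted at `databaseContext` would, under this model, get the same guarantee without owning a thread of its own.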
> 
>> I don’t see a problem with disallowing actors on the global concurrent 
>> queues in general, but I do think it makes sense to be able to provide an 
>> abstraction for homing code on the main thread/queue/actor somehow. 
>> 
>>> I think this aligns with your idea, in the sense that if you exhaust the 
>>> Swift Actor Thread Pool, then you're screwed forever. But given that the 
>>> pattern above can be hidden inside framework code that the developer has 
>>> *no control over*, it is fairly easy to write actors that eventually 
>>> through the said framework, would result in this synchronization pattern 
>>> happening. Even if we can build the amazing debugging tools that make these 
>>> immediately obvious to the developer (as in understanding what is 
>>> happening), I don't know how the developer can do *anything* to work around 
>>> these. The only solution is to fix the frameworks. However the experience 
>>> of the last few years of maintaining GCD shows that the patterns above are 
>>> not widely perceived as a dramatic design issue, let alone a bug. It will 
>>> be a very long road before most framework code there is out there is Swift 
>>> Actor async/await safe.
>>> 
>>> What is your proposal to address this? that we annotate functions that are 
>>> unsafe? And then, assuming we succeed at this Herculean task, what can 
>>> developers do anyway about it if the only way to do a thing is async/await 
>>> unsafe ?
>> 
>> I don’t think that annotations are the right way to go.  It should be an 
>> end-goal for the system to be almost completely actor safe, so the parity of 
>> the annotation would have to be “this code is unsafe”.  Given that, I don’t 
>> see how we could audit the entire world, and I don’t think that an 
>> annotation explosion would be acceptable this late in Swift’s lifetime.  It 
>> would be like IUO-everywhere problem in the Swift 1 betas.
>> 
>> My preferred solution is three-fold:
>>  - Make frameworks incrementally actor safe over time.  Ensure that new APIs 
>> are done right, and make sure that no existing APIs ever go from “actor 
>> safe” to “actor unsafe”.
>>  - Provide a mechanism that developers can use to address problematic APIs 
>> that they encounter in practice.  It should be something akin to “wrap your 
>> calls in a closure and pass it to a special GCD function”, or something else 
>> of similar complexity.
>> - Continue to improve perf and debugger tools to help identify problematic 
>> cases that occur in practice.
>> 
>> This would ensure that developers can always “get their job done”, but also 
>> provide a path where we can incrementally improve the world over the course 
>> of years (if necessary).
> 
> fair enough.
> 
>> 
>>>>> Actors are the way you present the various tasks/operations/activities 
>>>>> that you schedule. These contexts are a way for the developer to explain 
>>>>> which things are related in a consistent system, and give them access to 
>>>>> state which is local to this context (whether it's TSD for threads, or 
>>>>> queue specific data, or any similar context),
>>>> 
>>>> Just MHO, but I don’t think you’d need or want the concept of “actor local 
>>>> data” in the sense of TLS (e.g. __thread).  All actor methods have a 
>>>> ‘self’ already, and having something like TLS strongly encourages breaking 
>>>> the model.  To me, the motivation for TLS is to provide an easier way to 
>>>> migrate single-threaded global variables, when introducing threading into 
>>>> legacy code.
>>> 
>>> I disagree, if you have an execution context that is "my database 
>>> subsystem", it probably has an object that knows about all your database 
>>> handles. Or do you suggest that your database subsystem is an actor too? I 
>>> don't quite see the database subsystem as an actor in the sense that it's a 
>>> strongly exclusive execution context (if your database is SQLite) and will 
>>> really receive actors to execute, providing them a home.
>> 
>> I haven’t spent a lot of thought on this, but I tend to think that “database 
>> as an actor” fits with the programming model that I’m advocating for.  The 
>> problem (as I think you’re getting at) is that you don’t want serialization 
>> at the API level of the database, you want to allow the database itself to 
>> have multiple threads running around in its process.
>> 
>> This is similar in some ways to your NIC example from before, where I think 
>> you want one instance of a NIC actor for each piece of hardware you have… 
>> but you want multithreaded access.
>> 
>> One plausible way to model this is to say that it is a “multithreaded actor” 
>> of some sort, where the innards of the actor allow arbitrary number of 
>> client threads to call into it concurrently.  The onus would be on the 
>> implementor of the NIC or database to implement the proper synchronization 
>> on the mutable state within the actor.
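
A "multithreaded actor" of this kind does not exist in the proposal as a concrete API. Purely as a sketch of what "the implementor provides the synchronization" could look like with today's tools, here is a class that accepts calls from any thread and protects its own mutable state with a concurrent queue plus barriers. `ConnectionRegistry` and its methods are invented for this example.

```swift
import Dispatch

// Hypothetical stand-in for a "multithreaded actor": clients call in from any
// thread, and the type itself is responsible for protecting its state.
final class ConnectionRegistry: @unchecked Sendable {
    private let isolation = DispatchQueue(label: "example.registry",
                                          attributes: .concurrent)
    private var ports: [String: Int] = [:]

    // Reads may overlap with other reads.
    func port(for host: String) -> Int? {
        isolation.sync { ports[host] }
    }

    // Writes use a barrier, so each one runs exclusively.
    func register(_ host: String, port: Int) {
        isolation.sync(flags: .barrier) { ports[host] = port }
    }
}

let registry = ConnectionRegistry()

// Many client threads calling in concurrently.
DispatchQueue.concurrentPerform(iterations: 100) { i in
    registry.register("host\(i)", port: 8000 + i)
}

print(registry.port(for: "host42") as Any)
```

Clients see one "thing" with ordinary methods; the concurrency stays an implementation detail of the type.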
>> 
>> I think that something like this is attractive because it provides key 
>> things that I value highly in a concurrency model:
>> 
>> - The programmer has a natural way to model things: “an instance of a 
>> database is a thing”, and therefore should be modeled as an instance of an 
>> actor.  The fact that the actor can handle multiple concurrent requests is 
>> an implementation detail the clients shouldn’t have to be rewritten to 
>> understand.
>> 
>> - Making this non-default would provide proper progressive disclosure of 
>> complexity.
>> 
>> - You’d still get improved safety and isolation of the system as a whole, 
>> even if individual actors are “optimized” in this way.
>> 
>> - When incrementally migrating code to the actor model, this would make it 
>> much easier to provide actor wrappers for existing subsystems built on 
>> shared mutable state.
>> 
>> - Something like this would also probably be the right abstraction for 
>> imported RPC services that allow for multiple concurrent synchronous 
>> requests.
>> 
>> I’m curious to know what you and others think about this concept.
> 
> I think what you said made sense. But it wasn't what I meant. I was really 
> thinking of SQLite, where the database is strongly serial (you can't use it 
> well in a multi-threaded way, or rather you can but it has a big lock inside). 
> It is much better to interact with that dude on the same exclusion context 
> all the time. What I meant is really having some actors that have a "strong 
> affinity" with a given execution context which eases the task of the actor 
> scheduler.
> 
> 
> Another problem I haven't touched either is kernel-issued events (inbound IPC 
> from other processes, networking events, etc...). Dispatch for the longest 
> time used an indirection through a manager thread for all such events, and 
> that had two major issues:
> 
> - the thread hops it caused made networking workloads use up to 15-20% more 
> CPU time than an equivalent manually made pthread parked in kevent(). 
> Networking pace, even when busy, idles back all the time as far as the CPU is 
> concerned, so dispatch queues never stay hot, and the context switch is not 
> only a scheduled context switch but also has the cost of a thread bring-up
> 
> - if you deliver all possible events this way you also deliver events that 
> cannot possibly make progress because the execution context that will handle 
> them is already "locked" (as in busy running something else).
> 
> It took us several years to get to the point we presented at WWDC this year 
> where we deliver events directly to the right dispatch queue. If you only 
> have very anonymous execution contexts then all this machinery is wasted and 
> unused. However, this machinery has been evaluated and saves full percents of 
> CPU load system-wide. I'd hate for us to go back 5 years here.
> 
> Declaring that an actor targets a given existing serial context also means 
> that if that actor needs to make urgent progress the context in question has 
> to be rushed, and its priority elevated. It's really hard to do the same on 
> an anonymous global context (the way dispatch does it still is to actually 
> enqueue stealer work that tries to steal the "actor" at a higher priority. 
> This approach is terribly wasteful).
> 
> 
> I know that I have a hard time getting my point across mostly because I'm not 
> a language design guy, I'm a system design guy, and I clearly don't 
> understand Actors enough to get how to integrate them with the OS. But to me, 
> to go back to the database example, the Database Actor, or the NIC Actor from 
> earlier are different from say a given SQL query or a given HTTP request: the 
> former are the things the OS needs to know about *in the kernel*, whereas 
> the SQL Query or HTTP Request are merely actors enqueued on the former. IOW 
> these top-level actors are different, because they are top-level, right atop 
> the kernel/low-level runtime, and this is the thing the kernel has to be able 
> to reason about. This makes them different.
> 
> In dispatch, there are 2 kinds of queues at the API level in that regard:
> - the global queues, which aren't queues like the others and really are just 
> an abstraction on top of the thread pool
> - all the other queues that you can target on each other the way you want.
> 
> It is clear today that it was a mistake and that there should have been 3 
> kinds of queues:
> - the global queues, which aren't real queues but represent which family of 
> system attributes your execution context requires (mostly priorities), and we 
> should have disallowed enqueuing raw work on them
> - the bottom queues (which GCD since last year tracks and calls "bases" in the 
> source code) that are known to the kernel when they have work enqueued
> - any other "inner" queue, which the kernel couldn't care less about
> 
> In dispatch, we regret every passing day that the difference between the 2nd 
> and 3rd group of queues wasn't made clear in the API originally.
> 
> I like to call the 2nd category execution contexts, but I can also see why 
> you want to pass them as Actors, it's probably more uniform (and GCD did the 
> same by presenting them both as queues). Such top-level "Actors" should be 
> few, because if they all become active at once, they will need as many 
> threads in your process, and this is not a resource that scales. This is why 
> it is important to distinguish them. And like we're discussing they usually 
> also wrap some kind of shared mutable state, resource, or similar, which 
> inner actors probably won't do.
> 
> 
>>> You can obviously model this as "the database is an actor itself", and have 
>>> the queue of other actors that only do database work target this database 
>>> actor queue, but while this looks very appealing, in practice this creates 
>>> a system which is hard to understand for developers. Actors talking to each 
>>> other/messaging each other is fine. Actors nesting their execution inside 
>>> each other is not because the next thing people will then ask from such a 
>>> system is a way to execute code from the outer actor when in the context of 
>>> the inner Actor, IOW what a recursive mutex is to a mutex, but for the 
>>> Actor queue. This obviously has all the terrible issues of recursive locks 
>>> where you think you hold the lock for the first time and expect your object 
>>> invariants to be valid, except that you're really in a nested execution and 
>>> see broken invariants from the outer call and this creates terribly hard 
>>> bugs to find.
>> 
>> Yes, I understand the problem of recursive locks, but I don’t see how or why 
>> you’d want an outer actor to have an inner actor call back to it.
> 
> I don't see why you'd need this with dispatch queues either today, however my 
> radar queue disagrees strongly with this statement. People want this all the 
> time, mostly because the outer actor has a state machine and the inner actor 
> wants it to make progress before it continues working.
> 
> In CFRunloop terms it's like running the runloop from your work by calling 
> CFRunLoopRun() yourself again until you can observe some event happened.
> 
> It's not great and problematic for tons of reasons. If actors are nested, we 
> need a way to make sure people don't have to ever do something like that.
> 
> Just as a data point, my reactions to this thread yielded a private 
> discussion off list *exactly* about someone wanting us to provide something 
> like this for dispatch (or rather in this instance having the inner actor be 
> able to suspend the outer one, but it's just a corollary / similar layering 
> violation where the inner actor wants to affect the outer one in a way the 
> outer one).

-> in a way the outer one didn't expect

> 
>> Ideally your actors are in the shape of a DAG.  Cycles would be properly 
>> modeled with (e.g.) weak references, but I think that synchronous/awaited 
>> cyclic calls should fail fast at runtime.  To be more explicit, I mean 
>> something like:
>> 
>> 1. Actor A has a reference to actor B.  Actor B has a weak backref to actor 
>> A.
>> 2. Actor A does an await on an async actor method on B.  As such, its queue 
>> is blocked: no messages can be run on its queue until the B method returns.
>> 3. Actor B’s method turns around and calls a method on A, awaiting the 
>> result.  Because there is a cyclic wait, we have a deadlock, one which 
>> ideally fails fast with a trap.
>> 
>> The solution for this is to change the callback to A to be a call to an async 
>> (void returning) actor method.  Such a call would simply enqueue the request 
>> on A’s queue, and get serviced after the original A call returns.
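
The three steps and the proposed fix can be approximated with plain serial dispatch queues. This is an analogy only: `sync` blocks a thread, which is not how `await` is meant to behave, and the names `actorA`, `actorB` and `EventLog` are invented for the sketch. It does show the ordering: the callback enqueued on A runs only after A's original call has returned.

```swift
import Dispatch

// Serial queues standing in for the two actors' queues.
let actorA = DispatchQueue(label: "example.actorA")
let actorB = DispatchQueue(label: "example.actorB")

// Event log; accesses are ordered by the queue operations below.
final class EventLog: @unchecked Sendable {
    var events: [String] = []
}
let log = EventLog()

actorA.sync {
    log.events.append("A: calls B and waits")
    actorB.sync {
        log.events.append("B: method runs")
        // Calling actorA.sync here would be the cyclic wait described above:
        // A is blocked on B, so B waiting on A can never finish.
        // The fix: enqueue the callback on A without waiting for it.
        actorA.async {
            log.events.append("A: callback serviced")
        }
    }
    log.events.append("A: original call returns")
}

actorA.sync {}   // wait until A's queue has drained, including the callback

print(log.events)
```

The callback is always the last event, because A's serial queue cannot start it until the block that called into B has finished.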
>> 
>> I wrote this out to make it clear what problem I think you’re talking about. 
>>  If this isn’t what you’re trying to get at, please let me know :-)
> 
> It is definitely the family of problems I'm worried about. I want to make 
> sure we have a holistic approach here because I think that recursive 
> mutexes, recursive CFRunloop runs and similar ideas are flawed and dangerous. 
> I want to make sure we understand 

I want to make sure we understand which limitations we want to impose here, and 
by limitations I really mean layering/architecture rules, and that we document 
them upfront and explain how to work with them.

>> 
>>>> 
>>>>> IMO, Swift as a runtime should define what an execution context is, and 
>>>>> be relatively oblivious of which context it is exactly as long as it 
>>>>> presents a few common capabilities:
>>>>> - possibility to schedule work (async)
>>>>> - have a name
>>>>> - be an exclusion context
>>>>> - is an entity the kernel can reason about (if you want to be serious 
>>>>> about any integration on a real operating system with priority 
>>>>> inheritance and complex issues like this, which it is the OS 
>>>>> responsibility to handle and not the language)
>>>>> - ...
>>>>> 
>>>>> In that sense, whether your execution context is:
>>>>> - a dispatch serial queue
>>>>> - a CFRunloop
>>>>> - a libev/libevent/... event loop
>>>>> - your own hand rolled event loop
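
To make the capability list above a little more concrete, here is one way it could be written down as a Swift protocol. Nothing here is an existing Swift or GCD API: `ExecutionContext`, `schedule` and `SerialQueueContext` are hypothetical names. The "exclusion context" and "entity the kernel can reason about" requirements cannot be expressed in a type signature, so they appear only as comments.

```swift
import Dispatch

// Hypothetical abstraction over "an execution context".
// Conforming types are expected (by convention, not by the compiler) to:
//  - run scheduled work exclusively, one item at a time
//  - be something the OS/kernel can reason about (priorities, etc.)
protocol ExecutionContext {
    var name: String { get }                                // "have a name"
    func schedule(_ work: @escaping @Sendable () -> Void)   // "schedule work (async)"
}

// One possible conformance: a dispatch serial queue.
struct SerialQueueContext: ExecutionContext {
    let queue: DispatchQueue

    var name: String { queue.label }

    func schedule(_ work: @escaping @Sendable () -> Void) {
        queue.async(execute: work)
    }
}

let context = SerialQueueContext(queue: DispatchQueue(label: "example.context"))
let done = DispatchSemaphore(value: 0)

context.schedule { done.signal() }   // work runs asynchronously on the queue
done.wait()

print("work ran on \(context.name)")
```

A CFRunloop or a hand-rolled event loop could conform in the same way, by forwarding `schedule` to whatever "run this later on my thread" primitive it already has.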
>>>> 
>>>> Generalizing the approach is completely possible, but it is also possible 
>>>> to introduce a language abstraction that is “underneath” the high level 
>>>> event loops.  That’s what I’m proposing.
>>> 
>>> I'm not sure I understand what you mean here, can you elaborate?
>> 
>> The concept of an event loop is IMO a library abstraction that could be 
>> built on top of the actor model.  The actor would represent the “context” 
>> for the concurrency, the event loop API would represent the other stuff.
> 
> I think by reading your reply I start to grasp a bit better how you see 
> Actors working, indeed. This makes sense to me. Like I said above then, we 
> need to see top-level Actors (that can be equivalent to an OS thread, or 
> whatever concurrent execution context the OS provides for these) being 
> different from other ones. Like Dave Z. was saying earlier, building a 
> language that pretends your resources are infinite is a mistake. We have to 
> force people to think about the cost of their architecture.
> 
> I don't see actors that would be running on a "global pool" kind of thing to 
> be such top-level actors though. But these top-level actors probably need to 
> live outside of the pool: the NIC example is very good for this: if you don't 
> schedule it as soon as it has events, then your networking bandwidth will 
> suffer as things like TCP or SCTP are very sensitive to various timers and 
> delays, so you can't be dependent on a constrained shared resource. (this is 
> exactly why GCD has a notion of overcommit queues which bypass the NCPU 
> limits, sadly in GCD way too many things are overcommit, but this is another 
> discussion to have entirely).

What I realized after sending this is that in your way of reasoning about 
this, the "global pool" is likely an Actor itself, that allows for concurrency 
and whose interface is about executing other actors that are otherwise not 
related to it.

> 
> I think that what I'm getting at is that we can't build Actors without 
> defining how they interact with the operating system and what guarantees of 
> latency and liveness they have.
> 
> Is this making more sense?
> 
> -Pierre
> _______________________________________________
> swift-evolution mailing list
> swift-evolution@swift.org <mailto:swift-evolution@swift.org>
> https://lists.swift.org/mailman/listinfo/swift-evolution 
> <https://lists.swift.org/mailman/listinfo/swift-evolution>