-Pierre

> On Dec 8, 2015, at 7:34 AM, Joakim Hassila via swift-corelibs-dev 
> <swift-corelibs-dev@swift.org> wrote:
> 
> Hi Pierre,
> 
> Thanks for the good explanation, will try to respond inline below:
> 
>> On 7 dec. 2015, at 23:10, Pierre Habouzit <phabou...@apple.com> wrote:
>> 
>> Hi Joakim, Kevin,
>> 
>> [ Full disclosure, I made that reply in rdar://problem/16436943 and your 
>> use case was slightly different IIRC, but you’re right, it’s a close enough 
>> problem ]
>> 
>> Dispatch internally has a notion of something that does almost that, called 
>> _dispatch_barrier_trysync_f[1]. However, it is used internally to serialize 
>> state changes on sources and queues such as setting the target queue or 
>> event handlers.
>> 
>> The problem is that this call bypasses the target queue hierarchy in its 
>> fastpath, which, while correct when changing the state of a given source or 
>> queue, is generally the wrong thing to do. Let’s consider this code, 
>> assuming a public dispatch_barrier_trysync():
>> 
>> 
>>     dispatch_queue_t outer = dispatch_queue_create("outer", NULL);
>>     dispatch_queue_t inner = dispatch_queue_create("inner", NULL);
>>     dispatch_set_target_queue(outer, inner);
>> 
>>     dispatch_async(inner, ^{
>>         // write global state protected by inner
>>     });
>>     dispatch_barrier_trysync(outer, ^{
>>         // write global state protected by inner
>>     });
>> 
>> 
>> Then if it works like the internal version we have today, the code above has 
>> a data race, which we’ll all agree is bad.
>> Or we could add an API version that always goes through async whenever the 
>> queue you trysync on is not targeted at a global root queue, but that is 
>> weird: the performance characteristics would then depend entirely on the 
>> target queue hierarchy, which, once layering and frameworks come into play, 
>> is a bad characteristic for a good API.
> 
> Yes, we could currently assume that we only targeted a root queue for our 
> use case, so our implementation has this limitation (it is not a valid 
> general solution, as you say).
> 
> It would perhaps be a bit strange to have different performance 
> characteristics depending on the target queue hierarchy, as you say, but 
> there are already some performance differences in actual behavior between 
> e.g. an overcommit queue and a non-overcommit one. So perhaps another option 
> would be to have this as an optional queue attribute instead of an 
> additional generic API (a queue attribute meaning ’steal the calling thread 
> for inline processing of requests if the queue was empty when 
> dispatching’)…?

My point is, adding API to dispatch is not something we do lightly. I’m not 
keen on an interface that only works for base queues. In Mac OS and iOS code, 
where dispatch is pervasive, queue hierarchies more than two levels deep are 
very common.

>> Or we don’t give up right away when the hierarchy is deep, but then 
>> dispatch_trysync would need to be able to unwind all the locks it took, and 
>> then you have ordering issues: the block that couldn’t run synchronously 
>> may end up enqueued after another one and break the FIFO ordering of 
>> queues. Respecting FIFO ordering, which is a desired property of our API, 
>> and getting an efficient implementation are somewhat at odds.
> 
> Yes, agree it is a desirable property of the API to retain the ordering.
> 
>> The other argument against trysync in that form is that during testing, 
>> trysync would almost always take the uncontended codepath, leading 
>> developers not to realize that they should have taken copies of variables 
>> and the like (this is less of a problem on Darwin with Obj-C and ARC), 
>> because trysync running on the same thread hides the mistake. Once it 
>> starts being contended in production, it will bite you hard with memory 
>> corruption everywhere.
> 
> Less of an issue for us as we depend on the _f interfaces throughout due to 
> portability concerns, but fair point.
> 
>> Technically, what you’re after stems from the fact that bringing up a new 
>> thread is very costly, and that you’d rather use the one that’s asyncing 
>> the request because it will soon give up control. The wake-up of a queue 
>> isn’t that expensive: the overhead of dispatch_sync() in terms of memory 
>> barriers and locking is more or less comparable. What’s expensive is 
>> creating a thread to satisfy the enqueue.
> 
> Yes, in fact bringing up a new thread is so costly that we keep a pool 
> around in the libpwq implementation. Unfortunately we would often see 
> double-digit microsecond latency incurred by this, which is unacceptable for 
> us, so for some configurations/special deployments we had to run a dedicated 
> spin thread that grabs the next queue to work on (cutting the latency by a 
> factor of 10 or so); the next thread woken from the thread pool then takes 
> over as the spinner…
> 
>> In my opinion, to get the win you’re after, you’d rather want an async() 
>> version such that, if it wakes up the target queue hierarchy all the way to 
>> the root, the thread pool applies more resistance before bringing up a new 
>> thread to satisfy that request. Fortunately, the overcommit property of 
>> queues could be used by a thread pool to decide to apply that resistance. 
>> There are various parts of the thread pool handling (especially without 
>> kernel work queue support) that could get some love to deliver these exact 
>> benefits without changing the API.
> 
> That would indeed be a very interesting idea. The problem is that the 
> thread using ‘dispatch_barrier_trysync’ does not return to the 
> pthread_workqueue pool to grab the next dispatch queue for processing; 
> instead it goes back to block on a syscall (e.g. read() from a socket). And 
> even the latency of waking a thread (as is commonly done now) with 
> mutex/condition signaling is far too high for our use case (hence the very 
> ugly workaround with a spin thread for some deployments).
> 
> Essentially, for these kinds of operations we really want to avoid all 
> context switches as long as we can keep up with the rate of inbound data, 
> and in general such dynamics would be a nice property to have. If the thread 
> performing the async call were known to always return to the global pwq 
> thread pool, the problem would be nicely solved by applying resistance as 
> you suggest; the trouble is what to do when it blocks and you thus get 
> stuck.
> 
> Perhaps we have to live with the limited implementation we have for practical 
> purposes, but I have the feeling that the behavior we are after would be 
> useful for other use cases, perhaps the queue attribute suggested above could 
> be another way of expressing it without introducing new dispatch API. 

I completely agree with you, but I think that the way to address this is by 
making the thread pool smarter, not by having developers sprinkle their code 
with dispatch_barrier_trysync() wherever they feel like it. Using it properly 
requires a deep understanding of the dispatch implementation in use, and that 
changes with each platform/version combination. That’s not really the kind of 
interface we want to build.

“overcommit” is exactly the hint you’re after as far as the queue is 
concerned. It means “if I’m woken up, bring up a new thread, no matter what, 
provided it doesn’t blow up the system”. So make your queue non-overcommit by 
targeting it manually at dispatch_get_global_queue(0, 0) (that one isn’t 
overcommit), and make the thread pool smarter. That’s the right way to go and 
the design-compatible way to do it.

If your thread blocks in read(), then I would argue that it should use a READ 
dispatch source instead; that way, the source gets enqueued *after* your 
async and the two can ping-pong. Doing blocking read()s is not dispatchy at 
all and will cause you all sorts of problems like this one, because re-async 
doesn’t work for you.

-Pierre

_______________________________________________
swift-corelibs-dev mailing list
swift-corelibs-dev@swift.org
https://lists.swift.org/mailman/listinfo/swift-corelibs-dev
