[v8-dev] Re: Kernel signaling boosts are potentially hurting Chrome

Sami Kyostila Thu, 30 Aug 2018 03:38:06 -0700

This came up again in a different context today, so I filed a bug to track
the investigation (unless you already had one, Gab?): crbug.com/879097.


- Sami

On Wed, Aug 29, 2018 at 9:30 PM Bruce Dawson ([email protected])
wrote:

> BTW, another issue the signal boost can cause involves locks. Suppose the
> scheduling thread holds a lock when it signals that a task is available,
> and there aren't enough cores available. The receiving thread is boosted,
> takes the CPU from the scheduling thread, tries to acquire the lock, fails,
> and blocks on an event. The scheduling thread is then (one hopes)
> scheduled and releases the lock, and only then does the receiving thread
> wake up, grab the lock, grab the task, and start running. If we hit this
> pattern, scheduling a single task can take three context switches.
>
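One way to avoid the pattern described above is to release the lock before signaling, so a boosted receiver never wakes up into a held lock. The following is a minimal sketch under that assumption; `TaskQueue`, `Post`, and `Take` are hypothetical names for illustration, not Chrome's TaskScheduler API.

```cpp
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <utility>

// Hypothetical minimal task queue; not Chrome's TaskScheduler.
class TaskQueue {
 public:
  // Queues a task, releases the lock, and only then wakes one waiter.
  void Post(std::function<void()> task) {
    {
      std::lock_guard<std::mutex> hold(lock_);
      tasks_.push_back(std::move(task));
    }  // lock_ is released here, before the signal.
    ready_.notify_one();
  }

  // Blocks until a task is available, then removes and returns it.
  std::function<void()> Take() {
    std::unique_lock<std::mutex> hold(lock_);
    ready_.wait(hold, [this] { return !tasks_.empty(); });
    std::function<void()> task = std::move(tasks_.front());
    tasks_.pop_front();
    return task;
  }

 private:
  std::mutex lock_;
  std::condition_variable ready_;
  std::deque<std::function<void()>> tasks_;
};
```

Because the waiter rechecks the queue under the lock, signaling after the unlock does not lose wakeups; it only removes the window in which the woken thread would immediately block again.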
> Gabriel and I brainstormed a few ways to investigate both the consequences
> of priority boosting and why Chrome does so many context switches.
>
> On Tue, Aug 28, 2018 at 9:26 AM Sami Kyostila <[email protected]>
> wrote:
>
>> I think I've seen instances of this problem even with the old IPC system:
>> the sending thread is likely to get descheduled because the receiving
>> thread is woken up before the sender has finished running. We once kicked
>> around an idea about buffering message sends and only flushing them once
>> the current task is finished -- maybe it's time to revisit something
>> like that?
>>
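The buffering idea above could look roughly like the sketch below: sends are collected locally during a task and handed over in one batch when the task ends, so the receiver is signaled once per task rather than once per message. `BufferedSender`, `FlushAtEndOfTask`, and the `deliver` callback are hypothetical names, not part of Chrome IPC or mojo.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sender that defers delivery until the current task ends.
class BufferedSender {
 public:
  using Batch = std::vector<std::string>;

  // `deliver` stands in for whatever hands a batch to the receiving
  // thread and signals it (an assumption, not a Chrome or mojo API).
  explicit BufferedSender(std::function<void(Batch)> deliver)
      : deliver_(std::move(deliver)) {}

  // Queues a message locally; the receiver is not signaled yet.
  void Send(std::string message) { pending_.push_back(std::move(message)); }

  // Called once when the current task finishes: one delivery, and so
  // one signal, for the whole batch. Does nothing if nothing is pending.
  void FlushAtEndOfTask() {
    if (pending_.empty()) return;
    Batch batch;
    batch.swap(pending_);
    ++flush_count_;
    deliver_(std::move(batch));
  }

  // Number of non-empty flushes performed so far.
  std::size_t flush_count() const { return flush_count_; }

 private:
  std::function<void(Batch)> deliver_;
  Batch pending_;
  std::size_t flush_count_ = 0;
};
```

The trade-off is latency: a message sent early in a long task is not seen by the receiver until that task completes.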
>> - Sami
>>
>> On Wed, Aug 22, 2018 at 1:15 AM Bruce Dawson ([email protected])
>> wrote:
>>
>>> I've definitely been bitten by this. On one game engine that I worked on
>>> they would signal all of the worker threads when a task was ready. Due to
>>> the priority boosting all of them would wake up and try to acquire the
>>> scheduler lock. The scheduler lock was held by the thread that had signaled
>>> all of the worker threads, which was reliably no longer running. And oh, by
>>> the way, it was a spin lock, so the workers burned CPU spinning while the
>>> main thread couldn't release the lock because it wasn't running. The call
>>> to SetEvent() would frequently take 20 ms to return.
>>>
>>> There were a lot of problems with this:
>>>
>>>    - Don't signal all of your worker threads when you have just one task
>>>    - Don't use a spin lock
>>>
>>> In this case the priority raising made the issues critical, but it
>>> wasn't the underlying issue.
>>>
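The first point in the list above can be sketched as follows: wake at most one sleeping worker per newly queued task instead of broadcasting to all of them. `WorkersToWake` and `SignalWorkers` are hypothetical helpers, and the idle-worker count is assumed to be tracked elsewhere by the scheduler.

```cpp
#include <algorithm>
#include <condition_variable>
#include <cstddef>

// Hypothetical helper: number of sleeping workers worth waking after
// `new_tasks` tasks were queued while `idle_workers` threads are waiting.
constexpr std::size_t WorkersToWake(std::size_t new_tasks,
                                    std::size_t idle_workers) {
  return std::min(new_tasks, idle_workers);
}

// Wakes only as many waiters as there are tasks, instead of notify_all().
// Intended to be called after the scheduler lock has been released.
// Returns the number of notify_one() calls made.
inline std::size_t SignalWorkers(std::condition_variable& ready,
                                 std::size_t new_tasks,
                                 std::size_t idle_workers) {
  const std::size_t count = WorkersToWake(new_tasks, idle_workers);
  for (std::size_t i = 0; i < count; ++i) ready.notify_one();
  return count;
}
```

With one task and eight idle workers this issues a single wakeup, so at most one boosted thread competes with the signaling thread.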
>>> I commented on the bug. I do think this is worth exploring, but there
>>> are probably cases where we rely on this priority boost to avoid starvation
>>> or improve response times. It's possible that we'd see better results by
>>> somehow reducing the number of cross-thread/cross-process messages we send.
>>>
>>> Also, note that on systems with enough cores the priority boost can
>>> become irrelevant - two communicating threads will migrate to different
>>> cores and both will continue running. So, our workstations will behave
>>> fundamentally differently from customer machines. Yay.
>>>
>>> On Mon, Aug 20, 2018 at 4:37 PM Gabriel Charette <[email protected]>
>>> wrote:
>>>
>>>> Hello scheduler devs (and *v8/chromium-mojo* friends -- sorry for
>>>> cross-posting; see related note below).
>>>>
>>>> Some kernels give a boost to a thread when the resource it was waiting
>>>> on is signaled (lock, event, pipe, file I/O, etc.). Some platforms document
>>>> this
>>>> <https://docs.microsoft.com/en-us/windows/desktop/procthread/priority-boosts>;
>>>> on others we've anecdotally observed things that make us believe they do.
>>>>
>>>> I think this might be hurting Chrome's task system.
>>>>
>>>> The Chrome semantics when signaling a thread are often "hey, you have
>>>> work, you should run soon", not "hey, please do this work ASAP"; I think...
>>>> This is certainly the case for TaskScheduler use cases, I'm not so sure
>>>> about input use cases (e.g. 16 thread hops to respond to input IIRC; boost
>>>> probably helps that chain a lot..?).
>>>> But in a case where there are many messages (e.g. *mojo*), this means
>>>> many context switches (send one message; switch; process one message;
>>>> switch back; etc.).
>>>>
>>>> https://crbug.com/872248#c4 suggests that MessageLoop::ScheduleWork()
>>>> is really expensive (though there may be sampling bias there --
>>>> investigation in progress).
>>>>
>>>> https://crbug.com/872248 also suggests that the Blink main thread is
>>>> descheduled while it's trying to signal workers to help it on a parallel
>>>> task. (I observed this first hand when working in *v8* this winter
>>>> but didn't know what to make of it then; see trace1
>>>> <https://drive.google.com/file/d/1YFC8lh67rCEQOMA2_A8i7BlFw_NHkCma/view?usp=sharing>
>>>> and trace2
>>>> <https://drive.google.com/file/d/1prrkIlNApLNeu-ppL_5PQT8a2opgKubb/view?usp=sharing>.)
>>>>
>>>> On Windows we can tweak this with
>>>> ::SetProcessPriorityBoost/SetThreadPriorityBoost(). Not sure about POSIX. I
>>>> might try to experiment with this (feels scary..!).
>>>>
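For reference, the Windows experiment mentioned above might start from something like this sketch. `SetThreadPriorityBoost` and `SetProcessPriorityBoost` are the Win32 calls named in the message; passing TRUE as the second argument disables the boost. The wrapper functions and the `kBoostControlAvailable` constant are hypothetical, and on non-Windows builds the wrappers do nothing.

```cpp
#ifdef _WIN32
#include <windows.h>
constexpr bool kBoostControlAvailable = true;
#else
constexpr bool kBoostControlAvailable = false;
#endif

// Returns true if dynamic priority boosting was disabled for the calling
// thread. On non-Windows builds this sketch does nothing and returns false.
bool DisablePriorityBoostForCurrentThread() {
#ifdef _WIN32
  // TRUE disables the boost; FALSE would re-enable it.
  return ::SetThreadPriorityBoost(::GetCurrentThread(), TRUE) != 0;
#else
  return false;
#endif
}

// Same idea for every thread in the calling process.
bool DisablePriorityBoostForCurrentProcess() {
#ifdef _WIN32
  return ::SetProcessPriorityBoost(::GetCurrentProcess(), TRUE) != 0;
#else
  return false;
#endif
}
```

Whether disabling the boost helps or hurts is exactly the open question in this thread, so a change like this belongs behind an experiment rather than on by default.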
>>>> In the meantime I figured it would at least be good to inform all of
>>>> you so you no longer scratch your head at these occasional unexplained
>>>> delays in traces.
>>>>
>>>> Cheers!
>>>> Gab
>>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "scheduler-dev" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> To post to this group, send email to [email protected].
>>> To view this discussion on the web visit
>>> https://groups.google.com/a/chromium.org/d/msgid/scheduler-dev/CAE5mQiNLNRQiCyv%2BNLU8X7ToQ-s-wRt%2BQgy2B%2BcjwuEKfCu%2B5g%40mail.gmail.com
>>> <https://groups.google.com/a/chromium.org/d/msgid/scheduler-dev/CAE5mQiNLNRQiCyv%2BNLU8X7ToQ-s-wRt%2BQgy2B%2BcjwuEKfCu%2B5g%40mail.gmail.com?utm_medium=email&utm_source=footer>
>>> .
>>>

-- 
-- 
v8-dev mailing list
[email protected]
http://groups.google.com/group/v8-dev
--- 
You received this message because you are subscribed to the Google Groups 
"v8-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
For more options, visit https://groups.google.com/d/optout.
