This came up again in a different context today, so I filed a bug to track the investigation (unless you already had one, Gab?): crbug.com/879097.

- Sami

On Wed, Aug 29, 2018 at 9:30 PM Bruce Dawson ([email protected]) wrote:

> BTW, another issue which the signal boost can cause is around locks. If
> the scheduling thread is holding a lock when it signals that a task is
> available, and there aren't enough cores available, then the receiving
> thread will be boosted and will take the CPU from the scheduling thread.
> It will then try to acquire the lock, fail, and block on an event. The
> scheduling thread will then (one hopes) be scheduled and release the
> lock, and then the receiving thread will wake up, grab the lock, grab the
> task, and start running. If we hit this pattern then scheduling a single
> task can take three context switches.
>
> Gabriel and I brainstormed a few ways to investigate both the consequences
> of priority boosting and why Chrome does so many context switches.
>
> On Tue, Aug 28, 2018 at 9:26 AM Sami Kyostila <[email protected]>
> wrote:
>
>> I think I've seen instances of this problem even with the old IPC system:
>> the sending thread is likely to get descheduled because the receiving
>> thread is woken up before the former has finished running. We once kicked
>> around an idea about buffering message sends and only flushing them once
>> the current task is finished -- maybe it's time to revisit something
>> like that?
>>
>> - Sami
>>
>> On Wed, Aug 22, 2018 at 1:15 AM Bruce Dawson ([email protected])
>> wrote:
>>
>>> I've definitely been bitten by this. On one game engine that I worked on
>>> they would signal all of the worker threads when a task was ready. Due to
>>> the priority boosting all of them would wake up and try to acquire the
>>> scheduler lock. The scheduler lock was held by the thread that had
>>> signaled all of the worker threads, which was reliably no longer running.
>>> And oh, by the way, it was a spin lock, so the main thread couldn't
>>> release it because it wasn't running. The call to SetEvent() would
>>> frequently take 20 ms to return.
>>>
>>> There were a lot of problems with this:
>>>
>>> - Don't signal all of your worker threads when you have just one task
>>> - Don't use a spin lock
>>>
>>> In this case the priority raising made the issues critical, but it
>>> wasn't the underlying issue.
>>>
>>> I commented on the bug. I do think this is worth exploring, but there
>>> are probably cases where we rely on this priority boost to avoid
>>> starvation or improve response times. It's possible that we'd see better
>>> results by somehow reducing the number of cross-thread/cross-process
>>> messages we send.
>>>
>>> Also, note that on systems with enough cores the priority boost can
>>> become irrelevant - two communicating threads will migrate to different
>>> cores and both will continue running. So, our workstations will behave
>>> fundamentally differently from customer machines. Yay.
>>>
>>> On Mon, Aug 20, 2018 at 4:37 PM Gabriel Charette <[email protected]>
>>> wrote:
>>>
>>>> Hello scheduler devs (and *v8/chromium-mojo* friends -- sorry for
>>>> cross-posting; see related note below).
>>>>
>>>> Some kernels give a boost to a thread when the resource it was waiting
>>>> on is signaled (lock, event, pipe, file I/O, etc.). Some platforms
>>>> document this
>>>> <https://docs.microsoft.com/en-us/windows/desktop/procthread/priority-boosts>;
>>>> on others we've anecdotally observed things that make us believe they do.
>>>>
>>>> I think this might be hurting Chrome's task system.
>>>>
>>>> The Chrome semantics when signaling a thread is often "hey, you have
>>>> work, you should run soon"; not "hey, please do this work ASAP"; I
>>>> think... This is certainly the case for TaskScheduler use cases, I'm not
>>>> so sure about input use cases (e.g. 16 thread hops to respond to input
>>>> IIRC; boost probably helps that chain a lot..?).
>>>> But in a case where there are many messages (e.g. *mojo*), this means
>>>> many context switches (send one message; switch; process one message;
>>>> switch back; etc.).
>>>>
>>>> https://crbug.com/872248#c4 suggests that MessageLoop::ScheduleWork()
>>>> is really expensive (though there may be sampling bias there --
>>>> investigation in progress).
>>>>
>>>> https://crbug.com/872248 also suggests that the Blink main thread is
>>>> descheduled while it's trying to signal workers to help it on a parallel
>>>> task (I've observed this first hand when working in *v8* this winter
>>>> but didn't know what to think of it then:
>>>> trace1
>>>> <https://drive.google.com/file/d/1YFC8lh67rCEQOMA2_A8i7BlFw_NHkCma/view?usp=sharing>
>>>> trace2
>>>> <https://drive.google.com/file/d/1prrkIlNApLNeu-ppL_5PQT8a2opgKubb/view?usp=sharing>
>>>> ).
>>>>
>>>> On Windows we can tweak this with
>>>> ::SetProcessPriorityBoost/SetThreadPriorityBoost(). Not sure about
>>>> POSIX. I might try to experiment with this (feels scary..!).
>>>>
>>>> In the meantime I figured it would at least be good to inform all of
>>>> you so you no longer scratch your head at these occasional unexplained
>>>> delays in traces.
>>>>
>>>> Cheers!
>>>> Gab
>>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "scheduler-dev" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> To post to this group, send email to [email protected].
>>> To view this discussion on the web visit
>>> https://groups.google.com/a/chromium.org/d/msgid/scheduler-dev/CAE5mQiNLNRQiCyv%2BNLU8X7ToQ-s-wRt%2BQgy2B%2BcjwuEKfCu%2B5g%40mail.gmail.com
>>> .

--
v8-dev mailing list
[email protected]
http://groups.google.com/group/v8-dev
---
You received this message because you are subscribed to the Google Groups "v8-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
For more options, visit https://groups.google.com/d/optout.
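
For reference, the lock pattern Bruce describes in the thread (signaling a worker while still holding the scheduler lock) and one common way to sidestep it can be sketched roughly as below. This is only an illustration in portable C++, not Chrome's MessageLoop or TaskScheduler code: `TaskQueue`, `Post`, `RunWorker`, `Shutdown` and `RunDemo` are hypothetical names made up for the example. The idea is to release the lock before signaling, and to wake a single worker rather than all of them, so that a worker which preempts the poster may be able to take the lock right away instead of blocking on it.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical task queue for illustration; not Chrome's implementation.
class TaskQueue {
 public:
  using Task = std::function<void()>;

  // Queues a task. The lock is dropped *before* the worker is signaled, so a
  // worker that is scheduled immediately does not find the lock still held.
  void Post(Task task) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      tasks_.push_back(std::move(task));
    }  // Lock released here.
    cv_.notify_one();  // One task, so wake one worker, not all of them.
  }

  // Asks workers to exit once the queue has drained.
  void Shutdown() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      shutdown_ = true;
    }
    cv_.notify_all();
  }

  // Worker loop: runs tasks until Shutdown() was called and the queue is empty.
  void RunWorker() {
    for (;;) {
      Task task;
      {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return shutdown_ || !tasks_.empty(); });
        if (tasks_.empty()) return;  // Only reachable after Shutdown().
        task = std::move(tasks_.front());
        tasks_.pop_front();
      }
      task();  // Run the task outside the lock.
    }
  }

 private:
  std::mutex mutex_;
  std::condition_variable cv_;
  std::deque<Task> tasks_;
  bool shutdown_ = false;
};

// Posts `count` tasks to a single worker and returns how many of them ran.
int RunDemo(int count) {
  assert(count >= 0);
  TaskQueue queue;
  int ran = 0;  // Written only by the worker; read after join().
  std::thread worker([&queue] { queue.RunWorker(); });
  for (int i = 0; i < count; ++i) queue.Post([&ran] { ++ran; });
  queue.Shutdown();
  worker.join();
  return ran;
}
```

Whether this actually saves context switches depends on the platform's scheduler and on how many cores are free, so it would need to be measured in traces rather than assumed.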
