Paolo Bonzini <pbonz...@redhat.com> writes:

> On 13/03/2017 17:53, Alex Bennée wrote:
>>
>> Paolo Bonzini <pbonz...@redhat.com> writes:
>>
>>> icount has become much slower after tcg_cpu_exec has stopped
>>> using the BQL. There is also a latent bug that is masked by
>>> the slowness.
>>>
>>> The slowness happens because every occurrence of a QEMU_CLOCK_VIRTUAL
>>> timer now has to wake up the I/O thread and wait for it. The rendez-vous
>>> is mediated by the BQL QemuMutex:
>>>
>>> - handle_icount_deadline wakes up the I/O thread with BQL taken
>>> - the I/O thread wakes up and waits on the BQL
>>> - the VCPU thread releases the BQL a little later
>>> - the I/O thread raises an interrupt, which calls qemu_cpu_kick
>>> - the VCPU thread notices the interrupt, takes the BQL to
>>>   process it and waits on it
>>>
>>> All this back and forth is extremely expensive, causing a 6 to 8-fold
>>> slowdown when icount is turned on.
>>>
>>> One may think that the issue is that the VCPU thread is too dependent
>>> on the BQL, but then the latent bug comes in. I first tried removing
>>> the BQL completely from the x86 cpu_exec, only to see everything break.
>>> The only way to fix it (and make everything slow again) was to add a dummy
>>> BQL lock/unlock pair.
>>>
>>> This is because in -icount mode you really have to process the events
>>> before the CPU restarts executing the next instruction. Therefore, this
>>> series moves the processing of QEMU_CLOCK_VIRTUAL timers straight in
>>> the vCPU thread when running in icount mode.
>>>
>>> The required changes include:
>>>
>>> - make the timer notification callback wake up TCG's single vCPU thread
>>>   when run from another thread. By using async_run_on_cpu, the callback
>>>   can override all_cpu_threads_idle() when the CPU is halted.
>>>
>>> - move handle_icount_deadline after qemu_tcg_wait_io_event, so that
>>>   the timer notification callback is invoked after the dummy work item
>>>   wakes up the vCPU thread
>>>
>>> - make handle_icount_deadline run the timers instead of just waking the
>>>   I/O thread.
>>>
>>> - stop processing the timers in the main loop
>>>
>>> Signed-off-by: Paolo Bonzini <pbonz...@redhat.com>
>>> ---
>>>  cpus.c               | 26 +++++++++++++++++++++++---
>>>  include/qemu/timer.h | 24 ++++++++++++++++++++++++
>>>  util/qemu-timer.c    |  4 +++-
>>>  3 files changed, 50 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/cpus.c b/cpus.c
>>> index 747addd..209c196 100644
>>> --- a/cpus.c
>>> +++ b/cpus.c
>>> @@ -804,9 +804,23 @@ static void qemu_cpu_kick_rr_cpu(void)
>>>      } while (cpu != atomic_mb_read(&tcg_current_rr_cpu));
>>>  }
>>>
>>> +static void do_nothing(CPUState *cpu, run_on_cpu_data unused)
>>> +{
>>> +}
>>> +
>>>  void qemu_timer_notify_cb(void *opaque, QEMUClockType type)
>>>  {
>>> -    qemu_notify_event();
>>> +    if (!use_icount || type != QEMU_CLOCK_VIRTUAL) {
>>> +        qemu_notify_event();
>>> +        return;
>>> +    }
>>> +
>>> +    if (!qemu_in_vcpu_thread() && first_cpu) {
>>> +        /* run_on_cpu will also kick the CPU out of halt so that
>>> +         * handle_icount_deadline runs
>>> +         */
>>> +        async_run_on_cpu(first_cpu, do_nothing, RUN_ON_CPU_NULL);
>>> +    }
>>
>> This doesn't read quite right - otherwise you get the same effect by
>> calling qemu_cpu_kick(), or even modify and call qemu_cpu_kick_rr_cpu()?
>>
>> The only real effect of having something in the work queue is you run
>> the outside of the loop before returning into the tcg_cpu_exec.

> Yes, and "the outside of the loop" here means handle_icount_deadline.
>
> But qemu_cpu_kick_rr_cpu won't signal condition variables, and
> qemu_cpu_kick does but it won't make cpu_thread_is_idle return false.
> Therefore, qemu_tcg_wait_io_event keeps looping.
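Ah, I see - the queued work item changes the idle predicate itself
rather than merely signalling the condition variable, so the wait loop
actually falls through. Just to check my understanding, here is a
stand-alone toy model of that difference (plain pthreads, nothing
QEMU-specific, all names made up for illustration; builds with
"gcc -pthread toy_kick.c"):

/* toy_kick.c - toy model of "kick" vs. "queued work"; not QEMU code */
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t halt_cond = PTHREAD_COND_INITIALIZER;
static bool halted = true;       /* stands in for cpu->halted */
static bool queued_work = false; /* stands in for a non-empty work queue */

static bool thread_is_idle(void)
{
    /* Queued work makes the thread non-idle even while halted. */
    return halted && !queued_work;
}

static void *vcpu_thread(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock);
    while (thread_is_idle()) {
        /* A bare signal wakes us up here, but we just re-evaluate the
         * predicate and go back to sleep. */
        pthread_cond_wait(&halt_cond, &lock);
    }
    /* ...this is where the deadline/timer handling would run. */
    printf("left the wait loop (queued_work=%d)\n", queued_work);
    pthread_mutex_unlock(&lock);
    return NULL;
}

int main(void)
{
    pthread_t tid;
    pthread_create(&tid, NULL, vcpu_thread, NULL);
    sleep(1);

    /* "kick": signal only; the thread wakes, is still idle, waits again */
    pthread_mutex_lock(&lock);
    pthread_cond_signal(&halt_cond);
    pthread_mutex_unlock(&lock);
    sleep(1);

    /* "queue work": even a no-op item flips the predicate, loop exits */
    pthread_mutex_lock(&lock);
    queued_work = true;
    pthread_cond_signal(&halt_cond);
    pthread_mutex_unlock(&lock);

    pthread_join(tid, NULL);
    return 0;
}

With only the first signal the thread re-checks the predicate, sees it
is still idle and goes back to sleep; once a work item is queued the
loop exits, which is where handle_icount_deadline then gets to run.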
How about:

  Scheduling work with async_run_on_cpu ensures we exit
  qemu_tcg_wait_io_event if we have halted, so that
  handle_icount_deadline can run.

>
>>>  }
>>>
>>>  static void kick_tcg_thread(void *opaque)
>>> @@ -1220,12 +1234,15 @@ static int64_t tcg_get_icount_limit(void)
>>>
>>>  static void handle_icount_deadline(void)
>>>  {
>>> +    assert(qemu_in_vcpu_thread());
>>>      if (use_icount) {
>>>          int64_t deadline =
>>>              qemu_clock_deadline_ns_all(QEMU_CLOCK_VIRTUAL);
>>>
>>>          if (deadline == 0) {
>>> +            /* Wake up other AioContexts. */
>>>              qemu_clock_notify(QEMU_CLOCK_VIRTUAL);
>>> +            qemu_clock_run_timers(QEMU_CLOCK_VIRTUAL);
>>>          }
>>>      }
>>>  }
>>> @@ -1338,6 +1355,11 @@ static void *qemu_tcg_rr_cpu_thread_fn(void *arg)
>>>          /* Account partial waits to QEMU_CLOCK_VIRTUAL. */
>>>          qemu_account_warp_timer();
>>>
>>> +        /* Run the timers here. This is much more efficient than
>>> +         * waking up the I/O thread and waiting for completion.
>>> +         */
>>> +        handle_icount_deadline();
>>> +
>>>          if (!cpu) {
>>>              cpu = first_cpu;
>>>          }
>>> @@ -1379,8 +1401,6 @@ static void *qemu_tcg_rr_cpu_thread_fn(void *arg)
>>>              atomic_mb_set(&cpu->exit_request, 0);
>>>          }
>>>
>>> -        handle_icount_deadline();
>>> -

>> I guess we could just as easily move the handling to after
>> qemu_tcg_wait_io_event(cpu ? cpu : QTAILQ_FIRST(&cpus))?
>>
>> I noticed we still call handle_icount_deadline in the multi-thread case
>> which is probably superfluous unless/until we figure out a way for it to
>> work with MTTCG.
>
> Should I remove the call? Add an assert(!use_icount)?

Yes, pretty much that as an independent patch.

>
>>>          qemu_tcg_wait_io_event(cpu ? cpu : QTAILQ_FIRST(&cpus));
>>>          deal_with_unplugged_cpus();
>>>      }
>>> diff --git a/include/qemu/timer.h b/include/qemu/timer.h
>>> index 1441b42..e1742f2 100644
>>> --- a/include/qemu/timer.h
>>> +++ b/include/qemu/timer.h
>>> @@ -533,6 +533,12 @@ static inline QEMUTimer *timer_new_tl(QEMUTimerList *timer_list,
>>>   * Create a new timer and associate it with the default
>>>   * timer list for the clock type @type.
>>>   *
>>> + * The default timer list has one special feature: in icount mode,
>>> + * %QEMU_CLOCK_VIRTUAL timers are run in the vCPU thread. This is
>>> + * not true of other timer lists, which are typically associated
>>> + * with an AioContext---each of them runs its timer callbacks in its own
>>> + * AioContext thread.
>>> + *
>>>   * Returns: a pointer to the timer
>>>   */
>>>  static inline QEMUTimer *timer_new(QEMUClockType type, int scale,
>>> @@ -550,6 +556,12 @@ static inline QEMUTimer *timer_new(QEMUClockType type, int scale,
>>>   * Create a new timer with nanosecond scale on the default timer list
>>>   * associated with the clock.
>>>   *
>>> + * The default timer list has one special feature: in icount mode,
>>> + * %QEMU_CLOCK_VIRTUAL timers are run in the vCPU thread. This is
>>> + * not true of other timer lists, which are typically associated
>>> + * with an AioContext---each of them runs its timer callbacks in its own
>>> + * AioContext thread.
>>> + *
>>>   * Returns: a pointer to the newly created timer
>>>   */
>>>  static inline QEMUTimer *timer_new_ns(QEMUClockType type, QEMUTimerCB *cb,
>>> @@ -564,6 +576,12 @@ static inline QEMUTimer *timer_new_ns(QEMUClockType type, QEMUTimerCB *cb,
>>>   * @cb: the callback to call when the timer expires
>>>   * @opaque: the opaque pointer to pass to the callback
>>>   *
>>> + * The default timer list has one special feature: in icount mode,
>>> + * %QEMU_CLOCK_VIRTUAL timers are run in the vCPU thread. This is
>>> + * not true of other timer lists, which are typically associated
>>> + * with an AioContext---each of them runs its timer callbacks in its own
>>> + * AioContext thread.
>>> + *
>>>   * Create a new timer with microsecond scale on the default timer list
>>>   * associated with the clock.
>>>   *
>>> @@ -581,6 +599,12 @@ static inline QEMUTimer *timer_new_us(QEMUClockType type, QEMUTimerCB *cb,
>>>   * @cb: the callback to call when the timer expires
>>>   * @opaque: the opaque pointer to pass to the callback
>>>   *
>>> + * The default timer list has one special feature: in icount mode,
>>> + * %QEMU_CLOCK_VIRTUAL timers are run in the vCPU thread. This is
>>> + * not true of other timer lists, which are typically associated
>>> + * with an AioContext---each of them runs its timer callbacks in its own
>>> + * AioContext thread.
>>> + *
>>>   * Create a new timer with millisecond scale on the default timer list
>>>   * associated with the clock.
>>>   *
>>> diff --git a/util/qemu-timer.c b/util/qemu-timer.c
>>> index dc3181e..82d5650 100644
>>> --- a/util/qemu-timer.c
>>> +++ b/util/qemu-timer.c
>>> @@ -658,7 +658,9 @@ bool qemu_clock_run_all_timers(void)
>>>      QEMUClockType type;
>>>
>>>      for (type = 0; type < QEMU_CLOCK_MAX; type++) {
>>> -        progress |= qemu_clock_run_timers(type);
>>> +        if (qemu_clock_use_for_deadline(type)) {
>>> +            progress |= qemu_clock_run_timers(type);
>>> +        }
>>
>> minor nit: its not really qemu_clock_run_all_timers() now is it. We run
>> all but the virtual timers (if icount is enabled).
>
> Well yeah, it's all those that pass qemu_clock_use_for_deadline.
>
> Paolo

Have you done any testing with record/replay? So far I have one
reproducible run and one failure. However it is not entirely clear to
me how I am meant to cleanly halt and stop a machine so I don't corrupt
the replay log.

--
Alex Bennée