On 14/8/25 18:05, Igor Mammedov wrote:
when kernel-irqchip=split is used, QEMU still hits BQL
contention when reading ACPI PM/HPET timers
(despite the timer access being lock-less).

So Windows with more than 255 CPUs is still not able to
boot (since it requires an IOMMU and hence a split irqchip).

The problematic path is in kvm_arch_pre_run(), where the BQL is taken
unconditionally when the split irqchip is in use.

There are a few parts that the BQL protects there:
   1. interrupt check and injection

     however we do not take the BQL when checking for a pending
     interrupt (even within the same function), so the patch
     takes the same approach for the cpu->interrupt_request checks
     and takes the BQL only if there is actually a job to do
     (see the sketch after this list)

   2. request_interrupt_window access
       CPUState::kvm_run::request_interrupt_window doesn't need the BQL
       as it's only accessed by its own vCPU thread.

   3. cr8/cpu_get_apic_tpr access
       the same as #2 applies to CPUState::kvm_run::cr8,
       and the APIC registers are also cached/synced (get/put) within
       the vCPU thread they belong to.
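
A minimal sketch of the resulting pattern (simplified, not the literal
patch; it assumes QEMU's existing cpu_test_interrupt(), bql_lock()/
bql_unlock() and cpu_get_apic_tpr() helpers):

    X86CPU *x86_cpu = X86_CPU(cpu);
    CPUX86State *env = &x86_cpu->env;

    /* lock-less check first; take the BQL only when an interrupt
     * actually has to be injected */
    if (cpu_test_interrupt(cpu, CPU_INTERRUPT_HARD) &&
        (env->eflags & IF_MASK)) {
        bql_lock();
        /* pick a vector and inject it via KVM_INTERRUPT */
        bql_unlock();
    }

    /* run->request_interrupt_window and run->cr8 are only touched by
     * this vCPU's own thread, so no BQL is needed for them */
    run->request_interrupt_window =
        !!cpu_test_interrupt(cpu, CPU_INTERRUPT_HARD);
    run->cr8 = cpu_get_apic_tpr(x86_cpu->apic_state);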

Taking the BQL only when necessary eliminates the BQL bottleneck on
the IO/MMIO-only exit path, improving latency by 80% in an HPET
micro-benchmark.

This lets Windows boot successfully (in case hv-time isn't used)
when more than 255 vCPUs are in use.

Signed-off-by: Igor Mammedov <imamm...@redhat.com>
Reviewed-by: Peter Xu <pet...@redhat.com>
---
v3:
   * drop unneeded pair of () in the cpu->interrupt_request & CPU_INTERRUPT_HARD
     check
   * Paolo Bonzini <pbonz...@redhat.com>
      * don't take BQL when setting exit_request, use qatomic_set() instead
      * after above simplification take/release BQL unconditionally
      * drop smp_mb() after run->cr8/run->request_interrupt_window update
---
  target/i386/kvm/kvm.c | 12 +++++-------
  1 file changed, 5 insertions(+), 7 deletions(-)


      /* Force the VCPU out of its inner loop to process any INIT requests
       * or (for userspace APIC, but it is cheap to combine the checks here)
@@ -5489,10 +5486,10 @@ void kvm_arch_pre_run(CPUState *cpu, struct kvm_run *run)
      if (cpu_test_interrupt(cpu, CPU_INTERRUPT_INIT | CPU_INTERRUPT_TPR)) {
          if (cpu_test_interrupt(cpu, CPU_INTERRUPT_INIT) &&
              !(env->hflags & HF_SMM_MASK)) {
-            cpu->exit_request = 1;
+            qatomic_set(&cpu->exit_request, 1);
          }
          if (cpu_test_interrupt(cpu, CPU_INTERRUPT_TPR)) {
-            cpu->exit_request = 1;
+            qatomic_set(&cpu->exit_request, 1);
          }
      }

Interesting. IMHO, to avoid similar problems in the future, we should
declare CPUState::exit_request a "private" field and access it via a
helper where atomicity is enforced.
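
A minimal sketch of such a helper (the name cpu_set_exit_request() is
hypothetical, not an existing QEMU API):

    /* hypothetical accessor: exit_request becomes accelerator-private
     * and every writer goes through an atomic store */
    static inline void cpu_set_exit_request(CPUState *cpu)
    {
        qatomic_set(&cpu->exit_request, 1);
    }

The sPAPR call sites listed below would then use cpu_set_exit_request(cs)
instead of open-coding the store.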

The only non-accelerator use is in PPC sPAPR:

hw/ppc/spapr_hcall.c:512:        cs->exit_request = 1;
hw/ppc/spapr_hcall.c:534:    cs->exit_request = 1;
hw/ppc/spapr_hcall.c:627:    cs->exit_request = 1;

FYI, last week we noticed a similar problem with CPUState::thread_kicked
when using HVF, and in a few days I'll post a series containing this
change:

-- >8 --
diff --git a/system/cpus.c b/system/cpus.c
index 26fb3bd69c3..39daf85bae7 100644
--- a/system/cpus.c
+++ b/system/cpus.c
@@ -464,10 +464,10 @@ void qemu_wait_io_event(CPUState *cpu)

 void cpus_kick_thread(CPUState *cpu)
 {
-    if (cpu->thread_kicked) {
+    if (qatomic_read(&cpu->thread_kicked)) {
         return;
     }
-    cpu->thread_kicked = true;
+    qatomic_set(&cpu->thread_kicked, true);

---

It only affects HVF because this is the single access to thread_kicked
from accelerator code:

$ git grep -w thread_kicked
include/hw/core/cpu.h:484:    bool thread_kicked;
system/cpus.c:449:    qatomic_set_mb(&cpu->thread_kicked, false);
system/cpus.c:476:    if (cpu->thread_kicked) {
system/cpus.c:479:    cpu->thread_kicked = true;
target/arm/hvf/hvf.c:1825:    qatomic_set_mb(&cpu->thread_kicked, false);

(Call introduced in commit 219c101fa7f "arm/hvf: Add a WFI handler").
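
For context, a sketch of the two sides of the race using QEMU's qatomic
helpers (simplified, not the literal code): the kick comes from another
thread, while the HVF WFI handler re-arms the flag from the vCPU thread
itself, so both sides need atomic accesses.

    /* some other thread kicking the vCPU (cf. cpus_kick_thread) */
    if (qatomic_read(&cpu->thread_kicked)) {
        return;
    }
    qatomic_set(&cpu->thread_kicked, true);
    /* ... deliver the kick ... */

    /* vCPU thread, in the WFI handler (target/arm/hvf/hvf.c) */
    qatomic_set_mb(&cpu->thread_kicked, false);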
