On Fri, Aug 08, 2025 at 02:01:35PM +0200, Igor Mammedov wrote:
> when kernel-irqchip=split is used, QEMU still hits BQL
> contention issue when reading ACPI PM/HPET timers
> (despite of timer[s] access being lock-less).
> 
> So Windows with more than 255 cpus is still not able to
> boot (since it requires iommu -> split irqchip).
> 
> Problematic path is in kvm_arch_pre_run() where BQL is taken
> unconditionally when split irqchip is in use.
> 
> There are a few parts that BQL protects there:
>   1. interrupt check and injecting
> 
>     however we do not take BQL when checking for pending
>     interrupt (even within the same function), so the patch
>     takes the same approach for cpu->interrupt_request checks
>     and takes BQL only if there is a job to do.
> 
>   2. request_interrupt_window access
>       CPUState::kvm_run::request_interrupt_window doesn't need BQL
>       as it's accessed by its own vCPU thread.
> 
>   3. cr8/cpu_get_apic_tpr access
>       the same (as #2) applies to CPUState::kvm_run::cr8,
>       and APIC registers are also cached/synced (get/put) within
>       the vCPU thread it belongs to.
> 
> Taking BQL only when is necessary, eleminates BQL bottleneck on
> IO/MMIO only exit path, improoving latency by 80% on HPET micro
> benchmark.
> 
> This lets Windows to boot succesfully (in case hv-time isn't used)
> when more than 255 vCPUs are in use.
> 
> Signed-off-by: Igor Mammedov <imamm...@redhat.com>
> ---
> v3:
>   * drop net needed pair of () in cpu->interrupt_request & CPU_INTERRUPT_HARD
>     check
>   * Paolo Bonzini <pbonz...@redhat.com>
>      * don't take BQL when setting exit_request, use qatomic_set() instead
>      * after above simplification take/release BQL unconditionally
>      * drop smp_mb() after run->cr8/run->request_interrupt_window update
> ---
>  target/i386/kvm/kvm.c | 12 +++++-------
>  1 file changed, 5 insertions(+), 7 deletions(-)
> 
> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
> index a7b5c8f81b..306430a052 100644
> --- a/target/i386/kvm/kvm.c
> +++ b/target/i386/kvm/kvm.c
> @@ -5478,9 +5478,6 @@ void kvm_arch_pre_run(CPUState *cpu, struct kvm_run 
> *run)
>          }
>      }
>  
> -    if (!kvm_pic_in_kernel()) {
> -        bql_lock();
> -    }
>  
>      /* Force the VCPU out of its inner loop to process any INIT requests
>       * or (for userspace APIC, but it is cheap to combine the checks here)
> @@ -5489,10 +5486,10 @@ void kvm_arch_pre_run(CPUState *cpu, struct kvm_run 
> *run)
>      if (cpu_test_interrupt(cpu, CPU_INTERRUPT_INIT | CPU_INTERRUPT_TPR)) {
>          if (cpu_test_interrupt(cpu, CPU_INTERRUPT_INIT) &&
>              !(env->hflags & HF_SMM_MASK)) {
> -            cpu->exit_request = 1;
> +            qatomic_set(&cpu->exit_request, 1);
>          }
>          if (cpu_test_interrupt(cpu, CPU_INTERRUPT_TPR)) {
> -            cpu->exit_request = 1;
> +            qatomic_set(&cpu->exit_request, 1);

I still see plenty of cases where exit_request will be updated without
qatomic_set().. There's even one in kvm.c (kvm_arch_process_async_events)
which is the same file.

Maybe it would make more sense to move these two changes separately, e.g. a
single patch changing tree-wide exit_request setters to use qatomic_set()?

But not that it's a huge deal..

Reviewed-by: Peter Xu <pet...@redhat.com>

>          }
>      }
>  
> @@ -5503,6 +5500,8 @@ void kvm_arch_pre_run(CPUState *cpu, struct kvm_run 
> *run)
>              (env->eflags & IF_MASK)) {
>              int irq;
>  
> +            bql_lock();
> +
>              cpu->interrupt_request &= ~CPU_INTERRUPT_HARD;
>              irq = cpu_get_pic_interrupt(env);
>              if (irq >= 0) {
> @@ -5517,6 +5516,7 @@ void kvm_arch_pre_run(CPUState *cpu, struct kvm_run 
> *run)
>                              strerror(-ret));
>                  }
>              }
> +            bql_unlock();
>          }
>  
>          /* If we have an interrupt but the guest is not ready to receive an
> @@ -5531,8 +5531,6 @@ void kvm_arch_pre_run(CPUState *cpu, struct kvm_run 
> *run)
>  
>          DPRINTF("setting tpr\n");
>          run->cr8 = cpu_get_apic_tpr(x86_cpu->apic_state);
> -
> -        bql_unlock();
>      }
>  }
>  
> -- 
> 2.47.1
> 

-- 
Peter Xu


Reply via email to