when kernel-irqchip=split is used, QEMU still hits BQL contention issue when reading ACPI PM/HPET timers (despite of timer[s] access being lock-less).
So Windows with more than 255 cpus is still not able to boot (since it requires iommu -> split irqchip). Problematic path is in kvm_arch_pre_run() where BQL is taken unconditionally when split irqchip is in use. There are a few parts that BQL protects there: 1. interrupt check and injecting however we do not take BQL when checking for pending interrupt (even within the same function), so the patch takes the same approach for cpu->interrupt_request checks and takes BQL only if there is a job to do. 2. request_interrupt_window access CPUState::kvm_run::request_interrupt_window doesn't need BQL as it's accessed by its own vCPU thread. 3. cr8/cpu_get_apic_tpr access the same (as #2) applies to CPUState::kvm_run::cr8, and APIC registers are also cached/synced (get/put) within the vCPU thread it belongs to. Taking BQL only when is necessary, eleminates BQL bottleneck on IO/MMIO only exit path, improoving latency by 80% on HPET micro benchmark. This lets Windows to boot succesfully (in case hv-time isn't used) when more than 255 vCPUs are in use. Signed-off-by: Igor Mammedov <imamm...@redhat.com> --- v3: * drop net needed pair of () in cpu->interrupt_request & CPU_INTERRUPT_HARD check * Paolo Bonzini <pbonz...@redhat.com> * don't take BQL when setting exit_request, use qatomic_set() instead * after above simplification take/release BQL unconditionally * drop smp_mb() after run->cr8/run->request_interrupt_window update --- target/i386/kvm/kvm.c | 12 +++++------- 1 file changed, 5 insertions(+), 7 deletions(-) diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c index a7b5c8f81b..306430a052 100644 --- a/target/i386/kvm/kvm.c +++ b/target/i386/kvm/kvm.c @@ -5478,9 +5478,6 @@ void kvm_arch_pre_run(CPUState *cpu, struct kvm_run *run) } } - if (!kvm_pic_in_kernel()) { - bql_lock(); - } /* Force the VCPU out of its inner loop to process any INIT requests * or (for userspace APIC, but it is cheap to combine the checks here) @@ -5489,10 +5486,10 @@ void kvm_arch_pre_run(CPUState *cpu, struct kvm_run *run) if (cpu_test_interrupt(cpu, CPU_INTERRUPT_INIT | CPU_INTERRUPT_TPR)) { if (cpu_test_interrupt(cpu, CPU_INTERRUPT_INIT) && !(env->hflags & HF_SMM_MASK)) { - cpu->exit_request = 1; + qatomic_set(&cpu->exit_request, 1); } if (cpu_test_interrupt(cpu, CPU_INTERRUPT_TPR)) { - cpu->exit_request = 1; + qatomic_set(&cpu->exit_request, 1); } } @@ -5503,6 +5500,8 @@ void kvm_arch_pre_run(CPUState *cpu, struct kvm_run *run) (env->eflags & IF_MASK)) { int irq; + bql_lock(); + cpu->interrupt_request &= ~CPU_INTERRUPT_HARD; irq = cpu_get_pic_interrupt(env); if (irq >= 0) { @@ -5517,6 +5516,7 @@ void kvm_arch_pre_run(CPUState *cpu, struct kvm_run *run) strerror(-ret)); } } + bql_unlock(); } /* If we have an interrupt but the guest is not ready to receive an @@ -5531,8 +5531,6 @@ void kvm_arch_pre_run(CPUState *cpu, struct kvm_run *run) DPRINTF("setting tpr\n"); run->cr8 = cpu_get_apic_tpr(x86_cpu->apic_state); - - bql_unlock(); } } -- 2.47.1