Re: [Qemu-devel] [PATCH 1/1 V6 resent ] qemu-kvm: fix improper nmi emulation

2011-10-27 Thread Marcelo Tosatti
On Tue, Oct 25, 2011 at 05:55:28PM +0800, Lai Jiangshan wrote:

Please rebase.




[Qemu-devel] [PATCH 1/1 V6 resent ] qemu-kvm: fix improper nmi emulation

2011-10-25 Thread Lai Jiangshan
Previous discussions:
 
  Which approach do you prefer?
  I need to know the result before spending too much time respinning
  the approach.
  
  Yes, sorry about the slow and sometimes conflicting feedback.
  
  1) Fix KVM_NMI emulation approach (which is the v3 patchset)
    - It directly fixes the problem and matches real hardware more
      closely, but it changes KVM_NMI behavior.
    - Requires both a kernel-side and a userspace-side fix.

  2) Get the LAPIC state from the kernel irqchip, and inject the NMI
     if it is allowed (which is the v4 patchset)
    - Simple, doesn't change any kernel behavior.
    - Only needs the userspace-side fix.

  3) Add KVM_SET_LINT1 approach (which is the v5 patchset)
    - Doesn't change the kernel's KVM_NMI behavior.
    - Much more complex.
    - Requires both a kernel-side and a userspace-side fix.
    - The userspace side must also handle the !KVM_SET_LINT1
      condition, which uses all of approach 2)'s code. That means
      this approach equals approach 2) + the KVM_SET_LINT1 ioctl.
 
  This is an urgent bug for us, we need to settle it soon.
  
  While (1) is simple, it overloads a single ioctl with two meanings,
  that's not so good.
  
  Whether we do (1) or (3), we need (2) as well, for older kernels.
  
  So I recommend first focusing on (2) and merging it, then doing (3).
  
  (note an additional issue with 3 is whether to make it a vm or vcpu
  ioctl - we've been assuming vcpu ioctl but it's not necessarily the best
  choice).
  
This patch takes approach 2).
It only changes the userspace side; the kernel side is not touched.
It is changed from the previous v4 patch to fix the problems found by Jan.
end previous discussions


From: Lai Jiangshan <la...@cn.fujitsu.com>


Currently, an NMI is blindly sent to all the vCPUs when an NMI button
event happens. This doesn't properly emulate real hardware, on which an
NMI button event triggers LINT1. Because of this, the NMI is sent to
the processor even when LINT1 is masked in the LVT. For example, this
causes kdump initiated by NMI to sometimes fail on KVM, because kdump
assumes NMI is masked on CPUs other than CPU0.

With this patch, an inject-nmi request is handled as follows.

- When the in-kernel irqchip is disabled, deliver LINT1 instead of the
  NMI interrupt.
- When the in-kernel irqchip is enabled, get the in-kernel LAPIC state
  and test APIC_LVT_MASKED; if LINT1 is unmasked and its delivery mode
  is NMI, deliver the NMI directly. (Suggested by Jan Kiszka)
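
The in-kernel irqchip check above can be sketched in isolation as
follows. This is only an illustration under stated assumptions, not the
code in the patch: struct lapic_page, lapic_reg(), lapic_set_reg() and
lint1_delivers_nmi() are hypothetical stand-ins for struct
kvm_lapic_state and kapic_reg(), and the constant values are the usual
local APIC ones (mask bit 16, delivery mode in bits 8-10, NMI = 4).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed values, following the local APIC register layout. */
#define APIC_LVT_LINT1   4           /* index of LINT1 among the LVT entries */
#define APIC_LVT_MASKED  (1u << 16)  /* LVT mask bit */
#define APIC_DM_NMI      4           /* delivery mode field value for NMI */

/* Hypothetical stand-in for struct kvm_lapic_state: a raw 1 KiB register page. */
struct lapic_page {
    unsigned char regs[1024];
};

/* Read the 32-bit register with index reg_id; registers are 16 bytes apart. */
static uint32_t lapic_reg(const struct lapic_page *p, int reg_id)
{
    uint32_t v;

    memcpy(&v, p->regs + ((size_t)reg_id << 4), sizeof(v));
    return v;
}

/* Hypothetical helper for experiments: store a 32-bit register value. */
static void lapic_set_reg(struct lapic_page *p, int reg_id, uint32_t v)
{
    memcpy(p->regs + ((size_t)reg_id << 4), &v, sizeof(v));
}

/* Return 1 if asserting LINT1 should reach this CPU as an NMI, else 0. */
static int lint1_delivers_nmi(const struct lapic_page *p)
{
    uint32_t lvt = lapic_reg(p, 0x32 + APIC_LVT_LINT1);

    if (lvt & APIC_LVT_MASKED) {
        return 0;               /* LINT1 is masked: drop the request */
    }
    /* Bits 8-10 hold the delivery mode; only NMI mode is forwarded. */
    return ((lvt >> 8) & 7) == APIC_DM_NMI;
}
```

With a page whose LINT1 entry is set to APIC_DM_NMI << 8 the helper
returns 1; setting APIC_LVT_MASKED as well makes it return 0, which is
the case kdump relies on for CPUs other than CPU0.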

Changes from the previous version:
  - Re-implemented following Jan's suggestion.
  - Fixed the race found by Jan.

Signed-off-by: Lai Jiangshan <la...@cn.fujitsu.com>
Reported-by: Kenji Kaneshige <kaneshige.ke...@jp.fujitsu.com>
Acked-by: Avi Kivity <a...@redhat.com>
Acked-by: Jan Kiszka <jan.kis...@web.de>
---
 hw/apic.c |   33 +++++++++++++++++++++++++++++++++
 hw/apic.h |    1 +
 monitor.c |    6 +++++-
 3 files changed, 39 insertions(+), 1 deletions(-)
diff --git a/hw/apic.c b/hw/apic.c
index 69d6ac5..922796a 100644
--- a/hw/apic.c
+++ b/hw/apic.c
@@ -205,6 +205,39 @@ void apic_deliver_pic_intr(DeviceState *d, int level)
 }
 }
 
+static inline uint32_t kapic_reg(struct kvm_lapic_state *kapic, int reg_id);
+
+static void kvm_irqchip_deliver_nmi(void *p)
+{
+    APICState *s = p;
+    struct kvm_lapic_state klapic;
+    uint32_t lvt;
+
+    kvm_get_lapic(s->cpu_env, &klapic);
+    lvt = kapic_reg(&klapic, 0x32 + APIC_LVT_LINT1);
+
+    if (lvt & APIC_LVT_MASKED) {
+        return;
+    }
+
+    if (((lvt >> 8) & 7) != APIC_DM_NMI) {
+        return;
+    }
+
+    kvm_vcpu_ioctl(s->cpu_env, KVM_NMI);
+}
+
+void apic_deliver_nmi(DeviceState *d)
+{
+    APICState *s = DO_UPCAST(APICState, busdev.qdev, d);
+
+    if (kvm_irqchip_in_kernel()) {
+        run_on_cpu(s->cpu_env, kvm_irqchip_deliver_nmi, s);
+    } else {
+        apic_local_deliver(s, APIC_LVT_LINT1);
+    }
+}
+
 #define foreach_apic(apic, deliver_bitmask, code) \
 {\
 int __i, __j, __mask;\
diff --git a/hw/apic.h b/hw/apic.h
index c857d52..3a4be0a 100644
--- a/hw/apic.h
+++ b/hw/apic.h
@@ -10,6 +10,7 @@ void apic_deliver_irq(uint8_t dest, uint8_t dest_mode,
  uint8_t trigger_mode);
 int apic_accept_pic_intr(DeviceState *s);
 void apic_deliver_pic_intr(DeviceState *s, int level);
+void apic_deliver_nmi(DeviceState *d);
 int apic_get_interrupt(DeviceState *s);
 void apic_reset_irq_delivered(void);
 int apic_get_irq_delivered(void);
diff --git a/monitor.c b/monitor.c
index cb485bf..0b81f17 100644
--- a/monitor.c
+++ b/monitor.c
@@ -2616,7 +2616,11 @@ static int do_inject_nmi(Monitor *mon, const QDict *qdict, QObject **ret_data)
     CPUState *env;
 
     for (env = first_cpu; env != NULL; env = env->next_cpu) {
-        cpu_interrupt(env, CPU_INTERRUPT_NMI);
+        if (!env->apic_state) {
+            cpu_interrupt(env, CPU_INTERRUPT_NMI);
+        } else {
+            apic_deliver_nmi(env->apic_state);
+        }
     }
 
     return 0;