Re: [Xen-devel] [PATCH for-4.5] xen/arm: Fix virtual timer on ARMv8 Model
On Thu, 2014-11-27 at 18:02 +, Julien Grall wrote:
> state at the GIC level. This would also avoid masking the output signal
> and requires specific handling in the guest OS.

which requires? It doesn't seem quite right to me otherwise, since context switching the virq state *removes* the need to have the guest do anything other than what it would do on native.

Assuming this is what you meant I propose (fixing some grammar etc as I go):

xen/arm: Handle platforms with edge-triggered virtual timer

Some platforms (such as the ARMv8 model) use an edge-triggered interrupt for the virtual timer. Even if the timer output signal is masked in the context switch, the GIC will keep track of any interrupts raised while IRQs are disabled. As soon as IRQs are re-enabled, the virtual timer interrupt will be injected to Xen.

If an idle VCPU was scheduled next then the interrupt handler doesn't expect to receive the IRQ and will crash:

(XEN)[00228388] _spin_lock_irqsave+0x28/0x94 (PC)
(XEN)[00228380] _spin_lock_irqsave+0x20/0x94 (LR)
(XEN)[00250510] vgic_vcpu_inject_irq+0x40/0x1b0
(XEN)[0024bcd0] vtimer_interrupt+0x4c/0x54
(XEN)[00247010] do_IRQ+0x1a4/0x220
(XEN)[00244864] gic_interrupt+0x50/0xec
(XEN)[0024fbac] do_trap_irq+0x20/0x2c
(XEN)[00255240] hyp_irq+0x5c/0x60
(XEN)[00241084] context_switch+0xb8/0xc4
(XEN)[0022482c] schedule+0x684/0x6d0
(XEN)[0022785c] __do_softirq+0xcc/0xe8
(XEN)[002278d4] do_softirq+0x14/0x1c
(XEN)[00240fac] idle_loop+0x134/0x154
(XEN)[0024c160] start_secondary+0x14c/0x15c
(XEN)[0001] 0001

The proper solution is to context switch the virtual interrupt state at the GIC level. This would also avoid masking the output signal, which requires specific handling in the guest OS and more complex code in Xen to deal with EOIs, and so is desirable for that reason too.

Sadly, this solution requires some refactoring which would not be suitable for a freeze exception for the Xen 4.5 release. For now implement a temporary solution which ignores the virtual timer interrupt when the idle VCPU is running.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH for-4.5] xen/arm: Fix virtual timer on ARMv8 Model
On Tue, 2014-11-25 at 17:44 +, Julien Grall wrote:
> ARMv8 model may not disable correctly the timer interrupt when Xen

correctly disable

> context switch to an idle vCPU. Therefore Xen may receive a spurious

context switches and s/spurious/unexpected/ (since spurious has a specific meaning in the h/w which does not match what is happening here)

> timer interrupt. As the idle domain doesn't have vGIC, Xen will crash
> when trying to inject the interrupt with the following stack trace.
>
> (XEN)[00228388] _spin_lock_irqsave+0x28/0x94 (PC)
> (XEN)[00228380] _spin_lock_irqsave+0x20/0x94 (LR)
> (XEN)[00250510] vgic_vcpu_inject_irq+0x40/0x1b0
> (XEN)[0024bcd0] vtimer_interrupt+0x4c/0x54
> (XEN)[00247010] do_IRQ+0x1a4/0x220
> (XEN)[00244864] gic_interrupt+0x50/0xec
> (XEN)[0024fbac] do_trap_irq+0x20/0x2c
> (XEN)[00255240] hyp_irq+0x5c/0x60
> (XEN)[00241084] context_switch+0xb8/0xc4
> (XEN)[0022482c] schedule+0x684/0x6d0
> (XEN)[0022785c] __do_softirq+0xcc/0xe8
> (XEN)[002278d4] do_softirq+0x14/0x1c
> (XEN)[00240fac] idle_loop+0x134/0x154
> (XEN)[0024c160] start_secondary+0x14c/0x15c
> (XEN)[0001] 0001
>
> While we receive spurious virtual timer interrupt, this could be safely
> ignore for the time being. A proper fix need to be found for Xen 4.6.
>
> Signed-off-by: Julien Grall julien.gr...@linaro.org

Acked-by: Ian Campbell ian.campb...@citrix.com

Although I wonder if we should log, perhaps rate limited or only once.

Also, I've some grammar nits (above and below) which I can fix on commit if there is no resend...

> ---
> This patch is a bug fix candidate for Xen 4.5. Any ARMv8 model may
> randomly crash when running Xen.

CCing Konrad.

> This patch don't inject the virtual timer interrupt if the current VCPU
> is the idle one. Entering in this function with the idle VCPU is already
> a bug itself. For now, I think this patch is the safest way to resolve
> the problem.
>
> Meanwhile, I'm investigating with ARM to see whether the bug comes from
> Xen or the model.
> ---
>  xen/arch/arm/time.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
>
> diff --git a/xen/arch/arm/time.c b/xen/arch/arm/time.c
> index a6436f1..83c74cb 100644
> --- a/xen/arch/arm/time.c
> +++ b/xen/arch/arm/time.c
> @@ -169,6 +169,14 @@ static void timer_interrupt(int irq, void *dev_id, struct cpu_user_regs *regs)
>  static void vtimer_interrupt(int irq, void *dev_id, struct cpu_user_regs *regs)
>  {
> +    /*
> +     * ARMv8 model may not disable correctly the timer interrupt when

correctly disable

> +     * Xen context switch to an idle vCPU. Therefore Xen may receive

context switches and may receive an unexpected timer interrupt

> +     * timer interrupt.
> +     */
> +    if ( is_idle_vcpu(current) )
> +        return;
> +
>      current->arch.virt_timer.ctl = READ_SYSREG32(CNTV_CTL_EL0);
>      WRITE_SYSREG32(current->arch.virt_timer.ctl | CNTx_CTL_MASK, CNTV_CTL_EL0);
>      vgic_vcpu_inject_irq(current, current->arch.virt_timer.irq);
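Ian's question about logging "perhaps rate limited or only once" can be illustrated with a small standalone sketch. This is not Xen code and not part of the patch: `struct irq_stats` and `note_unexpected_irq()` are hypothetical names invented here, and `fprintf` stands in for whatever logging primitive the hypervisor would actually use. It only shows the "count every event, print once" pattern.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical bookkeeping for interrupts that arrive when none is expected. */
struct irq_stats {
    unsigned long unexpected; /* total number of unexpected interrupts seen */
    bool warned;              /* true once the one-off message has been printed */
};

/*
 * Record one unexpected interrupt. Every call bumps the counter, but the
 * message is printed only the first time. Returns true if it printed.
 */
static bool note_unexpected_irq(struct irq_stats *s)
{
    assert(s != NULL);

    s->unexpected++;
    if (s->warned)
        return false;

    s->warned = true;
    fprintf(stderr, "unexpected virtual timer interrupt while idle\n");
    return true;
}
```

Calling `note_unexpected_irq()` twice on a zero-initialised `struct irq_stats` prints once and leaves the counter at 2. A rate-limited variant would replace the `warned` flag with a timestamp check.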
Re: [Xen-devel] [PATCH for-4.5] xen/arm: Fix virtual timer on ARMv8 Model
On Thu, 27 Nov 2014, Ian Campbell wrote:
> On Tue, 2014-11-25 at 17:44 +, Julien Grall wrote:
> > ARMv8 model may not disable correctly the timer interrupt when Xen
>
> correctly disable
>
> > context switch to an idle vCPU. Therefore Xen may receive a spurious
>
> context switches and s/spurious/unexpected/ (since spurious has a
> specific meaning in the h/w which does not match what is happening here)
>
> > timer interrupt. As the idle domain doesn't have vGIC, Xen will crash
> > when trying to inject the interrupt with the following stack trace.
> >
> > (XEN)[00228388] _spin_lock_irqsave+0x28/0x94 (PC)
> > (XEN)[00228380] _spin_lock_irqsave+0x20/0x94 (LR)
> > (XEN)[00250510] vgic_vcpu_inject_irq+0x40/0x1b0
> > (XEN)[0024bcd0] vtimer_interrupt+0x4c/0x54
> > (XEN)[00247010] do_IRQ+0x1a4/0x220
> > (XEN)[00244864] gic_interrupt+0x50/0xec
> > (XEN)[0024fbac] do_trap_irq+0x20/0x2c
> > (XEN)[00255240] hyp_irq+0x5c/0x60
> > (XEN)[00241084] context_switch+0xb8/0xc4
> > (XEN)[0022482c] schedule+0x684/0x6d0
> > (XEN)[0022785c] __do_softirq+0xcc/0xe8
> > (XEN)[002278d4] do_softirq+0x14/0x1c
> > (XEN)[00240fac] idle_loop+0x134/0x154
> > (XEN)[0024c160] start_secondary+0x14c/0x15c
> > (XEN)[0001] 0001
> >
> > While we receive spurious virtual timer interrupt, this could be safely
> > ignore for the time being. A proper fix need to be found for Xen 4.6.
> >
> > Signed-off-by: Julien Grall julien.gr...@linaro.org
>
> Acked-by: Ian Campbell ian.campb...@citrix.com
>
> Although I wonder if we should log, perhaps rate limited or only once.
>
> Also, I've some grammar nits (above and below) which I can fix on commit
> if there is no resend...
>
> > ---
> > This patch is a bug fix candidate for Xen 4.5. Any ARMv8 model may
> > randomly crash when running Xen.
>
> CCing Konrad.
>
> > This patch don't inject the virtual timer interrupt if the current VCPU
> > is the idle one. Entering in this function with the idle VCPU is already
> > a bug itself. For now, I think this patch is the safest way to resolve
> > the problem.
> >
> > Meanwhile, I'm investigating with ARM to see whether the bug comes from
> > Xen or the model.

It is worth noting that there are no bad side effects of this change: the vtimer_interrupt is always supposed to be received on non-idle domains. As Julien wrote, the fact that we are receiving a vtimer_interrupt in the idle_domain is a bug, one that probably comes from the ARM model not emulating hardware correctly.

> > ---
> >  xen/arch/arm/time.c | 8 ++++++++
> >  1 file changed, 8 insertions(+)
> >
> > diff --git a/xen/arch/arm/time.c b/xen/arch/arm/time.c
> > index a6436f1..83c74cb 100644
> > --- a/xen/arch/arm/time.c
> > +++ b/xen/arch/arm/time.c
> > @@ -169,6 +169,14 @@ static void timer_interrupt(int irq, void *dev_id, struct cpu_user_regs *regs)
> >  static void vtimer_interrupt(int irq, void *dev_id, struct cpu_user_regs *regs)
> >  {
> > +    /*
> > +     * ARMv8 model may not disable correctly the timer interrupt when
>
> correctly disable
>
> > +     * Xen context switch to an idle vCPU. Therefore Xen may receive
>
> context switches and may receive an unexpected timer interrupt
>
> > +     * timer interrupt.
> > +     */
> > +    if ( is_idle_vcpu(current) )
> > +        return;
> > +
> >      current->arch.virt_timer.ctl = READ_SYSREG32(CNTV_CTL_EL0);
> >      WRITE_SYSREG32(current->arch.virt_timer.ctl | CNTx_CTL_MASK, CNTV_CTL_EL0);
> >      vgic_vcpu_inject_irq(current, current->arch.virt_timer.irq);
Re: [Xen-devel] [PATCH for-4.5] xen/arm: Fix virtual timer on ARMv8 Model
Hi Stefano,

On 27/11/14 10:51, Stefano Stabellini wrote:
> On Thu, 27 Nov 2014, Ian Campbell wrote:
> > On Tue, 2014-11-25 at 17:44 +, Julien Grall wrote:
> > > ARMv8 model may not disable correctly the timer interrupt when Xen
> >
> > correctly disable
> >
> > > context switch to an idle vCPU. Therefore Xen may receive a spurious
> >
> > context switches and s/spurious/unexpected/ (since spurious has a
> > specific meaning in the h/w which does not match what is happening here)
> >
> > > timer interrupt. As the idle domain doesn't have vGIC, Xen will crash
> > > when trying to inject the interrupt with the following stack trace.
> > >
> > > (XEN)[00228388] _spin_lock_irqsave+0x28/0x94 (PC)
> > > (XEN)[00228380] _spin_lock_irqsave+0x20/0x94 (LR)
> > > (XEN)[00250510] vgic_vcpu_inject_irq+0x40/0x1b0
> > > (XEN)[0024bcd0] vtimer_interrupt+0x4c/0x54
> > > (XEN)[00247010] do_IRQ+0x1a4/0x220
> > > (XEN)[00244864] gic_interrupt+0x50/0xec
> > > (XEN)[0024fbac] do_trap_irq+0x20/0x2c
> > > (XEN)[00255240] hyp_irq+0x5c/0x60
> > > (XEN)[00241084] context_switch+0xb8/0xc4
> > > (XEN)[0022482c] schedule+0x684/0x6d0
> > > (XEN)[0022785c] __do_softirq+0xcc/0xe8
> > > (XEN)[002278d4] do_softirq+0x14/0x1c
> > > (XEN)[00240fac] idle_loop+0x134/0x154
> > > (XEN)[0024c160] start_secondary+0x14c/0x15c
> > > (XEN)[0001] 0001
> > >
> > > While we receive spurious virtual timer interrupt, this could be safely
> > > ignore for the time being. A proper fix need to be found for Xen 4.6.
> > >
> > > Signed-off-by: Julien Grall julien.gr...@linaro.org
> >
> > Acked-by: Ian Campbell ian.campb...@citrix.com
> >
> > Although I wonder if we should log, perhaps rate limited or only once.
> >
> > Also, I've some grammar nits (above and below) which I can fix on commit
> > if there is no resend...
> >
> > > ---
> > > This patch is a bug fix candidate for Xen 4.5. Any ARMv8 model may
> > > randomly crash when running Xen.
> >
> > CCing Konrad.
> >
> > > This patch don't inject the virtual timer interrupt if the current VCPU
> > > is the idle one. Entering in this function with the idle VCPU is already
> > > a bug itself. For now, I think this patch is the safest way to resolve
> > > the problem.
> > >
> > > Meanwhile, I'm investigating with ARM to see whether the bug comes from
> > > Xen or the model.
>
> It is worth noting that there are no bad side effects of this change:
> the vtimer_interrupt is always supposed to be received on non-idle
> domains. As Julien wrote, the fact that we are receiving a
> vtimer_interrupt in the idle_domain is a bug, one that probably comes
> from the ARM model not emulating hardware correctly.

ARM says:

The v8A ARM ARM says that the signal output will be disabled if , so the signal will be set to 0. However, how this is treated by the GIC depends on its configuration.

So I'm not so sure it's a model bug.

Regards,

-- 
Julien Grall
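The point that the GIC's treatment "depends on its configuration" can be made concrete with a toy model. The sketch below is a deliberate simplification and not a description of any real GIC: the types and functions (`struct irq_line`, `irq_set_signal`, `irq_is_pending`, `pending_after_mask`) are all hypothetical. It assumes only the general behaviour that a level-sensitive line is pending while the signal is asserted, whereas an edge-triggered line latches a rising edge until it is handled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical trigger configuration of one interrupt line. */
enum trigger { TRIGGER_LEVEL, TRIGGER_EDGE };

struct irq_line {
    enum trigger mode;
    bool signal;  /* current state of the device's output signal */
    bool latched; /* edge mode only: a rising edge was seen and not yet handled */
};

/* The device drives its output; an edge-triggered line remembers rising edges. */
static void irq_set_signal(struct irq_line *l, bool level)
{
    if (l->mode == TRIGGER_EDGE && !l->signal && level)
        l->latched = true;
    l->signal = level;
}

/* Would the controller still deliver this interrupt once IRQs are unmasked? */
static bool irq_is_pending(const struct irq_line *l)
{
    return l->mode == TRIGGER_EDGE ? l->latched : l->signal;
}

/*
 * Scenario from the thread: the timer fires while IRQs are disabled, then
 * its output is masked (signal driven back to 0) before IRQs are re-enabled.
 */
static bool pending_after_mask(enum trigger mode)
{
    struct irq_line l = { .mode = mode, .signal = false, .latched = false };

    assert(!irq_is_pending(&l)); /* nothing pending before the timer fires */

    irq_set_signal(&l, true);  /* timer expires */
    irq_set_signal(&l, false); /* output masked during the context switch */
    return irq_is_pending(&l);
}
```

In this model `pending_after_mask(TRIGGER_LEVEL)` is false, because lowering the signal withdraws the interrupt, while `pending_after_mask(TRIGGER_EDGE)` is true, which matches the unexpected interrupt seen on the idle vCPU.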
Re: [Xen-devel] [PATCH for-4.5] xen/arm: Fix virtual timer on ARMv8 Model
Hi Ian,

On 27/11/14 10:40, Ian Campbell wrote:
> On Tue, 2014-11-25 at 17:44 +, Julien Grall wrote:
> > ARMv8 model may not disable correctly the timer interrupt when Xen
>
> correctly disable
>
> > context switch to an idle vCPU. Therefore Xen may receive a spurious
>
> context switches and s/spurious/unexpected/ (since spurious has a
> specific meaning in the h/w which does not match what is happening here)
>
> > timer interrupt. As the idle domain doesn't have vGIC, Xen will crash
> > when trying to inject the interrupt with the following stack trace.
> >
> > (XEN)[00228388] _spin_lock_irqsave+0x28/0x94 (PC)
> > (XEN)[00228380] _spin_lock_irqsave+0x20/0x94 (LR)
> > (XEN)[00250510] vgic_vcpu_inject_irq+0x40/0x1b0
> > (XEN)[0024bcd0] vtimer_interrupt+0x4c/0x54
> > (XEN)[00247010] do_IRQ+0x1a4/0x220
> > (XEN)[00244864] gic_interrupt+0x50/0xec
> > (XEN)[0024fbac] do_trap_irq+0x20/0x2c
> > (XEN)[00255240] hyp_irq+0x5c/0x60
> > (XEN)[00241084] context_switch+0xb8/0xc4
> > (XEN)[0022482c] schedule+0x684/0x6d0
> > (XEN)[0022785c] __do_softirq+0xcc/0xe8
> > (XEN)[002278d4] do_softirq+0x14/0x1c
> > (XEN)[00240fac] idle_loop+0x134/0x154
> > (XEN)[0024c160] start_secondary+0x14c/0x15c
> > (XEN)[0001] 0001
> >
> > While we receive spurious virtual timer interrupt, this could be safely
> > ignore for the time being. A proper fix need to be found for Xen 4.6.
> >
> > Signed-off-by: Julien Grall julien.gr...@linaro.org
>
> Acked-by: Ian Campbell ian.campb...@citrix.com
>
> Although I wonder if we should log, perhaps rate limited or only once.

I don't think the printk is necessary: receiving this unexpected interrupt is harmless, in the sense that the guest will still work when the vCPU runs again.

> Also, I've some grammar nits (above and below) which I can fix on commit
> if there is no resend...

Thanks.

Regards,

-- 
Julien Grall
[Xen-devel] [PATCH for-4.5] xen/arm: Fix virtual timer on ARMv8 Model
ARMv8 model may not disable correctly the timer interrupt when Xen
context switch to an idle vCPU. Therefore Xen may receive a spurious
timer interrupt. As the idle domain doesn't have vGIC, Xen will crash
when trying to inject the interrupt with the following stack trace.

(XEN)[00228388] _spin_lock_irqsave+0x28/0x94 (PC)
(XEN)[00228380] _spin_lock_irqsave+0x20/0x94 (LR)
(XEN)[00250510] vgic_vcpu_inject_irq+0x40/0x1b0
(XEN)[0024bcd0] vtimer_interrupt+0x4c/0x54
(XEN)[00247010] do_IRQ+0x1a4/0x220
(XEN)[00244864] gic_interrupt+0x50/0xec
(XEN)[0024fbac] do_trap_irq+0x20/0x2c
(XEN)[00255240] hyp_irq+0x5c/0x60
(XEN)[00241084] context_switch+0xb8/0xc4
(XEN)[0022482c] schedule+0x684/0x6d0
(XEN)[0022785c] __do_softirq+0xcc/0xe8
(XEN)[002278d4] do_softirq+0x14/0x1c
(XEN)[00240fac] idle_loop+0x134/0x154
(XEN)[0024c160] start_secondary+0x14c/0x15c
(XEN)[0001] 0001

While we receive spurious virtual timer interrupt, this could be safely
ignore for the time being. A proper fix need to be found for Xen 4.6.

Signed-off-by: Julien Grall julien.gr...@linaro.org
---
This patch is a bug fix candidate for Xen 4.5. Any ARMv8 model may
randomly crash when running Xen.

This patch don't inject the virtual timer interrupt if the current VCPU
is the idle one. Entering in this function with the idle VCPU is already
a bug itself. For now, I think this patch is the safest way to resolve
the problem.

Meanwhile, I'm investigating with ARM to see whether the bug comes from
Xen or the model.
---
 xen/arch/arm/time.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/xen/arch/arm/time.c b/xen/arch/arm/time.c
index a6436f1..83c74cb 100644
--- a/xen/arch/arm/time.c
+++ b/xen/arch/arm/time.c
@@ -169,6 +169,14 @@ static void timer_interrupt(int irq, void *dev_id, struct cpu_user_regs *regs)
 static void vtimer_interrupt(int irq, void *dev_id, struct cpu_user_regs *regs)
 {
+    /*
+     * ARMv8 model may not disable correctly the timer interrupt when
+     * Xen context switch to an idle vCPU. Therefore Xen may receive
+     * timer interrupt.
+     */
+    if ( is_idle_vcpu(current) )
+        return;
+
     current->arch.virt_timer.ctl = READ_SYSREG32(CNTV_CTL_EL0);
     WRITE_SYSREG32(current->arch.virt_timer.ctl | CNTx_CTL_MASK, CNTV_CTL_EL0);
     vgic_vcpu_inject_irq(current, current->arch.virt_timer.irq);
-- 
2.1.3
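The effect of the early return can be tried outside Xen with mock types. The sketch below is an assumption-laden illustration rather than the hypervisor's code: `struct vcpu`, `struct vgic`, `mock_is_idle_vcpu`, `mock_inject_irq` and `mock_vtimer_interrupt` are hypothetical stand-ins, and the idle vCPU is modelled simply as one with no vGIC state, following the commit message's statement that the idle domain doesn't have a vGIC.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for per-vCPU virtual GIC state. */
struct vgic {
    unsigned int injected; /* number of interrupts injected so far */
};

/* Hypothetical stand-in for a vCPU; the idle vCPU has no vGIC. */
struct vcpu {
    bool is_idle;
    struct vgic *vgic; /* NULL for the idle vCPU */
};

static bool mock_is_idle_vcpu(const struct vcpu *v)
{
    return v->is_idle;
}

/* Injecting requires vGIC state; without the guard this assert would fire. */
static void mock_inject_irq(struct vcpu *v)
{
    assert(v->vgic != NULL);
    v->vgic->injected++;
}

/*
 * Mock of the patched handler: ignore the interrupt when the idle vCPU is
 * current, otherwise inject it. Returns true if an interrupt was injected.
 */
static bool mock_vtimer_interrupt(struct vcpu *cur)
{
    if (mock_is_idle_vcpu(cur))
        return false;

    mock_inject_irq(cur);
    return true;
}
```

With a guest vCPU the mock handler injects as before; with an idle vCPU it returns early and never touches the missing vGIC state, which is the path that crashed in the stack trace.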