Re: Re: [RFC PATCH] KVM: arm/arm64: vgic: change condition for level interrupt resampling

2018-03-08 Thread Yang, Shunyong
Hi, Eric, Marc and Christoffer,

On Thu, 2018-03-08 at 19:12 +0100, Auger Eric wrote:
> Hi Marc, Christoffer,
> 
> On 08/03/18 18:28, Marc Zyngier wrote:
> > 
> > On Thu, 08 Mar 2018 16:19:00 +,
> > Christoffer Dall wrote:
> > > 
> > > 
> > > On Thu, Mar 08, 2018 at 11:54:27AM +, Marc Zyngier wrote:
> > > > 
> > > > On 08/03/18 09:49, Marc Zyngier wrote:
> > > > > 
> > > > > [updated Christoffer's email address]
> > > > > 
> > > > > Hi Shunyong,
> > > > > 
> > > > > On 08/03/18 07:01, Shunyong Yang wrote:
> > > > > > 
> > > > > > When resampling irqfds is enabled, level interrupt should
> > > > > > be
> > > > > > de-asserted when resampling happens. On page 4-47 of GIC v3
> > > > > > specification IHI0069D, it said,
> > > > > > "When the PE acknowledges an SGI, a PPI, or an SPI at the
> > > > > > CPU
> > > > > > interface, the IRI changes the status of the interrupt to
> > > > > > active
> > > > > > and pending if:
> > > > > > • It is an edge-triggered interrupt, and another edge has
> > > > > > been
> > > > > > detected since the interrupt was acknowledged.
> > > > > > • It is a level-sensitive interrupt, and the level has not
> > > > > > been
> > > > > > deasserted since the interrupt was acknowledged."
> > > > > > 
> > > > > > GIC v2 specification IHI0048B.b has similar description on
> > > > > > page
> > > > > > 3-42 for state machine transition.
> > > > > > 
> > > > > > When some VFIO device, like mtty(8250 VFIO mdev emulation
> > > > > > driver
> > > > > > in samples/vfio-mdev) triggers a level interrupt, the
> > > > > > status
> > > > > > transition in LR is pending-->active-->active and pending.
> > > > > > Then it will wait resampling to de-assert the interrupt.
> > > > > > 
> > > > > > Current design of lr_signals_eoi_mi() will return false if
> > > > > > state
> > > > > > in LR is not invalid(Inactive). It causes resampling will
> > > > > > not happen
> > > > > > in mtty case.
> > > > > Let me rephrase this, and tell me if I understood it
> > > > > correctly:
> > > > > 
> > > > > - A level interrupt is injected, activated by the guest (LR
> > > > > state=active)
> > > > > - guest exits, re-enters, (LR state=pending+active)
> > > > > - guest EOIs the interrupt (LR state=pending)
> > > > > - maintenance interrupt
> > > > > - we don't signal the resampling because we're not in an
> > > > > invalid state
> > > > > 
> > > > > Is that correct?
> > > > > 
> > > > > That's an interesting case, because it seems to invalidate
> > > > > some of the 
> > > > > optimization that went in over a year ago.
> > > > > 
> > > > > 096f31c4360f KVM: arm/arm64: vgic: Get rid of MISR and EISR
> > > > > fields
> > > > > b6095b084d87 KVM: arm/arm64: vgic: Get rid of unnecessary
> > > > > save_maint_int_state
> > > > > af0614991ab6 KVM: arm/arm64: vgic: Get rid of unnecessary
> > > > > process_maintenance operation
> > > > > 
> > > > > We could compare the value of the LR before the guest entry
> > > > > with
> > > > > the value at exit time, but we still could miss it if we have
> > > > > a
> > > > > transition such as P+A -> P -> A and assume a long enough
> > > > > propagation
> > > > > delay for the maintenance interrupt (which is very likely).
> > > > > 
> > > > > In essence, we have lost the benefit of EISR, which was to
> > > > > give us a
> > > > > way to deal with asynchronous signalling.
> > > > > 
> > > > > > 
> > > > > > 
> > > > > > This will cause interrupt fired continuously to guest even
> > > > > > 8250 IIR
> > > > > > has no interrupt. When 8250's interrupt is configured in
> > > > > > shared mode,
> > > > > > it will pass interrupt to other drivers to handle. However,
> > > > > > there
> > > > > > is no other driver involved. Then, a "nobody cared" kernel
> > > > > > complaint
> > > > > > occurs.
> > > > > > 
> > > > > > / # cat /dev/ttyS0
> > > > > > [4.826836] random: crng init done
> > > > > > [6.373620] irq 41: nobody cared (try booting with the
> > > > > > "irqpoll"
> > > > > > option)
> > > > > > [6.376414] CPU: 0 PID: 1307 Comm: cat Not tainted
> > > > > > 4.16.0-rc4 #4
> > > > > > [6.378927] Hardware name: linux,dummy-virt (DT)
> > > > > > [6.380876] Call trace:
> > > > > > [6.381937]  dump_backtrace+0x0/0x180
> > > > > > [6.383495]  show_stack+0x14/0x1c
> > > > > > [6.384902]  dump_stack+0x90/0xb4
> > > > > > [6.386312]  __report_bad_irq+0x38/0xe0
> > > > > > [6.387944]  note_interrupt+0x1f4/0x2b8
> > > > > > [6.389568]  handle_irq_event_percpu+0x54/0x7c
> > > > > > [6.391433]  handle_irq_event+0x44/0x74
> > > > > > [6.393056]  handle_fasteoi_irq+0x9c/0x154
> > > > > > [6.394784]  generic_handle_irq+0x24/0x38
> > > > > > [6.396483]  __handle_domain_irq+0x60/0xb4
> > > > > > [6.398207]  gic_handle_irq+0x98/0x1b0
> > > > > > [6.399796]  el1_irq+0xb0/0x128
> > > > > > [6.401138]  _raw_spin_unlock_irqrestore+0x18/0x40
> > > > > > [6.403149]  __setup_irq+0x41c/0x678
> > > > > > [6.404669]  

Re: [RFC PATCH] KVM: arm/arm64: vgic: change condition for level interrupt resampling

2018-03-08 Thread Auger Eric
Hi Marc, Christoffer,

On 08/03/18 18:28, Marc Zyngier wrote:
> On Thu, 08 Mar 2018 16:19:00 +,
> Christoffer Dall wrote:
>>
>> On Thu, Mar 08, 2018 at 11:54:27AM +, Marc Zyngier wrote:
>>> On 08/03/18 09:49, Marc Zyngier wrote:
 [updated Christoffer's email address]

 Hi Shunyong,

 On 08/03/18 07:01, Shunyong Yang wrote:
> When resampling irqfds is enabled, level interrupt should be
> de-asserted when resampling happens. On page 4-47 of GIC v3
> specification IHI0069D, it said,
> "When the PE acknowledges an SGI, a PPI, or an SPI at the CPU
> interface, the IRI changes the status of the interrupt to active
> and pending if:
> • It is an edge-triggered interrupt, and another edge has been
> detected since the interrupt was acknowledged.
> • It is a level-sensitive interrupt, and the level has not been
> deasserted since the interrupt was acknowledged."
>
> GIC v2 specification IHI0048B.b has similar description on page
> 3-42 for state machine transition.
>
> When some VFIO device, like mtty(8250 VFIO mdev emulation driver
> in samples/vfio-mdev) triggers a level interrupt, the status
> transition in LR is pending-->active-->active and pending.
> Then it will wait resampling to de-assert the interrupt.
>
> Current design of lr_signals_eoi_mi() will return false if state
> in LR is not invalid(Inactive). It causes resampling will not happen
> in mtty case.

 Let me rephrase this, and tell me if I understood it correctly:

 - A level interrupt is injected, activated by the guest (LR state=active)
 - guest exits, re-enters, (LR state=pending+active)
 - guest EOIs the interrupt (LR state=pending)
 - maintenance interrupt
 - we don't signal the resampling because we're not in an invalid state

 Is that correct?

 That's an interesting case, because it seems to invalidate some of the 
 optimization that went in over a year ago.

 096f31c4360f KVM: arm/arm64: vgic: Get rid of MISR and EISR fields
 b6095b084d87 KVM: arm/arm64: vgic: Get rid of unnecessary 
 save_maint_int_state
 af0614991ab6 KVM: arm/arm64: vgic: Get rid of unnecessary 
 process_maintenance operation

 We could compare the value of the LR before the guest entry with
 the value at exit time, but we still could miss it if we have a
 transition such as P+A -> P -> A and assume a long enough propagation
 delay for the maintenance interrupt (which is very likely).

 In essence, we have lost the benefit of EISR, which was to give us a
 way to deal with asynchronous signalling.

>
> This will cause interrupt fired continuously to guest even 8250 IIR
> has no interrupt. When 8250's interrupt is configured in shared mode,
> it will pass interrupt to other drivers to handle. However, there
> is no other driver involved. Then, a "nobody cared" kernel complaint
> occurs.
>
> / # cat /dev/ttyS0
> [4.826836] random: crng init done
> [6.373620] irq 41: nobody cared (try booting with the "irqpoll"
> option)
> [6.376414] CPU: 0 PID: 1307 Comm: cat Not tainted 4.16.0-rc4 #4
> [6.378927] Hardware name: linux,dummy-virt (DT)
> [6.380876] Call trace:
> [6.381937]  dump_backtrace+0x0/0x180
> [6.383495]  show_stack+0x14/0x1c
> [6.384902]  dump_stack+0x90/0xb4
> [6.386312]  __report_bad_irq+0x38/0xe0
> [6.387944]  note_interrupt+0x1f4/0x2b8
> [6.389568]  handle_irq_event_percpu+0x54/0x7c
> [6.391433]  handle_irq_event+0x44/0x74
> [6.393056]  handle_fasteoi_irq+0x9c/0x154
> [6.394784]  generic_handle_irq+0x24/0x38
> [6.396483]  __handle_domain_irq+0x60/0xb4
> [6.398207]  gic_handle_irq+0x98/0x1b0
> [6.399796]  el1_irq+0xb0/0x128
> [6.401138]  _raw_spin_unlock_irqrestore+0x18/0x40
> [6.403149]  __setup_irq+0x41c/0x678
> [6.404669]  request_threaded_irq+0xe0/0x190
> [6.406474]  univ8250_setup_irq+0x208/0x234
> [6.408250]  serial8250_do_startup+0x1b4/0x754
> [6.410123]  serial8250_startup+0x20/0x28
> [6.411826]  uart_startup.part.21+0x78/0x144
> [6.413633]  uart_port_activate+0x50/0x68
> [6.415328]  tty_port_open+0x84/0xd4
> [6.416851]  uart_open+0x34/0x44
> [6.418229]  tty_open+0xec/0x3c8
> [6.419610]  chrdev_open+0xb0/0x198
> [6.421093]  do_dentry_open+0x200/0x310
> [6.422714]  vfs_open+0x54/0x84
> [6.424054]  path_openat+0x2dc/0xf04
> [6.425569]  do_filp_open+0x68/0xd8
> [6.427044]  do_sys_open+0x16c/0x224
> [6.428563]  SyS_openat+0x10/0x18
> [6.429972]  el0_svc_naked+0x30/0x34
> [6.431494] handlers:
> [6.432479] [<0e9fb4bb>] serial8250_interrupt
> [6.434597] Disabling IRQ #41
>
> This 

Re: [PATCH v5 22/23] arm64: KVM: Allow mapping of vectors outside of the RAM region

2018-03-08 Thread Andrew Jones
On Thu, Mar 01, 2018 at 03:55:37PM +, Marc Zyngier wrote:
> We're now ready to map our vectors in weird and wonderful locations.
> On enabling ARM64_HARDEN_EL2_VECTORS, a vector slots gets allocated
> if this hasn't been already done via ARM64_HARDEN_BRANCH_PREDICTOR
> and gets mapped outside of the normal RAM region, next to the
> idmap.
> 
> That way, being able to obtain VBAR_EL2 doesn't reveal the mapping
> of the rest of the hypervisor code.
> 
> Signed-off-by: Marc Zyngier 
> ---
>  Documentation/arm64/memory.txt   |  3 +-
>  arch/arm64/Kconfig   | 16 
>  arch/arm64/include/asm/kvm_mmu.h | 81 
> ++--
>  arch/arm64/include/asm/mmu.h |  5 ++-
>  arch/arm64/kernel/Makefile   |  4 +-
>  arch/arm64/kvm/va_layout.c   |  3 ++
>  6 files changed, 96 insertions(+), 16 deletions(-)
> 
> diff --git a/Documentation/arm64/memory.txt b/Documentation/arm64/memory.txt
> index c58cc5dbe667..c5dab30d3389 100644
> --- a/Documentation/arm64/memory.txt
> +++ b/Documentation/arm64/memory.txt
> @@ -90,7 +90,8 @@ When using KVM without the Virtualization Host Extensions, 
> the
>  hypervisor maps kernel pages in EL2 at a fixed (and potentially
>  random) offset from the linear mapping. See the kern_hyp_va macro and
>  kvm_update_va_mask function for more details. MMIO devices such as
> -GICv2 gets mapped next to the HYP idmap page.
> +GICv2 gets mapped next to the HYP idmap page, as do vectors when
> +ARM64_HARDEN_EL2_VECTORS is selected for particular CPUs.
>  
>  When using KVM with the Virtualization Host Extensions, no additional
>  mappings are created, since the host kernel runs directly in EL2.
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 7381eeb7ef8e..e6be4393aaad 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -904,6 +904,22 @@ config HARDEN_BRANCH_PREDICTOR
>  
> If unsure, say Y.
>  
> +config HARDEN_EL2_VECTORS
> + bool "Harden EL2 vector mapping against system register leak" if EXPERT
> + default y
> + help
> +   Speculation attacks against some high-performance processors can
> +   be used to leak privileged information such as the vector base
> +   register, resulting in a potential defeat of the EL2 layout
> +   randomization.
> +
> +   This config option will map the vectors to a fixed location,
> +   independent of the EL2 code mapping, so that revealing VBAR_EL2
> +   to an attacker does no give away any extra information. This
> +   only gets enabled on affected CPUs.
> +
> +   If unsure, say Y.
> +
>  menuconfig ARMV8_DEPRECATED
>   bool "Emulate deprecated/obsolete ARMv8 instructions"
>   depends on COMPAT
> diff --git a/arch/arm64/include/asm/kvm_mmu.h 
> b/arch/arm64/include/asm/kvm_mmu.h
> index 3da9e5aea936..433d13d0c271 100644
> --- a/arch/arm64/include/asm/kvm_mmu.h
> +++ b/arch/arm64/include/asm/kvm_mmu.h
> @@ -360,33 +360,90 @@ static inline unsigned int kvm_get_vmid_bits(void)
>   return (cpuid_feature_extract_unsigned_field(reg, 
> ID_AA64MMFR1_VMIDBITS_SHIFT) == 2) ? 16 : 8;
>  }
>  
> -#ifdef CONFIG_HARDEN_BRANCH_PREDICTOR
> +#if (defined(CONFIG_HARDEN_BRANCH_PREDICTOR) ||  \
> + defined(CONFIG_HARDEN_EL2_VECTORS))
> +/*
> + * EL2 vectors can be mapped and rerouted in a number of ways,
> + * depending on the kernel configuration and CPU present:
> + *
> + * - If the CPU has the ARM64_HARDEN_BRANCH_PREDICTOR cap, the
> + *   hardening sequence is placed in one of the vector slots, which is
> + *   executed before jumping to the real vectors.
> + *
> + * - If the CPU has both the ARM64_HARDEN_EL2_VECTORS and BP
> + *   hardening, the slot containing the hardening sequence is mapped
> + *   next to the idmap page, and executed before jumping to the real
> + *   vectors.
> + *
> + * - If the CPU only has ARM64_HARDEN_EL2_VECTORS, then an empty slot
> + *   is selected, mapped next to the idmap page, and executed before
> + *   jumping to the real vectors.
> + *
> + * Note that ARM64_HARDEN_EL2_VECTORS is somewhat incompatible with
> + * VHE, as we don't have hypervisor-specific mappings. If the system
> + * is VHE and yet selects this capability, it will be ignored.
> + */
>  #include 
>  
> +extern void *__kvm_bp_vect_base;
> +extern int __kvm_harden_el2_vector_slot;
> +
>  static inline void *kvm_get_hyp_vector(void)
>  {
>   struct bp_hardening_data *data = arm64_get_bp_hardening_data();
> - void *vect = kvm_ksym_ref(__kvm_hyp_vector);
> + int slot = -1;
> +
> + if (cpus_have_const_cap(ARM64_HARDEN_BRANCH_PREDICTOR) && data->fn)
> + slot = data->hyp_vectors_slot;
> +
> + if (cpus_have_const_cap(ARM64_HARDEN_EL2_VECTORS) &&
> + !has_vhe() && slot == -1)
> + slot = __kvm_harden_el2_vector_slot;
>  
> - if (data->fn) {
> - vect = __bp_harden_hyp_vecs_start +
> -data->hyp_vectors_slot * SZ_2K;
> + if 

Re: [RFC PATCH] KVM: arm/arm64: vgic: change condition for level interrupt resampling

2018-03-08 Thread Marc Zyngier
On Thu, 08 Mar 2018 16:19:00 +,
Christoffer Dall wrote:
> 
> On Thu, Mar 08, 2018 at 11:54:27AM +, Marc Zyngier wrote:
> > On 08/03/18 09:49, Marc Zyngier wrote:
> > > [updated Christoffer's email address]
> > > 
> > > Hi Shunyong,
> > > 
> > > On 08/03/18 07:01, Shunyong Yang wrote:
> > >> When resampling irqfds is enabled, level interrupt should be
> > >> de-asserted when resampling happens. On page 4-47 of GIC v3
> > >> specification IHI0069D, it said,
> > >> "When the PE acknowledges an SGI, a PPI, or an SPI at the CPU
> > >> interface, the IRI changes the status of the interrupt to active
> > >> and pending if:
> > >> • It is an edge-triggered interrupt, and another edge has been
> > >> detected since the interrupt was acknowledged.
> > >> • It is a level-sensitive interrupt, and the level has not been
> > >> deasserted since the interrupt was acknowledged."
> > >>
> > >> GIC v2 specification IHI0048B.b has similar description on page
> > >> 3-42 for state machine transition.
> > >>
> > >> When some VFIO device, like mtty(8250 VFIO mdev emulation driver
> > >> in samples/vfio-mdev) triggers a level interrupt, the status
> > >> transition in LR is pending-->active-->active and pending.
> > >> Then it will wait resampling to de-assert the interrupt.
> > >>
> > >> Current design of lr_signals_eoi_mi() will return false if state
> > >> in LR is not invalid(Inactive). It causes resampling will not happen
> > >> in mtty case.
> > > 
> > > Let me rephrase this, and tell me if I understood it correctly:
> > > 
> > > - A level interrupt is injected, activated by the guest (LR state=active)
> > > - guest exits, re-enters, (LR state=pending+active)
> > > - guest EOIs the interrupt (LR state=pending)
> > > - maintenance interrupt
> > > - we don't signal the resampling because we're not in an invalid state
> > > 
> > > Is that correct?
> > > 
> > > That's an interesting case, because it seems to invalidate some of the 
> > > optimization that went in over a year ago.
> > > 
> > > 096f31c4360f KVM: arm/arm64: vgic: Get rid of MISR and EISR fields
> > > b6095b084d87 KVM: arm/arm64: vgic: Get rid of unnecessary 
> > > save_maint_int_state
> > > af0614991ab6 KVM: arm/arm64: vgic: Get rid of unnecessary 
> > > process_maintenance operation
> > > 
> > > We could compare the value of the LR before the guest entry with
> > > the value at exit time, but we still could miss it if we have a
> > > transition such as P+A -> P -> A and assume a long enough propagation
> > > delay for the maintenance interrupt (which is very likely).
> > > 
> > > In essence, we have lost the benefit of EISR, which was to give us a
> > > way to deal with asynchronous signalling.
> > > 
> > >>
> > >> This will cause interrupt fired continuously to guest even 8250 IIR
> > >> has no interrupt. When 8250's interrupt is configured in shared mode,
> > >> it will pass interrupt to other drivers to handle. However, there
> > >> is no other driver involved. Then, a "nobody cared" kernel complaint
> > >> occurs.
> > >>
> > >> / # cat /dev/ttyS0
> > >> [4.826836] random: crng init done
> > >> [6.373620] irq 41: nobody cared (try booting with the "irqpoll"
> > >> option)
> > >> [6.376414] CPU: 0 PID: 1307 Comm: cat Not tainted 4.16.0-rc4 #4
> > >> [6.378927] Hardware name: linux,dummy-virt (DT)
> > >> [6.380876] Call trace:
> > >> [6.381937]  dump_backtrace+0x0/0x180
> > >> [6.383495]  show_stack+0x14/0x1c
> > >> [6.384902]  dump_stack+0x90/0xb4
> > >> [6.386312]  __report_bad_irq+0x38/0xe0
> > >> [6.387944]  note_interrupt+0x1f4/0x2b8
> > >> [6.389568]  handle_irq_event_percpu+0x54/0x7c
> > >> [6.391433]  handle_irq_event+0x44/0x74
> > >> [6.393056]  handle_fasteoi_irq+0x9c/0x154
> > >> [6.394784]  generic_handle_irq+0x24/0x38
> > >> [6.396483]  __handle_domain_irq+0x60/0xb4
> > >> [6.398207]  gic_handle_irq+0x98/0x1b0
> > >> [6.399796]  el1_irq+0xb0/0x128
> > >> [6.401138]  _raw_spin_unlock_irqrestore+0x18/0x40
> > >> [6.403149]  __setup_irq+0x41c/0x678
> > >> [6.404669]  request_threaded_irq+0xe0/0x190
> > >> [6.406474]  univ8250_setup_irq+0x208/0x234
> > >> [6.408250]  serial8250_do_startup+0x1b4/0x754
> > >> [6.410123]  serial8250_startup+0x20/0x28
> > >> [6.411826]  uart_startup.part.21+0x78/0x144
> > >> [6.413633]  uart_port_activate+0x50/0x68
> > >> [6.415328]  tty_port_open+0x84/0xd4
> > >> [6.416851]  uart_open+0x34/0x44
> > >> [6.418229]  tty_open+0xec/0x3c8
> > >> [6.419610]  chrdev_open+0xb0/0x198
> > >> [6.421093]  do_dentry_open+0x200/0x310
> > >> [6.422714]  vfs_open+0x54/0x84
> > >> [6.424054]  path_openat+0x2dc/0xf04
> > >> [6.425569]  do_filp_open+0x68/0xd8
> > >> [6.427044]  do_sys_open+0x16c/0x224
> > >> [6.428563]  SyS_openat+0x10/0x18
> > >> [6.429972]  el0_svc_naked+0x30/0x34
> > >> [6.431494] handlers:
> > >> [6.432479] [<0e9fb4bb>] serial8250_interrupt
> > >> [

Re: [PATCH 1/2] KVM: arm/arm64: vgic: Don't populate multiple LRs with the same vintid

2018-03-08 Thread Marc Zyngier
On Thu, 08 Mar 2018 16:02:42 +,
Christoffer Dall wrote:
> 
> On Thu, Mar 08, 2018 at 10:19:49AM +, Marc Zyngier wrote:
> > On 07/03/18 23:34, Christoffer Dall wrote:
> > > On Wed, Mar 7, 2018 at 12:40 PM, Marc Zyngier  
> > > wrote:
> > >> The vgic code is trying to be clever when injecting GICv2 SGIs,
> > >> and will happily populate LRs with the same interrupt number if
> > >> they come from multiple vcpus (after all, they are distinct
> > >> interrupt sources).
> > >>
> > >> Unfortunately, this is against the letter of the architecture,
> > >> and the GICv2 architecture spec says "Each valid interrupt stored
> > >> in the List registers must have a unique VirtualID for that
> > >> virtual CPU interface.". GICv3 has similar (although slightly
> > >> ambiguous) restrictions.
> > >>
> > >> This results in guests locking up when using GICv2-on-GICv3, for
> > >> example. The obvious fix is to stop trying so hard, and inject
> > >> a single vcpu per SGI per guest entry. After all, pending SGIs
> > >> with multiple source vcpus are pretty rare, and are mostly seen
> > >> in scenario where the physical CPUs are severely overcomitted.
> > >>
> > >> Cc: sta...@vger.kernel.org
> > >> Fixes: 0919e84c0fc1 ("KVM: arm/arm64: vgic-new: Add IRQ sync/flush 
> > >> framework")
> > >> Signed-off-by: Marc Zyngier 
> > >> ---
> > >>  virt/kvm/arm/vgic/vgic.c | 11 +--
> > >>  1 file changed, 1 insertion(+), 10 deletions(-)
> > >>
> > >> diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c
> > >> index c7c5ef190afa..1f7ff175f47b 100644
> > >> --- a/virt/kvm/arm/vgic/vgic.c
> > >> +++ b/virt/kvm/arm/vgic/vgic.c
> > >> @@ -720,18 +720,9 @@ static void vgic_flush_lr_state(struct kvm_vcpu 
> > >> *vcpu)
> > >> list_for_each_entry(irq, _cpu->ap_list_head, ap_list) {
> > >> spin_lock(>irq_lock);
> > >>
> > >> -   if (unlikely(vgic_target_oracle(irq) != vcpu))
> > >> -   goto next;
> > >> -
> > >> -   /*
> > >> -* If we get an SGI with multiple sources, try to get
> > >> -* them in all at once.
> > >> -*/
> > >> -   do {
> > >> +   if (likely(vgic_target_oracle(irq) == vcpu))
> > >> vgic_populate_lr(vcpu, irq, count++);
> > > 
> > > I think we need to change vgic_populate_lr to set the EOI maintenance
> > > interrupt flag so that when the interrupt is deactivated, if there are
> > > additional pending sources, we exit the guest and pick up the
> > > interrupt.
> > 
> > Potentially. We need to be careful about about the semantics of EOI MI
> > with non-level interrupts (see the other thread about EOI signalling).
> 
> I'll have a look.
> 
> > 
> > > An alternative would be to set the underflow interrupt, but I don't
> > > think that would be correct for multiple priorities, because the SGI
> > > could have a higher priority than other pending interrupts we put in
> > > the LR.
> > 
> > I don't think priorities are the issue (after all, we already sort the
> > irqs in order of priority). 
> 
> Yes, but assume you have three pending interrupts, one SGI from two
> sources, and one SPI, and assume that the SGI has priority 1 and SPI
> priority 2 (lower means higher priority), then I think with underflow or
> the no-pending interrupt flag, we'll deliver the SGI from the first
> source, and then the SPI, and then exiting to pick up the last SGI from
> the other source.  That's not how I understand the GIC architecture is
> supposed to work.  Am I missing something?

No, you do have a point here. I'm so glad this is a v2 only thing.

> 
> > My worry is that underflow is allowed to
> > fire if there is one interrupt pending, which implies that you could
> > end-up in a livelock scenario if you only have one SGI pending with
> > multiple sources.
> 
> Yes, doesn't work, so I think it should be a maintenance interrupt on
> EOI.
> 
> > 
> > Another possibility would be to use ICH_HCR_EL2.NPIE (GICH_HCR.NPIE on
> > GICv2), which delivers a a MI if no pending interrupts are present. Once
> > the SGI has been activated, we're guaranteed to be able to inject a new
> > pending one.
> > 
> > I like the latter, because it doesn't overload the rest of the code with
> > new semantics. Thoughts?
> > 
> 
> I'm fine with that if I can be proven wrong about the multiple sources
> and priorities.

I'll have a look at respining this with MI on EOI.

Thanks,

M.

-- 
Jazz is not dead, it just smell funny.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [RFC PATCH] KVM: arm/arm64: vgic: change condition for level interrupt resampling

2018-03-08 Thread Christoffer Dall
On Thu, Mar 08, 2018 at 11:54:27AM +, Marc Zyngier wrote:
> On 08/03/18 09:49, Marc Zyngier wrote:
> > [updated Christoffer's email address]
> > 
> > Hi Shunyong,
> > 
> > On 08/03/18 07:01, Shunyong Yang wrote:
> >> When resampling irqfds is enabled, level interrupt should be
> >> de-asserted when resampling happens. On page 4-47 of GIC v3
> >> specification IHI0069D, it said,
> >> "When the PE acknowledges an SGI, a PPI, or an SPI at the CPU
> >> interface, the IRI changes the status of the interrupt to active
> >> and pending if:
> >> • It is an edge-triggered interrupt, and another edge has been
> >> detected since the interrupt was acknowledged.
> >> • It is a level-sensitive interrupt, and the level has not been
> >> deasserted since the interrupt was acknowledged."
> >>
> >> GIC v2 specification IHI0048B.b has similar description on page
> >> 3-42 for state machine transition.
> >>
> >> When some VFIO device, like mtty(8250 VFIO mdev emulation driver
> >> in samples/vfio-mdev) triggers a level interrupt, the status
> >> transition in LR is pending-->active-->active and pending.
> >> Then it will wait resampling to de-assert the interrupt.
> >>
> >> Current design of lr_signals_eoi_mi() will return false if state
> >> in LR is not invalid(Inactive). It causes resampling will not happen
> >> in mtty case.
> > 
> > Let me rephrase this, and tell me if I understood it correctly:
> > 
> > - A level interrupt is injected, activated by the guest (LR state=active)
> > - guest exits, re-enters, (LR state=pending+active)
> > - guest EOIs the interrupt (LR state=pending)
> > - maintenance interrupt
> > - we don't signal the resampling because we're not in an invalid state
> > 
> > Is that correct?
> > 
> > That's an interesting case, because it seems to invalidate some of the 
> > optimization that went in over a year ago.
> > 
> > 096f31c4360f KVM: arm/arm64: vgic: Get rid of MISR and EISR fields
> > b6095b084d87 KVM: arm/arm64: vgic: Get rid of unnecessary 
> > save_maint_int_state
> > af0614991ab6 KVM: arm/arm64: vgic: Get rid of unnecessary 
> > process_maintenance operation
> > 
> > We could compare the value of the LR before the guest entry with
> > the value at exit time, but we still could miss it if we have a
> > transition such as P+A -> P -> A and assume a long enough propagation
> > delay for the maintenance interrupt (which is very likely).
> > 
> > In essence, we have lost the benefit of EISR, which was to give us a
> > way to deal with asynchronous signalling.
> > 
> >>
> >> This will cause interrupt fired continuously to guest even 8250 IIR
> >> has no interrupt. When 8250's interrupt is configured in shared mode,
> >> it will pass interrupt to other drivers to handle. However, there
> >> is no other driver involved. Then, a "nobody cared" kernel complaint
> >> occurs.
> >>
> >> / # cat /dev/ttyS0
> >> [4.826836] random: crng init done
> >> [6.373620] irq 41: nobody cared (try booting with the "irqpoll"
> >> option)
> >> [6.376414] CPU: 0 PID: 1307 Comm: cat Not tainted 4.16.0-rc4 #4
> >> [6.378927] Hardware name: linux,dummy-virt (DT)
> >> [6.380876] Call trace:
> >> [6.381937]  dump_backtrace+0x0/0x180
> >> [6.383495]  show_stack+0x14/0x1c
> >> [6.384902]  dump_stack+0x90/0xb4
> >> [6.386312]  __report_bad_irq+0x38/0xe0
> >> [6.387944]  note_interrupt+0x1f4/0x2b8
> >> [6.389568]  handle_irq_event_percpu+0x54/0x7c
> >> [6.391433]  handle_irq_event+0x44/0x74
> >> [6.393056]  handle_fasteoi_irq+0x9c/0x154
> >> [6.394784]  generic_handle_irq+0x24/0x38
> >> [6.396483]  __handle_domain_irq+0x60/0xb4
> >> [6.398207]  gic_handle_irq+0x98/0x1b0
> >> [6.399796]  el1_irq+0xb0/0x128
> >> [6.401138]  _raw_spin_unlock_irqrestore+0x18/0x40
> >> [6.403149]  __setup_irq+0x41c/0x678
> >> [6.404669]  request_threaded_irq+0xe0/0x190
> >> [6.406474]  univ8250_setup_irq+0x208/0x234
> >> [6.408250]  serial8250_do_startup+0x1b4/0x754
> >> [6.410123]  serial8250_startup+0x20/0x28
> >> [6.411826]  uart_startup.part.21+0x78/0x144
> >> [6.413633]  uart_port_activate+0x50/0x68
> >> [6.415328]  tty_port_open+0x84/0xd4
> >> [6.416851]  uart_open+0x34/0x44
> >> [6.418229]  tty_open+0xec/0x3c8
> >> [6.419610]  chrdev_open+0xb0/0x198
> >> [6.421093]  do_dentry_open+0x200/0x310
> >> [6.422714]  vfs_open+0x54/0x84
> >> [6.424054]  path_openat+0x2dc/0xf04
> >> [6.425569]  do_filp_open+0x68/0xd8
> >> [6.427044]  do_sys_open+0x16c/0x224
> >> [6.428563]  SyS_openat+0x10/0x18
> >> [6.429972]  el0_svc_naked+0x30/0x34
> >> [6.431494] handlers:
> >> [6.432479] [<0e9fb4bb>] serial8250_interrupt
> >> [6.434597] Disabling IRQ #41
> >>
> >> This patch changes the lr state condition in lr_signals_eoi_mi() from
> >> invalid(Inactive) to active and pending to avoid this.
> >>
> >> I am not sure about the original design of the condition of
> >> invalid(active). So, This RFC is 

Re: [RFC PATCH] KVM: arm/arm64: vgic: change condition for level interrupt resampling

2018-03-08 Thread Christoffer Dall
On Thu, Mar 08, 2018 at 09:49:43AM +, Marc Zyngier wrote:
> [updated Christoffer's email address]
> 
> Hi Shunyong,
> 
> On 08/03/18 07:01, Shunyong Yang wrote:
> > When resampling irqfds is enabled, level interrupt should be
> > de-asserted when resampling happens. On page 4-47 of GIC v3
> > specification IHI0069D, it said,
> > "When the PE acknowledges an SGI, a PPI, or an SPI at the CPU
> > interface, the IRI changes the status of the interrupt to active
> > and pending if:
> > • It is an edge-triggered interrupt, and another edge has been
> > detected since the interrupt was acknowledged.
> > • It is a level-sensitive interrupt, and the level has not been
> > deasserted since the interrupt was acknowledged."
> > 
> > GIC v2 specification IHI0048B.b has similar description on page
> > 3-42 for state machine transition.
> > 
> > When some VFIO device, like mtty(8250 VFIO mdev emulation driver
> > in samples/vfio-mdev) triggers a level interrupt, the status
> > transition in LR is pending-->active-->active and pending.
> > Then it will wait resampling to de-assert the interrupt.
> > 
> > Current design of lr_signals_eoi_mi() will return false if state
> > in LR is not invalid(Inactive). It causes resampling will not happen
> > in mtty case.
> 
> Let me rephrase this, and tell me if I understood it correctly:
> 
> - A level interrupt is injected, activated by the guest (LR state=active)
> - guest exits, re-enters, (LR state=pending+active)
> - guest EOIs the interrupt (LR state=pending)
> - maintenance interrupt
> - we don't signal the resampling because we're not in an invalid state
> 
> Is that correct?
> 
> That's an interesting case, because it seems to invalidate some of the 
> optimization that went in over a year ago.
> 
> 096f31c4360f KVM: arm/arm64: vgic: Get rid of MISR and EISR fields
> b6095b084d87 KVM: arm/arm64: vgic: Get rid of unnecessary save_maint_int_state
> af0614991ab6 KVM: arm/arm64: vgic: Get rid of unnecessary process_maintenance 
> operation
> 
> We could compare the value of the LR before the guest entry with
> the value at exit time, but we still could miss it if we have a
> transition such as P+A -> P -> A and assume a long enough propagation
> delay for the maintenance interrupt (which is very likely).
> 
> In essence, we have lost the benefit of EISR, which was to give us a
> way to deal with asynchronous signalling.
> 

I don't understand why EISR gives us anything beyond looking at the LR
and evaluating if the state is 00.  My reading of the spec is that the
EISR is merely a shortcut to knowing the state of the LRs but contains
not record or information beyond what you can read from the LRs.

What am I missing?

Thanks,
-Christoffer
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [RFC PATCH] KVM: arm/arm64: vgic: change condition for level interrupt resampling

2018-03-08 Thread Auger Eric
Hi Marc,

On 08/03/18 12:54, Marc Zyngier wrote:
> On 08/03/18 09:49, Marc Zyngier wrote:
>> [updated Christoffer's email address]
>>
>> Hi Shunyong,
>>
>> On 08/03/18 07:01, Shunyong Yang wrote:
>>> When resampling irqfds is enabled, level interrupt should be
>>> de-asserted when resampling happens. On page 4-47 of GIC v3
>>> specification IHI0069D, it said,
>>> "When the PE acknowledges an SGI, a PPI, or an SPI at the CPU
>>> interface, the IRI changes the status of the interrupt to active
>>> and pending if:
>>> • It is an edge-triggered interrupt, and another edge has been
>>> detected since the interrupt was acknowledged.
>>> • It is a level-sensitive interrupt, and the level has not been
>>> deasserted since the interrupt was acknowledged."
>>>
>>> GIC v2 specification IHI0048B.b has similar description on page
>>> 3-42 for state machine transition.
>>>
>>> When some VFIO device, like mtty(8250 VFIO mdev emulation driver
>>> in samples/vfio-mdev) triggers a level interrupt, the status
>>> transition in LR is pending-->active-->active and pending.
>>> Then it will wait resampling to de-assert the interrupt.
>>>
>>> Current design of lr_signals_eoi_mi() will return false if state
>>> in LR is not invalid(Inactive). It causes resampling will not happen
>>> in mtty case.
>>
>> Let me rephrase this, and tell me if I understood it correctly:
>>
>> - A level interrupt is injected, activated by the guest (LR state=active)
>> - guest exits, re-enters, (LR state=pending+active)
>> - guest EOIs the interrupt (LR state=pending)
>> - maintenance interrupt
>> - we don't signal the resampling because we're not in an invalid state
>>
>> Is that correct?
>>
>> That's an interesting case, because it seems to invalidate some of the 
>> optimization that went in over a year ago.
>>
>> 096f31c4360f KVM: arm/arm64: vgic: Get rid of MISR and EISR fields
>> b6095b084d87 KVM: arm/arm64: vgic: Get rid of unnecessary 
>> save_maint_int_state
>> af0614991ab6 KVM: arm/arm64: vgic: Get rid of unnecessary 
>> process_maintenance operation
>>
>> We could compare the value of the LR before the guest entry with
>> the value at exit time, but we still could miss it if we have a
>> transition such as P+A -> P -> A and assume a long enough propagation
>> delay for the maintenance interrupt (which is very likely).
>>
>> In essence, we have lost the benefit of EISR, which was to give us a
>> way to deal with asynchronous signalling.
>>
>>>
>>> This will cause interrupt fired continuously to guest even 8250 IIR
>>> has no interrupt. When 8250's interrupt is configured in shared mode,
>>> it will pass interrupt to other drivers to handle. However, there
>>> is no other driver involved. Then, a "nobody cared" kernel complaint
>>> occurs.
>>>
>>> / # cat /dev/ttyS0
>>> [4.826836] random: crng init done
>>> [6.373620] irq 41: nobody cared (try booting with the "irqpoll"
>>> option)
>>> [6.376414] CPU: 0 PID: 1307 Comm: cat Not tainted 4.16.0-rc4 #4
>>> [6.378927] Hardware name: linux,dummy-virt (DT)
>>> [6.380876] Call trace:
>>> [6.381937]  dump_backtrace+0x0/0x180
>>> [6.383495]  show_stack+0x14/0x1c
>>> [6.384902]  dump_stack+0x90/0xb4
>>> [6.386312]  __report_bad_irq+0x38/0xe0
>>> [6.387944]  note_interrupt+0x1f4/0x2b8
>>> [6.389568]  handle_irq_event_percpu+0x54/0x7c
>>> [6.391433]  handle_irq_event+0x44/0x74
>>> [6.393056]  handle_fasteoi_irq+0x9c/0x154
>>> [6.394784]  generic_handle_irq+0x24/0x38
>>> [6.396483]  __handle_domain_irq+0x60/0xb4
>>> [6.398207]  gic_handle_irq+0x98/0x1b0
>>> [6.399796]  el1_irq+0xb0/0x128
>>> [6.401138]  _raw_spin_unlock_irqrestore+0x18/0x40
>>> [6.403149]  __setup_irq+0x41c/0x678
>>> [6.404669]  request_threaded_irq+0xe0/0x190
>>> [6.406474]  univ8250_setup_irq+0x208/0x234
>>> [6.408250]  serial8250_do_startup+0x1b4/0x754
>>> [6.410123]  serial8250_startup+0x20/0x28
>>> [6.411826]  uart_startup.part.21+0x78/0x144
>>> [6.413633]  uart_port_activate+0x50/0x68
>>> [6.415328]  tty_port_open+0x84/0xd4
>>> [6.416851]  uart_open+0x34/0x44
>>> [6.418229]  tty_open+0xec/0x3c8
>>> [6.419610]  chrdev_open+0xb0/0x198
>>> [6.421093]  do_dentry_open+0x200/0x310
>>> [6.422714]  vfs_open+0x54/0x84
>>> [6.424054]  path_openat+0x2dc/0xf04
>>> [6.425569]  do_filp_open+0x68/0xd8
>>> [6.427044]  do_sys_open+0x16c/0x224
>>> [6.428563]  SyS_openat+0x10/0x18
>>> [6.429972]  el0_svc_naked+0x30/0x34
>>> [6.431494] handlers:
>>> [6.432479] [<0e9fb4bb>] serial8250_interrupt
>>> [6.434597] Disabling IRQ #41
>>>
>>> This patch changes the lr state condition in lr_signals_eoi_mi() from
>>> invalid(Inactive) to active and pending to avoid this.
>>>
>>> I am not sure about the original design of the condition of
>>> invalid(active). So, This RFC is sent out for comments.
>>>
>>> Cc: Joey Zheng 
>>> Signed-off-by: Shunyong Yang 

Re: [PATCH 1/2] KVM: arm/arm64: vgic: Don't populate multiple LRs with the same vintid

2018-03-08 Thread Christoffer Dall
On Thu, Mar 08, 2018 at 10:19:49AM +, Marc Zyngier wrote:
> On 07/03/18 23:34, Christoffer Dall wrote:
> > On Wed, Mar 7, 2018 at 12:40 PM, Marc Zyngier  wrote:
> >> The vgic code is trying to be clever when injecting GICv2 SGIs,
> >> and will happily populate LRs with the same interrupt number if
> >> they come from multiple vcpus (after all, they are distinct
> >> interrupt sources).
> >>
> >> Unfortunately, this is against the letter of the architecture,
> >> and the GICv2 architecture spec says "Each valid interrupt stored
> >> in the List registers must have a unique VirtualID for that
> >> virtual CPU interface.". GICv3 has similar (although slightly
> >> ambiguous) restrictions.
> >>
> >> This results in guests locking up when using GICv2-on-GICv3, for
> >> example. The obvious fix is to stop trying so hard, and inject
> >> a single vcpu per SGI per guest entry. After all, pending SGIs
> >> with multiple source vcpus are pretty rare, and are mostly seen
> >> in scenario where the physical CPUs are severely overcomitted.
> >>
> >> Cc: sta...@vger.kernel.org
> >> Fixes: 0919e84c0fc1 ("KVM: arm/arm64: vgic-new: Add IRQ sync/flush 
> >> framework")
> >> Signed-off-by: Marc Zyngier 
> >> ---
> >>  virt/kvm/arm/vgic/vgic.c | 11 +--
> >>  1 file changed, 1 insertion(+), 10 deletions(-)
> >>
> >> diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c
> >> index c7c5ef190afa..1f7ff175f47b 100644
> >> --- a/virt/kvm/arm/vgic/vgic.c
> >> +++ b/virt/kvm/arm/vgic/vgic.c
> >> @@ -720,18 +720,9 @@ static void vgic_flush_lr_state(struct kvm_vcpu *vcpu)
> >> list_for_each_entry(irq, _cpu->ap_list_head, ap_list) {
> >> spin_lock(>irq_lock);
> >>
> >> -   if (unlikely(vgic_target_oracle(irq) != vcpu))
> >> -   goto next;
> >> -
> >> -   /*
> >> -* If we get an SGI with multiple sources, try to get
> >> -* them in all at once.
> >> -*/
> >> -   do {
> >> +   if (likely(vgic_target_oracle(irq) == vcpu))
> >> vgic_populate_lr(vcpu, irq, count++);
> > 
> > I think we need to change vgic_populate_lr to set the EOI maintenance
> > interrupt flag so that when the interrupt is deactivated, if there are
> > additional pending sources, we exit the guest and pick up the
> > interrupt.
> 
> Potentially. We need to be careful about about the semantics of EOI MI
> with non-level interrupts (see the other thread about EOI signalling).

I'll have a look.

> 
> > An alternative would be to set the underflow interrupt, but I don't
> > think that would be correct for multiple priorities, because the SGI
> > could have a higher priority than other pending interrupts we put in
> > the LR.
> 
> I don't think priorities are the issue (after all, we already sort the
> irqs in order of priority). 

Yes, but assume you have three pending interrupts, one SGI from two
sources, and one SPI, and assume that the SGI has priority 1 and SPI
priority 2 (lower means higher priority), then I think with underflow or
the no-pending interrupt flag, we'll deliver the SGI from the first
source, and then the SPI, and then exiting to pick up the last SGI from
the other source.  That's not how I understand the GIC architecture is
supposed to work.  Am I missing something?

> My worry is that underflow is allowed to
> fire if there is one interrupt pending, which implies that you could
> end-up in a livelock scenario if you only have one SGI pending with
> multiple sources.

Yes, doesn't work, so I think it should be a maintenance interrupt on
EOI.

> 
> Another possibility would be to use ICH_HCR_EL2.NPIE (GICH_HCR.NPIE on
> GICv2), which delivers a a MI if no pending interrupts are present. Once
> the SGI has been activated, we're guaranteed to be able to inject a new
> pending one.
> 
> I like the latter, because it doesn't overload the rest of the code with
> new semantics. Thoughts?
> 

I'm fine with that if I can be proven wrong about the multiple sources
and priorities.

Thanks,
-Christoffer
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [此邮件可能存在风险] Re: [RFC PATCH] KVM: arm/arm64: vgic: change condition for level interrupt resampling

2018-03-08 Thread Yang, Shunyong
Hi, Eric,

First, please let me change Christoffer's email to cd...@kernel.org. I
add more information about my test below, please check.

On Thu, 2018-03-08 at 09:57 +0100, Auger Eric wrote:
> Hi,
> 
> On 08/03/18 08:01, Shunyong Yang wrote:
> > 
> > When resampling irqfds is enabled, level interrupt should be
> > de-asserted when resampling happens. On page 4-47 of GIC v3
> > specification IHI0069D, it said,
> > "When the PE acknowledges an SGI, a PPI, or an SPI at the CPU
> > interface, the IRI changes the status of the interrupt to active
> > and pending if:
> > • It is an edge-triggered interrupt, and another edge has been
> > detected since the interrupt was acknowledged.
> > • It is a level-sensitive interrupt, and the level has not been
> > deasserted since the interrupt was acknowledged."
> > 
> > GIC v2 specification IHI0048B.b has similar description on page
> > 3-42 for state machine transition.
> > 
> > When some VFIO device, like mtty(8250 VFIO mdev emulation driver
> > in samples/vfio-mdev) triggers a level interrupt, the status
> > transition in LR is pending-->active-->active and pending.
> > Then it will wait resampling to de-assert the interrupt.
> > 
> > Current design of lr_signals_eoi_mi() will return false if state
> > in LR is not invalid(Inactive). It causes resampling will not
> > happen
> > in mtty case.
> > 
> > This will cause interrupt fired continuously to guest even 8250 IIR
> > has no interrupt. When 8250's interrupt is configured in shared
> > mode,
> > it will pass interrupt to other drivers to handle. However, there
> > is no other driver involved. Then, a "nobody cared" kernel
> > complaint
> > occurs.
> > 
> > / # cat /dev/ttyS0
> > [4.826836] random: crng init done
> > [6.373620] irq 41: nobody cared (try booting with the "irqpoll"
> > option)
> > [6.376414] CPU: 0 PID: 1307 Comm: cat Not tainted 4.16.0-rc4 #4
> > [6.378927] Hardware name: linux,dummy-virt (DT)
> > [6.380876] Call trace:
> > [6.381937]  dump_backtrace+0x0/0x180
> > [6.383495]  show_stack+0x14/0x1c
> > [6.384902]  dump_stack+0x90/0xb4
> > [6.386312]  __report_bad_irq+0x38/0xe0
> > [6.387944]  note_interrupt+0x1f4/0x2b8
> > [6.389568]  handle_irq_event_percpu+0x54/0x7c
> > [6.391433]  handle_irq_event+0x44/0x74
> > [6.393056]  handle_fasteoi_irq+0x9c/0x154
> > [6.394784]  generic_handle_irq+0x24/0x38
> > [6.396483]  __handle_domain_irq+0x60/0xb4
> > [6.398207]  gic_handle_irq+0x98/0x1b0
> > [6.399796]  el1_irq+0xb0/0x128
> > [6.401138]  _raw_spin_unlock_irqrestore+0x18/0x40
> > [6.403149]  __setup_irq+0x41c/0x678
> > [6.404669]  request_threaded_irq+0xe0/0x190
> > [6.406474]  univ8250_setup_irq+0x208/0x234
> > [6.408250]  serial8250_do_startup+0x1b4/0x754
> > [6.410123]  serial8250_startup+0x20/0x28
> > [6.411826]  uart_startup.part.21+0x78/0x144
> > [6.413633]  uart_port_activate+0x50/0x68
> > [6.415328]  tty_port_open+0x84/0xd4
> > [6.416851]  uart_open+0x34/0x44
> > [6.418229]  tty_open+0xec/0x3c8
> > [6.419610]  chrdev_open+0xb0/0x198
> > [6.421093]  do_dentry_open+0x200/0x310
> > [6.422714]  vfs_open+0x54/0x84
> > [6.424054]  path_openat+0x2dc/0xf04
> > [6.425569]  do_filp_open+0x68/0xd8
> > [6.427044]  do_sys_open+0x16c/0x224
> > [6.428563]  SyS_openat+0x10/0x18
> > [6.429972]  el0_svc_naked+0x30/0x34
> > [6.431494] handlers:
> > [6.432479] [<0e9fb4bb>] serial8250_interrupt
> > [6.434597] Disabling IRQ #41
> > 
> > This patch changes the lr state condition in lr_signals_eoi_mi()
> > from
> > invalid(Inactive) to active and pending to avoid this.
> > 
> > I am not sure about the original design of the condition of
> > invalid(active). So, This RFC is sent out for comments.
> > 
> > Cc: Joey Zheng 
> > Signed-off-by: Shunyong Yang 
> > ---
> >  virt/kvm/arm/vgic/vgic-v2.c | 4 ++--
> >  virt/kvm/arm/vgic/vgic-v3.c | 4 ++--
> >  2 files changed, 4 insertions(+), 4 deletions(-)
> > 
> > diff --git a/virt/kvm/arm/vgic/vgic-v2.c b/virt/kvm/arm/vgic/vgic-
> > v2.c
> > index e9d840a75e7b..740ee9a5f551 100644
> > --- a/virt/kvm/arm/vgic/vgic-v2.c
> > +++ b/virt/kvm/arm/vgic/vgic-v2.c
> > @@ -46,8 +46,8 @@ void vgic_v2_set_underflow(struct kvm_vcpu *vcpu)
> >  
> >  static bool lr_signals_eoi_mi(u32 lr_val)
> >  {
> > -   return !(lr_val & GICH_LR_STATE) && (lr_val & GICH_LR_EOI)
> > &&
> > -      !(lr_val & GICH_LR_HW);
> > +   return !((lr_val & GICH_LR_STATE) ^ GICH_LR_STATE) &&
> > +      (lr_val & GICH_LR_EOI) && !(lr_val & GICH_LR_HW);
> >  }
> >  
> >  /*
> > diff --git a/virt/kvm/arm/vgic/vgic-v3.c b/virt/kvm/arm/vgic/vgic-
> > v3.c
> > index 6b329414e57a..43111bba7af9 100644
> > --- a/virt/kvm/arm/vgic/vgic-v3.c
> > +++ b/virt/kvm/arm/vgic/vgic-v3.c
> > @@ -35,8 +35,8 @@ void vgic_v3_set_underflow(struct kvm_vcpu *vcpu)
> >  
> >  static bool lr_signals_eoi_mi(u64 lr_val)

[RFC PATCH] KVM: arm/arm64: vgic: change condition for level interrupt resampling

2018-03-08 Thread Shunyong Yang
When resampling irqfds is enabled, level interrupt should be
de-asserted when resampling happens. On page 4-47 of GIC v3
specification IHI0069D, it said,
"When the PE acknowledges an SGI, a PPI, or an SPI at the CPU
interface, the IRI changes the status of the interrupt to active
and pending if:
• It is an edge-triggered interrupt, and another edge has been
detected since the interrupt was acknowledged.
• It is a level-sensitive interrupt, and the level has not been
deasserted since the interrupt was acknowledged."

GIC v2 specification IHI0048B.b has similar description on page
3-42 for state machine transition.

When some VFIO device, like mtty(8250 VFIO mdev emulation driver
in samples/vfio-mdev) triggers a level interrupt, the status
transition in LR is pending-->active-->active and pending.
Then it will wait resampling to de-assert the interrupt.

Current design of lr_signals_eoi_mi() will return false if state
in LR is not invalid(Inactive). It causes resampling will not happen
in mtty case.

This will cause interrupt fired continuously to guest even 8250 IIR
has no interrupt. When 8250's interrupt is configured in shared mode,
it will pass interrupt to other drivers to handle. However, there
is no other driver involved. Then, a "nobody cared" kernel complaint
occurs.

/ # cat /dev/ttyS0
[4.826836] random: crng init done
[6.373620] irq 41: nobody cared (try booting with the "irqpoll"
option)
[6.376414] CPU: 0 PID: 1307 Comm: cat Not tainted 4.16.0-rc4 #4
[6.378927] Hardware name: linux,dummy-virt (DT)
[6.380876] Call trace:
[6.381937]  dump_backtrace+0x0/0x180
[6.383495]  show_stack+0x14/0x1c
[6.384902]  dump_stack+0x90/0xb4
[6.386312]  __report_bad_irq+0x38/0xe0
[6.387944]  note_interrupt+0x1f4/0x2b8
[6.389568]  handle_irq_event_percpu+0x54/0x7c
[6.391433]  handle_irq_event+0x44/0x74
[6.393056]  handle_fasteoi_irq+0x9c/0x154
[6.394784]  generic_handle_irq+0x24/0x38
[6.396483]  __handle_domain_irq+0x60/0xb4
[6.398207]  gic_handle_irq+0x98/0x1b0
[6.399796]  el1_irq+0xb0/0x128
[6.401138]  _raw_spin_unlock_irqrestore+0x18/0x40
[6.403149]  __setup_irq+0x41c/0x678
[6.404669]  request_threaded_irq+0xe0/0x190
[6.406474]  univ8250_setup_irq+0x208/0x234
[6.408250]  serial8250_do_startup+0x1b4/0x754
[6.410123]  serial8250_startup+0x20/0x28
[6.411826]  uart_startup.part.21+0x78/0x144
[6.413633]  uart_port_activate+0x50/0x68
[6.415328]  tty_port_open+0x84/0xd4
[6.416851]  uart_open+0x34/0x44
[6.418229]  tty_open+0xec/0x3c8
[6.419610]  chrdev_open+0xb0/0x198
[6.421093]  do_dentry_open+0x200/0x310
[6.422714]  vfs_open+0x54/0x84
[6.424054]  path_openat+0x2dc/0xf04
[6.425569]  do_filp_open+0x68/0xd8
[6.427044]  do_sys_open+0x16c/0x224
[6.428563]  SyS_openat+0x10/0x18
[6.429972]  el0_svc_naked+0x30/0x34
[6.431494] handlers:
[6.432479] [<0e9fb4bb>] serial8250_interrupt
[6.434597] Disabling IRQ #41

This patch changes the lr state condition in lr_signals_eoi_mi() from
invalid(Inactive) to active and pending to avoid this.

I am not sure about the original design of the condition of
invalid(active). So, This RFC is sent out for comments.

Cc: Joey Zheng 
Signed-off-by: Shunyong Yang 
---
 virt/kvm/arm/vgic/vgic-v2.c | 4 ++--
 virt/kvm/arm/vgic/vgic-v3.c | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/virt/kvm/arm/vgic/vgic-v2.c b/virt/kvm/arm/vgic/vgic-v2.c
index e9d840a75e7b..740ee9a5f551 100644
--- a/virt/kvm/arm/vgic/vgic-v2.c
+++ b/virt/kvm/arm/vgic/vgic-v2.c
@@ -46,8 +46,8 @@ void vgic_v2_set_underflow(struct kvm_vcpu *vcpu)
 
 static bool lr_signals_eoi_mi(u32 lr_val)
 {
-   return !(lr_val & GICH_LR_STATE) && (lr_val & GICH_LR_EOI) &&
-  !(lr_val & GICH_LR_HW);
+   return !((lr_val & GICH_LR_STATE) ^ GICH_LR_STATE) &&
+  (lr_val & GICH_LR_EOI) && !(lr_val & GICH_LR_HW);
 }
 
 /*
diff --git a/virt/kvm/arm/vgic/vgic-v3.c b/virt/kvm/arm/vgic/vgic-v3.c
index 6b329414e57a..43111bba7af9 100644
--- a/virt/kvm/arm/vgic/vgic-v3.c
+++ b/virt/kvm/arm/vgic/vgic-v3.c
@@ -35,8 +35,8 @@ void vgic_v3_set_underflow(struct kvm_vcpu *vcpu)
 
 static bool lr_signals_eoi_mi(u64 lr_val)
 {
-   return !(lr_val & ICH_LR_STATE) && (lr_val & ICH_LR_EOI) &&
-  !(lr_val & ICH_LR_HW);
+   return !((lr_val & ICH_LR_STATE) ^ ICH_LR_STATE) &&
+  (lr_val & ICH_LR_EOI) && !(lr_val & ICH_LR_HW);
 }
 
 void vgic_v3_fold_lr_state(struct kvm_vcpu *vcpu)
-- 
1.8.3.1

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [此邮件可能存在风险] Re: [RFC PATCH] KVM: arm/arm64: vgic: change condition for level interrupt resampling

2018-03-08 Thread Auger Eric
Hi Shunyong,
On 08/03/18 10:31, Yang, Shunyong wrote:
> Hi, Eric,
> 
> First, please let me change Christoffer's email to cd...@kernel.org. I
> add more information about my test below, please check.
> 
> On Thu, 2018-03-08 at 09:57 +0100, Auger Eric wrote:
>> Hi,
>>
>> On 08/03/18 08:01, Shunyong Yang wrote:
>>>
>>> When resampling irqfds is enabled, level interrupt should be
>>> de-asserted when resampling happens. On page 4-47 of GIC v3
>>> specification IHI0069D, it said,
>>> "When the PE acknowledges an SGI, a PPI, or an SPI at the CPU
>>> interface, the IRI changes the status of the interrupt to active
>>> and pending if:
>>> • It is an edge-triggered interrupt, and another edge has been
>>> detected since the interrupt was acknowledged.
>>> • It is a level-sensitive interrupt, and the level has not been
>>> deasserted since the interrupt was acknowledged."
>>>
>>> GIC v2 specification IHI0048B.b has similar description on page
>>> 3-42 for state machine transition.
>>>
>>> When some VFIO device, like mtty(8250 VFIO mdev emulation driver
>>> in samples/vfio-mdev) triggers a level interrupt, the status
>>> transition in LR is pending-->active-->active and pending.
>>> Then it will wait resampling to de-assert the interrupt.
>>>
>>> Current design of lr_signals_eoi_mi() will return false if state
>>> in LR is not invalid(Inactive). It causes resampling will not
>>> happen
>>> in mtty case.
>>>
>>> This will cause interrupt fired continuously to guest even 8250 IIR
>>> has no interrupt. When 8250's interrupt is configured in shared
>>> mode,
>>> it will pass interrupt to other drivers to handle. However, there
>>> is no other driver involved. Then, a "nobody cared" kernel
>>> complaint
>>> occurs.
>>>
>>> / # cat /dev/ttyS0
>>> [4.826836] random: crng init done
>>> [6.373620] irq 41: nobody cared (try booting with the "irqpoll"
>>> option)
>>> [6.376414] CPU: 0 PID: 1307 Comm: cat Not tainted 4.16.0-rc4 #4
>>> [6.378927] Hardware name: linux,dummy-virt (DT)
>>> [6.380876] Call trace:
>>> [6.381937]  dump_backtrace+0x0/0x180
>>> [6.383495]  show_stack+0x14/0x1c
>>> [6.384902]  dump_stack+0x90/0xb4
>>> [6.386312]  __report_bad_irq+0x38/0xe0
>>> [6.387944]  note_interrupt+0x1f4/0x2b8
>>> [6.389568]  handle_irq_event_percpu+0x54/0x7c
>>> [6.391433]  handle_irq_event+0x44/0x74
>>> [6.393056]  handle_fasteoi_irq+0x9c/0x154
>>> [6.394784]  generic_handle_irq+0x24/0x38
>>> [6.396483]  __handle_domain_irq+0x60/0xb4
>>> [6.398207]  gic_handle_irq+0x98/0x1b0
>>> [6.399796]  el1_irq+0xb0/0x128
>>> [6.401138]  _raw_spin_unlock_irqrestore+0x18/0x40
>>> [6.403149]  __setup_irq+0x41c/0x678
>>> [6.404669]  request_threaded_irq+0xe0/0x190
>>> [6.406474]  univ8250_setup_irq+0x208/0x234
>>> [6.408250]  serial8250_do_startup+0x1b4/0x754
>>> [6.410123]  serial8250_startup+0x20/0x28
>>> [6.411826]  uart_startup.part.21+0x78/0x144
>>> [6.413633]  uart_port_activate+0x50/0x68
>>> [6.415328]  tty_port_open+0x84/0xd4
>>> [6.416851]  uart_open+0x34/0x44
>>> [6.418229]  tty_open+0xec/0x3c8
>>> [6.419610]  chrdev_open+0xb0/0x198
>>> [6.421093]  do_dentry_open+0x200/0x310
>>> [6.422714]  vfs_open+0x54/0x84
>>> [6.424054]  path_openat+0x2dc/0xf04
>>> [6.425569]  do_filp_open+0x68/0xd8
>>> [6.427044]  do_sys_open+0x16c/0x224
>>> [6.428563]  SyS_openat+0x10/0x18
>>> [6.429972]  el0_svc_naked+0x30/0x34
>>> [6.431494] handlers:
>>> [6.432479] [<0e9fb4bb>] serial8250_interrupt
>>> [6.434597] Disabling IRQ #41
>>>
>>> This patch changes the lr state condition in lr_signals_eoi_mi()
>>> from
>>> invalid(Inactive) to active and pending to avoid this.
>>>
>>> I am not sure about the original design of the condition of
>>> invalid(active). So, This RFC is sent out for comments.
>>>
>>> Cc: Joey Zheng 
>>> Signed-off-by: Shunyong Yang 
>>> ---
>>>  virt/kvm/arm/vgic/vgic-v2.c | 4 ++--
>>>  virt/kvm/arm/vgic/vgic-v3.c | 4 ++--
>>>  2 files changed, 4 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/virt/kvm/arm/vgic/vgic-v2.c b/virt/kvm/arm/vgic/vgic-
>>> v2.c
>>> index e9d840a75e7b..740ee9a5f551 100644
>>> --- a/virt/kvm/arm/vgic/vgic-v2.c
>>> +++ b/virt/kvm/arm/vgic/vgic-v2.c
>>> @@ -46,8 +46,8 @@ void vgic_v2_set_underflow(struct kvm_vcpu *vcpu)
>>>  
>>>  static bool lr_signals_eoi_mi(u32 lr_val)
>>>  {
>>> -   return !(lr_val & GICH_LR_STATE) && (lr_val & GICH_LR_EOI)
>>> &&
>>> -  !(lr_val & GICH_LR_HW);
>>> +   return !((lr_val & GICH_LR_STATE) ^ GICH_LR_STATE) &&
>>> +  (lr_val & GICH_LR_EOI) && !(lr_val & GICH_LR_HW);
>>>  }
>>>  
>>>  /*
>>> diff --git a/virt/kvm/arm/vgic/vgic-v3.c b/virt/kvm/arm/vgic/vgic-
>>> v3.c
>>> index 6b329414e57a..43111bba7af9 100644
>>> --- a/virt/kvm/arm/vgic/vgic-v3.c
>>> +++ b/virt/kvm/arm/vgic/vgic-v3.c
>>> @@ -35,8 +35,8 @@ void vgic_v3_set_underflow(struct kvm_vcpu 

Re: [PATCH v5 19/23] arm64: KVM: Allow far branches from vector slots to the main vectors

2018-03-08 Thread Catalin Marinas
On Thu, Mar 01, 2018 at 03:55:34PM +, Marc Zyngier wrote:
> So far, the branch from the vector slots to the main vectors can at
> most be 4GB from the main vectors (the reach of ADRP), and this
> distance is known at compile time. If we were to remap the slots
> to an unrelated VA, things would break badly.
> 
> A way to achieve VA independence would be to load the absolute
> address of the vectors (__kvm_hyp_vector), either using a constant
> pool or a series of movs, followed by an indirect branch.
> 
> This patches implements the latter solution, using another instance
> of a patching callback.
> 
> Signed-off-by: Marc Zyngier 

Acked-by: Catalin Marinas 
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v5 18/23] arm64: KVM: Add epilogue branching to the vector code

2018-03-08 Thread Catalin Marinas
On Thu, Mar 01, 2018 at 03:55:33PM +, Marc Zyngier wrote:
> We are soon going to have to do some extra work in the BP hardening
> vector slots. Instead of doing that work in the vectors themselves
> (which would massively reduce the space available to deal with
> Spectre v2), let's branch to an epilogue where we can do "stuff".
> 
> This has a number of consequences:
> - We need some free registers, so we're spilling x0 and x1 on the
>   stack
> - In order to counterbalance this, we branch to the *second* instruction
>   in the vectors, avoiding the initial store that is already there
>   (or loading the registers back if we've branched to a panic vector)
> 
> This is all controlled by a new capability (ARM64_HARDEN_EL2_VECTORS)
> which doesn't get enabled yet.
> 
> Signed-off-by: Marc Zyngier 

That's mostly kvm but anyway:

Reviewed-by: Catalin Marinas 
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 1/2] KVM: arm/arm64: vgic: Don't populate multiple LRs with the same vintid

2018-03-08 Thread Marc Zyngier
On 08/03/18 10:19, Marc Zyngier wrote:
> On 07/03/18 23:34, Christoffer Dall wrote:
>> On Wed, Mar 7, 2018 at 12:40 PM, Marc Zyngier  wrote:
>>> The vgic code is trying to be clever when injecting GICv2 SGIs,
>>> and will happily populate LRs with the same interrupt number if
>>> they come from multiple vcpus (after all, they are distinct
>>> interrupt sources).
>>>
>>> Unfortunately, this is against the letter of the architecture,
>>> and the GICv2 architecture spec says "Each valid interrupt stored
>>> in the List registers must have a unique VirtualID for that
>>> virtual CPU interface.". GICv3 has similar (although slightly
>>> ambiguous) restrictions.
>>>
>>> This results in guests locking up when using GICv2-on-GICv3, for
>>> example. The obvious fix is to stop trying so hard, and inject
>>> a single vcpu per SGI per guest entry. After all, pending SGIs
>>> with multiple source vcpus are pretty rare, and are mostly seen
>>> in scenario where the physical CPUs are severely overcomitted.
>>>
>>> Cc: sta...@vger.kernel.org
>>> Fixes: 0919e84c0fc1 ("KVM: arm/arm64: vgic-new: Add IRQ sync/flush 
>>> framework")
>>> Signed-off-by: Marc Zyngier 
>>> ---
>>>  virt/kvm/arm/vgic/vgic.c | 11 +--
>>>  1 file changed, 1 insertion(+), 10 deletions(-)
>>>
>>> diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c
>>> index c7c5ef190afa..1f7ff175f47b 100644
>>> --- a/virt/kvm/arm/vgic/vgic.c
>>> +++ b/virt/kvm/arm/vgic/vgic.c
>>> @@ -720,18 +720,9 @@ static void vgic_flush_lr_state(struct kvm_vcpu *vcpu)
>>> list_for_each_entry(irq, _cpu->ap_list_head, ap_list) {
>>> spin_lock(>irq_lock);
>>>
>>> -   if (unlikely(vgic_target_oracle(irq) != vcpu))
>>> -   goto next;
>>> -
>>> -   /*
>>> -* If we get an SGI with multiple sources, try to get
>>> -* them in all at once.
>>> -*/
>>> -   do {
>>> +   if (likely(vgic_target_oracle(irq) == vcpu))
>>> vgic_populate_lr(vcpu, irq, count++);
>>
>> I think we need to change vgic_populate_lr to set the EOI maintenance
>> interrupt flag so that when the interrupt is deactivated, if there are
>> additional pending sources, we exit the guest and pick up the
>> interrupt.
> 
> Potentially. We need to be careful about about the semantics of EOI MI
> with non-level interrupts (see the other thread about EOI signalling).
> 
>> An alternative would be to set the underflow interrupt, but I don't
>> think that would be correct for multiple priorities, because the SGI
>> could have a higher priority than other pending interrupts we put in
>> the LR.
> 
> I don't think priorities are the issue (after all, we already sort the
> irqs in order of priority). My worry is that underflow is allowed to
> fire if there is one interrupt pending, which implies that you could
> end-up in a livelock scenario if you only have one SGI pending with
> multiple sources.
> 
> Another possibility would be to use ICH_HCR_EL2.NPIE (GICH_HCR.NPIE on
> GICv2), which delivers a a MI if no pending interrupts are present. Once
> the SGI has been activated, we're guaranteed to be able to inject a new
> pending one.
> 
> I like the latter, because it doesn't overload the rest of the code with
> new semantics. Thoughts?
To illustrate what I mean, here's a quickly hacked patch that implements
this idea:

diff --git a/include/linux/irqchip/arm-gic-v3.h 
b/include/linux/irqchip/arm-gic-v3.h
index c00c4c33e432..b26eccc78fb1 100644
--- a/include/linux/irqchip/arm-gic-v3.h
+++ b/include/linux/irqchip/arm-gic-v3.h
@@ -503,6 +503,7 @@
 
 #define ICH_HCR_EN (1 << 0)
 #define ICH_HCR_UIE(1 << 1)
+#define ICH_HCR_NPIE   (1 << 3)
 #define ICH_HCR_TC (1 << 10)
 #define ICH_HCR_TALL0  (1 << 11)
 #define ICH_HCR_TALL1  (1 << 12)
diff --git a/include/linux/irqchip/arm-gic.h b/include/linux/irqchip/arm-gic.h
index d3453ee072fc..68d8b1f73682 100644
--- a/include/linux/irqchip/arm-gic.h
+++ b/include/linux/irqchip/arm-gic.h
@@ -84,6 +84,7 @@
 
 #define GICH_HCR_EN(1 << 0)
 #define GICH_HCR_UIE   (1 << 1)
+#define GICH_HCR_NPIE  (1 << 3)
 
 #define GICH_LR_VIRTUALID  (0x3ff << 0)
 #define GICH_LR_PHYSID_CPUID_SHIFT (10)
diff --git a/virt/kvm/arm/vgic/vgic-v2.c b/virt/kvm/arm/vgic/vgic-v2.c
index 0be616e4ee29..9a22b74f1550 100644
--- a/virt/kvm/arm/vgic/vgic-v2.c
+++ b/virt/kvm/arm/vgic/vgic-v2.c
@@ -37,6 +37,13 @@ void vgic_v2_init_lrs(void)
vgic_v2_write_lr(i, 0);
 }
 
+void vgic_v2_set_npie(struct kvm_vcpu *vcpu)
+{
+   struct vgic_v2_cpu_if *cpuif = >arch.vgic_cpu.vgic_v2;
+
+   cpuif->vgic_hcr |= GICH_HCR_NPIE;
+}
+
 void vgic_v2_set_underflow(struct kvm_vcpu *vcpu)
 {
struct vgic_v2_cpu_if *cpuif = 

Re: [RFC PATCH] KVM: arm/arm64: vgic: change condition for level interrupt resampling

2018-03-08 Thread Marc Zyngier
On 08/03/18 09:49, Marc Zyngier wrote:
> [updated Christoffer's email address]
> 
> Hi Shunyong,
> 
> On 08/03/18 07:01, Shunyong Yang wrote:
>> When resampling irqfds is enabled, level interrupt should be
>> de-asserted when resampling happens. On page 4-47 of GIC v3
>> specification IHI0069D, it said,
>> "When the PE acknowledges an SGI, a PPI, or an SPI at the CPU
>> interface, the IRI changes the status of the interrupt to active
>> and pending if:
>> • It is an edge-triggered interrupt, and another edge has been
>> detected since the interrupt was acknowledged.
>> • It is a level-sensitive interrupt, and the level has not been
>> deasserted since the interrupt was acknowledged."
>>
>> GIC v2 specification IHI0048B.b has similar description on page
>> 3-42 for state machine transition.
>>
>> When some VFIO device, like mtty(8250 VFIO mdev emulation driver
>> in samples/vfio-mdev) triggers a level interrupt, the status
>> transition in LR is pending-->active-->active and pending.
>> Then it will wait resampling to de-assert the interrupt.
>>
>> Current design of lr_signals_eoi_mi() will return false if state
>> in LR is not invalid(Inactive). It causes resampling will not happen
>> in mtty case.
> 
> Let me rephrase this, and tell me if I understood it correctly:
> 
> - A level interrupt is injected, activated by the guest (LR state=active)
> - guest exits, re-enters, (LR state=pending+active)
> - guest EOIs the interrupt (LR state=pending)
> - maintenance interrupt
> - we don't signal the resampling because we're not in an invalid state
> 
> Is that correct?
> 
> That's an interesting case, because it seems to invalidate some of the 
> optimization that went in over a year ago.
> 
> 096f31c4360f KVM: arm/arm64: vgic: Get rid of MISR and EISR fields
> b6095b084d87 KVM: arm/arm64: vgic: Get rid of unnecessary save_maint_int_state
> af0614991ab6 KVM: arm/arm64: vgic: Get rid of unnecessary process_maintenance 
> operation
> 
> We could compare the value of the LR before the guest entry with
> the value at exit time, but we still could miss it if we have a
> transition such as P+A -> P -> A and assume a long enough propagation
> delay for the maintenance interrupt (which is very likely).
> 
> In essence, we have lost the benefit of EISR, which was to give us a
> way to deal with asynchronous signalling.
> 
>>
>> This will cause interrupt fired continuously to guest even 8250 IIR
>> has no interrupt. When 8250's interrupt is configured in shared mode,
>> it will pass interrupt to other drivers to handle. However, there
>> is no other driver involved. Then, a "nobody cared" kernel complaint
>> occurs.
>>
>> / # cat /dev/ttyS0
>> [4.826836] random: crng init done
>> [6.373620] irq 41: nobody cared (try booting with the "irqpoll"
>> option)
>> [6.376414] CPU: 0 PID: 1307 Comm: cat Not tainted 4.16.0-rc4 #4
>> [6.378927] Hardware name: linux,dummy-virt (DT)
>> [6.380876] Call trace:
>> [6.381937]  dump_backtrace+0x0/0x180
>> [6.383495]  show_stack+0x14/0x1c
>> [6.384902]  dump_stack+0x90/0xb4
>> [6.386312]  __report_bad_irq+0x38/0xe0
>> [6.387944]  note_interrupt+0x1f4/0x2b8
>> [6.389568]  handle_irq_event_percpu+0x54/0x7c
>> [6.391433]  handle_irq_event+0x44/0x74
>> [6.393056]  handle_fasteoi_irq+0x9c/0x154
>> [6.394784]  generic_handle_irq+0x24/0x38
>> [6.396483]  __handle_domain_irq+0x60/0xb4
>> [6.398207]  gic_handle_irq+0x98/0x1b0
>> [6.399796]  el1_irq+0xb0/0x128
>> [6.401138]  _raw_spin_unlock_irqrestore+0x18/0x40
>> [6.403149]  __setup_irq+0x41c/0x678
>> [6.404669]  request_threaded_irq+0xe0/0x190
>> [6.406474]  univ8250_setup_irq+0x208/0x234
>> [6.408250]  serial8250_do_startup+0x1b4/0x754
>> [6.410123]  serial8250_startup+0x20/0x28
>> [6.411826]  uart_startup.part.21+0x78/0x144
>> [6.413633]  uart_port_activate+0x50/0x68
>> [6.415328]  tty_port_open+0x84/0xd4
>> [6.416851]  uart_open+0x34/0x44
>> [6.418229]  tty_open+0xec/0x3c8
>> [6.419610]  chrdev_open+0xb0/0x198
>> [6.421093]  do_dentry_open+0x200/0x310
>> [6.422714]  vfs_open+0x54/0x84
>> [6.424054]  path_openat+0x2dc/0xf04
>> [6.425569]  do_filp_open+0x68/0xd8
>> [6.427044]  do_sys_open+0x16c/0x224
>> [6.428563]  SyS_openat+0x10/0x18
>> [6.429972]  el0_svc_naked+0x30/0x34
>> [6.431494] handlers:
>> [6.432479] [<0e9fb4bb>] serial8250_interrupt
>> [6.434597] Disabling IRQ #41
>>
>> This patch changes the lr state condition in lr_signals_eoi_mi() from
>> invalid(Inactive) to active and pending to avoid this.
>>
>> I am not sure about the original design of the condition of
>> invalid(active). So, This RFC is sent out for comments.
>>
>> Cc: Joey Zheng 
>> Signed-off-by: Shunyong Yang 
>> ---
>>  virt/kvm/arm/vgic/vgic-v2.c | 4 ++--
>>  virt/kvm/arm/vgic/vgic-v3.c | 4 ++--
>>  2 files changed, 4 insertions(+), 4 deletions(-)
>>
>> 

Re: [此邮件可能存在风险] Re: [RFC PATCH] KVM: arm/arm64: vgic: change condition for level interrupt resampling

2018-03-08 Thread Marc Zyngier
On 08/03/18 09:31, Yang, Shunyong wrote:
> Hi, Eric,
> 
> First, please let me change Christoffer's email to cd...@kernel.org. I
> add more information about my test below, please check.
> 
> On Thu, 2018-03-08 at 09:57 +0100, Auger Eric wrote:
>> Hi,
>>
>> On 08/03/18 08:01, Shunyong Yang wrote:
>>>
>>> When resampling irqfds is enabled, level interrupt should be
>>> de-asserted when resampling happens. On page 4-47 of GIC v3
>>> specification IHI0069D, it said,
>>> "When the PE acknowledges an SGI, a PPI, or an SPI at the CPU
>>> interface, the IRI changes the status of the interrupt to active
>>> and pending if:
>>> • It is an edge-triggered interrupt, and another edge has been
>>> detected since the interrupt was acknowledged.
>>> • It is a level-sensitive interrupt, and the level has not been
>>> deasserted since the interrupt was acknowledged."
>>>
>>> GIC v2 specification IHI0048B.b has similar description on page
>>> 3-42 for state machine transition.
>>>
>>> When some VFIO device, like mtty(8250 VFIO mdev emulation driver
>>> in samples/vfio-mdev) triggers a level interrupt, the status
>>> transition in LR is pending-->active-->active and pending.
>>> Then it will wait resampling to de-assert the interrupt.
>>>
>>> Current design of lr_signals_eoi_mi() will return false if state
>>> in LR is not invalid(Inactive). It causes resampling will not
>>> happen
>>> in mtty case.
>>>
>>> This will cause interrupt fired continuously to guest even 8250 IIR
>>> has no interrupt. When 8250's interrupt is configured in shared
>>> mode,
>>> it will pass interrupt to other drivers to handle. However, there
>>> is no other driver involved. Then, a "nobody cared" kernel
>>> complaint
>>> occurs.
>>>
>>> / # cat /dev/ttyS0
>>> [4.826836] random: crng init done
>>> [6.373620] irq 41: nobody cared (try booting with the "irqpoll"
>>> option)
>>> [6.376414] CPU: 0 PID: 1307 Comm: cat Not tainted 4.16.0-rc4 #4
>>> [6.378927] Hardware name: linux,dummy-virt (DT)
>>> [6.380876] Call trace:
>>> [6.381937]  dump_backtrace+0x0/0x180
>>> [6.383495]  show_stack+0x14/0x1c
>>> [6.384902]  dump_stack+0x90/0xb4
>>> [6.386312]  __report_bad_irq+0x38/0xe0
>>> [6.387944]  note_interrupt+0x1f4/0x2b8
>>> [6.389568]  handle_irq_event_percpu+0x54/0x7c
>>> [6.391433]  handle_irq_event+0x44/0x74
>>> [6.393056]  handle_fasteoi_irq+0x9c/0x154
>>> [6.394784]  generic_handle_irq+0x24/0x38
>>> [6.396483]  __handle_domain_irq+0x60/0xb4
>>> [6.398207]  gic_handle_irq+0x98/0x1b0
>>> [6.399796]  el1_irq+0xb0/0x128
>>> [6.401138]  _raw_spin_unlock_irqrestore+0x18/0x40
>>> [6.403149]  __setup_irq+0x41c/0x678
>>> [6.404669]  request_threaded_irq+0xe0/0x190
>>> [6.406474]  univ8250_setup_irq+0x208/0x234
>>> [6.408250]  serial8250_do_startup+0x1b4/0x754
>>> [6.410123]  serial8250_startup+0x20/0x28
>>> [6.411826]  uart_startup.part.21+0x78/0x144
>>> [6.413633]  uart_port_activate+0x50/0x68
>>> [6.415328]  tty_port_open+0x84/0xd4
>>> [6.416851]  uart_open+0x34/0x44
>>> [6.418229]  tty_open+0xec/0x3c8
>>> [6.419610]  chrdev_open+0xb0/0x198
>>> [6.421093]  do_dentry_open+0x200/0x310
>>> [6.422714]  vfs_open+0x54/0x84
>>> [6.424054]  path_openat+0x2dc/0xf04
>>> [6.425569]  do_filp_open+0x68/0xd8
>>> [6.427044]  do_sys_open+0x16c/0x224
>>> [6.428563]  SyS_openat+0x10/0x18
>>> [6.429972]  el0_svc_naked+0x30/0x34
>>> [6.431494] handlers:
>>> [6.432479] [<0e9fb4bb>] serial8250_interrupt
>>> [6.434597] Disabling IRQ #41
>>>
>>> This patch changes the lr state condition in lr_signals_eoi_mi()
>>> from
>>> invalid(Inactive) to active and pending to avoid this.
>>>
>>> I am not sure about the original design of the condition of
>>> invalid(active). So, This RFC is sent out for comments.
>>>
>>> Cc: Joey Zheng 
>>> Signed-off-by: Shunyong Yang 
>>> ---
>>>  virt/kvm/arm/vgic/vgic-v2.c | 4 ++--
>>>  virt/kvm/arm/vgic/vgic-v3.c | 4 ++--
>>>  2 files changed, 4 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/virt/kvm/arm/vgic/vgic-v2.c b/virt/kvm/arm/vgic/vgic-
>>> v2.c
>>> index e9d840a75e7b..740ee9a5f551 100644
>>> --- a/virt/kvm/arm/vgic/vgic-v2.c
>>> +++ b/virt/kvm/arm/vgic/vgic-v2.c
>>> @@ -46,8 +46,8 @@ void vgic_v2_set_underflow(struct kvm_vcpu *vcpu)
>>>  
>>>  static bool lr_signals_eoi_mi(u32 lr_val)
>>>  {
>>> -   return !(lr_val & GICH_LR_STATE) && (lr_val & GICH_LR_EOI)
>>> &&
>>> -      !(lr_val & GICH_LR_HW);
>>> +   return !((lr_val & GICH_LR_STATE) ^ GICH_LR_STATE) &&
>>> +      (lr_val & GICH_LR_EOI) && !(lr_val & GICH_LR_HW);
>>>  }
>>>  
>>>  /*
>>> diff --git a/virt/kvm/arm/vgic/vgic-v3.c b/virt/kvm/arm/vgic/vgic-
>>> v3.c
>>> index 6b329414e57a..43111bba7af9 100644
>>> --- a/virt/kvm/arm/vgic/vgic-v3.c
>>> +++ b/virt/kvm/arm/vgic/vgic-v3.c
>>> @@ -35,8 +35,8 @@ void vgic_v3_set_underflow(struct kvm_vcpu *vcpu)
>>>  

Re: [PATCH 02/11] ACPI / APEI: Generalise the estatus queue's add/remove and notify code

2018-03-08 Thread Borislav Petkov
On Wed, Mar 07, 2018 at 06:15:02PM +, James Morse wrote:
> Today its just x86 and arm64. arm64 doesn't have a hook to do this. I'm happy 
> to
> add an empty declaration or leave it under an ifdef until someone complains
> about any behaviour I missed!

So I did some more staring at the code and I think oops_begin() is
needed mainly, as you point out, to prevent two oops messages from
interleaving. And yap, the other stuff with printk() is not true anymore
because the commit which added oops_begin():

  81e88fdc432a ("ACPI, APEI, Generic Hardware Error Source POLL/IRQ/NMI 
notification type support")

still saw an NMI-unsafe printk. Which is long taken care of now.

So only the interleaving issue remains.

Which begs the question: how are you guys preventing the interleaving on
arm64? Because arch/arm64/kernel/traps.c:200 grabs the die_lock too, so
interleaving can happen on arm64 too, AFAICT.

And by that logic, you should technically grab that lock here too in
_in_nmi_notify_one().

Or?

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 1/2] KVM: arm/arm64: vgic: Don't populate multiple LRs with the same vintid

2018-03-08 Thread Marc Zyngier
On 07/03/18 23:34, Christoffer Dall wrote:
> On Wed, Mar 7, 2018 at 12:40 PM, Marc Zyngier  wrote:
>> The vgic code is trying to be clever when injecting GICv2 SGIs,
>> and will happily populate LRs with the same interrupt number if
>> they come from multiple vcpus (after all, they are distinct
>> interrupt sources).
>>
>> Unfortunately, this is against the letter of the architecture,
>> and the GICv2 architecture spec says "Each valid interrupt stored
>> in the List registers must have a unique VirtualID for that
>> virtual CPU interface.". GICv3 has similar (although slightly
>> ambiguous) restrictions.
>>
>> This results in guests locking up when using GICv2-on-GICv3, for
>> example. The obvious fix is to stop trying so hard, and inject
>> a single vcpu per SGI per guest entry. After all, pending SGIs
>> with multiple source vcpus are pretty rare, and are mostly seen
>> in scenario where the physical CPUs are severely overcomitted.
>>
>> Cc: sta...@vger.kernel.org
>> Fixes: 0919e84c0fc1 ("KVM: arm/arm64: vgic-new: Add IRQ sync/flush 
>> framework")
>> Signed-off-by: Marc Zyngier 
>> ---
>>  virt/kvm/arm/vgic/vgic.c | 11 +--
>>  1 file changed, 1 insertion(+), 10 deletions(-)
>>
>> diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c
>> index c7c5ef190afa..1f7ff175f47b 100644
>> --- a/virt/kvm/arm/vgic/vgic.c
>> +++ b/virt/kvm/arm/vgic/vgic.c
>> @@ -720,18 +720,9 @@ static void vgic_flush_lr_state(struct kvm_vcpu *vcpu)
>> list_for_each_entry(irq, _cpu->ap_list_head, ap_list) {
>> spin_lock(>irq_lock);
>>
>> -   if (unlikely(vgic_target_oracle(irq) != vcpu))
>> -   goto next;
>> -
>> -   /*
>> -* If we get an SGI with multiple sources, try to get
>> -* them in all at once.
>> -*/
>> -   do {
>> +   if (likely(vgic_target_oracle(irq) == vcpu))
>> vgic_populate_lr(vcpu, irq, count++);
> 
> I think we need to change vgic_populate_lr to set the EOI maintenance
> interrupt flag so that when the interrupt is deactivated, if there are
> additional pending sources, we exit the guest and pick up the
> interrupt.

Potentially. We need to be careful about about the semantics of EOI MI
with non-level interrupts (see the other thread about EOI signalling).

> An alternative would be to set the underflow interrupt, but I don't
> think that would be correct for multiple priorities, because the SGI
> could have a higher priority than other pending interrupts we put in
> the LR.

I don't think priorities are the issue (after all, we already sort the
irqs in order of priority). My worry is that underflow is allowed to
fire if there is one interrupt pending, which implies that you could
end-up in a livelock scenario if you only have one SGI pending with
multiple sources.

Another possibility would be to use ICH_HCR_EL2.NPIE (GICH_HCR.NPIE on
GICv2), which delivers a a MI if no pending interrupts are present. Once
the SGI has been activated, we're guaranteed to be able to inject a new
pending one.

I like the latter, because it doesn't overload the rest of the code with
new semantics. Thoughts?

M.
-- 
Jazz is not dead. It just smells funny...
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [RFC PATCH] KVM: arm/arm64: vgic: change condition for level interrupt resampling

2018-03-08 Thread Marc Zyngier
[updated Christoffer's email address]

Hi Shunyong,

On 08/03/18 07:01, Shunyong Yang wrote:
> When resampling irqfds is enabled, level interrupt should be
> de-asserted when resampling happens. On page 4-47 of GIC v3
> specification IHI0069D, it said,
> "When the PE acknowledges an SGI, a PPI, or an SPI at the CPU
> interface, the IRI changes the status of the interrupt to active
> and pending if:
> • It is an edge-triggered interrupt, and another edge has been
> detected since the interrupt was acknowledged.
> • It is a level-sensitive interrupt, and the level has not been
> deasserted since the interrupt was acknowledged."
> 
> GIC v2 specification IHI0048B.b has similar description on page
> 3-42 for state machine transition.
> 
> When some VFIO device, like mtty(8250 VFIO mdev emulation driver
> in samples/vfio-mdev) triggers a level interrupt, the status
> transition in LR is pending-->active-->active and pending.
> Then it will wait resampling to de-assert the interrupt.
> 
> Current design of lr_signals_eoi_mi() will return false if state
> in LR is not invalid(Inactive). It causes resampling will not happen
> in mtty case.

Let me rephrase this, and tell me if I understood it correctly:

- A level interrupt is injected, activated by the guest (LR state=active)
- guest exits, re-enters, (LR state=pending+active)
- guest EOIs the interrupt (LR state=pending)
- maintenance interrupt
- we don't signal the resampling because we're not in an invalid state

Is that correct?

That's an interesting case, because it seems to invalidate some of the 
optimization that went in over a year ago.

096f31c4360f KVM: arm/arm64: vgic: Get rid of MISR and EISR fields
b6095b084d87 KVM: arm/arm64: vgic: Get rid of unnecessary save_maint_int_state
af0614991ab6 KVM: arm/arm64: vgic: Get rid of unnecessary process_maintenance 
operation

We could compare the value of the LR before the guest entry with
the value at exit time, but we still could miss it if we have a
transition such as P+A -> P -> A and assume a long enough propagation
delay for the maintenance interrupt (which is very likely).

In essence, we have lost the benefit of EISR, which was to give us a
way to deal with asynchronous signalling.

> 
> This will cause interrupt fired continuously to guest even 8250 IIR
> has no interrupt. When 8250's interrupt is configured in shared mode,
> it will pass interrupt to other drivers to handle. However, there
> is no other driver involved. Then, a "nobody cared" kernel complaint
> occurs.
> 
> / # cat /dev/ttyS0
> [4.826836] random: crng init done
> [6.373620] irq 41: nobody cared (try booting with the "irqpoll"
> option)
> [6.376414] CPU: 0 PID: 1307 Comm: cat Not tainted 4.16.0-rc4 #4
> [6.378927] Hardware name: linux,dummy-virt (DT)
> [6.380876] Call trace:
> [6.381937]  dump_backtrace+0x0/0x180
> [6.383495]  show_stack+0x14/0x1c
> [6.384902]  dump_stack+0x90/0xb4
> [6.386312]  __report_bad_irq+0x38/0xe0
> [6.387944]  note_interrupt+0x1f4/0x2b8
> [6.389568]  handle_irq_event_percpu+0x54/0x7c
> [6.391433]  handle_irq_event+0x44/0x74
> [6.393056]  handle_fasteoi_irq+0x9c/0x154
> [6.394784]  generic_handle_irq+0x24/0x38
> [6.396483]  __handle_domain_irq+0x60/0xb4
> [6.398207]  gic_handle_irq+0x98/0x1b0
> [6.399796]  el1_irq+0xb0/0x128
> [6.401138]  _raw_spin_unlock_irqrestore+0x18/0x40
> [6.403149]  __setup_irq+0x41c/0x678
> [6.404669]  request_threaded_irq+0xe0/0x190
> [6.406474]  univ8250_setup_irq+0x208/0x234
> [6.408250]  serial8250_do_startup+0x1b4/0x754
> [6.410123]  serial8250_startup+0x20/0x28
> [6.411826]  uart_startup.part.21+0x78/0x144
> [6.413633]  uart_port_activate+0x50/0x68
> [6.415328]  tty_port_open+0x84/0xd4
> [6.416851]  uart_open+0x34/0x44
> [6.418229]  tty_open+0xec/0x3c8
> [6.419610]  chrdev_open+0xb0/0x198
> [6.421093]  do_dentry_open+0x200/0x310
> [6.422714]  vfs_open+0x54/0x84
> [6.424054]  path_openat+0x2dc/0xf04
> [6.425569]  do_filp_open+0x68/0xd8
> [6.427044]  do_sys_open+0x16c/0x224
> [6.428563]  SyS_openat+0x10/0x18
> [6.429972]  el0_svc_naked+0x30/0x34
> [6.431494] handlers:
> [6.432479] [<0e9fb4bb>] serial8250_interrupt
> [6.434597] Disabling IRQ #41
> 
> This patch changes the lr state condition in lr_signals_eoi_mi() from
> invalid(Inactive) to active and pending to avoid this.
> 
> I am not sure about the original design of the condition of
> invalid(active). So, This RFC is sent out for comments.
> 
> Cc: Joey Zheng 
> Signed-off-by: Shunyong Yang 
> ---
>  virt/kvm/arm/vgic/vgic-v2.c | 4 ++--
>  virt/kvm/arm/vgic/vgic-v3.c | 4 ++--
>  2 files changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/virt/kvm/arm/vgic/vgic-v2.c b/virt/kvm/arm/vgic/vgic-v2.c
> index e9d840a75e7b..740ee9a5f551 100644
> --- a/virt/kvm/arm/vgic/vgic-v2.c
> +++ 

Re: [RFC PATCH] KVM: arm/arm64: vgic: change condition for level interrupt resampling

2018-03-08 Thread Auger Eric
Hi,

On 08/03/18 08:01, Shunyong Yang wrote:
> When resampling irqfds is enabled, level interrupt should be
> de-asserted when resampling happens. On page 4-47 of GIC v3
> specification IHI0069D, it said,
> "When the PE acknowledges an SGI, a PPI, or an SPI at the CPU
> interface, the IRI changes the status of the interrupt to active
> and pending if:
> • It is an edge-triggered interrupt, and another edge has been
> detected since the interrupt was acknowledged.
> • It is a level-sensitive interrupt, and the level has not been
> deasserted since the interrupt was acknowledged."
> 
> GIC v2 specification IHI0048B.b has similar description on page
> 3-42 for state machine transition.
> 
> When some VFIO device, like mtty(8250 VFIO mdev emulation driver
> in samples/vfio-mdev) triggers a level interrupt, the status
> transition in LR is pending-->active-->active and pending.
> Then it will wait resampling to de-assert the interrupt.
> 
> Current design of lr_signals_eoi_mi() will return false if state
> in LR is not invalid(Inactive). It causes resampling will not happen
> in mtty case.
> 
> This will cause interrupt fired continuously to guest even 8250 IIR
> has no interrupt. When 8250's interrupt is configured in shared mode,
> it will pass interrupt to other drivers to handle. However, there
> is no other driver involved. Then, a "nobody cared" kernel complaint
> occurs.
> 
> / # cat /dev/ttyS0
> [4.826836] random: crng init done
> [6.373620] irq 41: nobody cared (try booting with the "irqpoll"
> option)
> [6.376414] CPU: 0 PID: 1307 Comm: cat Not tainted 4.16.0-rc4 #4
> [6.378927] Hardware name: linux,dummy-virt (DT)
> [6.380876] Call trace:
> [6.381937]  dump_backtrace+0x0/0x180
> [6.383495]  show_stack+0x14/0x1c
> [6.384902]  dump_stack+0x90/0xb4
> [6.386312]  __report_bad_irq+0x38/0xe0
> [6.387944]  note_interrupt+0x1f4/0x2b8
> [6.389568]  handle_irq_event_percpu+0x54/0x7c
> [6.391433]  handle_irq_event+0x44/0x74
> [6.393056]  handle_fasteoi_irq+0x9c/0x154
> [6.394784]  generic_handle_irq+0x24/0x38
> [6.396483]  __handle_domain_irq+0x60/0xb4
> [6.398207]  gic_handle_irq+0x98/0x1b0
> [6.399796]  el1_irq+0xb0/0x128
> [6.401138]  _raw_spin_unlock_irqrestore+0x18/0x40
> [6.403149]  __setup_irq+0x41c/0x678
> [6.404669]  request_threaded_irq+0xe0/0x190
> [6.406474]  univ8250_setup_irq+0x208/0x234
> [6.408250]  serial8250_do_startup+0x1b4/0x754
> [6.410123]  serial8250_startup+0x20/0x28
> [6.411826]  uart_startup.part.21+0x78/0x144
> [6.413633]  uart_port_activate+0x50/0x68
> [6.415328]  tty_port_open+0x84/0xd4
> [6.416851]  uart_open+0x34/0x44
> [6.418229]  tty_open+0xec/0x3c8
> [6.419610]  chrdev_open+0xb0/0x198
> [6.421093]  do_dentry_open+0x200/0x310
> [6.422714]  vfs_open+0x54/0x84
> [6.424054]  path_openat+0x2dc/0xf04
> [6.425569]  do_filp_open+0x68/0xd8
> [6.427044]  do_sys_open+0x16c/0x224
> [6.428563]  SyS_openat+0x10/0x18
> [6.429972]  el0_svc_naked+0x30/0x34
> [6.431494] handlers:
> [6.432479] [<0e9fb4bb>] serial8250_interrupt
> [6.434597] Disabling IRQ #41
> 
> This patch changes the lr state condition in lr_signals_eoi_mi() from
> invalid(Inactive) to active and pending to avoid this.
> 
> I am not sure about the original design of the condition of
> invalid(active). So, This RFC is sent out for comments.
> 
> Cc: Joey Zheng 
> Signed-off-by: Shunyong Yang 
> ---
>  virt/kvm/arm/vgic/vgic-v2.c | 4 ++--
>  virt/kvm/arm/vgic/vgic-v3.c | 4 ++--
>  2 files changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/virt/kvm/arm/vgic/vgic-v2.c b/virt/kvm/arm/vgic/vgic-v2.c
> index e9d840a75e7b..740ee9a5f551 100644
> --- a/virt/kvm/arm/vgic/vgic-v2.c
> +++ b/virt/kvm/arm/vgic/vgic-v2.c
> @@ -46,8 +46,8 @@ void vgic_v2_set_underflow(struct kvm_vcpu *vcpu)
>  
>  static bool lr_signals_eoi_mi(u32 lr_val)
>  {
> - return !(lr_val & GICH_LR_STATE) && (lr_val & GICH_LR_EOI) &&
> -!(lr_val & GICH_LR_HW);
> + return !((lr_val & GICH_LR_STATE) ^ GICH_LR_STATE) &&
> +(lr_val & GICH_LR_EOI) && !(lr_val & GICH_LR_HW);
>  }
>  
>  /*
> diff --git a/virt/kvm/arm/vgic/vgic-v3.c b/virt/kvm/arm/vgic/vgic-v3.c
> index 6b329414e57a..43111bba7af9 100644
> --- a/virt/kvm/arm/vgic/vgic-v3.c
> +++ b/virt/kvm/arm/vgic/vgic-v3.c
> @@ -35,8 +35,8 @@ void vgic_v3_set_underflow(struct kvm_vcpu *vcpu)
>  
>  static bool lr_signals_eoi_mi(u64 lr_val)
>  {
> - return !(lr_val & ICH_LR_STATE) && (lr_val & ICH_LR_EOI) &&
> -!(lr_val & ICH_LR_HW);
> + return !((lr_val & ICH_LR_STATE) ^ ICH_LR_STATE) &&
> +(lr_val & ICH_LR_EOI) && !(lr_val & ICH_LR_HW);


In general don't we have this state transition

inactive -> pending -> pending + active (1) -> active -> inactive.

In that case won't we lower the virt irq level when folding the LR on
Pending +