Re: KVM Arm64 and Linux-RT issues

2019-08-13 Thread Julien Grall

Hi Sebastian,

On 8/13/19 1:58 PM, bige...@linutronix.de wrote:

On 2019-07-27 14:37:11 [+0100], Julien Grall wrote:

8<
--- a/virt/kvm/arm/arch_timer.c
+++ b/virt/kvm/arm/arch_timer.c
@@ -80,7 +80,7 @@ static inline bool userspace_irqchip(str
   static void soft_timer_start(struct hrtimer *hrt, u64 ns)
   {
hrtimer_start(hrt, ktime_add_ns(ktime_get(), ns),
- HRTIMER_MODE_ABS);
+ HRTIMER_MODE_ABS_HARD);
   }


That's pretty neat, and matches the patch you already have for
x86. Feel free to add my

Acked-by: Marc Zyngier 


I can confirm the warning now disappeared. Feel free to added my tested-by:

Tested-by: Julien Grall 



|kvm_hrtimer_expire()
| kvm_timer_update_irq()
|   kvm_vgic_inject_irq()
| vgic_lazy_init()
|if (unlikely(!vgic_initialized(kvm))) {
| if (kvm->arch.vgic.vgic_model != KVM_DEV_TYPE_ARM_VGIC_V2)
| return -EBUSY;
|
| mutex_lock(>lock);

Is this possible path of any concern? This should throw a warning also
for !RT so probably not…


Hmmm, theoretically yes. In practice, it looks like the hrtimer will not 
be started before kvm_vcpu_first_run_init() is called on the first run.


The function will call kvm_vgic_map_resources() which will initialize 
the vgic if not already done.


Looking around, I think this is here to cater the case where 
KVM_IRQ_LINE is called before running.


I am not yet familiar with the vgic, so I may have missed something.



I prepared the patch below. This one could go straight to tglx's timer tree
since he has the _HARD bits there. I *think* it requires to set the bits
_HARD during _init() and _start() otherwise there is (or was) a warning…

Sebastian
8<

From: Thomas Gleixner 
Date: Tue, 13 Aug 2019 14:29:41 +0200
Subject: [PATCH] KVM: arm/arm64: Let the timer expire in hardirq context on RT

The timers are canceled from an preempt-notifier which is invoked with
disabled preemption which is not allowed on PREEMPT_RT.
The timer callback is short so in could be invoked in hard-IRQ context
on -RT.

Let the timer expire on hard-IRQ context even on -RT.

Signed-off-by: Thomas Gleixner 
Acked-by: Marc Zyngier 
Tested-by: Julien Grall 
Signed-off-by: Sebastian Andrzej Siewior 
---
  virt/kvm/arm/arch_timer.c | 8 
  1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
index 1be486d5d7cb4..0bfa7c5b5c890 100644
--- a/virt/kvm/arm/arch_timer.c
+++ b/virt/kvm/arm/arch_timer.c
@@ -80,7 +80,7 @@ static inline bool userspace_irqchip(struct kvm *kvm)
  static void soft_timer_start(struct hrtimer *hrt, u64 ns)
  {
hrtimer_start(hrt, ktime_add_ns(ktime_get(), ns),
- HRTIMER_MODE_ABS);
+ HRTIMER_MODE_ABS_HARD);
  }
  
  static void soft_timer_cancel(struct hrtimer *hrt)

@@ -697,11 +697,11 @@ void kvm_timer_vcpu_init(struct kvm_vcpu *vcpu)
update_vtimer_cntvoff(vcpu, kvm_phys_timer_read());
ptimer->cntvoff = 0;
  
-	hrtimer_init(>bg_timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS);

+   hrtimer_init(>bg_timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS_HARD);
timer->bg_timer.function = kvm_bg_timer_expire;
  
-	hrtimer_init(>hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS);

-   hrtimer_init(>hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS);
+   hrtimer_init(>hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS_HARD);
+   hrtimer_init(>hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS_HARD);
vtimer->hrtimer.function = kvm_hrtimer_expire;
ptimer->hrtimer.function = kvm_hrtimer_expire;
  



--
Julien Grall
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: KVM Arm64 and Linux-RT issues

2019-08-13 Thread Marc Zyngier
On Tue, 13 Aug 2019 16:44:21 +0100,
Julien Grall  wrote:
> 
> Hi Sebastian,
> 
> On 8/13/19 1:58 PM, bige...@linutronix.de wrote:
> > On 2019-07-27 14:37:11 [+0100], Julien Grall wrote:
>  8<
>  --- a/virt/kvm/arm/arch_timer.c
>  +++ b/virt/kvm/arm/arch_timer.c
>  @@ -80,7 +80,7 @@ static inline bool userspace_irqchip(str
> static void soft_timer_start(struct hrtimer *hrt, u64 ns)
> {
>   hrtimer_start(hrt, ktime_add_ns(ktime_get(), ns),
>  -  HRTIMER_MODE_ABS);
>  +  HRTIMER_MODE_ABS_HARD);
> }
> >>> 
> >>> That's pretty neat, and matches the patch you already have for
> >>> x86. Feel free to add my
> >>> 
> >>> Acked-by: Marc Zyngier 
> >> 
> >> I can confirm the warning now disappeared. Feel free to added my tested-by:
> >> 
> >> Tested-by: Julien Grall 
> >> 
> > 
> > |kvm_hrtimer_expire()
> > | kvm_timer_update_irq()
> > |   kvm_vgic_inject_irq()
> > | vgic_lazy_init()
> > |if (unlikely(!vgic_initialized(kvm))) {
> > | if (kvm->arch.vgic.vgic_model != KVM_DEV_TYPE_ARM_VGIC_V2)
> > | return -EBUSY;
> > |
> > | mutex_lock(>lock);
> > 
> > Is this possible path of any concern? This should throw a warning also
> > for !RT so probably not…
> 
> Hmmm, theoretically yes. In practice, it looks like the hrtimer will
> not be started before kvm_vcpu_first_run_init() is called on the first
> run.

Exactly. Even if you restore the timer in a "firing" configuration,
you'll have to execute the vgic init before any background timer gets
programmed, let alone expired.

Yes, the interface is terrible.

> The function will call kvm_vgic_map_resources() which will initialize
> the vgic if not already done.
> 
> Looking around, I think this is here to cater the case where
> KVM_IRQ_LINE is called before running.
> 
> I am not yet familiar with the vgic, so I may have missed something.
> 
> > 
> > I prepared the patch below. This one could go straight to tglx's timer tree
> > since he has the _HARD bits there. I *think* it requires to set the bits
> > _HARD during _init() and _start() otherwise there is (or was) a warning…
> > 
> > Sebastian
> > 8<
> > 
> > From: Thomas Gleixner 
> > Date: Tue, 13 Aug 2019 14:29:41 +0200
> > Subject: [PATCH] KVM: arm/arm64: Let the timer expire in hardirq context on 
> > RT
> > 
> > The timers are canceled from an preempt-notifier which is invoked with
> > disabled preemption which is not allowed on PREEMPT_RT.
> > The timer callback is short so in could be invoked in hard-IRQ context
> > on -RT.
> > 
> > Let the timer expire on hard-IRQ context even on -RT.
> > 
> > Signed-off-by: Thomas Gleixner 
> > Acked-by: Marc Zyngier 
> > Tested-by: Julien Grall 
> > Signed-off-by: Sebastian Andrzej Siewior 
> > ---
> >   virt/kvm/arm/arch_timer.c | 8 
> >   1 file changed, 4 insertions(+), 4 deletions(-)
> > 
> > diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
> > index 1be486d5d7cb4..0bfa7c5b5c890 100644
> > --- a/virt/kvm/arm/arch_timer.c
> > +++ b/virt/kvm/arm/arch_timer.c
> > @@ -80,7 +80,7 @@ static inline bool userspace_irqchip(struct kvm *kvm)
> >   static void soft_timer_start(struct hrtimer *hrt, u64 ns)
> >   {
> > hrtimer_start(hrt, ktime_add_ns(ktime_get(), ns),
> > - HRTIMER_MODE_ABS);
> > + HRTIMER_MODE_ABS_HARD);
> >   }
> > static void soft_timer_cancel(struct hrtimer *hrt)
> > @@ -697,11 +697,11 @@ void kvm_timer_vcpu_init(struct kvm_vcpu *vcpu)
> > update_vtimer_cntvoff(vcpu, kvm_phys_timer_read());
> > ptimer->cntvoff = 0;
> >   - hrtimer_init(>bg_timer, CLOCK_MONOTONIC,
> > HRTIMER_MODE_ABS);
> > +   hrtimer_init(>bg_timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS_HARD);
> > timer->bg_timer.function = kvm_bg_timer_expire;
> >   - hrtimer_init(>hrtimer, CLOCK_MONOTONIC,
> > HRTIMER_MODE_ABS);
> > -   hrtimer_init(>hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS);
> > +   hrtimer_init(>hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS_HARD);
> > +   hrtimer_init(>hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS_HARD);
> > vtimer->hrtimer.function = kvm_hrtimer_expire;
> > ptimer->hrtimer.function = kvm_hrtimer_expire;
> >   

Patch looks fine, please add it to the pile of RT stuff! ;-)

Thanks,

M.

-- 
Jazz is not dead, it just smells funny.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: Can we boot a 512U kvm guest?

2019-08-13 Thread Auger Eric
Hi,

On 8/13/19 4:17 PM, Marc Zyngier wrote:
> On Tue, 13 Aug 2019 09:50:27 +0100,
> Zenghui Yu  wrote:
> 
> Hi Zenghui,
> 
>>
>> Hi folks,
>>
>> Since commit e25028c8ded0 ("KVM: arm/arm64: Bump VGIC_V3_MAX_CPUS to
>> 512"), we seemed to be allowed to boot a 512U guest.  But I failed to
>> start it up with the latest QEMU.  I guess there are at least *two*
>> reasons (limitations).
>>
>> First I got a QEMU abort:
>>  "kvm_set_irq: Invalid argument"
>>
>> Enable the trace_kvm_irq_line() under debugfs, when it comed with
>> vcpu-256, I got:
>>  "Inject UNKNOWN interrupt (3), vcpu->idx: 0, num: 23, level: 0"
>> and kvm_vm_ioctl_irq_line() returns -EINVAL to user-space...
>>
>> So the thing is that we only have 8 bits for vcpu_index field ([23:16])
>> in KVM_IRQ_LINE ioctl.  irq_type field will be corrupted if we inject a
>> PPI to vcpu-256, whose vcpu_index will take 9 bits.
> 
> Irk. Not great indeed. Clearly, we have a couple of holes in the way
> we test these ABI changes (/me eyes Eric...).

My bad. At that time I was able to test with 224 vcpus with QEMU (256
vcpus were tested as well) and I failed to see those other limitations,
thinking the only limitation left was the number of redistributor
regions we could register.
> 
>>
>> I temporarily patched the KVM and QEMU with the following diff:
>>
>> ---8<---
>> diff --git a/arch/arm64/include/uapi/asm/kvm.h
>> b/arch/arm64/include/uapi/asm/kvm.h
>> index 95516a4..39a0fb1 100644
>> --- a/arch/arm64/include/uapi/asm/kvm.h
>> +++ b/arch/arm64/include/uapi/asm/kvm.h
>> @@ -325,10 +325,10 @@ struct kvm_vcpu_events {
>>  #define   KVM_ARM_VCPU_TIMER_IRQ_PTIMER 1
>>
>>  /* KVM_IRQ_LINE irq field index values */
>> -#define KVM_ARM_IRQ_TYPE_SHIFT  24
>> -#define KVM_ARM_IRQ_TYPE_MASK   0xff
>> +#define KVM_ARM_IRQ_TYPE_SHIFT  28
>> +#define KVM_ARM_IRQ_TYPE_MASK   0xf
>>  #define KVM_ARM_IRQ_VCPU_SHIFT  16
>> -#define KVM_ARM_IRQ_VCPU_MASK   0xff
>> +#define KVM_ARM_IRQ_VCPU_MASK   0xfff
>>  #define KVM_ARM_IRQ_NUM_SHIFT   0
>>  #define KVM_ARM_IRQ_NUM_MASK0x
>>
>> ---8<---
>>
>> It makes things a bit better, it also immediately BREAKs the api with
>> old versions.
> 
> Yes, and we can't have that (specially if you consider that this API
> is shared between 32 and 64bit). One "get out of jail card" is to
> steal a few bits from the top of the word, and encode things there:
> 
> diff --git a/arch/arm/include/uapi/asm/kvm.h b/arch/arm/include/uapi/asm/kvm.h
> index 4602464ebdfb..86db092e4c2f 100644
> --- a/arch/arm/include/uapi/asm/kvm.h
> +++ b/arch/arm/include/uapi/asm/kvm.h
> @@ -254,8 +254,10 @@ struct kvm_vcpu_events {
>  #define   KVM_DEV_ARM_ITS_CTRL_RESET 4
>  
>  /* KVM_IRQ_LINE irq field index values */
> +#define KVM_ARM_IRQ_VCPU2_SHIFT  28
> +#define KVM_ARM_IRQ_VCPU2_MASK   0xf
>  #define KVM_ARM_IRQ_TYPE_SHIFT   24
> -#define KVM_ARM_IRQ_TYPE_MASK0xff
> +#define KVM_ARM_IRQ_TYPE_MASK0xf
>  #define KVM_ARM_IRQ_VCPU_SHIFT   16
>  #define KVM_ARM_IRQ_VCPU_MASK0xff
>  #define KVM_ARM_IRQ_NUM_SHIFT0
> diff --git a/arch/arm64/include/uapi/asm/kvm.h 
> b/arch/arm64/include/uapi/asm/kvm.h
> index 7b7ac0f6cec9..44cb25bfc95e 100644
> --- a/arch/arm64/include/uapi/asm/kvm.h
> +++ b/arch/arm64/include/uapi/asm/kvm.h
> @@ -308,8 +308,10 @@ struct kvm_vcpu_events {
>  #define   KVM_ARM_VCPU_TIMER_IRQ_PTIMER  1
>  
>  /* KVM_IRQ_LINE irq field index values */
> +#define KVM_ARM_IRQ_VCPU2_SHIFT  28
> +#define KVM_ARM_IRQ_VCPU2_MASK   0xf
>  #define KVM_ARM_IRQ_TYPE_SHIFT   24
> -#define KVM_ARM_IRQ_TYPE_MASK0xff
> +#define KVM_ARM_IRQ_TYPE_MASK0xf
>  #define KVM_ARM_IRQ_VCPU_SHIFT   16
>  #define KVM_ARM_IRQ_VCPU_MASK0xff
>  #define KVM_ARM_IRQ_NUM_SHIFT0
> diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
> index 90cedebaeb94..fb685c1c0514 100644
> --- a/virt/kvm/arm/arm.c
> +++ b/virt/kvm/arm/arm.c
> @@ -889,6 +889,7 @@ int kvm_vm_ioctl_irq_line(struct kvm *kvm, struct 
> kvm_irq_level *irq_level,
>  
>   irq_type = (irq >> KVM_ARM_IRQ_TYPE_SHIFT) & KVM_ARM_IRQ_TYPE_MASK;
>   vcpu_idx = (irq >> KVM_ARM_IRQ_VCPU_SHIFT) & KVM_ARM_IRQ_VCPU_MASK;
> + vcpu_idx += ((irq >> KVM_ARM_IRQ_VCPU2_SHIFT) & KVM_ARM_IRQ_VCPU2_MASK) 
> * (KVM_ARM_IRQ_VCPU_MASK + 1);
>   irq_num = (irq >> KVM_ARM_IRQ_NUM_SHIFT) & KVM_ARM_IRQ_NUM_MASK;
>  
>   trace_kvm_irq_line(irq_type, vcpu_idx, irq_num, irq_level->level);
> 
> It should work because we've been careful not to allow value outside
> of {0, 1, 2} for irq_type. I don't like it, but I really don't feel
> like adding another IRQ related ioctl. We still have 16 irq types
> (which is already a waste of space), and we can go up to 4096 vcpu.


[PATCH AUTOSEL 5.2 121/123] KVM: arm64: Don't write junk to sysregs on reset

2019-08-13 Thread Sasha Levin
From: Marc Zyngier 

[ Upstream commit 03fdfb2690099c19160a3f2c5b77db60b3afeded ]

At the moment, the way we reset system registers is mildly insane:
We write junk to them, call the reset functions, and then check that
we have something else in them.

The "fun" thing is that this can happen while the guest is running
(PSCI, for example). If anything in KVM has to evaluate the state
of a system register while junk is in there, bad thing may happen.

Let's stop doing that. Instead, we track that we have called a
reset function for that register, and assume that the reset
function has done something. This requires fixing a couple of
sysreg refinition in the trap table.

In the end, the very need of this reset check is pretty dubious,
as it doesn't check everything (a lot of the sysregs leave outside of
the sys_regs[] array). It may well be axed in the near future.

Tested-by: Zenghui Yu 
Signed-off-by: Marc Zyngier 
Signed-off-by: Sasha Levin 
---
 arch/arm64/kvm/sys_regs.c | 32 ++--
 1 file changed, 18 insertions(+), 14 deletions(-)

diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index ce933f2960496..5b7085ca213df 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -632,7 +632,7 @@ static void reset_pmcr(struct kvm_vcpu *vcpu, const struct 
sys_reg_desc *r)
 */
val = ((pmcr & ~ARMV8_PMU_PMCR_MASK)
   | (ARMV8_PMU_PMCR_MASK & 0xdecafbad)) & (~ARMV8_PMU_PMCR_E);
-   __vcpu_sys_reg(vcpu, PMCR_EL0) = val;
+   __vcpu_sys_reg(vcpu, r->reg) = val;
 }
 
 static bool check_pmu_access_disabled(struct kvm_vcpu *vcpu, u64 flags)
@@ -981,13 +981,13 @@ static bool access_pmuserenr(struct kvm_vcpu *vcpu, 
struct sys_reg_params *p,
 /* Silly macro to expand the DBG{BCR,BVR,WVR,WCR}n_EL1 registers in one go */
 #define DBG_BCR_BVR_WCR_WVR_EL1(n) \
{ SYS_DESC(SYS_DBGBVRn_EL1(n)), \
- trap_bvr, reset_bvr, n, 0, get_bvr, set_bvr },\
+ trap_bvr, reset_bvr, 0, 0, get_bvr, set_bvr },\
{ SYS_DESC(SYS_DBGBCRn_EL1(n)), \
- trap_bcr, reset_bcr, n, 0, get_bcr, set_bcr },\
+ trap_bcr, reset_bcr, 0, 0, get_bcr, set_bcr },\
{ SYS_DESC(SYS_DBGWVRn_EL1(n)), \
- trap_wvr, reset_wvr, n, 0,  get_wvr, set_wvr },   \
+ trap_wvr, reset_wvr, 0, 0,  get_wvr, set_wvr },   \
{ SYS_DESC(SYS_DBGWCRn_EL1(n)), \
- trap_wcr, reset_wcr, n, 0,  get_wcr, set_wcr }
+ trap_wcr, reset_wcr, 0, 0,  get_wcr, set_wcr }
 
 /* Macro to expand the PMEVCNTRn_EL0 register */
 #define PMU_PMEVCNTR_EL0(n)\
@@ -1540,7 +1540,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
{ SYS_DESC(SYS_CSSELR_EL1), access_csselr, reset_unknown, CSSELR_EL1 },
{ SYS_DESC(SYS_CTR_EL0), access_ctr },
 
-   { SYS_DESC(SYS_PMCR_EL0), access_pmcr, reset_pmcr, },
+   { SYS_DESC(SYS_PMCR_EL0), access_pmcr, reset_pmcr, PMCR_EL0 },
{ SYS_DESC(SYS_PMCNTENSET_EL0), access_pmcnten, reset_unknown, 
PMCNTENSET_EL0 },
{ SYS_DESC(SYS_PMCNTENCLR_EL0), access_pmcnten, NULL, PMCNTENSET_EL0 },
{ SYS_DESC(SYS_PMOVSCLR_EL0), access_pmovs, NULL, PMOVSSET_EL0 },
@@ -2254,13 +2254,19 @@ static int emulate_sys_reg(struct kvm_vcpu *vcpu,
 }
 
 static void reset_sys_reg_descs(struct kvm_vcpu *vcpu,
- const struct sys_reg_desc *table, size_t num)
+   const struct sys_reg_desc *table, size_t num,
+   unsigned long *bmap)
 {
unsigned long i;
 
for (i = 0; i < num; i++)
-   if (table[i].reset)
+   if (table[i].reset) {
+   int reg = table[i].reg;
+
table[i].reset(vcpu, [i]);
+   if (reg > 0 && reg < NR_SYS_REGS)
+   set_bit(reg, bmap);
+   }
 }
 
 /**
@@ -2774,18 +2780,16 @@ void kvm_reset_sys_regs(struct kvm_vcpu *vcpu)
 {
size_t num;
const struct sys_reg_desc *table;
-
-   /* Catch someone adding a register without putting in reset entry. */
-   memset(>arch.ctxt.sys_regs, 0x42, 
sizeof(vcpu->arch.ctxt.sys_regs));
+   DECLARE_BITMAP(bmap, NR_SYS_REGS) = { 0, };
 
/* Generic chip reset first (so target could override). */
-   reset_sys_reg_descs(vcpu, sys_reg_descs, ARRAY_SIZE(sys_reg_descs));
+   reset_sys_reg_descs(vcpu, sys_reg_descs, ARRAY_SIZE(sys_reg_descs), 
bmap);
 
table = get_target_table(vcpu->arch.target, true, );
-   reset_sys_reg_descs(vcpu, table, num);
+   reset_sys_reg_descs(vcpu, table, num, bmap);
 
for (num = 1; num < NR_SYS_REGS; num++) {
-   if 

[PATCH AUTOSEL 5.2 122/123] KVM: arm: Don't write junk to CP15 registers on reset

2019-08-13 Thread Sasha Levin
From: Marc Zyngier 

[ Upstream commit c69509c70aa45a8c4954c88c629a64acf4ee4a36 ]

At the moment, the way we reset CP15 registers is mildly insane:
We write junk to them, call the reset functions, and then check that
we have something else in them.

The "fun" thing is that this can happen while the guest is running
(PSCI, for example). If anything in KVM has to evaluate the state
of a CP15 register while junk is in there, bad thing may happen.

Let's stop doing that. Instead, we track that we have called a
reset function for that register, and assume that the reset
function has done something.

In the end, the very need of this reset check is pretty dubious,
as it doesn't check everything (a lot of the CP15 reg leave outside
of the cp15_regs[] array). It may well be axed in the near future.

Signed-off-by: Marc Zyngier 
Signed-off-by: Sasha Levin 
---
 arch/arm/kvm/coproc.c | 23 +++
 1 file changed, 15 insertions(+), 8 deletions(-)

diff --git a/arch/arm/kvm/coproc.c b/arch/arm/kvm/coproc.c
index d2806bcff8bbb..07745ee022a12 100644
--- a/arch/arm/kvm/coproc.c
+++ b/arch/arm/kvm/coproc.c
@@ -651,13 +651,22 @@ int kvm_handle_cp14_64(struct kvm_vcpu *vcpu, struct 
kvm_run *run)
 }
 
 static void reset_coproc_regs(struct kvm_vcpu *vcpu,
- const struct coproc_reg *table, size_t num)
+ const struct coproc_reg *table, size_t num,
+ unsigned long *bmap)
 {
unsigned long i;
 
for (i = 0; i < num; i++)
-   if (table[i].reset)
+   if (table[i].reset) {
+   int reg = table[i].reg;
+
table[i].reset(vcpu, [i]);
+   if (reg > 0 && reg < NR_CP15_REGS) {
+   set_bit(reg, bmap);
+   if (table[i].is_64bit)
+   set_bit(reg + 1, bmap);
+   }
+   }
 }
 
 static struct coproc_params decode_32bit_hsr(struct kvm_vcpu *vcpu)
@@ -1432,17 +1441,15 @@ void kvm_reset_coprocs(struct kvm_vcpu *vcpu)
 {
size_t num;
const struct coproc_reg *table;
-
-   /* Catch someone adding a register without putting in reset entry. */
-   memset(vcpu->arch.ctxt.cp15, 0x42, sizeof(vcpu->arch.ctxt.cp15));
+   DECLARE_BITMAP(bmap, NR_CP15_REGS) = { 0, };
 
/* Generic chip reset first (so target could override). */
-   reset_coproc_regs(vcpu, cp15_regs, ARRAY_SIZE(cp15_regs));
+   reset_coproc_regs(vcpu, cp15_regs, ARRAY_SIZE(cp15_regs), bmap);
 
table = get_target_table(vcpu->arch.target, );
-   reset_coproc_regs(vcpu, table, num);
+   reset_coproc_regs(vcpu, table, num, bmap);
 
for (num = 1; num < NR_CP15_REGS; num++)
-   WARN(vcpu_cp15(vcpu, num) == 0x42424242,
+   WARN(!test_bit(num, bmap),
 "Didn't reset vcpu_cp15(vcpu, %zi)", num);
 }
-- 
2.20.1

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCH AUTOSEL 4.19 66/68] KVM: arm64: Don't write junk to sysregs on reset

2019-08-13 Thread Sasha Levin
From: Marc Zyngier 

[ Upstream commit 03fdfb2690099c19160a3f2c5b77db60b3afeded ]

At the moment, the way we reset system registers is mildly insane:
We write junk to them, call the reset functions, and then check that
we have something else in them.

The "fun" thing is that this can happen while the guest is running
(PSCI, for example). If anything in KVM has to evaluate the state
of a system register while junk is in there, bad thing may happen.

Let's stop doing that. Instead, we track that we have called a
reset function for that register, and assume that the reset
function has done something. This requires fixing a couple of
sysreg refinition in the trap table.

In the end, the very need of this reset check is pretty dubious,
as it doesn't check everything (a lot of the sysregs leave outside of
the sys_regs[] array). It may well be axed in the near future.

Tested-by: Zenghui Yu 
Signed-off-by: Marc Zyngier 
Signed-off-by: Sasha Levin 
---
 arch/arm64/kvm/sys_regs.c | 32 ++--
 1 file changed, 18 insertions(+), 14 deletions(-)

diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index d112af75680bb..6da2bbdb9648f 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -626,7 +626,7 @@ static void reset_pmcr(struct kvm_vcpu *vcpu, const struct 
sys_reg_desc *r)
 */
val = ((pmcr & ~ARMV8_PMU_PMCR_MASK)
   | (ARMV8_PMU_PMCR_MASK & 0xdecafbad)) & (~ARMV8_PMU_PMCR_E);
-   __vcpu_sys_reg(vcpu, PMCR_EL0) = val;
+   __vcpu_sys_reg(vcpu, r->reg) = val;
 }
 
 static bool check_pmu_access_disabled(struct kvm_vcpu *vcpu, u64 flags)
@@ -968,13 +968,13 @@ static bool access_pmuserenr(struct kvm_vcpu *vcpu, 
struct sys_reg_params *p,
 /* Silly macro to expand the DBG{BCR,BVR,WVR,WCR}n_EL1 registers in one go */
 #define DBG_BCR_BVR_WCR_WVR_EL1(n) \
{ SYS_DESC(SYS_DBGBVRn_EL1(n)), \
- trap_bvr, reset_bvr, n, 0, get_bvr, set_bvr },\
+ trap_bvr, reset_bvr, 0, 0, get_bvr, set_bvr },\
{ SYS_DESC(SYS_DBGBCRn_EL1(n)), \
- trap_bcr, reset_bcr, n, 0, get_bcr, set_bcr },\
+ trap_bcr, reset_bcr, 0, 0, get_bcr, set_bcr },\
{ SYS_DESC(SYS_DBGWVRn_EL1(n)), \
- trap_wvr, reset_wvr, n, 0,  get_wvr, set_wvr },   \
+ trap_wvr, reset_wvr, 0, 0,  get_wvr, set_wvr },   \
{ SYS_DESC(SYS_DBGWCRn_EL1(n)), \
- trap_wcr, reset_wcr, n, 0,  get_wcr, set_wcr }
+ trap_wcr, reset_wcr, 0, 0,  get_wcr, set_wcr }
 
 /* Macro to expand the PMEVCNTRn_EL0 register */
 #define PMU_PMEVCNTR_EL0(n)\
@@ -1359,7 +1359,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
 
{ SYS_DESC(SYS_CSSELR_EL1), NULL, reset_unknown, CSSELR_EL1 },
 
-   { SYS_DESC(SYS_PMCR_EL0), access_pmcr, reset_pmcr, },
+   { SYS_DESC(SYS_PMCR_EL0), access_pmcr, reset_pmcr, PMCR_EL0 },
{ SYS_DESC(SYS_PMCNTENSET_EL0), access_pmcnten, reset_unknown, 
PMCNTENSET_EL0 },
{ SYS_DESC(SYS_PMCNTENCLR_EL0), access_pmcnten, NULL, PMCNTENSET_EL0 },
{ SYS_DESC(SYS_PMOVSCLR_EL0), access_pmovs, NULL, PMOVSSET_EL0 },
@@ -2072,13 +2072,19 @@ static int emulate_sys_reg(struct kvm_vcpu *vcpu,
 }
 
 static void reset_sys_reg_descs(struct kvm_vcpu *vcpu,
- const struct sys_reg_desc *table, size_t num)
+   const struct sys_reg_desc *table, size_t num,
+   unsigned long *bmap)
 {
unsigned long i;
 
for (i = 0; i < num; i++)
-   if (table[i].reset)
+   if (table[i].reset) {
+   int reg = table[i].reg;
+
table[i].reset(vcpu, [i]);
+   if (reg > 0 && reg < NR_SYS_REGS)
+   set_bit(reg, bmap);
+   }
 }
 
 /**
@@ -2576,18 +2582,16 @@ void kvm_reset_sys_regs(struct kvm_vcpu *vcpu)
 {
size_t num;
const struct sys_reg_desc *table;
-
-   /* Catch someone adding a register without putting in reset entry. */
-   memset(>arch.ctxt.sys_regs, 0x42, 
sizeof(vcpu->arch.ctxt.sys_regs));
+   DECLARE_BITMAP(bmap, NR_SYS_REGS) = { 0, };
 
/* Generic chip reset first (so target could override). */
-   reset_sys_reg_descs(vcpu, sys_reg_descs, ARRAY_SIZE(sys_reg_descs));
+   reset_sys_reg_descs(vcpu, sys_reg_descs, ARRAY_SIZE(sys_reg_descs), 
bmap);
 
table = get_target_table(vcpu->arch.target, true, );
-   reset_sys_reg_descs(vcpu, table, num);
+   reset_sys_reg_descs(vcpu, table, num, bmap);
 
for (num = 1; num < NR_SYS_REGS; num++) {
-   if (WARN(__vcpu_sys_reg(vcpu, num) == 0x4242424242424242,
+  

[PATCH AUTOSEL 4.19 67/68] KVM: arm: Don't write junk to CP15 registers on reset

2019-08-13 Thread Sasha Levin
From: Marc Zyngier 

[ Upstream commit c69509c70aa45a8c4954c88c629a64acf4ee4a36 ]

At the moment, the way we reset CP15 registers is mildly insane:
We write junk to them, call the reset functions, and then check that
we have something else in them.

The "fun" thing is that this can happen while the guest is running
(PSCI, for example). If anything in KVM has to evaluate the state
of a CP15 register while junk is in there, bad thing may happen.

Let's stop doing that. Instead, we track that we have called a
reset function for that register, and assume that the reset
function has done something.

In the end, the very need of this reset check is pretty dubious,
as it doesn't check everything (a lot of the CP15 reg leave outside
of the cp15_regs[] array). It may well be axed in the near future.

Signed-off-by: Marc Zyngier 
Signed-off-by: Sasha Levin 
---
 arch/arm/kvm/coproc.c | 23 +++
 1 file changed, 15 insertions(+), 8 deletions(-)

diff --git a/arch/arm/kvm/coproc.c b/arch/arm/kvm/coproc.c
index fd6cde23bb5d0..871fa50a09f19 100644
--- a/arch/arm/kvm/coproc.c
+++ b/arch/arm/kvm/coproc.c
@@ -658,13 +658,22 @@ int kvm_handle_cp14_64(struct kvm_vcpu *vcpu, struct 
kvm_run *run)
 }
 
 static void reset_coproc_regs(struct kvm_vcpu *vcpu,
- const struct coproc_reg *table, size_t num)
+ const struct coproc_reg *table, size_t num,
+ unsigned long *bmap)
 {
unsigned long i;
 
for (i = 0; i < num; i++)
-   if (table[i].reset)
+   if (table[i].reset) {
+   int reg = table[i].reg;
+
table[i].reset(vcpu, [i]);
+   if (reg > 0 && reg < NR_CP15_REGS) {
+   set_bit(reg, bmap);
+   if (table[i].is_64bit)
+   set_bit(reg + 1, bmap);
+   }
+   }
 }
 
 static struct coproc_params decode_32bit_hsr(struct kvm_vcpu *vcpu)
@@ -1439,17 +1448,15 @@ void kvm_reset_coprocs(struct kvm_vcpu *vcpu)
 {
size_t num;
const struct coproc_reg *table;
-
-   /* Catch someone adding a register without putting in reset entry. */
-   memset(vcpu->arch.ctxt.cp15, 0x42, sizeof(vcpu->arch.ctxt.cp15));
+   DECLARE_BITMAP(bmap, NR_CP15_REGS) = { 0, };
 
/* Generic chip reset first (so target could override). */
-   reset_coproc_regs(vcpu, cp15_regs, ARRAY_SIZE(cp15_regs));
+   reset_coproc_regs(vcpu, cp15_regs, ARRAY_SIZE(cp15_regs), bmap);
 
table = get_target_table(vcpu->arch.target, );
-   reset_coproc_regs(vcpu, table, num);
+   reset_coproc_regs(vcpu, table, num, bmap);
 
for (num = 1; num < NR_CP15_REGS; num++)
-   WARN(vcpu_cp15(vcpu, num) == 0x42424242,
+   WARN(!test_bit(num, bmap),
 "Didn't reset vcpu_cp15(vcpu, %zi)", num);
 }
-- 
2.20.1

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 9/9] arm64: Retrieve stolen time as paravirtualized guest

2019-08-13 Thread Zenghui Yu

On 2019/8/12 18:39, Steven Price wrote:

On 09/08/2019 14:51, Zenghui Yu wrote:
[...]

Hi Steven,

Since userspace is not involved yet (right?), no one will create the
PV_TIME device for guest (and no one will specify the IPA of the shared
stolen time region), and I guess we will get a "not supported" error
here.

So what should we do if we want to test this series now?  Any userspace
tools?  If no, do you have any plans for userspace developing? ;-)


At the moment I have the following patch to kvmtool which creates the
PV_TIME device - this isn't in a state to go upstream, and Marc has
asked that I rework the memory allocation, so this will need to change.

It's a little ugly as it simply reserves the first page of RAM to use
for the PV time structures.


Thanks for sharing the code. It's good enough to show what is required
in user-space.

(I'm not familiar with kvmtool. I will first take some time to move the
steal time part to Qemu and see what will happen.)


Thanks,
zenghui


8<
diff --git a/Makefile b/Makefile
index 3862112..a79956b 100644
--- a/Makefile
+++ b/Makefile
@@ -158,7 +158,7 @@ endif
  # ARM
  OBJS_ARM_COMMON   := arm/fdt.o arm/gic.o arm/gicv2m.o 
arm/ioport.o \
   arm/kvm.o arm/kvm-cpu.o arm/pci.o arm/timer.o \
-  arm/pmu.o
+  arm/pmu.o arm/pvtime.o
  HDRS_ARM_COMMON   := arm/include
  ifeq ($(ARCH), arm)
DEFINES += -DCONFIG_ARM
diff --git a/arm/fdt.c b/arm/fdt.c
index c80e6da..19eccbc 100644
--- a/arm/fdt.c
+++ b/arm/fdt.c
@@ -119,6 +119,7 @@ static int setup_fdt(struct kvm *kvm)
  
  	/* Create new tree without a reserve map */

_FDT(fdt_create(fdt, FDT_MAX_SIZE));
+   _FDT(fdt_add_reservemap_entry(fdt, kvm->arch.memory_guest_start, 4096));
_FDT(fdt_finish_reservemap(fdt));
  
  	/* Header */

diff --git a/arm/kvm.c b/arm/kvm.c
index 1f85fc6..8bbfef1 100644
--- a/arm/kvm.c
+++ b/arm/kvm.c
@@ -11,6 +11,8 @@
  #include 
  #include 
  
+int pvtime_create(struct kvm *kvm);

+
  struct kvm_ext kvm_req_ext[] = {
{ DEFINE_KVM_EXT(KVM_CAP_IRQCHIP) },
{ DEFINE_KVM_EXT(KVM_CAP_ONE_REG) },
@@ -86,6 +88,10 @@ void kvm__arch_init(struct kvm *kvm, const char 
*hugetlbfs_path, u64 ram_size)
/* Create the virtual GIC. */
if (gic__create(kvm, kvm->cfg.arch.irqchip))
die("Failed to create virtual GIC");
+
+   /* Setup PV time */
+   if (pvtime_create(kvm))
+   die("Failed to initialise PV time");
  }
  
  #define FDT_ALIGN	SZ_2M

diff --git a/arm/pvtime.c b/arm/pvtime.c
new file mode 100644
index 000..abcaab3
--- /dev/null
+++ b/arm/pvtime.c
@@ -0,0 +1,77 @@
+#include "kvm/kvm.h"
+
+#define KVM_DEV_TYPE_ARM_PV_TIME (KVM_DEV_TYPE_ARM_VGIC_ITS+2)
+
+/* Device Control API: PV_TIME */
+#define KVM_DEV_ARM_PV_TIME_PADDR  0
+#define KVM_DEV_ARM_PV_TIME_FREQUENCY  3
+
+#define KVM_DEV_ARM_PV_TIME_ST 0
+#define KVM_DEV_ARM_PV_TIME_LPT1
+
+static int pvtime_fd;
+
+int pvtime_create(struct kvm *kvm);
+
+int pvtime_create(struct kvm *kvm)
+{
+   int err;
+   u64 lpt_paddr = 0x1000;
+   u64 st_paddr = lpt_paddr + 4096;
+   u32 frequency = 100 * 1000 * 1000;
+
+   printf("lpt_paddr=%llx\n", lpt_paddr);
+
+   struct kvm_create_device pvtime_device = {
+   .type = KVM_DEV_TYPE_ARM_PV_TIME,
+   .flags = 0,
+   };
+
+   err = ioctl(kvm->vm_fd, KVM_CREATE_DEVICE, _device);
+   if (err) {
+   printf("Failed to create PV device\n");
+   return 0;
+   }
+
+   pvtime_fd = pvtime_device.fd;
+
+   struct kvm_device_attr lpt_base = {
+   .group = KVM_DEV_ARM_PV_TIME_PADDR,
+   .attr = KVM_DEV_ARM_PV_TIME_LPT,
+   .addr = (u64)(unsigned long)_paddr
+   };
+   struct kvm_device_attr st_base = {
+   .group = KVM_DEV_ARM_PV_TIME_PADDR,
+   .attr = KVM_DEV_ARM_PV_TIME_ST,
+   .addr = (u64)(unsigned long)_paddr
+   };
+
+   struct kvm_device_attr lpt_freq = {
+   .group = KVM_DEV_ARM_PV_TIME_FREQUENCY,
+   .attr = KVM_DEV_ARM_PV_TIME_LPT,
+   .addr = (u64)(unsigned long)
+   };
+
+   err = ioctl(pvtime_fd, KVM_SET_DEVICE_ATTR, _base);
+   if (err) {
+   perror("ioctl lpt_base failed");
+   printf("Ignoring LPT...\n");
+   }
+   err = ioctl(pvtime_fd, KVM_SET_DEVICE_ATTR, _base);
+   if (err) {
+   perror("ioctl st_base failed");
+   goto out_err;
+   }
+   err = ioctl(pvtime_fd, KVM_SET_DEVICE_ATTR, _freq);
+   if (err) {
+   perror("ioctl lpt_freq failed");
+   printf("Ignoring LPT...\n");
+   }
+
+   printf("PV time setup\n");
+
+   return 0;
+out_err:
+   close(pvtime_fd);
+   return err;
+}


___
kvmarm 

Can we boot a 512U kvm guest?

2019-08-13 Thread Zenghui Yu

Hi folks,

Since commit e25028c8ded0 ("KVM: arm/arm64: Bump VGIC_V3_MAX_CPUS to
512"), we seemed to be allowed to boot a 512U guest.  But I failed to
start it up with the latest QEMU.  I guess there are at least *two*
reasons (limitations).

First I got a QEMU abort:
"kvm_set_irq: Invalid argument"

Enable the trace_kvm_irq_line() under debugfs, when it comed with
vcpu-256, I got:
"Inject UNKNOWN interrupt (3), vcpu->idx: 0, num: 23, level: 0"
and kvm_vm_ioctl_irq_line() returns -EINVAL to user-space...

So the thing is that we only have 8 bits for vcpu_index field ([23:16])
in KVM_IRQ_LINE ioctl.  irq_type field will be corrupted if we inject a
PPI to vcpu-256, whose vcpu_index will take 9 bits.

I temporarily patched the KVM and QEMU with the following diff:

---8<---
diff --git a/arch/arm64/include/uapi/asm/kvm.h 
b/arch/arm64/include/uapi/asm/kvm.h

index 95516a4..39a0fb1 100644
--- a/arch/arm64/include/uapi/asm/kvm.h
+++ b/arch/arm64/include/uapi/asm/kvm.h
@@ -325,10 +325,10 @@ struct kvm_vcpu_events {
 #define   KVM_ARM_VCPU_TIMER_IRQ_PTIMER1

 /* KVM_IRQ_LINE irq field index values */
-#define KVM_ARM_IRQ_TYPE_SHIFT 24
-#define KVM_ARM_IRQ_TYPE_MASK  0xff
+#define KVM_ARM_IRQ_TYPE_SHIFT 28
+#define KVM_ARM_IRQ_TYPE_MASK  0xf
 #define KVM_ARM_IRQ_VCPU_SHIFT 16
-#define KVM_ARM_IRQ_VCPU_MASK  0xff
+#define KVM_ARM_IRQ_VCPU_MASK  0xfff
 #define KVM_ARM_IRQ_NUM_SHIFT  0
 #define KVM_ARM_IRQ_NUM_MASK   0x

---8<---

It makes things a bit better, it also immediately BREAKs the api with
old versions.


Next comes one more QEMU abort (with the "fix" above):
"Failed to set device address: No space left on device"

We register two io devices (rd_dev and sgi_dev) on KVM_MMIO_BUS for
each redistributor. 512 vcpus take 1024 io devices, which is beyond the
maximum limitation of the current kernel - NR_IOBUS_DEVS (1000).
So we get a ENOSPC error here.


I don't know if the similar problems have been discussed before in ML.
Is it time to really support the 512U guest?


Thanks,
zenghui

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: Can we boot a 512U kvm guest?

2019-08-13 Thread Marc Zyngier
On Tue, 13 Aug 2019 09:50:27 +0100,
Zenghui Yu  wrote:

Hi Zenghui,

> 
> Hi folks,
> 
> Since commit e25028c8ded0 ("KVM: arm/arm64: Bump VGIC_V3_MAX_CPUS to
> 512"), we seemed to be allowed to boot a 512U guest.  But I failed to
> start it up with the latest QEMU.  I guess there are at least *two*
> reasons (limitations).
> 
> First I got a QEMU abort:
>   "kvm_set_irq: Invalid argument"
> 
> Enable the trace_kvm_irq_line() under debugfs, when it comed with
> vcpu-256, I got:
>   "Inject UNKNOWN interrupt (3), vcpu->idx: 0, num: 23, level: 0"
> and kvm_vm_ioctl_irq_line() returns -EINVAL to user-space...
> 
> So the thing is that we only have 8 bits for vcpu_index field ([23:16])
> in KVM_IRQ_LINE ioctl.  irq_type field will be corrupted if we inject a
> PPI to vcpu-256, whose vcpu_index will take 9 bits.

Irk. Not great indeed. Clearly, we have a couple of holes in the way
we test these ABI changes (/me eyes Eric...).

> 
> I temporarily patched the KVM and QEMU with the following diff:
> 
> ---8<---
> diff --git a/arch/arm64/include/uapi/asm/kvm.h
> b/arch/arm64/include/uapi/asm/kvm.h
> index 95516a4..39a0fb1 100644
> --- a/arch/arm64/include/uapi/asm/kvm.h
> +++ b/arch/arm64/include/uapi/asm/kvm.h
> @@ -325,10 +325,10 @@ struct kvm_vcpu_events {
>  #define   KVM_ARM_VCPU_TIMER_IRQ_PTIMER  1
> 
>  /* KVM_IRQ_LINE irq field index values */
> -#define KVM_ARM_IRQ_TYPE_SHIFT   24
> -#define KVM_ARM_IRQ_TYPE_MASK0xff
> +#define KVM_ARM_IRQ_TYPE_SHIFT   28
> +#define KVM_ARM_IRQ_TYPE_MASK0xf
>  #define KVM_ARM_IRQ_VCPU_SHIFT   16
> -#define KVM_ARM_IRQ_VCPU_MASK0xff
> +#define KVM_ARM_IRQ_VCPU_MASK0xfff
>  #define KVM_ARM_IRQ_NUM_SHIFT0
>  #define KVM_ARM_IRQ_NUM_MASK 0x
> 
> ---8<---
> 
> It makes things a bit better, it also immediately BREAKs the api with
> old versions.

Yes, and we can't have that (specially if you consider that this API
is shared between 32 and 64bit). One "get out of jail card" is to
steal a few bits from the top of the word, and encode things there:

diff --git a/arch/arm/include/uapi/asm/kvm.h b/arch/arm/include/uapi/asm/kvm.h
index 4602464ebdfb..86db092e4c2f 100644
--- a/arch/arm/include/uapi/asm/kvm.h
+++ b/arch/arm/include/uapi/asm/kvm.h
@@ -254,8 +254,10 @@ struct kvm_vcpu_events {
 #define   KVM_DEV_ARM_ITS_CTRL_RESET   4
 
 /* KVM_IRQ_LINE irq field index values */
+#define KVM_ARM_IRQ_VCPU2_SHIFT28
+#define KVM_ARM_IRQ_VCPU2_MASK 0xf
 #define KVM_ARM_IRQ_TYPE_SHIFT 24
-#define KVM_ARM_IRQ_TYPE_MASK  0xff
+#define KVM_ARM_IRQ_TYPE_MASK  0xf
 #define KVM_ARM_IRQ_VCPU_SHIFT 16
 #define KVM_ARM_IRQ_VCPU_MASK  0xff
 #define KVM_ARM_IRQ_NUM_SHIFT  0
diff --git a/arch/arm64/include/uapi/asm/kvm.h 
b/arch/arm64/include/uapi/asm/kvm.h
index 7b7ac0f6cec9..44cb25bfc95e 100644
--- a/arch/arm64/include/uapi/asm/kvm.h
+++ b/arch/arm64/include/uapi/asm/kvm.h
@@ -308,8 +308,10 @@ struct kvm_vcpu_events {
 #define   KVM_ARM_VCPU_TIMER_IRQ_PTIMER1
 
 /* KVM_IRQ_LINE irq field index values */
+#define KVM_ARM_IRQ_VCPU2_SHIFT28
+#define KVM_ARM_IRQ_VCPU2_MASK 0xf
 #define KVM_ARM_IRQ_TYPE_SHIFT 24
-#define KVM_ARM_IRQ_TYPE_MASK  0xff
+#define KVM_ARM_IRQ_TYPE_MASK  0xf
 #define KVM_ARM_IRQ_VCPU_SHIFT 16
 #define KVM_ARM_IRQ_VCPU_MASK  0xff
 #define KVM_ARM_IRQ_NUM_SHIFT  0
diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index 90cedebaeb94..fb685c1c0514 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -889,6 +889,7 @@ int kvm_vm_ioctl_irq_line(struct kvm *kvm, struct 
kvm_irq_level *irq_level,
 
irq_type = (irq >> KVM_ARM_IRQ_TYPE_SHIFT) & KVM_ARM_IRQ_TYPE_MASK;
vcpu_idx = (irq >> KVM_ARM_IRQ_VCPU_SHIFT) & KVM_ARM_IRQ_VCPU_MASK;
+   vcpu_idx += ((irq >> KVM_ARM_IRQ_VCPU2_SHIFT) & KVM_ARM_IRQ_VCPU2_MASK) 
* (KVM_ARM_IRQ_VCPU_MASK + 1);
irq_num = (irq >> KVM_ARM_IRQ_NUM_SHIFT) & KVM_ARM_IRQ_NUM_MASK;
 
trace_kvm_irq_line(irq_type, vcpu_idx, irq_num, irq_level->level);

It should work because we've been careful not to allow value outside
of {0, 1, 2} for irq_type. I don't like it, but I really don't feel
like adding another IRQ related ioctl. We still have 16 irq types
(which is already a waste of space), and we can go up to 4096 vcpu.

Peter, what do you think?

> Next comes one more QEMU abort (with the "fix" above):
>   "Failed to set device address: No space left on device"
> 
> We register two io devices (rd_dev and sgi_dev) on KVM_MMIO_BUS for
> each redistributor. 512 vcpus take 1024 io devices, which is beyond the
> maximum limitation of the current kernel - NR_IOBUS_DEVS (1000).
> So we get a ENOSPC error here.

I can reproduce that issue here ("499 vcpus on my Chromebook,
baby"). Not an ABI problem though, and we can bump 

Re: KVM Arm64 and Linux-RT issues

2019-08-13 Thread bige...@linutronix.de
On 2019-07-27 14:37:11 [+0100], Julien Grall wrote:
> > > 8<
> > > --- a/virt/kvm/arm/arch_timer.c
> > > +++ b/virt/kvm/arm/arch_timer.c
> > > @@ -80,7 +80,7 @@ static inline bool userspace_irqchip(str
> > >   static void soft_timer_start(struct hrtimer *hrt, u64 ns)
> > >   {
> > >   hrtimer_start(hrt, ktime_add_ns(ktime_get(), ns),
> > > -   HRTIMER_MODE_ABS);
> > > +   HRTIMER_MODE_ABS_HARD);
> > >   }
> > 
> > That's pretty neat, and matches the patch you already have for
> > x86. Feel free to add my
> > 
> > Acked-by: Marc Zyngier 
> 
> I can confirm the warning now disappeared. Feel free to added my tested-by:
> 
> Tested-by: Julien Grall 
> 

|kvm_hrtimer_expire()
| kvm_timer_update_irq()
|   kvm_vgic_inject_irq()
| vgic_lazy_init()
|if (unlikely(!vgic_initialized(kvm))) {
| if (kvm->arch.vgic.vgic_model != KVM_DEV_TYPE_ARM_VGIC_V2)
| return -EBUSY;
| 
| mutex_lock(>lock);

Is this possible path of any concern? This should throw a warning also
for !RT so probably not…

I prepared the patch below. This one could go straight to tglx's timer tree
since he has the _HARD bits there. I *think* it requires to set the bits
_HARD during _init() and _start() otherwise there is (or was) a warning…

Sebastian
8<

From: Thomas Gleixner 
Date: Tue, 13 Aug 2019 14:29:41 +0200
Subject: [PATCH] KVM: arm/arm64: Let the timer expire in hardirq context on RT

The timers are canceled from an preempt-notifier which is invoked with
disabled preemption which is not allowed on PREEMPT_RT.
The timer callback is short so in could be invoked in hard-IRQ context
on -RT.

Let the timer expire on hard-IRQ context even on -RT.

Signed-off-by: Thomas Gleixner 
Acked-by: Marc Zyngier 
Tested-by: Julien Grall 
Signed-off-by: Sebastian Andrzej Siewior 
---
 virt/kvm/arm/arch_timer.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
index 1be486d5d7cb4..0bfa7c5b5c890 100644
--- a/virt/kvm/arm/arch_timer.c
+++ b/virt/kvm/arm/arch_timer.c
@@ -80,7 +80,7 @@ static inline bool userspace_irqchip(struct kvm *kvm)
 static void soft_timer_start(struct hrtimer *hrt, u64 ns)
 {
hrtimer_start(hrt, ktime_add_ns(ktime_get(), ns),
- HRTIMER_MODE_ABS);
+ HRTIMER_MODE_ABS_HARD);
 }
 
 static void soft_timer_cancel(struct hrtimer *hrt)
@@ -697,11 +697,11 @@ void kvm_timer_vcpu_init(struct kvm_vcpu *vcpu)
update_vtimer_cntvoff(vcpu, kvm_phys_timer_read());
ptimer->cntvoff = 0;
 
-   hrtimer_init(>bg_timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS);
+   hrtimer_init(>bg_timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS_HARD);
timer->bg_timer.function = kvm_bg_timer_expire;
 
-   hrtimer_init(>hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS);
-   hrtimer_init(>hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS);
+   hrtimer_init(>hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS_HARD);
+   hrtimer_init(>hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS_HARD);
vtimer->hrtimer.function = kvm_hrtimer_expire;
ptimer->hrtimer.function = kvm_hrtimer_expire;
 
-- 
2.23.0.rc1

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm