Re: [PATCH 1/2] KVM: arm64: Make vcpu_cp1x() work on Big Endian hosts
Hi Robin,

On 2020-06-09 12:41, Robin Murphy wrote:
> On 2020-06-09 09:49, Marc Zyngier wrote:
>> AArch32 CP1x registers are overlayed on their AArch64 counterparts
>> in the vcpu struct.
>>
>> [...]
>>
>> -#define vcpu_cp14(v,r)	((v)->arch.ctxt.copro[(r)])
>> -#define vcpu_cp15(v,r)	((v)->arch.ctxt.copro[(r)])
>> +#ifdef CPU_BIG_ENDIAN
>
> Ahem... I think you're missing a "CONFIG_" there ;)

Duh! As I said, I didn't test the thing at all! ;-)

> Bonus trickery - for a 0 or 1 value you can simply use IS_ENABLED().

Beautiful! Definitely a must! :D

Thanks,

M.
-- 
Jazz is not dead. It just smells funny...
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH 1/2] KVM: arm64: Make vcpu_cp1x() work on Big Endian hosts
On 2020-06-09 09:49, Marc Zyngier wrote:
> AArch32 CP1x registers are overlayed on their AArch64 counterparts
> in the vcpu struct. This leads to an interesting problem as they
> are stored in their CPU-local format, and thus a CP1x register
> doesn't "hit" the lower 32bit portion of the AArch64 register on
> a BE host.
>
> To workaround this unfortunate situation, introduce a bias trick
> in the vcpu_cp1x() accessors which picks the correct half of the
> 64bit register.
>
> Cc: sta...@vger.kernel.org
> Reported-by: James Morse
> Signed-off-by: Marc Zyngier
> ---
>  arch/arm64/include/asm/kvm_host.h | 10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 59029e90b557..e80c0e06f235 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -404,8 +404,14 @@ void vcpu_write_sys_reg(struct kvm_vcpu *vcpu, u64 val, int reg);
>   * CP14 and CP15 live in the same array, as they are backed by the
>   * same system registers.
>   */
> -#define vcpu_cp14(v,r)	((v)->arch.ctxt.copro[(r)])
> -#define vcpu_cp15(v,r)	((v)->arch.ctxt.copro[(r)])
> +#ifdef CPU_BIG_ENDIAN

Ahem... I think you're missing a "CONFIG_" there ;)

Bonus trickery - for a 0 or 1 value you can simply use IS_ENABLED().

Robin.

> +#define CPx_OFFSET	1
> +#else
> +#define CPx_OFFSET	0
> +#endif
> +
> +#define vcpu_cp14(v,r)	((v)->arch.ctxt.copro[(r) ^ CPx_OFFSET])
> +#define vcpu_cp15(v,r)	((v)->arch.ctxt.copro[(r) ^ CPx_OFFSET])
>
>  struct kvm_vm_stat {
>  	ulong remote_tlb_flush;
Re: [kvmtool][PATCH v2] arm64: Obtain text offset from kernel image
Hi,

On 6/8/20 4:28 PM, Marc Zyngier wrote:
> Recent changes made to Linux 5.8 have outlined that kvmtool
> hardcodes the text offset instead of reading it from the arm64
> image itself.
>
> To address this, import the image header structure into kvmtool
> and do the right thing. 32bit guests are still loaded to their
> usual locations.
>
> While we're at it, check the image magic and default to the text
> offset to be 0x8 when image_size is 0, as described in the
> kernel's booting.rst document.
>
> Cc: Alexandru Elisei
> Reported-by: Ard Biesheuvel
> Signed-off-by: Marc Zyngier
> ---
>
> Notes:
>     v2: Check magic, assume offset to be 0x8000 if image_size is 0.
>
> [..]

I wanted to review the patch, but then I noticed that Will merged it.
So I settled for testing it, to make sure nothing breaks.

I was able to reproduce the issue reported on the mailing list -
without this patch, when the kernel is compiled with
CONFIG_RANDOMIZE_BASE not set, the kernel doesn't boot; with this
patch applied, the same kernel boots successfully.

I also tested it with an aarch32 kernel, this is what I got:

$ taskset -c 4,5 ./lkvm run -c2 -m1024 -k ../kvmtool-vms/zImage -d ../kvmtool-vms/debian-10.3.0.img -p 'earlycon root=/dev/vda2'
  # lkvm run -k ../kvmtool-vms/zImage -m 1024 -c 2 --name guest-1834
  Warning: Kernel image magic not matching
  Warning: unable to translate host address 0x13115a4c82d05a4d to guest
  Fatal: kernel image too big to contain in guest memory.

IMO, works as expected.

Thanks,
Alex
[PATCH 1/2] KVM: arm64: Make vcpu_cp1x() work on Big Endian hosts
AArch32 CP1x registers are overlaid on their AArch64 counterparts
in the vcpu struct. This leads to an interesting problem as they
are stored in their CPU-local format, and thus a CP1x register
doesn't "hit" the lower 32bit portion of the AArch64 register on
a BE host.

To work around this unfortunate situation, introduce a bias trick
in the vcpu_cp1x() accessors which picks the correct half of the
64bit register.

Cc: sta...@vger.kernel.org
Reported-by: James Morse
Signed-off-by: Marc Zyngier
---
 arch/arm64/include/asm/kvm_host.h | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 59029e90b557..e80c0e06f235 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -404,8 +404,14 @@ void vcpu_write_sys_reg(struct kvm_vcpu *vcpu, u64 val, int reg);
  * CP14 and CP15 live in the same array, as they are backed by the
  * same system registers.
  */
-#define vcpu_cp14(v,r)	((v)->arch.ctxt.copro[(r)])
-#define vcpu_cp15(v,r)	((v)->arch.ctxt.copro[(r)])
+#ifdef CPU_BIG_ENDIAN
+#define CPx_OFFSET	1
+#else
+#define CPx_OFFSET	0
+#endif
+
+#define vcpu_cp14(v,r)	((v)->arch.ctxt.copro[(r) ^ CPx_OFFSET])
+#define vcpu_cp15(v,r)	((v)->arch.ctxt.copro[(r) ^ CPx_OFFSET])
 
 struct kvm_vm_stat {
 	ulong remote_tlb_flush;
-- 
2.26.2
[PATCH 0/2] KVM: arm64: Additional 32bit fixes
Here's a couple of patches that address the issues that James
mentioned in [1], affecting 32bit guests.

I lack a BE-capable host to properly test the first patch, but it
is obviously correct (ha! ;-).

[1] https://lore.kernel.org/r/20200526161834.29165-1-james.mo...@arm.com

Marc Zyngier (2):
  KVM: arm64: Make vcpu_cp1x() work on Big Endian hosts
  KVM: arm64: Synchronize sysreg state on injecting an AArch32 exception

 arch/arm64/include/asm/kvm_host.h | 10 ++++++++--
 arch/arm64/kvm/aarch32.c          | 28 ++++++++++++++++++++++++++++
 2 files changed, 36 insertions(+), 2 deletions(-)

-- 
2.26.2
[PATCH 2/2] KVM: arm64: Synchronize sysreg state on injecting an AArch32 exception
On a VHE system, the EL1 state is left in the CPU most of the time,
and only synchronized back to memory when vcpu_put() is called (most
of the time on preemption).

Which means that when injecting an exception, we'd better have a way
to either:

(1) write directly to the EL1 sysregs
(2) synchronize the state back to memory, and do the changes there

For AArch64, we already do (1), so we are safe. Unfortunately, doing
the same thing for AArch32 would be pretty invasive. Instead, we can
easily implement (2) by calling the put/load architectural backends,
and keep preemption disabled. We can then reload the state back into
EL1.

Cc: sta...@vger.kernel.org
Reported-by: James Morse
Signed-off-by: Marc Zyngier
---
 arch/arm64/kvm/aarch32.c | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/arch/arm64/kvm/aarch32.c b/arch/arm64/kvm/aarch32.c
index 0a356aa91aa1..40a62a99fbf8 100644
--- a/arch/arm64/kvm/aarch32.c
+++ b/arch/arm64/kvm/aarch32.c
@@ -33,6 +33,26 @@ static const u8 return_offsets[8][2] = {
 	[7] = { 4, 4 },	/* FIQ, unused */
 };
 
+static bool pre_fault_synchronize(struct kvm_vcpu *vcpu)
+{
+	preempt_disable();
+	if (vcpu->arch.sysregs_loaded_on_cpu) {
+		kvm_arch_vcpu_put(vcpu);
+		return true;
+	}
+
+	preempt_enable();
+	return false;
+}
+
+static void post_fault_synchronize(struct kvm_vcpu *vcpu, bool loaded)
+{
+	if (loaded) {
+		kvm_arch_vcpu_load(vcpu, smp_processor_id());
+		preempt_enable();
+	}
+}
+
 /*
  * When an exception is taken, most CPSR fields are left unchanged in the
  * handler. However, some are explicitly overridden (e.g. M[4:0]).
@@ -155,7 +175,10 @@ static void prepare_fault32(struct kvm_vcpu *vcpu, u32 mode, u32 vect_offset)
 
 void kvm_inject_undef32(struct kvm_vcpu *vcpu)
 {
+	bool loaded = pre_fault_synchronize(vcpu);
+
 	prepare_fault32(vcpu, PSR_AA32_MODE_UND, 4);
+	post_fault_synchronize(vcpu, loaded);
 }
 
 /*
@@ -168,6 +191,9 @@ static void inject_abt32(struct kvm_vcpu *vcpu, bool is_pabt,
 	u32 vect_offset;
 	u32 *far, *fsr;
 	bool is_lpae;
+	bool loaded;
+
+	loaded = pre_fault_synchronize(vcpu);
 
 	if (is_pabt) {
 		vect_offset = 12;
@@ -191,6 +217,8 @@ static void inject_abt32(struct kvm_vcpu *vcpu, bool is_pabt,
 		/* no need to shuffle FS[4] into DFSR[10] as its 0 */
 		*fsr = DFSR_FSC_EXTABT_nLPAE;
 	}
+
+	post_fault_synchronize(vcpu, loaded);
 }
 
 void kvm_inject_dabt32(struct kvm_vcpu *vcpu, unsigned long addr)
-- 
2.26.2
Re: [PATCH] KVM: arm64: Allow in-atomic injection of SPIs
On 2020-06-09 08:48, Auger Eric wrote:
> Hi Marc,
>
> On 6/8/20 7:19 PM, Marc Zyngier wrote:
>> Hi Eric,
>>
>> On 2020-06-08 17:58, Auger Eric wrote:
>>> Hi Marc,
>>>
>>> On 5/26/20 6:11 PM, Marc Zyngier wrote:
>>>> On a system that uses SPIs to implement MSIs (as it would be
>>>> the case on a GICv2 system exposing a GICv2m to its guests),
>>>> we deny the possibility of injecting SPIs on the in-atomic
>>>> fast-path.
>>>>
>>>> This results in a very large amount of context-switches
>>>> (roughly equivalent to twice the interrupt rate) on the host,
>>>> and suboptimal performance for the guest (as measured with
>>>> a test workload involving a virtio interface backed by vhost-net).
>>>> Given that GICv2 systems are usually on the low-end of the spectrum
>>>> performance wise, they could do without the aggravation.
>>>>
>>>> We solved this for GICv3+ITS by having a translation cache. But
>>>> SPIs do not need any extra infrastructure, and can be immediately
>>>> injected in the virtual distributor as the locking is already
>>>> heavy enough that we don't need to worry about anything.
>>>>
>>>> This halves the number of context switches for the same workload.
>>>>
>>>> Signed-off-by: Marc Zyngier
>>>> ---
>>>>  arch/arm64/kvm/vgic/vgic-irqfd.c | 20 ++++++++++++++++----
>>>>  arch/arm64/kvm/vgic/vgic-its.c   |  3 +--
>>>>  2 files changed, 17 insertions(+), 6 deletions(-)
>>>>
>>>> diff --git a/arch/arm64/kvm/vgic/vgic-irqfd.c b/arch/arm64/kvm/vgic/vgic-irqfd.c
>>>> index d8cdfea5cc96..11a9f81115ab 100644
>>>> --- a/arch/arm64/kvm/vgic/vgic-irqfd.c
>>>> +++ b/arch/arm64/kvm/vgic/vgic-irqfd.c
>>>
>>> There is still a comment above saying
>>>  * Currently only direct MSI injection is supported.
>>
>> I believe this comment to be correct. There is no path other
>> than MSI injection that leads us here. Case in point, we only
>> ever inject a rising edge through this API, never a falling one.
>
> Isn't this path entered whenever you have either of the combo being used?
>
> KVM_SET_MSI_ROUTING + KVM_IRQFD
> KVM_SET_GSI_ROUTING + KVM_IRQFD

I've always considered these to be MSIs, but maybe I'm narrow
minded... ;-)

> I had in mind direct MSI injection was KVM_SIGNAL_MSI/
> kvm_send_userspace_msi/kvm_set_msi

Fair enough. Zengui was also unhappy about this comment, so I'll
remove it. After all, we shouldn't care whether that's a MSI or not.

>>>> @@ -107,15 +107,27 @@ int kvm_arch_set_irq_inatomic(struct kvm_kernel_irq_routing_entry *e,
>>>>  			      struct kvm *kvm, int irq_source_id, int level,
>>>>  			      bool line_status)
>>>>  {
>>>> -	if (e->type == KVM_IRQ_ROUTING_MSI && vgic_has_its(kvm) && level) {
>>>> +	if (!level)
>>>> +		return -EWOULDBLOCK;
>>>> +
>>>> +	switch (e->type) {
>>>> +	case KVM_IRQ_ROUTING_MSI: {
>>>>  		struct kvm_msi msi;
>>>>
>>>> +		if (!vgic_has_its(kvm))
>>>> +			return -EINVAL;
>>>
>>> Shouldn't we return -EWOULDBLOCK by default?
>>> QEMU does not use that path with GICv2m but in kvm_set_routing_entry() I
>>> don't see any check related to the ITS.
>>
>> Fair enough. I really don't anticipate anyone to be using
>> KVM_IRQ_ROUTING_MSI with anything but the ITS, but who knows,
>> people are crazy! ;-)
>>
>>>> +
>>>>  		kvm_populate_msi(e, &msi);
>>>> -		if (!vgic_its_inject_cached_translation(kvm, &msi))
>>>> -			return 0;
>>>> +		return vgic_its_inject_cached_translation(kvm, &msi);
>>>>  	}
>>>>
>>>> -	return -EWOULDBLOCK;
>>>> +	case KVM_IRQ_ROUTING_IRQCHIP:
>>>> +		/* Injecting SPIs is always possible in atomic context */
>>>> +		return vgic_irqfd_set_irq(e, kvm, irq_source_id, 1, line_status);
>>>
>>> what about the mutex_lock(&kvm->lock) called from within
>>> vgic_irqfd_set_irq/kvm_vgic_inject_irq/vgic_lazy_init
>>
>> Holy crap. The lazy GIC init strikes again :-(.
>>
>> How about this on top of the existing patch:
>>
>> diff --git a/arch/arm64/kvm/vgic/vgic-irqfd.c b/arch/arm64/kvm/vgic/vgic-irqfd.c
>> index 11a9f81115ab..6e5ca04d5589 100644
>> --- a/arch/arm64/kvm/vgic/vgic-irqfd.c
>> +++ b/arch/arm64/kvm/vgic/vgic-irqfd.c
>> @@ -115,19 +115,23 @@ int kvm_arch_set_irq_inatomic(struct kvm_kernel_irq_routing_entry *e,
>>  		struct kvm_msi msi;
>>
>>  		if (!vgic_has_its(kvm))
>> -			return -EINVAL;
>> +			break;
>>
>>  		kvm_populate_msi(e, &msi);
>>  		return vgic_its_inject_cached_translation(kvm, &msi);
>>  	}
>>
>>  	case KVM_IRQ_ROUTING_IRQCHIP:
>> -		/* Injecting SPIs is always possible in atomic context */
>> +		/*
>> +		 * Injecting SPIs is always possible in atomic context
>> +		 * as long as the damn vgic is initialized.
>> +		 */
>> +		if (unlikely(!vgic_initialized(kvm)))
>> +			break;
>
> Yes this should prevent the wait situation. But may be worth to deep
> into the call stack. Won't analyzers complain at some point?

I have gone all the way in the call stack, to be sure. The init is
the only point we use a mutex, and we're under a hard spinlock just
after (all the injection proper is happening with interrupt disabled).

As for things like lockdep, they track the dynamic state, and not
whether certain branches are simply "possible". A code analyser that
wouldn't take control flow into account would be pretty broken! ;-)

Thanks,

M.
-- 
Jazz is not dead. It just smells funny...
Re: [PATCH] KVM: arm64: Allow in-atomic injection of SPIs
Hi Marc,

On 6/8/20 7:19 PM, Marc Zyngier wrote:
> Hi Eric,
>
> On 2020-06-08 17:58, Auger Eric wrote:
>> Hi Marc,
>>
>> On 5/26/20 6:11 PM, Marc Zyngier wrote:
>>> On a system that uses SPIs to implement MSIs (as it would be
>>> the case on a GICv2 system exposing a GICv2m to its guests),
>>> we deny the possibility of injecting SPIs on the in-atomic
>>> fast-path.
>>>
>>> This results in a very large amount of context-switches
>>> (roughly equivalent to twice the interrupt rate) on the host,
>>> and suboptimal performance for the guest (as measured with
>>> a test workload involving a virtio interface backed by vhost-net).
>>> Given that GICv2 systems are usually on the low-end of the spectrum
>>> performance wise, they could do without the aggravation.
>>>
>>> We solved this for GICv3+ITS by having a translation cache. But
>>> SPIs do not need any extra infrastructure, and can be immediately
>>> injected in the virtual distributor as the locking is already
>>> heavy enough that we don't need to worry about anything.
>>>
>>> This halves the number of context switches for the same workload.
>>>
>>> Signed-off-by: Marc Zyngier
>>> ---
>>>  arch/arm64/kvm/vgic/vgic-irqfd.c | 20 ++++++++++++++++----
>>>  arch/arm64/kvm/vgic/vgic-its.c   |  3 +--
>>>  2 files changed, 17 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/arch/arm64/kvm/vgic/vgic-irqfd.c
>>> b/arch/arm64/kvm/vgic/vgic-irqfd.c
>>> index d8cdfea5cc96..11a9f81115ab 100644
>>> --- a/arch/arm64/kvm/vgic/vgic-irqfd.c
>>> +++ b/arch/arm64/kvm/vgic/vgic-irqfd.c
>> There is still a comment above saying
>>  * Currently only direct MSI injection is supported.
>
> I believe this comment to be correct. There is no path other
> than MSI injection that leads us here. Case in point, we only
> ever inject a rising edge through this API, never a falling one.

Isn't this path entered whenever you have either of the combo being used?

KVM_SET_MSI_ROUTING + KVM_IRQFD
KVM_SET_GSI_ROUTING + KVM_IRQFD

I had in mind direct MSI injection was KVM_SIGNAL_MSI/
kvm_send_userspace_msi/kvm_set_msi

>
>>> @@ -107,15 +107,27 @@ int kvm_arch_set_irq_inatomic(struct
>>> kvm_kernel_irq_routing_entry *e,
>>>  			      struct kvm *kvm, int irq_source_id, int level,
>>>  			      bool line_status)
>>>  {
>>> -	if (e->type == KVM_IRQ_ROUTING_MSI && vgic_has_its(kvm) && level) {
>>> +	if (!level)
>>> +		return -EWOULDBLOCK;
>>> +
>>> +	switch (e->type) {
>>> +	case KVM_IRQ_ROUTING_MSI: {
>>>  		struct kvm_msi msi;
>>>
>>> +		if (!vgic_has_its(kvm))
>>> +			return -EINVAL;
>> Shouldn't we return -EWOULDBLOCK by default?
>> QEMU does not use that path with GICv2m but in kvm_set_routing_entry() I
>> don't see any check related to the ITS.
>
> Fair enough. I really don't anticipate anyone to be using
> KVM_IRQ_ROUTING_MSI with anything but the ITS, but who knows,
> people are crazy! ;-)
>
>>> +
>>>  		kvm_populate_msi(e, &msi);
>>> -		if (!vgic_its_inject_cached_translation(kvm, &msi))
>>> -			return 0;
>>> +		return vgic_its_inject_cached_translation(kvm, &msi);
>>
>>>  	}
>>>
>>> -	return -EWOULDBLOCK;
>>> +	case KVM_IRQ_ROUTING_IRQCHIP:
>>> +		/* Injecting SPIs is always possible in atomic context */
>>> +		return vgic_irqfd_set_irq(e, kvm, irq_source_id, 1,
>>> line_status);
>> what about the mutex_lock(&kvm->lock) called from within
>> vgic_irqfd_set_irq/kvm_vgic_inject_irq/vgic_lazy_init
>
> Holy crap. The lazy GIC init strikes again :-(.
>
> How about this on top of the existing patch:
>
> diff --git a/arch/arm64/kvm/vgic/vgic-irqfd.c
> b/arch/arm64/kvm/vgic/vgic-irqfd.c
> index 11a9f81115ab..6e5ca04d5589 100644
> --- a/arch/arm64/kvm/vgic/vgic-irqfd.c
> +++ b/arch/arm64/kvm/vgic/vgic-irqfd.c
> @@ -115,19 +115,23 @@ int kvm_arch_set_irq_inatomic(struct
> kvm_kernel_irq_routing_entry *e,
>  		struct kvm_msi msi;
>
>  		if (!vgic_has_its(kvm))
> -			return -EINVAL;
> +			break;
>
>  		kvm_populate_msi(e, &msi);
>  		return vgic_its_inject_cached_translation(kvm, &msi);
>  	}
>
>  	case KVM_IRQ_ROUTING_IRQCHIP:
> -		/* Injecting SPIs is always possible in atomic context */
> +		/*
> +		 * Injecting SPIs is always possible in atomic context
> +		 * as long as the damn vgic is initialized.
> +		 */
> +		if (unlikely(!vgic_initialized(kvm)))
> +			break;

Yes this should prevent the wait situation. But may be worth to deep
into the call stack. Won't analyzers complain at some point?

Thanks

Eric

>  		return vgic_irqfd_set_irq(e, kvm, irq_source_id, 1, line_status);
> -
> -	default:
> -		return -EWOULDBLOCK;
>  	}
> +
> +	return -EWOULDBLOCK;
>  }
>
>  int kvm_vgic_setup_default_irq_routing(struct kvm *kvm)
>
>
> Thanks,
>
> M.
Re: [PATCH 3/3] KVM: arm64: Enforce PtrAuth being disabled if not advertized
Hi Mark,

On 2020-06-04 16:39, Mark Rutland wrote:
> Hi Marc,
>
> On Thu, Jun 04, 2020 at 02:33:54PM +0100, Marc Zyngier wrote:
>> Even if we don't expose PtrAuth to a guest, the guest can still
>> write to its SCTLR_EL1 register and set the En{I,D}{A,B} bits,
>> and execute PtrAuth instructions from the NOP space. This has
>> the effect of trapping to EL2, and we currently inject an UNDEF.
>
> I think it's worth noting that this is an ill-behaved guest, as those
> bits are RES0 when pointer authentication isn't implemented.
>
> The rationale for RES0/RES1 bits is that new HW can rely on old SW
> programming them with the 0/1 as appropriate, and that old SW that
> does not do so may encounter behaviour which from its PoV is
> UNPREDICTABLE. The SW side of the contract is that you must program
> them as 0/1 unless you know they're allocated with a specific meaning.
>
> With that in mind I think the current behaviour is legitimate: from
> the guest's PoV it's the same as there being a distinct extension
> which it is not aware of where the En{I,D}{A,B} bits means "trap some
> HINTs to EL1".
>
> I don't think that we should attempt to work around broken software
> here unless we absolutely have to, as it only adds complexity for no
> real gain.

Fair enough. I was worried about the behaviour difference between HW
without PtrAuth and a guest to which PtrAuth is not advertised.
Ideally, they should have the same behaviour, but the architecture
feels a bit brittle here.

Anyway, I'll drop this patch, and hopefully no guest will play this
game (they'll know pretty quickly about the issue anyway).

Thanks,

M.
-- 
Jazz is not dead. It just smells funny...