Re: [RFC PATCH] KVM: arm/arm64: Don't let userspace update CNTVOFF once guest is running
On 2015-06-25 11:25, Claudio Fontana wrote:
> On 25.06.2015 11:10, Peter Maydell wrote:
>> On 25 June 2015 at 09:59, Claudio Fontana wrote:
>>> Once the VM is created, I think QEMU should not request kvm to
>>> change the virtual offset of the VM anymore: maybe an unexpected
>>> consequence of QEMU's target-arm/kvm64.c::kvm_arch_put_registers ?
>>
>> Hmm. In general we assume that we can:
>>  * stop the VM
>>  * read all the guest system registers
>>  * write those values back again
>>  * restart the VM
>>
>> if we need to. Is that what's happening here, or are we doing
>> something odder?
>>
>> -- PMM
>
> What I guess could be happening, looking at the code in Linux's
> virt/kvm/arm/arch_timer.c::kvm_arm_timer_set_reg,
> is that QEMU tries to set the KVM_REG_ARM_TIMER_CNT register to exactly
> the previous value, but just because the set function is called, cntvoff
> is updated, since the value provided by the user is apparently assumed
> to be _relative_ to the physical timer.
>
> This is apparent in the code in that function, which says:
>
>     case KVM_REG_ARM_TIMER_CNT: {
>             /* ... */
>             u64 cntvoff = kvm_phys_timer_read() - value;
>             /* ... */
>     }
>
> And this is matched by the corresponding get function,
> kvm_arm_timer_get_reg, where it says:
>
>     case KVM_REG_ARM_TIMER_CNT:
>             return kvm_phys_timer_read() - vcpu->kvm->arch.timer.cntvoff;
>
> The time difference between when the GET is issued by QEMU and when the
> PUT is issued would then account for the difference.

QEMU has the concept of write-back levels: KVM_PUT_RUNTIME_STATE, KVM_PUT_RESET_STATE and KVM_PUT_FULL_STATE. I suspect this register is just sorted into the wrong category, and is thus written as part of the RUNTIME_STATE. We had such bug patterns during the x86 maturing phase as well.

Jan
Re: [PATCH v3 4/4] KVM: x86: Add support for local interrupt requests from userspace
> However, why is the roundtrip to userspace necessary? Could you pass
> the extint index directly as an argument to KVM_INTERRUPT? It's
> backwards-compatible, because KVM_INTERRUPT so far could not be used
> together with an in-kernel LAPIC. If you could do that, you could also
> avoid the new userspace_extint_available field.

Implemented a basic version of this, and ran into some potential issues with this strategy. Supporting PIC masking/unmasking by the CPU/APIC means that either:

A) PIC interrupts need to be bufferable in the kernel (with some way of
   comparing priorities),

B) the APIC state needs to be read in order to fetch the bit as to
   whether or not the PIC is being masked (which I believe can be done
   from userspace via the APIC state ioctl), or

C) something hacky that doesn't conform to the PIC spec but still
   happens to boot common OSes (like buffering the interrupts and
   injecting them in the order of arrival, which is wrong).

Steve
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/3] arm/arm64: speed up spinlocks and atomic ops
spinlock torture tests made it clear that checking mmu_enabled() every
time we call spin_lock is a bad idea. As most tests will want the MMU
enabled the entire time, we can inline a lightweight "nobody disabled
the mmu" check and bail out early. Using a lightweight inlined check
instead of a compile-time flag was suggested by Paolo.

Signed-off-by: Andrew Jones
---
 lib/arm/asm/mmu-api.h | 7 ++-
 lib/arm/mmu.c         | 9 +++--
 2 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/lib/arm/asm/mmu-api.h b/lib/arm/asm/mmu-api.h
index 12fdc57918c27..b2fc900a30add 100644
--- a/lib/arm/asm/mmu-api.h
+++ b/lib/arm/asm/mmu-api.h
@@ -1,7 +1,12 @@
 #ifndef __ASMARM_MMU_API_H_
 #define __ASMARM_MMU_API_H_
 extern pgd_t *mmu_idmap;
-extern bool mmu_enabled(void);
+extern unsigned int mmu_disabled_cpu_count;
+extern bool __mmu_enabled(void);
+static inline bool mmu_enabled(void)
+{
+	return mmu_disabled_cpu_count == 0 || __mmu_enabled();
+}
 extern void mmu_enable(pgd_t *pgtable);
 extern void mmu_disable(void);
 extern void mmu_enable_idmap(void);
diff --git a/lib/arm/mmu.c b/lib/arm/mmu.c
index ad558e177dd1c..e661d4f26e4b7 100644
--- a/lib/arm/mmu.c
+++ b/lib/arm/mmu.c
@@ -15,8 +15,9 @@ extern unsigned long etext;
 pgd_t *mmu_idmap;
 
 static cpumask_t mmu_disabled_cpumask;
+unsigned int mmu_disabled_cpu_count;
 
-bool mmu_enabled(void)
+bool __mmu_enabled(void)
 {
 	int cpu = current_thread_info()->cpu;
@@ -30,7 +31,9 @@ void mmu_enable(pgd_t *pgtable)
 
 	asm_mmu_enable(__pa(pgtable));
 	flush_tlb_all();
-	cpumask_clear_cpu(cpu, &mmu_disabled_cpumask);
+
+	if (cpumask_test_and_clear_cpu(cpu, &mmu_disabled_cpumask))
+		--mmu_disabled_cpu_count;
 }
 
 extern void asm_mmu_disable(void);
@@ -39,6 +42,8 @@ void mmu_disable(void)
 	int cpu = current_thread_info()->cpu;
 
 	cpumask_set_cpu(cpu, &mmu_disabled_cpumask);
+	++mmu_disabled_cpu_count;
+
 	asm_mmu_disable();
 }
-- 
2.4.3
[PATCH 1/3] arm/arm64: Introduce mmu_disable
Allow unit test cpus to disable the MMU. Why not? We want the test
framework to be as flexible as possible. Callers will have to deal with
the cache coherency fallout... Cache flush support is still forthcoming
to the framework, though.

Signed-off-by: Andrew Jones
---
 arm/cstart.S          | 9 +
 arm/cstart64.S        | 8 
 lib/arm/asm/mmu-api.h | 1 +
 lib/arm/mmu.c         | 8 
 4 files changed, 26 insertions(+)

diff --git a/arm/cstart.S b/arm/cstart.S
index 2b7f8f3d200ef..3943867d2f219 100644
--- a/arm/cstart.S
+++ b/arm/cstart.S
@@ -159,6 +159,15 @@ asm_mmu_enable:
 
 	mov	pc, lr
 
+.globl asm_mmu_disable
+asm_mmu_disable:
+	/* SCTLR */
+	mrc	p15, 0, r0, c1, c0, 0
+	bic	r0, #CR_M
+	mcr	p15, 0, r0, c1, c0, 0
+	isb
+	mov	pc, lr
+
 /*
  * Vector stubs
  * Simplified version of the Linux kernel implementation
diff --git a/arm/cstart64.S b/arm/cstart64.S
index cdda13c17af9e..44cff32d0f18e 100644
--- a/arm/cstart64.S
+++ b/arm/cstart64.S
@@ -146,6 +146,14 @@ asm_mmu_enable:
 
 	ret
 
+.globl asm_mmu_disable
+asm_mmu_disable:
+	mrs	x0, sctlr_el1
+	bic	x0, x0, SCTLR_EL1_M
+	msr	sctlr_el1, x0
+	isb
+	ret
+
 /*
  * Vectors
  * Adapted from arch/arm64/kernel/entry.S
diff --git a/lib/arm/asm/mmu-api.h b/lib/arm/asm/mmu-api.h
index 68dc707d67241..c46c4b08b14cc 100644
--- a/lib/arm/asm/mmu-api.h
+++ b/lib/arm/asm/mmu-api.h
@@ -4,6 +4,7 @@ extern pgd_t *mmu_idmap;
 extern bool mmu_enabled(void);
 extern void mmu_set_enabled(void);
 extern void mmu_enable(pgd_t *pgtable);
+extern void mmu_disable(void);
 extern void mmu_enable_idmap(void);
 extern void mmu_init_io_sect(pgd_t *pgtable, unsigned long virt_offset);
 extern void mmu_set_range_sect(pgd_t *pgtable, unsigned long virt_offset,
diff --git a/lib/arm/mmu.c b/lib/arm/mmu.c
index 732000a8eb088..5966b408cb455 100644
--- a/lib/arm/mmu.c
+++ b/lib/arm/mmu.c
@@ -35,6 +35,14 @@ void mmu_enable(pgd_t *pgtable)
 	mmu_set_enabled();
 }
 
+extern void asm_mmu_disable(void);
+void mmu_disable(void)
+{
+	struct thread_info *ti = current_thread_info();
+	cpumask_clear_cpu(ti->cpu, &mmu_enabled_cpumask);
+	asm_mmu_disable();
+}
+
 void mmu_set_range_ptes(pgd_t *pgtable, unsigned long virt_offset,
 			unsigned long phys_start, unsigned long phys_end,
 			pgprot_t prot)
-- 
2.4.3
[PATCH 2/3] arm/arm64: drop mmu_set_enabled
The MMU is enabled automatically for all cpus; they must disable it
themselves if they don't want it on. Switch from managing a cpumask of
enabled cpus to one of disabled cpus. This allows us to remove the
mmu_set_enabled call from secondary_cinit, and the function altogether.

Signed-off-by: Andrew Jones
---
 lib/arm/asm/mmu-api.h |  1 -
 lib/arm/mmu.c         | 21 ++---
 lib/arm/smp.c         |  1 -
 3 files changed, 10 insertions(+), 13 deletions(-)

diff --git a/lib/arm/asm/mmu-api.h b/lib/arm/asm/mmu-api.h
index c46c4b08b14cc..12fdc57918c27 100644
--- a/lib/arm/asm/mmu-api.h
+++ b/lib/arm/asm/mmu-api.h
@@ -2,7 +2,6 @@
 #define __ASMARM_MMU_API_H_
 extern pgd_t *mmu_idmap;
 extern bool mmu_enabled(void);
-extern void mmu_set_enabled(void);
 extern void mmu_enable(pgd_t *pgtable);
 extern void mmu_disable(void);
 extern void mmu_enable_idmap(void);
diff --git a/lib/arm/mmu.c b/lib/arm/mmu.c
index 5966b408cb455..ad558e177dd1c 100644
--- a/lib/arm/mmu.c
+++ b/lib/arm/mmu.c
@@ -14,32 +14,31 @@ extern unsigned long etext;
 
 pgd_t *mmu_idmap;
 
-static cpumask_t mmu_enabled_cpumask;
+static cpumask_t mmu_disabled_cpumask;
+
 bool mmu_enabled(void)
 {
-	struct thread_info *ti = current_thread_info();
-	return cpumask_test_cpu(ti->cpu, &mmu_enabled_cpumask);
-}
+	int cpu = current_thread_info()->cpu;
 
-void mmu_set_enabled(void)
-{
-	struct thread_info *ti = current_thread_info();
-	cpumask_set_cpu(ti->cpu, &mmu_enabled_cpumask);
+	return !cpumask_test_cpu(cpu, &mmu_disabled_cpumask);
 }
 
 extern void asm_mmu_enable(phys_addr_t pgtable);
 void mmu_enable(pgd_t *pgtable)
 {
+	int cpu = current_thread_info()->cpu;
+
 	asm_mmu_enable(__pa(pgtable));
 	flush_tlb_all();
-	mmu_set_enabled();
+	cpumask_clear_cpu(cpu, &mmu_disabled_cpumask);
 }
 
 extern void asm_mmu_disable(void);
 void mmu_disable(void)
 {
-	struct thread_info *ti = current_thread_info();
-	cpumask_clear_cpu(ti->cpu, &mmu_enabled_cpumask);
+	int cpu = current_thread_info()->cpu;
+
+	cpumask_set_cpu(cpu, &mmu_disabled_cpumask);
 	asm_mmu_disable();
 }
diff --git a/lib/arm/smp.c b/lib/arm/smp.c
index f389ba6598faa..ca435dcd5f4a2 100644
--- a/lib/arm/smp.c
+++ b/lib/arm/smp.c
@@ -23,7 +23,6 @@ secondary_entry_fn secondary_cinit(void)
 	secondary_entry_fn entry;
 
 	thread_info_init(ti, 0);
-	mmu_set_enabled();
 
 	/*
 	 * Save secondary_data.entry locally to avoid opening a race
-- 
2.4.3
[PATCH 0/3] arm/arm64: rework mmu_enabled, for spinlock speedup
Andrew Jones (3):
  arm/arm64: Introduce mmu_disable
  arm/arm64: drop mmu_set_enabled
  arm/arm64: speed up spinlocks and atomic ops

 arm/cstart.S          |  9 +
 arm/cstart64.S        |  8 
 lib/arm/asm/mmu-api.h |  9 +++--
 lib/arm/mmu.c         | 32 ++--
 lib/arm/smp.c         |  1 -
 5 files changed, 46 insertions(+), 13 deletions(-)

-- 
2.4.3
Re: [Qemu-devel] [PATCH v5] i386: Introduce ARAT CPU feature
On Sun, Jun 07, 2015 at 11:15:08AM +0200, Jan Kiszka wrote:
> From: Jan Kiszka
> 
> ARAT signals that the APIC timer does not stop in power saving states.
> As our APICs are emulated, it's fine to expose this feature to guests,
> at least when asking for KVM host features or with CPU types that
> include the flag. The exact model number that introduced the feature is
> not known, but reports can be found that it's at least available since
> Sandy Bridge.
> 
> Signed-off-by: Jan Kiszka

Reviewed-by: Eduardo Habkost

Applied to the x86 tree. Thanks!

-- 
Eduardo
Re: [PATCH RFC] vhost: add ioctl to query nregions upper limit
On Wed, 24 Jun 2015 17:08:56 +0200
"Michael S. Tsirkin" wrote:

> On Wed, Jun 24, 2015 at 04:52:29PM +0200, Igor Mammedov wrote:
> > On Wed, 24 Jun 2015 16:17:46 +0200
> > "Michael S. Tsirkin" wrote:
> > 
> > > On Wed, Jun 24, 2015 at 04:07:27PM +0200, Igor Mammedov wrote:
> > > > On Wed, 24 Jun 2015 15:49:27 +0200
> > > > "Michael S. Tsirkin" wrote:
> > > > 
> > > > > Userspace currently simply tries to give vhost as many regions
> > > > > as it happens to have, but you only have the mem table
> > > > > when you have initialized a large part of VM, so graceful
> > > > > failure is very hard to support.
> > > > > 
> > > > > The result is that userspace tends to fail catastrophically.
> > > > > 
> > > > > Instead, add a new ioctl so userspace can find out how much
> > > > > kernel supports, up front. This returns a positive value that
> > > > > we commit to.
> > > > > 
> > > > > Also, document our contract with legacy userspace: when
> > > > > running on an old kernel, you get -1 and you can assume at
> > > > > least 64 slots. Since 0 value's left unused, let's make that
> > > > > mean that the current userspace behaviour (trial and error)
> > > > > is required, just in case we want it back.
> > > > > 
> > > > > Signed-off-by: Michael S. Tsirkin
> > > > > Cc: Igor Mammedov
> > > > > Cc: Paolo Bonzini
> > > > > ---
> > > > >  include/uapi/linux/vhost.h | 17 -
> > > > >  drivers/vhost/vhost.c      |  5 +
> > > > >  2 files changed, 21 insertions(+), 1 deletion(-)
> > > > > 
> > > > > diff --git a/include/uapi/linux/vhost.h b/include/uapi/linux/vhost.h
> > > > > index ab373191..f71fa6d 100644
> > > > > --- a/include/uapi/linux/vhost.h
> > > > > +++ b/include/uapi/linux/vhost.h
> > > > > @@ -80,7 +80,7 @@ struct vhost_memory {
> > > > >   * Allows subsequent call to VHOST_OWNER_SET to succeed. */
> > > > >  #define VHOST_RESET_OWNER _IO(VHOST_VIRTIO, 0x02)
> > > > > 
> > > > > -/* Set up/modify memory layout */
> > > > > +/* Set up/modify memory layout: see also VHOST_GET_MEM_MAX_NREGIONS below. */
> > > > >  #define VHOST_SET_MEM_TABLE _IOW(VHOST_VIRTIO, 0x03, struct vhost_memory)
> > > > > 
> > > > >  /* Write logging setup. */
> > > > > @@ -127,6 +127,21 @@ struct vhost_memory {
> > > > >  /* Set eventfd to signal an error */
> > > > >  #define VHOST_SET_VRING_ERR _IOW(VHOST_VIRTIO, 0x22, struct vhost_vring_file)
> > > > > 
> > > > > +/* Query upper limit on nregions in VHOST_SET_MEM_TABLE arguments.
> > > > > + * Returns:
> > > > > + *	0 < value <= MAX_INT - gives the upper limit, higher values will fail
> > > > > + *	0 - there's no static limit: try and see if it works
> > > > > + *	-1 - on failure
> > > > > + */
> > > > > +#define VHOST_GET_MEM_MAX_NREGIONS _IO(VHOST_VIRTIO, 0x23)
> > > > > +
> > > > > +/* Returned by VHOST_GET_MEM_MAX_NREGIONS to mean there's no static limit:
> > > > > + * try and it'll work if you are lucky. */
> > > > > +#define VHOST_MEM_MAX_NREGIONS_NONE 0
> > > > is it needed? we always have a limit,
> > > > or don't have IOCTL => -1 => old try and see way
> > > > 
> > > > > +/* We support at least as many nregions in VHOST_SET_MEM_TABLE:
> > > > > + * for use on legacy kernels without VHOST_GET_MEM_MAX_NREGIONS support. */
> > > > > +#define VHOST_MEM_MAX_NREGIONS_DEFAULT 64
> > > > ^^^ not used below,
> > > > if it's for legacy then perhaps s/DEFAULT/LEGACY/
> > > 
> > > The assumption was that userspace detecting old kernels will just
> > > use 64, this means we do want a flag to get the old way.
> > > 
> > > OTOH if you won't think it's useful, let me know.
> > this header will be synced into QEMU's tree so that we could use
> > this define there, isn't it? IMHO then _LEGACY is more exact
> > description of macro.
> > 
> > As for 0 return value, -1 is just fine for detecting old kernels
> > (i.e. try and see if it works), so 0 looks unnecessary but it
> > doesn't in any way hurt either. For me limit or -1 is enough to try
> > fix userspace.
> 
> OK.
> Do you want to try now before I do v2?

I've just tried; the idea to check the limit is unusable in this case.
Here is a link to a patch that implements it:
https://github.com/imammedo/qemu/commits/vhost_slot_limit_check

The slot count changes dynamically depending on the devices used, and,
more importantly, the guest OS can change the slot count at runtime:
while managing devices it can trigger repartitioning of the current
memory table, as the devices' memory regions are mapped into the
address space. That leads to two different values of used slots at
guest startup time and after the guest has booted (or after hotplug).

In my case the guest could be started with at most 58 DIMMs
coldplugged, but after boot 3 more slots are freed and it's possible to
hot-add 3 more DIMMs. That however leads to a guest that can't be
migrated to, since by QEMU design all hotplugged devices should be
present at the target's startup time, i.e. 60 DIMMs total, which
obviously goes above vhost limit
Re: [PATCH 2/3] arm/arm64: speed up spinlocks and atomic ops
On Thu, Jun 25, 2015 at 06:23:48PM +0200, Paolo Bonzini wrote:
> On 25/06/2015 18:12, Andrew Jones wrote:
> > spinlock torture tests made it clear that checking mmu_enabled()
> > every time we call spin_lock is a bad idea. As most tests will
> > want the MMU enabled the entire time, then just hard code
> > mmu_enabled() to true. Tests that want to play with the MMU can
> > be compiled with CONFIG_MAY_DISABLE_MMU to get the actual check
> > back.
> 
> This doesn't work if you compile mmu.o just once. Can you make
> something like
> 
> 	static inline bool mmu_enabled(void)
> 	{
> 		return disabled_mmu_cpu_count == 0 || __mmu_enabled();
> 	}
> 
> 	...
> 
> 	bool __mmu_enabled(void)
> 	{
> 		struct thread_info *ti = current_thread_info();
> 		return cpumask_test_cpu(ti->cpu, &mmu_enabled_cpumask);
> 	}
> 
> ?

Agreed. But I might as well actually add the support for disabling the mmu, if I'm going to add yet another variable dependent on it. We should drop this patch from this series, and I'll submit another series of a few patches that

 - introduce mmu_disable
 - switch to assuming the mmu is enabled, and then manage disabled state instead
 - optimize mmu_enabled()

Thanks,
drew

> Paolo
Re: [PATCH 2/3] arm/arm64: speed up spinlocks and atomic ops
On 25/06/2015 18:12, Andrew Jones wrote:
> spinlock torture tests made it clear that checking mmu_enabled()
> every time we call spin_lock is a bad idea. As most tests will
> want the MMU enabled the entire time, then just hard code
> mmu_enabled() to true. Tests that want to play with the MMU can
> be compiled with CONFIG_MAY_DISABLE_MMU to get the actual check
> back.

This doesn't work if you compile mmu.o just once. Can you make something like

	static inline bool mmu_enabled(void)
	{
		return disabled_mmu_cpu_count == 0 || __mmu_enabled();
	}

	...

	bool __mmu_enabled(void)
	{
		struct thread_info *ti = current_thread_info();
		return cpumask_test_cpu(ti->cpu, &mmu_enabled_cpumask);
	}

?

Paolo
[PATCH 1/3] arm/arm64: spinlocks: fix memory barriers
It shouldn't be necessary to use a barrier on the way into spin_lock.
We'll be focused on a single address until we get it (exclusively) set,
and then we'll do a barrier on the way out. Also, it does make sense to
do a barrier on the way in to spin_unlock, i.e. ensure what we did in
the critical section is ordered wrt what we do outside it, before we
announce that we're outside.

Signed-off-by: Andrew Jones
---
 lib/arm/spinlock.c   | 8 
 lib/arm64/spinlock.c | 5 ++---
 2 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/lib/arm/spinlock.c b/lib/arm/spinlock.c
index 3b023ceaebf71..116ea5d7db930 100644
--- a/lib/arm/spinlock.c
+++ b/lib/arm/spinlock.c
@@ -7,10 +7,9 @@ void spin_lock(struct spinlock *lock)
 {
 	u32 val, fail;
 
-	dmb();
-
 	if (!mmu_enabled()) {
 		lock->v = 1;
+		smp_mb();
 		return;
 	}
 
@@ -25,11 +24,12 @@ void spin_lock(struct spinlock *lock)
 		: "r" (&lock->v)
 		: "cc" );
 	} while (fail);
-	dmb();
+
+	smp_mb();
 }
 
 void spin_unlock(struct spinlock *lock)
 {
+	smp_mb();
 	lock->v = 0;
-	dmb();
 }
diff --git a/lib/arm64/spinlock.c b/lib/arm64/spinlock.c
index 68b68b75ba60d..a3907f03cacda 100644
--- a/lib/arm64/spinlock.c
+++ b/lib/arm64/spinlock.c
@@ -13,10 +13,9 @@ void spin_lock(struct spinlock *lock)
 {
 	u32 val, fail;
 
-	smp_mb();
-
 	if (!mmu_enabled()) {
 		lock->v = 1;
+		smp_mb();
 		return;
 	}
 
@@ -35,9 +34,9 @@ void spin_lock(struct spinlock *lock)
 
 void spin_unlock(struct spinlock *lock)
 {
+	smp_mb();
 	if (mmu_enabled())
 		asm volatile("stlrh	wzr, [%0]" :: "r" (&lock->v));
 	else
 		lock->v = 0;
-	smp_mb();
 }
-- 
2.4.3
[PATCH 0/3] arm/arm64: tcg_baremetal_tests inspired patches
While porting the tests in virtualopensystems' tcg_baremetal_tests to
kvm-unit-tests, I found a couple of places to improve the spinlock
implementation. I also added a way to build a single test that doesn't
necessarily have an entry in the makefile, which should be handy for
experimental tests.

Andrew Jones (3):
  arm/arm64: spinlocks: fix memory barriers
  arm/arm64: allow building a single test
  arm/arm64: speed up spinlocks and atomic ops

 config/config-arm-common.mak | 6 ++
 lib/arm/asm/mmu-api.h        | 4 
 lib/arm/mmu.c                | 3 +++
 lib/arm/spinlock.c           | 8 
 lib/arm64/spinlock.c         | 5 ++---
 5 files changed, 19 insertions(+), 7 deletions(-)

-- 
2.4.3
[PATCH 3/3] arm/arm64: allow building a single test
This is mostly useful for building new tests that don't yet (and may
never) have entries in the makefiles (config-arm*.mak). Of course it
can be used to build tests that do have entries as well, in order to
avoid building all tests, if the plan is to run just the one.

Just do 'make TEST=some-test' to use it, where "some-test" matches the
name of the source file, i.e. arm/some-test.c

Signed-off-by: Andrew Jones
---
 config/config-arm-common.mak | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/config/config-arm-common.mak b/config/config-arm-common.mak
index 314261ef60cf7..2ad29c115f946 100644
--- a/config/config-arm-common.mak
+++ b/config/config-arm-common.mak
@@ -12,6 +12,11 @@ endif
 tests-common = \
 	$(TEST_DIR)/selftest.flat
 
+ifneq ($(TEST),)
+	tests = $(TEST_DIR)/$(TEST).flat
+	tests-common =
+endif
+
 all: test_cases
 
 ##
@@ -69,4 +74,5 @@ generated_files = $(asm-offsets)
 
 test_cases: $(generated_files) $(tests-common) $(tests)
 
+$(TEST_DIR)/$(TEST).elf: $(cstart.o) $(TEST_DIR)/$(TEST).o
 $(TEST_DIR)/selftest.elf: $(cstart.o) $(TEST_DIR)/selftest.o
-- 
2.4.3
[PATCH 2/3] arm/arm64: speed up spinlocks and atomic ops
spinlock torture tests made it clear that checking mmu_enabled() every
time we call spin_lock is a bad idea. As most tests will want the MMU
enabled the entire time, just hard code mmu_enabled() to true. Tests
that want to play with the MMU can be compiled with
CONFIG_MAY_DISABLE_MMU to get the actual check back.

Signed-off-by: Andrew Jones
---
 lib/arm/asm/mmu-api.h | 4 
 lib/arm/mmu.c         | 3 +++
 2 files changed, 7 insertions(+)

diff --git a/lib/arm/asm/mmu-api.h b/lib/arm/asm/mmu-api.h
index 68dc707d67241..1a4d91163c398 100644
--- a/lib/arm/asm/mmu-api.h
+++ b/lib/arm/asm/mmu-api.h
@@ -1,7 +1,11 @@
 #ifndef __ASMARM_MMU_API_H_
 #define __ASMARM_MMU_API_H_
 extern pgd_t *mmu_idmap;
+#ifdef CONFIG_MAY_DISABLE_MMU
 extern bool mmu_enabled(void);
+#else
+#define mmu_enabled() (1)
+#endif
 extern void mmu_set_enabled(void);
 extern void mmu_enable(pgd_t *pgtable);
 extern void mmu_enable_idmap(void);
diff --git a/lib/arm/mmu.c b/lib/arm/mmu.c
index 732000a8eb088..405717b6332bf 100644
--- a/lib/arm/mmu.c
+++ b/lib/arm/mmu.c
@@ -15,11 +15,14 @@ extern unsigned long etext;
 pgd_t *mmu_idmap;
 
 static cpumask_t mmu_enabled_cpumask;
+
+#ifdef CONFIG_MAY_DISABLE_MMU
 bool mmu_enabled(void)
 {
 	struct thread_info *ti = current_thread_info();
 	return cpumask_test_cpu(ti->cpu, &mmu_enabled_cpumask);
 }
+#endif
 
 void mmu_set_enabled(void)
 {
-- 
2.4.3
Re: [v4 08/16] KVM: kvm-vfio: User API for IRQ forwarding
On Thu, 2015-06-25 at 09:37 +0000, Wu, Feng wrote:
> > -----Original Message-----
> > From: Joerg Roedel [mailto:j...@8bytes.org]
> > Sent: Wednesday, June 24, 2015 11:46 PM
> > To: Alex Williamson
> > Cc: Wu, Feng; Eric Auger; Avi Kivity; kvm@vger.kernel.org;
> > linux-ker...@vger.kernel.org; pbonz...@redhat.com; mtosa...@redhat.com
> > Subject: Re: [v4 08/16] KVM: kvm-vfio: User API for IRQ forwarding
> > 
> > On Thu, Jun 18, 2015 at 02:04:08PM -0600, Alex Williamson wrote:
> > > There are plenty of details to be filled in,
> > 
> > I also need to fill plenty of details in my head first, so here are some
> > suggestions based on my current understanding. Please don't hesitate to
> > correct me where I got something wrong.
> > 
> > So first I totally agree that the handling of PI/non-PI configurations
> > should be transparent to user-space.
> 
> After thinking about this a bit more, I recall why I used user-space to
> trigger the IRTE update for posted-interrupts; here is the reason:
> 
> Let's take MSI as an example. When the guest updates the MSI
> configuration, here is the code path in QEMU and KVM:
> 
> vfio_update_msi() --> vfio_update_kvm_msi_virq() -->
> kvm_irqchip_update_msi_route() --> kvm_update_routing_entry() -->
> kvm_irqchip_commit_routes() --> KVM_SET_GSI_ROUTING -->
> kvm_set_irq_routing()
> 
> It finally ends up in kvm_set_irq_routing() in KVM, where there are two
> problems:
> 1. It uses RCU in this function, so it is hard to find which entry in
> the irq routing table is being updated.
> 2. Even if we find the updated entry, it is hard to find the assigned
> device associated with this irq routing entry.
> 
> So I used a VFIO API to notify KVM of the updated MSI/MSI-X
> configuration and the associated assigned devices. I think we need to
> find a way to address the above two issues before going forward. Alex,
> what is your opinion?

So the trouble is that QEMU vfio updates a single MSI vector, but that just updates a single entry within a whole table of routes; then the whole table is pushed to KVM. But in kvm_set_irq_routing() we have access to both the new and the old tables, so we do have the ability to detect the change. We can therefore detect which GSI changed and cross-reference that to KVM's irqfds. If we have an irqfd that matches the GSI then we have all the information we need, right? We can use the eventfd_ctx of the irqfd to call into the IRQ bypass manager if we need to. If it's an irqfd that's already enabled for bypass then we may already have the data we need to tweak the PI config.

Yes, I agree it's more difficult, but it doesn't appear to be impossible, right? Thanks,

Alex
Re: [PATCH] arm64/kvm: Add generic v8 KVM target
On 25 June 2015 at 14:44, Marc Zyngier wrote:
> It should always be possible to emulate a "known" CPU on a generic host,
> and it should be able to migrate. The case we can't migrate is when we
> let the guest be generic (which I guess should really be unknown, and
> not generic).
>
> So if the user specifies "-cpu cortex-a57" on the command line, the
> guest should be able to migrate from an A72 to an A53. If the user
> specifies "-cpu host", the resulting guest won't be able to migrate.
>
> Does it make sense?

Yes. We've always said "-cpu host" won't be cross-host migratable from a QEMU point of view.

-- PMM
Re: [PATCH] arm64/kvm: Add generic v8 KVM target
On 25/06/15 13:40, Marc Zyngier wrote:
> On 25/06/15 13:30, Christoffer Dall wrote:
>> On Wed, Jun 24, 2015 at 10:32:38AM +0100, Marc Zyngier wrote:
>>> Hi Christoffer,
>>>
>>> On 24/06/15 09:51, Christoffer Dall wrote:
>>>> On Wed, Jun 24, 2015 at 09:29:56AM +0100, Marc Zyngier wrote:
>>>>> On 22/06/15 09:44, Peter Maydell wrote:
>>>>>> On 17 June 2015 at 10:00, Suzuki K. Poulose wrote:
>>>>>>> From: "Suzuki K. Poulose"
>>>>>>>
>>>>>>> This patch adds a generic ARM v8 KVM target cpu type for use
>>>>>>> by the new CPUs which eventually ends up using the common sys_reg
>>>>>>> table. For backward compatibility the existing targets have been
>>>>>>> preserved. Any new target CPU that can be covered by generic v8
>>>>>>> sys_reg tables should make use of the new generic target.
>>>>>>
>>>>>> How do you intend this to work for cross-host migration?
>>>>>
>>>>> It is not meant to work for cross migration at all.
>>>>>
>>>>>> Is the idea that the kernel guarantees that "generic" looks
>>>>>> 100% the same to the guest regardless of host hardware? I'm
>>>>>> not sure that can be made to work, given impdef differences
>>>>>> in ID register values, bp/wp registers, and so on.
>>>>>>
>>>>>> Given that, it seems to me that we still need to provide
>>>>>> KVM_ARM_TARGET_$THISCPU defines so userspace can request
>>>>>> a specific guest CPU flavour; so what does this patch
>>>>>> provide that isn't already provided by just having userspace
>>>>>> query for the "preferred" CPU type as it does already?
>>>>>
>>>>> The way I see this working is that a "generic" CPU cannot be migrated
>>>>> (because we don't know anything about it). If it can be identified as a
>>>>> known (non generic) implementation, then we can migrate it.
>>>>>
>>>> Concretely, how should this work? Be enforced by userspace or should we
>>>> deny certain SET_ONE_REG operations from working on this target?
>>>
>>> I can definitely see MIDR overriding failing on a generic CPU. Also, you
>>> shouldn't be able to select a generic CPU if we know about the physical CPU.
>>>
>>
>> If I am migrating from an A57 to an A53, but the VM was started as
>> "generic CPU" then we rely on the user discovering that this is not
>> supported when trying to set the MIDR from userspace in KVM?
>
> A57 to A53 is probably a bad example because we actively support both.
> So let's replace your A57 with an A72. With this patch, the A72 would be
> reported as "generic CPU", and MIDR access would fail on the A53.

Having thought about this a bit more...

It should always be possible to emulate a "known" CPU on a generic host, and it should be able to migrate. The case we can't migrate is when we let the guest be generic (which I guess should really be unknown, and not generic).

So if the user specifies "-cpu cortex-a57" on the command line, the guest should be able to migrate from an A72 to an A53. If the user specifies "-cpu host", the resulting guest won't be able to migrate.

Does it make sense?

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...
Re: [PATCH] arm64/kvm: Add generic v8 KVM target
On 25/06/15 13:30, Christoffer Dall wrote:
> On Wed, Jun 24, 2015 at 10:32:38AM +0100, Marc Zyngier wrote:
>> Hi Christoffer,
>>
>> On 24/06/15 09:51, Christoffer Dall wrote:
>>> On Wed, Jun 24, 2015 at 09:29:56AM +0100, Marc Zyngier wrote:
>>>> On 22/06/15 09:44, Peter Maydell wrote:
>>>>> On 17 June 2015 at 10:00, Suzuki K. Poulose wrote:
>>>>>> From: "Suzuki K. Poulose"
>>>>>>
>>>>>> This patch adds a generic ARM v8 KVM target cpu type for use
>>>>>> by the new CPUs which eventually ends up using the common sys_reg
>>>>>> table. For backward compatibility the existing targets have been
>>>>>> preserved. Any new target CPU that can be covered by generic v8
>>>>>> sys_reg tables should make use of the new generic target.
>>>>>
>>>>> How do you intend this to work for cross-host migration?
>>>>
>>>> It is not meant to work for cross migration at all.
>>>>
>>>>> Is the idea that the kernel guarantees that "generic" looks
>>>>> 100% the same to the guest regardless of host hardware? I'm
>>>>> not sure that can be made to work, given impdef differences
>>>>> in ID register values, bp/wp registers, and so on.
>>>>>
>>>>> Given that, it seems to me that we still need to provide
>>>>> KVM_ARM_TARGET_$THISCPU defines so userspace can request
>>>>> a specific guest CPU flavour; so what does this patch
>>>>> provide that isn't already provided by just having userspace
>>>>> query for the "preferred" CPU type as it does already?
>>>>
>>>> The way I see this working is that a "generic" CPU cannot be migrated
>>>> (because we don't know anything about it). If it can be identified as a
>>>> known (non generic) implementation, then we can migrate it.
>>>>
>>> Concretely, how should this work? Be enforced by userspace or should we
>>> deny certain SET_ONE_REG operations from working on this target?
>>
>> I can definitely see MIDR overriding failing on a generic CPU. Also, you
>> shouldn't be able to select a generic CPU if we know about the physical CPU.
>>
>
> If I am migrating from an A57 to an A53, but the VM was started as
> "generic CPU" then we rely on the user discovering that this is not
> supported when trying to set the MIDR from userspace in KVM?

A57 to A53 is probably a bad example because we actively support both. So let's replace your A57 with an A72. With this patch, the A72 would be reported as "generic CPU", and MIDR access would fail on the A53.

Admittedly, this is a bit late. We could also refuse to instantiate a "generic CPU" on an A53, but that's not much better timing-wise. Not sure we can do much better, though.

	M.
-- 
Jazz is not dead. It just smells funny...
Re: [PATCH] arm64/kvm: Add generic v8 KVM target
On Wed, Jun 24, 2015 at 10:32:38AM +0100, Marc Zyngier wrote: > Hi Christoffer, > > On 24/06/15 09:51, Christoffer Dall wrote: > > On Wed, Jun 24, 2015 at 09:29:56AM +0100, Marc Zyngier wrote: > >> On 22/06/15 09:44, Peter Maydell wrote: > >>> On 17 June 2015 at 10:00, Suzuki K. Poulose > >>> wrote: > From: "Suzuki K. Poulose" > > This patch adds a generic ARM v8 KVM target cpu type for use > by the new CPUs which eventualy ends up using the common sys_reg > table. For backward compatibility the existing targets have been > preserved. Any new target CPU that can be covered by generic v8 > sys_reg tables should make use of the new generic target. > >>> > >>> How do you intend this to work for cross-host migration? > >> > >> It is not meant to work for cross migration at all. > >> > >>> Is the idea that the kernel guarantees that "generic" looks > >>> 100% the same to the guest regardless of host hardware? I'm > >>> not sure that can be made to work, given impdef differences > >>> in ID register values, bp/wp registers, and so on. > >>> > >>> Given that, it seems to me that we still need to provide > >>> KVM_ARM_TARGET_$THISCPU defines so userspace can request > >>> a specific guest CPU flavour; so what does this patch > >>> provide that isn't already provided by just having userspace > >>> query for the "preferred" CPU type as it does already? > >> > >> The way I see this working is that a "generic" CPU cannot be migrated > >> (because we don't know anything about it). If it can be identified as a > >> known (non generic) implementation, then we can migrate it. > >> > > Concretely, how should this work? Be enforced by userspace or should we > > deny certain SET_ONE_REG operations from working on this target? > > I can definitely see MIDR overriding failing on a generic CPU. Also, you > shouldn't be able to select a generic CPU if we know about the physical CPU. 
>> If I am migrating from an A57 to an A53, but the VM was started as
>> "generic CPU" then we rely on the user discovering that this is not
>> supported when trying to set the MIDR from userspace in KVM?
>>
>> Also, can we imagine any scenario where the generic CPU cannot be
>> modeled for a VM on a specific piece of hardware (current or future)?
>
> What is the definition of a generic CPU here? In the above, generic
> really means "Unknown", so I can't immediately see what it would mean to
> model this.

Ok, good way to phrase it, I suppose that's a non-issue then.

Thanks,
-Christoffer
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v6 10/12] KVM: arm64: guest debug, HW assisted debug support
Christoffer Dall writes:

> On Thu, Jun 25, 2015 at 07:38:33AM +0100, Alex Bennée wrote:
>>
>> Christoffer Dall writes:
>>
>> > On Fri, Jun 19, 2015 at 01:23:48PM +0100, Alex Bennée wrote:
>> >> This adds support for userspace to control the HW debug registers for
>> >> guest debug. In the debug ioctl we copy the IMPDEF defined number of
>>
>> >>  void kvm_arm_clear_debug(struct kvm_vcpu *vcpu)
>> >>  {
>> >> -	if (vcpu->guest_debug)
>> >> +	if (vcpu->guest_debug) {
>> >>  		restore_guest_debug_regs(vcpu);
>> >> +
>> >> +		/*
>> >> +		 * If we were using HW debug we need to restore the
>> >> +		 * debug_ptr to the guest debug state.
>> >> +		 */
>> >> +		if (vcpu->guest_debug & KVM_GUESTDBG_USE_HW)
>> >> +			vcpu->arch.debug_ptr = &vcpu->arch.vcpu_debug_state;
>> >
>> > I still think this would be more cleanly done in the setup_debug
>> > function, but ok:
>>
>> I don't follow, setup_debug is called before we enter KVM. It's pretty
>> light when no debugging is being done so this ensures we leave state how
>> we would like it when we stop debugging.
>>
>> I can move it to an else leg in setup if you really want.
>>
> I just feel like whenever you enter the guest you setup the state you
> want for your guest and then when reading the code you never have to
> worry about "did I set the pointer back correctly last time it exited",
> but thinking about your response, I guess that's an extra store on each
> world-switch, so theoretically that may be a bit more overhead (on top
> of the hundreds of other stores and spinlocks we take and stuff).

The setup/clear() calls are tightly paired around the KVM_RUN ioctl code
without any obvious exit points. Are there any cases where you can escape
the ioctl code flow? I notice IRQs are re-enabled, so I guess a suitably
determined IRQ handler could change the return address or mess around
with guest_debug.
> If you prefer, leave it like this, but consider adding a > BUG_ON(!guest_debugging && debug_ptr != &vcpu->arch.vcpu_debug_state) in > the setup function... The clear_debug() code would end up being a fairly sparse piece of code without it ;-) > I'm probably being paranoid. A little paranoia goes a long way in kernel mode ;-) > > -Christoffer -- Alex Bennée -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 01/11] kvm/x86: move Hyper-V MSR's/hypercall code into hyperv.c file
On 22/06/15 19:04, Denis V. Lunev wrote:

From: Andrey Smetanin

This patch introduces a Hyper-V related source code file - hyperv.c - and
per-vm and per-vcpu hyperv context structures. All Hyper-V MSRs and
hypercall code moved into hyperv.c. All Hyper-V kvm/vcpu fields moved
into the appropriate hyperv context structures. Copyright and author
information copied from x86.c to hyperv.c.

Signed-off-by: Andrey Smetanin
Signed-off-by: Denis V. Lunev
CC: Paolo Bonzini
CC: Gleb Natapov

Paolo, Gleb,

we are very sensitive about the first patch. Could you suggest which
tree we should be based on? For now I think that we should use

https://git.kernel.org/cgit/virt/kvm/kvm.git/

but the branch here could also matter.

Den
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 01/11] kvm/x86: move Hyper-V MSR's/hypercall code into hyperv.c file
On 25/06/2015 12:25, Denis V. Lunev wrote: > > Paolo, Gleb, > > we are very sensitive on the first patch. Could you > suggest which tree we should be based on? > For now I think that we should use > > https://git.kernel.org/cgit/virt/kvm/kvm.git/ > > but the branch here could also matters. You can use Linus's tree right now, but in general you should use branch "next" in that tree. Paolo -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3.16.y-ckt 31/71] MIPS: KVM: Do not sign extend on unsigned MMIO load
3.16.7-ckt14 -stable review patch. If anyone has any objections, please let me know.

--

From: Nicholas Mc Guire

commit ed9244e6c534612d2b5ae47feab2f55a0d4b4ced upstream.

Fix possible unintended sign extension in unsigned MMIO loads by casting
to uint16_t in the case of mmio_needed != 2.

Signed-off-by: Nicholas Mc Guire
Reviewed-by: James Hogan
Tested-by: James Hogan
Cc: Gleb Natapov
Cc: Paolo Bonzini
Cc: kvm@vger.kernel.org
Cc: linux-m...@linux-mips.org
Cc: linux-ker...@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/9985/
Signed-off-by: Ralf Baechle
[ luis: backported to 3.16:
  - file rename: emulate.c -> kvm_mips_emul.c ]
Signed-off-by: Luis Henriques
---
 arch/mips/kvm/kvm_mips_emul.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/mips/kvm/kvm_mips_emul.c b/arch/mips/kvm/kvm_mips_emul.c
index 2071472bc3c4..18b4e2fdae33 100644
--- a/arch/mips/kvm/kvm_mips_emul.c
+++ b/arch/mips/kvm/kvm_mips_emul.c
@@ -2130,7 +2130,7 @@ kvm_mips_complete_mmio_load(struct kvm_vcpu *vcpu, struct kvm_run *run)
 		if (vcpu->mmio_needed == 2)
 			*gpr = *(int16_t *) run->mmio.data;
 		else
-			*gpr = *(int16_t *) run->mmio.data;
+			*gpr = *(uint16_t *)run->mmio.data;
 		break;

 	case 1:
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [v4 08/16] KVM: kvm-vfio: User API for IRQ forwarding
> -----Original Message-----
> From: Joerg Roedel [mailto:j...@8bytes.org]
> Sent: Wednesday, June 24, 2015 11:46 PM
> To: Alex Williamson
> Cc: Wu, Feng; Eric Auger; Avi Kivity; kvm@vger.kernel.org;
> linux-ker...@vger.kernel.org; pbonz...@redhat.com; mtosa...@redhat.com
> Subject: Re: [v4 08/16] KVM: kvm-vfio: User API for IRQ forwarding
>
> On Thu, Jun 18, 2015 at 02:04:08PM -0600, Alex Williamson wrote:
> > There are plenty of details to be filled in,
>
> I also need to fill plenty of details in my head first, so here are some
> suggestions based on my current understanding. Please don't hesitate to
> correct me where I got something wrong.
>
> So first I totally agree that the handling of PI/non-PI configurations
> should be transparent to user-space.

After thinking about this a bit more, I recall why I used user-space to
trigger the IRTE update for posted-interrupts. Here is the reason:

Let's take MSI as an example. When the guest updates the MSI
configuration, here is the code path in QEMU and KVM:

vfio_update_msi()
  --> vfio_update_kvm_msi_virq()
  --> kvm_irqchip_update_msi_route()
  --> kvm_update_routing_entry()
  --> kvm_irqchip_commit_routes()
  --> KVM_SET_GSI_ROUTING
  --> kvm_set_irq_routing()

It will finally go to kvm_set_irq_routing() in KVM, where there are two
problems:

1. It uses RCU in this function, so it is hard to find which entry in
   the irq routing table is being updated.
2. Even if we find the updated entry, it is hard to find the assigned
   device associated with this irq routing entry.

So I used a VFIO API to notify KVM of the updated MSI/MSI-X
configuration and the associated assigned devices. I think we need to
find a way to address the above two issues before going forward.

Alex, what is your opinion? Thanks a lot!

Thanks,
Feng

>
> I read a bit through the VT-d spec, and my understanding of posted
> interrupts so far is that:
>
> 1) Each VCPU gets a PI-Descriptor with its pending Posted
>    Interrupts.
This descriptor needs to be updated when a VCPU > is migrated to another PCPU and should thus be under control > of KVM. > > This is similar to the vAPIC backing page in the AMD version > of this, except that the PCPU routing information is stored > somewhere else on AMD. > > 2) As long as the VCPU runs the IRTEs are configured for > posting, when the VCPU goes to sleep the old remapped entry is > established again. So when the VCPU sleeps the interrupt > would get routed to VFIO and forwarded through the eventfd. > > This would be different to the AMD version, where we have a > running bit. When this is clear the IOMMU will trigger an event > in its event-log. This might need special handling in VFIO > ('might' because VFIO does not need to forward the interrupt, > it just needs to make sure the VCPU wakes up). > > Please correct me if my understanding of the Intel version is > wrong. > > So most of the data structures the IOMMU reads for this need to be > updated from KVM code (either x86-generic or AMD/Intel specific code), > as KVM has the information about VCPU load/unload and the IRQ routing. > > What KVM needs from VFIO are the informations about the physical > interrupts, and it makes total sense to attach them as metadata to the > eventfd. > > But the problems start at how this metadata should look like. It would > be good to have some generic description, but not sure if this is > possible. Otherwise this metadata would need to be requested by VFIO > from the IOMMU driver and passed on to KVM, which it then passes back to > the IOMMU driver. Or something like that. > > > > Joerg -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH] KVM: arm/arm64: Don't let userspace update CNTVOFF once guest is running
On 25.06.2015 11:10, Peter Maydell wrote:
> On 25 June 2015 at 09:59, Claudio Fontana wrote:
>> Once the VM is created, I think QEMU should not request kvm to
>> change the virtual offset of the VM anymore: maybe an unexpected
>> consequence of QEMU's target-arm/kvm64.c::kvm_arch_put_registers ?
>
> Hmm. In general we assume that we can:
>  * stop the VM
>  * read all the guest system registers
>  * write those values back again
>  * restart the VM
>
> if we need to. Is that what's happening here, or are we doing
> something odder?
>
> -- PMM
>

What I guess could be happening, by looking at the code in linux

virt/kvm/arm/arch_timer.c::kvm_arm_timer_set_reg

is that QEMU tries to set the KVM_REG_ARM_TIMER_CNT register to exactly
the previous value, but just because of the fact that the set function is
called, cntvoff is updated, since the value provided by the user is
apparently assumed to be _relative_ to the physical timer.

This is apparent to me in the code in that function, which says:

	case KVM_REG_ARM_TIMER_CNT: {
		/* ... */
		u64 cntvoff = kvm_phys_timer_read() - value;
		/* ... */
	}

And this is matched by the corresponding get function,
kvm_arm_timer_get_reg, where it says:

	case KVM_REG_ARM_TIMER_CNT:
		return kvm_phys_timer_read() - vcpu->kvm->arch.timer.cntvoff;

The time difference between when the GET is issued by QEMU and when the
PUT is issued then would account for the difference.

Thanks,

Claudio
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH] KVM: arm/arm64: Don't let userspace update CNTVOFF once guest is running
On 25 June 2015 at 09:59, Claudio Fontana wrote: > Once the VM is created, I think QEMU should not request kvm to > change the virtual offset of the VM anymore: maybe an unexpected > consequence of QEMU's target-arm/kvm64.c::kvm_arch_put_registers ? Hmm. In general we assume that we can: * stop the VM * read all the guest system registers * write those values back again * restart the VM if we need to. Is that what's happening here, or are we doing something odder? -- PMM -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH] KVM: arm/arm64: Don't let userspace update CNTVOFF once guest is running
Hi Christoffer, On 25.06.2015 10:04, Christoffer Dall wrote: > On Wed, Jun 24, 2015 at 03:54:57PM +0100, Marc Zyngier wrote: >> Userspace is allowed to set the guest's view of CNTVCT, which turns >> into setting CNTVOFF for the whole VM. One thing userspace is not supposed >> to do is to update that register while the guest is running. Time will >> either move forward (best case) or backward (really bad idea). Either way, >> this shouldn't happen. >> >> This patch prevents userspace from touching CNTVOFF as soon as a vcpu >> has been started. This ensures that time will keep monotonically increase. >> >> Signed-off-by: Marc Zyngier >> --- >> >> QEMU seems to trigger this at boot time, and I have no idea why it does so. >> It would be good to find out, hence the RFC tag. > > Is this at kernel boot time you see this, or at system startup time? > > IIRC, QEMU creates a throw-away VM with the default CPU target time, > reads out all the system registers to get the KVM reset values of those, > then creates the real VM, and feeds back in all the system register > reset values, as a method for QEMU and KVM to be in sync about the reset > state of the machine. If we do this, and include CNTVCT, then that > would probably trigger this, but the VCPU really shouldn't have been run > at that time... > > We should prevent userspace from fiddling with this register post VCPU > start regardless, but yes, it would be good to find out why this is > happening in the first place. > > How did you notice this and does it manifest itself in some user-visible > ugliness? > > Thanks, > -Christoffer > You can read the whole history here: https://groups.google.com/forum/#!topic/osv-dev/2w101csH65E It causes clock-related bugs with time jumping backward when relying on the virtual counter register in the guest, whenever a cpu is booted (primary, secondary via PSCI), and actually whenever the monitor is used to stop, info registers etc. 
Once the VM is created, I think QEMU should not request kvm to change the virtual offset of the VM anymore: maybe an unexpected consequence of QEMU's target-arm/kvm64.c::kvm_arch_put_registers ? Thanks, Claudio -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH] KVM: arm/arm64: Don't let userspace update CNTVOFF once guest is running
Hi Christoffer, On 25/06/15 09:04, Christoffer Dall wrote: > On Wed, Jun 24, 2015 at 03:54:57PM +0100, Marc Zyngier wrote: >> Userspace is allowed to set the guest's view of CNTVCT, which turns >> into setting CNTVOFF for the whole VM. One thing userspace is not supposed >> to do is to update that register while the guest is running. Time will >> either move forward (best case) or backward (really bad idea). Either way, >> this shouldn't happen. >> >> This patch prevents userspace from touching CNTVOFF as soon as a vcpu >> has been started. This ensures that time will keep monotonically increase. >> >> Signed-off-by: Marc Zyngier >> --- >> >> QEMU seems to trigger this at boot time, and I have no idea why it does so. >> It would be good to find out, hence the RFC tag. > > Is this at kernel boot time you see this, or at system startup time? When the kernel boots. The OSV guys also have seen this (time going backward), and this patch seems to have solved their problem. > IIRC, QEMU creates a throw-away VM with the default CPU target time, > reads out all the system registers to get the KVM reset values of those, > then creates the real VM, and feeds back in all the system register > reset values, as a method for QEMU and KVM to be in sync about the reset > state of the machine. If we do this, and include CNTVCT, then that > would probably trigger this, but the VCPU really shouldn't have been run > at that time... I definitely see this process happening, but that doesn't seem to be the problem. > We should prevent userspace from fiddling with this register post VCPU > start regardless, but yes, it would be good to find out why this is > happening in the first place. My main problem is that I cannot easily correlate what is happening in the guest at that time with something touching CNTVCT. That's just odd. > How did you notice this and does it manifest itself in some user-visible > ugliness? 
It was first reported by the OSV guys, and I then spotted some interesting hrtimers taking too long at guest boot time. Putting some traces quickly identified the issue. Thanks, M. -- Jazz is not dead. It just smells funny... -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [RFC 0/6] KVM: arm/arm64: gsi routing support
Hi! > But personally I would prefer we > check irqchip routing entries have flat mapping, ie gsi = irqchip.pin > since in any case we don't want/expect the userspace to play with that. Why? On x86 userspace perfectly can play with it. We can imagine some very new qemu version in future which has all arch-specific kludges like "direct mapping" removed and relying fully on routing which it sets up from scratch, in the same way as x86 qemu does this. Or we can imagine some new, legacy-free hypervisor implementation. The gsi == irqchip.pin limitation is just what we have now by default, for backwards compatibility. But by design we were never obliged to stick to this. Well, it's just MHO... Kind regards, Pavel Fedin Expert Engineer Samsung Electronics Research center Russia -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH] KVM: arm/arm64: Don't let userspace update CNTVOFF once guest is running
On Wed, Jun 24, 2015 at 03:54:57PM +0100, Marc Zyngier wrote:
> Userspace is allowed to set the guest's view of CNTVCT, which turns
> into setting CNTVOFF for the whole VM. One thing userspace is not supposed
> to do is to update that register while the guest is running. Time will
> either move forward (best case) or backward (really bad idea). Either way,
> this shouldn't happen.
>
> This patch prevents userspace from touching CNTVOFF as soon as a vcpu
> has been started. This ensures that time keeps monotonically increasing.
>
> Signed-off-by: Marc Zyngier
> ---
>
> QEMU seems to trigger this at boot time, and I have no idea why it does so.
> It would be good to find out, hence the RFC tag.

Is it at kernel boot time that you see this, or at system startup time?

IIRC, QEMU creates a throw-away VM with the default CPU target type,
reads out all the system registers to get the KVM reset values of those,
then creates the real VM, and feeds back in all the system register
reset values, as a method for QEMU and KVM to be in sync about the reset
state of the machine. If we do this, and include CNTVCT, then that
would probably trigger this, but the VCPU really shouldn't have been run
at that time...

We should prevent userspace from fiddling with this register post VCPU
start regardless, but yes, it would be good to find out why this is
happening in the first place.

How did you notice this, and does it manifest itself in some user-visible
ugliness?
Thanks,
-Christoffer

>
>  virt/kvm/arm/arch_timer.c | 21 +++++++++++++++++--
>  1 file changed, 19 insertions(+), 2 deletions(-)
>
> diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
> index 98c95f2..660f348 100644
> --- a/virt/kvm/arm/arch_timer.c
> +++ b/virt/kvm/arm/arch_timer.c
> @@ -220,9 +220,26 @@ int kvm_arm_timer_set_reg(struct kvm_vcpu *vcpu, u64 regid, u64 value)
>  	case KVM_REG_ARM_TIMER_CTL:
>  		timer->cntv_ctl = value;
>  		break;
> -	case KVM_REG_ARM_TIMER_CNT:
> -		vcpu->kvm->arch.timer.cntvoff = kvm_phys_timer_read() - value;
> +	case KVM_REG_ARM_TIMER_CNT: {
> +		struct kvm_vcpu *tmp;
> +		int i;
> +		u64 cntvoff = kvm_phys_timer_read() - value;
> +
> +		/*
> +		 * If any of the vcpus has started, don't update
> +		 * CNTVOFF. Userspace is severely brain damaged.
> +		 */
> +		kvm_for_each_vcpu(i, tmp, vcpu->kvm) {
> +			if (tmp->arch.has_run_once) {
> +				kvm_debug("Won't set CNTVOFF to %llx (%llx)\n",
> +					  cntvoff,
> +					  vcpu->kvm->arch.timer.cntvoff);
> +				return -1;
> +			}
> +		}
> +		vcpu->kvm->arch.timer.cntvoff = cntvoff;
>  		break;
> +	}
>  	case KVM_REG_ARM_TIMER_CVAL:
>  		timer->cntv_cval = value;
>  		break;
> --
> 2.1.4
>
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v6 10/12] KVM: arm64: guest debug, HW assisted debug support
On Thu, Jun 25, 2015 at 07:38:33AM +0100, Alex Bennée wrote: > > Christoffer Dall writes: > > > On Fri, Jun 19, 2015 at 01:23:48PM +0100, Alex Bennée wrote: > >> This adds support for userspace to control the HW debug registers for > >> guest debug. In the debug ioctl we copy the IMPDEF defined number of > > > > s/defined// > > > >> registers into a new register set called host_debug_state. There is now > >> a new vcpu parameter called debug_ptr which selects which register set > >> is to copied into the real registers when world switch occurs. > > > > But this patch doesn't seem to add the debug_ptr field? > > Oops, yes the comment belongs to the previous patch. > > > > > s/to// > > > >> > >> I've moved some helper functions into the hw_breakpoint.h header for > >> re-use. > >> > >> As with single step we need to tweak the guest registers to enable the > >> exceptions so we need to save and restore those bits. > >> > >> Two new capabilities have been added to the KVM_EXTENSION ioctl to allow > >> userspace to query the number of hardware break and watch points > >> available on the host hardware. 
> >> > >> Signed-off-by: Alex Bennée > >> > >> --- > >> v2 > >>- switched to C setup > >>- replace host debug registers directly into context > >>- minor tweak to api docs > >>- setup right register for debug > >>- add FAR_EL2 to debug exit structure > >>- add support for trapping debug register access > >> v3 > >>- remove stray trace statement > >>- fix spacing around operators (various) > >>- clean-up usage of trap_debug > >>- introduce debug_ptr, replace excessive memcpy stuff > >>- don't use memcpy in ioctl, just assign > >>- update cap ioctl documentation > >>- reword a number comments > >>- rename host_debug_state->external_debug_state > >> v4 > >>- use the new u32/u64 split debug_ptr approach > >>- fix some wording/comments > >> v5 > >>- don't set MDSCR_EL1.KDE (not needed) > >> v6 > >>- update wording given change in commentary > >>- KVM_GUESTDBG_USE_HW_BP->KVM_GUESTDBG_USE_HW > >> --- > >> Documentation/virtual/kvm/api.txt | 7 ++- > >> arch/arm/kvm/arm.c | 7 +++ > >> arch/arm64/include/asm/hw_breakpoint.h | 12 +++ > >> arch/arm64/include/asm/kvm_host.h | 6 +- > >> arch/arm64/kernel/hw_breakpoint.c | 12 --- > >> arch/arm64/kvm/debug.c | 37 > >> +- > >> arch/arm64/kvm/handle_exit.c | 6 ++ > >> arch/arm64/kvm/reset.c | 12 +++ > >> include/uapi/linux/kvm.h | 2 ++ > >> 9 files changed, 82 insertions(+), 19 deletions(-) > >> > >> diff --git a/Documentation/virtual/kvm/api.txt > >> b/Documentation/virtual/kvm/api.txt > >> index 33c8143..ada57df 100644 > >> --- a/Documentation/virtual/kvm/api.txt > >> +++ b/Documentation/virtual/kvm/api.txt > >> @@ -2668,7 +2668,7 @@ The top 16 bits of the control field are > >> architecture specific control > >> flags which can include the following: > >> > >>- KVM_GUESTDBG_USE_SW_BP: using software breakpoints [x86, arm64] > >> - - KVM_GUESTDBG_USE_HW_BP: using hardware breakpoints [x86, s390] > >> + - KVM_GUESTDBG_USE_HW_BP: using hardware breakpoints [x86, s390, > >> arm64] > >>- KVM_GUESTDBG_INJECT_DB: inject DB type 
exception [x86] > >>- KVM_GUESTDBG_INJECT_BP: inject BP type exception [x86] > >>- KVM_GUESTDBG_EXIT_PENDING: trigger an immediate guest exit [s390] > >> @@ -2683,6 +2683,11 @@ updated to the correct (supplied) values. > >> The second part of the structure is architecture specific and > >> typically contains a set of debug registers. > >> > >> +For arm64 the number of debug registers is implementation defined and > >> +can be determined by querying the KVM_CAP_GUEST_DEBUG_HW_BPS and > >> +KVM_CAP_GUEST_DEBUG_HW_WPS capabilities which return a positive number > >> +indicating the number of supported registers. > >> + > >> When debug events exit the main run loop with the reason > >> KVM_EXIT_DEBUG with the kvm_debug_exit_arch part of the kvm_run > >> structure containing architecture specific debug information. > >> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c > >> index 0d17c7b..60c4045 100644 > >> --- a/arch/arm/kvm/arm.c > >> +++ b/arch/arm/kvm/arm.c > >> @@ -307,6 +307,7 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu) > >> > >> #define KVM_GUESTDBG_VALID_MASK (KVM_GUESTDBG_ENABLE |\ > >>KVM_GUESTDBG_USE_SW_BP | \ > >> + KVM_GUESTDBG_USE_HW | \ > >>KVM_GUESTDBG_SINGLESTEP) > >> > >> /** > >> @@ -327,6 +328,12 @@ int kvm_arch_vcpu_ioctl_set_guest_debug(struct > >> kvm_vcpu *vcpu, > >> > >>if (dbg->control & KVM_GUESTDBG_ENABLE) { > >>vcpu->guest_debug = dbg->control; > >> + > >> + /* Hardware assisted Break and Watch points */ > >> +
Re: [PATCH v6 09/12] KVM: arm64: introduce vcpu->arch.debug_ptr
On Thu, Jun 25, 2015 at 07:32:27AM +0100, Alex Bennée wrote: > > Christoffer Dall writes: > > > On Fri, Jun 19, 2015 at 01:23:47PM +0100, Alex Bennée wrote: > >> This introduces a level of indirection for the debug registers. Instead > >> of using the sys_regs[] directly we store registers in a structure in > >> the vcpu. As we are no longer tied to the layout of the sys_regs[] we > >> can make the copies size appropriate for control and value registers. > >> > >> This also entails updating the sys_regs code to access this new > >> structure. Instead of passing a register index we now pass an offset > >> into the kvm_guest_debug_arch structure. > >> > >> We also need to ensure the GET/SET_ONE_REG ioctl operations store the > >> registers in their correct location. > >> > >> Signed-off-by: Alex Bennée > >> > >> --- > >> v6: > >> - fix up some ws issues > >> - correct clobber info > >> - re-word commentary in kvm_host.h > >> - fix endian access issues for aarch32 fields > >> - revert all KVM_GET/SET_ONE_REG to 64bit (also see ABI update) > >> --- > > >> > >> +/* Used when AArch32 kernels trap to mapped debug registers */ > >> +static inline bool trap_debug32(struct kvm_vcpu *vcpu, > >> + const struct sys_reg_params *p, > >> + const struct sys_reg_desc *rd) > >> +{ > >> + __u32 *r = (__u32 *) ((void * )&vcpu->arch.vcpu_debug_state + rd->reg); > > > > This still looks like something that's asking for BE trouble. Why not > > access the register as a __u64 as it is and then only special-case it > > somehow for the XVR thingy... Perhaps a separate function, see below. 
> > > >> + if (p->is_write) { > >> + *r = *vcpu_reg(vcpu, p->Rt); > >> + vcpu->arch.debug_flags |= KVM_ARM64_DEBUG_DIRTY; > >> + } else { > >> + *vcpu_reg(vcpu, p->Rt) = *r; > >> + } > >> + > >> + return true; > >> +} > >> + > >> +static inline bool trap_debug64(struct kvm_vcpu *vcpu, > >> + const struct sys_reg_params *p, > >> + const struct sys_reg_desc *rd) > >> +{ > >> + __u64 *r = (__u64 *) ((void * )&vcpu->arch.vcpu_debug_state + rd->reg); > >> + if (p->is_write) { > >> + *r = *vcpu_reg(vcpu, p->Rt); > >> + vcpu->arch.debug_flags |= KVM_ARM64_DEBUG_DIRTY; > >> + } else { > >> + *vcpu_reg(vcpu, p->Rt) = *r; > >> + } > >> + > >> + return true; > >> +} > >> + > >> +static inline void reset_debug64(struct kvm_vcpu *vcpu, const struct > >> sys_reg_desc *rd) > >> +{ > >> + __u64 *r = (__u64 *) ((void * )&vcpu->arch.vcpu_debug_state + rd->reg); > >> + *r = rd->val; > >> +} > >> + > >> static void reset_amair_el1(struct kvm_vcpu *vcpu, const struct > >> sys_reg_desc *r) > >> { > >>u64 amair; > >> @@ -240,16 +277,20 @@ static void reset_mpidr(struct kvm_vcpu *vcpu, const > >> struct sys_reg_desc *r) > >> #define DBG_BCR_BVR_WCR_WVR_EL1(n) > >> \ > >>/* DBGBVRn_EL1 */ \ > >>{ Op0(0b10), Op1(0b000), CRn(0b), CRm((n)), Op2(0b100), \ > >> -trap_debug_regs, reset_val, (DBGBVR0_EL1 + (n)), 0 }, \ > >> +trap_debug64, reset_debug64, \ > >> +offsetof(struct kvm_guest_debug_arch, dbg_bvr[(n)]), 0 }, \ > >>/* DBGBCRn_EL1 */ \ > >>{ Op0(0b10), Op1(0b000), CRn(0b), CRm((n)), Op2(0b101), \ > >> -trap_debug_regs, reset_val, (DBGBCR0_EL1 + (n)), 0 }, \ > >> +trap_debug64, reset_debug64, \ > >> +offsetof(struct kvm_guest_debug_arch, dbg_bcr[(n)]), 0}, \ > >>/* DBGWVRn_EL1 */ \ > >>{ Op0(0b10), Op1(0b000), CRn(0b), CRm((n)), Op2(0b110), \ > >> -trap_debug_regs, reset_val, (DBGWVR0_EL1 + (n)), 0 }, \ > >> +trap_debug64, reset_debug64, \ > >> +offsetof(struct kvm_guest_debug_arch, dbg_wvr[(n)]), 0 }, \ > >>/* DBGWCRn_EL1 */ \ > >>{ Op0(0b10), Op1(0b000), CRn(0b), CRm((n)), Op2(0b111), \ 
> >> -trap_debug_regs, reset_val, (DBGWCR0_EL1 + (n)), 0 } > >> +trap_debug64, reset_debug64, \ > >> +offsetof(struct kvm_guest_debug_arch, dbg_wcr[(n)]), 0} > >> > >> /* > >> * Architected system registers. > >> @@ -502,42 +543,51 @@ static bool trap_dbgidr(struct kvm_vcpu *vcpu, > >>} > >> } > >> > >> -static bool trap_debug32(struct kvm_vcpu *vcpu, > >> - const struct sys_reg_params *p, > >> - const struct sys_reg_desc *r) > >> -{ > >> - if (p->is_write) { > >> - vcpu_cp14(vcpu, r->reg) = *vcpu_reg(vcpu, p->Rt); > >> - vcpu->arch.debug_flags |= KVM_ARM64_DEBUG_DIRTY; > >> - } else { > >> - *vcpu_reg(vcpu, p->Rt) = vcpu_c