Re: [PATCH v2 00/94] KVM: arm64: ARMv8.3/8.4 Nested Virtualization support
Hi Zengtao,

On 2020-04-16 02:38, Zengtao (B) wrote:
> Hi Marc:
>
> Got it. Really a big patch set :)

Well, yeah... ;-)

> BTW, I have done a basic kvm unit test
> (git://git.kernel.org/pub/scm/virt/kvm/kvm-unit-tests.git)
> and I find that after applying the patch "KVM: arm64: VNCR-ize ELR_EL1",
> the psci test fails for some reason. I can't understand why; this is only
> the test result. (found the patch by git bisect + kvm unit test)

If it is that mechanical, we should be able to quickly nail that one.

> My platform: Hisilicon D06 board.
> Linux kernel: Linux 5.6-rc6 + nv patches (some rebases)
> Could you help to take a look?

I'll have a look tomorrow. I'm in the middle of refactoring the series
for 5.7, and things have changed quite a bit. Hopefully this isn't a
VHE vs non-VHE issue.

Thanks,
M.
--
Jazz is not dead. It just smells funny...
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH 8/8] arm64: cpufeature: Add an overview comment for the cpufeature framework
On Thu, Apr 16, 2020 at 03:59:39PM +0100, Suzuki K Poulose wrote:
> On 04/14/2020 10:31 PM, Will Deacon wrote:
> > Now that Suzuki isn't within throwing distance, I thought I'd better add
> > a rough overview comment to cpufeature.c so that it doesn't take me days
> > to remember how it works next time.
> >
> > Signed-off-by: Will Deacon
> > ---
> >  arch/arm64/kernel/cpufeature.c | 43 ++
> >  1 file changed, 43 insertions(+)
> >
> > diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
> > index 680a453ca8c4..421ca99dc8fc 100644
> > --- a/arch/arm64/kernel/cpufeature.c
> > +++ b/arch/arm64/kernel/cpufeature.c
> > @@ -3,6 +3,49 @@
> >  * Contains CPU feature definitions
> >  *
> >  * Copyright (C) 2015 ARM Ltd.
> > + *
> > + * A note for the weary kernel hacker: the code here is confusing and hard to
> > + * follow! That's partly because it's solving a nasty problem, but also because
> > + * there's a little bit of over-abstraction that tends to obscure what's going
> > + * on behind a maze of helper functions and macros.
>
> Thanks for writing this up !

It's purely a selfish thing ;)

> > + * The basic problem is that hardware folks have started gluing together CPUs
> > + * with distinct architectural features; in some cases even creating SoCs where
> > + * user-visible instructions are available only on a subset of the available
> > + * cores. We try to address this by snapshotting the feature registers of the
> > + * boot CPU and comparing these with the feature registers of each secondary
> > + * CPU when bringing them up. If there is a mismatch, then we update the
> > + * snapshot state to indicate the lowest-common denominator of the feature,
> > + * known as the "safe" value. This snapshot state can be queried to view the
>
> I am not sure if the following is implied above.
>
> 1) Against the "snapshot" state, where mismatches triggers updating
>    the "snapshot" state to reflect the "safe" value.
>
> 2) Compared against the CPU feature registers of *the boot CPU* for
>    "FTR_STRICT" fields and any mismatch triggers TAINT_CPU_OUT_OF_SPEC.
>    This makes sure that warning is generated for each OUT_OF_SPEC
>    secondary CPU.

I was trying to avoid talking about the consequences of a mismatch in
that paragraph, and instead cover them below:

> > + * The sanitised register values are used to decide which capabilities we
> > + * have in the system. These may be in the form of traditional "hwcaps"
> > + * advertised to userspace or internal "cpucaps" which are used to configure
> > + * things like alternative patching and static keys. While a feature mismatch
> > + * may result in a TAINT_CPU_OUT_OF_SPEC kernel taint, a capability mismatch
> > + * may prevent a CPU from being onlined at all.

Do you think something is missing here?

> > + *
> > + * Some implementation details worth remembering:
> > + *
> > + * - Mismatched features are *always* sanitised to a "safe" value, which
> > + *   usually indicates that the feature is not supported.
> > + *
> > + * - A mismatched feature marked with FTR_STRICT will cause a "SANITY CHECK"
> > + *   warning when onlining an offending CPU and the kernel will be tainted
> > + *   with TAINT_CPU_OUT_OF_SPEC.
>
> As mentioned above, this check is against that of the "boot CPU"
> register state, which may not be implicit from the statement.

Hmm, I'm trying to figure out if this matters. I suppose this means you
get a SANITY CHECK warning for every mismatching secondary CPU, but
that's implied by the above. Is there something else I'm missing?

> > + *
> > + * - Features marked as FTR_VISIBLE have their sanitised value visible to
> > + *   userspace. FTR_VISIBLE features in registers that are only visible
> > + *   to EL0 by trapping *must* have a corresponding HWCAP so that late
> > + *   onlining of CPUs cannot lead to features disappearing at runtime.
> > + *
>
> As you mentioned in the other response we could add information about
> the guest view, something like :
>
>     - KVM exposes the sanitised value of the feature registers to the
>       guests and is not affected by the FTR_VISIBLE. However,
>       depending on the individual feature support in the hypervisor,
>       some of the fields may be capped/limited.

In light of Marc's comment, I'll add something here along the lines of:

  "KVM exposes its own view of the feature registers to guest operating
   systems regardless of FTR_VISIBLE. This is typically driven from the
   sanitised register values to allow virtual CPUs to be migrated between
   arbitrary physical CPUs, but some features not present on the host are
   also advertised and emulated. Look at sys_reg_descs[] for the gory
   details."

Will
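To make the "safe"-value idea above concrete, here is a hedged, minimal sketch in plain C. This is *not* the kernel's implementation — the real logic in cpufeature.c is table-driven via arm64_ftr_bits descriptors and also handles signed, HIGHER_SAFE, and EXACT fields — it only models the common unsigned LOWER_SAFE case, where each 4-bit feature field is reduced to the lowest value seen across CPUs:

```c
#include <stdint.h>

/*
 * Illustrative sketch (invented helper, not kernel code): reduce every
 * 4-bit feature-register field to the minimum of the current snapshot
 * and a newly-onlined CPU's value, i.e. the lowest common denominator.
 */
static uint64_t sanitise_lower_safe(uint64_t snapshot, uint64_t new_cpu)
{
	uint64_t safe = 0;
	int shift;

	for (shift = 0; shift < 64; shift += 4) {
		uint64_t a = (snapshot >> shift) & 0xf;
		uint64_t b = (new_cpu >> shift) & 0xf;

		/* Keep the smaller field value: the "safe" one. */
		safe |= (a < b ? a : b) << shift;
	}
	return safe;
}
```

Each secondary CPU would fold its registers into the snapshot this way at online time; the snapshot is then what hwcap/cpucap decisions read.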
Re: [PATCH] KVM: arm64: Drop PTE_S2_MEMATTR_MASK
On 2020-04-16 18:05, Will Deacon wrote: On Thu, Apr 16, 2020 at 04:36:05PM +0100, Marc Zyngier wrote: On Wed, 15 Apr 2020 18:57:46 +0800 Zenghui Yu wrote: > The only user of PTE_S2_MEMATTR_MASK macro had been removed since > commit a501e32430d4 ("arm64: Clean up the default pgprot setting"). > It has been about six years and no one has used it again. > > Let's drop it. > > Signed-off-by: Zenghui Yu > --- > arch/arm64/include/asm/pgtable-hwdef.h | 1 - > 1 file changed, 1 deletion(-) > > diff --git a/arch/arm64/include/asm/pgtable-hwdef.h b/arch/arm64/include/asm/pgtable-hwdef.h > index 6bf5e650da78..99315bdca0e6 100644 > --- a/arch/arm64/include/asm/pgtable-hwdef.h > +++ b/arch/arm64/include/asm/pgtable-hwdef.h > @@ -190,7 +190,6 @@ > * Memory Attribute override for Stage-2 (MemAttr[3:0]) > */ > #define PTE_S2_MEMATTR(t) (_AT(pteval_t, (t)) << 2) > -#define PTE_S2_MEMATTR_MASK (_AT(pteval_t, 0xf) << 2) > > /* > * EL2/HYP PTE/PMD definitions Looks good to me. Catalin, Will: do you want to take this directly? If so please add my: Acked-by: Marc Zyngier Otherwise, I'll route it via the KVM tree. I can take it for 5.8 if it's not urgent. It has been there 6 years, I think we can cope with another few months... ;-) Thanks, M. -- Jazz is not dead. It just smells funny... ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH 2/2] KVM: arm64: vgic-its: Fix memory leak on the error path of vgic_add_lpi()
On 2020-04-16 02:17, Zenghui Yu wrote:
> On 2020/4/14 11:03, Zenghui Yu wrote:
>> If we're going to fail out the vgic_add_lpi(), let's make sure the
>> allocated vgic_irq memory is also freed. Though it seems that both
>> cases are unlikely to fail.
>>
>> Cc: Zengruan Ye
>> Signed-off-by: Zenghui Yu
>> ---
>>  virt/kvm/arm/vgic/vgic-its.c | 8 ++--
>>  1 file changed, 6 insertions(+), 2 deletions(-)
>>
>> diff --git a/virt/kvm/arm/vgic/vgic-its.c b/virt/kvm/arm/vgic/vgic-its.c
>> index d53d34a33e35..3c3b6a0f2dce 100644
>> --- a/virt/kvm/arm/vgic/vgic-its.c
>> +++ b/virt/kvm/arm/vgic/vgic-its.c
>> @@ -98,12 +98,16 @@ static struct vgic_irq *vgic_add_lpi(struct kvm *kvm, u32 intid,
>>  	 * the respective config data from memory here upon mapping the LPI.
>>  	 */
>>  	ret = update_lpi_config(kvm, irq, NULL, false);
>> -	if (ret)
>> +	if (ret) {
>> +		kfree(irq);
>>  		return ERR_PTR(ret);
>> +	}
>>
>>  	ret = vgic_v3_lpi_sync_pending_status(kvm, irq);
>> -	if (ret)
>> +	if (ret) {
>> +		kfree(irq);
>>  		return ERR_PTR(ret);
>> +	}
>
> Looking at it again, I realized that this error handling is still not
> complete. Maybe we should use a vgic_put_irq() instead so that we can
> also properly delete the vgic_irq from lpi_list.

Yes, this is a more correct fix indeed. There is still a bit of a
bizarre behaviour if you have two vgic_add_lpi() racing to create the
same interrupt, which is pretty dodgy anyway (it means we have two MAPI
at the same time...). You end up with re-reading the state from
memory... Oh well.

> Marc, what do you think? Could you please help to fix it, or I can
> resend it.

I've fixed it as such (with a comment for good measure):

diff --git a/virt/kvm/arm/vgic/vgic-its.c b/virt/kvm/arm/vgic/vgic-its.c
index 3c3b6a0f2dce..c012a52b19f5 100644
--- a/virt/kvm/arm/vgic/vgic-its.c
+++ b/virt/kvm/arm/vgic/vgic-its.c
@@ -96,16 +96,19 @@ static struct vgic_irq *vgic_add_lpi(struct kvm *kvm, u32 intid,
 	 * We "cache" the configuration table entries in our struct vgic_irq's.
 	 * However we only have those structs for mapped IRQs, so we read in
 	 * the respective config data from memory here upon mapping the LPI.
+	 *
+	 * Should any of these fail, behave as if we couldn't create the LPI
+	 * by dropping the refcount and returning the error.
 	 */
 	ret = update_lpi_config(kvm, irq, NULL, false);
 	if (ret) {
-		kfree(irq);
+		vgic_put_irq(kvm, irq);
 		return ERR_PTR(ret);
 	}

 	ret = vgic_v3_lpi_sync_pending_status(kvm, irq);
 	if (ret) {
-		kfree(irq);
+		vgic_put_irq(kvm, irq);
 		return ERR_PTR(ret);
 	}

Let me know if you agree with that.

Thanks,
M.
--
Jazz is not dead. It just smells funny...
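As a hedged toy model of the reasoning behind this fix (ordinary C with invented names, not kernel code): once an object has been published on a list with a reference held, error paths must unwind through the same "put" path so the list entry is unlinked too — a bare kfree() would leave the list pointing at freed memory.

```c
#include <stdlib.h>

/* Toy object: a refcount plus a stand-in for the lpi_list linkage. */
struct obj {
	int refcount;
	int on_list;
};

static struct obj *obj_create(void)
{
	struct obj *o = calloc(1, sizeof(*o));

	if (!o)
		return NULL;
	o->refcount = 1;
	o->on_list = 1;	/* published: other lookups can now find it */
	return o;
}

/*
 * The only correct teardown once the object is published: drop the
 * reference, and unlink before freeing when it hits zero. Calling
 * free() directly instead would orphan the list entry.
 */
static void obj_put(struct obj *o)
{
	if (--o->refcount == 0) {
		o->on_list = 0;	/* unlink, then free */
		free(o);
	}
}
```

In the vgic case, vgic_put_irq() plays the role of obj_put(), which is why it replaces kfree() on both error paths above.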
Re: [PATCH] KVM: arm64: Drop PTE_S2_MEMATTR_MASK
On Thu, Apr 16, 2020 at 04:36:05PM +0100, Marc Zyngier wrote: > On Wed, 15 Apr 2020 18:57:46 +0800 > Zenghui Yu wrote: > > > The only user of PTE_S2_MEMATTR_MASK macro had been removed since > > commit a501e32430d4 ("arm64: Clean up the default pgprot setting"). > > It has been about six years and no one has used it again. > > > > Let's drop it. > > > > Signed-off-by: Zenghui Yu > > --- > > arch/arm64/include/asm/pgtable-hwdef.h | 1 - > > 1 file changed, 1 deletion(-) > > > > diff --git a/arch/arm64/include/asm/pgtable-hwdef.h > > b/arch/arm64/include/asm/pgtable-hwdef.h > > index 6bf5e650da78..99315bdca0e6 100644 > > --- a/arch/arm64/include/asm/pgtable-hwdef.h > > +++ b/arch/arm64/include/asm/pgtable-hwdef.h > > @@ -190,7 +190,6 @@ > > * Memory Attribute override for Stage-2 (MemAttr[3:0]) > > */ > > #define PTE_S2_MEMATTR(t) (_AT(pteval_t, (t)) << 2) > > -#define PTE_S2_MEMATTR_MASK(_AT(pteval_t, 0xf) << 2) > > > > /* > > * EL2/HYP PTE/PMD definitions > > Looks good to me. Catalin, Will: do you want to take this directly? If > so please add my: > > Acked-by: Marc Zyngier > > Otherwise, I'll route it via the KVM tree. I can take it for 5.8 if it's not urgent. Will ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH v2] KVM/arm64: Support enabling dirty log gradually in small chunks
On 16/04/20 17:09, Marc Zyngier wrote: > On Wed, 15 Apr 2020 18:13:56 +0200 > Paolo Bonzini wrote: > >> On 13/04/20 14:20, Keqian Zhu wrote: >>> There is already support of enabling dirty log graually in small chunks >>> for x86 in commit 3c9bd4006bfc ("KVM: x86: enable dirty log gradually in >>> small chunks"). This adds support for arm64. >>> >>> x86 still writes protect all huge pages when DIRTY_LOG_INITIALLY_ALL_SET >>> is eanbled. However, for arm64, both huge pages and normal pages can be >>> write protected gradually by userspace. >>> >>> Under the Huawei Kunpeng 920 2.6GHz platform, I did some tests on 128G >>> Linux VMs with different page size. The memory pressure is 127G in each >>> case. The time taken of memory_global_dirty_log_start in QEMU is listed >>> below: >>> >>> Page Size BeforeAfter Optimization >>> 4K650ms 1.8ms >>> 2M 4ms 1.8ms >>> 1G 2ms 1.8ms >>> >>> Besides the time reduction, the biggest income is that we will minimize >>> the performance side effect (because of dissloving huge pages and marking >>> memslots dirty) on guest after enabling dirty log. >>> >>> Signed-off-by: Keqian Zhu >>> --- >>> Documentation/virt/kvm/api.rst| 2 +- >>> arch/arm64/include/asm/kvm_host.h | 3 +++ >>> virt/kvm/arm/mmu.c| 12 ++-- >>> 3 files changed, 14 insertions(+), 3 deletions(-) >>> >>> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst >>> index efbbe570aa9b..0017f63fa44f 100644 >>> --- a/Documentation/virt/kvm/api.rst >>> +++ b/Documentation/virt/kvm/api.rst >>> @@ -5777,7 +5777,7 @@ will be initialized to 1 when created. This also >>> improves performance because >>> dirty logging can be enabled gradually in small chunks on the first call >>> to KVM_CLEAR_DIRTY_LOG. KVM_DIRTY_LOG_INITIALLY_SET depends on >>> KVM_DIRTY_LOG_MANUAL_PROTECT_ENABLE (it is also only available on >>> -x86 for now). >>> +x86 and arm64 for now). 
>>> >>> KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 was previously available under the name >>> KVM_CAP_MANUAL_DIRTY_LOG_PROTECT, but the implementation had bugs that make >>> diff --git a/arch/arm64/include/asm/kvm_host.h >>> b/arch/arm64/include/asm/kvm_host.h >>> index 32c8a675e5a4..a723f84fab83 100644 >>> --- a/arch/arm64/include/asm/kvm_host.h >>> +++ b/arch/arm64/include/asm/kvm_host.h >>> @@ -46,6 +46,9 @@ >>> #define KVM_REQ_RECORD_STEAL KVM_ARCH_REQ(3) >>> #define KVM_REQ_RELOAD_GICv4 KVM_ARCH_REQ(4) >>> >>> +#define KVM_DIRTY_LOG_MANUAL_CAPS (KVM_DIRTY_LOG_MANUAL_PROTECT_ENABLE | >>> \ >>> +KVM_DIRTY_LOG_INITIALLY_SET) >>> + >>> DECLARE_STATIC_KEY_FALSE(userspace_irqchip_in_use); >>> >>> extern unsigned int kvm_sve_max_vl; >>> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c >>> index e3b9ee268823..1077f653a611 100644 >>> --- a/virt/kvm/arm/mmu.c >>> +++ b/virt/kvm/arm/mmu.c >>> @@ -2265,8 +2265,16 @@ void kvm_arch_commit_memory_region(struct kvm *kvm, >>> * allocated dirty_bitmap[], dirty pages will be be tracked while the >>> * memory slot is write protected. >>> */ >>> - if (change != KVM_MR_DELETE && mem->flags & KVM_MEM_LOG_DIRTY_PAGES) >>> - kvm_mmu_wp_memory_region(kvm, mem->slot); >>> + if (change != KVM_MR_DELETE && mem->flags & KVM_MEM_LOG_DIRTY_PAGES) { >>> + /* >>> +* If we're with initial-all-set, we don't need to write >>> +* protect any pages because they're all reported as dirty. >>> +* Huge pages and normal pages will be write protect gradually. >>> +*/ >>> + if (!kvm_dirty_log_manual_protect_and_init_set(kvm)) { >>> + kvm_mmu_wp_memory_region(kvm, mem->slot); >>> + } >>> + } >>> } >>> >>> int kvm_arch_prepare_memory_region(struct kvm *kvm, >>> >> >> Marc, what is the status of this patch? > > I just had a look at it. Is there any urgency for merging it? No, I thought I was still replying to the v1. Paolo ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH] KVM: arm64: Drop PTE_S2_MEMATTR_MASK
On Wed, 15 Apr 2020 18:57:46 +0800 Zenghui Yu wrote: > The only user of PTE_S2_MEMATTR_MASK macro had been removed since > commit a501e32430d4 ("arm64: Clean up the default pgprot setting"). > It has been about six years and no one has used it again. > > Let's drop it. > > Signed-off-by: Zenghui Yu > --- > arch/arm64/include/asm/pgtable-hwdef.h | 1 - > 1 file changed, 1 deletion(-) > > diff --git a/arch/arm64/include/asm/pgtable-hwdef.h > b/arch/arm64/include/asm/pgtable-hwdef.h > index 6bf5e650da78..99315bdca0e6 100644 > --- a/arch/arm64/include/asm/pgtable-hwdef.h > +++ b/arch/arm64/include/asm/pgtable-hwdef.h > @@ -190,7 +190,6 @@ > * Memory Attribute override for Stage-2 (MemAttr[3:0]) > */ > #define PTE_S2_MEMATTR(t)(_AT(pteval_t, (t)) << 2) > -#define PTE_S2_MEMATTR_MASK (_AT(pteval_t, 0xf) << 2) > > /* > * EL2/HYP PTE/PMD definitions Looks good to me. Catalin, Will: do you want to take this directly? If so please add my: Acked-by: Marc Zyngier Otherwise, I'll route it via the KVM tree. Thanks, M. -- Jazz is not dead. It just smells funny... ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH 8/8] arm64: cpufeature: Add an overview comment for the cpufeature framework
Hi Suzuki,

On 2020-04-16 15:59, Suzuki K Poulose wrote:

[...]

> As you mentioned in the other response we could add information about
> the guest view, something like :
>
>     - KVM exposes the sanitised value of the feature registers to the
>       guests and is not affected by the FTR_VISIBLE. However,
>       depending on the individual feature support in the hypervisor,
>       some of the fields may be capped/limited.

Although in most cases what KVM exposes is indeed a strict subset of the
host's features, there are a few corner cases where we expose features
that do not necessarily exist on the host. For example, ARMv8.5-GTG and
ARMv8.4-TTL get exposed by the NV patches even if they don't exist on
the host, as KVM will actually emulate them.

Not a big deal, but I just wanted to outline that it isn't as clear-cut
as it may seem...

Thanks,
M.
--
Jazz is not dead. It just smells funny...
Re: [PATCH v2] KVM/arm64: Support enabling dirty log gradually in small chunks
On Wed, 15 Apr 2020 18:13:56 +0200 Paolo Bonzini wrote: > On 13/04/20 14:20, Keqian Zhu wrote: > > There is already support of enabling dirty log graually in small chunks > > for x86 in commit 3c9bd4006bfc ("KVM: x86: enable dirty log gradually in > > small chunks"). This adds support for arm64. > > > > x86 still writes protect all huge pages when DIRTY_LOG_INITIALLY_ALL_SET > > is eanbled. However, for arm64, both huge pages and normal pages can be > > write protected gradually by userspace. > > > > Under the Huawei Kunpeng 920 2.6GHz platform, I did some tests on 128G > > Linux VMs with different page size. The memory pressure is 127G in each > > case. The time taken of memory_global_dirty_log_start in QEMU is listed > > below: > > > > Page Size BeforeAfter Optimization > > 4K650ms 1.8ms > > 2M 4ms 1.8ms > > 1G 2ms 1.8ms > > > > Besides the time reduction, the biggest income is that we will minimize > > the performance side effect (because of dissloving huge pages and marking > > memslots dirty) on guest after enabling dirty log. > > > > Signed-off-by: Keqian Zhu > > --- > > Documentation/virt/kvm/api.rst| 2 +- > > arch/arm64/include/asm/kvm_host.h | 3 +++ > > virt/kvm/arm/mmu.c| 12 ++-- > > 3 files changed, 14 insertions(+), 3 deletions(-) > > > > diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst > > index efbbe570aa9b..0017f63fa44f 100644 > > --- a/Documentation/virt/kvm/api.rst > > +++ b/Documentation/virt/kvm/api.rst > > @@ -5777,7 +5777,7 @@ will be initialized to 1 when created. This also > > improves performance because > > dirty logging can be enabled gradually in small chunks on the first call > > to KVM_CLEAR_DIRTY_LOG. KVM_DIRTY_LOG_INITIALLY_SET depends on > > KVM_DIRTY_LOG_MANUAL_PROTECT_ENABLE (it is also only available on > > -x86 for now). > > +x86 and arm64 for now). 
> > > > KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 was previously available under the name > > KVM_CAP_MANUAL_DIRTY_LOG_PROTECT, but the implementation had bugs that make > > diff --git a/arch/arm64/include/asm/kvm_host.h > > b/arch/arm64/include/asm/kvm_host.h > > index 32c8a675e5a4..a723f84fab83 100644 > > --- a/arch/arm64/include/asm/kvm_host.h > > +++ b/arch/arm64/include/asm/kvm_host.h > > @@ -46,6 +46,9 @@ > > #define KVM_REQ_RECORD_STEAL KVM_ARCH_REQ(3) > > #define KVM_REQ_RELOAD_GICv4 KVM_ARCH_REQ(4) > > > > +#define KVM_DIRTY_LOG_MANUAL_CAPS (KVM_DIRTY_LOG_MANUAL_PROTECT_ENABLE | > > \ > > +KVM_DIRTY_LOG_INITIALLY_SET) > > + > > DECLARE_STATIC_KEY_FALSE(userspace_irqchip_in_use); > > > > extern unsigned int kvm_sve_max_vl; > > diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c > > index e3b9ee268823..1077f653a611 100644 > > --- a/virt/kvm/arm/mmu.c > > +++ b/virt/kvm/arm/mmu.c > > @@ -2265,8 +2265,16 @@ void kvm_arch_commit_memory_region(struct kvm *kvm, > > * allocated dirty_bitmap[], dirty pages will be be tracked while the > > * memory slot is write protected. > > */ > > - if (change != KVM_MR_DELETE && mem->flags & KVM_MEM_LOG_DIRTY_PAGES) > > - kvm_mmu_wp_memory_region(kvm, mem->slot); > > + if (change != KVM_MR_DELETE && mem->flags & KVM_MEM_LOG_DIRTY_PAGES) { > > + /* > > +* If we're with initial-all-set, we don't need to write > > +* protect any pages because they're all reported as dirty. > > +* Huge pages and normal pages will be write protect gradually. > > +*/ > > + if (!kvm_dirty_log_manual_protect_and_init_set(kvm)) { > > + kvm_mmu_wp_memory_region(kvm, mem->slot); > > + } > > + } > > } > > > > int kvm_arch_prepare_memory_region(struct kvm *kvm, > > > > Marc, what is the status of this patch? I just had a look at it. Is there any urgency for merging it? Thanks, M. -- Jazz is not dead. It just smells funny... ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH v2] KVM/arm64: Support enabling dirty log gradually in small chunks
On Mon, 13 Apr 2020 20:20:23 +0800
Keqian Zhu wrote:

> There is already support of enabling dirty log graually in small chunks

gradually

> for x86 in commit 3c9bd4006bfc ("KVM: x86: enable dirty log gradually in
> small chunks"). This adds support for arm64.
>
> x86 still writes protect all huge pages when DIRTY_LOG_INITIALLY_ALL_SET
> is eanbled. However, for arm64, both huge pages and normal pages can be

enabled

> write protected gradually by userspace.
>
> Under the Huawei Kunpeng 920 2.6GHz platform, I did some tests on 128G
> Linux VMs with different page size. The memory pressure is 127G in each
> case. The time taken of memory_global_dirty_log_start in QEMU is listed
> below:
>
> Page Size      Before    After Optimization
>   4K            650ms            1.8ms
>   2M              4ms            1.8ms
>   1G              2ms            1.8ms

These numbers are different from what you have advertised before. What
changed?

> Besides the time reduction, the biggest income is that we will minimize

s/income/improvement/

> the performance side effect (because of dissloving huge pages and marking

dissolving

> memslots dirty) on guest after enabling dirty log.
>
> Signed-off-by: Keqian Zhu
> ---
>  Documentation/virt/kvm/api.rst    | 2 +-
>  arch/arm64/include/asm/kvm_host.h | 3 +++
>  virt/kvm/arm/mmu.c                | 12 ++--
>  3 files changed, 14 insertions(+), 3 deletions(-)
>
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index efbbe570aa9b..0017f63fa44f 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -5777,7 +5777,7 @@ will be initialized to 1 when created. This also improves performance because
>  dirty logging can be enabled gradually in small chunks on the first call
>  to KVM_CLEAR_DIRTY_LOG. KVM_DIRTY_LOG_INITIALLY_SET depends on
>  KVM_DIRTY_LOG_MANUAL_PROTECT_ENABLE (it is also only available on
> -x86 for now).
> +x86 and arm64 for now).
>
>  KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 was previously available under the name
>  KVM_CAP_MANUAL_DIRTY_LOG_PROTECT, but the implementation had bugs that make
>
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 32c8a675e5a4..a723f84fab83 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -46,6 +46,9 @@
>  #define KVM_REQ_RECORD_STEAL	KVM_ARCH_REQ(3)
>  #define KVM_REQ_RELOAD_GICv4	KVM_ARCH_REQ(4)
>
> +#define KVM_DIRTY_LOG_MANUAL_CAPS (KVM_DIRTY_LOG_MANUAL_PROTECT_ENABLE | \
> +	KVM_DIRTY_LOG_INITIALLY_SET)
> +
>  DECLARE_STATIC_KEY_FALSE(userspace_irqchip_in_use);
>
>  extern unsigned int kvm_sve_max_vl;
> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
> index e3b9ee268823..1077f653a611 100644
> --- a/virt/kvm/arm/mmu.c
> +++ b/virt/kvm/arm/mmu.c
> @@ -2265,8 +2265,16 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
>  	 * allocated dirty_bitmap[], dirty pages will be be tracked while the
>  	 * memory slot is write protected.
>  	 */
> -	if (change != KVM_MR_DELETE && mem->flags & KVM_MEM_LOG_DIRTY_PAGES)
> -		kvm_mmu_wp_memory_region(kvm, mem->slot);
> +	if (change != KVM_MR_DELETE && mem->flags & KVM_MEM_LOG_DIRTY_PAGES) {
> +		/*
> +		 * If we're with initial-all-set, we don't need to write
> +		 * protect any pages because they're all reported as dirty.
> +		 * Huge pages and normal pages will be write protect gradually.
> +		 */
> +		if (!kvm_dirty_log_manual_protect_and_init_set(kvm)) {
> +			kvm_mmu_wp_memory_region(kvm, mem->slot);
> +		}
> +	}
>  }
>
>  int kvm_arch_prepare_memory_region(struct kvm *kvm,

As it is, it is pretty good. The one thing that isn't clear to me is why
we have a difference in behaviour between x86 and arm64. What prevents
x86 from having the same behaviour as arm64?

Thanks,
M.
--
Jazz is not dead. It just smells funny...
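As a hedged illustration of the "initially all set" idea under discussion (plain C with invented names, not the KVM implementation): the dirty bitmap starts with every page marked dirty, so enabling dirty logging needs no up-front write-protection pass; pages are then write-protected gradually, chunk by chunk, as userspace clears parts of the bitmap (the KVM_CLEAR_DIRTY_LOG step).

```c
#include <string.h>
#include <stdint.h>

#define NPAGES 64

/* Toy memslot: which pages report dirty, which are write-protected. */
struct slot {
	uint8_t dirty[NPAGES];
	uint8_t write_protected[NPAGES];
};

/*
 * With DIRTY_LOG_INITIALLY_SET semantics, enabling logging marks every
 * page dirty and skips the expensive full write-protection pass.
 */
static void enable_dirty_log_initially_set(struct slot *s)
{
	memset(s->dirty, 1, NPAGES);
	memset(s->write_protected, 0, NPAGES);
}

/*
 * Userspace clears a chunk of the dirty log; only then are those pages
 * write-protected so future writes get tracked.
 */
static void clear_dirty_chunk(struct slot *s, int first, int n)
{
	int i;

	for (i = first; i < first + n; i++) {
		s->dirty[i] = 0;
		s->write_protected[i] = 1;
	}
}
```

This is why the mmu.c hunk above can skip kvm_mmu_wp_memory_region() entirely when the initial-all-set cap is enabled: the cost is amortised over later clear operations instead of paid at enable time.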
Re: [PATCH 8/8] arm64: cpufeature: Add an overview comment for the cpufeature framework
Hi Will, On 04/14/2020 10:31 PM, Will Deacon wrote: Now that Suzuki isn't within throwing distance, I thought I'd better add a rough overview comment to cpufeature.c so that it doesn't take me days to remember how it works next time. Signed-off-by: Will Deacon --- arch/arm64/kernel/cpufeature.c | 43 ++ 1 file changed, 43 insertions(+) diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c index 680a453ca8c4..421ca99dc8fc 100644 --- a/arch/arm64/kernel/cpufeature.c +++ b/arch/arm64/kernel/cpufeature.c @@ -3,6 +3,49 @@ * Contains CPU feature definitions * * Copyright (C) 2015 ARM Ltd. + * + * A note for the weary kernel hacker: the code here is confusing and hard to + * follow! That's partly because it's solving a nasty problem, but also because + * there's a little bit of over-abstraction that tends to obscure what's going + * on behind a maze of helper functions and macros. Thanks for writing this up ! + * + * The basic problem is that hardware folks have started gluing together CPUs + * with distinct architectural features; in some cases even creating SoCs where + * user-visible instructions are available only on a subset of the available + * cores. We try to address this by snapshotting the feature registers of the + * boot CPU and comparing these with the feature registers of each secondary + * CPU when bringing them up. If there is a mismatch, then we update the + * snapshot state to indicate the lowest-common denominator of the feature, + * known as the "safe" value. This snapshot state can be queried to view the I am not sure if the following is implied above. 1) Against the "snapshot" state, where mismatches triggers updating the "snapshot" state to reflect the "safe" value. 2) Compared against the CPU feature registers of *the boot CPU* for "FTR_STRICT" fields and any mismatch triggers TAINT_CPU_OUT_OF_SPEC. This makes sure that warning is generated for each OUT_OF_SPEC secondary CPU. + * "sanitised" value of a feature register. 
+ * + * The sanitised register values are used to decide which capabilities we + * have in the system. These may be in the form of traditional "hwcaps" + * advertised to userspace or internal "cpucaps" which are used to configure + * things like alternative patching and static keys. While a feature mismatch + * may result in a TAINT_CPU_OUT_OF_SPEC kernel taint, a capability mismatch + * may prevent a CPU from being onlined at all. + * + * Some implementation details worth remembering: + * + * - Mismatched features are *always* sanitised to a "safe" value, which + * usually indicates that the feature is not supported. + * + * - A mismatched feature marked with FTR_STRICT will cause a "SANITY CHECK" + * warning when onlining an offending CPU and the kernel will be tainted + * with TAINT_CPU_OUT_OF_SPEC. As mentioned above, this check is against that of the "boot CPU" register state, which may not be implicit from the statement. + * + * - Features marked as FTR_VISIBLE have their sanitised value visible to + * userspace. FTR_VISIBLE features in registers that are only visible + * to EL0 by trapping *must* have a corresponding HWCAP so that late + * onlining of CPUs cannot lead to features disappearing at runtime. + * As you mentioned in the other response we could add information about the guest view, something like : - KVM exposes the sanitised value of the feature registers to the guests and is not affected by the FTR_VISIBLE. However, depending on the individual feature support in the hypervisor, some of the fields may be capped/limited. Cheers Suzuki ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH v2] KVM: Optimize kvm_arch_vcpu_ioctl_run function
Re: [PATCH] KVM: handle the right RAS SEA(Synchronous External Abort) type
Hi Geng, On 16/04/2020 13:07, gengdongjiu wrote: > On 2020/4/14 20:18, James Morse wrote: >> On 11/04/2020 13:17, Dongjiu Geng wrote: >>> When the RAS Extension is implemented, b0b011000, 0b011100, >>> 0b011101, 0b00, and 0b01, are not used and reserved >>> to the DFSC[5:0] of ESR_ELx, but the code still checks these >>> unused bits, so remove them. >> >> They aren't unused: CPUs without the RAS extensions may still generate these. >> >> kvm_handle_guest_abort() wants to know if this is an external abort. >> KVM doesn't really care if the CPU has the RAS extensions or not, its the >> arch code's job >> to sort all that out. > > No, handle_guest_sea() ---> ghes_notify_sea ---> apei driver You missed the arch code's apei_claim_sea(). This is where kvm washes its hands of the exception, its up to the arch code to deal with, in the same way it deals with errors for user-space. > If it is an external abort, it will call apei driver to handle it, but it > should be only SEA will call the apei driver. > other type of external abort should not call apei driver. > I am not see arch code sort all that out. The other kind is _asynchronous_ external abort, which doesn't get handled in here. Do you mean 'Synchronous external abort on translation table walk, nth level'? KVM shouldn't care, this is still up to the arch code to deal with. Do you mean 'Synchronous parity or ECC error on memory access, not on translation table walk'? (and all its translation table walk friends...) These really are still external abort! See shared/functions/aborts/IsExternalAbort() in the psuedo code. (J12-7735 of DDI0487F.a) This means they are trapped by SCR_EL3.EA. On a firmware-first system if we see one of these, its because firmware re-injected it. The arch code needs to get APEI to check for CPER records when it sees one. (KVM ... doesn't care) (I agree not having 'external' in the name is counter-intuitive. I think this is because the component that triggers them is 'in' the CPU, (e.g. 
the L1 cache). This is how you can know it was a parity or ECC error, instead of 'something outside' (i.e. external). An OS that is deeply tied to the CPU micro-architecture could use the difference to read imp-def sysregs for one, and poke around in imp-def mmio for the external case. Linux isn't tied to the micro-architecture, so ignores this 'internal/external' hint.) >>> If the handling of guest ras data error fails, it should >>> inject data instead of SError to let the guest recover as >>> much as possible. > > In some hardware platform, it supports RAS, but the RAS error address will be > not recorded, In what circumstances does this happen? Wouldn't these errors all be uncontained? (in which case the host should panic()). e.g. A write went to the wrong address, we don't know where it went... ... Is this because your platform describes memory-errors that happen inside the CPU as processor errors, instead of memory-errors? Linux's APEI code ignores them, because it doesn't know what they are. This means you are missing the whole memory_failure(), page-unmapping, user-space signalling ... and guest error injection ... part of the RAS handling. > so it is better to inject a data abort instead of SError for thtat platform. Not at all. The SError here is KVM's response to having nothing else it can do. Having the host handle the error is the best way of solving the problem. > because guest will try to do recovery for the Synchronous data abort, such as > kill the error > application. But for SError, guest will be panic. I'd rather we fix this by having the host handle the error. Botching in a synchronous exception here would mean Qemu can't properly emulate a machine that has SCR_EL3.EA set ... which you need to provide firmware-first for the guest. [...] > Yes, some platform supports RAS but will not record the error address, so the > host has failed > to handle the error. I don't think its reasonable to expect KVM to do anything useful in this case. 
We should fix the host error handling. In what circumstances does this happen? Thanks, James ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
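James's point above — that the parity/ECC DFSC encodings really are still external aborts, trapped by SCR_EL3.EA — can be sketched as a stand-alone classifier. This is an illustrative model of the pre-FEAT_RAS ESR_ELx DFSC encodings under discussion, not the kernel's actual kvm_vcpu_dabt_isextabt() implementation; the constants are local to the sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Illustrative classifier for the ESR_ELx DFSC (Data Fault Status Code)
 * values discussed above. Pre-FEAT_RAS encodings, per the Arm ARM:
 *   0b010000  synchronous external abort, not on translation table walk
 *   0b0101LL  synchronous external abort on table walk, level LL
 *   0b011000  synchronous parity/ECC error on memory access
 *   0b0111LL  synchronous parity/ECC error on table walk, level LL
 * All of these satisfy IsExternalAbort() and are trapped by SCR_EL3.EA.
 */
static bool dfsc_is_external_abort(uint8_t dfsc)
{
	if (dfsc == 0x10)                  /* 0b010000: SEA, not on walk */
		return true;
	if (dfsc >= 0x14 && dfsc <= 0x17)  /* 0b0101LL: SEA on table walk */
		return true;
	if (dfsc == 0x18)                  /* 0b011000: parity/ECC on access */
		return true;
	if (dfsc >= 0x1c && dfsc <= 0x1f)  /* 0b0111LL: parity/ECC on walk */
		return true;
	return false;                      /* e.g. translation/permission faults */
}
```

A translation fault (e.g. DFSC 0b000100) is not an external abort, while a parity error is — which is why dropping the 0b011000/0b0111LL checks would change behaviour on CPUs without the RAS extensions.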
Re: [PATCH] KVM: handle the right RAS SEA(Synchronous External Abort) type
Hi James On 2020/4/14 20:18, James Morse wrote: > Hi Geng, > > On 11/04/2020 13:17, Dongjiu Geng wrote: >> When the RAS Extension is implemented, b0b011000, 0b011100, >> 0b011101, 0b00, and 0b01, are not used and reserved >> to the DFSC[5:0] of ESR_ELx, but the code still checks these >> unused bits, so remove them. > > They aren't unused: CPUs without the RAS extensions may still generate these. > > kvm_handle_guest_abort() wants to know if this is an external abort. > KVM doesn't really care if the CPU has the RAS extensions or not, its the > arch code's job > to sort all that out. No, handle_guest_sea() ---> ghes_notify_sea ---> apei driver If it is an external abort, it will call apei driver to handle it, but it should be only SEA will call the apei driver. other type of external abort should not call apei driver. I am not see arch code sort all that out. /* Synchronous External Abort? */ if (kvm_vcpu_dabt_isextabt(vcpu)) { /* * For RAS the host kernel may handle this abort. * There is no need to pass the error into the guest. */ if (!handle_guest_sea(fault_ipa, kvm_vcpu_get_hsr(vcpu))) return 1; } > > >> If the handling of guest ras data error fails, it should >> inject data instead of SError to let the guest recover as >> much as possible. In some hardware platform, it supports RAS, but the RAS error address will be not recorded, so it is better to inject a data abort instead of SError for thtat platform. because guest will try to do recovery for the Synchronous data abort, such as kill the error application. But for SError, guest will be panic. > > (I don't quite follow your point here). > > If KVM injected a synchronous external abort due to a RAS error here, then > you wouldn't be > able to support firmware-first RAS with Qemu. I don't think this is what you > want. > > > The handling is (and should be) decoupled. > > KVM guests aren't special. Whatever happens for a normal user-space process > is what should > happen here. 
KVM is just doing the plumbing: > > When the hypervisor takes an external abort due to the guest, it should plumb > the error > into the arch code to be handled. This is what would happen for a normal EL0 > process. > This is what do_sea() and kvm_handle_guest_sea() do with apei_claim_sea(). > > If the RAS code says it handled this error, then we can continue. For > user-space, we > return to user-space. For a guest, we return to the guest. (for user-space > this piece is > not quite complete in mainline, see: > https://lore.kernel.org/linux-acpi/20200228174817.74278-4-james.mo...@arm.com/ > ) > > This first part happens even if the errors are notified by IRQs, or found in > a polled buffer. > > The RAS code may have 'handled' the memory by unmapping it, and marking the > corresponding > page as HWPOISONed. If user-space tries to access this, it will be give an > SIGBUS:MCEERR_AR. If a guest tries to do this, the same things happens. (The > signal goes > to Qemu). > (See do_page_fault()s use of the MCEERR si_code's, and > kvm_send_hwpoison_signal) > > This second part is the same regardless of how the kernel discovered the RAS > error in the > first place. > > > If the RAS code says it did not handle this error, it means it wasn't a RAS > error, or your > platform doesn't support RAS. For an external-abort there is very little the > hypervisor > can do in this situation. It does what KVM has always done: inject an > asynchronous > external abort. > This should only happen if the host has failed to handle the error. KVM's use > of > asynchronous abort is the simplest one size fits all. > > Are you seeing this happen? If so, what are the circumstances. Did the host > handle the > error? (if not: why not!) Yes, some platform supports RAS but will not record the error address, so the host has failed to handle the error. > > > Thanks, > > James > . > ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH 8/8] arm64: cpufeature: Add an overview comment for the cpufeature framework
Hi Suzuki, On Tue, Apr 14, 2020 at 10:31:14PM +0100, Will Deacon wrote: > Now that Suzuki isn't within throwing distance, I thought I'd better add > a rough overview comment to cpufeature.c so that it doesn't take me days > to remember how it works next time. > > Signed-off-by: Will Deacon > --- > arch/arm64/kernel/cpufeature.c | 43 ++ > 1 file changed, 43 insertions(+) Any chance you can look at this one, please? I don't trust myself to get all of the details right here! I'm also wondering whether we should mention something about KVM and the guest view of the registers. What do you think? Will
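The "lowest common denominator" snapshotting that the new comment describes can be modelled in a few lines. This is a deliberately simplified sketch: it treats every 4-bit ID-register field as an unsigned "lower is safer" (FTR_LOWER_SAFE) field, whereas the real framework describes each field individually via arm64_ftr_bits and supports other policies (HIGHER_SAFE, EXACT) too.

```c
#include <stdint.h>

/*
 * Toy model of the boot-CPU snapshot being combined with a secondary
 * CPU's ID register. Assumption: every 4-bit field is unsigned and
 * "lower is safer"; the real arm64 cpufeature framework is per-field
 * and policy-driven.
 */
static uint64_t ftr_field(uint64_t reg, unsigned int shift)
{
	return (reg >> shift) & 0xfULL;
}

static uint64_t combine_safe(uint64_t snapshot, uint64_t secondary)
{
	uint64_t safe = 0;
	unsigned int shift;

	for (shift = 0; shift < 64; shift += 4) {
		uint64_t a = ftr_field(snapshot, shift);
		uint64_t b = ftr_field(secondary, shift);

		safe |= (a < b ? a : b) << shift; /* lowest common denominator */
	}
	return safe;
}
```

Querying the snapshot after every CPU has been folded in then gives the system-wide "safe" view of each feature.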
Re: [PATCH v2] KVM: Optimize kvm_arch_vcpu_ioctl_run function
On 2020/4/16 16:50, Cornelia Huck wrote: On Thu, 16 Apr 2020 16:45:33 +0800 Tianjia Zhang wrote: On 2020/4/16 16:28, Marc Zyngier wrote: On 2020-04-16 08:03, Vitaly Kuznetsov wrote: Tianjia Zhang writes: [...] Overall, there is a large set of cleanups to be done when both the vcpu and the run structures are passed as parameters at the same time. Just grepping the tree for kvm_run is pretty instructive. M. Sorry, it's my mistake, I only compiled the x86 platform, I will submit patch again. I think it's completely fine (and even preferable) to do cleanups like that on top. [FWIW, I compiled s390 here.] Very good, I will do a comprehensive cleanup of this type of code. Thanks, Tianjia
Re: [PATCH v11 00/13] SMMUv3 Nested Stage Setup (IOMMU part)
On 2020/4/14 11:05 PM, Eric Auger wrote: This version fixes an issue observed by Shameer on an SMMU 3.2, when moving from dual stage config to stage 1 only config. The 2 high 64b of the STE now get reset. Otherwise, leaving the S2TTB set may cause a C_BAD_STE error. This series can be found at: https://github.com/eauger/linux/tree/v5.6-2stage-v11_10.1 (including the VFIO part) The companion QEMU series can still be found at: https://github.com/eauger/qemu/tree/v4.2.0-2stage-rfcv6 Users have expressed interest in that work and tested v9/v10: - https://patchwork.kernel.org/cover/11039995/#23012381 - https://patchwork.kernel.org/cover/11039995/#23197235 Background: This series brings the IOMMU part of HW nested paging support in the SMMUv3. The VFIO part is submitted separately. The IOMMU API is extended to support 2 new API functionalities: 1) pass the guest stage 1 configuration 2) pass stage 1 MSI bindings Those capabilities then get implemented in the SMMUv3 driver. The virtualizer passes information through the VFIO user API, which cascades it to the iommu subsystem. This allows the guest to own stage 1 tables and context descriptors (so-called PASID table) while the host owns stage 2 tables and main configuration structures (STE). Thanks Eric Tested v11 on a Hisilicon kunpeng920 board via the hardware zip accelerator. 1. no-sva works, where the guest app directly uses physical addresses via ioctl. 2. vSVA still does not work, same as v10. 3. the v10 issue reported by Shameer has been solved: first start qemu with iommu=smmuv3, then start qemu without iommu=smmuv3. 4. no-sva also works without iommu=smmuv3. Test details in https://docs.qq.com/doc/DRU5oR1NtUERseFNL Thanks
RE: [PATCH v2 00/94] KVM: arm64: ARMv8.3/8.4 Nested Virtualization support
Hi Marc: Got it. Really a bit patch set :) BTW, I have done a basic kvm unit test git://git.kernel.org/pub/scm/virt/kvm/kvm-unit-tests.git And I find that after apply the patch KVM: arm64: VNCR-ize ELR_EL1, The psci test failed for some reason, I can't understand why, this is only the test result.(find the patch by git bisect + kvm test) My platform: Hisilicon D06 board. Linux kernel: Linux 5.6-rc6 + nv patches(some rebases) Could you help to take a look? Thanks Zengtao > -Original Message- > From: Marc Zyngier [mailto:m...@kernel.org] > Sent: Saturday, April 11, 2020 5:24 PM > To: Zengtao (B) > Cc: George Cherian; dave.mar...@arm.com; alexandru.eli...@arm.com; > andre.przyw...@arm.com; christoffer.d...@arm.com; > james.mo...@arm.com; jint...@cs.columbia.edu; > julien.thierry.k...@gmail.com; k...@vger.kernel.org; > kvmarm@lists.cs.columbia.edu; linux-arm-ker...@lists.infradead.org; > suzuki.poul...@arm.com; Anil Kumar Reddy H; Ganapatrao Kulkarni > Subject: Re: [PATCH v2 00/94] KVM: arm64: ARMv8.3/8.4 Nested > Virtualization support > > Hi Zengtao, > > On Sat, 11 Apr 2020 05:10:05 +0100, > "Zengtao (B)" wrote: > > > > Hi Marc: > > > > Since it's a very large patch series, I want to test it on my platform > > which don't support nv, and want to make sure if this patch series > > affects the existed virtualization functions or not. > > > > Any suggestion about the test focus? > > Not really. Given that the NV patches affect absolutely every > architectural parts of KVM/arm64, everything needs careful > testing. But more than testing, it needs reviewing. > > Thanks, > > M. > > -- > Jazz is not dead, it just smells funny.
Re: [PATCH] KVM: Optimize kvm_arch_vcpu_ioctl_run function
On 2020/4/15 22:53, Paolo Bonzini wrote: On 15/04/20 11:07, Vitaly Kuznetsov wrote: In case this is no longer needed I'd suggest we drop 'kvm_run' parameter and extract it from 'struct kvm_vcpu' when needed. This looks like a natural add-on to your cleanup patch. I agree, though I think it should be _instead_ of Tianjia's patch rather than on top. Paolo Thank you very much for the comments of Vitaly and Paolo, I will make a v2 patch. Thanks and best, Tianjia
[PATCH v2] KVM: Optimize kvm_arch_vcpu_ioctl_run function
In earlier versions of kvm, 'kvm_run' is an independent structure and is not included in the vcpu structure. At present, 'kvm_run' is already included in the vcpu structure, so the parameter 'kvm_run' is redundant. This patch simplify the function definition, removes the extra 'kvm_run' parameter, and extract it from the 'kvm_vcpu' structure if necessary. Signed-off-by: Tianjia Zhang --- v2 change: remove 'kvm_run' parameter and extract it from 'kvm_vcpu' arch/mips/kvm/mips.c | 3 ++- arch/powerpc/kvm/powerpc.c | 3 ++- arch/s390/kvm/kvm-s390.c | 3 ++- arch/x86/kvm/x86.c | 11 ++- include/linux/kvm_host.h | 2 +- virt/kvm/arm/arm.c | 6 +++--- virt/kvm/kvm_main.c| 2 +- 7 files changed, 17 insertions(+), 13 deletions(-) diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c index 8f05dd0a0f4e..ec24adf4857e 100644 --- a/arch/mips/kvm/mips.c +++ b/arch/mips/kvm/mips.c @@ -439,8 +439,9 @@ int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu, return -ENOIOCTLCMD; } -int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run) +int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu) { + struct kvm_run *run = vcpu->run; int r = -EINTR; vcpu_load(vcpu); diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c index e15166b0a16d..7e24691e138a 100644 --- a/arch/powerpc/kvm/powerpc.c +++ b/arch/powerpc/kvm/powerpc.c @@ -1764,8 +1764,9 @@ int kvm_vcpu_ioctl_set_one_reg(struct kvm_vcpu *vcpu, struct kvm_one_reg *reg) return r; } -int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run) +int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu) { + struct kvm_run *run = vcpu->run; int r; vcpu_load(vcpu); diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c index 19a81024fe16..443af3ead739 100644 --- a/arch/s390/kvm/kvm-s390.c +++ b/arch/s390/kvm/kvm-s390.c @@ -4333,8 +4333,9 @@ static void store_regs(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run) store_regs_fmt2(vcpu, kvm_run); } -int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, 
struct kvm_run *kvm_run) +int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu) { + struct kvm_run *kvm_run = vcpu->run; int rc; if (kvm_run->immediate_exit) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 3bf2ecafd027..a0338e86c90f 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -8707,8 +8707,9 @@ static void kvm_put_guest_fpu(struct kvm_vcpu *vcpu) trace_kvm_fpu(0); } -int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run) +int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu) { + struct kvm_run *kvm_run = vcpu->run; int r; vcpu_load(vcpu); @@ -8726,18 +8727,18 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run) r = -EAGAIN; if (signal_pending(current)) { r = -EINTR; - vcpu->run->exit_reason = KVM_EXIT_INTR; + kvm_run->exit_reason = KVM_EXIT_INTR; ++vcpu->stat.signal_exits; } goto out; } - if (vcpu->run->kvm_valid_regs & ~KVM_SYNC_X86_VALID_FIELDS) { + if (kvm_run->kvm_valid_regs & ~KVM_SYNC_X86_VALID_FIELDS) { r = -EINVAL; goto out; } - if (vcpu->run->kvm_dirty_regs) { + if (kvm_run->kvm_dirty_regs) { r = sync_regs(vcpu); if (r != 0) goto out; @@ -8767,7 +8768,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run) out: kvm_put_guest_fpu(vcpu); - if (vcpu->run->kvm_valid_regs) + if (kvm_run->kvm_valid_regs) store_regs(vcpu); post_kvm_run_save(vcpu); kvm_sigset_deactivate(vcpu); diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 6d58beb65454..1e17ef719595 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -866,7 +866,7 @@ int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu, struct kvm_mp_state *mp_state); int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu, struct kvm_guest_debug *dbg); -int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run); +int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu); int kvm_arch_init(void *opaque); void kvm_arch_exit(void); diff --git a/virt/kvm/arm/arm.c 
b/virt/kvm/arm/arm.c index 48d0ec44ad77..f5390ac2165b 100644 --- a/virt/kvm/arm/arm.c +++ b/virt/kvm/arm/arm.c @@ -639,7 +639,6 @@ static void check_vcpu_requests(struct kvm_vcpu *vcpu) /** * kvm_arch_vcpu_ioctl_run - the main VCPU run function to execute guest code * @vcpu: The VCPU pointer - * @run: The kvm_run structure pointer used for userspace
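The shape of the change in this patch can be shown with a minimal stand-alone mock; the two structs below are simplified stand-ins, not the real kernel definitions:

```c
/* Simplified mocks of the kernel structures, for illustration only. */
struct kvm_run { int exit_reason; };
struct kvm_vcpu { struct kvm_run *run; };  /* run is already vcpu state */

/* Before: the caller passes both, even though run is always vcpu->run. */
static int run_ioctl_old(struct kvm_vcpu *vcpu, struct kvm_run *run)
{
	(void)vcpu;
	return run->exit_reason;
}

/* After: the redundant parameter is gone; derive run from the vcpu. */
static int run_ioctl_new(struct kvm_vcpu *vcpu)
{
	struct kvm_run *run = vcpu->run;

	return run->exit_reason;
}
```

Because every caller already holds the vcpu, the second parameter carries no extra information, which is exactly why the patch can drop it across all architectures.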
Re: [PATCH RFCv1 0/7] Support Async Page Fault
Hi Mark, On 4/14/20 9:05 PM, Mark Rutland wrote: On Tue, Apr 14, 2020 at 03:39:56PM +1000, Gavin Shan wrote: On 4/10/20 10:52 PM, Marc Zyngier wrote: On 2020-04-10 09:58, Gavin Shan wrote: In order to fulfil the control flow and convey signals between host and guest. A IMPDEF system register (SYS_ASYNC_PF_EL1) is introduced. The register accepts control block's physical address, plus requested features. Also, the signal is sent using data abort with the specific IMPDEF Data Fault Status Code (DFSC). The specific signal is stored in the control block by host, to be consumed by guest. - We don't add IMPDEF sysregs, period. That's reserved for the HW. If you want to trap, there's the HVC instruction to that effect. I really don't understand how IMPDEF sysreg is used by hardware vendors. Do we have an existing functionality, which depends on IMPDEF sysreg? I was thinking the IMPDEF sysreg can be used by software either, but it seems I'm wrong. The key is in the name: an IMPLEMENTATION DEFINED register is defined by the implementation (i.e. the specific CPU microarchitecture), so it's wrong for software to come up with an arbitrary semantic as this will differ from the implementation's defined semantic for the register. Typically, IMP DEF resgisters are used for things that firmware needs to do (e.g. enter/exit coherency), or for bringup-time debug (e.g. poking into TLB/cache internals), and are not usually intended for general purpose software. Linux generally avoids the use of IMP DEF registers, but does so in some cases (e.g. for PMUs) after FW explicitly describes that those are safe to access. Thanks for the explanation and details, which make things much clear. Since the IMPDEF system register can't be used like this way, hypercall (HVC) would be considered to serve same purpose - deliver signals from host to guest. However, the hypercall number and behaviors are guarded by specification. 
For example, the hypercalls used by para-virtualized stolen time, which are defined in include/linux/arm-smccc.h, are specified by ARM DEN0057A [1]. So do I need a specification to be created, where the hypercalls used by this feature are defined? If it's not needed, can I pick hypercalls that aren't used and define their behaviors by myself? [1] http://infocenter.arm.com/help/topic/com.arm.doc.den0057a/DEN0057A_Paravirtualized_Time_for_Arm_based_Systems_v1_0.pdf Another thing I want to check is about ESR_EL1[DFSC]. In this series, the asynchronous page fault is identified by an IMPDEF DFSC (Data Fault Status Code) in ESR_EL1. According to what we discussed, the IMPDEF DFSC shouldn't be fired (produced) by software. Should it be produced only by hardware, then? What I understood is that IMPDEF is hardware behavior. If this is true, I need to avoid using the IMPDEF DFSC in the next revision :) Thanks, Gavin
Re: [PATCH 0/8] Relax sanity checking for mismatched AArch32 EL1
On 2020-04-16 14:09, Sai Prakash Ranjan wrote: [...] For the entire series, Tested-by: saiprakash.ran...@codeaurora.org Urgh sorry, it should be Tested-by: Sai Prakash Ranjan -Sai -- QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
Re: [PATCH 0/8] Relax sanity checking for mismatched AArch32 EL1
On 2020-04-15 03:01, Will Deacon wrote: Hi all, For better or worse, there are SoCs in production where some, but not all of the CPUs, support AArch32 at EL1 and above. Right now, that results in "SANITY CHECK" warnings during boot and an unconditional kernel taint. This patch series tries to do a bit better: the only time we care about AArch32 at EL1 is for KVM, so rather than throw our toys out of the pram, we can instead just disable support for 32-bit guests on these systems. In the unlikely scenario of a late CPU hotplug being the first time we notice that AArch32 is not available, then we fail the hotplug (right now we let the thing come online, which leads to hilarious results for any pre-existing 32-bit guests). Feedback welcome, Will Thanks Will, tested this series on QCOM SC7180 and SM8150 SoCs. For the entire series, Tested-by: saiprakash.ran...@codeaurora.org -Sai -- QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
Re: [PATCH RFCv1 0/7] Support Async Page Fault
On Thu, Apr 16, 2020 at 10:16:22AM +0100, Mark Rutland wrote: > On Thu, Apr 16, 2020 at 05:59:33PM +1000, Gavin Shan wrote: > > However, the hypercall number and behaviors are guarded by > > specification. For example, the hypercalls used by para-virtualized > > stolen time, which are defined in include/linux/arm-smccc.h, are > > specified by ARM DEN0057A [1]. So I need a specification to be > > created, where the hypercalls used by this feature are defined? If > > it's not needed, can I pick hypercalls that aren't used and define > > their behaviors by myself? > > > > [1] > > http://infocenter.arm.com/help/topic/com.arm.doc.den0057a/DEN0057A_Paravirtualized_Time_for_Arm_based_Systems_v1_0.pdf > > Take a look at the SMCCC / SMC Calling Convention: > > https://developer.arm.com/docs/den0028/c > > ... that defines ranges set aside for hypervisor-specific usage, and > despite its name it also applies to HVC calls. > > There's been intermittent work to add a probing story for that, so that > part is subject to change, but for prototyping you can just choose an > arbitrary number in that range -- just be sure to mention in the commit > and cover letter that this part isn't complete. Right, might be simplest to start off with: https://android-kvm.googlesource.com/linux/+/refs/heads/willdeacon/hvc Will
Re: [PATCH RFCv1 0/7] Support Async Page Fault
On Thu, Apr 16, 2020 at 05:59:33PM +1000, Gavin Shan wrote: > On 4/14/20 9:05 PM, Mark Rutland wrote: > > On Tue, Apr 14, 2020 at 03:39:56PM +1000, Gavin Shan wrote: > > > On 4/10/20 10:52 PM, Marc Zyngier wrote: > > > > On 2020-04-10 09:58, Gavin Shan wrote: > > > > > In order to fulfil the control flow and convey signals between host > > > > > and guest. A IMPDEF system register (SYS_ASYNC_PF_EL1) is introduced. > > > > > The register accepts control block's physical address, plus requested > > > > > features. Also, the signal is sent using data abort with the specific > > > > > IMPDEF Data Fault Status Code (DFSC). The specific signal is stored > > > > > in the control block by host, to be consumed by guest. > > > > > > - We don't add IMPDEF sysregs, period. That's reserved for the HW. If > > > > you want to trap, there's the HVC instruction to that effect. > > > > > I really don't understand how IMPDEF sysreg is used by hardware vendors. > > > Do we have an existing functionality, which depends on IMPDEF sysreg? > > > I was thinking the IMPDEF sysreg can be used by software either, but > > > it seems I'm wrong. > > > > The key is in the name: an IMPLEMENTATION DEFINED register is defined by > > the implementation (i.e. the specific CPU microarchitecture), so it's > > wrong for software to come up with an arbitrary semantic as this will > > differ from the implementation's defined semantic for the register. > > > > Typically, IMP DEF registers are used for things that firmware needs to > > do (e.g. enter/exit coherency), or for bringup-time debug (e.g. poking > > into TLB/cache internals), and are not usually intended for general > > purpose software. > > > > Linux generally avoids the use of IMP DEF registers, but does so in some > > cases (e.g. for PMUs) after FW explicitly describes that those are safe > > to access. > > Thanks for the explanation and details, which make things much clearer. 
Since > the IMPDEF system register can't be used like this way, hypercall (HVC) would > be considered to serve same purpose - deliver signals from host to guest. I'm not sure I follow how you'd use HVC to inject a signal into a guest; the HVC would have to be issued by the guest to the host. Unless you're injecting the signal via some other mechanism (e.g. an interrupt), and the guest issues the HVC in response to that? > However, the hypercall number and behaviors are guarded by > specification. For example, the hypercalls used by para-virtualized > stolen time, which are defined in include/linux/arm-smccc.h, are > specified by ARM DEN0057A [1]. So I need a specification to be > created, where the hypercalls used by this feature are defined? If > it's not needed, can I pick hypercalls that aren't used and define > their behaviors by myself? > > [1] > http://infocenter.arm.com/help/topic/com.arm.doc.den0057a/DEN0057A_Paravirtualized_Time_for_Arm_based_Systems_v1_0.pdf Take a look at the SMCCC / SMC Calling Convention: https://developer.arm.com/docs/den0028/c ... that defines ranges set aside for hypervisor-specific usage, and despite its name it also applies to HVC calls. There's been intermittent work to add a probing story for that, so that part is subject to change, but for prototyping you can just choose an arbitrary number in that range -- just be sure to mention in the commit and cover letter that this part isn't complete. > Another thing I want to check is about the ESR_EL1[DFSC]. In this series, > the asynchronous page fault is identified by IMPDEF DFSC (Data Fault Status > Code) in ESR_EL1. According to what we discussed, the IMPDEF DFSC shouldn't > be fired (produced) by software. It should be produced by hardware either? > What I understood is IMPDEF is hardware behavior. If this is true, I need > to avoid using IMPDEF DFSC in next revision :) Yes, similar applies here. 
If the guest is making a hypercall, you can return the fault info as the response in GPRs, so I don't think you need to touch any architectural fault registers. Thanks, Mark.
Re: [PATCH v2] KVM: Optimize kvm_arch_vcpu_ioctl_run function
On 2020-04-16 09:45, Tianjia Zhang wrote: On 2020/4/16 16:28, Marc Zyngier wrote: [...] Overall, there is a large set of cleanups to be done when both the vcpu and the run structures are passed as parameters at the same time. Just grepping the tree for kvm_run is pretty instructive. M. Sorry, it's my mistake, I only compiled the x86 platform, I will submit patch again. Not a mistake. All I'm saying is that there is an opportunity for a larger series that cleans up the code base, rather than just doing a couple of localized changes. Thanks, M. -- Jazz is not dead. It just smells funny...
Re: [PATCH v2] KVM: Optimize kvm_arch_vcpu_ioctl_run function
On Thu, 16 Apr 2020 16:45:33 +0800 Tianjia Zhang wrote: > On 2020/4/16 16:28, Marc Zyngier wrote: > > On 2020-04-16 08:03, Vitaly Kuznetsov wrote: > >> Tianjia Zhang writes: > >> > >>> In earlier versions of kvm, 'kvm_run' is an independent structure > >>> and is not included in the vcpu structure. At present, 'kvm_run' > >>> is already included in the vcpu structure, so the parameter > >>> 'kvm_run' is redundant. > >>> > >>> This patch simplify the function definition, removes the extra > >>> 'kvm_run' parameter, and extract it from the 'kvm_vcpu' structure > >>> if necessary. > >>> > >>> Signed-off-by: Tianjia Zhang > >>> --- > >>> > >>> v2 change: > >>> remove 'kvm_run' parameter and extract it from 'kvm_vcpu' > >>> > >>> arch/mips/kvm/mips.c | 3 ++- > >>> arch/powerpc/kvm/powerpc.c | 3 ++- > >>> arch/s390/kvm/kvm-s390.c | 3 ++- > >>> arch/x86/kvm/x86.c | 11 ++- > >>> include/linux/kvm_host.h | 2 +- > >>> virt/kvm/arm/arm.c | 6 +++--- > >>> virt/kvm/kvm_main.c | 2 +- > >>> 7 files changed, 17 insertions(+), 13 deletions(-) > > Overall, there is a large set of cleanups to be done when both the vcpu > > and the run > > structures are passed as parameters at the same time. Just grepping the > > tree for > > kvm_run is pretty instructive. > > > > M. > > Sorry, it's my mistake, I only compiled the x86 platform, I will submit > patch again. I think it's completely fine (and even preferable) to do cleanups like that on top. [FWIW, I compiled s390 here.]
Re: [PATCH v2] KVM: Optimize kvm_arch_vcpu_ioctl_run function
On 2020-04-16 08:03, Vitaly Kuznetsov wrote: Tianjia Zhang writes:

In earlier versions of kvm, 'kvm_run' is an independent structure and is not included in the vcpu structure. At present, 'kvm_run' is already included in the vcpu structure, so the parameter 'kvm_run' is redundant.

This patch simplify the function definition, removes the extra 'kvm_run' parameter, and extract it from the 'kvm_vcpu' structure if necessary.

Signed-off-by: Tianjia Zhang
---

v2 change:
remove 'kvm_run' parameter and extract it from 'kvm_vcpu'

 arch/mips/kvm/mips.c       |  3 ++-
 arch/powerpc/kvm/powerpc.c |  3 ++-
 arch/s390/kvm/kvm-s390.c   |  3 ++-
 arch/x86/kvm/x86.c         | 11 ++-
 include/linux/kvm_host.h   |  2 +-
 virt/kvm/arm/arm.c         |  6 +++---
 virt/kvm/kvm_main.c        |  2 +-
 7 files changed, 17 insertions(+), 13 deletions(-)

diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
index 8f05dd0a0f4e..ec24adf4857e 100644
--- a/arch/mips/kvm/mips.c
+++ b/arch/mips/kvm/mips.c
@@ -439,8 +439,9 @@ int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu,
 	return -ENOIOCTLCMD;
 }
 
-int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
+int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
 {
+	struct kvm_run *run = vcpu->run;
 	int r = -EINTR;
 
 	vcpu_load(vcpu);
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index e15166b0a16d..7e24691e138a 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -1764,8 +1764,9 @@ int kvm_vcpu_ioctl_set_one_reg(struct kvm_vcpu *vcpu, struct kvm_one_reg *reg)
 	return r;
 }
 
-int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
+int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
 {
+	struct kvm_run *run = vcpu->run;
 	int r;
 
 	vcpu_load(vcpu);
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 19a81024fe16..443af3ead739 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -4333,8 +4333,9 @@ static void store_regs(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
 	store_regs_fmt2(vcpu, kvm_run);
 }
 
-int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
+int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
 {
+	struct kvm_run *kvm_run = vcpu->run;
 	int rc;
 
 	if (kvm_run->immediate_exit)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 3bf2ecafd027..a0338e86c90f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8707,8 +8707,9 @@ static void kvm_put_guest_fpu(struct kvm_vcpu *vcpu)
 	trace_kvm_fpu(0);
 }
 
-int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
+int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
 {
+	struct kvm_run *kvm_run = vcpu->run;
 	int r;
 
 	vcpu_load(vcpu);
@@ -8726,18 +8727,18 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
 		r = -EAGAIN;
 		if (signal_pending(current)) {
 			r = -EINTR;
-			vcpu->run->exit_reason = KVM_EXIT_INTR;
+			kvm_run->exit_reason = KVM_EXIT_INTR;
 			++vcpu->stat.signal_exits;
 		}
 		goto out;
 	}
 
-	if (vcpu->run->kvm_valid_regs & ~KVM_SYNC_X86_VALID_FIELDS) {
+	if (kvm_run->kvm_valid_regs & ~KVM_SYNC_X86_VALID_FIELDS) {
 		r = -EINVAL;
 		goto out;
 	}
 
-	if (vcpu->run->kvm_dirty_regs) {
+	if (kvm_run->kvm_dirty_regs) {
 		r = sync_regs(vcpu);
 		if (r != 0)
 			goto out;
@@ -8767,7 +8768,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
 out:
 	kvm_put_guest_fpu(vcpu);
-	if (vcpu->run->kvm_valid_regs)
+	if (kvm_run->kvm_valid_regs)
 		store_regs(vcpu);
 	post_kvm_run_save(vcpu);
 	kvm_sigset_deactivate(vcpu);
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 6d58beb65454..1e17ef719595 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -866,7 +866,7 @@ int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
 				    struct kvm_mp_state *mp_state);
 int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu,
 					struct kvm_guest_debug *dbg);
-int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run);
+int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu);
 int kvm_arch_init(void *opaque);
 void kvm_arch_exit(void);
diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index 48d0ec44ad77..f5390ac2165b 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -639,7 +639,6 @@ static void check_vcpu_requests(struct kvm_vcpu *vcpu)
 /**
  * kvm_arch_vcpu_ioctl_run - the main VCPU run function to execute guest code
  * @vcpu: The
Re: [PATCH v2] KVM: Optimize kvm_arch_vcpu_ioctl_run function
On Thu, 16 Apr 2020 13:10:57 +0800 Tianjia Zhang wrote: > In earlier versions of kvm, 'kvm_run' is an independent structure > and is not included in the vcpu structure. At present, 'kvm_run' > is already included in the vcpu structure, so the parameter > 'kvm_run' is redundant. > > This patch simplify the function definition, removes the extra s/simplify/simplifies/ > 'kvm_run' parameter, and extract it from the 'kvm_vcpu' structure s/extract/extracts/ > if necessary. > > Signed-off-by: Tianjia Zhang > --- > > v2 change: > remove 'kvm_run' parameter and extract it from 'kvm_vcpu' > > arch/mips/kvm/mips.c | 3 ++- > arch/powerpc/kvm/powerpc.c | 3 ++- > arch/s390/kvm/kvm-s390.c | 3 ++- > arch/x86/kvm/x86.c | 11 ++- > include/linux/kvm_host.h | 2 +- > virt/kvm/arm/arm.c | 6 +++--- > virt/kvm/kvm_main.c| 2 +- > 7 files changed, 17 insertions(+), 13 deletions(-) > Reviewed-by: Cornelia Huck
Re: [PATCH v11 00/13] SMMUv3 Nested Stage Setup (IOMMU part)
Hi Zhangfei, On 4/16/20 6:25 AM, Zhangfei Gao wrote: > > > On 2020/4/14 下午11:05, Eric Auger wrote: >> This version fixes an issue observed by Shameer on an SMMU 3.2, >> when moving from dual stage config to stage 1 only config. >> The 2 high 64b of the STE now get reset. Otherwise, leaving the >> S2TTB set may cause a C_BAD_STE error. >> >> This series can be found at: >> https://github.com/eauger/linux/tree/v5.6-2stage-v11_10.1 >> (including the VFIO part) >> The fellow QEMU series can still be found at: >> https://github.com/eauger/qemu/tree/v4.2.0-2stage-rfcv6 >> >> Users have expressed interest in that work and tested v9/v10: >> - https://patchwork.kernel.org/cover/11039995/#23012381 >> - https://patchwork.kernel.org/cover/11039995/#23197235 >> >> Background: >> >> This series brings the IOMMU part of HW nested paging support >> in the SMMUv3. The VFIO part is submitted separately. >> >> The IOMMU API is extended to support 2 new API functionalities: >> 1) pass the guest stage 1 configuration >> 2) pass stage 1 MSI bindings >> >> Then those capabilities get implemented in the SMMUv3 driver. >> >> The virtualizer passes information through the VFIO user API >> which cascades them to the iommu subsystem. This allows the guest >> to own stage 1 tables and context descriptors (so-called PASID >> table) while the host owns stage 2 tables and main configuration >> structures (STE). >> >> > > Thanks Eric > > Tested v11 on Hisilicon kunpeng920 board via hardware zip accelerator. > 1. no-sva works, where guest app directly use physical address via ioctl. Thank you for the testing. Glad it works for you. > 2. vSVA still not work, same as v10, Yes, that's normal: this series is not meant to support vSVA at this stage. I intend to add the missing pieces during the next weeks. Thanks Eric > 3. the v10 issue reported by Shameer has been solved, first start qemu > with iommu=smmuv3, then start qemu without iommu=smmuv3 4. 
no-sva also works without iommu=smmuv3 > > Test details in https://docs.qq.com/doc/DRU5oR1NtUERseFNL > > Thanks >