Re: [PATCH v2] KVM/arm64: Support enabling dirty log gradually in small chunks
Hi Marc, In RFC patch, I still write protect huge pages when DIRTY_LOG_INITIALLY_ALL_SET is enabled by userspace. I find that both huge pages and normal pages can be write protected during log clear. So this formal patch is pretty simple now. Thanks, Keqian On 2020/4/13 20:20, Keqian Zhu wrote: > There is already support of enabling dirty log graually in small chunks > for x86 in commit 3c9bd4006bfc ("KVM: x86: enable dirty log gradually in > small chunks"). This adds support for arm64. > > x86 still writes protect all huge pages when DIRTY_LOG_INITIALLY_ALL_SET > is eanbled. However, for arm64, both huge pages and normal pages can be > write protected gradually by userspace. > > Under the Huawei Kunpeng 920 2.6GHz platform, I did some tests on 128G > Linux VMs with different page size. The memory pressure is 127G in each > case. The time taken of memory_global_dirty_log_start in QEMU is listed > below: > > Page Size BeforeAfter Optimization > 4K650ms 1.8ms > 2M 4ms 1.8ms > 1G 2ms 1.8ms > > Besides the time reduction, the biggest income is that we will minimize > the performance side effect (because of dissloving huge pages and marking > memslots dirty) on guest after enabling dirty log. > > Signed-off-by: Keqian Zhu > --- > Documentation/virt/kvm/api.rst| 2 +- > arch/arm64/include/asm/kvm_host.h | 3 +++ > virt/kvm/arm/mmu.c| 12 ++-- > 3 files changed, 14 insertions(+), 3 deletions(-) > > diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst > index efbbe570aa9b..0017f63fa44f 100644 > --- a/Documentation/virt/kvm/api.rst > +++ b/Documentation/virt/kvm/api.rst > @@ -5777,7 +5777,7 @@ will be initialized to 1 when created. This also > improves performance because > dirty logging can be enabled gradually in small chunks on the first call > to KVM_CLEAR_DIRTY_LOG. KVM_DIRTY_LOG_INITIALLY_SET depends on > KVM_DIRTY_LOG_MANUAL_PROTECT_ENABLE (it is also only available on > -x86 for now). > +x86 and arm64 for now). > > KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 was previously available under the name > KVM_CAP_MANUAL_DIRTY_LOG_PROTECT, but the implementation had bugs that make > diff --git a/arch/arm64/include/asm/kvm_host.h > b/arch/arm64/include/asm/kvm_host.h > index 32c8a675e5a4..a723f84fab83 100644 > --- a/arch/arm64/include/asm/kvm_host.h > +++ b/arch/arm64/include/asm/kvm_host.h > @@ -46,6 +46,9 @@ > #define KVM_REQ_RECORD_STEAL KVM_ARCH_REQ(3) > #define KVM_REQ_RELOAD_GICv4 KVM_ARCH_REQ(4) > > +#define KVM_DIRTY_LOG_MANUAL_CAPS (KVM_DIRTY_LOG_MANUAL_PROTECT_ENABLE | \ > + KVM_DIRTY_LOG_INITIALLY_SET) > + > DECLARE_STATIC_KEY_FALSE(userspace_irqchip_in_use); > > extern unsigned int kvm_sve_max_vl; > diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c > index e3b9ee268823..1077f653a611 100644 > --- a/virt/kvm/arm/mmu.c > +++ b/virt/kvm/arm/mmu.c > @@ -2265,8 +2265,16 @@ void kvm_arch_commit_memory_region(struct kvm *kvm, >* allocated dirty_bitmap[], dirty pages will be be tracked while the >* memory slot is write protected. >*/ > - if (change != KVM_MR_DELETE && mem->flags & KVM_MEM_LOG_DIRTY_PAGES) > - kvm_mmu_wp_memory_region(kvm, mem->slot); > + if (change != KVM_MR_DELETE && mem->flags & KVM_MEM_LOG_DIRTY_PAGES) { > + /* > + * If we're with initial-all-set, we don't need to write > + * protect any pages because they're all reported as dirty. > + * Huge pages and normal pages will be write protect gradually. > + */ > + if (!kvm_dirty_log_manual_protect_and_init_set(kvm)) { > + kvm_mmu_wp_memory_region(kvm, mem->slot); > + } > + } > } > > int kvm_arch_prepare_memory_region(struct kvm *kvm, > ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH 2/2] KVM: arm64: vgic-its: Fix memory leak on the error path of vgic_add_lpi()
On 2020/4/14 11:03, Zenghui Yu wrote: If we're going to fail out the vgic_add_lpi(), let's make sure the allocated vgic_irq memory is also freed. Though it seems that both cases are unlikely to fail. Cc: Zengruan Ye Signed-off-by: Zenghui Yu --- virt/kvm/arm/vgic/vgic-its.c | 8 ++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/virt/kvm/arm/vgic/vgic-its.c b/virt/kvm/arm/vgic/vgic-its.c index d53d34a33e35..3c3b6a0f2dce 100644 --- a/virt/kvm/arm/vgic/vgic-its.c +++ b/virt/kvm/arm/vgic/vgic-its.c @@ -98,12 +98,16 @@ static struct vgic_irq *vgic_add_lpi(struct kvm *kvm, u32 intid, * the respective config data from memory here upon mapping the LPI. */ ret = update_lpi_config(kvm, irq, NULL, false); - if (ret) + if (ret) { + kfree(irq); return ERR_PTR(ret); + } ret = vgic_v3_lpi_sync_pending_status(kvm, irq); - if (ret) + if (ret) { + kfree(irq); return ERR_PTR(ret); + } Looking at it again, I realized that this error handling is still not complete. Maybe we should use a vgic_put_irq() instead so that we can also properly delete the vgic_irq from lpi_list. Marc, what do you think? Could you please help to fix it, or I can resend it. Thanks, Zenghui ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH kvmtool v3] Add emulation for CFI compatible flash memory
On Wed, 15 Apr 2020 at 17:43, Ard Biesheuvel wrote: > > On Tue, 7 Apr 2020 at 17:15, Alexandru Elisei > wrote: > > > > Hi, > > > > I've tested this patch by running badblocks and fio on a flash device > > inside a > > guest, everything worked as expected. > > > > I've also looked at the flowcharts for device operation from Intel > > Application > > Note 646, pages 12-21, and they seem implemented correctly. > > > > A few minor issues below. > > > > On 2/21/20 4:55 PM, Andre Przywara wrote: > > > From: Raphael Gault > > > > > > The EDK II UEFI firmware implementation requires some storage for the EFI > > > variables, which is typically some flash storage. > > > Since this is already supported on the EDK II side, we add a CFI flash > > > emulation to kvmtool. > > > This is backed by a file, specified via the --flash or -F command line > > > option. Any flash writes done by the guest will immediately be reflected > > > into this file (kvmtool mmap's the file). > > > The flash will be limited to the nearest power-of-2 size, so only the > > > first 2 MB of a 3 MB file will be used. > > > > > > This implements a CFI flash using the "Intel/Sharp extended command > > > set", as specified in: > > > - JEDEC JESD68.01 > > > - JEDEC JEP137B > > > - Intel Application Note 646 > > > Some gaps in those specs have been filled by looking at real devices and > > > other implementations (QEMU, Linux kernel driver). > > > > > > At the moment this relies on DT to advertise the base address of the > > > flash memory (mapped into the MMIO address space) and is only enabled > > > for ARM/ARM64. The emulation itself is architecture agnostic, though. > > > > > > This is one missing piece toward a working UEFI boot with kvmtool on > > > ARM guests, the other is to provide writable PCI BARs, which is WIP. > > > > > I have given this a spin with UEFI built for kvmtool, and it appears > to be working correctly. However, I noticed that it is intolerably > slow, which seems to be caused by the fact that both array mode and > command mode (or whatever it is called in the CFI spec) are fully > emulated, whereas in the QEMU implementation (for instance), the > region is actually exposed to the guest using a read-only KVM memslot > in array mode, and so the read accesses are made natively. > > It is also causing problems in the UEFI implementation, as we can no > longer use unaligned accesses to read from the region, which is > something the code currently relies on (and which works fine on actual > hardware as long as you use normal non-cacheable mappings) > Actually, the issue is not alignment. The issue is with instructions with multiple outputs, which means you cannot do an ordinary memcpy() from the NOR region using ldp instructions, aligned or not. ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH kvmtool v3] Add emulation for CFI compatible flash memory
On Tue, 7 Apr 2020 at 17:15, Alexandru Elisei wrote: > > Hi, > > I've tested this patch by running badblocks and fio on a flash device inside a > guest, everything worked as expected. > > I've also looked at the flowcharts for device operation from Intel Application > Note 646, pages 12-21, and they seem implemented correctly. > > A few minor issues below. > > On 2/21/20 4:55 PM, Andre Przywara wrote: > > From: Raphael Gault > > > > The EDK II UEFI firmware implementation requires some storage for the EFI > > variables, which is typically some flash storage. > > Since this is already supported on the EDK II side, we add a CFI flash > > emulation to kvmtool. > > This is backed by a file, specified via the --flash or -F command line > > option. Any flash writes done by the guest will immediately be reflected > > into this file (kvmtool mmap's the file). > > The flash will be limited to the nearest power-of-2 size, so only the > > first 2 MB of a 3 MB file will be used. > > > > This implements a CFI flash using the "Intel/Sharp extended command > > set", as specified in: > > - JEDEC JESD68.01 > > - JEDEC JEP137B > > - Intel Application Note 646 > > Some gaps in those specs have been filled by looking at real devices and > > other implementations (QEMU, Linux kernel driver). > > > > At the moment this relies on DT to advertise the base address of the > > flash memory (mapped into the MMIO address space) and is only enabled > > for ARM/ARM64. The emulation itself is architecture agnostic, though. > > > > This is one missing piece toward a working UEFI boot with kvmtool on > > ARM guests, the other is to provide writable PCI BARs, which is WIP. > > I have given this a spin with UEFI built for kvmtool, and it appears to be working correctly. However, I noticed that it is intolerably slow, which seems to be caused by the fact that both array mode and command mode (or whatever it is called in the CFI spec) are fully emulated, whereas in the QEMU implementation (for instance), the region is actually exposed to the guest using a read-only KVM memslot in array mode, and so the read accesses are made natively. It is also causing problems in the UEFI implementation, as we can no longer use unaligned accesses to read from the region, which is something the code currently relies on (and which works fine on actual hardware as long as you use normal non-cacheable mappings) Are there any plans to implement this as well? I am aware that this is a big ask, but for the general utility of this feature, I think it is rather important. -- Ard. > > Signed-off-by: Raphael Gault > > [Andre: rewriting and fixing] > > Signed-off-by: Andre Przywra > > --- > > Hi, > > > > an update fixing Alexandru's review comments (many thanks for those!) > > The biggest change code-wise is the split of the MMIO handler into three > > different functions. Another significant change is the rounding *down* of > > the present flash file size to the nearest power-of-two, to match flash > > hardware chips and Linux' expectations. > > > > Cheers, > > Andre > > > > Changelog v2 .. v3: > > - Breaking MMIO handling into three separate functions. > > - Assing the flash base address in the memory map, but stay at 32 MB for > > now. > > The MMIO area has been moved up to 48 MB, to never overlap with the > > flash. > > - Impose a limit of 16 MB for the flash size, mostly to fit into the > > (for now) fixed memory map. > > - Trim flash size down to nearest power-of-2, to match hardware. > > - Announce forced flash size trimming. > > - Rework the CFI query table slightly, to add the addresses as array > > indicies. > > - Fix error handling when creating the flash device. > > - Fix pow2_size implementation for 0 and 1 as input values. > > - Fix write buffer size handling. > > - Improve some comments. > > > > Changelog v1 .. v2: > > - Add locking for MMIO handling. > > - Fold flash read into handler. > > - Move pow2_size() into generic header. > > - Spell out flash base address. > > > > Makefile | 6 + > > arm/include/arm-common/kvm-arch.h | 8 +- > > builtin-run.c | 2 + > > hw/cfi_flash.c| 576 ++ > > include/kvm/kvm-config.h | 1 + > > include/kvm/util.h| 8 + > > 6 files changed, 599 insertions(+), 2 deletions(-) > > create mode 100644 hw/cfi_flash.c > > > > diff --git a/Makefile b/Makefile > > index 3862112c..7ed6fb5e 100644 > > --- a/Makefile > > +++ b/Makefile > > @@ -170,6 +170,7 @@ ifeq ($(ARCH), arm) > > CFLAGS += -march=armv7-a > > > > ARCH_WANT_LIBFDT := y > > + ARCH_HAS_FLASH_MEM := y > > endif > > > > # ARM64 > > @@ -182,6 +183,7 @@ ifeq ($(ARCH), arm64) > > ARCH_INCLUDE+= -Iarm/aarch64/include > > > > ARCH_WANT_LIBFDT := y > > + ARCH_HAS_FLASH_MEM := y > > endif > > > > ifeq ($(ARCH),mips) > > @@ -261,6 +263,10 @@ ifeq (y,$(ARCH_HAS_FRAMEBUFFER)) > >
Re: [PATCH kvmtool v3] Add emulation for CFI compatible flash memory
On Wed, 15 Apr 2020 at 18:11, André Przywara wrote: > > On 15/04/2020 16:55, Ard Biesheuvel wrote: > > On Wed, 15 Apr 2020 at 17:43, Ard Biesheuvel wrote: > >> > >> On Tue, 7 Apr 2020 at 17:15, Alexandru Elisei > >> wrote: > >>> > >>> Hi, > >>> > >>> I've tested this patch by running badblocks and fio on a flash device > >>> inside a > >>> guest, everything worked as expected. > >>> > >>> I've also looked at the flowcharts for device operation from Intel > >>> Application > >>> Note 646, pages 12-21, and they seem implemented correctly. > >>> > >>> A few minor issues below. > >>> > >>> On 2/21/20 4:55 PM, Andre Przywara wrote: > From: Raphael Gault > > The EDK II UEFI firmware implementation requires some storage for the EFI > variables, which is typically some flash storage. > Since this is already supported on the EDK II side, we add a CFI flash > emulation to kvmtool. > This is backed by a file, specified via the --flash or -F command line > option. Any flash writes done by the guest will immediately be reflected > into this file (kvmtool mmap's the file). > The flash will be limited to the nearest power-of-2 size, so only the > first 2 MB of a 3 MB file will be used. > > This implements a CFI flash using the "Intel/Sharp extended command > set", as specified in: > - JEDEC JESD68.01 > - JEDEC JEP137B > - Intel Application Note 646 > Some gaps in those specs have been filled by looking at real devices and > other implementations (QEMU, Linux kernel driver). > > At the moment this relies on DT to advertise the base address of the > flash memory (mapped into the MMIO address space) and is only enabled > for ARM/ARM64. The emulation itself is architecture agnostic, though. > > This is one missing piece toward a working UEFI boot with kvmtool on > ARM guests, the other is to provide writable PCI BARs, which is WIP. > > >> > >> I have given this a spin with UEFI built for kvmtool, and it appears > >> to be working correctly. However, I noticed that it is intolerably > >> slow, which seems to be caused by the fact that both array mode and > >> command mode (or whatever it is called in the CFI spec) are fully > >> emulated, whereas in the QEMU implementation (for instance), the > >> region is actually exposed to the guest using a read-only KVM memslot > >> in array mode, and so the read accesses are made natively. > >> > >> It is also causing problems in the UEFI implementation, as we can no > >> longer use unaligned accesses to read from the region, which is > >> something the code currently relies on (and which works fine on actual > >> hardware as long as you use normal non-cacheable mappings) > >> > > > > Actually, the issue is not alignment. The issue is with instructions > > with multiple outputs, which means you cannot do an ordinary memcpy() > > from the NOR region using ldp instructions, aligned or not. > > Yes, we traced that down to an "ldrb with post-inc", in the memcpy code. > My suggestion was to provide a version of memcpy_{from,to}_io(), as > Linux does, which only uses MMIO accessors to avoid "fancy" instructions. > That is possible, and the impact on the code is manageable, given the modular nature of EDK2. > Back at this point I was challenging the idea of accessing a flash > device with a normal memory mapping, because of it failing when being in > some query mode. Do you know of any best practices for flash mappings? > Are two mappings common? > In the QEMU port of EDK2, we use normal non-cacheable for the first flash device, which contains the executable image, and is not updatable by the guest. The second flash bank is used for the variable store, and is actually mapped as a device all the time. Another thing I just realized is that you cannot fetch instructions from an emulated flash device either, so to execute from NOR flash, you will need a true memory mapping as well. So in summary, I think the mode switch is needed to be generally useful, even if the current approach is sufficient for (slow) read/write using special memory accessors. ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH kvmtool v3] Add emulation for CFI compatible flash memory
On Wed, 15 Apr 2020 at 18:36, André Przywara wrote: > > On 15/04/2020 17:20, Ard Biesheuvel wrote: > > On Wed, 15 Apr 2020 at 18:11, André Przywara wrote: > >> > >> On 15/04/2020 16:55, Ard Biesheuvel wrote: > >>> On Wed, 15 Apr 2020 at 17:43, Ard Biesheuvel wrote: > > On Tue, 7 Apr 2020 at 17:15, Alexandru Elisei > wrote: > > > > Hi, > > > > I've tested this patch by running badblocks and fio on a flash device > > inside a > > guest, everything worked as expected. > > > > I've also looked at the flowcharts for device operation from Intel > > Application > > Note 646, pages 12-21, and they seem implemented correctly. > > > > A few minor issues below. > > > > On 2/21/20 4:55 PM, Andre Przywara wrote: > >> From: Raphael Gault > >> > >> The EDK II UEFI firmware implementation requires some storage for the > >> EFI > >> variables, which is typically some flash storage. > >> Since this is already supported on the EDK II side, we add a CFI flash > >> emulation to kvmtool. > >> This is backed by a file, specified via the --flash or -F command line > >> option. Any flash writes done by the guest will immediately be > >> reflected > >> into this file (kvmtool mmap's the file). > >> The flash will be limited to the nearest power-of-2 size, so only the > >> first 2 MB of a 3 MB file will be used. > >> > >> This implements a CFI flash using the "Intel/Sharp extended command > >> set", as specified in: > >> - JEDEC JESD68.01 > >> - JEDEC JEP137B > >> - Intel Application Note 646 > >> Some gaps in those specs have been filled by looking at real devices > >> and > >> other implementations (QEMU, Linux kernel driver). > >> > >> At the moment this relies on DT to advertise the base address of the > >> flash memory (mapped into the MMIO address space) and is only enabled > >> for ARM/ARM64. The emulation itself is architecture agnostic, though. > >> > >> This is one missing piece toward a working UEFI boot with kvmtool on > >> ARM guests, the other is to provide writable PCI BARs, which is WIP. > >> > > I have given this a spin with UEFI built for kvmtool, and it appears > to be working correctly. However, I noticed that it is intolerably > slow, which seems to be caused by the fact that both array mode and > command mode (or whatever it is called in the CFI spec) are fully > emulated, whereas in the QEMU implementation (for instance), the > region is actually exposed to the guest using a read-only KVM memslot > in array mode, and so the read accesses are made natively. > > It is also causing problems in the UEFI implementation, as we can no > longer use unaligned accesses to read from the region, which is > something the code currently relies on (and which works fine on actual > hardware as long as you use normal non-cacheable mappings) > > >>> > >>> Actually, the issue is not alignment. The issue is with instructions > >>> with multiple outputs, which means you cannot do an ordinary memcpy() > >>> from the NOR region using ldp instructions, aligned or not. > >> > >> Yes, we traced that down to an "ldrb with post-inc", in the memcpy code. > >> My suggestion was to provide a version of memcpy_{from,to}_io(), as > >> Linux does, which only uses MMIO accessors to avoid "fancy" instructions. > >> > > > > That is possible, and the impact on the code is manageable, given the > > modular nature of EDK2. > > > >> Back at this point I was challenging the idea of accessing a flash > >> device with a normal memory mapping, because of it failing when being in > >> some query mode. Do you know of any best practices for flash mappings? > >> Are two mappings common? > >> > > > > In the QEMU port of EDK2, we use normal non-cacheable for the first > > flash device, which contains the executable image, and is not > > updatable by the guest. The second flash bank is used for the variable > > store, and is actually mapped as a device all the time. > > > > Another thing I just realized is that you cannot fetch instructions > > from an emulated flash device either, so to execute from NOR flash, > > you will need a true memory mapping as well. > > Wait, did you put the whole of EDK-2 image in the flash? No, my point is that you cannot actually do that, since I don't think you can fetch instructions using MMIO emulation. > My assumption > (and testing) was to use > > $ lkvm run -f KVMTOOL_EFI.fd --flash just_the_variables.img > > Hence my ignorance about performance, because it would just be a few > bytes written/read. -f loads the firmware image into guest RAM. > No, the performance impact is due to the numerous variable accesses done by UEFI during boot. > > So in summary, I think the mode switch is needed to be generally > > useful, even if the current approach is
Re: [PATCH 3/8] arm64: cpufeature: Add CPU capability for AArch32 EL1 support
Hi Marc, On Wed, Apr 15, 2020 at 09:55:57AM +0100, Marc Zyngier wrote: > On 2020-04-14 22:31, Will Deacon wrote: > > Although we emit a "SANITY CHECK" warning and taint the kernel if we > > detect a CPU mismatch for AArch32 support at EL1, we still online the > > CPU with disastrous consequences for any running 32-bit VMs. > > > > Introduce a capability for AArch32 support at EL1 so that late onlining > > of incompatible CPUs is forbidden. > > > > Signed-off-by: Will Deacon > > Definitely an improvement over the current situation, as the direct read > of ID_AA64PFR0 was always a bit dodgy. Given that I'm pretty sure these new > braindead SoCs are going to run an older version of the kernel, should we > Cc stable for this? I don't think there's a real need for -stable given that we do at least taint the kernel. That's likely to annoy vendors enough to backport this themselves ;) Will ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH kvmtool v3] Add emulation for CFI compatible flash memory
On 15/04/2020 17:20, Ard Biesheuvel wrote: > On Wed, 15 Apr 2020 at 18:11, André Przywara wrote: >> >> On 15/04/2020 16:55, Ard Biesheuvel wrote: >>> On Wed, 15 Apr 2020 at 17:43, Ard Biesheuvel wrote: On Tue, 7 Apr 2020 at 17:15, Alexandru Elisei wrote: > > Hi, > > I've tested this patch by running badblocks and fio on a flash device > inside a > guest, everything worked as expected. > > I've also looked at the flowcharts for device operation from Intel > Application > Note 646, pages 12-21, and they seem implemented correctly. > > A few minor issues below. > > On 2/21/20 4:55 PM, Andre Przywara wrote: >> From: Raphael Gault >> >> The EDK II UEFI firmware implementation requires some storage for the EFI >> variables, which is typically some flash storage. >> Since this is already supported on the EDK II side, we add a CFI flash >> emulation to kvmtool. >> This is backed by a file, specified via the --flash or -F command line >> option. Any flash writes done by the guest will immediately be reflected >> into this file (kvmtool mmap's the file). >> The flash will be limited to the nearest power-of-2 size, so only the >> first 2 MB of a 3 MB file will be used. >> >> This implements a CFI flash using the "Intel/Sharp extended command >> set", as specified in: >> - JEDEC JESD68.01 >> - JEDEC JEP137B >> - Intel Application Note 646 >> Some gaps in those specs have been filled by looking at real devices and >> other implementations (QEMU, Linux kernel driver). >> >> At the moment this relies on DT to advertise the base address of the >> flash memory (mapped into the MMIO address space) and is only enabled >> for ARM/ARM64. The emulation itself is architecture agnostic, though. >> >> This is one missing piece toward a working UEFI boot with kvmtool on >> ARM guests, the other is to provide writable PCI BARs, which is WIP. >> I have given this a spin with UEFI built for kvmtool, and it appears to be working correctly. However, I noticed that it is intolerably slow, which seems to be caused by the fact that both array mode and command mode (or whatever it is called in the CFI spec) are fully emulated, whereas in the QEMU implementation (for instance), the region is actually exposed to the guest using a read-only KVM memslot in array mode, and so the read accesses are made natively. It is also causing problems in the UEFI implementation, as we can no longer use unaligned accesses to read from the region, which is something the code currently relies on (and which works fine on actual hardware as long as you use normal non-cacheable mappings) >>> >>> Actually, the issue is not alignment. The issue is with instructions >>> with multiple outputs, which means you cannot do an ordinary memcpy() >>> from the NOR region using ldp instructions, aligned or not. >> >> Yes, we traced that down to an "ldrb with post-inc", in the memcpy code. >> My suggestion was to provide a version of memcpy_{from,to}_io(), as >> Linux does, which only uses MMIO accessors to avoid "fancy" instructions. >> > > That is possible, and the impact on the code is manageable, given the > modular nature of EDK2. > >> Back at this point I was challenging the idea of accessing a flash >> device with a normal memory mapping, because of it failing when being in >> some query mode. Do you know of any best practices for flash mappings? >> Are two mappings common? >> > > In the QEMU port of EDK2, we use normal non-cacheable for the first > flash device, which contains the executable image, and is not > updatable by the guest. The second flash bank is used for the variable > store, and is actually mapped as a device all the time. > > Another thing I just realized is that you cannot fetch instructions > from an emulated flash device either, so to execute from NOR flash, > you will need a true memory mapping as well. Wait, did you put the whole of EDK-2 image in the flash? My assumption (and testing) was to use $ lkvm run -f KVMTOOL_EFI.fd --flash just_the_variables.img Hence my ignorance about performance, because it would just be a few bytes written/read. -f loads the firmware image into guest RAM. > So in summary, I think the mode switch is needed to be generally > useful, even if the current approach is sufficient for (slow) > read/write using special memory accessors. Well,in hindsight I regret pursuing this whole flash emulation approach in kvmtool in the first place. Just some magic "this memory region is persistent" (mmapping a file and presenting as a memslot) would be *much* easier on the kvmtool side. It just seems that there wasn't any good DT binding or existing device class for this (to my surprise), or at least not one without issues. And then EDK-2 had this CFI flash support already, so
Re: [PATCH v2] kvm_host: unify VM_STAT and VCPU_STAT definitions in a single place
On Tue, 14 Apr 2020 17:56:25 +0200 Emanuele Giuseppe Esposito wrote: > The macros VM_STAT and VCPU_STAT are redundantly implemented in multiple > files, each used by a different architecure to initialize the debugfs > entries for statistics. Since they all have the same purpose, they can be > unified in a single common definition in include/linux/kvm_host.h > > Signed-off-by: Emanuele Giuseppe Esposito > --- > arch/arm64/kvm/guest.c| 23 ++--- > arch/mips/kvm/mips.c | 61 ++-- > arch/powerpc/kvm/book3s.c | 61 ++-- > arch/powerpc/kvm/booke.c | 41 > arch/s390/kvm/kvm-s390.c | 203 +++--- > arch/x86/kvm/x86.c| 80 +++ > include/linux/kvm_host.h | 5 + > 7 files changed, 231 insertions(+), 243 deletions(-) Adds a bit of churn, but the end result does look nicer. Looks sane, but did not review in detail. Acked-by: Cornelia Huck ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH v2] KVM/arm64: Support enabling dirty log gradually in small chunks
On 13/04/20 14:20, Keqian Zhu wrote: > There is already support of enabling dirty log graually in small chunks > for x86 in commit 3c9bd4006bfc ("KVM: x86: enable dirty log gradually in > small chunks"). This adds support for arm64. > > x86 still writes protect all huge pages when DIRTY_LOG_INITIALLY_ALL_SET > is eanbled. However, for arm64, both huge pages and normal pages can be > write protected gradually by userspace. > > Under the Huawei Kunpeng 920 2.6GHz platform, I did some tests on 128G > Linux VMs with different page size. The memory pressure is 127G in each > case. The time taken of memory_global_dirty_log_start in QEMU is listed > below: > > Page Size BeforeAfter Optimization > 4K650ms 1.8ms > 2M 4ms 1.8ms > 1G 2ms 1.8ms > > Besides the time reduction, the biggest income is that we will minimize > the performance side effect (because of dissloving huge pages and marking > memslots dirty) on guest after enabling dirty log. > > Signed-off-by: Keqian Zhu > --- > Documentation/virt/kvm/api.rst| 2 +- > arch/arm64/include/asm/kvm_host.h | 3 +++ > virt/kvm/arm/mmu.c| 12 ++-- > 3 files changed, 14 insertions(+), 3 deletions(-) > > diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst > index efbbe570aa9b..0017f63fa44f 100644 > --- a/Documentation/virt/kvm/api.rst > +++ b/Documentation/virt/kvm/api.rst > @@ -5777,7 +5777,7 @@ will be initialized to 1 when created. This also > improves performance because > dirty logging can be enabled gradually in small chunks on the first call > to KVM_CLEAR_DIRTY_LOG. KVM_DIRTY_LOG_INITIALLY_SET depends on > KVM_DIRTY_LOG_MANUAL_PROTECT_ENABLE (it is also only available on > -x86 for now). > +x86 and arm64 for now). > > KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 was previously available under the name > KVM_CAP_MANUAL_DIRTY_LOG_PROTECT, but the implementation had bugs that make > diff --git a/arch/arm64/include/asm/kvm_host.h > b/arch/arm64/include/asm/kvm_host.h > index 32c8a675e5a4..a723f84fab83 100644 > --- a/arch/arm64/include/asm/kvm_host.h > +++ b/arch/arm64/include/asm/kvm_host.h > @@ -46,6 +46,9 @@ > #define KVM_REQ_RECORD_STEAL KVM_ARCH_REQ(3) > #define KVM_REQ_RELOAD_GICv4 KVM_ARCH_REQ(4) > > +#define KVM_DIRTY_LOG_MANUAL_CAPS (KVM_DIRTY_LOG_MANUAL_PROTECT_ENABLE | \ > + KVM_DIRTY_LOG_INITIALLY_SET) > + > DECLARE_STATIC_KEY_FALSE(userspace_irqchip_in_use); > > extern unsigned int kvm_sve_max_vl; > diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c > index e3b9ee268823..1077f653a611 100644 > --- a/virt/kvm/arm/mmu.c > +++ b/virt/kvm/arm/mmu.c > @@ -2265,8 +2265,16 @@ void kvm_arch_commit_memory_region(struct kvm *kvm, >* allocated dirty_bitmap[], dirty pages will be be tracked while the >* memory slot is write protected. >*/ > - if (change != KVM_MR_DELETE && mem->flags & KVM_MEM_LOG_DIRTY_PAGES) > - kvm_mmu_wp_memory_region(kvm, mem->slot); > + if (change != KVM_MR_DELETE && mem->flags & KVM_MEM_LOG_DIRTY_PAGES) { > + /* > + * If we're with initial-all-set, we don't need to write > + * protect any pages because they're all reported as dirty. > + * Huge pages and normal pages will be write protect gradually. > + */ > + if (!kvm_dirty_log_manual_protect_and_init_set(kvm)) { > + kvm_mmu_wp_memory_region(kvm, mem->slot); > + } > + } > } > > int kvm_arch_prepare_memory_region(struct kvm *kvm, > Marc, what is the status of this patch? Paolo ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH kvmtool v3] Add emulation for CFI compatible flash memory
On 15/04/2020 16:55, Ard Biesheuvel wrote: > On Wed, 15 Apr 2020 at 17:43, Ard Biesheuvel wrote: >> >> On Tue, 7 Apr 2020 at 17:15, Alexandru Elisei >> wrote: >>> >>> Hi, >>> >>> I've tested this patch by running badblocks and fio on a flash device >>> inside a >>> guest, everything worked as expected. >>> >>> I've also looked at the flowcharts for device operation from Intel >>> Application >>> Note 646, pages 12-21, and they seem implemented correctly. >>> >>> A few minor issues below. >>> >>> On 2/21/20 4:55 PM, Andre Przywara wrote: From: Raphael Gault The EDK II UEFI firmware implementation requires some storage for the EFI variables, which is typically some flash storage. Since this is already supported on the EDK II side, we add a CFI flash emulation to kvmtool. This is backed by a file, specified via the --flash or -F command line option. Any flash writes done by the guest will immediately be reflected into this file (kvmtool mmap's the file). The flash will be limited to the nearest power-of-2 size, so only the first 2 MB of a 3 MB file will be used. This implements a CFI flash using the "Intel/Sharp extended command set", as specified in: - JEDEC JESD68.01 - JEDEC JEP137B - Intel Application Note 646 Some gaps in those specs have been filled by looking at real devices and other implementations (QEMU, Linux kernel driver). At the moment this relies on DT to advertise the base address of the flash memory (mapped into the MMIO address space) and is only enabled for ARM/ARM64. The emulation itself is architecture agnostic, though. This is one missing piece toward a working UEFI boot with kvmtool on ARM guests, the other is to provide writable PCI BARs, which is WIP. >> >> I have given this a spin with UEFI built for kvmtool, and it appears >> to be working correctly. However, I noticed that it is intolerably >> slow, which seems to be caused by the fact that both array mode and >> command mode (or whatever it is called in the CFI spec) are fully >> emulated, whereas in the QEMU implementation (for instance), the >> region is actually exposed to the guest using a read-only KVM memslot >> in array mode, and so the read accesses are made natively. >> >> It is also causing problems in the UEFI implementation, as we can no >> longer use unaligned accesses to read from the region, which is >> something the code currently relies on (and which works fine on actual >> hardware as long as you use normal non-cacheable mappings) >> > > Actually, the issue is not alignment. The issue is with instructions > with multiple outputs, which means you cannot do an ordinary memcpy() > from the NOR region using ldp instructions, aligned or not. Yes, we traced that down to an "ldrb with post-inc", in the memcpy code. My suggestion was to provide a version of memcpy_{from,to}_io(), as Linux does, which only uses MMIO accessors to avoid "fancy" instructions. Back at this point I was challenging the idea of accessing a flash device with a normal memory mapping, because of it failing when being in some query mode. Do you know of any best practices for flash mappings? Are two mappings common? Cheers, Andre ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH kvmtool v3] Add emulation for CFI compatible flash memory
On 15/04/2020 16:43, Ard Biesheuvel wrote: Hi Ard, > On Tue, 7 Apr 2020 at 17:15, Alexandru Elisei > wrote: >> >> Hi, >> >> I've tested this patch by running badblocks and fio on a flash device inside >> a >> guest, everything worked as expected. >> >> I've also looked at the flowcharts for device operation from Intel >> Application >> Note 646, pages 12-21, and they seem implemented correctly. >> >> A few minor issues below. >> >> On 2/21/20 4:55 PM, Andre Przywara wrote: >>> From: Raphael Gault >>> >>> The EDK II UEFI firmware implementation requires some storage for the EFI >>> variables, which is typically some flash storage. >>> Since this is already supported on the EDK II side, we add a CFI flash >>> emulation to kvmtool. >>> This is backed by a file, specified via the --flash or -F command line >>> option. Any flash writes done by the guest will immediately be reflected >>> into this file (kvmtool mmap's the file). >>> The flash will be limited to the nearest power-of-2 size, so only the >>> first 2 MB of a 3 MB file will be used. >>> >>> This implements a CFI flash using the "Intel/Sharp extended command >>> set", as specified in: >>> - JEDEC JESD68.01 >>> - JEDEC JEP137B >>> - Intel Application Note 646 >>> Some gaps in those specs have been filled by looking at real devices and >>> other implementations (QEMU, Linux kernel driver). >>> >>> At the moment this relies on DT to advertise the base address of the >>> flash memory (mapped into the MMIO address space) and is only enabled >>> for ARM/ARM64. The emulation itself is architecture agnostic, though. >>> >>> This is one missing piece toward a working UEFI boot with kvmtool on >>> ARM guests, the other is to provide writable PCI BARs, which is WIP. >>> > > I have given this a spin with UEFI built for kvmtool, and it appears > to be working correctly. However, I noticed that it is intolerably > slow, which seems to be caused by the fact that both array mode and > command mode (or whatever it is called in the CFI spec) are fully > emulated, whereas in the QEMU implementation (for instance), the > region is actually exposed to the guest using a read-only KVM memslot > in array mode, and so the read accesses are made natively. Yes, I have been thinking about this, but didn't implement it this way for the following reasons: 1) The use case here is pure UEFI firmware load, which should not be too much affected by performance. At least this what I was thinking so far. Your "intolerably slow" make me wonder if this assumption is not true. Can you elaborate on that? Do you have any numbers? Is that due to the trapping or something else? 2) As you mentioned, we need to switch between trapping and mapping, upon the guest switching between array mode and command mode. So that just adds complexity. Given 1) it didn't seem worth to do. > It is also causing problems in the UEFI implementation, as we can no > longer use unaligned accesses to read from the region, which is > something the code currently relies on (and which works fine on actual > hardware as long as you use normal non-cacheable mappings) So this is something I was discussing with Sami about already: It seems to me that a parallel memory mapped flash chip is actually a device, just with the twist of behaving with normal (ROM) memory semantics under certain circumstances. So for write accesses and read access in any of the query modes we would definitely need to use device memory and MMIO accessors, otherwise the compiler could mess up the carefully crafted access semantics. So does EDK-2 use separate mappings for both cases? Does it make sure to not access the array when being in command mode? I couldn't find any trace of two mappings in the Linux driver, IIRC. > Are there any plans to implement this as well? I am aware that this is > a big ask, but for the general utility of this feature, I think it is > rather important. I wasn't aware that this has a significant impact, so avoided the added complexity. I doesn't sound like a big deal, though, so I might have a look at it. Cheers, Andre ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH v11 01/13] iommu: Introduce attach/detach_pasid_table API
Hi Jacob, On 4/15/20 5:59 PM, Jacob Pan wrote: > On Wed, 15 Apr 2020 16:52:10 +0200 > Auger Eric wrote: > >> Hi Jacob, >> On 4/15/20 12:15 AM, Jacob Pan wrote: >>> Hi Eric, >>> >>> There are some discussions about how to size the uAPI data. >>> https://lkml.org/lkml/2020/4/14/939 >>> >>> I think the problem with the current scheme is that when uAPI data >>> gets extended, if VFIO continue to use: >>> >>> minsz = offsetofend(struct vfio_iommu_type1_set_pasid_table, >>> config); if (copy_from_user(, (void __user *)arg, minsz)) >>> >>> It may copy more data from user than what was setup by the user. >>> >>> So, as suggested by Alex, we could add argsz to the IOMMU uAPI >>> struct. So if argsz > minsz, then fail the attach_table since >>> kernel might be old, doesn't know about the extra data. >>> If argsz <= minsz, kernel can support the attach_table but must >>> process the data based on flags or config. >> >> So I guess we would need both an argsz _u32 + a new flag _u32 right? >> > Yes. >> I am ok with that idea. Besides how will you manage for existing IOMMU >> UAPIs? > I plan to add argsz and flags (if not already have one) > >> At some point you envisionned to have a getter at iommu api >> level to retrieve the size of a structure for a given version, right? >> > This idea is shot down. There is no version-size lookup. > So the current plan is for user to fill out argsz in each IOMMU uAPI > struct. VFIO does the copy_from_user() based on argsz (sanitized > against the size of current kernel struct). > > IOMMU vendor driver process the data based on flags which indicates > new capability/extensions. OK. Sounds sensible Thanks Eric > >> Thanks >> >> Eric >>> >>> Does it make sense to you? >>> >>> >>> On Tue, 14 Apr 2020 17:05:55 +0200 >>> Eric Auger wrote: >>> From: Jacob Pan In virtualization use case, when a guest is assigned a PCI host device, protected by a virtual IOMMU on the guest, the physical IOMMU must be programmed to be consistent with the guest mappings. If the physical IOMMU supports two translation stages it makes sense to program guest mappings onto the first stage/level (ARM/Intel terminology) while the host owns the stage/level 2. In that case, it is mandated to trap on guest configuration settings and pass those to the physical iommu driver. This patch adds a new API to the iommu subsystem that allows to set/unset the pasid table information. A generic iommu_pasid_table_config struct is introduced in a new iommu.h uapi header. This is going to be used by the VFIO user API. Signed-off-by: Jean-Philippe Brucker Signed-off-by: Liu, Yi L Signed-off-by: Ashok Raj Signed-off-by: Jacob Pan Signed-off-by: Eric Auger Reviewed-by: Jean-Philippe Brucker --- drivers/iommu/iommu.c | 19 ++ include/linux/iommu.h | 18 ++ include/uapi/linux/iommu.h | 51 ++ 3 files changed, 88 insertions(+) diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index 2b471419e26c..b71ad56f8c99 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -1723,6 +1723,25 @@ int iommu_sva_unbind_gpasid(struct iommu_domain *domain, struct device *dev, } EXPORT_SYMBOL_GPL(iommu_sva_unbind_gpasid); +int iommu_attach_pasid_table(struct iommu_domain *domain, + struct iommu_pasid_table_config *cfg) +{ + if (unlikely(!domain->ops->attach_pasid_table)) + return -ENODEV; + + return domain->ops->attach_pasid_table(domain, cfg); +} +EXPORT_SYMBOL_GPL(iommu_attach_pasid_table); + +void iommu_detach_pasid_table(struct iommu_domain *domain) +{ + if (unlikely(!domain->ops->detach_pasid_table)) + return; + + domain->ops->detach_pasid_table(domain); +} +EXPORT_SYMBOL_GPL(iommu_detach_pasid_table); + static void __iommu_detach_device(struct iommu_domain *domain, struct device *dev) { diff --git a/include/linux/iommu.h b/include/linux/iommu.h index 7ef8b0bda695..3e1057c3585a 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -248,6 +248,8 @@ struct iommu_iotlb_gather { * @cache_invalidate: invalidate translation caches * @sva_bind_gpasid: bind guest pasid and mm * @sva_unbind_gpasid: unbind guest pasid and mm + * @attach_pasid_table: attach a pasid table + * @detach_pasid_table: detach the pasid table * @pgsize_bitmap: bitmap of all possible supported page sizes * @owner: Driver module providing these ops */ @@ -307,6 +309,9 @@ struct iommu_ops { void *drvdata); void (*sva_unbind)(struct iommu_sva *handle);
Re: [PATCH v11 01/13] iommu: Introduce attach/detach_pasid_table API
On Wed, 15 Apr 2020 16:52:10 +0200 Auger Eric wrote: > Hi Jacob, > On 4/15/20 12:15 AM, Jacob Pan wrote: > > Hi Eric, > > > > There are some discussions about how to size the uAPI data. > > https://lkml.org/lkml/2020/4/14/939 > > > > I think the problem with the current scheme is that when uAPI data > > gets extended, if VFIO continue to use: > > > > minsz = offsetofend(struct vfio_iommu_type1_set_pasid_table, > > config); if (copy_from_user(, (void __user *)arg, minsz)) > > > > It may copy more data from user than what was setup by the user. > > > > So, as suggested by Alex, we could add argsz to the IOMMU uAPI > > struct. So if argsz > minsz, then fail the attach_table since > > kernel might be old, doesn't know about the extra data. > > If argsz <= minsz, kernel can support the attach_table but must > > process the data based on flags or config. > > So I guess we would need both an argsz _u32 + a new flag _u32 right? > Yes. > I am ok with that idea. Besides how will you manage for existing IOMMU > UAPIs? I plan to add argsz and flags (if not already have one) > At some point you envisionned to have a getter at iommu api > level to retrieve the size of a structure for a given version, right? > This idea is shot down. There is no version-size lookup. So the current plan is for user to fill out argsz in each IOMMU uAPI struct. VFIO does the copy_from_user() based on argsz (sanitized against the size of current kernel struct). IOMMU vendor driver process the data based on flags which indicates new capability/extensions. > Thanks > > Eric > > > > Does it make sense to you? > > > > > > On Tue, 14 Apr 2020 17:05:55 +0200 > > Eric Auger wrote: > > > >> From: Jacob Pan > >> > >> In virtualization use case, when a guest is assigned > >> a PCI host device, protected by a virtual IOMMU on the guest, > >> the physical IOMMU must be programmed to be consistent with > >> the guest mappings. If the physical IOMMU supports two > >> translation stages it makes sense to program guest mappings > >> onto the first stage/level (ARM/Intel terminology) while the host > >> owns the stage/level 2. > >> > >> In that case, it is mandated to trap on guest configuration > >> settings and pass those to the physical iommu driver. > >> > >> This patch adds a new API to the iommu subsystem that allows > >> to set/unset the pasid table information. > >> > >> A generic iommu_pasid_table_config struct is introduced in > >> a new iommu.h uapi header. This is going to be used by the VFIO > >> user API. > >> > >> Signed-off-by: Jean-Philippe Brucker > >> Signed-off-by: Liu, Yi L > >> Signed-off-by: Ashok Raj > >> Signed-off-by: Jacob Pan > >> Signed-off-by: Eric Auger > >> Reviewed-by: Jean-Philippe Brucker > >> --- > >> drivers/iommu/iommu.c | 19 ++ > >> include/linux/iommu.h | 18 ++ > >> include/uapi/linux/iommu.h | 51 > >> ++ 3 files changed, 88 > >> insertions(+) > >> > >> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c > >> index 2b471419e26c..b71ad56f8c99 100644 > >> --- a/drivers/iommu/iommu.c > >> +++ b/drivers/iommu/iommu.c > >> @@ -1723,6 +1723,25 @@ int iommu_sva_unbind_gpasid(struct > >> iommu_domain *domain, struct device *dev, } > >> EXPORT_SYMBOL_GPL(iommu_sva_unbind_gpasid); > >> > >> +int iommu_attach_pasid_table(struct iommu_domain *domain, > >> + struct iommu_pasid_table_config *cfg) > >> +{ > >> + if (unlikely(!domain->ops->attach_pasid_table)) > >> + return -ENODEV; > >> + > >> + return domain->ops->attach_pasid_table(domain, cfg); > >> +} > >> +EXPORT_SYMBOL_GPL(iommu_attach_pasid_table); > >> + > >> +void iommu_detach_pasid_table(struct iommu_domain *domain) > >> +{ > >> + if (unlikely(!domain->ops->detach_pasid_table)) > >> + return; > >> + > >> + domain->ops->detach_pasid_table(domain); > >> +} > >> +EXPORT_SYMBOL_GPL(iommu_detach_pasid_table); > >> + > >> static void __iommu_detach_device(struct iommu_domain *domain, > >> struct device *dev) > >> { > >> diff --git a/include/linux/iommu.h b/include/linux/iommu.h > >> index 7ef8b0bda695..3e1057c3585a 100644 > >> --- a/include/linux/iommu.h > >> +++ b/include/linux/iommu.h > >> @@ -248,6 +248,8 @@ struct iommu_iotlb_gather { > >> * @cache_invalidate: invalidate translation caches > >> * @sva_bind_gpasid: bind guest pasid and mm > >> * @sva_unbind_gpasid: unbind guest pasid and mm > >> + * @attach_pasid_table: attach a pasid table > >> + * @detach_pasid_table: detach the pasid table > >> * @pgsize_bitmap: bitmap of all possible supported page sizes > >> * @owner: Driver module providing these ops > >> */ > >> @@ -307,6 +309,9 @@ struct iommu_ops { > >> void *drvdata); > >>void (*sva_unbind)(struct iommu_sva *handle); > >>int (*sva_get_pasid)(struct iommu_sva *handle); > >> + int (*attach_pasid_table)(struct iommu_domain *domain,
Re: [PATCH] KVM: Optimize kvm_arch_vcpu_ioctl_run function
On 15/04/20 11:07, Vitaly Kuznetsov wrote: > In case this is no longer needed I'd suggest we drop 'kvm_run' parameter > and extract it from 'struct kvm_vcpu' when needed. This looks like a > natural add-on to your cleanup patch. I agree, though I think it should be _instead_ of Tianjia's patch rather than on top. Paolo ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH v11 01/13] iommu: Introduce attach/detach_pasid_table API
Hi Jacob, On 4/15/20 12:15 AM, Jacob Pan wrote: > Hi Eric, > > There are some discussions about how to size the uAPI data. > https://lkml.org/lkml/2020/4/14/939 > > I think the problem with the current scheme is that when uAPI data gets > extended, if VFIO continue to use: > > minsz = offsetofend(struct vfio_iommu_type1_set_pasid_table, config); > if (copy_from_user(, (void __user *)arg, minsz)) > > It may copy more data from user than what was setup by the user. > > So, as suggested by Alex, we could add argsz to the IOMMU uAPI struct. > So if argsz > minsz, then fail the attach_table since kernel might be > old, doesn't know about the extra data. > If argsz <= minsz, kernel can support the attach_table but must process > the data based on flags or config. So I guess we would need both an argsz _u32 + a new flag _u32 right? I am ok with that idea. Besides how will you manage for existing IOMMU UAPIs? At some point you envisionned to have a getter at iommu api level to retrieve the size of a structure for a given version, right? Thanks Eric > > Does it make sense to you? > > > On Tue, 14 Apr 2020 17:05:55 +0200 > Eric Auger wrote: > >> From: Jacob Pan >> >> In virtualization use case, when a guest is assigned >> a PCI host device, protected by a virtual IOMMU on the guest, >> the physical IOMMU must be programmed to be consistent with >> the guest mappings. If the physical IOMMU supports two >> translation stages it makes sense to program guest mappings >> onto the first stage/level (ARM/Intel terminology) while the host >> owns the stage/level 2. >> >> In that case, it is mandated to trap on guest configuration >> settings and pass those to the physical iommu driver. >> >> This patch adds a new API to the iommu subsystem that allows >> to set/unset the pasid table information. >> >> A generic iommu_pasid_table_config struct is introduced in >> a new iommu.h uapi header. This is going to be used by the VFIO >> user API. >> >> Signed-off-by: Jean-Philippe Brucker >> Signed-off-by: Liu, Yi L >> Signed-off-by: Ashok Raj >> Signed-off-by: Jacob Pan >> Signed-off-by: Eric Auger >> Reviewed-by: Jean-Philippe Brucker >> --- >> drivers/iommu/iommu.c | 19 ++ >> include/linux/iommu.h | 18 ++ >> include/uapi/linux/iommu.h | 51 >> ++ 3 files changed, 88 >> insertions(+) >> >> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c >> index 2b471419e26c..b71ad56f8c99 100644 >> --- a/drivers/iommu/iommu.c >> +++ b/drivers/iommu/iommu.c >> @@ -1723,6 +1723,25 @@ int iommu_sva_unbind_gpasid(struct >> iommu_domain *domain, struct device *dev, } >> EXPORT_SYMBOL_GPL(iommu_sva_unbind_gpasid); >> >> +int iommu_attach_pasid_table(struct iommu_domain *domain, >> + struct iommu_pasid_table_config *cfg) >> +{ >> +if (unlikely(!domain->ops->attach_pasid_table)) >> +return -ENODEV; >> + >> +return domain->ops->attach_pasid_table(domain, cfg); >> +} >> +EXPORT_SYMBOL_GPL(iommu_attach_pasid_table); >> + >> +void iommu_detach_pasid_table(struct iommu_domain *domain) >> +{ >> +if (unlikely(!domain->ops->detach_pasid_table)) >> +return; >> + >> +domain->ops->detach_pasid_table(domain); >> +} >> +EXPORT_SYMBOL_GPL(iommu_detach_pasid_table); >> + >> static void __iommu_detach_device(struct iommu_domain *domain, >>struct device *dev) >> { >> diff --git a/include/linux/iommu.h b/include/linux/iommu.h >> index 7ef8b0bda695..3e1057c3585a 100644 >> --- a/include/linux/iommu.h >> +++ b/include/linux/iommu.h >> @@ -248,6 +248,8 @@ struct iommu_iotlb_gather { >> * @cache_invalidate: invalidate translation caches >> * @sva_bind_gpasid: bind guest pasid and mm >> * @sva_unbind_gpasid: unbind guest pasid and mm >> + * @attach_pasid_table: attach a pasid table >> + * @detach_pasid_table: detach the pasid table >> * @pgsize_bitmap: bitmap of all possible supported page sizes >> * @owner: Driver module providing these ops >> */ >> @@ -307,6 +309,9 @@ struct iommu_ops { >>void *drvdata); >> void (*sva_unbind)(struct iommu_sva *handle); >> int (*sva_get_pasid)(struct iommu_sva *handle); >> +int (*attach_pasid_table)(struct iommu_domain *domain, >> + struct iommu_pasid_table_config >> *cfg); >> +void (*detach_pasid_table)(struct iommu_domain *domain); >> >> int (*page_response)(struct device *dev, >> struct iommu_fault_event *evt, >> @@ -446,6 +451,9 @@ extern int iommu_sva_bind_gpasid(struct >> iommu_domain *domain, struct device *dev, struct >> iommu_gpasid_bind_data *data); extern int >> iommu_sva_unbind_gpasid(struct iommu_domain *domain, struct device >> *dev, ioasid_t pasid); +extern int iommu_attach_pasid_table(struct >> iommu_domain *domain, >> +struct iommu_pasid_table_config >> *cfg); +extern
Re: [PATCH 1/3] KVM: arm: vgic: Synchronize the whole guest on GIC{D,R}_I{S,C}ACTIVER read
Hi Zenghui, On 2020-04-15 14:15, Zenghui Yu wrote: Hi Marc, On 2020/4/14 18:35, Marc Zyngier wrote: When a guest tries to read the active state of its interrupts, we currently just return whatever state we have in memory. This means that if such an interrupt lives in a List Register on another CPU, we fail to obsertve the latest active state for this interrupt. In order to remedy this, stop all the other vcpus so that they exit and we can observe the most recent value for the state. Reported-by: Julien Grall Signed-off-by: Marc Zyngier --- virt/kvm/arm/vgic/vgic-mmio-v2.c | 4 +- virt/kvm/arm/vgic/vgic-mmio-v3.c | 4 +- virt/kvm/arm/vgic/vgic-mmio.c| 100 --- virt/kvm/arm/vgic/vgic-mmio.h| 3 + 4 files changed, 71 insertions(+), 40 deletions(-) diff --git a/virt/kvm/arm/vgic/vgic-mmio-v2.c b/virt/kvm/arm/vgic/vgic-mmio-v2.c index 5945f062d749..d63881f60e1a 100644 --- a/virt/kvm/arm/vgic/vgic-mmio-v2.c +++ b/virt/kvm/arm/vgic/vgic-mmio-v2.c @@ -422,11 +422,11 @@ static const struct vgic_register_region vgic_v2_dist_registers[] = { VGIC_ACCESS_32bit), REGISTER_DESC_WITH_BITS_PER_IRQ(GIC_DIST_ACTIVE_SET, vgic_mmio_read_active, vgic_mmio_write_sactive, - NULL, vgic_mmio_uaccess_write_sactive, 1, + vgic_uaccess_read_active, vgic_mmio_uaccess_write_sactive, 1, VGIC_ACCESS_32bit), REGISTER_DESC_WITH_BITS_PER_IRQ(GIC_DIST_ACTIVE_CLEAR, vgic_mmio_read_active, vgic_mmio_write_cactive, - NULL, vgic_mmio_uaccess_write_cactive, 1, + vgic_uaccess_read_active, vgic_mmio_uaccess_write_cactive, 1, VGIC_ACCESS_32bit), REGISTER_DESC_WITH_BITS_PER_IRQ(GIC_DIST_PRI, vgic_mmio_read_priority, vgic_mmio_write_priority, NULL, NULL, diff --git a/virt/kvm/arm/vgic/vgic-mmio-v3.c b/virt/kvm/arm/vgic/vgic-mmio-v3.c index e72dcc454247..77c8ba1a2535 100644 --- a/virt/kvm/arm/vgic/vgic-mmio-v3.c +++ b/virt/kvm/arm/vgic/vgic-mmio-v3.c @@ -553,11 +553,11 @@ static const struct vgic_register_region vgic_v3_dist_registers[] = { VGIC_ACCESS_32bit), REGISTER_DESC_WITH_BITS_PER_IRQ_SHARED(GICD_ISACTIVER, vgic_mmio_read_active, vgic_mmio_write_sactive, - NULL, vgic_mmio_uaccess_write_sactive, 1, + vgic_uaccess_read_active, vgic_mmio_uaccess_write_sactive, 1, VGIC_ACCESS_32bit), REGISTER_DESC_WITH_BITS_PER_IRQ_SHARED(GICD_ICACTIVER, vgic_mmio_read_active, vgic_mmio_write_cactive, - NULL, vgic_mmio_uaccess_write_cactive, + vgic_uaccess_read_active, vgic_mmio_uaccess_write_cactive, 1, VGIC_ACCESS_32bit), REGISTER_DESC_WITH_BITS_PER_IRQ_SHARED(GICD_IPRIORITYR, vgic_mmio_read_priority, vgic_mmio_write_priority, NULL, NULL, Shouldn't we also set this uaccess_read cb for GICR_I{S,C}ACTIVER0? Duh. Yes, of course... M. -- Jazz is not dead. It just smells funny... ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH 3/8] arm64: cpufeature: Add CPU capability for AArch32 EL1 support
On Wed, 15 Apr 2020 14:15:51 +0100 Suzuki K Poulose wrote: > On 04/15/2020 11:14 AM, Will Deacon wrote: > > On Wed, Apr 15, 2020 at 11:13:54AM +0100, Suzuki K Poulose wrote: > >> On 04/14/2020 10:31 PM, Will Deacon wrote: > >>> Although we emit a "SANITY CHECK" warning and taint the kernel if we > >>> detect a CPU mismatch for AArch32 support at EL1, we still online the > >>> CPU with disastrous consequences for any running 32-bit VMs. > >>> > >>> Introduce a capability for AArch32 support at EL1 so that late onlining > >>> of incompatible CPUs is forbidden. > >>> > >>> Signed-off-by: Will Deacon > >> > >> One of the other important missing sanity check for KVM is the VMID width > >> check. I will code something up. > > > > Cheers! Do we handle things like the IPA size already? > > Good point. No, we don't. I will include this too. There is also the question of the ARMv8.5-GTG extension. I have a patch that treats it as non-strict, but that approach would fail with KVM if we online a late CPU without support for the right page size at S2. M. -- Jazz is not dead. It just smells funny... ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH 1/3] KVM: arm: vgic: Synchronize the whole guest on GIC{D,R}_I{S,C}ACTIVER read
Hi Marc, On 2020/4/14 18:35, Marc Zyngier wrote: When a guest tries to read the active state of its interrupts, we currently just return whatever state we have in memory. This means that if such an interrupt lives in a List Register on another CPU, we fail to obsertve the latest active state for this interrupt. In order to remedy this, stop all the other vcpus so that they exit and we can observe the most recent value for the state. Reported-by: Julien Grall Signed-off-by: Marc Zyngier --- virt/kvm/arm/vgic/vgic-mmio-v2.c | 4 +- virt/kvm/arm/vgic/vgic-mmio-v3.c | 4 +- virt/kvm/arm/vgic/vgic-mmio.c| 100 --- virt/kvm/arm/vgic/vgic-mmio.h| 3 + 4 files changed, 71 insertions(+), 40 deletions(-) diff --git a/virt/kvm/arm/vgic/vgic-mmio-v2.c b/virt/kvm/arm/vgic/vgic-mmio-v2.c index 5945f062d749..d63881f60e1a 100644 --- a/virt/kvm/arm/vgic/vgic-mmio-v2.c +++ b/virt/kvm/arm/vgic/vgic-mmio-v2.c @@ -422,11 +422,11 @@ static const struct vgic_register_region vgic_v2_dist_registers[] = { VGIC_ACCESS_32bit), REGISTER_DESC_WITH_BITS_PER_IRQ(GIC_DIST_ACTIVE_SET, vgic_mmio_read_active, vgic_mmio_write_sactive, - NULL, vgic_mmio_uaccess_write_sactive, 1, + vgic_uaccess_read_active, vgic_mmio_uaccess_write_sactive, 1, VGIC_ACCESS_32bit), REGISTER_DESC_WITH_BITS_PER_IRQ(GIC_DIST_ACTIVE_CLEAR, vgic_mmio_read_active, vgic_mmio_write_cactive, - NULL, vgic_mmio_uaccess_write_cactive, 1, + vgic_uaccess_read_active, vgic_mmio_uaccess_write_cactive, 1, VGIC_ACCESS_32bit), REGISTER_DESC_WITH_BITS_PER_IRQ(GIC_DIST_PRI, vgic_mmio_read_priority, vgic_mmio_write_priority, NULL, NULL, diff --git a/virt/kvm/arm/vgic/vgic-mmio-v3.c b/virt/kvm/arm/vgic/vgic-mmio-v3.c index e72dcc454247..77c8ba1a2535 100644 --- a/virt/kvm/arm/vgic/vgic-mmio-v3.c +++ b/virt/kvm/arm/vgic/vgic-mmio-v3.c @@ -553,11 +553,11 @@ static const struct vgic_register_region vgic_v3_dist_registers[] = { VGIC_ACCESS_32bit), REGISTER_DESC_WITH_BITS_PER_IRQ_SHARED(GICD_ISACTIVER, vgic_mmio_read_active, vgic_mmio_write_sactive, - NULL, vgic_mmio_uaccess_write_sactive, 1, + vgic_uaccess_read_active, vgic_mmio_uaccess_write_sactive, 1, VGIC_ACCESS_32bit), REGISTER_DESC_WITH_BITS_PER_IRQ_SHARED(GICD_ICACTIVER, vgic_mmio_read_active, vgic_mmio_write_cactive, - NULL, vgic_mmio_uaccess_write_cactive, + vgic_uaccess_read_active, vgic_mmio_uaccess_write_cactive, 1, VGIC_ACCESS_32bit), REGISTER_DESC_WITH_BITS_PER_IRQ_SHARED(GICD_IPRIORITYR, vgic_mmio_read_priority, vgic_mmio_write_priority, NULL, NULL, Shouldn't we also set this uaccess_read cb for GICR_I{S,C}ACTIVER0? Thanks, Zenghui ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH 3/8] arm64: cpufeature: Add CPU capability for AArch32 EL1 support
On 04/15/2020 11:14 AM, Will Deacon wrote: On Wed, Apr 15, 2020 at 11:13:54AM +0100, Suzuki K Poulose wrote: On 04/14/2020 10:31 PM, Will Deacon wrote: Although we emit a "SANITY CHECK" warning and taint the kernel if we detect a CPU mismatch for AArch32 support at EL1, we still online the CPU with disastrous consequences for any running 32-bit VMs. Introduce a capability for AArch32 support at EL1 so that late onlining of incompatible CPUs is forbidden. Signed-off-by: Will Deacon One of the other important missing sanity check for KVM is the VMID width check. I will code something up. Cheers! Do we handle things like the IPA size already? Good point. No, we don't. I will include this too. Cheers Suzuki ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH v2] kvm_host: unify VM_STAT and VCPU_STAT definitions in a single place
On Wed, 15 Apr 2020 10:30:06 +0200 Emanuele Giuseppe Esposito wrote: > On 4/15/20 8:36 AM, Philippe Mathieu-Daudé wrote: > > On 4/14/20 5:56 PM, Emanuele Giuseppe Esposito wrote: > >> The macros VM_STAT and VCPU_STAT are redundantly implemented in > >> multiple files, each used by a different architecure to initialize > >> the debugfs entries for statistics. Since they all have the same > >> purpose, they can be unified in a single common definition in > >> include/linux/kvm_host.h > >> > >> Signed-off-by: Emanuele Giuseppe Esposito > >> --- > >> arch/arm64/kvm/guest.c| 23 ++--- > >> arch/mips/kvm/mips.c | 61 ++-- > >> arch/powerpc/kvm/book3s.c | 61 ++-- > >> arch/powerpc/kvm/booke.c | 41 > >> arch/s390/kvm/kvm-s390.c | 203 > >> +++--- arch/x86/kvm/x86.c| > >> 80 +++ include/linux/kvm_host.h | 5 + > >> 7 files changed, 231 insertions(+), 243 deletions(-) > >> > >> diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c > >> index 23ebe51410f0..8417b200bec9 100644 > >> --- a/arch/arm64/kvm/guest.c > >> +++ b/arch/arm64/kvm/guest.c > >> @@ -29,20 +29,17 @@ > >> > >> #include "trace.h" > >> > >> -#define VM_STAT(x) { #x, offsetof(struct kvm, stat.x), > >> KVM_STAT_VM } -#define VCPU_STAT(x) { #x, offsetof(struct > >> kvm_vcpu, stat.x), KVM_STAT_VCPU } - > >> struct kvm_stats_debugfs_item debugfs_entries[] = { > >> - VCPU_STAT(halt_successful_poll), > >> - VCPU_STAT(halt_attempted_poll), > >> - VCPU_STAT(halt_poll_invalid), > >> - VCPU_STAT(halt_wakeup), > >> - VCPU_STAT(hvc_exit_stat), > >> - VCPU_STAT(wfe_exit_stat), > >> - VCPU_STAT(wfi_exit_stat), > >> - VCPU_STAT(mmio_exit_user), > >> - VCPU_STAT(mmio_exit_kernel), > >> - VCPU_STAT(exits), > >> + VCPU_STAT("halt_successful_poll", halt_successful_poll), > >> + VCPU_STAT("halt_attempted_poll", halt_attempted_poll), > >> + VCPU_STAT("halt_poll_invalid", halt_poll_invalid), > >> + VCPU_STAT("halt_wakeup", halt_wakeup), > >> + VCPU_STAT("hvc_exit_stat", hvc_exit_stat), > >> + VCPU_STAT("wfe_exit_stat", wfe_exit_stat), > >> + VCPU_STAT("wfi_exit_stat", wfi_exit_stat), > >> + VCPU_STAT("mmio_exit_user", mmio_exit_user), > >> + VCPU_STAT("mmio_exit_kernel", mmio_exit_kernel), > >> + VCPU_STAT("exits", exits), > >>{ NULL } > >> }; > > > > Patch easily reviewed with --word-diff. > > > > [...] > >> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h > >> index 6d58beb65454..2e6ead872957 100644 > >> --- a/include/linux/kvm_host.h > >> +++ b/include/linux/kvm_host.h > >> @@ -1130,6 +1130,11 @@ struct kvm_stats_debugfs_item { > >> #define KVM_DBGFS_GET_MODE(dbgfs_item) > >> \ ((dbgfs_item)->mode ? (dbgfs_item)->mode : 0644) > >> > >> +#define VM_STAT(n, x, ...) > >> \ > >> + { n, offsetof(struct kvm, stat.x), KVM_STAT_VM, ## > >> __VA_ARGS__ } +#define VCPU_STAT(n, x, ...) > >> \ > >> > > > > Not sure while you use so many whitespaces here... (maybe Paolo can > > strip some when applying?). > > I honestly am not sure why it added few more spaces than I wanted, > but the idea was to follow the KVM_DBGFS_GET_MODE macro above and put > the backslash at the end of the line (80th char). once the whitespace issues are fixed, you can also add Reviewed-by: Claudio Imbrenda > > > > Otherwise it looks nicer that v1, thanks. > > > > Reviewed-by: Philippe Mathieu-Daudé > > > >> + { n, offsetof(struct kvm_vcpu, stat.x), KVM_STAT_VCPU, ## > >> __VA_ARGS__ } + > >> extern struct kvm_stats_debugfs_item debugfs_entries[]; > >> extern struct dentry *kvm_debugfs_dir; > >> > >> > > > ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH 7/8] arm64: cpufeature: Relax checks for AArch32 support at EL[0-2]
On Wed, Apr 15, 2020 at 12:37:31PM +0100, Suzuki K Poulose wrote: > On 04/15/2020 11:58 AM, Will Deacon wrote: > > On Wed, Apr 15, 2020 at 11:50:58AM +0100, Suzuki K Poulose wrote: > > > On 04/14/2020 10:31 PM, Will Deacon wrote: > > > > We don't need to be quite as strict about mismatched AArch32 support, > > > > which is good because the friendly hardware folks have been busy > > > > mismatching this to their hearts' content. > > > > > > > > * We don't care about EL2 or EL3 (there are silly comments > > > > concerning > > > > the latter, so remove those) > > > > > > > > * EL1 support is gated by the ARM64_HAS_32BIT_EL1 capability and > > > > handled > > > > gracefully when a mismatch occurs > > > > > > > > * EL1 support is gated by the ARM64_HAS_32BIT_EL0 capability and > > > > handled > > > > > > s/EL1/EL0 > > > > > > > gracefully when a mismatch occurs > > > > > > > > Relax the AArch32 checks to FTR_NONSTRICT. > > > > > > Agreed. We should do something similar for the features exposed by the > > > ELF_HWCAP, of course in a separate series. > > > > Hmm, I didn't think we needed to touch the HWCAPs, as they're derived from > > the sanitised feature register values. What am I missing? > > sorry, that was cryptic. I was suggesting to relax the ftr fields to > NONSTRICT for the fields covered by ELF HWCAPs (and other CPU hwcaps). Ah, gotcha. Given that the HWCAPs usually describe EL0 features, I say we can punt this down the road until people give us hardware with mismatched AArch32 at EL0. Will ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH 3/4] kvm: Replace vcpu->swait with rcuwait
On 14/04/20 23:12, Davidlohr Bueso wrote: > On Wed, 25 Mar 2020, Paolo Bonzini wrote: > >> On 24/03/20 05:44, Davidlohr Bueso wrote: >>> diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c >>> index 71244bf87c3a..e049fcb3dffb 100644 >>> --- a/arch/mips/kvm/mips.c >>> +++ b/arch/mips/kvm/mips.c >>> @@ -290,8 +290,7 @@ static enum hrtimer_restart >>> kvm_mips_comparecount_wakeup(struct hrtimer *timer) >>> kvm_mips_callbacks->queue_timer_int(vcpu); >>> >>> vcpu->arch.wait = 0; >>> - if (swq_has_sleeper(>wq)) >>> - swake_up_one(>wq); >>> + rcuwait_wake_up(>wait) >> >> This is missing a semicolon. (KVM MIPS is known not to compile and will >> be changed to "depends on BROKEN" in 5.7). > > Do you want me to send another version with this fix or do you prefer > fixing it when/if picked up? It's up to the TIP tree people, but sending a fixed version is probably the best way to get their attention. :) I can also queue it myself (for 5.7 even) if I get an Acked-by from Peter though. Paolo ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH 7/8] arm64: cpufeature: Relax checks for AArch32 support at EL[0-2]
On 04/15/2020 11:58 AM, Will Deacon wrote: On Wed, Apr 15, 2020 at 11:50:58AM +0100, Suzuki K Poulose wrote: On 04/14/2020 10:31 PM, Will Deacon wrote: We don't need to be quite as strict about mismatched AArch32 support, which is good because the friendly hardware folks have been busy mismatching this to their hearts' content. * We don't care about EL2 or EL3 (there are silly comments concerning the latter, so remove those) * EL1 support is gated by the ARM64_HAS_32BIT_EL1 capability and handled gracefully when a mismatch occurs * EL1 support is gated by the ARM64_HAS_32BIT_EL0 capability and handled s/EL1/EL0 gracefully when a mismatch occurs Relax the AArch32 checks to FTR_NONSTRICT. Agreed. We should do something similar for the features exposed by the ELF_HWCAP, of course in a separate series. Hmm, I didn't think we needed to touch the HWCAPs, as they're derived from the sanitised feature register values. What am I missing? sorry, that was cryptic. I was suggesting to relax the ftr fields to NONSTRICT for the fields covered by ELF HWCAPs (and other CPU hwcaps). Suzuki ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH 7/8] arm64: cpufeature: Relax checks for AArch32 support at EL[0-2]
On Wed, Apr 15, 2020 at 11:50:58AM +0100, Suzuki K Poulose wrote: > On 04/14/2020 10:31 PM, Will Deacon wrote: > > We don't need to be quite as strict about mismatched AArch32 support, > > which is good because the friendly hardware folks have been busy > > mismatching this to their hearts' content. > > > >* We don't care about EL2 or EL3 (there are silly comments concerning > > the latter, so remove those) > > > >* EL1 support is gated by the ARM64_HAS_32BIT_EL1 capability and handled > > gracefully when a mismatch occurs > > > >* EL1 support is gated by the ARM64_HAS_32BIT_EL0 capability and handled > > s/EL1/EL0 > > > gracefully when a mismatch occurs > > > > Relax the AArch32 checks to FTR_NONSTRICT. > > Agreed. We should do something similar for the features exposed by the > ELF_HWCAP, of course in a separate series. Hmm, I didn't think we needed to touch the HWCAPs, as they're derived from the sanitised feature register values. What am I missing? Will ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
[PATCH] KVM: arm64: Drop PTE_S2_MEMATTR_MASK
The only user of PTE_S2_MEMATTR_MASK macro had been removed since commit a501e32430d4 ("arm64: Clean up the default pgprot setting"). It has been about six years and no one has used it again. Let's drop it. Signed-off-by: Zenghui Yu --- arch/arm64/include/asm/pgtable-hwdef.h | 1 - 1 file changed, 1 deletion(-) diff --git a/arch/arm64/include/asm/pgtable-hwdef.h b/arch/arm64/include/asm/pgtable-hwdef.h index 6bf5e650da78..99315bdca0e6 100644 --- a/arch/arm64/include/asm/pgtable-hwdef.h +++ b/arch/arm64/include/asm/pgtable-hwdef.h @@ -190,7 +190,6 @@ * Memory Attribute override for Stage-2 (MemAttr[3:0]) */ #define PTE_S2_MEMATTR(t) (_AT(pteval_t, (t)) << 2) -#define PTE_S2_MEMATTR_MASK(_AT(pteval_t, 0xf) << 2) /* * EL2/HYP PTE/PMD definitions -- 2.19.1 ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH 7/8] arm64: cpufeature: Relax checks for AArch32 support at EL[0-2]
On 04/14/2020 10:31 PM, Will Deacon wrote: We don't need to be quite as strict about mismatched AArch32 support, which is good because the friendly hardware folks have been busy mismatching this to their hearts' content. * We don't care about EL2 or EL3 (there are silly comments concerning the latter, so remove those) * EL1 support is gated by the ARM64_HAS_32BIT_EL1 capability and handled gracefully when a mismatch occurs * EL1 support is gated by the ARM64_HAS_32BIT_EL0 capability and handled s/EL1/EL0 gracefully when a mismatch occurs Relax the AArch32 checks to FTR_NONSTRICT. Agreed. We should do something similar for the features exposed by the ELF_HWCAP, of course in a separate series. Signed-off-by: Will Deacon Reviewed-by: Suzuki K Poulose ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH 6/8] arm64: cpufeature: Relax AArch32 system checks if EL1 is 64-bit only
On 04/14/2020 10:31 PM, Will Deacon wrote: If AArch32 is not supported at EL1, the AArch32 feature register fields no longer advertise support for some system features: * ISAR4.SMC * PFR1.{Virt_frac, Sec_frac, Virtualization, Security, ProgMod} In which case, we don't need to emit "SANITY CHECK" failures for all of them. Add logic to relax the strictness of individual feature register fields at runtime and use this for the fields above if 32-bit EL1 is not supported. Signed-off-by: Will Deacon Reviewed-by: Suzuki K Poulose ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH 5/8] arm64: cpufeature: Factor out checking of AArch32 features
On 04/14/2020 10:31 PM, Will Deacon wrote: update_cpu_features() is pretty large, so split out the checking of the AArch32 features into a separate function and call it after checking the AArch64 features. Signed-off-by: Will Deacon --- arch/arm64/kernel/cpufeature.c | 108 +++-- 1 file changed, 61 insertions(+), 47 deletions(-) diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c index 7dfcdd9e75c1..32828a77acc3 100644 --- a/arch/arm64/kernel/cpufeature.c +++ b/arch/arm64/kernel/cpufeature.c @@ -715,6 +715,65 @@ static int check_update_ftr_reg(u32 sys_id, int cpu, u64 val, u64 boot) return 1; } +static int update_32bit_cpu_features(int cpu, struct cpuinfo_arm64 *info, +struct cpuinfo_arm64 *boot) +{ ... - if (id_aa64pfr0_sve(info->reg_id_aa64pfr0)) { taint |= check_update_ftr_reg(SYS_ZCR_EL1, cpu, info->reg_zcr, boot->reg_zcr); @@ -845,6 +857,8 @@ void update_cpu_features(int cpu, sve_update_vq_map(); } + taint |= update_32bit_cpu_features(cpu, info, boot); + This relies on the assumption that the id_aa64pfr0 has been sanitised. It may be worth adding a comment to make sure people (hacking the kernel) don't move this around and break that dependency. Either ways: Reviewed-by: Suzuki K Poulose ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH 4/8] arm64: cpufeature: Remove redundant call to id_aa64pfr0_32bit_el0()
On 04/14/2020 10:31 PM, Will Deacon wrote: There's no need to call id_aa64pfr0_32bit_el0() twice because the sanitised value of ID_AA64PFR0_EL1 has already been updated for the CPU being onlined. Remove the redundant function call. Signed-off-by: Will Deacon Reviewed-by: Suzuki K Poulose --- arch/arm64/kernel/cpufeature.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c index 838fe5cc8d7e..7dfcdd9e75c1 100644 --- a/arch/arm64/kernel/cpufeature.c +++ b/arch/arm64/kernel/cpufeature.c @@ -792,9 +792,7 @@ void update_cpu_features(int cpu, * If we have AArch32, we care about 32-bit features for compat. * If the system doesn't support AArch32, don't update them. */ - if (id_aa64pfr0_32bit_el0(read_sanitised_ftr_reg(SYS_ID_AA64PFR0_EL1)) && - id_aa64pfr0_32bit_el0(info->reg_id_aa64pfr0)) { - + if (id_aa64pfr0_32bit_el0(read_sanitised_ftr_reg(SYS_ID_AA64PFR0_EL1))) { taint |= check_update_ftr_reg(SYS_ID_DFR0_EL1, cpu, info->reg_id_dfr0, boot->reg_id_dfr0); taint |= check_update_ftr_reg(SYS_ID_ISAR0_EL1, cpu, ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH 3/8] arm64: cpufeature: Add CPU capability for AArch32 EL1 support
On Wed, Apr 15, 2020 at 11:13:54AM +0100, Suzuki K Poulose wrote: > On 04/14/2020 10:31 PM, Will Deacon wrote: > > Although we emit a "SANITY CHECK" warning and taint the kernel if we > > detect a CPU mismatch for AArch32 support at EL1, we still online the > > CPU with disastrous consequences for any running 32-bit VMs. > > > > Introduce a capability for AArch32 support at EL1 so that late onlining > > of incompatible CPUs is forbidden. > > > > Signed-off-by: Will Deacon > > One of the other important missing sanity check for KVM is the VMID width > check. I will code something up. Cheers! Do we handle things like the IPA size already? Will ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH 3/8] arm64: cpufeature: Add CPU capability for AArch32 EL1 support
On 04/14/2020 10:31 PM, Will Deacon wrote: Although we emit a "SANITY CHECK" warning and taint the kernel if we detect a CPU mismatch for AArch32 support at EL1, we still online the CPU with disastrous consequences for any running 32-bit VMs. Introduce a capability for AArch32 support at EL1 so that late onlining of incompatible CPUs is forbidden. Signed-off-by: Will Deacon One of the other important missing sanity check for KVM is the VMID width check. I will code something up. Reviewed-by: Suzuki K Poulose ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH 2/8] arm64: cpufeature: Spell out register fields for ID_ISAR4 and ID_PFR1
On 04/14/2020 10:31 PM, Will Deacon wrote: In preparation for runtime updates to the strictness of some AArch32 features, spell out the register fields for ID_ISAR4 and ID_PFR1 to make things clearer to read. Note that this isn't functionally necessary, as the feature arrays themselves are not modified dynamically and remain 'const'. Signed-off-by: Will Deacon --- arch/arm64/include/asm/sysreg.h | 17 + arch/arm64/kernel/cpufeature.c | 28 ++-- 2 files changed, 43 insertions(+), 2 deletions(-) Reviewed-by: Suzuki K Poulose ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH 1/8] arm64: cpufeature: Relax check for IESB support
Hi Will On 04/14/2020 10:31 PM, Will Deacon wrote: From: Sai Prakash Ranjan We don't care if IESB is supported or not as we always set SCTLR_ELx.IESB and, if it works, that's really great. Relax the ID_AA64MMFR2.IESB cpufeature check so that we don't warn and taint if it's mismatched. Signed-off-by: Sai Prakash Ranjan [will: rewrote commit message] Signed-off-by: Will Deacon Reviewed-by: Suzuki K Poulose --- arch/arm64/kernel/cpufeature.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c index 9fac745aa7bb..63df28e6a425 100644 --- a/arch/arm64/kernel/cpufeature.c +++ b/arch/arm64/kernel/cpufeature.c @@ -247,7 +247,7 @@ static const struct arm64_ftr_bits ftr_id_aa64mmfr2[] = { ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64MMFR2_FWB_SHIFT, 4, 0), ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64MMFR2_AT_SHIFT, 4, 0), ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64MMFR2_LVA_SHIFT, 4, 0), - ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64MMFR2_IESB_SHIFT, 4, 0), + ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_LOWER_SAFE, ID_AA64MMFR2_IESB_SHIFT, 4, 0), ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64MMFR2_LSM_SHIFT, 4, 0), ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64MMFR2_UAO_SHIFT, 4, 0), ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64MMFR2_CNP_SHIFT, 4, 0), ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH 3/4] kvm: Replace vcpu->swait with rcuwait
On Wed, 25 Mar 2020, Paolo Bonzini wrote: On 24/03/20 05:44, Davidlohr Bueso wrote: diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c index 71244bf87c3a..e049fcb3dffb 100644 --- a/arch/mips/kvm/mips.c +++ b/arch/mips/kvm/mips.c @@ -290,8 +290,7 @@ static enum hrtimer_restart kvm_mips_comparecount_wakeup(struct hrtimer *timer) kvm_mips_callbacks->queue_timer_int(vcpu); vcpu->arch.wait = 0; - if (swq_has_sleeper(>wq)) - swake_up_one(>wq); + rcuwait_wake_up(>wait) This is missing a semicolon. (KVM MIPS is known not to compile and will be changed to "depends on BROKEN" in 5.7). Do you want me to send another version with this fix or do you prefer fixing it when/if picked up? Thanks, Davidlohr ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH v2] kvm_host: unify VM_STAT and VCPU_STAT definitions in a single place
On 4/14/20 5:56 PM, Emanuele Giuseppe Esposito wrote: > The macros VM_STAT and VCPU_STAT are redundantly implemented in multiple > files, each used by a different architecure to initialize the debugfs > entries for statistics. Since they all have the same purpose, they can be > unified in a single common definition in include/linux/kvm_host.h > > Signed-off-by: Emanuele Giuseppe Esposito > --- > arch/arm64/kvm/guest.c| 23 ++--- > arch/mips/kvm/mips.c | 61 ++-- > arch/powerpc/kvm/book3s.c | 61 ++-- > arch/powerpc/kvm/booke.c | 41 > arch/s390/kvm/kvm-s390.c | 203 +++--- > arch/x86/kvm/x86.c| 80 +++ > include/linux/kvm_host.h | 5 + > 7 files changed, 231 insertions(+), 243 deletions(-) > > diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c > index 23ebe51410f0..8417b200bec9 100644 > --- a/arch/arm64/kvm/guest.c > +++ b/arch/arm64/kvm/guest.c > @@ -29,20 +29,17 @@ > > #include "trace.h" > > -#define VM_STAT(x) { #x, offsetof(struct kvm, stat.x), KVM_STAT_VM } > -#define VCPU_STAT(x) { #x, offsetof(struct kvm_vcpu, stat.x), KVM_STAT_VCPU } > - > struct kvm_stats_debugfs_item debugfs_entries[] = { > - VCPU_STAT(halt_successful_poll), > - VCPU_STAT(halt_attempted_poll), > - VCPU_STAT(halt_poll_invalid), > - VCPU_STAT(halt_wakeup), > - VCPU_STAT(hvc_exit_stat), > - VCPU_STAT(wfe_exit_stat), > - VCPU_STAT(wfi_exit_stat), > - VCPU_STAT(mmio_exit_user), > - VCPU_STAT(mmio_exit_kernel), > - VCPU_STAT(exits), > + VCPU_STAT("halt_successful_poll", halt_successful_poll), > + VCPU_STAT("halt_attempted_poll", halt_attempted_poll), > + VCPU_STAT("halt_poll_invalid", halt_poll_invalid), > + VCPU_STAT("halt_wakeup", halt_wakeup), > + VCPU_STAT("hvc_exit_stat", hvc_exit_stat), > + VCPU_STAT("wfe_exit_stat", wfe_exit_stat), > + VCPU_STAT("wfi_exit_stat", wfi_exit_stat), > + VCPU_STAT("mmio_exit_user", mmio_exit_user), > + VCPU_STAT("mmio_exit_kernel", mmio_exit_kernel), > + VCPU_STAT("exits", exits), > { NULL } > }; Patch easily reviewed with --word-diff. [...] > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h > index 6d58beb65454..2e6ead872957 100644 > --- a/include/linux/kvm_host.h > +++ b/include/linux/kvm_host.h > @@ -1130,6 +1130,11 @@ struct kvm_stats_debugfs_item { > #define KVM_DBGFS_GET_MODE(dbgfs_item) > \ > ((dbgfs_item)->mode ? (dbgfs_item)->mode : 0644) > > +#define VM_STAT(n, x, ...) >\ > + { n, offsetof(struct kvm, stat.x), KVM_STAT_VM, ## __VA_ARGS__ } > +#define VCPU_STAT(n, x, ...) >\ Not sure while you use so many whitespaces here... (maybe Paolo can strip some when applying?). Otherwise it looks nicer that v1, thanks. Reviewed-by: Philippe Mathieu-Daudé > + { n, offsetof(struct kvm_vcpu, stat.x), KVM_STAT_VCPU, ## __VA_ARGS__ } > + > extern struct kvm_stats_debugfs_item debugfs_entries[]; > extern struct dentry *kvm_debugfs_dir; > > ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH v2] kvm_host: unify VM_STAT and VCPU_STAT definitions in a single place
On 4/15/20 8:36 AM, Philippe Mathieu-Daudé wrote: On 4/14/20 5:56 PM, Emanuele Giuseppe Esposito wrote: The macros VM_STAT and VCPU_STAT are redundantly implemented in multiple files, each used by a different architecure to initialize the debugfs entries for statistics. Since they all have the same purpose, they can be unified in a single common definition in include/linux/kvm_host.h Signed-off-by: Emanuele Giuseppe Esposito --- arch/arm64/kvm/guest.c| 23 ++--- arch/mips/kvm/mips.c | 61 ++-- arch/powerpc/kvm/book3s.c | 61 ++-- arch/powerpc/kvm/booke.c | 41 arch/s390/kvm/kvm-s390.c | 203 +++--- arch/x86/kvm/x86.c| 80 +++ include/linux/kvm_host.h | 5 + 7 files changed, 231 insertions(+), 243 deletions(-) diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c index 23ebe51410f0..8417b200bec9 100644 --- a/arch/arm64/kvm/guest.c +++ b/arch/arm64/kvm/guest.c @@ -29,20 +29,17 @@ #include "trace.h" -#define VM_STAT(x) { #x, offsetof(struct kvm, stat.x), KVM_STAT_VM } -#define VCPU_STAT(x) { #x, offsetof(struct kvm_vcpu, stat.x), KVM_STAT_VCPU } - struct kvm_stats_debugfs_item debugfs_entries[] = { - VCPU_STAT(halt_successful_poll), - VCPU_STAT(halt_attempted_poll), - VCPU_STAT(halt_poll_invalid), - VCPU_STAT(halt_wakeup), - VCPU_STAT(hvc_exit_stat), - VCPU_STAT(wfe_exit_stat), - VCPU_STAT(wfi_exit_stat), - VCPU_STAT(mmio_exit_user), - VCPU_STAT(mmio_exit_kernel), - VCPU_STAT(exits), + VCPU_STAT("halt_successful_poll", halt_successful_poll), + VCPU_STAT("halt_attempted_poll", halt_attempted_poll), + VCPU_STAT("halt_poll_invalid", halt_poll_invalid), + VCPU_STAT("halt_wakeup", halt_wakeup), + VCPU_STAT("hvc_exit_stat", hvc_exit_stat), + VCPU_STAT("wfe_exit_stat", wfe_exit_stat), + VCPU_STAT("wfi_exit_stat", wfi_exit_stat), + VCPU_STAT("mmio_exit_user", mmio_exit_user), + VCPU_STAT("mmio_exit_kernel", mmio_exit_kernel), + VCPU_STAT("exits", exits), { NULL } }; Patch easily reviewed with --word-diff. [...] diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 6d58beb65454..2e6ead872957 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -1130,6 +1130,11 @@ struct kvm_stats_debugfs_item { #define KVM_DBGFS_GET_MODE(dbgfs_item) \ ((dbgfs_item)->mode ? (dbgfs_item)->mode : 0644) +#define VM_STAT(n, x, ...) \ + { n, offsetof(struct kvm, stat.x), KVM_STAT_VM, ## __VA_ARGS__ } +#define VCPU_STAT(n, x, ...) \ Not sure while you use so many whitespaces here... (maybe Paolo can strip some when applying?). I honestly am not sure why it added few more spaces than I wanted, but the idea was to follow the KVM_DBGFS_GET_MODE macro above and put the backslash at the end of the line (80th char). Otherwise it looks nicer that v1, thanks. Reviewed-by: Philippe Mathieu-Daudé + { n, offsetof(struct kvm_vcpu, stat.x), KVM_STAT_VCPU, ## __VA_ARGS__ } + extern struct kvm_stats_debugfs_item debugfs_entries[]; extern struct dentry *kvm_debugfs_dir; ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH] KVM: Optimize kvm_arch_vcpu_ioctl_run function
On 2020/4/14 22:26, Vitaly Kuznetsov wrote: Tianjia Zhang writes: kvm_arch_vcpu_ioctl_run() is only called in the file kvm_main.c, where vcpu->run is the kvm_run parameter, so it has been replaced. Signed-off-by: Tianjia Zhang --- arch/x86/kvm/x86.c | 8 virt/kvm/arm/arm.c | 2 +- 2 files changed, 5 insertions(+), 5 deletions(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 3bf2ecafd027..70e3f4abbd4d 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -8726,18 +8726,18 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run) r = -EAGAIN; if (signal_pending(current)) { r = -EINTR; - vcpu->run->exit_reason = KVM_EXIT_INTR; + kvm_run->exit_reason = KVM_EXIT_INTR; I have a more generic question: why do we need to pass 'kvm_run' to kvm_arch_vcpu_ioctl_run() if it can be extracted from 'struct kvm_vcpu'? The only call site looks like virt/kvm/kvm_main.c:r = kvm_arch_vcpu_ioctl_run(vcpu, vcpu->run); In the earlier version, kvm_run is used to pass parameters with user mode and is not included in the vcpu structure, so it has been retained until now. Thanks, Tianjia ++vcpu->stat.signal_exits; } goto out; } - if (vcpu->run->kvm_valid_regs & ~KVM_SYNC_X86_VALID_FIELDS) { + if (kvm_run->kvm_valid_regs & ~KVM_SYNC_X86_VALID_FIELDS) { r = -EINVAL; goto out; } - if (vcpu->run->kvm_dirty_regs) { + if (kvm_run->kvm_dirty_regs) { r = sync_regs(vcpu); if (r != 0) goto out; @@ -8767,7 +8767,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run) out: kvm_put_guest_fpu(vcpu); - if (vcpu->run->kvm_valid_regs) + if (kvm_run->kvm_valid_regs) store_regs(vcpu); post_kvm_run_save(vcpu); kvm_sigset_deactivate(vcpu); diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c index 48d0ec44ad77..ab9d7966a4c8 100644 --- a/virt/kvm/arm/arm.c +++ b/virt/kvm/arm/arm.c @@ -659,7 +659,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run) return ret; if (run->exit_reason == KVM_EXIT_MMIO) { - ret = kvm_handle_mmio_return(vcpu, vcpu->run); + ret = kvm_handle_mmio_return(vcpu, run); if (ret) return ret; } ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH 3/8] arm64: cpufeature: Add CPU capability for AArch32 EL1 support
Hi Will, On 2020-04-14 22:31, Will Deacon wrote: Although we emit a "SANITY CHECK" warning and taint the kernel if we detect a CPU mismatch for AArch32 support at EL1, we still online the CPU with disastrous consequences for any running 32-bit VMs. Introduce a capability for AArch32 support at EL1 so that late onlining of incompatible CPUs is forbidden. Signed-off-by: Will Deacon Definitely an improvement over the current situation, as the direct read of ID_AA64PFR0 was always a bit dodgy. Given that I'm pretty sure these new braindead SoCs are going to run an older version of the kernel, should we Cc stable for this? Otherwise: Acked-by: Marc Zyngier M. -- Jazz is not dead. It just smells funny... ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
[PATCH RFC] KVM: arm64: Sidestep stage2_unmap_vm() on vcpu reset when S2FWB is supported
stage2_unmap_vm() was introduced to unmap user RAM region in the stage2 page table to make the caches coherent. E.g., a guest reboot with stage1 MMU disabled will access memory using non-cacheable attributes. If the RAM and caches are not coherent at this stage, some evicted dirty cache line may go and corrupt guest data in RAM. Since ARMv8.4, S2FWB feature is mandatory and KVM will take advantage of it to configure the stage2 page table and the attributes of memory access. So we ensure that guests always access memory using cacheable attributes and thus, the caches always be coherent. So on CPUs that support S2FWB, we can safely reset the vcpu without a heavy stage2 unmapping. Cc: Marc Zyngier Cc: Christoffer Dall Cc: James Morse Cc: Julien Thierry Cc: Suzuki K Poulose Signed-off-by: Zenghui Yu --- If this is correct, there should be a great performance improvement on a guest reboot (or reset) on systems support S2FWB. But I'm afraid that I've missed some points here, so please comment! The commit 957db105c997 ("arm/arm64: KVM: Introduce stage2_unmap_vm") was merged about six years ago and I failed to track its histroy and intention. Instead of a whole stage2 unmapping, something like stage2_flush_vm() looks enough to me. But again, I'm unsure... Thanks for having a look! virt/kvm/arm/arm.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c index 48d0ec44ad77..e6378162cdef 100644 --- a/virt/kvm/arm/arm.c +++ b/virt/kvm/arm/arm.c @@ -983,8 +983,11 @@ static int kvm_arch_vcpu_ioctl_vcpu_init(struct kvm_vcpu *vcpu, /* * Ensure a rebooted VM will fault in RAM pages and detect if the * guest MMU is turned off and flush the caches as needed. +* +* S2FWB enforces all memory accesses to RAM being cacheable, we +* ensure that the cache is always coherent. */ - if (vcpu->arch.has_run_once) + if (vcpu->arch.has_run_once && !cpus_have_const_cap(ARM64_HAS_STAGE2_FWB)) stage2_unmap_vm(vcpu->kvm); vcpu_reset_hcr(vcpu); -- 2.19.1 ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm