Re: [PATCH v2] KVM/arm64: Support enabling dirty log gradually in small chunks

2020-04-15 Thread zhukeqian
Hi Marc,

In RFC patch, I still write protect huge pages when DIRTY_LOG_INITIALLY_ALL_SET
is enabled by userspace. I find that both huge pages and normal pages can be
write protected during log clear. So this formal patch is pretty simple now.

Thanks,
Keqian

On 2020/4/13 20:20, Keqian Zhu wrote:
> There is already support of enabling dirty log graually in small chunks
> for x86 in commit 3c9bd4006bfc ("KVM: x86: enable dirty log gradually in
> small chunks"). This adds support for arm64.
> 
> x86 still writes protect all huge pages when DIRTY_LOG_INITIALLY_ALL_SET
> is eanbled. However, for arm64, both huge pages and normal pages can be
> write protected gradually by userspace.
> 
> Under the Huawei Kunpeng 920 2.6GHz platform, I did some tests on 128G
> Linux VMs with different page size. The memory pressure is 127G in each
> case. The time taken of memory_global_dirty_log_start in QEMU is listed
> below:
> 
> Page Size  BeforeAfter Optimization
>   4K650ms 1.8ms
>   2M 4ms  1.8ms
>   1G 2ms  1.8ms
> 
> Besides the time reduction, the biggest income is that we will minimize
> the performance side effect (because of dissloving huge pages and marking
> memslots dirty) on guest after enabling dirty log.
> 
> Signed-off-by: Keqian Zhu 
> ---
>  Documentation/virt/kvm/api.rst|  2 +-
>  arch/arm64/include/asm/kvm_host.h |  3 +++
>  virt/kvm/arm/mmu.c| 12 ++--
>  3 files changed, 14 insertions(+), 3 deletions(-)
> 
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index efbbe570aa9b..0017f63fa44f 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -5777,7 +5777,7 @@ will be initialized to 1 when created.  This also 
> improves performance because
>  dirty logging can be enabled gradually in small chunks on the first call
>  to KVM_CLEAR_DIRTY_LOG.  KVM_DIRTY_LOG_INITIALLY_SET depends on
>  KVM_DIRTY_LOG_MANUAL_PROTECT_ENABLE (it is also only available on
> -x86 for now).
> +x86 and arm64 for now).
>  
>  KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 was previously available under the name
>  KVM_CAP_MANUAL_DIRTY_LOG_PROTECT, but the implementation had bugs that make
> diff --git a/arch/arm64/include/asm/kvm_host.h 
> b/arch/arm64/include/asm/kvm_host.h
> index 32c8a675e5a4..a723f84fab83 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -46,6 +46,9 @@
>  #define KVM_REQ_RECORD_STEAL KVM_ARCH_REQ(3)
>  #define KVM_REQ_RELOAD_GICv4 KVM_ARCH_REQ(4)
>  
> +#define KVM_DIRTY_LOG_MANUAL_CAPS   (KVM_DIRTY_LOG_MANUAL_PROTECT_ENABLE | \
> +  KVM_DIRTY_LOG_INITIALLY_SET)
> +
>  DECLARE_STATIC_KEY_FALSE(userspace_irqchip_in_use);
>  
>  extern unsigned int kvm_sve_max_vl;
> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
> index e3b9ee268823..1077f653a611 100644
> --- a/virt/kvm/arm/mmu.c
> +++ b/virt/kvm/arm/mmu.c
> @@ -2265,8 +2265,16 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
>* allocated dirty_bitmap[], dirty pages will be be tracked while the
>* memory slot is write protected.
>*/
> - if (change != KVM_MR_DELETE && mem->flags & KVM_MEM_LOG_DIRTY_PAGES)
> - kvm_mmu_wp_memory_region(kvm, mem->slot);
> + if (change != KVM_MR_DELETE && mem->flags & KVM_MEM_LOG_DIRTY_PAGES) {
> + /*
> +  * If we're with initial-all-set, we don't need to write
> +  * protect any pages because they're all reported as dirty.
> +  * Huge pages and normal pages will be write protect gradually.
> +  */
> + if (!kvm_dirty_log_manual_protect_and_init_set(kvm)) {
> + kvm_mmu_wp_memory_region(kvm, mem->slot);
> + }
> + }
>  }
>  
>  int kvm_arch_prepare_memory_region(struct kvm *kvm,
> 

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 2/2] KVM: arm64: vgic-its: Fix memory leak on the error path of vgic_add_lpi()

2020-04-15 Thread Zenghui Yu

On 2020/4/14 11:03, Zenghui Yu wrote:

If we're going to fail out the vgic_add_lpi(), let's make sure the
allocated vgic_irq memory is also freed. Though it seems that both
cases are unlikely to fail.

Cc: Zengruan Ye 
Signed-off-by: Zenghui Yu 
---
  virt/kvm/arm/vgic/vgic-its.c | 8 ++--
  1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/virt/kvm/arm/vgic/vgic-its.c b/virt/kvm/arm/vgic/vgic-its.c
index d53d34a33e35..3c3b6a0f2dce 100644
--- a/virt/kvm/arm/vgic/vgic-its.c
+++ b/virt/kvm/arm/vgic/vgic-its.c
@@ -98,12 +98,16 @@ static struct vgic_irq *vgic_add_lpi(struct kvm *kvm, u32 
intid,
 * the respective config data from memory here upon mapping the LPI.
 */
ret = update_lpi_config(kvm, irq, NULL, false);
-   if (ret)
+   if (ret) {
+   kfree(irq);
return ERR_PTR(ret);
+   }
  
  	ret = vgic_v3_lpi_sync_pending_status(kvm, irq);

-   if (ret)
+   if (ret) {
+   kfree(irq);
return ERR_PTR(ret);
+   }


Looking at it again, I realized that this error handling is still not
complete. Maybe we should use a vgic_put_irq() instead so that we can
also properly delete the vgic_irq from lpi_list.

Marc, what do you think? Could you please help to fix it, or I can
resend it.


Thanks,
Zenghui

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH kvmtool v3] Add emulation for CFI compatible flash memory

2020-04-15 Thread Ard Biesheuvel
On Wed, 15 Apr 2020 at 17:43, Ard Biesheuvel  wrote:
>
> On Tue, 7 Apr 2020 at 17:15, Alexandru Elisei  
> wrote:
> >
> > Hi,
> >
> > I've tested this patch by running badblocks and fio on a flash device 
> > inside a
> > guest, everything worked as expected.
> >
> > I've also looked at the flowcharts for device operation from Intel 
> > Application
> > Note 646, pages 12-21, and they seem implemented correctly.
> >
> > A few minor issues below.
> >
> > On 2/21/20 4:55 PM, Andre Przywara wrote:
> > > From: Raphael Gault 
> > >
> > > The EDK II UEFI firmware implementation requires some storage for the EFI
> > > variables, which is typically some flash storage.
> > > Since this is already supported on the EDK II side, we add a CFI flash
> > > emulation to kvmtool.
> > > This is backed by a file, specified via the --flash or -F command line
> > > option. Any flash writes done by the guest will immediately be reflected
> > > into this file (kvmtool mmap's the file).
> > > The flash will be limited to the nearest power-of-2 size, so only the
> > > first 2 MB of a 3 MB file will be used.
> > >
> > > This implements a CFI flash using the "Intel/Sharp extended command
> > > set", as specified in:
> > > - JEDEC JESD68.01
> > > - JEDEC JEP137B
> > > - Intel Application Note 646
> > > Some gaps in those specs have been filled by looking at real devices and
> > > other implementations (QEMU, Linux kernel driver).
> > >
> > > At the moment this relies on DT to advertise the base address of the
> > > flash memory (mapped into the MMIO address space) and is only enabled
> > > for ARM/ARM64. The emulation itself is architecture agnostic, though.
> > >
> > > This is one missing piece toward a working UEFI boot with kvmtool on
> > > ARM guests, the other is to provide writable PCI BARs, which is WIP.
> > >
>
> I have given this a spin with UEFI built for kvmtool, and it appears
> to be working correctly. However, I noticed that it is intolerably
> slow, which seems to be caused by the fact that both array mode and
> command mode (or whatever it is called in the CFI spec) are fully
> emulated, whereas in the QEMU implementation (for instance), the
> region is actually exposed to the guest using a read-only KVM memslot
> in array mode, and so the read accesses are made natively.
>
> It is also causing problems in the UEFI implementation, as we can no
> longer use unaligned accesses to read from the region, which is
> something the code currently relies on (and which works fine on actual
> hardware as long as you use normal non-cacheable mappings)
>

Actually, the issue is not alignment. The issue is with instructions
with multiple outputs, which means you cannot do an ordinary memcpy()
from the NOR region using ldp instructions, aligned or not.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH kvmtool v3] Add emulation for CFI compatible flash memory

2020-04-15 Thread Ard Biesheuvel
On Tue, 7 Apr 2020 at 17:15, Alexandru Elisei  wrote:
>
> Hi,
>
> I've tested this patch by running badblocks and fio on a flash device inside a
> guest, everything worked as expected.
>
> I've also looked at the flowcharts for device operation from Intel Application
> Note 646, pages 12-21, and they seem implemented correctly.
>
> A few minor issues below.
>
> On 2/21/20 4:55 PM, Andre Przywara wrote:
> > From: Raphael Gault 
> >
> > The EDK II UEFI firmware implementation requires some storage for the EFI
> > variables, which is typically some flash storage.
> > Since this is already supported on the EDK II side, we add a CFI flash
> > emulation to kvmtool.
> > This is backed by a file, specified via the --flash or -F command line
> > option. Any flash writes done by the guest will immediately be reflected
> > into this file (kvmtool mmap's the file).
> > The flash will be limited to the nearest power-of-2 size, so only the
> > first 2 MB of a 3 MB file will be used.
> >
> > This implements a CFI flash using the "Intel/Sharp extended command
> > set", as specified in:
> > - JEDEC JESD68.01
> > - JEDEC JEP137B
> > - Intel Application Note 646
> > Some gaps in those specs have been filled by looking at real devices and
> > other implementations (QEMU, Linux kernel driver).
> >
> > At the moment this relies on DT to advertise the base address of the
> > flash memory (mapped into the MMIO address space) and is only enabled
> > for ARM/ARM64. The emulation itself is architecture agnostic, though.
> >
> > This is one missing piece toward a working UEFI boot with kvmtool on
> > ARM guests, the other is to provide writable PCI BARs, which is WIP.
> >

I have given this a spin with UEFI built for kvmtool, and it appears
to be working correctly. However, I noticed that it is intolerably
slow, which seems to be caused by the fact that both array mode and
command mode (or whatever it is called in the CFI spec) are fully
emulated, whereas in the QEMU implementation (for instance), the
region is actually exposed to the guest using a read-only KVM memslot
in array mode, and so the read accesses are made natively.

It is also causing problems in the UEFI implementation, as we can no
longer use unaligned accesses to read from the region, which is
something the code currently relies on (and which works fine on actual
hardware as long as you use normal non-cacheable mappings)

Are there any plans to implement this as well? I am aware that this is
a big ask, but for the general utility of this feature, I think it is
rather important.

-- 
Ard.


> > Signed-off-by: Raphael Gault 
> > [Andre: rewriting and fixing]
> > Signed-off-by: Andre Przywra 
> > ---
> > Hi,
> >
> > an update fixing Alexandru's review comments (many thanks for those!)
> > The biggest change code-wise is the split of the MMIO handler into three
> > different functions. Another significant change is the rounding *down* of
> > the present flash file size to the nearest power-of-two, to match flash
> > hardware chips and Linux' expectations.
> >
> > Cheers,
> > Andre
> >
> > Changelog v2 .. v3:
> > - Breaking MMIO handling into three separate functions.
> > - Assing the flash base address in the memory map, but stay at 32 MB for 
> > now.
> >   The MMIO area has been moved up to 48 MB, to never overlap with the
> >   flash.
> > - Impose a limit of 16 MB for the flash size, mostly to fit into the
> >   (for now) fixed memory map.
> > - Trim flash size down to nearest power-of-2, to match hardware.
> > - Announce forced flash size trimming.
> > - Rework the CFI query table slightly, to add the addresses as array
> >   indicies.
> > - Fix error handling when creating the flash device.
> > - Fix pow2_size implementation for 0 and 1 as input values.
> > - Fix write buffer size handling.
> > - Improve some comments.
> >
> > Changelog v1 .. v2:
> > - Add locking for MMIO handling.
> > - Fold flash read into handler.
> > - Move pow2_size() into generic header.
> > - Spell out flash base address.
> >
> >  Makefile  |   6 +
> >  arm/include/arm-common/kvm-arch.h |   8 +-
> >  builtin-run.c |   2 +
> >  hw/cfi_flash.c| 576 ++
> >  include/kvm/kvm-config.h  |   1 +
> >  include/kvm/util.h|   8 +
> >  6 files changed, 599 insertions(+), 2 deletions(-)
> >  create mode 100644 hw/cfi_flash.c
> >
> > diff --git a/Makefile b/Makefile
> > index 3862112c..7ed6fb5e 100644
> > --- a/Makefile
> > +++ b/Makefile
> > @@ -170,6 +170,7 @@ ifeq ($(ARCH), arm)
> >   CFLAGS  += -march=armv7-a
> >
> >   ARCH_WANT_LIBFDT := y
> > + ARCH_HAS_FLASH_MEM := y
> >  endif
> >
> >  # ARM64
> > @@ -182,6 +183,7 @@ ifeq ($(ARCH), arm64)
> >   ARCH_INCLUDE+= -Iarm/aarch64/include
> >
> >   ARCH_WANT_LIBFDT := y
> > + ARCH_HAS_FLASH_MEM := y
> >  endif
> >
> >  ifeq ($(ARCH),mips)
> > @@ -261,6 +263,10 @@ ifeq (y,$(ARCH_HAS_FRAMEBUFFER))
> >   

Re: [PATCH kvmtool v3] Add emulation for CFI compatible flash memory

2020-04-15 Thread Ard Biesheuvel
On Wed, 15 Apr 2020 at 18:11, André Przywara  wrote:
>
> On 15/04/2020 16:55, Ard Biesheuvel wrote:
> > On Wed, 15 Apr 2020 at 17:43, Ard Biesheuvel  wrote:
> >>
> >> On Tue, 7 Apr 2020 at 17:15, Alexandru Elisei  
> >> wrote:
> >>>
> >>> Hi,
> >>>
> >>> I've tested this patch by running badblocks and fio on a flash device 
> >>> inside a
> >>> guest, everything worked as expected.
> >>>
> >>> I've also looked at the flowcharts for device operation from Intel 
> >>> Application
> >>> Note 646, pages 12-21, and they seem implemented correctly.
> >>>
> >>> A few minor issues below.
> >>>
> >>> On 2/21/20 4:55 PM, Andre Przywara wrote:
>  From: Raphael Gault 
> 
>  The EDK II UEFI firmware implementation requires some storage for the EFI
>  variables, which is typically some flash storage.
>  Since this is already supported on the EDK II side, we add a CFI flash
>  emulation to kvmtool.
>  This is backed by a file, specified via the --flash or -F command line
>  option. Any flash writes done by the guest will immediately be reflected
>  into this file (kvmtool mmap's the file).
>  The flash will be limited to the nearest power-of-2 size, so only the
>  first 2 MB of a 3 MB file will be used.
> 
>  This implements a CFI flash using the "Intel/Sharp extended command
>  set", as specified in:
>  - JEDEC JESD68.01
>  - JEDEC JEP137B
>  - Intel Application Note 646
>  Some gaps in those specs have been filled by looking at real devices and
>  other implementations (QEMU, Linux kernel driver).
> 
>  At the moment this relies on DT to advertise the base address of the
>  flash memory (mapped into the MMIO address space) and is only enabled
>  for ARM/ARM64. The emulation itself is architecture agnostic, though.
> 
>  This is one missing piece toward a working UEFI boot with kvmtool on
>  ARM guests, the other is to provide writable PCI BARs, which is WIP.
> 
> >>
> >> I have given this a spin with UEFI built for kvmtool, and it appears
> >> to be working correctly. However, I noticed that it is intolerably
> >> slow, which seems to be caused by the fact that both array mode and
> >> command mode (or whatever it is called in the CFI spec) are fully
> >> emulated, whereas in the QEMU implementation (for instance), the
> >> region is actually exposed to the guest using a read-only KVM memslot
> >> in array mode, and so the read accesses are made natively.
> >>
> >> It is also causing problems in the UEFI implementation, as we can no
> >> longer use unaligned accesses to read from the region, which is
> >> something the code currently relies on (and which works fine on actual
> >> hardware as long as you use normal non-cacheable mappings)
> >>
> >
> > Actually, the issue is not alignment. The issue is with instructions
> > with multiple outputs, which means you cannot do an ordinary memcpy()
> > from the NOR region using ldp instructions, aligned or not.
>
> Yes, we traced that down to an "ldrb with post-inc", in the memcpy code.
> My suggestion was to provide a version of memcpy_{from,to}_io(), as
> Linux does, which only uses MMIO accessors to avoid "fancy" instructions.
>

That is possible, and the impact on the code is manageable, given the
modular nature of EDK2.

> Back at this point I was challenging the idea of accessing a flash
> device with a normal memory mapping, because of it failing when being in
> some query mode. Do you know of any best practices for flash mappings?
> Are two mappings common?
>

In the QEMU port of EDK2, we use normal non-cacheable for the first
flash device, which contains the executable image, and is not
updatable by the guest. The second flash bank is used for the variable
store, and is actually mapped as a device all the time.

Another thing I just realized is that you cannot fetch instructions
from an emulated flash device either, so to execute from NOR flash,
you will need a true memory mapping as well.

So in summary, I think the mode switch is needed to be generally
useful, even if the current approach is sufficient for (slow)
read/write using special memory accessors.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH kvmtool v3] Add emulation for CFI compatible flash memory

2020-04-15 Thread Ard Biesheuvel
On Wed, 15 Apr 2020 at 18:36, André Przywara  wrote:
>
> On 15/04/2020 17:20, Ard Biesheuvel wrote:
> > On Wed, 15 Apr 2020 at 18:11, André Przywara  wrote:
> >>
> >> On 15/04/2020 16:55, Ard Biesheuvel wrote:
> >>> On Wed, 15 Apr 2020 at 17:43, Ard Biesheuvel  wrote:
> 
>  On Tue, 7 Apr 2020 at 17:15, Alexandru Elisei  
>  wrote:
> >
> > Hi,
> >
> > I've tested this patch by running badblocks and fio on a flash device 
> > inside a
> > guest, everything worked as expected.
> >
> > I've also looked at the flowcharts for device operation from Intel 
> > Application
> > Note 646, pages 12-21, and they seem implemented correctly.
> >
> > A few minor issues below.
> >
> > On 2/21/20 4:55 PM, Andre Przywara wrote:
> >> From: Raphael Gault 
> >>
> >> The EDK II UEFI firmware implementation requires some storage for the 
> >> EFI
> >> variables, which is typically some flash storage.
> >> Since this is already supported on the EDK II side, we add a CFI flash
> >> emulation to kvmtool.
> >> This is backed by a file, specified via the --flash or -F command line
> >> option. Any flash writes done by the guest will immediately be 
> >> reflected
> >> into this file (kvmtool mmap's the file).
> >> The flash will be limited to the nearest power-of-2 size, so only the
> >> first 2 MB of a 3 MB file will be used.
> >>
> >> This implements a CFI flash using the "Intel/Sharp extended command
> >> set", as specified in:
> >> - JEDEC JESD68.01
> >> - JEDEC JEP137B
> >> - Intel Application Note 646
> >> Some gaps in those specs have been filled by looking at real devices 
> >> and
> >> other implementations (QEMU, Linux kernel driver).
> >>
> >> At the moment this relies on DT to advertise the base address of the
> >> flash memory (mapped into the MMIO address space) and is only enabled
> >> for ARM/ARM64. The emulation itself is architecture agnostic, though.
> >>
> >> This is one missing piece toward a working UEFI boot with kvmtool on
> >> ARM guests, the other is to provide writable PCI BARs, which is WIP.
> >>
> 
>  I have given this a spin with UEFI built for kvmtool, and it appears
>  to be working correctly. However, I noticed that it is intolerably
>  slow, which seems to be caused by the fact that both array mode and
>  command mode (or whatever it is called in the CFI spec) are fully
>  emulated, whereas in the QEMU implementation (for instance), the
>  region is actually exposed to the guest using a read-only KVM memslot
>  in array mode, and so the read accesses are made natively.
> 
>  It is also causing problems in the UEFI implementation, as we can no
>  longer use unaligned accesses to read from the region, which is
>  something the code currently relies on (and which works fine on actual
>  hardware as long as you use normal non-cacheable mappings)
> 
> >>>
> >>> Actually, the issue is not alignment. The issue is with instructions
> >>> with multiple outputs, which means you cannot do an ordinary memcpy()
> >>> from the NOR region using ldp instructions, aligned or not.
> >>
> >> Yes, we traced that down to an "ldrb with post-inc", in the memcpy code.
> >> My suggestion was to provide a version of memcpy_{from,to}_io(), as
> >> Linux does, which only uses MMIO accessors to avoid "fancy" instructions.
> >>
> >
> > That is possible, and the impact on the code is manageable, given the
> > modular nature of EDK2.
> >
> >> Back at this point I was challenging the idea of accessing a flash
> >> device with a normal memory mapping, because of it failing when being in
> >> some query mode. Do you know of any best practices for flash mappings?
> >> Are two mappings common?
> >>
> >
> > In the QEMU port of EDK2, we use normal non-cacheable for the first
> > flash device, which contains the executable image, and is not
> > updatable by the guest. The second flash bank is used for the variable
> > store, and is actually mapped as a device all the time.
> >
> > Another thing I just realized is that you cannot fetch instructions
> > from an emulated flash device either, so to execute from NOR flash,
> > you will need a true memory mapping as well.
>
> Wait, did you put the whole of EDK-2 image in the flash?

No, my point is that you cannot actually do that, since I don't think
you can fetch instructions using MMIO emulation.

> My assumption
> (and testing) was to use
>
> $ lkvm run -f KVMTOOL_EFI.fd --flash just_the_variables.img
>
> Hence my ignorance about performance, because it would just be a few
> bytes written/read. -f loads the firmware image into guest RAM.
>

No, the performance impact is due to the numerous variable accesses
done by UEFI during boot.

> > So in summary, I think the mode switch is needed to be generally
> > useful, even if the current approach is 

Re: [PATCH 3/8] arm64: cpufeature: Add CPU capability for AArch32 EL1 support

2020-04-15 Thread Will Deacon
Hi Marc,

On Wed, Apr 15, 2020 at 09:55:57AM +0100, Marc Zyngier wrote:
> On 2020-04-14 22:31, Will Deacon wrote:
> > Although we emit a "SANITY CHECK" warning and taint the kernel if we
> > detect a CPU mismatch for AArch32 support at EL1, we still online the
> > CPU with disastrous consequences for any running 32-bit VMs.
> > 
> > Introduce a capability for AArch32 support at EL1 so that late onlining
> > of incompatible CPUs is forbidden.
> > 
> > Signed-off-by: Will Deacon 
> 
> Definitely an improvement over the current situation, as the direct read
> of ID_AA64PFR0 was always a bit dodgy. Given that I'm pretty sure these new
> braindead SoCs are going to run an older version of the kernel, should we
> Cc stable for this?

I don't think there's a real need for -stable given that we do at least
taint the kernel. That's likely to annoy vendors enough to backport this
themselves ;)

Will
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH kvmtool v3] Add emulation for CFI compatible flash memory

2020-04-15 Thread André Przywara
On 15/04/2020 17:20, Ard Biesheuvel wrote:
> On Wed, 15 Apr 2020 at 18:11, André Przywara  wrote:
>>
>> On 15/04/2020 16:55, Ard Biesheuvel wrote:
>>> On Wed, 15 Apr 2020 at 17:43, Ard Biesheuvel  wrote:

 On Tue, 7 Apr 2020 at 17:15, Alexandru Elisei  
 wrote:
>
> Hi,
>
> I've tested this patch by running badblocks and fio on a flash device 
> inside a
> guest, everything worked as expected.
>
> I've also looked at the flowcharts for device operation from Intel 
> Application
> Note 646, pages 12-21, and they seem implemented correctly.
>
> A few minor issues below.
>
> On 2/21/20 4:55 PM, Andre Przywara wrote:
>> From: Raphael Gault 
>>
>> The EDK II UEFI firmware implementation requires some storage for the EFI
>> variables, which is typically some flash storage.
>> Since this is already supported on the EDK II side, we add a CFI flash
>> emulation to kvmtool.
>> This is backed by a file, specified via the --flash or -F command line
>> option. Any flash writes done by the guest will immediately be reflected
>> into this file (kvmtool mmap's the file).
>> The flash will be limited to the nearest power-of-2 size, so only the
>> first 2 MB of a 3 MB file will be used.
>>
>> This implements a CFI flash using the "Intel/Sharp extended command
>> set", as specified in:
>> - JEDEC JESD68.01
>> - JEDEC JEP137B
>> - Intel Application Note 646
>> Some gaps in those specs have been filled by looking at real devices and
>> other implementations (QEMU, Linux kernel driver).
>>
>> At the moment this relies on DT to advertise the base address of the
>> flash memory (mapped into the MMIO address space) and is only enabled
>> for ARM/ARM64. The emulation itself is architecture agnostic, though.
>>
>> This is one missing piece toward a working UEFI boot with kvmtool on
>> ARM guests, the other is to provide writable PCI BARs, which is WIP.
>>

 I have given this a spin with UEFI built for kvmtool, and it appears
 to be working correctly. However, I noticed that it is intolerably
 slow, which seems to be caused by the fact that both array mode and
 command mode (or whatever it is called in the CFI spec) are fully
 emulated, whereas in the QEMU implementation (for instance), the
 region is actually exposed to the guest using a read-only KVM memslot
 in array mode, and so the read accesses are made natively.

 It is also causing problems in the UEFI implementation, as we can no
 longer use unaligned accesses to read from the region, which is
 something the code currently relies on (and which works fine on actual
 hardware as long as you use normal non-cacheable mappings)

>>>
>>> Actually, the issue is not alignment. The issue is with instructions
>>> with multiple outputs, which means you cannot do an ordinary memcpy()
>>> from the NOR region using ldp instructions, aligned or not.
>>
>> Yes, we traced that down to an "ldrb with post-inc", in the memcpy code.
>> My suggestion was to provide a version of memcpy_{from,to}_io(), as
>> Linux does, which only uses MMIO accessors to avoid "fancy" instructions.
>>
> 
> That is possible, and the impact on the code is manageable, given the
> modular nature of EDK2.
> 
>> Back at this point I was challenging the idea of accessing a flash
>> device with a normal memory mapping, because of it failing when being in
>> some query mode. Do you know of any best practices for flash mappings?
>> Are two mappings common?
>>
> 
> In the QEMU port of EDK2, we use normal non-cacheable for the first
> flash device, which contains the executable image, and is not
> updatable by the guest. The second flash bank is used for the variable
> store, and is actually mapped as a device all the time.
> 
> Another thing I just realized is that you cannot fetch instructions
> from an emulated flash device either, so to execute from NOR flash,
> you will need a true memory mapping as well.

Wait, did you put the whole of EDK-2 image in the flash? My assumption
(and testing) was to use

$ lkvm run -f KVMTOOL_EFI.fd --flash just_the_variables.img

Hence my ignorance about performance, because it would just be a few
bytes written/read. -f loads the firmware image into guest RAM.

> So in summary, I think the mode switch is needed to be generally
> useful, even if the current approach is sufficient for (slow)
> read/write using special memory accessors.

Well,in hindsight I regret pursuing this whole flash emulation approach
in kvmtool in the first place. Just some magic "this memory region is
persistent" (mmapping a file and presenting as a memslot) would be
*much* easier on the kvmtool side. It just seems that there wasn't any
good DT binding or existing device class for this (to my surprise), or
at least not one without issues. And then EDK-2 had this CFI flash
support already, so 

Re: [PATCH v2] kvm_host: unify VM_STAT and VCPU_STAT definitions in a single place

2020-04-15 Thread Cornelia Huck
On Tue, 14 Apr 2020 17:56:25 +0200
Emanuele Giuseppe Esposito  wrote:

> The macros VM_STAT and VCPU_STAT are redundantly implemented in multiple
> files, each used by a different architecure to initialize the debugfs
> entries for statistics. Since they all have the same purpose, they can be
> unified in a single common definition in include/linux/kvm_host.h
> 
> Signed-off-by: Emanuele Giuseppe Esposito 
> ---
>  arch/arm64/kvm/guest.c|  23 ++---
>  arch/mips/kvm/mips.c  |  61 ++--
>  arch/powerpc/kvm/book3s.c |  61 ++--
>  arch/powerpc/kvm/booke.c  |  41 
>  arch/s390/kvm/kvm-s390.c  | 203 +++---
>  arch/x86/kvm/x86.c|  80 +++
>  include/linux/kvm_host.h  |   5 +
>  7 files changed, 231 insertions(+), 243 deletions(-)

Adds a bit of churn, but the end result does look nicer. Looks sane,
but did not review in detail.

Acked-by: Cornelia Huck 

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v2] KVM/arm64: Support enabling dirty log gradually in small chunks

2020-04-15 Thread Paolo Bonzini
On 13/04/20 14:20, Keqian Zhu wrote:
> There is already support of enabling dirty log graually in small chunks
> for x86 in commit 3c9bd4006bfc ("KVM: x86: enable dirty log gradually in
> small chunks"). This adds support for arm64.
> 
> x86 still writes protect all huge pages when DIRTY_LOG_INITIALLY_ALL_SET
> is eanbled. However, for arm64, both huge pages and normal pages can be
> write protected gradually by userspace.
> 
> Under the Huawei Kunpeng 920 2.6GHz platform, I did some tests on 128G
> Linux VMs with different page size. The memory pressure is 127G in each
> case. The time taken of memory_global_dirty_log_start in QEMU is listed
> below:
> 
> Page Size  BeforeAfter Optimization
>   4K650ms 1.8ms
>   2M 4ms  1.8ms
>   1G 2ms  1.8ms
> 
> Besides the time reduction, the biggest income is that we will minimize
> the performance side effect (because of dissloving huge pages and marking
> memslots dirty) on guest after enabling dirty log.
> 
> Signed-off-by: Keqian Zhu 
> ---
>  Documentation/virt/kvm/api.rst|  2 +-
>  arch/arm64/include/asm/kvm_host.h |  3 +++
>  virt/kvm/arm/mmu.c| 12 ++--
>  3 files changed, 14 insertions(+), 3 deletions(-)
> 
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index efbbe570aa9b..0017f63fa44f 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -5777,7 +5777,7 @@ will be initialized to 1 when created.  This also 
> improves performance because
>  dirty logging can be enabled gradually in small chunks on the first call
>  to KVM_CLEAR_DIRTY_LOG.  KVM_DIRTY_LOG_INITIALLY_SET depends on
>  KVM_DIRTY_LOG_MANUAL_PROTECT_ENABLE (it is also only available on
> -x86 for now).
> +x86 and arm64 for now).
>  
>  KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 was previously available under the name
>  KVM_CAP_MANUAL_DIRTY_LOG_PROTECT, but the implementation had bugs that make
> diff --git a/arch/arm64/include/asm/kvm_host.h 
> b/arch/arm64/include/asm/kvm_host.h
> index 32c8a675e5a4..a723f84fab83 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -46,6 +46,9 @@
>  #define KVM_REQ_RECORD_STEAL KVM_ARCH_REQ(3)
>  #define KVM_REQ_RELOAD_GICv4 KVM_ARCH_REQ(4)
>  
> +#define KVM_DIRTY_LOG_MANUAL_CAPS   (KVM_DIRTY_LOG_MANUAL_PROTECT_ENABLE | \
> +  KVM_DIRTY_LOG_INITIALLY_SET)
> +
>  DECLARE_STATIC_KEY_FALSE(userspace_irqchip_in_use);
>  
>  extern unsigned int kvm_sve_max_vl;
> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
> index e3b9ee268823..1077f653a611 100644
> --- a/virt/kvm/arm/mmu.c
> +++ b/virt/kvm/arm/mmu.c
> @@ -2265,8 +2265,16 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
>* allocated dirty_bitmap[], dirty pages will be be tracked while the
>* memory slot is write protected.
>*/
> - if (change != KVM_MR_DELETE && mem->flags & KVM_MEM_LOG_DIRTY_PAGES)
> - kvm_mmu_wp_memory_region(kvm, mem->slot);
> + if (change != KVM_MR_DELETE && mem->flags & KVM_MEM_LOG_DIRTY_PAGES) {
> + /*
> +  * If we're with initial-all-set, we don't need to write
> +  * protect any pages because they're all reported as dirty.
> +  * Huge pages and normal pages will be write protect gradually.
> +  */
> + if (!kvm_dirty_log_manual_protect_and_init_set(kvm)) {
> + kvm_mmu_wp_memory_region(kvm, mem->slot);
> + }
> + }
>  }
>  
>  int kvm_arch_prepare_memory_region(struct kvm *kvm,
> 

Marc, what is the status of this patch?

Paolo

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH kvmtool v3] Add emulation for CFI compatible flash memory

2020-04-15 Thread André Przywara
On 15/04/2020 16:55, Ard Biesheuvel wrote:
> On Wed, 15 Apr 2020 at 17:43, Ard Biesheuvel  wrote:
>>
>> On Tue, 7 Apr 2020 at 17:15, Alexandru Elisei  
>> wrote:
>>>
>>> Hi,
>>>
>>> I've tested this patch by running badblocks and fio on a flash device 
>>> inside a
>>> guest, everything worked as expected.
>>>
>>> I've also looked at the flowcharts for device operation from Intel 
>>> Application
>>> Note 646, pages 12-21, and they seem implemented correctly.
>>>
>>> A few minor issues below.
>>>
>>> On 2/21/20 4:55 PM, Andre Przywara wrote:
 From: Raphael Gault 

 The EDK II UEFI firmware implementation requires some storage for the EFI
 variables, which is typically some flash storage.
 Since this is already supported on the EDK II side, we add a CFI flash
 emulation to kvmtool.
 This is backed by a file, specified via the --flash or -F command line
 option. Any flash writes done by the guest will immediately be reflected
 into this file (kvmtool mmap's the file).
 The flash will be limited to the nearest power-of-2 size, so only the
 first 2 MB of a 3 MB file will be used.

 This implements a CFI flash using the "Intel/Sharp extended command
 set", as specified in:
 - JEDEC JESD68.01
 - JEDEC JEP137B
 - Intel Application Note 646
 Some gaps in those specs have been filled by looking at real devices and
 other implementations (QEMU, Linux kernel driver).

 At the moment this relies on DT to advertise the base address of the
 flash memory (mapped into the MMIO address space) and is only enabled
 for ARM/ARM64. The emulation itself is architecture agnostic, though.

 This is one missing piece toward a working UEFI boot with kvmtool on
 ARM guests, the other is to provide writable PCI BARs, which is WIP.

>>
>> I have given this a spin with UEFI built for kvmtool, and it appears
>> to be working correctly. However, I noticed that it is intolerably
>> slow, which seems to be caused by the fact that both array mode and
>> command mode (or whatever it is called in the CFI spec) are fully
>> emulated, whereas in the QEMU implementation (for instance), the
>> region is actually exposed to the guest using a read-only KVM memslot
>> in array mode, and so the read accesses are made natively.
>>
>> It is also causing problems in the UEFI implementation, as we can no
>> longer use unaligned accesses to read from the region, which is
>> something the code currently relies on (and which works fine on actual
>> hardware as long as you use normal non-cacheable mappings)
>>
> 
> Actually, the issue is not alignment. The issue is with instructions
> with multiple outputs, which means you cannot do an ordinary memcpy()
> from the NOR region using ldp instructions, aligned or not.

Yes, we traced that down to an "ldrb with post-inc", in the memcpy code.
My suggestion was to provide a version of memcpy_{from,to}_io(), as
Linux does, which only uses MMIO accessors to avoid "fancy" instructions.

Back at this point I was challenging the idea of accessing a flash
device with a normal memory mapping, because of it failing when being in
some query mode. Do you know of any best practices for flash mappings?
Are two mappings common?

Cheers,
Andre
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH kvmtool v3] Add emulation for CFI compatible flash memory

2020-04-15 Thread André Przywara
On 15/04/2020 16:43, Ard Biesheuvel wrote:

Hi Ard,

> On Tue, 7 Apr 2020 at 17:15, Alexandru Elisei  
> wrote:
>>
>> Hi,
>>
>> I've tested this patch by running badblocks and fio on a flash device inside 
>> a
>> guest, everything worked as expected.
>>
>> I've also looked at the flowcharts for device operation from Intel 
>> Application
>> Note 646, pages 12-21, and they seem implemented correctly.
>>
>> A few minor issues below.
>>
>> On 2/21/20 4:55 PM, Andre Przywara wrote:
>>> From: Raphael Gault 
>>>
>>> The EDK II UEFI firmware implementation requires some storage for the EFI
>>> variables, which is typically some flash storage.
>>> Since this is already supported on the EDK II side, we add a CFI flash
>>> emulation to kvmtool.
>>> This is backed by a file, specified via the --flash or -F command line
>>> option. Any flash writes done by the guest will immediately be reflected
>>> into this file (kvmtool mmap's the file).
>>> The flash will be limited to the nearest power-of-2 size, so only the
>>> first 2 MB of a 3 MB file will be used.
>>>
>>> This implements a CFI flash using the "Intel/Sharp extended command
>>> set", as specified in:
>>> - JEDEC JESD68.01
>>> - JEDEC JEP137B
>>> - Intel Application Note 646
>>> Some gaps in those specs have been filled by looking at real devices and
>>> other implementations (QEMU, Linux kernel driver).
>>>
>>> At the moment this relies on DT to advertise the base address of the
>>> flash memory (mapped into the MMIO address space) and is only enabled
>>> for ARM/ARM64. The emulation itself is architecture agnostic, though.
>>>
>>> This is one missing piece toward a working UEFI boot with kvmtool on
>>> ARM guests, the other is to provide writable PCI BARs, which is WIP.
>>>
> 
> I have given this a spin with UEFI built for kvmtool, and it appears
> to be working correctly. However, I noticed that it is intolerably
> slow, which seems to be caused by the fact that both array mode and
> command mode (or whatever it is called in the CFI spec) are fully
> emulated, whereas in the QEMU implementation (for instance), the
> region is actually exposed to the guest using a read-only KVM memslot
> in array mode, and so the read accesses are made natively.

Yes, I have been thinking about this, but didn't implement it this way
for the following reasons:
1) The use case here is pure UEFI firmware load, which should not be too
much affected by performance. At least this what I was thinking so far.
Your "intolerably slow" make me wonder if this assumption is not true.
Can you elaborate on that? Do you have any numbers? Is that due to the
trapping or something else?
2) As you mentioned, we need to switch between trapping and mapping,
upon the guest switching between array mode and command mode. So that
just adds complexity. Given 1) it didn't seem worth to do.

> It is also causing problems in the UEFI implementation, as we can no
> longer use unaligned accesses to read from the region, which is
> something the code currently relies on (and which works fine on actual
> hardware as long as you use normal non-cacheable mappings)

So this is something I was discussing with Sami about already:
It seems to me that a parallel memory mapped flash chip is actually a
device, just with the twist of behaving with normal (ROM) memory
semantics under certain circumstances. So for write accesses and read
access in any of the query modes we would definitely need to use device
memory and MMIO accessors, otherwise the compiler could mess up the
carefully crafted access semantics. So does EDK-2 use separate mappings
for both cases? Does it make sure to not access the array when being in
command mode? I couldn't find any trace of two mappings in the Linux
driver, IIRC.

> Are there any plans to implement this as well? I am aware that this is
> a big ask, but for the general utility of this feature, I think it is
> rather important.

I wasn't aware that this has a significant impact, so avoided the added
complexity. I doesn't sound like a big deal, though, so I might have a
look at it.

Cheers,
Andre
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v11 01/13] iommu: Introduce attach/detach_pasid_table API

2020-04-15 Thread Auger Eric
Hi Jacob,

On 4/15/20 5:59 PM, Jacob Pan wrote:
> On Wed, 15 Apr 2020 16:52:10 +0200
> Auger Eric  wrote:
> 
>> Hi Jacob,
>> On 4/15/20 12:15 AM, Jacob Pan wrote:
>>> Hi Eric,
>>>
>>> There are some discussions about how to size the uAPI data.
>>> https://lkml.org/lkml/2020/4/14/939
>>>
>>> I think the problem with the current scheme is that when uAPI data
>>> gets extended, if VFIO continue to use:
>>>
>>> minsz = offsetofend(struct vfio_iommu_type1_set_pasid_table,
>>> config); if (copy_from_user(, (void __user *)arg, minsz))
>>>
>>> It may copy more data from user than what was setup by the user.
>>>
>>> So, as suggested by Alex, we could add argsz to the IOMMU uAPI
>>> struct. So if argsz > minsz, then fail the attach_table since
>>> kernel might be old, doesn't know about the extra data.
>>> If argsz <= minsz, kernel can support the attach_table but must
>>> process the data based on flags or config.  
>>
>> So I guess we would need both an argsz _u32 + a new flag _u32 right?
>>
> Yes.
>> I am ok with that idea. Besides how will you manage for existing IOMMU
>> UAPIs?
> I plan to add argsz and flags (if not already have one)
> 
>> At some point you envisionned to have a getter at iommu api
>> level to retrieve the size of a structure for a given version, right?
>>
> This idea is shot down. There is no version-size lookup.
> So the current plan is for user to fill out argsz in each IOMMU uAPI
> struct. VFIO does the copy_from_user() based on argsz (sanitized
> against the size of current kernel struct).
> 
> IOMMU vendor driver process the data based on flags which indicates
> new capability/extensions.
OK. Sounds sensible

Thanks

Eric
> 
>> Thanks
>>
>> Eric
>>>
>>> Does it make sense to you?
>>>
>>>
>>> On Tue, 14 Apr 2020 17:05:55 +0200
>>> Eric Auger  wrote:
>>>   
 From: Jacob Pan 

 In virtualization use case, when a guest is assigned
 a PCI host device, protected by a virtual IOMMU on the guest,
 the physical IOMMU must be programmed to be consistent with
 the guest mappings. If the physical IOMMU supports two
 translation stages it makes sense to program guest mappings
 onto the first stage/level (ARM/Intel terminology) while the host
 owns the stage/level 2.

 In that case, it is mandated to trap on guest configuration
 settings and pass those to the physical iommu driver.

 This patch adds a new API to the iommu subsystem that allows
 to set/unset the pasid table information.

 A generic iommu_pasid_table_config struct is introduced in
 a new iommu.h uapi header. This is going to be used by the VFIO
 user API.

 Signed-off-by: Jean-Philippe Brucker
  Signed-off-by: Liu, Yi L
  Signed-off-by: Ashok Raj
  Signed-off-by: Jacob Pan
  Signed-off-by: Eric Auger
  Reviewed-by: Jean-Philippe Brucker
  ---
  drivers/iommu/iommu.c  | 19 ++
  include/linux/iommu.h  | 18 ++
  include/uapi/linux/iommu.h | 51
 ++ 3 files changed, 88
 insertions(+)

 diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
 index 2b471419e26c..b71ad56f8c99 100644
 --- a/drivers/iommu/iommu.c
 +++ b/drivers/iommu/iommu.c
 @@ -1723,6 +1723,25 @@ int iommu_sva_unbind_gpasid(struct
 iommu_domain *domain, struct device *dev, }
  EXPORT_SYMBOL_GPL(iommu_sva_unbind_gpasid);
  
 +int iommu_attach_pasid_table(struct iommu_domain *domain,
 +   struct iommu_pasid_table_config *cfg)
 +{
 +  if (unlikely(!domain->ops->attach_pasid_table))
 +  return -ENODEV;
 +
 +  return domain->ops->attach_pasid_table(domain, cfg);
 +}
 +EXPORT_SYMBOL_GPL(iommu_attach_pasid_table);
 +
 +void iommu_detach_pasid_table(struct iommu_domain *domain)
 +{
 +  if (unlikely(!domain->ops->detach_pasid_table))
 +  return;
 +
 +  domain->ops->detach_pasid_table(domain);
 +}
 +EXPORT_SYMBOL_GPL(iommu_detach_pasid_table);
 +
  static void __iommu_detach_device(struct iommu_domain *domain,
  struct device *dev)
  {
 diff --git a/include/linux/iommu.h b/include/linux/iommu.h
 index 7ef8b0bda695..3e1057c3585a 100644
 --- a/include/linux/iommu.h
 +++ b/include/linux/iommu.h
 @@ -248,6 +248,8 @@ struct iommu_iotlb_gather {
   * @cache_invalidate: invalidate translation caches
   * @sva_bind_gpasid: bind guest pasid and mm
   * @sva_unbind_gpasid: unbind guest pasid and mm
 + * @attach_pasid_table: attach a pasid table
 + * @detach_pasid_table: detach the pasid table
   * @pgsize_bitmap: bitmap of all possible supported page sizes
   * @owner: Driver module providing these ops
   */
 @@ -307,6 +309,9 @@ struct iommu_ops {
  void *drvdata);
void (*sva_unbind)(struct iommu_sva *handle);

Re: [PATCH v11 01/13] iommu: Introduce attach/detach_pasid_table API

2020-04-15 Thread Jacob Pan
On Wed, 15 Apr 2020 16:52:10 +0200
Auger Eric  wrote:

> Hi Jacob,
> On 4/15/20 12:15 AM, Jacob Pan wrote:
> > Hi Eric,
> > 
> > There are some discussions about how to size the uAPI data.
> > https://lkml.org/lkml/2020/4/14/939
> > 
> > I think the problem with the current scheme is that when uAPI data
> > gets extended, if VFIO continue to use:
> > 
> > minsz = offsetofend(struct vfio_iommu_type1_set_pasid_table,
> > config); if (copy_from_user(, (void __user *)arg, minsz))
> > 
> > It may copy more data from user than what was setup by the user.
> > 
> > So, as suggested by Alex, we could add argsz to the IOMMU uAPI
> > struct. So if argsz > minsz, then fail the attach_table since
> > kernel might be old, doesn't know about the extra data.
> > If argsz <= minsz, kernel can support the attach_table but must
> > process the data based on flags or config.  
> 
> So I guess we would need both an argsz _u32 + a new flag _u32 right?
> 
Yes.
> I am ok with that idea. Besides how will you manage for existing IOMMU
> UAPIs?
I plan to add argsz and flags (if not already have one)

> At some point you envisionned to have a getter at iommu api
> level to retrieve the size of a structure for a given version, right?
> 
This idea is shot down. There is no version-size lookup.
So the current plan is for user to fill out argsz in each IOMMU uAPI
struct. VFIO does the copy_from_user() based on argsz (sanitized
against the size of current kernel struct).

IOMMU vendor driver process the data based on flags which indicates
new capability/extensions.

> Thanks
> 
> Eric
> > 
> > Does it make sense to you?
> > 
> > 
> > On Tue, 14 Apr 2020 17:05:55 +0200
> > Eric Auger  wrote:
> >   
> >> From: Jacob Pan 
> >>
> >> In virtualization use case, when a guest is assigned
> >> a PCI host device, protected by a virtual IOMMU on the guest,
> >> the physical IOMMU must be programmed to be consistent with
> >> the guest mappings. If the physical IOMMU supports two
> >> translation stages it makes sense to program guest mappings
> >> onto the first stage/level (ARM/Intel terminology) while the host
> >> owns the stage/level 2.
> >>
> >> In that case, it is mandated to trap on guest configuration
> >> settings and pass those to the physical iommu driver.
> >>
> >> This patch adds a new API to the iommu subsystem that allows
> >> to set/unset the pasid table information.
> >>
> >> A generic iommu_pasid_table_config struct is introduced in
> >> a new iommu.h uapi header. This is going to be used by the VFIO
> >> user API.
> >>
> >> Signed-off-by: Jean-Philippe Brucker
> >>  Signed-off-by: Liu, Yi L
> >>  Signed-off-by: Ashok Raj
> >>  Signed-off-by: Jacob Pan
> >>  Signed-off-by: Eric Auger
> >>  Reviewed-by: Jean-Philippe Brucker
> >>  ---
> >>  drivers/iommu/iommu.c  | 19 ++
> >>  include/linux/iommu.h  | 18 ++
> >>  include/uapi/linux/iommu.h | 51
> >> ++ 3 files changed, 88
> >> insertions(+)
> >>
> >> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> >> index 2b471419e26c..b71ad56f8c99 100644
> >> --- a/drivers/iommu/iommu.c
> >> +++ b/drivers/iommu/iommu.c
> >> @@ -1723,6 +1723,25 @@ int iommu_sva_unbind_gpasid(struct
> >> iommu_domain *domain, struct device *dev, }
> >>  EXPORT_SYMBOL_GPL(iommu_sva_unbind_gpasid);
> >>  
> >> +int iommu_attach_pasid_table(struct iommu_domain *domain,
> >> +   struct iommu_pasid_table_config *cfg)
> >> +{
> >> +  if (unlikely(!domain->ops->attach_pasid_table))
> >> +  return -ENODEV;
> >> +
> >> +  return domain->ops->attach_pasid_table(domain, cfg);
> >> +}
> >> +EXPORT_SYMBOL_GPL(iommu_attach_pasid_table);
> >> +
> >> +void iommu_detach_pasid_table(struct iommu_domain *domain)
> >> +{
> >> +  if (unlikely(!domain->ops->detach_pasid_table))
> >> +  return;
> >> +
> >> +  domain->ops->detach_pasid_table(domain);
> >> +}
> >> +EXPORT_SYMBOL_GPL(iommu_detach_pasid_table);
> >> +
> >>  static void __iommu_detach_device(struct iommu_domain *domain,
> >>  struct device *dev)
> >>  {
> >> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> >> index 7ef8b0bda695..3e1057c3585a 100644
> >> --- a/include/linux/iommu.h
> >> +++ b/include/linux/iommu.h
> >> @@ -248,6 +248,8 @@ struct iommu_iotlb_gather {
> >>   * @cache_invalidate: invalidate translation caches
> >>   * @sva_bind_gpasid: bind guest pasid and mm
> >>   * @sva_unbind_gpasid: unbind guest pasid and mm
> >> + * @attach_pasid_table: attach a pasid table
> >> + * @detach_pasid_table: detach the pasid table
> >>   * @pgsize_bitmap: bitmap of all possible supported page sizes
> >>   * @owner: Driver module providing these ops
> >>   */
> >> @@ -307,6 +309,9 @@ struct iommu_ops {
> >>  void *drvdata);
> >>void (*sva_unbind)(struct iommu_sva *handle);
> >>int (*sva_get_pasid)(struct iommu_sva *handle);
> >> +  int (*attach_pasid_table)(struct iommu_domain *domain,

Re: [PATCH] KVM: Optimize kvm_arch_vcpu_ioctl_run function

2020-04-15 Thread Paolo Bonzini
On 15/04/20 11:07, Vitaly Kuznetsov wrote:
> In case this is no longer needed I'd suggest we drop 'kvm_run' parameter
> and extract it from 'struct kvm_vcpu' when needed. This looks like a
> natural add-on to your cleanup patch.

I agree, though I think it should be _instead_ of Tianjia's patch rather
than on top.

Paolo

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v11 01/13] iommu: Introduce attach/detach_pasid_table API

2020-04-15 Thread Auger Eric
Hi Jacob,
On 4/15/20 12:15 AM, Jacob Pan wrote:
> Hi Eric,
> 
> There are some discussions about how to size the uAPI data.
> https://lkml.org/lkml/2020/4/14/939
> 
> I think the problem with the current scheme is that when uAPI data gets
> extended, if VFIO continue to use:
> 
> minsz = offsetofend(struct vfio_iommu_type1_set_pasid_table, config);
> if (copy_from_user(, (void __user *)arg, minsz))
> 
> It may copy more data from user than what was setup by the user.
> 
> So, as suggested by Alex, we could add argsz to the IOMMU uAPI struct.
> So if argsz > minsz, then fail the attach_table since kernel might be
> old, doesn't know about the extra data.
> If argsz <= minsz, kernel can support the attach_table but must process
> the data based on flags or config.

So I guess we would need both an argsz _u32 + a new flag _u32 right?

I am ok with that idea. Besides how will you manage for existing IOMMU
UAPIs? At some point you envisionned to have a getter at iommu api level
to retrieve the size of a structure for a given version, right?

Thanks

Eric
> 
> Does it make sense to you?
> 
> 
> On Tue, 14 Apr 2020 17:05:55 +0200
> Eric Auger  wrote:
> 
>> From: Jacob Pan 
>>
>> In virtualization use case, when a guest is assigned
>> a PCI host device, protected by a virtual IOMMU on the guest,
>> the physical IOMMU must be programmed to be consistent with
>> the guest mappings. If the physical IOMMU supports two
>> translation stages it makes sense to program guest mappings
>> onto the first stage/level (ARM/Intel terminology) while the host
>> owns the stage/level 2.
>>
>> In that case, it is mandated to trap on guest configuration
>> settings and pass those to the physical iommu driver.
>>
>> This patch adds a new API to the iommu subsystem that allows
>> to set/unset the pasid table information.
>>
>> A generic iommu_pasid_table_config struct is introduced in
>> a new iommu.h uapi header. This is going to be used by the VFIO
>> user API.
>>
>> Signed-off-by: Jean-Philippe Brucker 
>> Signed-off-by: Liu, Yi L 
>> Signed-off-by: Ashok Raj 
>> Signed-off-by: Jacob Pan 
>> Signed-off-by: Eric Auger 
>> Reviewed-by: Jean-Philippe Brucker 
>> ---
>>  drivers/iommu/iommu.c  | 19 ++
>>  include/linux/iommu.h  | 18 ++
>>  include/uapi/linux/iommu.h | 51
>> ++ 3 files changed, 88
>> insertions(+)
>>
>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
>> index 2b471419e26c..b71ad56f8c99 100644
>> --- a/drivers/iommu/iommu.c
>> +++ b/drivers/iommu/iommu.c
>> @@ -1723,6 +1723,25 @@ int iommu_sva_unbind_gpasid(struct
>> iommu_domain *domain, struct device *dev, }
>>  EXPORT_SYMBOL_GPL(iommu_sva_unbind_gpasid);
>>  
>> +int iommu_attach_pasid_table(struct iommu_domain *domain,
>> + struct iommu_pasid_table_config *cfg)
>> +{
>> +if (unlikely(!domain->ops->attach_pasid_table))
>> +return -ENODEV;
>> +
>> +return domain->ops->attach_pasid_table(domain, cfg);
>> +}
>> +EXPORT_SYMBOL_GPL(iommu_attach_pasid_table);
>> +
>> +void iommu_detach_pasid_table(struct iommu_domain *domain)
>> +{
>> +if (unlikely(!domain->ops->detach_pasid_table))
>> +return;
>> +
>> +domain->ops->detach_pasid_table(domain);
>> +}
>> +EXPORT_SYMBOL_GPL(iommu_detach_pasid_table);
>> +
>>  static void __iommu_detach_device(struct iommu_domain *domain,
>>struct device *dev)
>>  {
>> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
>> index 7ef8b0bda695..3e1057c3585a 100644
>> --- a/include/linux/iommu.h
>> +++ b/include/linux/iommu.h
>> @@ -248,6 +248,8 @@ struct iommu_iotlb_gather {
>>   * @cache_invalidate: invalidate translation caches
>>   * @sva_bind_gpasid: bind guest pasid and mm
>>   * @sva_unbind_gpasid: unbind guest pasid and mm
>> + * @attach_pasid_table: attach a pasid table
>> + * @detach_pasid_table: detach the pasid table
>>   * @pgsize_bitmap: bitmap of all possible supported page sizes
>>   * @owner: Driver module providing these ops
>>   */
>> @@ -307,6 +309,9 @@ struct iommu_ops {
>>void *drvdata);
>>  void (*sva_unbind)(struct iommu_sva *handle);
>>  int (*sva_get_pasid)(struct iommu_sva *handle);
>> +int (*attach_pasid_table)(struct iommu_domain *domain,
>> +  struct iommu_pasid_table_config
>> *cfg);
>> +void (*detach_pasid_table)(struct iommu_domain *domain);
>>  
>>  int (*page_response)(struct device *dev,
>>   struct iommu_fault_event *evt,
>> @@ -446,6 +451,9 @@ extern int iommu_sva_bind_gpasid(struct
>> iommu_domain *domain, struct device *dev, struct
>> iommu_gpasid_bind_data *data); extern int
>> iommu_sva_unbind_gpasid(struct iommu_domain *domain, struct device
>> *dev, ioasid_t pasid); +extern int iommu_attach_pasid_table(struct
>> iommu_domain *domain,
>> +struct iommu_pasid_table_config
>> *cfg); +extern 

Re: [PATCH 1/3] KVM: arm: vgic: Synchronize the whole guest on GIC{D,R}_I{S,C}ACTIVER read

2020-04-15 Thread Marc Zyngier

Hi Zenghui,

On 2020-04-15 14:15, Zenghui Yu wrote:

Hi Marc,

On 2020/4/14 18:35, Marc Zyngier wrote:

When a guest tries to read the active state of its interrupts,
we currently just return whatever state we have in memory. This
means that if such an interrupt lives in a List Register on another
CPU, we fail to obsertve the latest active state for this interrupt.

In order to remedy this, stop all the other vcpus so that they exit
and we can observe the most recent value for the state.

Reported-by: Julien Grall 
Signed-off-by: Marc Zyngier 
---
  virt/kvm/arm/vgic/vgic-mmio-v2.c |   4 +-
  virt/kvm/arm/vgic/vgic-mmio-v3.c |   4 +-
  virt/kvm/arm/vgic/vgic-mmio.c| 100 
---

  virt/kvm/arm/vgic/vgic-mmio.h|   3 +
  4 files changed, 71 insertions(+), 40 deletions(-)

diff --git a/virt/kvm/arm/vgic/vgic-mmio-v2.c 
b/virt/kvm/arm/vgic/vgic-mmio-v2.c

index 5945f062d749..d63881f60e1a 100644
--- a/virt/kvm/arm/vgic/vgic-mmio-v2.c
+++ b/virt/kvm/arm/vgic/vgic-mmio-v2.c
@@ -422,11 +422,11 @@ static const struct vgic_register_region 
vgic_v2_dist_registers[] = {

VGIC_ACCESS_32bit),
REGISTER_DESC_WITH_BITS_PER_IRQ(GIC_DIST_ACTIVE_SET,
vgic_mmio_read_active, vgic_mmio_write_sactive,
-   NULL, vgic_mmio_uaccess_write_sactive, 1,
+   vgic_uaccess_read_active, vgic_mmio_uaccess_write_sactive, 1,
VGIC_ACCESS_32bit),
REGISTER_DESC_WITH_BITS_PER_IRQ(GIC_DIST_ACTIVE_CLEAR,
vgic_mmio_read_active, vgic_mmio_write_cactive,
-   NULL, vgic_mmio_uaccess_write_cactive, 1,
+   vgic_uaccess_read_active, vgic_mmio_uaccess_write_cactive, 1,
VGIC_ACCESS_32bit),
REGISTER_DESC_WITH_BITS_PER_IRQ(GIC_DIST_PRI,
vgic_mmio_read_priority, vgic_mmio_write_priority, NULL, NULL,
diff --git a/virt/kvm/arm/vgic/vgic-mmio-v3.c 
b/virt/kvm/arm/vgic/vgic-mmio-v3.c

index e72dcc454247..77c8ba1a2535 100644
--- a/virt/kvm/arm/vgic/vgic-mmio-v3.c
+++ b/virt/kvm/arm/vgic/vgic-mmio-v3.c
@@ -553,11 +553,11 @@ static const struct vgic_register_region 
vgic_v3_dist_registers[] = {

VGIC_ACCESS_32bit),
REGISTER_DESC_WITH_BITS_PER_IRQ_SHARED(GICD_ISACTIVER,
vgic_mmio_read_active, vgic_mmio_write_sactive,
-   NULL, vgic_mmio_uaccess_write_sactive, 1,
+   vgic_uaccess_read_active, vgic_mmio_uaccess_write_sactive, 1,
VGIC_ACCESS_32bit),
REGISTER_DESC_WITH_BITS_PER_IRQ_SHARED(GICD_ICACTIVER,
vgic_mmio_read_active, vgic_mmio_write_cactive,
-   NULL, vgic_mmio_uaccess_write_cactive,
+   vgic_uaccess_read_active, vgic_mmio_uaccess_write_cactive,
1, VGIC_ACCESS_32bit),
REGISTER_DESC_WITH_BITS_PER_IRQ_SHARED(GICD_IPRIORITYR,
vgic_mmio_read_priority, vgic_mmio_write_priority, NULL, NULL,


Shouldn't we also set this uaccess_read cb for GICR_I{S,C}ACTIVER0?


Duh. Yes, of course...

M.
--
Jazz is not dead. It just smells funny...
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 3/8] arm64: cpufeature: Add CPU capability for AArch32 EL1 support

2020-04-15 Thread Marc Zyngier
On Wed, 15 Apr 2020 14:15:51 +0100
Suzuki K Poulose  wrote:

> On 04/15/2020 11:14 AM, Will Deacon wrote:
> > On Wed, Apr 15, 2020 at 11:13:54AM +0100, Suzuki K Poulose wrote:  
> >> On 04/14/2020 10:31 PM, Will Deacon wrote:  
> >>> Although we emit a "SANITY CHECK" warning and taint the kernel if we
> >>> detect a CPU mismatch for AArch32 support at EL1, we still online the
> >>> CPU with disastrous consequences for any running 32-bit VMs.
> >>>
> >>> Introduce a capability for AArch32 support at EL1 so that late onlining
> >>> of incompatible CPUs is forbidden.
> >>>
> >>> Signed-off-by: Will Deacon   
> >>
> >> One of the other important missing sanity check for KVM is the VMID width
> >> check. I will code something up.  
> > 
> > Cheers! Do we handle things like the IPA size already?  
> 
> Good point. No, we don't. I will include this too.

There is also the question of the ARMv8.5-GTG extension. I have a patch
that treats it as non-strict, but that approach would fail with KVM if
we online a late CPU without support for the right page size at S2.

M.
-- 
Jazz is not dead. It just smells funny...
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 1/3] KVM: arm: vgic: Synchronize the whole guest on GIC{D,R}_I{S,C}ACTIVER read

2020-04-15 Thread Zenghui Yu

Hi Marc,

On 2020/4/14 18:35, Marc Zyngier wrote:

When a guest tries to read the active state of its interrupts,
we currently just return whatever state we have in memory. This
means that if such an interrupt lives in a List Register on another
CPU, we fail to obsertve the latest active state for this interrupt.

In order to remedy this, stop all the other vcpus so that they exit
and we can observe the most recent value for the state.

Reported-by: Julien Grall 
Signed-off-by: Marc Zyngier 
---
  virt/kvm/arm/vgic/vgic-mmio-v2.c |   4 +-
  virt/kvm/arm/vgic/vgic-mmio-v3.c |   4 +-
  virt/kvm/arm/vgic/vgic-mmio.c| 100 ---
  virt/kvm/arm/vgic/vgic-mmio.h|   3 +
  4 files changed, 71 insertions(+), 40 deletions(-)

diff --git a/virt/kvm/arm/vgic/vgic-mmio-v2.c b/virt/kvm/arm/vgic/vgic-mmio-v2.c
index 5945f062d749..d63881f60e1a 100644
--- a/virt/kvm/arm/vgic/vgic-mmio-v2.c
+++ b/virt/kvm/arm/vgic/vgic-mmio-v2.c
@@ -422,11 +422,11 @@ static const struct vgic_register_region 
vgic_v2_dist_registers[] = {
VGIC_ACCESS_32bit),
REGISTER_DESC_WITH_BITS_PER_IRQ(GIC_DIST_ACTIVE_SET,
vgic_mmio_read_active, vgic_mmio_write_sactive,
-   NULL, vgic_mmio_uaccess_write_sactive, 1,
+   vgic_uaccess_read_active, vgic_mmio_uaccess_write_sactive, 1,
VGIC_ACCESS_32bit),
REGISTER_DESC_WITH_BITS_PER_IRQ(GIC_DIST_ACTIVE_CLEAR,
vgic_mmio_read_active, vgic_mmio_write_cactive,
-   NULL, vgic_mmio_uaccess_write_cactive, 1,
+   vgic_uaccess_read_active, vgic_mmio_uaccess_write_cactive, 1,
VGIC_ACCESS_32bit),
REGISTER_DESC_WITH_BITS_PER_IRQ(GIC_DIST_PRI,
vgic_mmio_read_priority, vgic_mmio_write_priority, NULL, NULL,
diff --git a/virt/kvm/arm/vgic/vgic-mmio-v3.c b/virt/kvm/arm/vgic/vgic-mmio-v3.c
index e72dcc454247..77c8ba1a2535 100644
--- a/virt/kvm/arm/vgic/vgic-mmio-v3.c
+++ b/virt/kvm/arm/vgic/vgic-mmio-v3.c
@@ -553,11 +553,11 @@ static const struct vgic_register_region 
vgic_v3_dist_registers[] = {
VGIC_ACCESS_32bit),
REGISTER_DESC_WITH_BITS_PER_IRQ_SHARED(GICD_ISACTIVER,
vgic_mmio_read_active, vgic_mmio_write_sactive,
-   NULL, vgic_mmio_uaccess_write_sactive, 1,
+   vgic_uaccess_read_active, vgic_mmio_uaccess_write_sactive, 1,
VGIC_ACCESS_32bit),
REGISTER_DESC_WITH_BITS_PER_IRQ_SHARED(GICD_ICACTIVER,
vgic_mmio_read_active, vgic_mmio_write_cactive,
-   NULL, vgic_mmio_uaccess_write_cactive,
+   vgic_uaccess_read_active, vgic_mmio_uaccess_write_cactive,
1, VGIC_ACCESS_32bit),
REGISTER_DESC_WITH_BITS_PER_IRQ_SHARED(GICD_IPRIORITYR,
vgic_mmio_read_priority, vgic_mmio_write_priority, NULL, NULL,


Shouldn't we also set this uaccess_read cb for GICR_I{S,C}ACTIVER0?


Thanks,
Zenghui

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 3/8] arm64: cpufeature: Add CPU capability for AArch32 EL1 support

2020-04-15 Thread Suzuki K Poulose

On 04/15/2020 11:14 AM, Will Deacon wrote:

On Wed, Apr 15, 2020 at 11:13:54AM +0100, Suzuki K Poulose wrote:

On 04/14/2020 10:31 PM, Will Deacon wrote:

Although we emit a "SANITY CHECK" warning and taint the kernel if we
detect a CPU mismatch for AArch32 support at EL1, we still online the
CPU with disastrous consequences for any running 32-bit VMs.

Introduce a capability for AArch32 support at EL1 so that late onlining
of incompatible CPUs is forbidden.

Signed-off-by: Will Deacon 


One of the other important missing sanity check for KVM is the VMID width
check. I will code something up.


Cheers! Do we handle things like the IPA size already?


Good point. No, we don't. I will include this too.

Cheers
Suzuki
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v2] kvm_host: unify VM_STAT and VCPU_STAT definitions in a single place

2020-04-15 Thread Claudio Imbrenda
On Wed, 15 Apr 2020 10:30:06 +0200
Emanuele Giuseppe Esposito  wrote:

> On 4/15/20 8:36 AM, Philippe Mathieu-Daudé wrote:
> > On 4/14/20 5:56 PM, Emanuele Giuseppe Esposito wrote:  
> >> The macros VM_STAT and VCPU_STAT are redundantly implemented in
> >> multiple files, each used by a different architecure to initialize
> >> the debugfs entries for statistics. Since they all have the same
> >> purpose, they can be unified in a single common definition in
> >> include/linux/kvm_host.h
> >>
> >> Signed-off-by: Emanuele Giuseppe Esposito 
> >> ---
> >>   arch/arm64/kvm/guest.c|  23 ++---
> >>   arch/mips/kvm/mips.c  |  61 ++--
> >>   arch/powerpc/kvm/book3s.c |  61 ++--
> >>   arch/powerpc/kvm/booke.c  |  41 
> >>   arch/s390/kvm/kvm-s390.c  | 203
> >> +++--- arch/x86/kvm/x86.c|
> >>  80 +++ include/linux/kvm_host.h  |   5 +
> >>   7 files changed, 231 insertions(+), 243 deletions(-)
> >>
> >> diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
> >> index 23ebe51410f0..8417b200bec9 100644
> >> --- a/arch/arm64/kvm/guest.c
> >> +++ b/arch/arm64/kvm/guest.c
> >> @@ -29,20 +29,17 @@
> >>   
> >>   #include "trace.h"
> >>   
> >> -#define VM_STAT(x) { #x, offsetof(struct kvm, stat.x),
> >> KVM_STAT_VM } -#define VCPU_STAT(x) { #x, offsetof(struct
> >> kvm_vcpu, stat.x), KVM_STAT_VCPU } -
> >>   struct kvm_stats_debugfs_item debugfs_entries[] = {
> >> -  VCPU_STAT(halt_successful_poll),
> >> -  VCPU_STAT(halt_attempted_poll),
> >> -  VCPU_STAT(halt_poll_invalid),
> >> -  VCPU_STAT(halt_wakeup),
> >> -  VCPU_STAT(hvc_exit_stat),
> >> -  VCPU_STAT(wfe_exit_stat),
> >> -  VCPU_STAT(wfi_exit_stat),
> >> -  VCPU_STAT(mmio_exit_user),
> >> -  VCPU_STAT(mmio_exit_kernel),
> >> -  VCPU_STAT(exits),
> >> +  VCPU_STAT("halt_successful_poll", halt_successful_poll),
> >> +  VCPU_STAT("halt_attempted_poll", halt_attempted_poll),
> >> +  VCPU_STAT("halt_poll_invalid", halt_poll_invalid),
> >> +  VCPU_STAT("halt_wakeup", halt_wakeup),
> >> +  VCPU_STAT("hvc_exit_stat", hvc_exit_stat),
> >> +  VCPU_STAT("wfe_exit_stat", wfe_exit_stat),
> >> +  VCPU_STAT("wfi_exit_stat", wfi_exit_stat),
> >> +  VCPU_STAT("mmio_exit_user", mmio_exit_user),
> >> +  VCPU_STAT("mmio_exit_kernel", mmio_exit_kernel),
> >> +  VCPU_STAT("exits", exits),
> >>{ NULL }
> >>   };  
> > 
> > Patch easily reviewed with --word-diff.
> > 
> > [...]  
> >> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> >> index 6d58beb65454..2e6ead872957 100644
> >> --- a/include/linux/kvm_host.h
> >> +++ b/include/linux/kvm_host.h
> >> @@ -1130,6 +1130,11 @@ struct kvm_stats_debugfs_item {
> >>   #define KVM_DBGFS_GET_MODE(dbgfs_item)
> >>   \ ((dbgfs_item)->mode ? (dbgfs_item)->mode : 0644)
> >>   
> >> +#define VM_STAT(n, x, ...)
> >>   \
> >> +  { n, offsetof(struct kvm, stat.x), KVM_STAT_VM, ##
> >> __VA_ARGS__ } +#define VCPU_STAT(n, x, ...)
> >>   \
> >>  
> > 
> > Not sure while you use so many whitespaces here... (maybe Paolo can
> > strip some when applying?).  
> 
> I honestly am not sure why it added few more spaces than I wanted,
> but the idea was to follow the KVM_DBGFS_GET_MODE macro above and put
> the backslash at the end of the line (80th char).

once the whitespace issues are fixed, you can also add

Reviewed-by: Claudio Imbrenda 
 
> > 
> > Otherwise it looks nicer that v1, thanks.
> > 
> > Reviewed-by: Philippe Mathieu-Daudé 
> >   
> >> +  { n, offsetof(struct kvm_vcpu, stat.x), KVM_STAT_VCPU, ##
> >> __VA_ARGS__ } +
> >>   extern struct kvm_stats_debugfs_item debugfs_entries[];
> >>   extern struct dentry *kvm_debugfs_dir;
> >>   
> >>  
> >   
> 

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 7/8] arm64: cpufeature: Relax checks for AArch32 support at EL[0-2]

2020-04-15 Thread Will Deacon
On Wed, Apr 15, 2020 at 12:37:31PM +0100, Suzuki K Poulose wrote:
> On 04/15/2020 11:58 AM, Will Deacon wrote:
> > On Wed, Apr 15, 2020 at 11:50:58AM +0100, Suzuki K Poulose wrote:
> > > On 04/14/2020 10:31 PM, Will Deacon wrote:
> > > > We don't need to be quite as strict about mismatched AArch32 support,
> > > > which is good because the friendly hardware folks have been busy
> > > > mismatching this to their hearts' content.
> > > > 
> > > > * We don't care about EL2 or EL3 (there are silly comments 
> > > > concerning
> > > >   the latter, so remove those)
> > > > 
> > > > * EL1 support is gated by the ARM64_HAS_32BIT_EL1 capability and 
> > > > handled
> > > >   gracefully when a mismatch occurs
> > > > 
> > > > * EL1 support is gated by the ARM64_HAS_32BIT_EL0 capability and 
> > > > handled
> > > 
> > > s/EL1/EL0
> > > 
> > > >   gracefully when a mismatch occurs
> > > > 
> > > > Relax the AArch32 checks to FTR_NONSTRICT.
> > > 
> > > Agreed. We should do something similar for the features exposed by the
> > > ELF_HWCAP, of course in a separate series.
> > 
> > Hmm, I didn't think we needed to touch the HWCAPs, as they're derived from
> > the sanitised feature register values. What am I missing?
> 
> sorry, that was cryptic. I was suggesting to relax the ftr fields to
> NONSTRICT for the fields covered by ELF HWCAPs (and other CPU hwcaps).

Ah, gotcha. Given that the HWCAPs usually describe EL0 features, I say we
can punt this down the road until people give us hardware with mismatched
AArch32 at EL0.

Will
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 3/4] kvm: Replace vcpu->swait with rcuwait

2020-04-15 Thread Paolo Bonzini
On 14/04/20 23:12, Davidlohr Bueso wrote:
> On Wed, 25 Mar 2020, Paolo Bonzini wrote:
> 
>> On 24/03/20 05:44, Davidlohr Bueso wrote:
>>> diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
>>> index 71244bf87c3a..e049fcb3dffb 100644
>>> --- a/arch/mips/kvm/mips.c
>>> +++ b/arch/mips/kvm/mips.c
>>> @@ -290,8 +290,7 @@ static enum hrtimer_restart
>>> kvm_mips_comparecount_wakeup(struct hrtimer *timer)
>>> kvm_mips_callbacks->queue_timer_int(vcpu);
>>>
>>> vcpu->arch.wait = 0;
>>> -    if (swq_has_sleeper(>wq))
>>> -    swake_up_one(>wq);
>>> +    rcuwait_wake_up(>wait)
>>
>> This is missing a semicolon.  (KVM MIPS is known not to compile and will
>> be changed to "depends on BROKEN" in 5.7).
> 
> Do you want me to send another version with this fix or do you prefer
> fixing it when/if picked up?

It's up to the TIP tree people, but sending a fixed version is probably
the best way to get their attention. :)

I can also queue it myself (for 5.7 even) if I get an Acked-by from
Peter though.

Paolo

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 7/8] arm64: cpufeature: Relax checks for AArch32 support at EL[0-2]

2020-04-15 Thread Suzuki K Poulose

On 04/15/2020 11:58 AM, Will Deacon wrote:

On Wed, Apr 15, 2020 at 11:50:58AM +0100, Suzuki K Poulose wrote:

On 04/14/2020 10:31 PM, Will Deacon wrote:

We don't need to be quite as strict about mismatched AArch32 support,
which is good because the friendly hardware folks have been busy
mismatching this to their hearts' content.

* We don't care about EL2 or EL3 (there are silly comments concerning
  the latter, so remove those)

* EL1 support is gated by the ARM64_HAS_32BIT_EL1 capability and handled
  gracefully when a mismatch occurs

* EL1 support is gated by the ARM64_HAS_32BIT_EL0 capability and handled


s/EL1/EL0


  gracefully when a mismatch occurs

Relax the AArch32 checks to FTR_NONSTRICT.


Agreed. We should do something similar for the features exposed by the
ELF_HWCAP, of course in a separate series.


Hmm, I didn't think we needed to touch the HWCAPs, as they're derived from
the sanitised feature register values. What am I missing?


sorry, that was cryptic. I was suggesting to relax the ftr fields to
NONSTRICT for the fields covered by ELF HWCAPs (and other CPU hwcaps).

Suzuki
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 7/8] arm64: cpufeature: Relax checks for AArch32 support at EL[0-2]

2020-04-15 Thread Will Deacon
On Wed, Apr 15, 2020 at 11:50:58AM +0100, Suzuki K Poulose wrote:
> On 04/14/2020 10:31 PM, Will Deacon wrote:
> > We don't need to be quite as strict about mismatched AArch32 support,
> > which is good because the friendly hardware folks have been busy
> > mismatching this to their hearts' content.
> > 
> >* We don't care about EL2 or EL3 (there are silly comments concerning
> >  the latter, so remove those)
> > 
> >* EL1 support is gated by the ARM64_HAS_32BIT_EL1 capability and handled
> >  gracefully when a mismatch occurs
> > 
> >* EL1 support is gated by the ARM64_HAS_32BIT_EL0 capability and handled
> 
> s/EL1/EL0
> 
> >  gracefully when a mismatch occurs
> > 
> > Relax the AArch32 checks to FTR_NONSTRICT.
> 
> Agreed. We should do something similar for the features exposed by the
> ELF_HWCAP, of course in a separate series.

Hmm, I didn't think we needed to touch the HWCAPs, as they're derived from
the sanitised feature register values. What am I missing?

Will
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCH] KVM: arm64: Drop PTE_S2_MEMATTR_MASK

2020-04-15 Thread Zenghui Yu
The only user of PTE_S2_MEMATTR_MASK macro had been removed since
commit a501e32430d4 ("arm64: Clean up the default pgprot setting").
It has been about six years and no one has used it again.

Let's drop it.

Signed-off-by: Zenghui Yu 
---
 arch/arm64/include/asm/pgtable-hwdef.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/arm64/include/asm/pgtable-hwdef.h 
b/arch/arm64/include/asm/pgtable-hwdef.h
index 6bf5e650da78..99315bdca0e6 100644
--- a/arch/arm64/include/asm/pgtable-hwdef.h
+++ b/arch/arm64/include/asm/pgtable-hwdef.h
@@ -190,7 +190,6 @@
  * Memory Attribute override for Stage-2 (MemAttr[3:0])
  */
 #define PTE_S2_MEMATTR(t)  (_AT(pteval_t, (t)) << 2)
-#define PTE_S2_MEMATTR_MASK(_AT(pteval_t, 0xf) << 2)
 
 /*
  * EL2/HYP PTE/PMD definitions
-- 
2.19.1


___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 7/8] arm64: cpufeature: Relax checks for AArch32 support at EL[0-2]

2020-04-15 Thread Suzuki K Poulose

On 04/14/2020 10:31 PM, Will Deacon wrote:

We don't need to be quite as strict about mismatched AArch32 support,
which is good because the friendly hardware folks have been busy
mismatching this to their hearts' content.

   * We don't care about EL2 or EL3 (there are silly comments concerning
 the latter, so remove those)

   * EL1 support is gated by the ARM64_HAS_32BIT_EL1 capability and handled
 gracefully when a mismatch occurs

   * EL1 support is gated by the ARM64_HAS_32BIT_EL0 capability and handled


s/EL1/EL0


 gracefully when a mismatch occurs

Relax the AArch32 checks to FTR_NONSTRICT.


Agreed. We should do something similar for the features exposed by the
ELF_HWCAP, of course in a separate series.



Signed-off-by: Will Deacon 


Reviewed-by: Suzuki K Poulose 
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 6/8] arm64: cpufeature: Relax AArch32 system checks if EL1 is 64-bit only

2020-04-15 Thread Suzuki K Poulose

On 04/14/2020 10:31 PM, Will Deacon wrote:

If AArch32 is not supported at EL1, the AArch32 feature register fields
no longer advertise support for some system features:

   * ISAR4.SMC
   * PFR1.{Virt_frac, Sec_frac, Virtualization, Security, ProgMod}

In which case, we don't need to emit "SANITY CHECK" failures for all of
them.

Add logic to relax the strictness of individual feature register fields
at runtime and use this for the fields above if 32-bit EL1 is not
supported.

Signed-off-by: Will Deacon 


Reviewed-by: Suzuki K Poulose 
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 5/8] arm64: cpufeature: Factor out checking of AArch32 features

2020-04-15 Thread Suzuki K Poulose

On 04/14/2020 10:31 PM, Will Deacon wrote:

update_cpu_features() is pretty large, so split out the checking of the
AArch32 features into a separate function and call it after checking the
AArch64 features.

Signed-off-by: Will Deacon 
---
  arch/arm64/kernel/cpufeature.c | 108 +++--
  1 file changed, 61 insertions(+), 47 deletions(-)

diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 7dfcdd9e75c1..32828a77acc3 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -715,6 +715,65 @@ static int check_update_ftr_reg(u32 sys_id, int cpu, u64 
val, u64 boot)
return 1;
  }
  
+static int update_32bit_cpu_features(int cpu, struct cpuinfo_arm64 *info,

+struct cpuinfo_arm64 *boot)
+{


...


-
if (id_aa64pfr0_sve(info->reg_id_aa64pfr0)) {
taint |= check_update_ftr_reg(SYS_ZCR_EL1, cpu,
info->reg_zcr, boot->reg_zcr);
@@ -845,6 +857,8 @@ void update_cpu_features(int cpu,
sve_update_vq_map();
}
  
+	taint |= update_32bit_cpu_features(cpu, info, boot);

+


This relies on the assumption that the id_aa64pfr0 has been sanitised. 
It may be worth adding a comment to make sure people (hacking the

kernel) don't move this around and break that dependency.

Either ways:

Reviewed-by: Suzuki K Poulose 
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 4/8] arm64: cpufeature: Remove redundant call to id_aa64pfr0_32bit_el0()

2020-04-15 Thread Suzuki K Poulose

On 04/14/2020 10:31 PM, Will Deacon wrote:

There's no need to call id_aa64pfr0_32bit_el0() twice because the
sanitised value of ID_AA64PFR0_EL1 has already been updated for the CPU
being onlined.

Remove the redundant function call.

Signed-off-by: Will Deacon 


Reviewed-by: Suzuki K Poulose 


---
  arch/arm64/kernel/cpufeature.c | 4 +---
  1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 838fe5cc8d7e..7dfcdd9e75c1 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -792,9 +792,7 @@ void update_cpu_features(int cpu,
 * If we have AArch32, we care about 32-bit features for compat.
 * If the system doesn't support AArch32, don't update them.
 */
-   if (id_aa64pfr0_32bit_el0(read_sanitised_ftr_reg(SYS_ID_AA64PFR0_EL1)) 
&&
-   id_aa64pfr0_32bit_el0(info->reg_id_aa64pfr0)) {
-
+   if (id_aa64pfr0_32bit_el0(read_sanitised_ftr_reg(SYS_ID_AA64PFR0_EL1))) 
{
taint |= check_update_ftr_reg(SYS_ID_DFR0_EL1, cpu,
info->reg_id_dfr0, boot->reg_id_dfr0);
taint |= check_update_ftr_reg(SYS_ID_ISAR0_EL1, cpu,



___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 3/8] arm64: cpufeature: Add CPU capability for AArch32 EL1 support

2020-04-15 Thread Will Deacon
On Wed, Apr 15, 2020 at 11:13:54AM +0100, Suzuki K Poulose wrote:
> On 04/14/2020 10:31 PM, Will Deacon wrote:
> > Although we emit a "SANITY CHECK" warning and taint the kernel if we
> > detect a CPU mismatch for AArch32 support at EL1, we still online the
> > CPU with disastrous consequences for any running 32-bit VMs.
> > 
> > Introduce a capability for AArch32 support at EL1 so that late onlining
> > of incompatible CPUs is forbidden.
> > 
> > Signed-off-by: Will Deacon 
> 
> One of the other important missing sanity check for KVM is the VMID width
> check. I will code something up.

Cheers! Do we handle things like the IPA size already?

Will
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 3/8] arm64: cpufeature: Add CPU capability for AArch32 EL1 support

2020-04-15 Thread Suzuki K Poulose

On 04/14/2020 10:31 PM, Will Deacon wrote:

Although we emit a "SANITY CHECK" warning and taint the kernel if we
detect a CPU mismatch for AArch32 support at EL1, we still online the
CPU with disastrous consequences for any running 32-bit VMs.

Introduce a capability for AArch32 support at EL1 so that late onlining
of incompatible CPUs is forbidden.

Signed-off-by: Will Deacon 


One of the other important missing sanity check for KVM is the VMID 
width check. I will code something up.


Reviewed-by: Suzuki K Poulose 
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 2/8] arm64: cpufeature: Spell out register fields for ID_ISAR4 and ID_PFR1

2020-04-15 Thread Suzuki K Poulose

On 04/14/2020 10:31 PM, Will Deacon wrote:

In preparation for runtime updates to the strictness of some AArch32
features, spell out the register fields for ID_ISAR4 and ID_PFR1 to make
things clearer to read. Note that this isn't functionally necessary, as
the feature arrays themselves are not modified dynamically and remain
'const'.

Signed-off-by: Will Deacon 
---
  arch/arm64/include/asm/sysreg.h | 17 +
  arch/arm64/kernel/cpufeature.c  | 28 ++--
  2 files changed, 43 insertions(+), 2 deletions(-)


Reviewed-by: Suzuki K Poulose 
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 1/8] arm64: cpufeature: Relax check for IESB support

2020-04-15 Thread Suzuki K Poulose

Hi Will

On 04/14/2020 10:31 PM, Will Deacon wrote:

From: Sai Prakash Ranjan 

We don't care if IESB is supported or not as we always set
SCTLR_ELx.IESB and, if it works, that's really great.

Relax the ID_AA64MMFR2.IESB cpufeature check so that we don't warn and
taint if it's mismatched.

Signed-off-by: Sai Prakash Ranjan 
[will: rewrote commit message]
Signed-off-by: Will Deacon 


Reviewed-by: Suzuki K Poulose 


---
  arch/arm64/kernel/cpufeature.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 9fac745aa7bb..63df28e6a425 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -247,7 +247,7 @@ static const struct arm64_ftr_bits ftr_id_aa64mmfr2[] = {
ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, 
ID_AA64MMFR2_FWB_SHIFT, 4, 0),
ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, 
ID_AA64MMFR2_AT_SHIFT, 4, 0),
ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, 
ID_AA64MMFR2_LVA_SHIFT, 4, 0),
-   ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, 
ID_AA64MMFR2_IESB_SHIFT, 4, 0),
+   ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_LOWER_SAFE, 
ID_AA64MMFR2_IESB_SHIFT, 4, 0),
ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, 
ID_AA64MMFR2_LSM_SHIFT, 4, 0),
ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, 
ID_AA64MMFR2_UAO_SHIFT, 4, 0),
ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, 
ID_AA64MMFR2_CNP_SHIFT, 4, 0),



___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 3/4] kvm: Replace vcpu->swait with rcuwait

2020-04-15 Thread Davidlohr Bueso

On Wed, 25 Mar 2020, Paolo Bonzini wrote:


On 24/03/20 05:44, Davidlohr Bueso wrote:

diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
index 71244bf87c3a..e049fcb3dffb 100644
--- a/arch/mips/kvm/mips.c
+++ b/arch/mips/kvm/mips.c
@@ -290,8 +290,7 @@ static enum hrtimer_restart 
kvm_mips_comparecount_wakeup(struct hrtimer *timer)
kvm_mips_callbacks->queue_timer_int(vcpu);

vcpu->arch.wait = 0;
-   if (swq_has_sleeper(>wq))
-   swake_up_one(>wq);
+   rcuwait_wake_up(>wait)


This is missing a semicolon.  (KVM MIPS is known not to compile and will
be changed to "depends on BROKEN" in 5.7).


Do you want me to send another version with this fix or do you prefer
fixing it when/if picked up?

Thanks,
Davidlohr
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v2] kvm_host: unify VM_STAT and VCPU_STAT definitions in a single place

2020-04-15 Thread Philippe Mathieu-Daudé
On 4/14/20 5:56 PM, Emanuele Giuseppe Esposito wrote:
> The macros VM_STAT and VCPU_STAT are redundantly implemented in multiple
> files, each used by a different architecure to initialize the debugfs
> entries for statistics. Since they all have the same purpose, they can be
> unified in a single common definition in include/linux/kvm_host.h
> 
> Signed-off-by: Emanuele Giuseppe Esposito 
> ---
>  arch/arm64/kvm/guest.c|  23 ++---
>  arch/mips/kvm/mips.c  |  61 ++--
>  arch/powerpc/kvm/book3s.c |  61 ++--
>  arch/powerpc/kvm/booke.c  |  41 
>  arch/s390/kvm/kvm-s390.c  | 203 +++---
>  arch/x86/kvm/x86.c|  80 +++
>  include/linux/kvm_host.h  |   5 +
>  7 files changed, 231 insertions(+), 243 deletions(-)
> 
> diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
> index 23ebe51410f0..8417b200bec9 100644
> --- a/arch/arm64/kvm/guest.c
> +++ b/arch/arm64/kvm/guest.c
> @@ -29,20 +29,17 @@
>  
>  #include "trace.h"
>  
> -#define VM_STAT(x) { #x, offsetof(struct kvm, stat.x), KVM_STAT_VM }
> -#define VCPU_STAT(x) { #x, offsetof(struct kvm_vcpu, stat.x), KVM_STAT_VCPU }
> -
>  struct kvm_stats_debugfs_item debugfs_entries[] = {
> - VCPU_STAT(halt_successful_poll),
> - VCPU_STAT(halt_attempted_poll),
> - VCPU_STAT(halt_poll_invalid),
> - VCPU_STAT(halt_wakeup),
> - VCPU_STAT(hvc_exit_stat),
> - VCPU_STAT(wfe_exit_stat),
> - VCPU_STAT(wfi_exit_stat),
> - VCPU_STAT(mmio_exit_user),
> - VCPU_STAT(mmio_exit_kernel),
> - VCPU_STAT(exits),
> + VCPU_STAT("halt_successful_poll", halt_successful_poll),
> + VCPU_STAT("halt_attempted_poll", halt_attempted_poll),
> + VCPU_STAT("halt_poll_invalid", halt_poll_invalid),
> + VCPU_STAT("halt_wakeup", halt_wakeup),
> + VCPU_STAT("hvc_exit_stat", hvc_exit_stat),
> + VCPU_STAT("wfe_exit_stat", wfe_exit_stat),
> + VCPU_STAT("wfi_exit_stat", wfi_exit_stat),
> + VCPU_STAT("mmio_exit_user", mmio_exit_user),
> + VCPU_STAT("mmio_exit_kernel", mmio_exit_kernel),
> + VCPU_STAT("exits", exits),
>   { NULL }
>  };

Patch easily reviewed with --word-diff.

[...]
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 6d58beb65454..2e6ead872957 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -1130,6 +1130,11 @@ struct kvm_stats_debugfs_item {
>  #define KVM_DBGFS_GET_MODE(dbgfs_item)   
>   \
>   ((dbgfs_item)->mode ? (dbgfs_item)->mode : 0644)
>  
> +#define VM_STAT(n, x, ...)   
>\
> + { n, offsetof(struct kvm, stat.x), KVM_STAT_VM, ## __VA_ARGS__ }
> +#define VCPU_STAT(n, x, ...) 
>\

Not sure while you use so many whitespaces here... (maybe Paolo can
strip some when applying?).

Otherwise it looks nicer that v1, thanks.

Reviewed-by: Philippe Mathieu-Daudé 

> + { n, offsetof(struct kvm_vcpu, stat.x), KVM_STAT_VCPU, ## __VA_ARGS__ }
> +
>  extern struct kvm_stats_debugfs_item debugfs_entries[];
>  extern struct dentry *kvm_debugfs_dir;
>  
> 
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v2] kvm_host: unify VM_STAT and VCPU_STAT definitions in a single place

2020-04-15 Thread Emanuele Giuseppe Esposito



On 4/15/20 8:36 AM, Philippe Mathieu-Daudé wrote:

On 4/14/20 5:56 PM, Emanuele Giuseppe Esposito wrote:

The macros VM_STAT and VCPU_STAT are redundantly implemented in multiple
files, each used by a different architecure to initialize the debugfs
entries for statistics. Since they all have the same purpose, they can be
unified in a single common definition in include/linux/kvm_host.h

Signed-off-by: Emanuele Giuseppe Esposito 
---
  arch/arm64/kvm/guest.c|  23 ++---
  arch/mips/kvm/mips.c  |  61 ++--
  arch/powerpc/kvm/book3s.c |  61 ++--
  arch/powerpc/kvm/booke.c  |  41 
  arch/s390/kvm/kvm-s390.c  | 203 +++---
  arch/x86/kvm/x86.c|  80 +++
  include/linux/kvm_host.h  |   5 +
  7 files changed, 231 insertions(+), 243 deletions(-)

diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
index 23ebe51410f0..8417b200bec9 100644
--- a/arch/arm64/kvm/guest.c
+++ b/arch/arm64/kvm/guest.c
@@ -29,20 +29,17 @@
  
  #include "trace.h"
  
-#define VM_STAT(x) { #x, offsetof(struct kvm, stat.x), KVM_STAT_VM }

-#define VCPU_STAT(x) { #x, offsetof(struct kvm_vcpu, stat.x), KVM_STAT_VCPU }
-
  struct kvm_stats_debugfs_item debugfs_entries[] = {
-   VCPU_STAT(halt_successful_poll),
-   VCPU_STAT(halt_attempted_poll),
-   VCPU_STAT(halt_poll_invalid),
-   VCPU_STAT(halt_wakeup),
-   VCPU_STAT(hvc_exit_stat),
-   VCPU_STAT(wfe_exit_stat),
-   VCPU_STAT(wfi_exit_stat),
-   VCPU_STAT(mmio_exit_user),
-   VCPU_STAT(mmio_exit_kernel),
-   VCPU_STAT(exits),
+   VCPU_STAT("halt_successful_poll", halt_successful_poll),
+   VCPU_STAT("halt_attempted_poll", halt_attempted_poll),
+   VCPU_STAT("halt_poll_invalid", halt_poll_invalid),
+   VCPU_STAT("halt_wakeup", halt_wakeup),
+   VCPU_STAT("hvc_exit_stat", hvc_exit_stat),
+   VCPU_STAT("wfe_exit_stat", wfe_exit_stat),
+   VCPU_STAT("wfi_exit_stat", wfi_exit_stat),
+   VCPU_STAT("mmio_exit_user", mmio_exit_user),
+   VCPU_STAT("mmio_exit_kernel", mmio_exit_kernel),
+   VCPU_STAT("exits", exits),
{ NULL }
  };


Patch easily reviewed with --word-diff.

[...]

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 6d58beb65454..2e6ead872957 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1130,6 +1130,11 @@ struct kvm_stats_debugfs_item {
  #define KVM_DBGFS_GET_MODE(dbgfs_item)
 \
((dbgfs_item)->mode ? (dbgfs_item)->mode : 0644)
  
+#define VM_STAT(n, x, ...) 	   \

+   { n, offsetof(struct kvm, stat.x), KVM_STAT_VM, ## __VA_ARGS__ }
+#define VCPU_STAT(n, x, ...)   
   \


Not sure while you use so many whitespaces here... (maybe Paolo can
strip some when applying?).


I honestly am not sure why it added few more spaces than I wanted, but 
the idea was to follow the KVM_DBGFS_GET_MODE macro above and put the 
backslash at the end of the line (80th char).




Otherwise it looks nicer that v1, thanks.

Reviewed-by: Philippe Mathieu-Daudé 


+   { n, offsetof(struct kvm_vcpu, stat.x), KVM_STAT_VCPU, ## __VA_ARGS__ }
+
  extern struct kvm_stats_debugfs_item debugfs_entries[];
  extern struct dentry *kvm_debugfs_dir;
  





___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH] KVM: Optimize kvm_arch_vcpu_ioctl_run function

2020-04-15 Thread Tianjia Zhang




On 2020/4/14 22:26, Vitaly Kuznetsov wrote:

Tianjia Zhang  writes:


kvm_arch_vcpu_ioctl_run() is only called in the file kvm_main.c,
where vcpu->run is the kvm_run parameter, so it has been replaced.

Signed-off-by: Tianjia Zhang 
---
  arch/x86/kvm/x86.c | 8 
  virt/kvm/arm/arm.c | 2 +-
  2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 3bf2ecafd027..70e3f4abbd4d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8726,18 +8726,18 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, 
struct kvm_run *kvm_run)
r = -EAGAIN;
if (signal_pending(current)) {
r = -EINTR;
-   vcpu->run->exit_reason = KVM_EXIT_INTR;
+   kvm_run->exit_reason = KVM_EXIT_INTR;


I have a more generic question: why do we need to pass 'kvm_run' to
kvm_arch_vcpu_ioctl_run() if it can be extracted from 'struct kvm_vcpu'?
The only call site looks like

virt/kvm/kvm_main.c:r = kvm_arch_vcpu_ioctl_run(vcpu, vcpu->run);



In the earlier version, kvm_run is used to pass parameters with user 
mode and is not included in the vcpu structure, so it has been retained 
until now.


Thanks,
Tianjia


++vcpu->stat.signal_exits;
}
goto out;
}
  
-	if (vcpu->run->kvm_valid_regs & ~KVM_SYNC_X86_VALID_FIELDS) {

+   if (kvm_run->kvm_valid_regs & ~KVM_SYNC_X86_VALID_FIELDS) {
r = -EINVAL;
goto out;
}
  
-	if (vcpu->run->kvm_dirty_regs) {

+   if (kvm_run->kvm_dirty_regs) {
r = sync_regs(vcpu);
if (r != 0)
goto out;
@@ -8767,7 +8767,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct 
kvm_run *kvm_run)
  
  out:

kvm_put_guest_fpu(vcpu);
-   if (vcpu->run->kvm_valid_regs)
+   if (kvm_run->kvm_valid_regs)
store_regs(vcpu);
post_kvm_run_save(vcpu);
kvm_sigset_deactivate(vcpu);
diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index 48d0ec44ad77..ab9d7966a4c8 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -659,7 +659,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct 
kvm_run *run)
return ret;
  
  	if (run->exit_reason == KVM_EXIT_MMIO) {

-   ret = kvm_handle_mmio_return(vcpu, vcpu->run);
+   ret = kvm_handle_mmio_return(vcpu, run);
if (ret)
return ret;
}



___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 3/8] arm64: cpufeature: Add CPU capability for AArch32 EL1 support

2020-04-15 Thread Marc Zyngier

Hi Will,

On 2020-04-14 22:31, Will Deacon wrote:

Although we emit a "SANITY CHECK" warning and taint the kernel if we
detect a CPU mismatch for AArch32 support at EL1, we still online the
CPU with disastrous consequences for any running 32-bit VMs.

Introduce a capability for AArch32 support at EL1 so that late onlining
of incompatible CPUs is forbidden.

Signed-off-by: Will Deacon 


Definitely an improvement over the current situation, as the direct read
of ID_AA64PFR0 was always a bit dodgy. Given that I'm pretty sure these 
new
braindead SoCs are going to run an older version of the kernel, should 
we

Cc stable for this?

Otherwise:

Acked-by: Marc Zyngier 

M.
--
Jazz is not dead. It just smells funny...
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCH RFC] KVM: arm64: Sidestep stage2_unmap_vm() on vcpu reset when S2FWB is supported

2020-04-15 Thread Zenghui Yu
stage2_unmap_vm() was introduced to unmap user RAM region in the stage2
page table to make the caches coherent. E.g., a guest reboot with stage1
MMU disabled will access memory using non-cacheable attributes. If the
RAM and caches are not coherent at this stage, some evicted dirty cache
line may go and corrupt guest data in RAM.

Since ARMv8.4, S2FWB feature is mandatory and KVM will take advantage
of it to configure the stage2 page table and the attributes of memory
access. So we ensure that guests always access memory using cacheable
attributes and thus, the caches always be coherent.

So on CPUs that support S2FWB, we can safely reset the vcpu without a
heavy stage2 unmapping.

Cc: Marc Zyngier 
Cc: Christoffer Dall 
Cc: James Morse 
Cc: Julien Thierry 
Cc: Suzuki K Poulose 
Signed-off-by: Zenghui Yu 
---

If this is correct, there should be a great performance improvement on
a guest reboot (or reset) on systems support S2FWB. But I'm afraid that
I've missed some points here, so please comment!

The commit 957db105c997 ("arm/arm64: KVM: Introduce stage2_unmap_vm")
was merged about six years ago and I failed to track its histroy and
intention. Instead of a whole stage2 unmapping, something like
stage2_flush_vm() looks enough to me. But again, I'm unsure...

Thanks for having a look!

 virt/kvm/arm/arm.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index 48d0ec44ad77..e6378162cdef 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -983,8 +983,11 @@ static int kvm_arch_vcpu_ioctl_vcpu_init(struct kvm_vcpu 
*vcpu,
/*
 * Ensure a rebooted VM will fault in RAM pages and detect if the
 * guest MMU is turned off and flush the caches as needed.
+*
+* S2FWB enforces all memory accesses to RAM being cacheable, we
+* ensure that the cache is always coherent.
 */
-   if (vcpu->arch.has_run_once)
+   if (vcpu->arch.has_run_once && 
!cpus_have_const_cap(ARM64_HAS_STAGE2_FWB))
stage2_unmap_vm(vcpu->kvm);
 
vcpu_reset_hcr(vcpu);
-- 
2.19.1


___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm