Re: [PATCH v4 10/11] KVM: arm/arm64: Document KVM_DEV_ARM_ITS_CTRL_RESET

2017-10-17 Thread Christoffer Dall
On Tue, Oct 17, 2017 at 09:10:08AM +0200, Eric Auger wrote:
> At the moment, the in-kernel emulated ITS is not properly reset.
> On guest restart/reset some registers keep their old values and
> internal structures like device, ITE, and collection lists are not
> freed.
> 
> This may lead to various bugs. Among them, we can have incorrect state
> backup or failure when saving the ITS state at early guest boot stage.
> 
> This patch documents a new attribute, KVM_DEV_ARM_ITS_CTRL_RESET in
> the KVM_DEV_ARM_VGIC_GRP_CTRL group.
> 
> Upon this action, we can reset registers and especially those
> pointing to tables previously allocated by the guest and free
> the internal data structures storing the list of devices, collections
> and lpis.
> 
> The usual approach for device reset of having userspace write
> the reset values of the registers to the kernel via the register
> read/write APIs doesn't work for the ITS because it has some
> internal state (caches) which is not exposed as registers,
> and there is no register interface for "drop cached data without
> writing it back to RAM". So we need a KVM API which mimics the
> hardware's reset line, to provide the equivalent behaviour to
> a "pull the power cord out of the back of the machine" reset.
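As an aside, a minimal userspace sketch of triggering this reset (assuming the
attribute lands as documented by this patch; its_fd is the ITS device fd
returned by KVM_CREATE_DEVICE, and its_reset() is only an illustrative name):

        #include <linux/kvm.h>
        #include <sys/ioctl.h>

        static int its_reset(int its_fd)
        {
                struct kvm_device_attr attr = {
                        .group = KVM_DEV_ARM_VGIC_GRP_CTRL,
                        .attr  = KVM_DEV_ARM_ITS_CTRL_RESET,  /* added by this series */
                        .addr  = 0,                           /* no additional parameter */
                };

                /* mimics the hardware reset line described above */
                return ioctl(its_fd, KVM_SET_DEVICE_ATTR, &attr);
        }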
> 
> Signed-off-by: Eric Auger 
> Reported-by: wanghaibin 
> 
> ---
> v2 -> v3:
> - reword commit message, credit to Peter Maydell.
> - take into account Christoffer rewording comments but still
>   kept details. Added Peter's comment but still kept details.
>   Peter may disagree.
> 
> v1 -> v2:
> - Describe architecturally-defined reset values
> ---
>  Documentation/virtual/kvm/devices/arm-vgic-its.txt | 20 
>  1 file changed, 20 insertions(+)
> 
> diff --git a/Documentation/virtual/kvm/devices/arm-vgic-its.txt 
> b/Documentation/virtual/kvm/devices/arm-vgic-its.txt
> index eb06beb..4e9bb6f 100644
> --- a/Documentation/virtual/kvm/devices/arm-vgic-its.txt
> +++ b/Documentation/virtual/kvm/devices/arm-vgic-its.txt
> @@ -33,6 +33,10 @@ Groups:
>request the initialization of the ITS, no additional parameter in
>kvm_device_attr.addr.
>  
> +KVM_DEV_ARM_ITS_CTRL_RESET
> +  reset the ITS, no additional parameter in kvm_device_attr.addr.
> +  See "ITS Reset State" section.
> +
>  KVM_DEV_ARM_ITS_SAVE_TABLES
>save the ITS table data into guest RAM, at the location provisioned
>by the guest in corresponding registers/table entries.
> @@ -157,3 +161,19 @@ Then vcpus can be started.
>   - pINTID is the physical LPI ID; if zero, it means the entry is not valid
> and other fields are not meaningful.
>   - ICID is the collection ID
> +
> + ITS Reset State:
> + ----------------
> +
> +RESET returns the ITS to the same state that it was when first created and
> +inited:

initialized.  When the RESET command returns, the following things are
guaranteed:

> +
> +- The ITS is not enabled and quiescent
> +  GITS_CTLR.Enabled = 0 .Quiescent=1
> +- There is no internally cache state

internally cached,  or internal cache

> +- No collection or device table are used
> +  GITS_BASER.Valid = 0
> +- The command queue is not allocated:
> +  GITS_CBASER = 0, GITS_CREADR = 0, GITS_CWRITER = 0
> +- The ABI version is unchanged and remains the one set when the ITS
> +  device was first created.
> -- 
> 2.5.5
> 
With those changes, this looks ok to me.

Thanks,
-Christoffer


Re: [PATCH v4 09/11] KVM: arm/arm64: vgic-its: free caches when GITS_BASER Valid bit is cleared

2017-10-17 Thread Christoffer Dall
On Tue, Oct 17, 2017 at 09:10:07AM +0200, Eric Auger wrote:
> When the GITS_BASER.Valid gets cleared, the data structures in
> guest RAM are not valid anymore. The device, collection
> and LPI lists stored in the in-kernel ITS represent the same
> information in some form of cache. So let's void the cache.
> 
> Signed-off-by: Eric Auger 
> 
> ---
> 
> v2 -> v3:
> - add a comment and clear cache in if block
> ---
>  virt/kvm/arm/vgic/vgic-its.c | 23 +++
>  1 file changed, 19 insertions(+), 4 deletions(-)
> 
> diff --git a/virt/kvm/arm/vgic/vgic-its.c b/virt/kvm/arm/vgic/vgic-its.c
> index f3f0026f..084239c 100644
> --- a/virt/kvm/arm/vgic/vgic-its.c
> +++ b/virt/kvm/arm/vgic/vgic-its.c
> @@ -1471,8 +1471,9 @@ static void vgic_mmio_write_its_baser(struct kvm *kvm,
> unsigned long val)
>  {
>   const struct vgic_its_abi *abi = vgic_its_get_abi(its);
> - u64 entry_size, device_type;
> + u64 entry_size;
>   u64 reg, *regptr, clearbits = 0;
> + int type;
>  
>   /* When GITS_CTLR.Enable is 1, we ignore write accesses. */
>   if (its->enabled)
> @@ -1482,12 +1483,12 @@ static void vgic_mmio_write_its_baser(struct kvm *kvm,
>   case 0:
>   regptr = &its->baser_device_table;
>   entry_size = abi->dte_esz;
> - device_type = GITS_BASER_TYPE_DEVICE;
> + type = GITS_BASER_TYPE_DEVICE;
>   break;
>   case 1:
>   regptr = &its->baser_coll_table;
>   entry_size = abi->cte_esz;
> - device_type = GITS_BASER_TYPE_COLLECTION;
> + type = GITS_BASER_TYPE_COLLECTION;
>   clearbits = GITS_BASER_INDIRECT;
>   break;
>   default:
> @@ -1499,10 +1500,24 @@ static void vgic_mmio_write_its_baser(struct kvm *kvm,
>   reg &= ~clearbits;
>  
>   reg |= (entry_size - 1) << GITS_BASER_ENTRY_SIZE_SHIFT;
> - reg |= device_type << GITS_BASER_TYPE_SHIFT;
> + reg |= (u64)type << GITS_BASER_TYPE_SHIFT;
>   reg = vgic_sanitise_its_baser(reg);
>  
>   *regptr = reg;
> +
> + /* Table no longer valid: clear cached data */
> + if (!(reg & GITS_BASER_VALID)) {
> + switch (type) {
> + case GITS_BASER_TYPE_DEVICE:
> + vgic_its_free_device_list(kvm, its);
> + break;
> + case GITS_BASER_TYPE_COLLECTION:
> + vgic_its_free_collection_list(kvm, its);
> + break;
> + default:
> + break;
> + }
> + }

So we do this after setting the *regptr, which makes me worried about
races.

How are guest writes to these registers synchronized with, for example,
trying to save the tables?  Perhaps we don't care because userspace
should have stopped the VM before trying to save the ITS state?



>  }
>  
>  static unsigned long vgic_mmio_read_its_ctlr(struct kvm *vcpu,
> -- 
> 2.5.5
> 

Otherwise looks good to me.
-Christoffer


Re: [PATCH v4 08/11] KVM: arm/arm64: vgic-its: new helper functions to free the caches

2017-10-17 Thread Christoffer Dall
On Tue, Oct 17, 2017 at 09:10:06AM +0200, Eric Auger wrote:
> From: wanghaibin 
> 
> We create 2 new functions that frees the device and
> collection lists. this is currently called by vgic_its_destroy()
> and we will add other callers in subsequent patches.

See my previous comments about language issues in this paragraph.

> 
> We also remove the check on its->device_list.next as it looks
> unnecessary. Indeed, the device list always is initialized
> when vgic_its_destroy gets called: the kvm device is removed
> by kvm_destroy_devices() which loops on all the devices
> added to kvm->devices. kvm_ioctl_create_device() only adds
> the device to kvm_devices once the lists have been initialized
> (in vgic_create_its).
> 
> We also move vgic_its_free_device to prepare for new callers.
> 
> Signed-off-by: wanghaibin 
> Signed-off-by: Eric Auger 
> 
> ---
> [Eric] removed its->device_list.next which is not needed as
> pointed out by Wanghaibin. Reword the commit message
> ---
>  virt/kvm/arm/vgic/vgic-its.c | 76 
> 
>  1 file changed, 41 insertions(+), 35 deletions(-)
> 
> diff --git a/virt/kvm/arm/vgic/vgic-its.c b/virt/kvm/arm/vgic/vgic-its.c
> index 1c3e83f..f3f0026f 100644
> --- a/virt/kvm/arm/vgic/vgic-its.c
> +++ b/virt/kvm/arm/vgic/vgic-its.c
> @@ -611,6 +611,45 @@ static void its_free_ite(struct kvm *kvm, struct its_ite 
> *ite)
>   kfree(ite);
>  }
>  
> +static void vgic_its_free_device(struct kvm *kvm, struct its_device *dev)
> +{
> + struct its_ite *ite, *tmp;
> +
> + list_for_each_entry_safe(ite, tmp, &dev->itt_head, ite_list)
> + its_free_ite(kvm, ite);
> + list_del(&dev->dev_list);
> + kfree(dev);
> +}
> +
> +static void vgic_its_free_device_list(struct kvm *kvm, struct vgic_its *its)
> +{
> + struct list_head *cur, *temp;
> +
> + mutex_lock(&its->its_lock);
> + list_for_each_safe(cur, temp, &its->device_list) {
> + struct its_device *dev;
> +
> + dev = list_entry(cur, struct its_device, dev_list);
> + vgic_its_free_device(kvm, dev);
> + }
> + mutex_unlock(&its->its_lock);
> +}
> +
> +static void vgic_its_free_collection_list(struct kvm *kvm, struct vgic_its 
> *its)
> +{
> + struct list_head *cur, *temp;
> +
> + list_for_each_safe(cur, temp, &its->collection_list) {
> + struct its_collection *coll;
> +
> + coll = list_entry(cur, struct its_collection, coll_list);
> + list_del(cur);
> + kfree(coll);
> + }
> + mutex_unlock(&its->its_lock);
> +}
> +
> +
>  static u64 its_cmd_mask_field(u64 *its_cmd, int word, int shift, int size)
>  {
>   return (le64_to_cpu(its_cmd[word]) >> shift) & (BIT_ULL(size) - 1);
> @@ -1644,46 +1683,13 @@ static int vgic_its_create(struct kvm_device *dev, 
> u32 type)
>   return vgic_its_set_abi(its, NR_ITS_ABIS - 1);
>  }
>  
> -static void vgic_its_free_device(struct kvm *kvm, struct its_device *dev)
> -{
> - struct its_ite *ite, *tmp;
> -
> - list_for_each_entry_safe(ite, tmp, &dev->itt_head, ite_list)
> - its_free_ite(kvm, ite);
> - list_del(&dev->dev_list);
> - kfree(dev);
> -}
> -
>  static void vgic_its_destroy(struct kvm_device *kvm_dev)
>  {
>   struct kvm *kvm = kvm_dev->kvm;
>   struct vgic_its *its = kvm_dev->private;
> - struct list_head *cur, *temp;
> -
> - /*
> -  * We may end up here without the lists ever having been initialized.
> -  * Check this and bail out early to avoid dereferencing a NULL pointer.
> -  */
> - if (!its->device_list.next)
> - return;

Hmm, I feel like we managed to convince ourselves this was needed
before.

Andre, can you remember what your original rationale was here?

> -
> - mutex_lock(&its->its_lock);
> - list_for_each_safe(cur, temp, &its->device_list) {
> - struct its_device *dev;
> -
> - dev = list_entry(cur, struct its_device, dev_list);
> - vgic_its_free_device(kvm, dev);
> - }
> -
> - list_for_each_safe(cur, temp, &its->collection_list) {
> - struct its_collection *coll;
> -
> - coll = list_entry(cur, struct its_collection, coll_list);
> - list_del(cur);
> - kfree(coll);
> - }
> - mutex_unlock(&its->its_lock);
>  
> + vgic_its_free_device_list(kvm, its);
> + vgic_its_free_collection_list(kvm, its);
>   kfree(its);
>  }
>  
> -- 
> 2.5.5
> 

If we're really sure the original check was just a misunderstanding,
then this patch looks ok, given the fixes to the commit message.

Thanks,
-Christoffer


Re: [PATCH v4 07/11] KVM: arm/arm64: vgic-its: Always attempt to save/restore device and collection tables

2017-10-17 Thread Christoffer Dall
On Tue, Oct 17, 2017 at 09:10:05AM +0200, Eric Auger wrote:
> In case the device table save fails, we currently do not
> attempt to save the collection table. However it may
> happen that the device table save fails because the structures
> in memory are inconsistent with the device GITS_BASER; this
> does not mean the collection backup can't or shouldn't
> be performed. The same applies on the restore path.
> 
> Without this patch, after a reset and early state backup,
> the device table restore may fail due to L1 entry not valid.
> If we don't restore the collection table the guest does
> not reboot properly.

I don't really understand.  After the previous patches, why would a
properly configured ITS return an error in its_save_device_tables?

If that's not possible, are we not trying to support partially migrating
half-way broken state, and is that something we care about?
> 
> Signed-off-by: Eric Auger 
> 
> ---
> 
> candidate to be CC'ed stable
> ---
>  virt/kvm/arm/vgic/vgic-its.c | 11 +++
>  1 file changed, 3 insertions(+), 8 deletions(-)
> 
> diff --git a/virt/kvm/arm/vgic/vgic-its.c b/virt/kvm/arm/vgic/vgic-its.c
> index e18f1e4..1c3e83f 100644
> --- a/virt/kvm/arm/vgic/vgic-its.c
> +++ b/virt/kvm/arm/vgic/vgic-its.c
> @@ -2381,12 +2381,9 @@ static int vgic_its_save_tables_v0(struct vgic_its 
> *its)
>   }
>  
>   ret = vgic_its_save_device_tables(its);
> - if (ret)
> - goto out;
>  
> - ret = vgic_its_save_collection_table(its);
> + ret |= vgic_its_save_collection_table(its);

What if the two functions return two different error codes, is the
binary OR of these error codes going to tell userspace anything
meaningful?
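One alternative sketch, attempting both saves but reporting the first failure
instead of OR-ing two errnos together (not necessarily what the patch should
do; function names as in the quoted diff):

        int ret, ret2;

        ret = vgic_its_save_device_tables(its);
        ret2 = vgic_its_save_collection_table(its);
        if (ret2 && !ret)
                ret = ret2;     /* only report the collection error if the device save succeeded */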


>  
> -out:
>   unlock_all_vcpus(kvm);
>   mutex_unlock(&its->its_lock);
>   mutex_unlock(&kvm->lock);
> @@ -2413,11 +2410,9 @@ static int vgic_its_restore_tables_v0(struct vgic_its 
> *its)
>   }
>  
>   ret = vgic_its_restore_collection_table(its);
> - if (ret)
> - goto out;
>  
> - ret = vgic_its_restore_device_tables(its);
> -out:
> + ret |= vgic_its_restore_device_tables(its);
> +
>   unlock_all_vcpus(kvm);
>   mutex_unlock(&its->its_lock);
>   mutex_unlock(&kvm->lock);
> -- 
> 2.5.5
> 

Thanks,
-Christoffer


Re: [PATCH v4 06/11] KVM: arm/arm64: vgic-its: Check GITS_CBASER validity before processing commands

2017-10-17 Thread Christoffer Dall
On Tue, Oct 17, 2017 at 09:10:04AM +0200, Eric Auger wrote:
> At the moment vgic_its_process_commands() does not
> check the CBASER is valid before processing any command.
> Let's fix that.
> 
> Also rename cbaser local variable into cbaser_pa to avoid
> any confusion with the full register.
> 
> Signed-off-by: Eric Auger 
> ---
>  virt/kvm/arm/vgic/vgic-its.c | 13 -
>  1 file changed, 8 insertions(+), 5 deletions(-)
> 
> diff --git a/virt/kvm/arm/vgic/vgic-its.c b/virt/kvm/arm/vgic/vgic-its.c
> index 3b539d4..e18f1e4 100644
> --- a/virt/kvm/arm/vgic/vgic-its.c
> +++ b/virt/kvm/arm/vgic/vgic-its.c
> @@ -1301,17 +1301,20 @@ static void vgic_mmio_write_its_cbaser(struct kvm 
> *kvm, struct vgic_its *its,
>  /* Must be called with the cmd_lock held. */
>  static void vgic_its_process_commands(struct kvm *kvm, struct vgic_its *its)
>  {
> - gpa_t cbaser;
> + gpa_t cbaser_pa;
>   u64 cmd_buf[4];
>  
> - /* Commands are only processed when the ITS is enabled. */
> - if (!its->enabled)
> + /*
> +  * Commands are only processed when the ITS is enabled and
> +  * CBASER is valid
> +  */
> + if (!its->enabled || (!(its->cbaser & GITS_CBASER_VALID)))

Is it valid to clear the CBASER valid bit after having enabled the ITS?
If not, I think you should check changes to CBASER and then this
shouldn't be necessary.



>   return;
>  
> - cbaser = CBASER_ADDRESS(its->cbaser);
> + cbaser_pa = CBASER_ADDRESS(its->cbaser);
>  
>   while (its->cwriter != its->creadr) {
> - int ret = kvm_read_guest(kvm, cbaser + its->creadr,
> + int ret = kvm_read_guest(kvm, cbaser_pa + its->creadr,
>cmd_buf, ITS_CMD_SIZE);
>   /*
>* If kvm_read_guest() fails, this could be due to the guest
> -- 
> 2.5.5
> 

Otherwise, this looks fine.

Thanks,
-Christoffer


Re: [PATCH v4 04/11] KVM: arm/arm64: vgic-its: Check CBASER/BASER validity before enabling the ITS

2017-10-17 Thread Christoffer Dall
On Tue, Oct 17, 2017 at 09:10:02AM +0200, Eric Auger wrote:
> The spec says it is UNPREDICTABLE to enable the ITS
> if any of the following conditions are true:
> 
> - GITS_CBASER.Valid == 0.
> - GITS_BASER.Valid == 0, for any GITS_BASER register
>   where the Type field indicates Device.
> - GITS_BASER.Valid == 0, for any GITS_BASER register
>   where the Type field indicates Interrupt Collection and
>   GITS_TYPER.HCC == 0.
> 
> In that case, let's keep the ITS disabled.
> 
> Signed-off-by: Eric Auger 
> Reported-by: Andre Przywara 
> 
> ---
> 
> v3: creation
> ---
>  virt/kvm/arm/vgic/vgic-its.c | 10 ++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/virt/kvm/arm/vgic/vgic-its.c b/virt/kvm/arm/vgic/vgic-its.c
> index e59363e..e61736b 100644
> --- a/virt/kvm/arm/vgic/vgic-its.c
> +++ b/virt/kvm/arm/vgic/vgic-its.c
> @@ -1488,6 +1488,16 @@ static void vgic_mmio_write_its_ctlr(struct kvm *kvm, 
> struct vgic_its *its,
>   its->enabled = !!(val & GITS_CTLR_ENABLE);
>  
>   /*
> +  * It is UNPREDICTABLE to enable the ITS if any of the CBASER or
> +  * device/collection BASER are invalid
> +  */
> + if (its->enabled &&
> + (!(its->baser_device_table & GITS_BASER_VALID) ||
> +  !(its->baser_coll_table & GITS_BASER_VALID) ||
> +  !(its->cbaser && GITS_CBASER_VALID)))
> + its->enabled = false;
> +

uh, why don't we check these conditions before we enable the ITS instead
of enabling, checking, and then disabling?
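A sketch of that restructuring, reusing the names from the quoted patch (only
set its->enabled once the tables are known to be valid):

        if ((val & GITS_CTLR_ENABLE) &&
            (its->baser_device_table & GITS_BASER_VALID) &&
            (its->baser_coll_table & GITS_BASER_VALID) &&
            (its->cbaser & GITS_CBASER_VALID))
                its->enabled = true;
        else
                its->enabled = false;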


Thanks,
-Christoffer

> + /*
>* Try to process any pending commands. This function bails out early
>* if the ITS is disabled or no commands have been queued.
>*/
> -- 
> 2.5.5
> 


Re: [PATCH v4 03/11] KVM: arm/arm64: vgic-its: Improve error reporting on device table save

2017-10-17 Thread Christoffer Dall
On Tue, Oct 17, 2017 at 09:10:01AM +0200, Eric Auger wrote:
> At the moment the device table save() returns -EINVAL if
> vgic_its_check_id() fails to return the gpa of the entry
> associated to the device/collection id. Let vgic_its_check_id()
> return an int instead of a bool and return a more precise
> error value:
> - EINVAL in case the id is out of range
> - EFAULT if the gpa is not provisioned or is not valid
> 

I think this commit message fails to explain the crucial bit, which is
that a guest can somehow create a situation that brings down the VM when
trying to migrate the VM, which we shouldn't allow.

Can you explain what case that is, and how that is then solved?

(I can't tell from looking at this patch whether the EINVAL or the EFAULT
is the one that userspace can just ignore with a warning, and which
is the one that really should crash the VM and why).

> We also check first the GITS_BASER Valid bit is set.
> 
> This allows the userspace to discriminate failure reasons.
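For illustration, a userspace sketch of telling the two apart when driving
KVM_DEV_ARM_ITS_SAVE_TABLES (whatever policy it then applies is a separate
question; its_fd is the ITS device fd, and this is only a fragment):

        struct kvm_device_attr attr = {
                .group = KVM_DEV_ARM_VGIC_GRP_CTRL,
                .attr  = KVM_DEV_ARM_ITS_SAVE_TABLES,
        };

        if (ioctl(its_fd, KVM_SET_DEVICE_ATTR, &attr) < 0) {
                if (errno == EFAULT) {
                        /* a table gpa was not provisioned or is not valid */
                } else if (errno == EINVAL) {
                        /* a device/collection id was out of range */
                }
        }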
> 
> Signed-off-by: Eric Auger 
> 
> ---
> 
> need to CC stable

I'm still not convinced that this is a candidate for stable:

Documentation/process/stable-kernel-rules.rst

It's about error codes and is pretty large, has documentation changes
etc.

Thanks,
-Christoffer

> 
> v2 -> v3:
> - correct kerneldoc comment
> ---
>  virt/kvm/arm/vgic/vgic-its.c | 55 
> +---
>  1 file changed, 37 insertions(+), 18 deletions(-)
> 
> diff --git a/virt/kvm/arm/vgic/vgic-its.c b/virt/kvm/arm/vgic/vgic-its.c
> index a4ff8f7..e59363e 100644
> --- a/virt/kvm/arm/vgic/vgic-its.c
> +++ b/virt/kvm/arm/vgic/vgic-its.c
> @@ -687,15 +687,25 @@ static int vgic_its_cmd_handle_movi(struct kvm *kvm, 
> struct vgic_its *its,
>   return 0;
>  }
>  
> -/*
> - * Check whether an ID can be stored into the corresponding guest table.
> +/**
> + * vgic_its_check_id - Check whether an ID can be stored into
> + * the corresponding guest table.
> + *
> + * @its: its handle
> + * @baser: GITS_BASER register
> + * @id: id of the device/collection
> + * @eaddr: output gpa of the corresponding table entry
> + *
>   * For a direct table this is pretty easy, but gets a bit nasty for
>   * indirect tables. We check whether the resulting guest physical address
>   * is actually valid (covered by a memslot and guest accessible).
>   * For this we have to read the respective first level entry.
> + *
> + * Return: 0 on success, -EINVAL if @id is out of range, -EFAULT if
> + * the address cannot be computed or is not valid
>   */
> -static bool vgic_its_check_id(struct vgic_its *its, u64 baser, u32 id,
> -   gpa_t *eaddr)
> +static int vgic_its_check_id(struct vgic_its *its, u64 baser, u32 id,
> +  gpa_t *eaddr)
>  {
>   int l1_tbl_size = GITS_BASER_NR_PAGES(baser) * SZ_64K;
>   u64 indirect_ptr, type = GITS_BASER_TYPE(baser);
> @@ -703,50 +713,56 @@ static bool vgic_its_check_id(struct vgic_its *its, u64 
> baser, u32 id,
>   int index;
>   gfn_t gfn;
>  
> + if (!(baser & GITS_BASER_VALID))
> + return -EFAULT;
> +
>   switch (type) {
>   case GITS_BASER_TYPE_DEVICE:
>   if (id >= BIT_ULL(VITS_TYPER_DEVBITS))
> - return false;
> + return -EINVAL;
>   break;
>   case GITS_BASER_TYPE_COLLECTION:
>   /* as GITS_TYPER.CIL == 0, ITS supports 16-bit collection ID */
>   if (id >= BIT_ULL(16))
> - return false;
> + return -EINVAL;
>   break;
>   default:
> - return false;
> + return -EINVAL;
>   }
>  
>   if (!(baser & GITS_BASER_INDIRECT)) {
>   phys_addr_t addr;
>  
>   if (id >= (l1_tbl_size / esz))
> - return false;
> + return -EINVAL;
>  
>   addr = BASER_ADDRESS(baser) + id * esz;
>   gfn = addr >> PAGE_SHIFT;
>  
>   if (eaddr)
>   *eaddr = addr;
> - return kvm_is_visible_gfn(its->dev->kvm, gfn);
> + if (kvm_is_visible_gfn(its->dev->kvm, gfn))
> + return 0;
> + else
> + return -EFAULT;
>   }
>  
>   /* calculate and check the index into the 1st level */
>   index = id / (SZ_64K / esz);
>   if (index >= (l1_tbl_size / sizeof(u64)))
> - return false;
> + return -EINVAL;
>  
>   /* Each 1st level entry is represented by a 64-bit value. */
>   if (kvm_read_guest(its->dev->kvm,
>  BASER_ADDRESS(baser) + index * sizeof(indirect_ptr),
>  &indirect_ptr, sizeof(indirect_ptr)))
> - return false;
> + return -EFAULT;
>  
>   indirect_ptr = le64_to_cpu(indirect_ptr);
>  
>   /* check the valid bit of the first level entry */
>   if 

Re: [PATCH v4 02/11] KVM: arm/arm64: vgic-its: fix vgic_its_restore_collection_table returned value

2017-10-17 Thread Christoffer Dall
On Tue, Oct 17, 2017 at 09:10:00AM +0200, Eric Auger wrote:
> vgic_its_restore_cte returns +1 if the collection table entry
> is valid and properly decoded. As a consequence, if the
> collection table is fully filled with valid data that are
> decoded without error, vgic_its_restore_collection_table()
> returns +1.  This is wrong.
> 
> Let's use the standard C convention for both vgic_its_restore_cte
> and vgic_its_restore_collection_table. vgic_its_restore_cte now
> returns whether we have reached the end of the table in the @last
> output parameter. vgic_its_restore_collection_table aborts in case
> of error or if the end is found. Otherwise, it now returns 0.
> 
> Fixes: ea1ad53e1e31a3 (KVM: arm64: vgic-its: Collection table save/restore)
> Signed-off-by: Eric Auger 
> 
> ---
> 
> v2 -> v3: creation
> ---
>  virt/kvm/arm/vgic/vgic-its.c | 40 +++-
>  1 file changed, 31 insertions(+), 9 deletions(-)
> 
> diff --git a/virt/kvm/arm/vgic/vgic-its.c b/virt/kvm/arm/vgic/vgic-its.c
> index eea14a1..a4ff8f7 100644
> --- a/virt/kvm/arm/vgic/vgic-its.c
> +++ b/virt/kvm/arm/vgic/vgic-its.c
> @@ -2204,7 +2204,19 @@ static int vgic_its_save_cte(struct vgic_its *its,
>   return kvm_write_guest(its->dev->kvm, gpa, &val, esz);
>  }
>  
> -static int vgic_its_restore_cte(struct vgic_its *its, gpa_t gpa, int esz)
> +/**
> + * vgic_its_restore_cte - Restore a collection table entry
> + *
> + * @its: its handle
> + * @gpa: guest physical address of the entry
> + * @esz: entry size
> + * @last: output boolean indicating whether we have reached the
> + *   end of the collection table (ie. an invalid entry was decoded)
> + *
> + * Return: 0 upon success, < 0 on error
> + */
> +static int vgic_its_restore_cte(struct vgic_its *its, gpa_t gpa, int esz,
> + bool *last)
>  {
>   struct its_collection *collection;
>   struct kvm *kvm = its->dev->kvm;
> @@ -2217,7 +2229,8 @@ static int vgic_its_restore_cte(struct vgic_its *its, 
> gpa_t gpa, int esz)
>   if (ret)
>   return ret;
>   val = le64_to_cpu(val);
> - if (!(val & KVM_ITS_CTE_VALID_MASK))
> + *last = !(val & KVM_ITS_CTE_VALID_MASK);
> + if (*last)
>   return 0;
>  
>   target_addr = (u32)(val >> KVM_ITS_CTE_RDBASE_SHIFT);
> @@ -2233,7 +2246,7 @@ static int vgic_its_restore_cte(struct vgic_its *its, 
> gpa_t gpa, int esz)
>   if (ret)
>   return ret;
>   collection->target_addr = target_addr;
> - return 1;
> + return 0;
>  }
>  
>  /**
> @@ -2278,8 +2291,13 @@ static int vgic_its_save_collection_table(struct 
> vgic_its *its)
>  
>  /**
>   * vgic_its_restore_collection_table - reads the collection table
> - * in guest memory and restores the ITS internal state. Requires the
> - * BASER registers to be restored before.
> + * in guest memory and restores the ITS collection cache.
> + *
> + * @its: its handle
> + *
> + * Requires the Collection BASER to be previously restored.
> + *
> + * Returns 0 on success or < 0 on error
>   */
>  static int vgic_its_restore_collection_table(struct vgic_its *its)
>  {
> @@ -2297,13 +2315,17 @@ static int vgic_its_restore_collection_table(struct 
> vgic_its *its)
>   max_size = GITS_BASER_NR_PAGES(its->baser_coll_table) * SZ_64K;
>  
>   while (read < max_size) {
> - ret = vgic_its_restore_cte(its, gpa, cte_esz);
> - if (ret <= 0)
> - break;
> + bool last;
> +
> + ret = vgic_its_restore_cte(its, gpa, cte_esz, &last);
> + if (ret < 0 || last)
> + return ret;
>   gpa += cte_esz;
>   read += cte_esz;
>   }
> - return ret;
> +
> + /* table was fully filled with valid entries, decoded without error */
> + return 0;
>  }
>  
>  /**
> -- 
> 2.5.5
> 

It's not that I find this looking particularly elegant, but I'm not sure
I have a better alternative, perhaps it's just the nature of the beast:

Reviewed-by: Christoffer Dall 


Re: [PATCH v4 01/11] KVM: arm/arm64: vgic-its: fix return value for device table restore

2017-10-17 Thread Christoffer Dall
On Tue, Oct 17, 2017 at 09:09:59AM +0200, Eric Auger wrote:
> At the moment, if the ITT only contains invalid entries,
> vgic_its_restore_itt returns 1 and this is considered as
> an error in vgic_its_restore_dte.
> 
> Also in case the device table only contains invalid entries,
> the table restore fails and this is not correct.
> 
> This patch fully revisits the error handling while fixing those
> 2 bugs.
> 
> - entry_fn_t now takes a valid output paraleter

parameter

> - scan_its_table() now returns <= 0 values and output 2 booleans,
 outputs
>   valid and last.
> - vgic_its_restore_itt() now returns <= 0 values.
> - vgic_its_restore_device_tables() also returns <= 0 values.
> 
> With that patch we are able to properly handle the case where
> all data are invalid but we still are able to detect the case
> where a next entry was referenced by some valid entry and
> never found.
> 
> Fixes: 57a9a117154c93 (KVM: arm64: vgic-its: Device table save/restore)
> Fixes: eff484e0298da5 (KVM: arm64: vgic-its: ITT save and restore)
> Signed-off-by: Eric Auger 
> Reported-by: wanghaibin 
> 
> ---
> 
> need to CC stable
> 
> v3 -> v4:
> - set *valid at beginning of handle_l1_dte
> 
> v2 -> v3:
> - add comments
> - added valid parameter
> - vgic_its_restore_itt don't return +1 anymore
> - reword the commit message
> 
> v1 -> v2:
> - if (ret > 0) ret = 0
> ---
>  virt/kvm/arm/vgic/vgic-its.c | 95 
> 
>  1 file changed, 70 insertions(+), 25 deletions(-)
> 
> diff --git a/virt/kvm/arm/vgic/vgic-its.c b/virt/kvm/arm/vgic/vgic-its.c
> index f51c1e1..eea14a1 100644
> --- a/virt/kvm/arm/vgic/vgic-its.c
> +++ b/virt/kvm/arm/vgic/vgic-its.c
> @@ -1772,16 +1772,20 @@ static u32 compute_next_eventid_offset(struct 
> list_head *h, struct its_ite *ite)
>  
>  /**
>   * entry_fn_t - Callback called on a table entry restore path
> + *
>   * @its: its handle
>   * @id: id of the entry
>   * @entry: pointer to the entry
>   * @opaque: pointer to an opaque data
> + * @valid: indicates whether valid data is associated to this entry
> + * (the entry itself in case of linear table or entries in the next level,
> + * in case of hierarchical tables)
>   *
>   * Return: < 0 on error, 0 if last element was identified, id offset to next
>   * element otherwise
>   */
>  typedef int (*entry_fn_t)(struct vgic_its *its, u32 id, void *entry,
> -   void *opaque);
> +   void *opaque, bool *valid);
>  
>  /**
>   * scan_its_table - Scan a contiguous table in guest RAM and applies a 
> function
> @@ -1794,29 +1798,34 @@ typedef int (*entry_fn_t)(struct vgic_its *its, u32 
> id, void *entry,
>   * @start_id: the ID of the first entry in the table
>   * (non zero for 2d level tables)
>   * @fn: function to apply on each entry
> + * @opaque: opaque data passed to @fn
> + * @valid: indicates whether the table contains any valid data
> + * @last: returns whether the last valid entry was decoded
>   *
> - * Return: < 0 on error, 0 if last element was identified, 1 otherwise
> - * (the last element may not be found on second level tables)
> + * Return: < 0 on error, 0 on success
>   */
>  static int scan_its_table(struct vgic_its *its, gpa_t base, int size, int 
> esz,
> -   int start_id, entry_fn_t fn, void *opaque)
> +   int start_id, entry_fn_t fn, void *opaque,
> +   bool *valid, bool *last)
>  {
>   void *entry = kzalloc(esz, GFP_KERNEL);
>   struct kvm *kvm = its->dev->kvm;
>   unsigned long len = size;
>   int id = start_id;
>   gpa_t gpa = base;
> + int next_offset = 0;
>   int ret;
>  
>   while (len > 0) {
> - int next_offset;
>   size_t byte_offset;
> + bool entry_valid;
>  
>   ret = kvm_read_guest(kvm, gpa, entry, esz);
>   if (ret)
>   goto out;
>  
> - next_offset = fn(its, id, entry, opaque);
> + next_offset = fn(its, id, entry, opaque, &entry_valid);
> + *valid |= entry_valid;
>   if (next_offset <= 0) {
>   ret = next_offset;
>   goto out;
> @@ -1827,9 +1836,15 @@ static int scan_its_table(struct vgic_its *its, gpa_t 
> base, int size, int esz,
>   gpa += byte_offset;
>   len -= byte_offset;
>   }
> - ret =  1;
> -
> + /*
> +  * the table lookup was completed without identifying the
> +  * last valid entry (ie. next_offset > 0).
> +  */

but you never set last to false?  If you require the caller to set the
variable to false, that should be documented, but it's weird.
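A sketch of the obvious variant, making scan_its_table() own its outputs so
callers don't have to pre-initialize them (placement assumed: at the top of
the function, before the while loop):

        *valid = false;
        *last = false;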

> + ret = 0;
>  out:
> + if (!next_offset)
> + *last = true;
> +

so if we scan the entire table to the end we won't set last?  Isn't that
a 

[PATCH v4 09/13] arm64: kernel: Add arch-specific SDEI entry code and CPU masking

2017-10-17 Thread James Morse
The Software Delegated Exception Interface (SDEI) is an ARM standard
for registering callbacks from the platform firmware into the OS.
This is typically used to implement RAS notifications.

Such notifications enter the kernel at the registered entry-point
with the register values of the interrupted CPU context. Because this
is not a CPU exception, it cannot reuse the existing entry code
(crucially, we don't implicitly know which exception level we interrupted).

Add an sdei-entry.S asm file to set us up for calling into C code. If
the event interrupted code that had interrupts masked, we always return
to that location. Otherwise we pretend this was an IRQ, and use SDEI's
complete_and_resume call to return to vbar_el1 + offset.
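A rough C sketch of that return-path choice (simplified and illustrative, not
the actual handler; the function name is made up, PSR_I_BIT and struct pt_regs
are the usual arm64 definitions):

        static unsigned long sdei_pick_return(struct pt_regs *regs,
                                              unsigned long vbar, unsigned long irq_offset)
        {
                /* interrupted context had IRQs masked: go straight back to regs->pc */
                if (regs->pstate & PSR_I_BIT)
                        return 0;       /* 0 just means "plain COMPLETE" in this sketch */

                /* otherwise pretend this was an IRQ: COMPLETE_AND_RESUME at vbar_el1 + offset */
                return vbar + irq_offset;
        }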

This allows the kernel to deliver signals to user space processes. For
KVM this triggers the world switch, a quick spin round vcpu_run, then
back into the guest, unless there are pending signals.

Add sdei_mask_local_cpu() calls to the smp_send_stop() code, this covers
the panic() code-path, which doesn't invoke cpuhotplug notifiers.

Because we can interrupt entry-from/exit-to another EL, we can't trust the
value in sp_el0 or x29, even if we interrupted the kernel, in this case
the code in entry.S will save/restore sp_el0 and use the value in
__entry_task.

When we have VMAP stacks we can interrupt the stack-overflow test, which
stirs x0 into sp, meaning we have to have our own VMAP stack.

Signed-off-by: James Morse 

---
Changes since v3:
 * Added const to clobbered_registers,
 * Removed extern from C function declarations.
 * Header file names changed

Changes since v2:
 * Added PAN-setting as we didn't take an exception to get here.
 * Added VMAP stack allocation and switching.
 * overwrite x29 on entry from not-the-kernel.
 * Added a pe-mask to crash_smp_send_stop()s 'no secondaries' path (oops).
 * Added ELR-hasn't-changed test. Any exception in the handler will panic KVM
   as its switched VBAR_EL1, but go silently undetected otherwise. Generate a
   warning.

 arch/arm64/Kconfig  |   2 +-
 arch/arm64/include/asm/sdei.h   |  46 ++-
 arch/arm64/include/asm/stacktrace.h |   3 +
 arch/arm64/kernel/Makefile  |   1 +
 arch/arm64/kernel/asm-offsets.c |   5 +
 arch/arm64/kernel/sdei-entry.S  | 122 +++
 arch/arm64/kernel/sdei.c| 235 
 arch/arm64/kernel/smp.c |  11 +-
 8 files changed, 421 insertions(+), 4 deletions(-)
 create mode 100644 arch/arm64/kernel/sdei-entry.S
 create mode 100644 arch/arm64/kernel/sdei.c

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 0df64a6a56d4..70dfe4e9ccc5 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -98,7 +98,7 @@ config ARM64
select HAVE_IRQ_TIME_ACCOUNTING
select HAVE_MEMBLOCK
select HAVE_MEMBLOCK_NODE_MAP if NUMA
-   select HAVE_NMI if ACPI_APEI_SEA
+   select HAVE_NMI
select HAVE_PATA_PLATFORM
select HAVE_PERF_EVENTS
select HAVE_PERF_REGS
diff --git a/arch/arm64/include/asm/sdei.h b/arch/arm64/include/asm/sdei.h
index b60381a05787..bbebdbc0b754 100644
--- a/arch/arm64/include/asm/sdei.h
+++ b/arch/arm64/include/asm/sdei.h
@@ -16,6 +16,48 @@
 #ifndef __ASM_SDEI_H
 #define __ASM_SDEI_H
 
-/* Later patches add the arch specific bits */
+/* Values for sdei_exit_mode */
+#define SDEI_EXIT_HVC  0
+#define SDEI_EXIT_SMC  1
 
-#endif /* __ASM_SDEI_H */
+#define SDEI_STACK_SIZE    IRQ_STACK_SIZE
+
+#ifndef __ASSEMBLY__
+
+#include 
+#include 
+
+#include 
+
+extern unsigned long sdei_exit_mode;
+
+/* Software Delegated Exception entry point from firmware*/
+asmlinkage void __sdei_asm_handler(unsigned long event_num, unsigned long arg,
+  unsigned long pc, unsigned long pstate);
+
+/*
+ * The above entry point does the minimum to call C code. This function does
+ * anything else, before calling the driver.
+ */
+struct sdei_registered_event;
+asmlinkage unsigned long __sdei_handler(struct pt_regs *regs,
+   struct sdei_registered_event *arg);
+
+unsigned long sdei_arch_get_entry_point(int conduit);
+#define sdei_arch_get_entry_point(x)   sdei_arch_get_entry_point(x)
+
+bool _on_sdei_stack(unsigned long sp);
+static inline bool on_sdei_stack(unsigned long sp)
+{
+   if (!IS_ENABLED(CONFIG_VMAP_STACK))
+   return false;
+   if (!IS_ENABLED(CONFIG_ARM_SDE_INTERFACE))
+   return false;
+   if (in_nmi())
+   return _on_sdei_stack(sp);
+
+   return false;
+}
+
+#endif /* __ASSEMBLY__ */
+#endif /* __ASM_SDEI_H */
diff --git a/arch/arm64/include/asm/stacktrace.h 
b/arch/arm64/include/asm/stacktrace.h
index 6ad30776e984..472ef944e932 100644
--- a/arch/arm64/include/asm/stacktrace.h
+++ b/arch/arm64/include/asm/stacktrace.h
@@ -22,6 +22,7 @@
 
 #include 
 #include 
+#include 
 
 struct stackframe {
unsigned long 

[PATCH v4 10/13] firmware: arm_sdei: Add support for CPU and system power states

2017-10-17 Thread James Morse
When a CPU enters an idle lower-power state or is powering off, we
need to mask SDE events so that no events can be delivered while we
are messing with the MMU as the registered entry points won't be valid.

If the system reboots, we want to unregister all events and mask the CPUs.
For kexec this allows us to hand a clean slate to the next kernel
instead of relying on it to call sdei_{private,system}_data_reset().

For hibernate we unregister all events and re-register them on restore,
in case we restored with the SDE code loaded at a different address
(e.g. due to KASLR).

Add all the notifiers necessary to do this. We only support shared events
so all events are left registered and enabled over CPU hotplug.
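A sketch of roughly how that wiring might look. The kernel APIs
(cpuhp_setup_state, register_reboot_notifier) are real; sdei_cpuhp_up/down,
sdei_event_unregister_all, sdei_platform_reset and sdei_mask_local_cpu come
from this patch; the CPUHP_AP_ONLINE_DYN slot and the sdei_pm_wire_up /
sdei_reboot_notifier names are only placeholders, and the cpu_pm/hibernate
hooks would follow the same pattern:

        static int sdei_reboot_notifier(struct notifier_block *nb, unsigned long action,
                                        void *data)
        {
                /* hand a clean slate to kexec/the next kernel */
                sdei_event_unregister_all();
                sdei_platform_reset();
                sdei_mask_local_cpu();
                return NOTIFY_OK;
        }

        static struct notifier_block sdei_reboot_nb = {
                .notifier_call = sdei_reboot_notifier,
        };

        static int sdei_pm_wire_up(void)
        {
                int err;

                /* mask events as each CPU goes down, unmask as it comes back up */
                err = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "SDEI",
                                        sdei_cpuhp_up, sdei_cpuhp_down);
                if (err < 0)
                        return err;

                return register_reboot_notifier(&sdei_reboot_nb);
        }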

Signed-off-by: James Morse 

---
Changes since v3:
 * Renamed CPUHP enum entry to have an ARM_ prefix.

 drivers/firmware/arm_sdei.c | 228 +++-
 include/linux/arm_sdei.h|   3 +
 include/linux/cpuhotplug.h  |   1 +
 3 files changed, 231 insertions(+), 1 deletion(-)

diff --git a/drivers/firmware/arm_sdei.c b/drivers/firmware/arm_sdei.c
index 676659976c3c..28e4c4cbb16d 100644
--- a/drivers/firmware/arm_sdei.c
+++ b/drivers/firmware/arm_sdei.c
@@ -23,6 +23,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 #include 
 #include 
 #include 
@@ -30,12 +32,15 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -259,6 +264,11 @@ static void _ipi_mask_cpu(void *ignored)
sdei_mask_local_cpu();
 }
 
+static int sdei_cpuhp_down(unsigned int ignored)
+{
+   return sdei_mask_local_cpu();
+}
+
 int sdei_unmask_local_cpu(void)
 {
int err;
@@ -280,6 +290,11 @@ static void _ipi_unmask_cpu(void *ignored)
sdei_unmask_local_cpu();
 }
 
+static int sdei_cpuhp_up(unsigned int ignored)
+{
+   return sdei_unmask_local_cpu();
+}
+
 static void _ipi_private_reset(void *ignored)
 {
int err;
@@ -318,6 +333,12 @@ static int sdei_platform_reset(void)
return err;
 }
 
+static int sdei_api_event_status(u32 event_num, u64 *result)
+{
+   return invoke_sdei_fn(SDEI_1_0_FN_SDEI_EVENT_STATUS, event_num, 0, 0, 0,
+ 0, result);
+}
+
 static int sdei_api_event_enable(u32 event_num)
 {
return invoke_sdei_fn(SDEI_1_0_FN_SDEI_EVENT_ENABLE, event_num, 0, 0, 0,
@@ -372,8 +393,23 @@ static int sdei_api_event_unregister(u32 event_num)
 
 static int _sdei_event_unregister(struct sdei_event *event)
 {
-   if (event->type == SDEI_EVENT_TYPE_SHARED)
+   int err;
+   u64 result;
+   bool enabled;
+
+   if (event->type == SDEI_EVENT_TYPE_SHARED) {
+   /*
+* We re-register events after power-management, remember if
+* this event was previously enabled.
+*/
+   err = sdei_api_event_status(event->event_num, &result);
+   if (err)
+   return err;
+   enabled = !!(result & BIT(SDEI_EVENT_STATUS_ENABLED));
+   event->registered->was_enabled = enabled;
+
return sdei_api_event_unregister(event->event_num);
+   }
 
return -EINVAL;
 }
@@ -406,6 +442,26 @@ int sdei_event_unregister(u32 event_num)
 }
 EXPORT_SYMBOL(sdei_event_unregister);
 
+/*
+ * unregister events, but don't destroy them as they are re-registered by
+ * sdei_reregister_events().
+ */
+static int sdei_event_unregister_all(void)
+{
+   int err = 0;
+   struct sdei_event *event;
+
+   spin_lock(&sdei_events_lock);
+   list_for_each_entry(event, &sdei_events, list) {
+   err = _sdei_event_unregister(event);
+   if (err)
+   break;
+   }
+   spin_unlock(&sdei_events_lock);
+
+   return err;
+}
+
 static int sdei_api_event_register(u32 event_num, unsigned long entry_point,
   void *arg, u64 flags, u64 affinity)
 {
@@ -462,6 +518,148 @@ int sdei_event_register(u32 event_num, 
sdei_event_callback *cb, void *arg,
 }
 EXPORT_SYMBOL(sdei_event_register);
 
+static int sdei_reregister_event(struct sdei_event *event)
+{
+   int err;
+
+   lockdep_assert_held(&sdei_events_lock);
+
+   err = _sdei_event_register(event);
+   if (err) {
+   pr_err("Failed to re-register event %u\n", event->event_num);
+   sdei_event_destroy(event);
+   return err;
+   }
+
+   if (event->type == SDEI_EVENT_TYPE_SHARED) {
+   if (event->registered->was_enabled)
+   err = sdei_api_event_enable(event->event_num);
+   }
+
+   if (err)
+   pr_err("Failed to re-enable event %u\n", event->event_num);
+
+   return err;
+}
+
+static int sdei_reregister_events(void)
+{
+   int err = 0;
+   struct sdei_event *event;
+
+   spin_lock(&sdei_events_lock);
+   list_for_each_entry(event, &sdei_events, list) 

[PATCH v4 12/13] arm64: acpi: Remove __init from acpi_psci_use_hvc() for use by SDEI

2017-10-17 Thread James Morse
SDEI inherits the 'use hvc' bit that is also used by PSCI. PSCI does all
its initialisation early, SDEI does its late.

Remove the __init annotation from acpi_psci_use_hvc().

Signed-off-by: James Morse 

---
The function name is unchanged as this bit is named 'PSCI_USE_HVC'
in table 5-37 of ACPIv6.2.

 arch/arm64/kernel/acpi.c | 2 +-
 include/linux/psci.h | 3 ++-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kernel/acpi.c b/arch/arm64/kernel/acpi.c
index b3162715ed78..252396a96c78 100644
--- a/arch/arm64/kernel/acpi.c
+++ b/arch/arm64/kernel/acpi.c
@@ -117,7 +117,7 @@ bool __init acpi_psci_present(void)
 }
 
 /* Whether HVC must be used instead of SMC as the PSCI conduit */
-bool __init acpi_psci_use_hvc(void)
+bool acpi_psci_use_hvc(void)
 {
return acpi_gbl_FADT.arm_boot_flags & ACPI_FADT_PSCI_USE_HVC;
 }
diff --git a/include/linux/psci.h b/include/linux/psci.h
index bdea1cb5e1db..e25a2a82 100644
--- a/include/linux/psci.h
+++ b/include/linux/psci.h
@@ -46,10 +46,11 @@ static inline int psci_dt_init(void) { return 0; }
 #if defined(CONFIG_ARM_PSCI_FW) && defined(CONFIG_ACPI)
 int __init psci_acpi_init(void);
 bool __init acpi_psci_present(void);
-bool __init acpi_psci_use_hvc(void);
+bool acpi_psci_use_hvc(void);
 #else
 static inline int psci_acpi_init(void) { return 0; }
 static inline bool acpi_psci_present(void) { return false; }
+static inline bool acpi_psci_use_hvc(void) {return false; }
 #endif
 
 #endif /* __LINUX_PSCI_H */
-- 
2.13.3



[PATCH v4 05/13] KVM: arm64: Stop save/restoring host tpidr_el1 on VHE

2017-10-17 Thread James Morse
Now that a VHE host uses tpidr_el2 for the cpu offset we no longer
need KVM to save/restore tpidr_el1. Move this from the 'common' code
into the non-vhe code. While we're at it, on VHE we don't need to
save the ELR or SPSR as kernel_entry in entry.S will have pushed these
onto the kernel stack, and will restore them from there. Move these
to the non-vhe code as we need them to get back to the host.

Finally remove the always-copy-tpidr we hid in the stage2 setup
code, cpufeature's enable callback will do this for VHE, we only
need KVM to do it for non-vhe. Add the copy into kvm-init instead.

Signed-off-by: James Morse 
Reviewed-by: Christoffer Dall 

---
Changes since v1:
 * Switched KVM<->arm64 in the subject.

 arch/arm64/kvm/hyp-init.S  |  4 
 arch/arm64/kvm/hyp/s2-setup.c  |  3 ---
 arch/arm64/kvm/hyp/sysreg-sr.c | 16 
 3 files changed, 12 insertions(+), 11 deletions(-)

diff --git a/arch/arm64/kvm/hyp-init.S b/arch/arm64/kvm/hyp-init.S
index 3f9615582377..fbf259893f6a 100644
--- a/arch/arm64/kvm/hyp-init.S
+++ b/arch/arm64/kvm/hyp-init.S
@@ -122,6 +122,10 @@ CPU_BE(orr x4, x4, #SCTLR_ELx_EE)
kern_hyp_va x2
msr vbar_el2, x2
 
+   /* copy tpidr_el1 into tpidr_el2 for use by HYP */
+   mrs x1, tpidr_el1
+   msr tpidr_el2, x1
+
/* Hello, World! */
eret
 ENDPROC(__kvm_hyp_init)
diff --git a/arch/arm64/kvm/hyp/s2-setup.c b/arch/arm64/kvm/hyp/s2-setup.c
index 7fb88274eba1..a81f5e10fc8c 100644
--- a/arch/arm64/kvm/hyp/s2-setup.c
+++ b/arch/arm64/kvm/hyp/s2-setup.c
@@ -84,8 +84,5 @@ u32 __hyp_text __init_stage2_translation(void)
 
write_sysreg(val, vtcr_el2);
 
-   /* copy tpidr_el1 into tpidr_el2 for use by HYP */
-   write_sysreg(read_sysreg(tpidr_el1), tpidr_el2);
-
return parange;
 }
diff --git a/arch/arm64/kvm/hyp/sysreg-sr.c b/arch/arm64/kvm/hyp/sysreg-sr.c
index 934137647837..c54cc2afb92b 100644
--- a/arch/arm64/kvm/hyp/sysreg-sr.c
+++ b/arch/arm64/kvm/hyp/sysreg-sr.c
@@ -27,8 +27,8 @@ static void __hyp_text __sysreg_do_nothing(struct 
kvm_cpu_context *ctxt) { }
 /*
  * Non-VHE: Both host and guest must save everything.
  *
- * VHE: Host must save tpidr*_el[01], actlr_el1, mdscr_el1, sp0, pc,
- * pstate, and guest must save everything.
+ * VHE: Host must save tpidr*_el0, actlr_el1, mdscr_el1, sp_el0,
+ * and guest must save everything.
  */
 
 static void __hyp_text __sysreg_save_common_state(struct kvm_cpu_context *ctxt)
@@ -36,11 +36,8 @@ static void __hyp_text __sysreg_save_common_state(struct 
kvm_cpu_context *ctxt)
ctxt->sys_regs[ACTLR_EL1]   = read_sysreg(actlr_el1);
ctxt->sys_regs[TPIDR_EL0]   = read_sysreg(tpidr_el0);
ctxt->sys_regs[TPIDRRO_EL0] = read_sysreg(tpidrro_el0);
-   ctxt->sys_regs[TPIDR_EL1]   = read_sysreg(tpidr_el1);
ctxt->sys_regs[MDSCR_EL1]   = read_sysreg(mdscr_el1);
ctxt->gp_regs.regs.sp   = read_sysreg(sp_el0);
-   ctxt->gp_regs.regs.pc   = read_sysreg_el2(elr);
-   ctxt->gp_regs.regs.pstate   = read_sysreg_el2(spsr);
 }
 
 static void __hyp_text __sysreg_save_state(struct kvm_cpu_context *ctxt)
@@ -62,10 +59,13 @@ static void __hyp_text __sysreg_save_state(struct 
kvm_cpu_context *ctxt)
ctxt->sys_regs[AMAIR_EL1]   = read_sysreg_el1(amair);
ctxt->sys_regs[CNTKCTL_EL1] = read_sysreg_el1(cntkctl);
ctxt->sys_regs[PAR_EL1] = read_sysreg(par_el1);
+   ctxt->sys_regs[TPIDR_EL1]   = read_sysreg(tpidr_el1);
 
ctxt->gp_regs.sp_el1= read_sysreg(sp_el1);
ctxt->gp_regs.elr_el1   = read_sysreg_el1(elr);
ctxt->gp_regs.spsr[KVM_SPSR_EL1]= read_sysreg_el1(spsr);
+   ctxt->gp_regs.regs.pc   = read_sysreg_el2(elr);
+   ctxt->gp_regs.regs.pstate   = read_sysreg_el2(spsr);
 }
 
 static hyp_alternate_select(__sysreg_call_save_host_state,
@@ -89,11 +89,8 @@ static void __hyp_text __sysreg_restore_common_state(struct 
kvm_cpu_context *ctx
write_sysreg(ctxt->sys_regs[ACTLR_EL1],   actlr_el1);
write_sysreg(ctxt->sys_regs[TPIDR_EL0],   tpidr_el0);
write_sysreg(ctxt->sys_regs[TPIDRRO_EL0], tpidrro_el0);
-   write_sysreg(ctxt->sys_regs[TPIDR_EL1],   tpidr_el1);
write_sysreg(ctxt->sys_regs[MDSCR_EL1],   mdscr_el1);
write_sysreg(ctxt->gp_regs.regs.sp,   sp_el0);
-   write_sysreg_el2(ctxt->gp_regs.regs.pc,   elr);
-   write_sysreg_el2(ctxt->gp_regs.regs.pstate, spsr);
 }
 
 static void __hyp_text __sysreg_restore_state(struct kvm_cpu_context *ctxt)
@@ -115,10 +112,13 @@ static void __hyp_text __sysreg_restore_state(struct 
kvm_cpu_context *ctxt)
write_sysreg_el1(ctxt->sys_regs[AMAIR_EL1], amair);
write_sysreg_el1(ctxt->sys_regs[CNTKCTL_EL1],   cntkctl);
write_sysreg(ctxt->sys_regs[PAR_EL1],   par_el1);
+   write_sysreg(ctxt->sys_regs[TPIDR_EL1], 

[PATCH v4 11/13] firmware: arm_sdei: add support for CPU private events

2017-10-17 Thread James Morse
Private SDE events are per-cpu, and need to be registered and enabled
on each CPU.

Hide this detail from the caller by adapting our {,un}register and
{en,dis}able calls to send an IPI to each CPU if the event is private.

CPU private events are unregistered when the CPU is powered-off, and
re-registered when the CPU is brought back online. This saves bringing
secondary cores back online to call private_reset() on shutdown, kexec
and resume from hibernate.

Signed-off-by: James Morse 
---
 drivers/firmware/arm_sdei.c | 242 +---
 1 file changed, 228 insertions(+), 14 deletions(-)

diff --git a/drivers/firmware/arm_sdei.c b/drivers/firmware/arm_sdei.c
index 28e4c4cbb16d..5598d9ba8b5d 100644
--- a/drivers/firmware/arm_sdei.c
+++ b/drivers/firmware/arm_sdei.c
@@ -21,9 +21,11 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -64,12 +66,49 @@ struct sdei_event {
u8  priority;
 
/* This pointer is handed to firmware as the event argument. */
-   struct sdei_registered_event *registered;
+   union {
+   /* Shared events */
+   struct sdei_registered_event *registered;
+
+   /* CPU private events */
+   struct sdei_registered_event __percpu *private_registered;
+   };
 };
 
 static LIST_HEAD(sdei_events);
 static DEFINE_SPINLOCK(sdei_events_lock);
 
+/* When frozen, cpu-hotplug notifiers shouldn't unregister/re-register events 
*/
+static bool frozen;
+
+/* Private events are registered/enabled via IPI passing one of these */
+struct sdei_crosscall_args {
+   struct sdei_event *event;
+   atomic_t errors;
+   int first_error;
+};
+
+#define CROSSCALL_INIT(arg, event) (arg.event = event, \
+arg.first_error = 0, \
+                                atomic_set(&arg.errors, 0))
+
+static inline int sdei_do_cross_call(void *fn, struct sdei_event *event)
+{
+   struct sdei_crosscall_args arg;
+
+   CROSSCALL_INIT(arg, event);
+   on_each_cpu(fn, &arg, true);
+
+   return arg.first_error;
+}
+
+static inline void
+sdei_cross_call_return(struct sdei_crosscall_args *arg, int err)
+{
+   if (err && (atomic_inc_return(&arg->errors) == 1))
+   arg->first_error = err;
+}
+
 static int sdei_to_linux_errno(unsigned long sdei_err)
 {
switch (sdei_err) {
@@ -213,6 +252,26 @@ static struct sdei_event *sdei_event_create(u32 event_num,
reg->callback = cb;
reg->callback_arg = cb_arg;
event->registered = reg;
+   } else {
+   int cpu;
+   struct sdei_registered_event __percpu *regs;
+
+   regs = alloc_percpu(struct sdei_registered_event);
+   if (!regs) {
+   kfree(event);
+   return ERR_PTR(-ENOMEM);
+   }
+
+   for_each_possible_cpu(cpu) {
+   reg = per_cpu_ptr(regs, cpu);
+
+   reg->event_num = event->event_num;
+   reg->priority = event->priority;
+   reg->callback = cb;
+   reg->callback_arg = cb_arg;
+   }
+
+   event->private_registered = regs;
}
 
if (sdei_event_find(event_num)) {
@@ -234,6 +293,8 @@ static void sdei_event_destroy(struct sdei_event *event)
 
if (event->type == SDEI_EVENT_TYPE_SHARED)
kfree(event->registered);
+   else
+   free_percpu(event->private_registered);
 
kfree(event);
 }
@@ -264,11 +325,6 @@ static void _ipi_mask_cpu(void *ignored)
sdei_mask_local_cpu();
 }
 
-static int sdei_cpuhp_down(unsigned int ignored)
-{
-   return sdei_mask_local_cpu();
-}
-
 int sdei_unmask_local_cpu(void)
 {
int err;
@@ -290,11 +346,6 @@ static void _ipi_unmask_cpu(void *ignored)
sdei_unmask_local_cpu();
 }
 
-static int sdei_cpuhp_up(unsigned int ignored)
-{
-   return sdei_unmask_local_cpu();
-}
-
 static void _ipi_private_reset(void *ignored)
 {
int err;
@@ -345,6 +396,16 @@ static int sdei_api_event_enable(u32 event_num)
  0, NULL);
 }
 
+static void _ipi_event_enable(void *data)
+{
+   int err;
+   struct sdei_crosscall_args *arg = data;
+
+   err = sdei_api_event_enable(arg->event->event_num);
+
+   sdei_cross_call_return(arg, err);
+}
+
 int sdei_event_enable(u32 event_num)
 {
int err = -EINVAL;
@@ -356,6 +417,8 @@ int sdei_event_enable(u32 event_num)
err = -ENOENT;
else if (event->type == SDEI_EVENT_TYPE_SHARED)
err = sdei_api_event_enable(event->event_num);
+   else
+   err = sdei_do_cross_call(_ipi_event_enable, event);
spin_unlock(&sdei_events_lock);
 
return err;
@@ -368,6 +431,16 @@ static int 

[PATCH v4 00/13] arm64/firmware: Software Delegated Exception Interface

2017-10-17 Thread James Morse
Hello!

The Software Delegated Exception Interface (SDEI) is an ARM specification
for registering callbacks from the platform firmware into the OS.
This is intended to be used to implement firmware-first RAS notifications,
but also supports vendor-defined events and binding IRQs as events.

The document is here:
http://infocenter.arm.com/help/topic/com.arm.doc.den0054a/ARM_DEN0054A_Software_Delegated_Exception_Interface.pdf

I anticipate that, once reviewed, this series will go via the arm-soc tree, as the
bulk of the code is under /drivers/firmware/.

The major change in v4 is file renames.
The major change in v3 was due to VMAP stacks. We can't trust sp when we take
an event, so need our own SDEI stack. Events can nest, so we need two. Private
events can be delivered per-cpu, so we need two per-cpu SDEI stacks.

Future patches may try to reduce this stack size, or push the stacks to be
allocated when events are registered.


The ACPICA support for the SDEI table is now in mainline, so this series has
grown the detection code for non-DT systems. The APEI/GHES wire-up is still
being held back as APEI's ghes_proc() isn't (yet) safe for multiple sources
of NMI.

Exposing the HVC range through KVM to user-space is gone, this will be
re-incarnated as something that better fits KVMs architecture emulation,
as opposed to assuming all-the-world's-SMC-CC.


This series (juggles some registers with KVM+VHE, then) adds a DT binding to
trigger probing of the interface and support for the SDEI API.

SDEI runs between adjacent exception levels, so events will always be delivered
to EL2 if firmware is at EL3. For VHE hosts we run the SDEI event handler
behind KVM's back with all exceptions masked. Once the handler has done its
work we return to the appropriate vbar+irq entry. This allows KVM to
world-switch and deliver any signals sent by the handler to Qemu/kvmtool. We
do the same thing if we interrupt host EL0. If we interrupted code with
interrupts masked, we use a different API call to return to the interrupted
context.

What about non-VHE KVM? If you don't have VHE support and boot at EL2, the
kernel drops to EL1. This driver will print an error message then give up. This
is because events would still be delivered to EL2 hitting either KVM, or the
hyp-stub. Supporting this is complicated, but because the main use-case is
RAS, and ARM v8.2's RAS extensions imply v8.1's Virtual Host Extensions, we
can assume all platforms with SDEI will support VHE too. (I have some ideas
on how to support non-VHE).

Running the event handler behind VHE-KVM's back has some side effects: The
event handler will blindly use any registers that are shared between the host
and guest. The two that I think matter are TPIDR_EL1, and the debug state. The
guest may have set MDSCR_EL1 so debug exceptions must remain masked. The
guest's TPIDR_EL1 will be used by the event handler if it accesses per-cpu
variables. This needs fixing. The first part of this series juggles KVMs use
of TPIDR_EL2 so that we share it with the host on VHE systems. An equivalent
change for 32bit is (still) on my todo list. (the alternative to this is to
have a parody world switch in the SDEI event handler, but this would mean
special casing interrupted guests, and be an ABI link to KVM.)

Is this another begins-with-S RAS mechanism for arm64? Yes.
Why? Any notification delivered as an exception will overwrite the exception
registers. This is fatal for the running thread if it happens during entry.S's
kernel_enter or kernel_exit. Instead of adding masking and routing controls,
events are delivered to a registered address at a fixed exception level and
don't change the exception registers when delivered.

This series can be retrieved from:
git://linux-arm.org/linux-jm.git -b sdei/v4/base


Questions and contradictions welcome!

Thanks,

James

[Changes since previous versions are noted on each patch]

James Morse (13):
  KVM: arm64: Store vcpu on the stack during __guest_enter()
  KVM: arm/arm64: Convert kvm_host_cpu_state to a static per-cpu
allocation
  KVM: arm64: Change hyp_panic()s dependency on tpidr_el2
  arm64: alternatives: use tpidr_el2 on VHE hosts
  KVM: arm64: Stop save/restoring host tpidr_el1 on VHE
  Docs: dt: add devicetree binding for describing arm64 SDEI firmware
  firmware: arm_sdei: Add driver for Software Delegated Exceptions
  arm64: Add vmap_stack header file
  arm64: kernel: Add arch-specific SDEI entry code and CPU masking
  firmware: arm_sdei: Add support for CPU and system power states
  firmware: arm_sdei: add support for CPU private events
  arm64: acpi: Remove __init from acpi_psci_use_hvc() for use by SDEI
  firmware: arm_sdei: Discover SDEI support via ACPI

 .../devicetree/bindings/arm/firmware/sdei.txt  |   42 +
 MAINTAINERS|9 +
 arch/arm64/Kconfig |2 +-
 arch/arm64/include/asm/alternative.h   |2 +
 arch/arm64/include/asm/assembler.h   

[PATCH v4 02/13] KVM: arm/arm64: Convert kvm_host_cpu_state to a static per-cpu allocation

2017-10-17 Thread James Morse
kvm_host_cpu_state is a per-cpu allocation made from kvm_arch_init()
used to store the host EL1 registers when KVM switches to a guest.

Make it easier for ASM to generate pointers into this per-cpu memory
by making it a static allocation.

Signed-off-by: James Morse 
Acked-by: Christoffer Dall 
---
 virt/kvm/arm/arm.c | 18 +++---
 1 file changed, 3 insertions(+), 15 deletions(-)

diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index b9f68e4add71..3786dd7e277d 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -51,8 +51,8 @@
 __asm__(".arch_extension   virt");
 #endif
 
+DEFINE_PER_CPU(kvm_cpu_context_t, kvm_host_cpu_state);
 static DEFINE_PER_CPU(unsigned long, kvm_arm_hyp_stack_page);
-static kvm_cpu_context_t __percpu *kvm_host_cpu_state;
 
 /* Per-CPU variable containing the currently running vcpu. */
 static DEFINE_PER_CPU(struct kvm_vcpu *, kvm_arm_running_vcpu);
@@ -351,7 +351,7 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
}
 
vcpu->cpu = cpu;
-   vcpu->arch.host_cpu_context = this_cpu_ptr(kvm_host_cpu_state);
+   vcpu->arch.host_cpu_context = this_cpu_ptr(&kvm_host_cpu_state);
 
kvm_arm_set_running_vcpu(vcpu);
 
@@ -1255,19 +1255,8 @@ static inline void hyp_cpu_pm_exit(void)
 }
 #endif
 
-static void teardown_common_resources(void)
-{
-   free_percpu(kvm_host_cpu_state);
-}
-
 static int init_common_resources(void)
 {
-   kvm_host_cpu_state = alloc_percpu(kvm_cpu_context_t);
-   if (!kvm_host_cpu_state) {
-   kvm_err("Cannot allocate host CPU state\n");
-   return -ENOMEM;
-   }
-
/* set size of VMID supported by CPU */
kvm_vmid_bits = kvm_get_vmid_bits();
kvm_info("%d-bit VMID\n", kvm_vmid_bits);
@@ -1412,7 +1401,7 @@ static int init_hyp_mode(void)
for_each_possible_cpu(cpu) {
kvm_cpu_context_t *cpu_ctxt;
 
-   cpu_ctxt = per_cpu_ptr(kvm_host_cpu_state, cpu);
+   cpu_ctxt = per_cpu_ptr(&kvm_host_cpu_state, cpu);
err = create_hyp_mappings(cpu_ctxt, cpu_ctxt + 1, PAGE_HYP);
 
if (err) {
@@ -1490,7 +1479,6 @@ int kvm_arch_init(void *opaque)
 out_hyp:
teardown_hyp_mode();
 out_err:
-   teardown_common_resources();
return err;
 }
 
-- 
2.13.3

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCH v4 08/13] arm64: Add vmap_stack header file

2017-10-17 Thread James Morse
Today the arm64 arch code allocates an extra IRQ stack per-cpu. If we
also have SDEI and VMAP stacks we need two extra per-cpu VMAP stacks.

Move the VMAP stack allocation out to a helper in a new header file.
This avoids missing THREADINFO_GFP, or getting the all-important alignment
wrong.
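
(As a rough illustration of the intended use, the SDEI entry patch later in the
series can then allocate its per-cpu stacks with the same helper. This is only
a sketch; SDEI_STACK_SIZE, sdei_stack_ptr and init_sdei_stack() are placeholder
names, not code from the series:)

#include <asm/vmap_stack.h>

static DEFINE_PER_CPU(unsigned long *, sdei_stack_ptr);	/* placeholder */

static int init_sdei_stack(int cpu)
{
	unsigned long *p;

	/* THREADINFO_GFP and THREAD_ALIGN are handled by the helper */
	p = arch_alloc_vmap_stack(SDEI_STACK_SIZE, cpu_to_node(cpu));
	if (!p)
		return -ENOMEM;

	per_cpu(sdei_stack_ptr, cpu) = p;
	return 0;
}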

Signed-off-by: James Morse 

---
Changes since v3:
 * Added BUILD_BUG() instead of a spooky link error

 arch/arm64/include/asm/vmap_stack.h | 38 +
 arch/arm64/kernel/irq.c | 13 ++---
 2 files changed, 40 insertions(+), 11 deletions(-)
 create mode 100644 arch/arm64/include/asm/vmap_stack.h

diff --git a/arch/arm64/include/asm/vmap_stack.h 
b/arch/arm64/include/asm/vmap_stack.h
new file mode 100644
index ..8fc1ef846b15
--- /dev/null
+++ b/arch/arm64/include/asm/vmap_stack.h
@@ -0,0 +1,38 @@
+/*
+ * Copyright (C) 2017 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see .
+ */
+#ifndef __ASM_VMAP_STACK_H
+#define __ASM_VMAP_STACK_H
+
+#include 
+#include 
+#include 
+#include 
+
+/*
+ * To ensure that VMAP'd stack overflow detection works correctly, all VMAP'd
+ * stacks need to have the same alignment.
+ */
+static inline unsigned long *arch_alloc_vmap_stack(size_t stack_size, int node)
+{
+   BUILD_BUG_ON(!IS_ENABLED(CONFIG_VMAP_STACK));
+
+   return __vmalloc_node_range(stack_size, THREAD_ALIGN,
+   VMALLOC_START, VMALLOC_END,
+   THREADINFO_GFP, PAGE_KERNEL, 0, node,
+   __builtin_return_address(0));
+}
+
+#endif /* __ASM_VMAP_STACK_H */
diff --git a/arch/arm64/kernel/irq.c b/arch/arm64/kernel/irq.c
index 713561e5bcab..60e5fc661f74 100644
--- a/arch/arm64/kernel/irq.c
+++ b/arch/arm64/kernel/irq.c
@@ -29,6 +29,7 @@
 #include 
 #include 
 #include 
+#include 
 
 unsigned long irq_err_count;
 
@@ -58,17 +59,7 @@ static void init_irq_stacks(void)
unsigned long *p;
 
for_each_possible_cpu(cpu) {
-   /*
-   * To ensure that VMAP'd stack overflow detection works
-   * correctly, the IRQ stacks need to have the same
-   * alignment as other stacks.
-   */
-   p = __vmalloc_node_range(IRQ_STACK_SIZE, THREAD_ALIGN,
-VMALLOC_START, VMALLOC_END,
-THREADINFO_GFP, PAGE_KERNEL,
-0, cpu_to_node(cpu),
-__builtin_return_address(0));
-
+   p = arch_alloc_vmap_stack(IRQ_STACK_SIZE, cpu_to_node(cpu));
per_cpu(irq_stack_ptr, cpu) = p;
}
 }
-- 
2.13.3

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCH v4 13/13] firmware: arm_sdei: Discover SDEI support via ACPI

2017-10-17 Thread James Morse
SDEI defines a new ACPI table to indicate the presence of the interface.
The conduit is discovered in the same way as PSCI.

For ACPI we need to create the platform device ourselves as SDEI doesn't
have an entry in the DSDT.

The SDEI platform device should be created after ACPI has been initialised
so that we can parse the table, but before GHES devices are created, which
may register SDE events if they use SDEI as their notification type.

Signed-off-by: James Morse 
---
 drivers/firmware/arm_sdei.c | 41 -
 1 file changed, 40 insertions(+), 1 deletion(-)

diff --git a/drivers/firmware/arm_sdei.c b/drivers/firmware/arm_sdei.c
index 5598d9ba8b5d..f764f48144c2 100644
--- a/drivers/firmware/arm_sdei.c
+++ b/drivers/firmware/arm_sdei.c
@@ -911,6 +911,14 @@ static int sdei_get_conduit(struct platform_device *pdev)
}
 
pr_warn("invalid \"method\" property: %s\n", method);
+   } else if (IS_ENABLED(CONFIG_ACPI) && !acpi_disabled) {
+   if (acpi_psci_use_hvc()) {
+   sdei_firmware_call = &sdei_smccc_hvc;
+   return CONDUIT_HVC;
+   } else {
+   sdei_firmware_call = &sdei_smccc_smc;
+   return CONDUIT_SMC;
+   }
}
 
return CONDUIT_INVALID;
@@ -1024,14 +1032,45 @@ static bool __init sdei_present_dt(void)
return true;
 }
 
+static bool __init sdei_present_acpi(void)
+{
+   acpi_status status;
+   struct platform_device *pdev;
+   struct acpi_table_header *sdei_table_header;
+
+   if (acpi_disabled)
+   return false;
+
+   status = acpi_get_table(ACPI_SIG_SDEI, 0, &sdei_table_header);
+   if (ACPI_FAILURE(status) && status != AE_NOT_FOUND) {
+   const char *msg = acpi_format_exception(status);
+
+   pr_info("Failed to get ACPI:SDEI table, %s\n", msg);
+   }
+   if (ACPI_FAILURE(status))
+   return false;
+
+   pdev = platform_device_register_simple(sdei_driver.driver.name, 0, NULL,
+  0);
+   if (IS_ERR(pdev))
+   return false;
+
+   return true;
+}
+
 static int __init sdei_init(void)
 {
-   if (sdei_present_dt())
+   if (sdei_present_dt() || sdei_present_acpi())
platform_driver_register(&sdei_driver);
 
return 0;
 }
 
+/*
+ * On an ACPI system SDEI needs to be ready before HEST:GHES tries to register
+ * its events. ACPI is initialised from a subsys_initcall(), GHES is initialised
+ * by device_initcall(). We want to be called in the middle.
+ */
 subsys_initcall_sync(sdei_init);
 
 int sdei_event_handler(struct pt_regs *regs,
-- 
2.13.3

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCH v4 07/13] firmware: arm_sdei: Add driver for Software Delegated Exceptions

2017-10-17 Thread James Morse
The Software Delegated Exception Interface (SDEI) is an ARM standard
for registering callbacks from the platform firmware into the OS.
This is typically used to implement firmware notifications (such as
firmware-first RAS) or to handle an IRQ that has been promoted to a
firmware-assisted NMI.

Add the code for detecting the SDEI version and the framework for
registering and unregistering events. Subsequent patches will add the
arch-specific backend code and the necessary power management hooks.

Only shared events are supported; power management, private events and
discovery for ACPI systems will be added by later patches.
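
(For orientation, a minimal sketch of how a client such as the GHES code might
consume this interface, assuming the registration API added by this patch in
include/linux/arm_sdei.h; the event number and the callback body are made up:)

#include <linux/arm_sdei.h>

static int example_sdei_callback(u32 event_num, struct pt_regs *regs, void *arg)
{
	/* Runs in the NMI-like context the firmware delivered us in */
	/* ... handle the notification described by 'arg' ... */
	return 0;
}

static int example_bind_event(u32 event_num, void *priv)
{
	int err;

	err = sdei_event_register(event_num, example_sdei_callback, priv);
	if (err)
		return err;

	return sdei_event_enable(event_num);
}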

Signed-off-by: James Morse 

---
Changes since v3:
 * Depend on arm64 from the beginning, add a placeholder arch asm file.
 * Added MAINTAINER record
 * Renamed sdei.h files to arm_sdei.h
 * Removed IS_SDEI_CALL(), KVM won't need this...

Changes since v2:
 * Copy the priority into the structure the arch asm handler gets. This
   is used for VMAP stacks where we can't know if a critical event interrupted
   a normal priority event, thus they need separate stacks.

Changes since v1:
 * Changed entry point to unsigned long, if we support non-vhe systems this
   won't be a valid pointer
 * Made invoke_sdei_fn() pass the only register we are interested in, instead
   of the whole arm_smccc_res
 * Made all the locking WARN_ON()s lockdep_assert_held()s.
 * Moved more messages from 'err' to 'warn'.
 * Made IS_SDEI_CALL() not depend on whether the config option is selected.
 * Made 'event failed' messages rate limited.

 MAINTAINERS   |   9 +
 arch/arm64/include/asm/sdei.h |  21 ++
 drivers/firmware/Kconfig  |   8 +
 drivers/firmware/Makefile |   1 +
 drivers/firmware/arm_sdei.c   | 616 ++
 include/linux/arm_sdei.h  |  99 +++
 include/uapi/linux/arm_sdei.h |  91 +++
 7 files changed, 845 insertions(+)
 create mode 100644 arch/arm64/include/asm/sdei.h
 create mode 100644 drivers/firmware/arm_sdei.c
 create mode 100644 include/linux/arm_sdei.h
 create mode 100644 include/uapi/linux/arm_sdei.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 65b0c88d5ee0..6c9c7483e2b3 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -12441,6 +12441,15 @@ L: linux-me...@vger.kernel.org
 S: Supported
 F: drivers/media/pci/solo6x10/
 
+SOFTWARE DELEGATED EXCEPTION INTERFACE (SDEI)
+M: James Morse 
+L: linux-arm-ker...@lists.infradead.org
+S: Maintained
+F: Documentation/devicetree/bindings/arm/firmware/sdei.txt
+F: drivers/firmware/arm_sdei.c
+F: include/linux/sdei.h
+F: include/uapi/linux/sdei.h
+
 SOFTWARE RAID (Multiple Disks) SUPPORT
 M: Shaohua Li 
 L: linux-r...@vger.kernel.org
diff --git a/arch/arm64/include/asm/sdei.h b/arch/arm64/include/asm/sdei.h
new file mode 100644
index ..b60381a05787
--- /dev/null
+++ b/arch/arm64/include/asm/sdei.h
@@ -0,0 +1,21 @@
+/*
+ * Copyright (C) 2017 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see .
+ */
+#ifndef __ASM_SDEI_H
+#define __ASM_SDEI_H
+
+/* Later patches add the arch specific bits */
+
+#endif /* __ASM_SDEI_H */
diff --git a/drivers/firmware/Kconfig b/drivers/firmware/Kconfig
index 6e4ed5a9c6fd..1f6633b337aa 100644
--- a/drivers/firmware/Kconfig
+++ b/drivers/firmware/Kconfig
@@ -48,6 +48,14 @@ config ARM_SCPI_POWER_DOMAIN
  This enables support for the SCPI power domains which can be
  enabled or disabled via the SCP firmware
 
+config ARM_SDE_INTERFACE
+   bool "ARM Software Delegated Exception Interface (SDEI)"
+   depends on ARM64
+   help
+ The Software Delegated Exception Interface (SDEI) is an ARM
+ standard for registering callbacks from the platform firmware
+ into the OS. This is typically used to implement RAS notifications.
+
 config EDD
tristate "BIOS Enhanced Disk Drive calls determine boot disk"
depends on X86
diff --git a/drivers/firmware/Makefile b/drivers/firmware/Makefile
index a37f12e8d137..229984feca4a 100644
--- a/drivers/firmware/Makefile
+++ b/drivers/firmware/Makefile
@@ -5,6 +5,7 @@ obj-$(CONFIG_ARM_PSCI_FW)   += psci.o
 obj-$(CONFIG_ARM_PSCI_CHECKER) += psci_checker.o
 obj-$(CONFIG_ARM_SCPI_PROTOCOL)+= arm_scpi.o
 obj-$(CONFIG_ARM_SCPI_POWER_DOMAIN) += scpi_pm_domain.o
+obj-$(CONFIG_ARM_SDE_INTERFACE)+= arm_sdei.o
 

[PATCH v4 06/13] Docs: dt: add devicetree binding for describing arm64 SDEI firmware

2017-10-17 Thread James Morse
The Software Delegated Exception Interface (SDEI) is an ARM standard
for registering callbacks from the platform firmware into the OS.
This is typically used to implement RAS notifications, or from an
IRQ that has been promoted to a firmware-assisted NMI.

Add a new devicetree binding to describe the SDE firmware interface.

Signed-off-by: James Morse 
Acked-by: Rob Herring 

---
Changes since v2:
 * Added Rob's Ack
 * Fixed 'childe node' typo

Changes since v1:
* Added bound IRQ description for binding,
* Reference SMC-CC, not 'AAPCS like'
* Move sdei node under firmware node (and the file path)

 .../devicetree/bindings/arm/firmware/sdei.txt  | 42 ++
 1 file changed, 42 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/arm/firmware/sdei.txt

diff --git a/Documentation/devicetree/bindings/arm/firmware/sdei.txt 
b/Documentation/devicetree/bindings/arm/firmware/sdei.txt
new file mode 100644
index ..ee3f0ff49889
--- /dev/null
+++ b/Documentation/devicetree/bindings/arm/firmware/sdei.txt
@@ -0,0 +1,42 @@
+* Software Delegated Exception Interface (SDEI)
+
+Firmware implementing the SDEI functions described in ARM document number
+ARM DEN 0054A ("Software Delegated Exception Interface") can be used by
+Linux to receive notification of events such as those generated by
+firmware-first error handling, or from an IRQ that has been promoted to
+a firmware-assisted NMI.
+
+The interface provides a number of API functions for registering callbacks
+and enabling/disabling events. Functions are invoked by trapping to the
+privilege level of the SDEI firmware (specified as part of the binding
+below) and passing arguments in a manner specified by the "SMC Calling
+Convention" (ARM DEN 0028B):
+
+r0 => 32-bit Function ID / return value
+   {r1 - r3}   => Parameters
+
+Note that the immediate field of the trapping instruction must be set
+to #0.
+
+The SDEI_EVENT_REGISTER function registers a callback in the kernel
+text to handle the specified event number.
+
+The sdei node should be a child node of '/firmware' and have required
+properties:
+
+ - compatible: should contain:
+   * "arm,sdei-1.0" : For implementations complying to SDEI version 1.x.
+
+ - method: The method of calling the SDEI firmware. Permitted
+   values are:
+   * "smc" : SMC #0, with the register assignments specified in this
+ binding.
+   * "hvc" : HVC #0, with the register assignments specified in this
+ binding.
+Example:
+   firmware {
+   sdei {
+   compatible  = "arm,sdei-1.0";
+   method  = "smc";
+   };
+   };
-- 
2.13.3

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
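
(To illustrate the calling convention described in the binding above, here is a
sketch of issuing an SDEI call over the "smc" conduit with the kernel's
arm_smccc_smc() helper. The SDEI_VERSION function ID value is an assumption
here; the real constants live in the uapi header added by the driver patch:)

#include <linux/arm-smccc.h>

#define SDEI_1_0_FN_SDEI_VERSION	0xC4000020	/* assumed value */

static unsigned long example_sdei_version(void)
{
	struct arm_smccc_res res;

	/* r0 = function ID, r1-r3 = parameters (none here), SMC #0 */
	arm_smccc_smc(SDEI_1_0_FN_SDEI_VERSION, 0, 0, 0, 0, 0, 0, 0, &res);

	return res.a0;	/* return value comes back in r0 */
}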


[PATCH v4 03/13] KVM: arm64: Change hyp_panic()s dependency on tpidr_el2

2017-10-17 Thread James Morse
Make tpidr_el2 a cpu-offset for per-cpu variables in the same way the
host uses tpidr_el1. This lets tpidr_el{1,2} have the same value, and
on VHE they can be the same register.

KVM calls hyp_panic() when anything unexpected happens. This may occur
while a guest owns the EL1 registers. KVM stashes the vcpu pointer in
tpidr_el2, which it uses to find the host context in order to restore
the host EL1 registers before parachuting into the host's panic().

The host context is a struct kvm_cpu_context allocated in the per-cpu
area, and mapped to hyp. Given the per-cpu offset for this CPU, this is
easy to find. Change hyp_panic() to take a pointer to the
struct kvm_cpu_context. Wrap these calls with an asm function that
retrieves the struct kvm_cpu_context from the host's per-cpu area.

Copy the per-cpu offset from the hosts tpidr_el1 into tpidr_el2 during
kvm init. (Later patches will make this unnecessary for VHE hosts)

We print out the vcpu pointer as part of the panic message. Add a back
reference to the 'running vcpu' in the host cpu context to preserve this.
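
(For reference, a sketch of what the reworked C entry point looks like with
this change; the body below is illustrative, only hyp_panic() taking the host
context and the __hyp_running_vcpu back-reference come from this patch:)

void __hyp_text __noreturn hyp_panic(struct kvm_cpu_context *host_ctxt)
{
	u64 spsr = read_sysreg_el2(spsr);
	u64 elr = read_sysreg_el2(elr);
	u64 par = read_sysreg(par_el1);
	/* replaces the old "stash the vcpu in tpidr_el2" trick */
	struct kvm_vcpu *vcpu = host_ctxt->__hyp_running_vcpu;

	if (!has_vhe())
		__hyp_call_panic_nvhe(spsr, elr, par, vcpu);
	else
		__hyp_call_panic_vhe(spsr, elr, par, vcpu);

	unreachable();
}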

Signed-off-by: James Morse 
Reviewed-by: Christoffer Dall 

---
Changes since v1:
 * Added a comment explaining how =kvm_host_cpu_state gets from a host-va
   to a hyp va.
 * Added the first paragraph to the commit message.

 arch/arm64/include/asm/kvm_host.h |  2 ++
 arch/arm64/kvm/hyp/hyp-entry.S| 12 
 arch/arm64/kvm/hyp/s2-setup.c |  3 +++
 arch/arm64/kvm/hyp/switch.c   | 25 +
 4 files changed, 30 insertions(+), 12 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h 
b/arch/arm64/include/asm/kvm_host.h
index e923b58606e2..806ccef7470a 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -191,6 +191,8 @@ struct kvm_cpu_context {
u64 sys_regs[NR_SYS_REGS];
u32 copro[NR_COPRO_REGS];
};
+
+   struct kvm_vcpu *__hyp_running_vcpu;
 };
 
 typedef struct kvm_cpu_context kvm_cpu_context_t;
diff --git a/arch/arm64/kvm/hyp/hyp-entry.S b/arch/arm64/kvm/hyp/hyp-entry.S
index fce7cc507e0a..e4f37b9dd47c 100644
--- a/arch/arm64/kvm/hyp/hyp-entry.S
+++ b/arch/arm64/kvm/hyp/hyp-entry.S
@@ -163,6 +163,18 @@ ENTRY(__hyp_do_panic)
eret
 ENDPROC(__hyp_do_panic)
 
+ENTRY(__hyp_panic)
+   /*
+* '=kvm_host_cpu_state' is a host VA from the constant pool, it may
+* not be accessible by this address from EL2, hyp_panic() converts
+* it with kern_hyp_va() before use.
+*/
+   ldr x0, =kvm_host_cpu_state
+   mrs x1, tpidr_el2
+   add x0, x0, x1
+   b   hyp_panic
+ENDPROC(__hyp_panic)
+
 .macro invalid_vector  label, target = __hyp_panic
.align  2
 \label:
diff --git a/arch/arm64/kvm/hyp/s2-setup.c b/arch/arm64/kvm/hyp/s2-setup.c
index a81f5e10fc8c..7fb88274eba1 100644
--- a/arch/arm64/kvm/hyp/s2-setup.c
+++ b/arch/arm64/kvm/hyp/s2-setup.c
@@ -84,5 +84,8 @@ u32 __hyp_text __init_stage2_translation(void)
 
write_sysreg(val, vtcr_el2);
 
+   /* copy tpidr_el1 into tpidr_el2 for use by HYP */
+   write_sysreg(read_sysreg(tpidr_el1), tpidr_el2);
+
return parange;
 }
diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index 945e79c641c4..235d615cee30 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -286,9 +286,9 @@ int __hyp_text __kvm_vcpu_run(struct kvm_vcpu *vcpu)
u64 exit_code;
 
vcpu = kern_hyp_va(vcpu);
-   write_sysreg(vcpu, tpidr_el2);
 
host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
+   host_ctxt->__hyp_running_vcpu = vcpu;
guest_ctxt = &vcpu->arch.ctxt;
 
__sysreg_save_host_state(host_ctxt);
@@ -393,7 +393,8 @@ int __hyp_text __kvm_vcpu_run(struct kvm_vcpu *vcpu)
 
 static const char __hyp_panic_string[] = "HYP panic:\nPS:%08llx PC:%016llx 
ESR:%08llx\nFAR:%016llx HPFAR:%016llx PAR:%016llx\nVCPU:%p\n";
 
-static void __hyp_text __hyp_call_panic_nvhe(u64 spsr, u64 elr, u64 par)
+static void __hyp_text __hyp_call_panic_nvhe(u64 spsr, u64 elr, u64 par,
+struct kvm_vcpu *vcpu)
 {
unsigned long str_va;
 
@@ -407,35 +408,35 @@ static void __hyp_text __hyp_call_panic_nvhe(u64 spsr, 
u64 elr, u64 par)
__hyp_do_panic(str_va,
   spsr,  elr,
   read_sysreg(esr_el2),   read_sysreg_el2(far),
-  read_sysreg(hpfar_el2), par,
-  (void *)read_sysreg(tpidr_el2));
+  read_sysreg(hpfar_el2), par, vcpu);
 }
 
-static void __hyp_text __hyp_call_panic_vhe(u64 spsr, u64 elr, u64 par)
+static void __hyp_text __hyp_call_panic_vhe(u64 spsr, u64 elr, u64 par,
+   struct kvm_vcpu *vcpu)
 {
panic(__hyp_panic_string,
  spsr,  elr,
  read_sysreg_el2(esr),   read_sysreg_el2(far),
- 

[PATCH v4 04/13] arm64: alternatives: use tpidr_el2 on VHE hosts

2017-10-17 Thread James Morse
Now that KVM uses tpidr_el2 in the same way as Linux's cpu_offset in
tpidr_el1, merge the two. This saves KVM from save/restoring tpidr_el1
on VHE hosts, and allows future code to blindly access per-cpu variables
without triggering world-switch.

Signed-off-by: James Morse 
Reviewed-by: Christoffer Dall 

---
Changes since v3:
 * Moved 'alternatives_applied' test all into C,
 * Made enable method static and dragged up before first-use.

Changes since v1:
 * cpu_copy_el2regs()'s 'have I been patched' test now always sets a register,
   just in case the compiler optimises out part of the logic.

 arch/arm64/include/asm/alternative.h |  2 ++
 arch/arm64/include/asm/assembler.h   |  8 
 arch/arm64/include/asm/percpu.h  | 11 +--
 arch/arm64/kernel/alternative.c  |  9 +
 arch/arm64/kernel/cpufeature.c   | 17 +
 arch/arm64/mm/proc.S |  8 
 6 files changed, 49 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/include/asm/alternative.h 
b/arch/arm64/include/asm/alternative.h
index 6e1cb8c5af4d..f9e2f69f296e 100644
--- a/arch/arm64/include/asm/alternative.h
+++ b/arch/arm64/include/asm/alternative.h
@@ -11,6 +11,8 @@
 #include 
 #include 
 
+extern int alternatives_applied;
+
 struct alt_instr {
s32 orig_offset;/* offset to original instruction */
s32 alt_offset; /* offset to replacement instruction */
diff --git a/arch/arm64/include/asm/assembler.h 
b/arch/arm64/include/asm/assembler.h
index d58a6253c6ab..1ba12f59dec0 100644
--- a/arch/arm64/include/asm/assembler.h
+++ b/arch/arm64/include/asm/assembler.h
@@ -242,7 +242,11 @@ lr .reqx30 // link register
 #else
adr_l   \dst, \sym
 #endif
+alternative_if_not ARM64_HAS_VIRT_HOST_EXTN
mrs \tmp, tpidr_el1
+alternative_else
+   mrs \tmp, tpidr_el2
+alternative_endif
add \dst, \dst, \tmp
.endm
 
@@ -253,7 +257,11 @@ lr .reqx30 // link register
 */
.macro ldr_this_cpu dst, sym, tmp
adr_l   \dst, \sym
+alternative_if_not ARM64_HAS_VIRT_HOST_EXTN
mrs \tmp, tpidr_el1
+alternative_else
+   mrs \tmp, tpidr_el2
+alternative_endif
ldr \dst, [\dst, \tmp]
.endm
 
diff --git a/arch/arm64/include/asm/percpu.h b/arch/arm64/include/asm/percpu.h
index 3bd498e4de4c..43393208229e 100644
--- a/arch/arm64/include/asm/percpu.h
+++ b/arch/arm64/include/asm/percpu.h
@@ -16,11 +16,15 @@
 #ifndef __ASM_PERCPU_H
 #define __ASM_PERCPU_H
 
+#include 
 #include 
 
 static inline void set_my_cpu_offset(unsigned long off)
 {
-   asm volatile("msr tpidr_el1, %0" :: "r" (off) : "memory");
+   asm volatile(ALTERNATIVE("msr tpidr_el1, %0",
+"msr tpidr_el2, %0",
+ARM64_HAS_VIRT_HOST_EXTN)
+   :: "r" (off) : "memory");
 }
 
 static inline unsigned long __my_cpu_offset(void)
@@ -31,7 +35,10 @@ static inline unsigned long __my_cpu_offset(void)
 * We want to allow caching the value, so avoid using volatile and
 * instead use a fake stack read to hazard against barrier().
 */
-   asm("mrs %0, tpidr_el1" : "=r" (off) :
+   asm(ALTERNATIVE("mrs %0, tpidr_el1",
+   "mrs %0, tpidr_el2",
+   ARM64_HAS_VIRT_HOST_EXTN)
+   : "=r" (off) :
"Q" (*(const unsigned long *)current_stack_pointer));
 
return off;
diff --git a/arch/arm64/kernel/alternative.c b/arch/arm64/kernel/alternative.c
index 6dd0a3a3e5c9..414288a558c8 100644
--- a/arch/arm64/kernel/alternative.c
+++ b/arch/arm64/kernel/alternative.c
@@ -32,6 +32,8 @@
 #define ALT_ORIG_PTR(a)__ALT_PTR(a, orig_offset)
 #define ALT_REPL_PTR(a)__ALT_PTR(a, alt_offset)
 
+int alternatives_applied;
+
 struct alt_region {
struct alt_instr *begin;
struct alt_instr *end;
@@ -143,7 +145,6 @@ static void __apply_alternatives(void *alt_region, bool 
use_linear_alias)
  */
 static int __apply_alternatives_multi_stop(void *unused)
 {
-   static int patched = 0;
struct alt_region region = {
.begin  = (struct alt_instr *)__alt_instructions,
.end= (struct alt_instr *)__alt_instructions_end,
@@ -151,14 +152,14 @@ static int __apply_alternatives_multi_stop(void *unused)
 
/* We always have a CPU 0 at this point (__init) */
if (smp_processor_id()) {
-   while (!READ_ONCE(patched))
+   while (!READ_ONCE(alternatives_applied))
cpu_relax();
isb();
} else {
-   BUG_ON(patched);
+   BUG_ON(alternatives_applied);
__apply_alternatives(&region, true);
/* Barriers provided by the cache flushing */
-   WRITE_ONCE(patched, 1);
+   WRITE_ONCE(alternatives_applied, 

[PATCH v4 01/13] KVM: arm64: Store vcpu on the stack during __guest_enter()

2017-10-17 Thread James Morse
KVM uses tpidr_el2 as its private vcpu register, which makes sense for
non-vhe world switch as only KVM can access this register. This means
vhe Linux has to use tpidr_el1, which KVM has to save/restore as part
of the host context.

If the SDEI handler code runs behind KVMs back, it mustn't access any
per-cpu variables. To allow this on systems with vhe we need to make
the host use tpidr_el2, saving KVM from save/restoring it.

__guest_enter() stores the host_ctxt on the stack, do the same with
the vcpu.

Signed-off-by: James Morse 
Reviewed-by: Christoffer Dall 

---
Changes since v2:
 * Added middle paragraph of commit message.

 arch/arm64/kvm/hyp/entry.S | 10 +++---
 arch/arm64/kvm/hyp/hyp-entry.S |  6 +++---
 2 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/kvm/hyp/entry.S b/arch/arm64/kvm/hyp/entry.S
index 12ee62d6d410..9a8ab59e 100644
--- a/arch/arm64/kvm/hyp/entry.S
+++ b/arch/arm64/kvm/hyp/entry.S
@@ -62,8 +62,8 @@ ENTRY(__guest_enter)
// Store the host regs
save_callee_saved_regs x1
 
-   // Store the host_ctxt for use at exit time
-   str x1, [sp, #-16]!
+   // Store host_ctxt and vcpu for use at exit time
+   stp x1, x0, [sp, #-16]!
 
add x18, x0, #VCPU_CONTEXT
 
@@ -159,6 +159,10 @@ abort_guest_exit_end:
 ENDPROC(__guest_exit)
 
 ENTRY(__fpsimd_guest_restore)
+   // x0: esr
+   // x1: vcpu
+   // x2-x29,lr: vcpu regs
+   // vcpu x0-x1 on the stack
stp x2, x3, [sp, #-16]!
stp x4, lr, [sp, #-16]!
 
@@ -173,7 +177,7 @@ alternative_else
 alternative_endif
isb
 
-   mrs x3, tpidr_el2
+   mov x3, x1
 
ldr x0, [x3, #VCPU_HOST_CONTEXT]
kern_hyp_va x0
diff --git a/arch/arm64/kvm/hyp/hyp-entry.S b/arch/arm64/kvm/hyp/hyp-entry.S
index 5170ce1021da..fce7cc507e0a 100644
--- a/arch/arm64/kvm/hyp/hyp-entry.S
+++ b/arch/arm64/kvm/hyp/hyp-entry.S
@@ -104,6 +104,7 @@ el1_trap:
/*
 * x0: ESR_EC
 */
+   ldr x1, [sp, #16 + 8]   // vcpu stored by __guest_enter
 
/*
 * We trap the first access to the FP/SIMD to save the host context
@@ -116,19 +117,18 @@ alternative_if_not ARM64_HAS_NO_FPSIMD
b.eq__fpsimd_guest_restore
 alternative_else_nop_endif
 
-   mrs x1, tpidr_el2
mov x0, #ARM_EXCEPTION_TRAP
b   __guest_exit
 
 el1_irq:
stp x0, x1, [sp, #-16]!
-   mrs x1, tpidr_el2
+   ldr x1, [sp, #16 + 8]
mov x0, #ARM_EXCEPTION_IRQ
b   __guest_exit
 
 el1_error:
stp x0, x1, [sp, #-16]!
-   mrs x1, tpidr_el2
+   ldr x1, [sp, #16 + 8]
mov x0, #ARM_EXCEPTION_EL1_SERROR
b   __guest_exit
 
-- 
2.13.3

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v3 04/13] arm64: alternatives: use tpidr_el2 on VHE hosts

2017-10-17 Thread James Morse
Hi Catalin,

On 16/10/17 10:58, Catalin Marinas wrote:
> On Fri, Oct 13, 2017 at 05:50:45PM +0100, James Morse wrote:
>> On 13/10/17 16:31, Catalin Marinas wrote:
>>> On Fri, Sep 22, 2017 at 07:26:05PM +0100, James Morse wrote:
 diff --git a/arch/arm64/kernel/cpufeature.c 
 b/arch/arm64/kernel/cpufeature.c
 index cd52d365d1f0..8e4c7da2b126 100644
 --- a/arch/arm64/kernel/cpufeature.c
 +++ b/arch/arm64/kernel/cpufeature.c
>>
 @@ -1308,3 +1309,25 @@ static int __init enable_mrs_emulation(void)
  }
  
  late_initcall(enable_mrs_emulation);
 +
 +int cpu_copy_el2regs(void *__unused)
 +{
 +  int do_copyregs = 0;
 +
 +  /*
 +   * Copy register values that aren't redirected by hardware.
 +   *
 +   * Before code patching, we only set tpidr_el1, all CPUs need to copy
 +   * this value to tpidr_el2 before we patch the code. Once we've done
 +   * that, freshly-onlined CPUs will set tpidr_el2, so we don't need to
 +   * do anything here.
 +   */
 +  asm volatile(ALTERNATIVE("mov %0, #1", "mov %0, #0",
 +   ARM64_HAS_VIRT_HOST_EXTN)
 +  : "=r" (do_copyregs) : : );
>>>
>>> Can you just do:
>>>
>>> if (cpu_have_const_cap(ARM64_HAS_VIRT_HOST_EXTN))
>>> write_sysreg(read_sysreg(tpidr_el1), tpidr_el2);
>>>
>>> At this point the capability bits should be set and the jump labels
>>> enabled.
>>
>> These are enabled too early, we haven't done patching yet.
>>
>> We need to copy tpidr_el1 -> tpidr_el2 on all CPUs that are online before 
>> code
>> patching.
>>
>> After code patching new CPUs set tpidr_el2 when they come online, so we don't
>> need to do the copy... but this enable method is still called. There is 
>> nothing
>> for us to do, and tpidr_el1 now contains junk, so the copy

> Ah, I get it now (should've read the comment but I usually expect the
> code to be obvious; it wasn't, at least to me, in this case ;)).

> You could have added the sysreg copying directly in the asm volatile.

I was trying to stick to the sysreg C accessors, and thought there would be more
registers that needed copying. (I discovered that VHE doesn't remap all the _ELx
registers quite late.)


> Anyway, I think it's better if we keep it entirely in C with this hunk
> (untested):

[...]

Yes, that looks much better. I got tangled up in 'which alternative', but you're
right, they are all applied in one go so it doesn't matter.


>   if (!READ_ONCE(alternatives_applied))
>   write_sysreg(read_sysreg(tpidr_el1), tpidr_el2);

I don't think this READ_ONCE() is needed, that only matters within the
stop_machine()/alternatives-patching code that modifies the value on one CPU.
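
(Putting the two points together, the enable method might then end up looking
something like this sketch:)

static int cpu_copy_el2regs(void *__unused)
{
	/*
	 * Copy register values that aren't redirected by hardware.
	 *
	 * Before code patching, we only set tpidr_el1; all CPUs need to copy
	 * this value to tpidr_el2 before we patch the code. Once we've done
	 * that, freshly-onlined CPUs will set tpidr_el2 themselves, so there
	 * is nothing to do here.
	 */
	if (!alternatives_applied)
		write_sysreg(read_sysreg(tpidr_el1), tpidr_el2);

	return 0;
}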


Thanks,

James

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v3 10/13] firmware: arm_sdei: Add support for CPU and system power states

2017-10-17 Thread James Morse
Hi Catalin,

On 16/10/17 14:52, Catalin Marinas wrote:
> On Fri, Sep 22, 2017 at 07:26:11PM +0100, James Morse wrote:
>> diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
>> index f24bfb2b9a2d..466b949474df 100644
>> --- a/include/linux/cpuhotplug.h
>> +++ b/include/linux/cpuhotplug.h
>> @@ -88,6 +88,7 @@ enum cpuhp_state {
>>  CPUHP_AP_PERF_XTENSA_STARTING,
>>  CPUHP_AP_PERF_METAG_STARTING,
>>  CPUHP_AP_MIPS_OP_LOONGSON3_STARTING,
>> +CPUHP_AP_SDEI_STARTING,
> 
> Nitpick: how generic is this as to apply to other architectures?
> Probably not, so shall we prefix this with ARM_?

This may get used on 32bit ARM. I blindly copied PSCI, which doesn't have a
prefix. But it makes sense to have one.
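
(Concretely, the prefixed state would just be used like this; a sketch, with
the callback names standing in for whatever the power management patch
registers:)

	err = cpuhp_setup_state(CPUHP_AP_ARM_SDEI_STARTING, "SDEI",
				sdei_cpuhp_up, sdei_cpuhp_down);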


> Not actually a strong preference but similar question for the
> definitions in the sdei.h files (and maybe the filenames themselves).


Thanks,

James
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v3 08/13] arm64: Add vmap_stack header file

2017-10-17 Thread James Morse
Hi Catalin,

On 13/10/17 16:42, Catalin Marinas wrote:
> On Fri, Sep 22, 2017 at 07:26:09PM +0100, James Morse wrote:
>> diff --git a/arch/arm64/include/asm/vmap_stack.h 
>> b/arch/arm64/include/asm/vmap_stack.h
>> new file mode 100644
>> index ..f41d043cac31
>> --- /dev/null
>> +++ b/arch/arm64/include/asm/vmap_stack.h
>> @@ -0,0 +1,41 @@

>> +#ifndef __ASM_VMAP_STACK_H
>> +#define __ASM_VMAP_STACK_H
>> +
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +
>> +#ifdef CONFIG_VMAP_STACK
>> +/*
>> + * To ensure that VMAP'd stack overflow detection works correctly, all 
>> VMAP'd
>> + * stacks need to have the same alignment.
>> + */
>> +static inline unsigned long *arch_alloc_vmap_stack(size_t stack_size, int 
>> node)
>> +{
>> +return __vmalloc_node_range(stack_size, THREAD_ALIGN,
>> +VMALLOC_START, VMALLOC_END,
>> +THREADINFO_GFP, PAGE_KERNEL, 0, node,
>> +__builtin_return_address(0));
>> +}
>> +
>> +#else
>> +unsigned long *arch_alloc_vmap_stack(size_t stack_size, int node); // link 
>> error

> Do you actually need this here? The calling site (at least in this
> patch), is surrounded by CONFIG_VMAP_STACK. Removing the dummy
> declaration should be fine (I haven't read the rest of the patches yet,
> maybe it's needed later;

This was to avoid having to #ifdef the calling site I add in a later patch.


> would a BUILD_BUG do?).

Ah, you can put those in header files, yes that would be clearer.


Thanks,

James





___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v3 09/13] arm64: kernel: Add arch-specific SDEI entry code and CPU masking

2017-10-17 Thread James Morse
Hi Catalin,

On 16/10/17 14:41, Catalin Marinas wrote:
> On Fri, Sep 22, 2017 at 07:26:10PM +0100, James Morse wrote:
>> diff --git a/arch/arm64/kernel/sdei-entry.S b/arch/arm64/kernel/sdei-entry.S
>> new file mode 100644
>> index ..cf12f8da0789
>> --- /dev/null
>> +++ b/arch/arm64/kernel/sdei-entry.S
>> @@ -0,0 +1,122 @@
>> +/*
>> + * Software Delegated Exception entry point from firmware to the kernel.
>> + *
>> + * Copyright (C) 2017 ARM Ltd.
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License version 2 as
>> + * published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program.  If not, see .
>> + */
>> +
>> +#include 
>> +
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +
>> +/*
>> + * Software Delegated Exception entry point.
>> + *
>> + * x0: Event number
> 
> Currently the only event number is 0. Shall we plan for having other
> events in the future or we just ignore this argument?

'0' is the only event number specified by the SDEI specification. For use with
RAS the event number is read from the HEST (and there may be more than one of
them).



>> + * x1: struct sdei_registered_event argument from registration time.
>> + * x2: interrupted PC
>> + * x3: interrupted PSTATE
> [...]
>> +/*
>> + * __sdei_handler() returns one of:
>> + *  SDEI_EV_HANDLED -  success, return to the interrupted context.
>> + *  SDEI_EV_FAILED  -  failure, return this error code to firmare.
>> + *  virtual-address -  success, return to this address.
>> + */
>> +static __kprobes unsigned long _sdei_handler(struct pt_regs *regs,
>> + struct sdei_registered_event *arg)
>> +{
>> +u32 mode;
>> +int i, err = 0;
>> +int clobbered_registers = 4;

> Maybe const int here if you want to avoid magic numbers in the 'for'
> loop below. Otherwise it looks like some variable you intend to modify
> in this function.

Sure.

(this was modified in my various attempts to get this working on non-vhe systems
with KVM).


>> +u64 elr = read_sysreg(elr_el1);
>> +u32 kernel_mode = read_sysreg(CurrentEL) | 1;   /* +SPSel */
>> +unsigned long vbar = read_sysreg(vbar_el1);
>> +
>> +/* Retrieve the missing registers values */
>> +for (i = 0; i < clobbered_registers; i++) {
>> +/* from within the handler, this call always succeeds */
>> +sdei_api_event_context(i, &regs->regs[i]);
>> +}
>> +
>> +/*
>> + * We didn't take an exception to get here, set PAN. UAO will be cleared
>> + * by sdei_event_handler()s set_fs(USER_DS) call.
>> + */
>> +asm(ALTERNATIVE("nop", SET_PSTATE_PAN(1), ARM64_HAS_PAN,
>> +CONFIG_ARM64_PAN));

> Can you use uaccess_disable() directly?

Wouldn't this invoke sw-pan's __uaccess_ttbr0_disable():
> write_sysreg(ttbr, ttbr0_el1);

Probing depends on VHE if booted at EL2; can we guarantee to have PAN (and
therefore not-SW-PAN) too?

(otherwise I can add 'depends on !ARM64_SW_TTBR0_PAN' to the Kconfig)


>> diff --git a/drivers/firmware/Kconfig b/drivers/firmware/Kconfig
>> index d8a9859f6102..1f6633b337aa 100644
>> --- a/drivers/firmware/Kconfig
>> +++ b/drivers/firmware/Kconfig
>> @@ -50,6 +50,7 @@ config ARM_SCPI_POWER_DOMAIN
>>  
>>  config ARM_SDE_INTERFACE
>>  bool "ARM Software Delegated Exception Interface (SDEI)"
>> +depends on ARM64
>>  help
>>The Software Delegated Exception Interface (SDEI) is an ARM
>>standard for registering callbacks from the platform firmware
>> diff --git a/drivers/firmware/arm_sdei.c b/drivers/firmware/arm_sdei.c
>> index 4b3c7fd99c34..3ea6a19ae439 100644
>> --- a/drivers/firmware/arm_sdei.c
>> +++ b/drivers/firmware/arm_sdei.c
>> @@ -154,6 +154,7 @@ int sdei_api_event_context(u32 query, u64 *result)
>>  return invoke_sdei_fn(SDEI_1_0_FN_SDEI_EVENT_CONTEXT, query, 0, 0, 0, 0,
>>result);
>>  }
>> +NOKPROBE_SYMBOL(sdei_api_event_context);

> Should this be part of the patch introducing arm_sdei.c?

Yes.


>>  static int sdei_api_event_get_info(u32 event, u32 info, u64 *result)
>>  {
>> diff --git a/include/linux/sdei.h b/include/linux/sdei.h
>> index bb3dd000771e..df3fe6373e32 100644
>> --- a/include/linux/sdei.h
>> +++ b/include/linux/sdei.h
>> @@ -32,6 +32,8 @@ enum sdei_conduit_types {
>>  CONDUIT_HVC,
>>  };
>>  
>> +#include 
>> +
>>  /* Arch code should override this to set the entry point from firmware... */
>>  #ifndef sdei_arch_get_entry_point
>>  #define 

Re: [PATCH v3 24/28] arm64/sve: KVM: Hide SVE from CPU features exposed to guests

2017-10-17 Thread Dave Martin
On Tue, Oct 17, 2017 at 03:29:36PM +0100, Marc Zyngier wrote:
> On 17/10/17 15:07, Dave Martin wrote:
> > On Tue, Oct 17, 2017 at 06:58:16AM -0700, Christoffer Dall wrote:
> >> On Tue, Oct 10, 2017 at 07:38:41PM +0100, Dave Martin wrote:
> >>> KVM guests cannot currently use SVE, because SVE is always
> >>> configured to trap to EL2.
> >>>
> >>> However, a guest that sees SVE reported as present in
> >>> ID_AA64PFR0_EL1 may legitimately expect that SVE works and try to
> >>> use it.  Instead of working, the guest will receive an injected
> >>> undef exception, which may cause the guest to oops or go into a
> >>> spin.
> >>>
> >>> To avoid misleading the guest into believing that SVE will work,
> >>> this patch masks out the SVE field from ID_AA64PFR0_EL1 when a
> >>> guest attempts to read this register.  No support is explicitly
> >>> added for ID_AA64ZFR0_EL1 either, so that is still emulated as
> >>> reading as zero, which is consistent with SVE not being
> >>> implemented.
> >>>
> >>> This is a temporary measure, and will be removed in a later series
> >>> when full KVM support for SVE is implemented.
> >>>
> >>> Signed-off-by: Dave Martin 
> >>> Reviewed-by: Alex Bennée 
> >>> Cc: Marc Zyngier 
> >>> ---
> >>>  arch/arm64/kvm/sys_regs.c | 12 +++-
> >>>  1 file changed, 11 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> >>> index b1f7552..a0ee9b0 100644
> >>> --- a/arch/arm64/kvm/sys_regs.c
> >>> +++ b/arch/arm64/kvm/sys_regs.c
> >>> @@ -23,6 +23,7 @@
> >>>  #include 
> >>>  #include 
> >>>  #include 
> >>> +#include 
> >>>  #include 
> >>>  
> >>>  #include 
> >>> @@ -897,8 +898,17 @@ static u64 read_id_reg(struct sys_reg_desc const *r, 
> >>> bool raz)
> >>>  {
> >>>   u32 id = sys_reg((u32)r->Op0, (u32)r->Op1,
> >>>(u32)r->CRn, (u32)r->CRm, (u32)r->Op2);
> >>> + u64 val = raz ? 0 : read_sanitised_ftr_reg(id);
> >>>  
> >>> - return raz ? 0 : read_sanitised_ftr_reg(id);
> >>> + if (id == SYS_ID_AA64PFR0_EL1) {
> >>> + if (val & (0xfUL << ID_AA64PFR0_SVE_SHIFT))
> >>> + pr_err_once("kvm [%i]: SVE unsupported for guests, 
> >>> suppressing\n",
> >>> + task_pid_nr(current));
> >>
> >> nit: does this really qualify as an error print?
> > 
> > I have no strong opinion on this: maz suggested I should add this --
> > his concern was to make it difficult to ignore.
> > 
> > This is transitional: the main purpose is to circumvent bug reports from
> > people who find that SVE doesn't work in their guests, in the interim
> > before proper KVM support lands upstream.
> > 
> > Marc, do you still agree with this position?
> 
> As long as this is transitional, I'm OK with this.

No argument from me, since it was your request in the first place ;)

Christoffer?

Cheers
---Dave
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
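
(For reference, the masking described above amounts to something like the
following in read_id_reg(); this sketch reconstructs the field-clearing that is
cut off in the quoted diff:)

	if (id == SYS_ID_AA64PFR0_EL1) {
		if (val & (0xfUL << ID_AA64PFR0_SVE_SHIFT))
			pr_err_once("kvm [%i]: SVE unsupported for guests, suppressing\n",
				    task_pid_nr(current));

		val &= ~(0xfUL << ID_AA64PFR0_SVE_SHIFT);
	}

	return val;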


Re: [PATCH v3 02/20] arm64: Use physical counter for in-kernel reads

2017-10-17 Thread Will Deacon
Hi Christoffer,

On Sat, Sep 23, 2017 at 02:41:49AM +0200, Christoffer Dall wrote:
> Using the physical counter allows KVM to retain the offset between the
> virtual and physical counter as long as it is actively running a VCPU.
> 
> As soon as a VCPU is released, another thread is scheduled or we start
> running userspace applications, we reset the offset to 0, so that
> userspace accessing the virtual timer can still read the virtual counter
> and get the same view of time as the kernel.
> 
> This opens up potential improvements for KVM performance.
> 
> VHE kernels or kernels continuing to use the virtual timer are
> unaffected.
> 
> Signed-off-by: Christoffer Dall 
> ---
>  arch/arm64/include/asm/arch_timer.h  | 9 -
>  drivers/clocksource/arm_arch_timer.c | 3 +--
>  2 files changed, 5 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/arch_timer.h 
> b/arch/arm64/include/asm/arch_timer.h
> index a652ce0..1859a1c 100644
> --- a/arch/arm64/include/asm/arch_timer.h
> +++ b/arch/arm64/include/asm/arch_timer.h
> @@ -148,11 +148,10 @@ static inline void arch_timer_set_cntkctl(u32 cntkctl)
>  
>  static inline u64 arch_counter_get_cntpct(void)
>  {
> - /*
> -  * AArch64 kernel and user space mandate the use of CNTVCT.
> -  */
> - BUG();
> - return 0;
> + u64 cval;
> + isb();
> + asm volatile("mrs %0, cntpct_el0" : "=r" (cval));
> + return cval;
>  }
>  
>  static inline u64 arch_counter_get_cntvct(void)
> diff --git a/drivers/clocksource/arm_arch_timer.c 
> b/drivers/clocksource/arm_arch_timer.c
> index fd4b7f6..9b3322a 100644
> --- a/drivers/clocksource/arm_arch_timer.c
> +++ b/drivers/clocksource/arm_arch_timer.c
> @@ -890,8 +890,7 @@ static void __init arch_counter_register(unsigned type)
>  
>   /* Register the CP15 based counter if we have one */
>   if (type & ARCH_TIMER_TYPE_CP15) {
> - if (IS_ENABLED(CONFIG_ARM64) ||
> - arch_timer_uses_ppi == ARCH_TIMER_VIRT_PPI)
> + if (arch_timer_uses_ppi == ARCH_TIMER_VIRT_PPI)

Please can you add an is_hyp_mode_available() check here, as you suggested
last time?

http://lists.infradead.org/pipermail/linux-arm-kernel/2017-July/521542.html

Without it, I worry that the kernel timekeeper will be out of sync with the
vDSO (which uses the virtual counter) on systems where CNTVOFF is
initialised to a consistent non-zero offset and Linux was loaded at EL1.
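
(That is, something along these lines in arch_counter_register(), so the
physical counter is only selected when the kernel actually owns EL2; a sketch
of the suggested change, not a tested diff:)

	/* Register the CP15 based counter if we have one */
	if (type & ARCH_TIMER_TYPE_CP15) {
		if ((IS_ENABLED(CONFIG_ARM64) && !is_hyp_mode_available()) ||
		    arch_timer_uses_ppi == ARCH_TIMER_VIRT_PPI)
			arch_timer_read_counter = arch_counter_get_cntvct;
		else
			arch_timer_read_counter = arch_counter_get_cntpct;
		/* ... */
	}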

Thanks,

Will
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v6 5/7] arm64: kvm: route synchronous external abort exceptions to el2

2017-10-17 Thread gengdongjiu
Hi James,

> Hi gengdongjiu,
> 
> On 14/09/17 12:12, gengdongjiu wrote:
> > On 2017/9/8 0:31, James Morse wrote:
> >> KVM already handles external aborts from lower exception levels, no
> >> more work needs doing for TEA.
> > If it is a firmware-first solution, that is SCR_EL3.EA=1, all SError
> > interrupts and synchronous External Abort exceptions are taken to EL3, so
> > the EL3 firmware will handle them; KVM does not need to handle them.
> >
> > SCR_EL3.TEA is only for EL3 to check its value to decide whether to jump to
> > the hypervisor or the kernel.
> >
> >>
> >> What happens when a guest access the RAS-Error-Record registers?
> >>
> >> Before we can set HCR_EL2.TERR I think we need to add some minimal
> >> emulation for the registers it traps. Most of them should be RAZ/WI,
> >> so it should be straightforward. (I think KVMs default is to emulate an 
> >> undef for unknown traps).
> 
> > Today I added the support to do some minimal emulation for
> > RAS-Error-Record registers, thanks for the good suggestion.
> 
> Where can I find this patch?
> I'd like to repost it as part of the SError_rework/RAS/IESB series: this is 
> one of the bits KVM needs but I didn't touch as it looks like your
> updated version of this patch should cover it.

I have updated my series of patches to a new version and have sent it out; you
can find it there. Thank you very much.

> 
> 
> Thanks,
> 
> James
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 07/10] KVM: arm/arm64: Preserve Exec permission across R/W permission faults

2017-10-17 Thread Marc Zyngier
On 17/10/17 15:46, Christoffer Dall wrote:
> On Tue, Oct 17, 2017 at 12:22:08PM +0100, Marc Zyngier wrote:
>> On 16/10/17 21:08, Christoffer Dall wrote:
>>> On Mon, Oct 09, 2017 at 04:20:29PM +0100, Marc Zyngier wrote:
 So far, we lose the Exec property whenever we take permission
 faults, as we always reconstruct the PTE/PMD from scratch. This
 can be counter productive as we can end-up with the following
 fault sequence:

X -> RO -> ROX -> RW -> RWX

 Instead, we can lookup the existing PTE/PMD and clear the XN bit in the
 new entry if it was already cleared in the old one, leading to a much
 nicer fault sequence:

X -> ROX -> RWX

 Signed-off-by: Marc Zyngier 
 ---
  arch/arm/include/asm/kvm_mmu.h   | 10 ++
  arch/arm64/include/asm/kvm_mmu.h | 10 ++
  virt/kvm/arm/mmu.c   | 25 +
  3 files changed, 45 insertions(+)

 diff --git a/arch/arm/include/asm/kvm_mmu.h 
 b/arch/arm/include/asm/kvm_mmu.h
 index bf76150aad5f..ad442d86c23e 100644
 --- a/arch/arm/include/asm/kvm_mmu.h
 +++ b/arch/arm/include/asm/kvm_mmu.h
 @@ -107,6 +107,11 @@ static inline bool kvm_s2pte_readonly(pte_t *pte)
return (pte_val(*pte) & L_PTE_S2_RDWR) == L_PTE_S2_RDONLY;
  }
  
 +static inline bool kvm_s2pte_exec(pte_t *pte)
 +{
 +  return !(pte_val(*pte) & L_PTE_XN);
 +}
 +
  static inline void kvm_set_s2pmd_readonly(pmd_t *pmd)
  {
pmd_val(*pmd) = (pmd_val(*pmd) & ~L_PMD_S2_RDWR) | L_PMD_S2_RDONLY;
 @@ -117,6 +122,11 @@ static inline bool kvm_s2pmd_readonly(pmd_t *pmd)
return (pmd_val(*pmd) & L_PMD_S2_RDWR) == L_PMD_S2_RDONLY;
  }
  
 +static inline bool kvm_s2pmd_exec(pmd_t *pmd)
 +{
 +  return !(pmd_val(*pmd) & PMD_SECT_XN);
 +}
 +
  static inline bool kvm_page_empty(void *ptr)
  {
struct page *ptr_page = virt_to_page(ptr);
 diff --git a/arch/arm64/include/asm/kvm_mmu.h 
 b/arch/arm64/include/asm/kvm_mmu.h
 index 60c420a5ac0d..e7af74b8b51a 100644
 --- a/arch/arm64/include/asm/kvm_mmu.h
 +++ b/arch/arm64/include/asm/kvm_mmu.h
 @@ -203,6 +203,11 @@ static inline bool kvm_s2pte_readonly(pte_t *pte)
return (pte_val(*pte) & PTE_S2_RDWR) == PTE_S2_RDONLY;
  }
  
 +static inline bool kvm_s2pte_exec(pte_t *pte)
 +{
 +  return !(pte_val(*pte) & PTE_S2_XN);
 +}
 +
  static inline void kvm_set_s2pmd_readonly(pmd_t *pmd)
  {
kvm_set_s2pte_readonly((pte_t *)pmd);
 @@ -213,6 +218,11 @@ static inline bool kvm_s2pmd_readonly(pmd_t *pmd)
return kvm_s2pte_readonly((pte_t *)pmd);
  }
  
 +static inline bool kvm_s2pmd_exec(pmd_t *pmd)
 +{
 +  return !(pmd_val(*pmd) & PMD_S2_XN);
 +}
 +
  static inline bool kvm_page_empty(void *ptr)
  {
struct page *ptr_page = virt_to_page(ptr);
 diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
 index 1911fadde88b..ccc6106764a6 100644
 --- a/virt/kvm/arm/mmu.c
 +++ b/virt/kvm/arm/mmu.c
 @@ -926,6 +926,17 @@ static int stage2_set_pmd_huge(struct kvm *kvm, 
 struct kvm_mmu_memory_cache
return 0;
  }
  
 +static pte_t *stage2_get_pte(struct kvm *kvm, phys_addr_t addr)
 +{
 +  pmd_t *pmdp;
 +
 +  pmdp = stage2_get_pmd(kvm, NULL, addr);
 +  if (!pmdp || pmd_none(*pmdp))
 +  return NULL;
 +
 +  return pte_offset_kernel(pmdp, addr);
 +}
 +
>>>
>>> nit, couldn't you change this to be
>>>
>>> stage2_is_exec(struct kvm *kvm, phys_addr_t addr)
>>>
>>> Which, if the pmd is a section mapping just checks that, and if we find
>>> a pte, we check that, and then we can have a simpler one-line call and
>>> check from both the pte and pmd paths below?
>>
>> Yes, that's pretty neat. I've folded that in.
>>
>>>
  static int stage2_set_pte(struct kvm *kvm, struct kvm_mmu_memory_cache 
 *cache,
  phys_addr_t addr, const pte_t *new_pte,
  unsigned long flags)
 @@ -1407,6 +1418,13 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, 
 phys_addr_t fault_ipa,
if (exec_fault) {
new_pmd = kvm_s2pmd_mkexec(new_pmd);
coherent_icache_guest_page(vcpu, pfn, PMD_SIZE);
 +  } else if (fault_status == FSC_PERM) {
 +  /* Preserve execute if XN was already cleared */
 +  pmd_t *old_pmdp = stage2_get_pmd(kvm, NULL, fault_ipa);
 +
 +  if (old_pmdp && pmd_present(*old_pmdp) &&
 +  kvm_s2pmd_exec(old_pmdp))
 +  new_pmd = kvm_s2pmd_mkexec(new_pmd);
>>>
>>> Is the reverse case not also possible then?  That is, if we have an
>>> exec_fault, we could check if the entry is already writable and 

Re: [PATCH 06/10] KVM: arm/arm64: Only clean the dcache on translation fault

2017-10-17 Thread Marc Zyngier
On 17/10/17 15:36, Christoffer Dall wrote:
> On Tue, Oct 17, 2017 at 10:34:15AM +0100, Marc Zyngier wrote:
>> On 16/10/17 21:08, Christoffer Dall wrote:
>>> On Mon, Oct 09, 2017 at 04:20:28PM +0100, Marc Zyngier wrote:
 The only case where we actually need to perform dcache maintenance
 is when we map the page for the first time, and subsequent permission
 faults do not require cache maintenance. Let's make it conditional
 on not being a permission fault (and thus a translation fault).
>>>
>>> Why do we actually need to do any dcache maintenance when faulting in a
>>> page?
>>>
>>> Is this for the case when the stage 1 MMU is disabled, or to support
>>> guest mappings using uncached attributes?
>>
>> These are indeed the two cases that require cleaning the dcache to PoC.
>>
>>> Can we do better, for example
>>> by only flushing the cache if the guest MMU is disabled?
>>
>> The guest MMU being disabled is easy. But the uncached mapping is much
>> trickier, and would involve parsing the guest page tables. Not something
>> I'm really eager to implement.
>>
> 
> Hmm, if the guest actually maps memory uncached, wouldn't it have to
> invalidate caches itself, or is this the annoying thing where disabling
> the MMU on hardware that doesn't have stage 2 would in fact always
> completely bypass the cache, and therefore we have to do this work?

The architecture is massively ambiguous about what is actually required
in terms of CMOs when using an uncached mapping (and whether you can hit
in the cache or not).

But even if the guest had done an invalidate in order not to hit in the
cache, the host could have evicted the page to disk, the guest faulted
the page back in, and it would still be the host's responsibility to
ensure that the guest sees something consistent.

> Sorry, I have forgotten all the details here, but wanted to make sure
> we're not being overly careful.

No worries, I'm always happy to revisit this particular rabbit hole... ;-)

Thanks,

M.
-- 
Jazz is not dead. It just smells funny...
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 10/10] arm: KVM: Use common implementation for all flushes to PoC

2017-10-17 Thread Christoffer Dall
On Tue, Oct 17, 2017 at 01:40:00PM +0100, Marc Zyngier wrote:
> On 16/10/17 21:06, Christoffer Dall wrote:
> > On Mon, Oct 09, 2017 at 04:20:32PM +0100, Marc Zyngier wrote:
> >> We currently have no less than three implementations for the
> >> "flush to PoC" code. Let standardize on a single one. This
> >> requires a bit of unpleasant moving around, and relies on
> >> __kvm_flush_dcache_pte and co being #defines so that they can
> >> call into coherent_dcache_guest_page...
> >>
> >> Signed-off-by: Marc Zyngier 
> >> ---
> >>  arch/arm/include/asm/kvm_mmu.h | 28 
> >>  virt/kvm/arm/mmu.c | 20 ++--
> >>  2 files changed, 14 insertions(+), 34 deletions(-)
> >>
> >> diff --git a/arch/arm/include/asm/kvm_mmu.h 
> >> b/arch/arm/include/asm/kvm_mmu.h
> >> index 5f1ac88a5951..011b0db85c02 100644
> >> --- a/arch/arm/include/asm/kvm_mmu.h
> >> +++ b/arch/arm/include/asm/kvm_mmu.h
> >> @@ -235,31 +235,11 @@ static inline void 
> >> __coherent_icache_guest_page(kvm_pfn_t pfn,
> >>}
> >>  }
> >>  
> >> -static inline void __kvm_flush_dcache_pte(pte_t pte)
> >> -{
> >> -  void *va = kmap_atomic(pte_page(pte));
> >> -
> >> -  kvm_flush_dcache_to_poc(va, PAGE_SIZE);
> >> -
> >> -  kunmap_atomic(va);
> >> -}
> >> -
> >> -static inline void __kvm_flush_dcache_pmd(pmd_t pmd)
> >> -{
> >> -  unsigned long size = PMD_SIZE;
> >> -  kvm_pfn_t pfn = pmd_pfn(pmd);
> >> -
> >> -  while (size) {
> >> -  void *va = kmap_atomic_pfn(pfn);
> >> +#define __kvm_flush_dcache_pte(p) \
> >> +  coherent_dcache_guest_page(pte_pfn((p)), PAGE_SIZE)
> >>  
> >> -  kvm_flush_dcache_to_poc(va, PAGE_SIZE);
> >> -
> >> -  pfn++;
> >> -  size -= PAGE_SIZE;
> >> -
> >> -  kunmap_atomic(va);
> >> -  }
> >> -}
> >> +#define __kvm_flush_dcache_pmd(p) \
> >> +  coherent_dcache_guest_page(pmd_pfn((p)), PMD_SIZE)
> > 
> > Why can't these just be static inlines which call
> > __coherent_dcache_guest_page already in the header file directly?
> 
> Because if we do that, we get a significant code expansion in the
> resulting binary (all the call sites end up having a copy of that function.
> 
> > I'm really not too crazy about these #defines.
> 
> Neither am I. But actually, this patch is completely wrong. Using the
> same functions as the guest cleaning doesn't provide the guarantees
> documented next to unmap_stage2_ptes, as we need a clean+invalidate, not
> just a clean.
> 
> I'll rework this patch (or just drop it).
> 
> > In fact, why do we need the coherent_Xcache_guest_page static
> > indirection functions in mmu.c in the first place?
> 
> Code expansion. That's the only reason.
> 
Then maybe a reworked patch needs a function defined in some
arch-specific object file that we can just call.  The functions don't
look that complicated to me, but I suppose if they inline the things
they call, it could become a bit hairy.
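
(That is, roughly the following, moving the 32-bit helpers out of line into an
arch C file so the header only carries declarations; a sketch, with the file
placement left open:)

/* e.g. somewhere under arch/arm/kvm/ (location illustrative) */
#include <linux/highmem.h>
#include <asm/kvm_mmu.h>

void __kvm_flush_dcache_pte(pte_t pte)
{
	void *va = kmap_atomic(pte_page(pte));

	kvm_flush_dcache_to_poc(va, PAGE_SIZE);

	kunmap_atomic(va);
}

void __kvm_flush_dcache_pmd(pmd_t pmd)
{
	unsigned long size = PMD_SIZE;
	kvm_pfn_t pfn = pmd_pfn(pmd);

	while (size) {
		void *va = kmap_atomic_pfn(pfn);

		kvm_flush_dcache_to_poc(va, PAGE_SIZE);

		pfn++;
		size -= PAGE_SIZE;

		kunmap_atomic(va);
	}
}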

Thanks,
-Christoffer
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 07/10] KVM: arm/arm64: Preserve Exec permission across R/W permission faults

2017-10-17 Thread Christoffer Dall
On Tue, Oct 17, 2017 at 12:22:08PM +0100, Marc Zyngier wrote:
> On 16/10/17 21:08, Christoffer Dall wrote:
> > On Mon, Oct 09, 2017 at 04:20:29PM +0100, Marc Zyngier wrote:
> >> So far, we lose the Exec property whenever we take permission
> >> faults, as we always reconstruct the PTE/PMD from scratch. This
> >> can be counter productive as we can end-up with the following
> >> fault sequence:
> >>
> >>X -> RO -> ROX -> RW -> RWX
> >>
> >> Instead, we can lookup the existing PTE/PMD and clear the XN bit in the
> >> new entry if it was already cleared in the old one, leading to a much
> >> nicer fault sequence:
> >>
> >>X -> ROX -> RWX
> >>
> >> Signed-off-by: Marc Zyngier 
> >> ---
> >>  arch/arm/include/asm/kvm_mmu.h   | 10 ++
> >>  arch/arm64/include/asm/kvm_mmu.h | 10 ++
> >>  virt/kvm/arm/mmu.c   | 25 +
> >>  3 files changed, 45 insertions(+)
> >>
> >> diff --git a/arch/arm/include/asm/kvm_mmu.h 
> >> b/arch/arm/include/asm/kvm_mmu.h
> >> index bf76150aad5f..ad442d86c23e 100644
> >> --- a/arch/arm/include/asm/kvm_mmu.h
> >> +++ b/arch/arm/include/asm/kvm_mmu.h
> >> @@ -107,6 +107,11 @@ static inline bool kvm_s2pte_readonly(pte_t *pte)
> >>return (pte_val(*pte) & L_PTE_S2_RDWR) == L_PTE_S2_RDONLY;
> >>  }
> >>  
> >> +static inline bool kvm_s2pte_exec(pte_t *pte)
> >> +{
> >> +  return !(pte_val(*pte) & L_PTE_XN);
> >> +}
> >> +
> >>  static inline void kvm_set_s2pmd_readonly(pmd_t *pmd)
> >>  {
> >>pmd_val(*pmd) = (pmd_val(*pmd) & ~L_PMD_S2_RDWR) | L_PMD_S2_RDONLY;
> >> @@ -117,6 +122,11 @@ static inline bool kvm_s2pmd_readonly(pmd_t *pmd)
> >>return (pmd_val(*pmd) & L_PMD_S2_RDWR) == L_PMD_S2_RDONLY;
> >>  }
> >>  
> >> +static inline bool kvm_s2pmd_exec(pmd_t *pmd)
> >> +{
> >> +  return !(pmd_val(*pmd) & PMD_SECT_XN);
> >> +}
> >> +
> >>  static inline bool kvm_page_empty(void *ptr)
> >>  {
> >>struct page *ptr_page = virt_to_page(ptr);
> >> diff --git a/arch/arm64/include/asm/kvm_mmu.h 
> >> b/arch/arm64/include/asm/kvm_mmu.h
> >> index 60c420a5ac0d..e7af74b8b51a 100644
> >> --- a/arch/arm64/include/asm/kvm_mmu.h
> >> +++ b/arch/arm64/include/asm/kvm_mmu.h
> >> @@ -203,6 +203,11 @@ static inline bool kvm_s2pte_readonly(pte_t *pte)
> >>return (pte_val(*pte) & PTE_S2_RDWR) == PTE_S2_RDONLY;
> >>  }
> >>  
> >> +static inline bool kvm_s2pte_exec(pte_t *pte)
> >> +{
> >> +  return !(pte_val(*pte) & PTE_S2_XN);
> >> +}
> >> +
> >>  static inline void kvm_set_s2pmd_readonly(pmd_t *pmd)
> >>  {
> >>kvm_set_s2pte_readonly((pte_t *)pmd);
> >> @@ -213,6 +218,11 @@ static inline bool kvm_s2pmd_readonly(pmd_t *pmd)
> >>return kvm_s2pte_readonly((pte_t *)pmd);
> >>  }
> >>  
> >> +static inline bool kvm_s2pmd_exec(pmd_t *pmd)
> >> +{
> >> +  return !(pmd_val(*pmd) & PMD_S2_XN);
> >> +}
> >> +
> >>  static inline bool kvm_page_empty(void *ptr)
> >>  {
> >>struct page *ptr_page = virt_to_page(ptr);
> >> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
> >> index 1911fadde88b..ccc6106764a6 100644
> >> --- a/virt/kvm/arm/mmu.c
> >> +++ b/virt/kvm/arm/mmu.c
> >> @@ -926,6 +926,17 @@ static int stage2_set_pmd_huge(struct kvm *kvm, 
> >> struct kvm_mmu_memory_cache
> >>return 0;
> >>  }
> >>  
> >> +static pte_t *stage2_get_pte(struct kvm *kvm, phys_addr_t addr)
> >> +{
> >> +  pmd_t *pmdp;
> >> +
> >> +  pmdp = stage2_get_pmd(kvm, NULL, addr);
> >> +  if (!pmdp || pmd_none(*pmdp))
> >> +  return NULL;
> >> +
> >> +  return pte_offset_kernel(pmdp, addr);
> >> +}
> >> +
> > 
> > nit, couldn't you change this to be
> > 
> > stage2_is_exec(struct kvm *kvm, phys_addr_t addr)
> > 
> > Which, if the pmd is a section mapping just checks that, and if we find
> > a pte, we check that, and then we can have a simpler one-line call and
> > check from both the pte and pmd paths below?
> 
> Yes, that's pretty neat. I've folded that in.
> 
> > 
> >>  static int stage2_set_pte(struct kvm *kvm, struct kvm_mmu_memory_cache 
> >> *cache,
> >>  phys_addr_t addr, const pte_t *new_pte,
> >>  unsigned long flags)
> >> @@ -1407,6 +1418,13 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, 
> >> phys_addr_t fault_ipa,
> >>if (exec_fault) {
> >>new_pmd = kvm_s2pmd_mkexec(new_pmd);
> >>coherent_icache_guest_page(vcpu, pfn, PMD_SIZE);
> >> +  } else if (fault_status == FSC_PERM) {
> >> +  /* Preserve execute if XN was already cleared */
> >> +  pmd_t *old_pmdp = stage2_get_pmd(kvm, NULL, fault_ipa);
> >> +
> >> +  if (old_pmdp && pmd_present(*old_pmdp) &&
> >> +  kvm_s2pmd_exec(old_pmdp))
> >> +  new_pmd = kvm_s2pmd_mkexec(new_pmd);
> > 
> > Is the reverse case not also possible then?  That is, if we have an
> > exec_fault, we could check if the entry is already writable and maintain
> > the property as well.  Not sure how 

Re: [PATCH 01/10] KVM: arm/arm64: Split dcache/icache flushing

2017-10-17 Thread Marc Zyngier
On 17/10/17 15:28, Christoffer Dall wrote:
> On Tue, Oct 17, 2017 at 09:57:34AM +0100, Marc Zyngier wrote:
>> On 16/10/17 21:07, Christoffer Dall wrote:
>>> unrelated: I went and read the comment in __kvm_tlb_flush_vmid_ipa, and
>>> I don't really understand why there is only a need to flush the icache
>>> if the host is running at EL1.
>>>
>>> The text seems to describe the problem of remapping executable pages
>>> within the guest.  That seems to me would require icache maintenance of
>>> the page that gets overwritten with new code, regardless of whether the
>>> host runs at EL1 or EL2.
>>>
>>> Of course it's easier done on VHE because we don't have to take a trap,
>>> but the code seems to not invalidate the icache at all for VHE systems
>>> that have VPIPT.  I'm confused.  Can you help?
>>
>> [+ Will, as he wrote that code and can reply if I say something stupid]
>>
>> Here's the trick: The VMID-tagged aspect of VPIPT only applies if the
>> CMO is used at EL0 or EL1. When used at EL2, it behaves exactly like a
>> PIPT operation (see D4.10.2 in the ARMv8 ARM version B_b).
>>
>> So in the end, we deal with VPIPT the following way:
>>
>> - Without VHE, we perform the icache invalidation on unmap, blatting the
>> whole icache.
> 
> ok, but why can't we do the invalidation by jumping to EL2 like we do
> for some of the other CMOs ?

I don't think we have any other CMO requiring jumping to EL2. VPIPT
handling is the only one so far. I think that's why Will coupled it to
the TLB invalidation (we're already at EL2 for that).

Now, we could easily change that to follow the same flow as the rest of
the code. It only takes a separate entry point.

Thanks,

M.
-- 
Jazz is not dead. It just smells funny...


Re: [PATCH 06/10] KVM: arm/arm64: Only clean the dcache on translation fault

2017-10-17 Thread Christoffer Dall
On Tue, Oct 17, 2017 at 10:34:15AM +0100, Marc Zyngier wrote:
> On 16/10/17 21:08, Christoffer Dall wrote:
> > On Mon, Oct 09, 2017 at 04:20:28PM +0100, Marc Zyngier wrote:
>> The only case where we actually need to perform a dcache maintenance
> >> is when we map the page for the first time, and subsequent permission
> >> faults do not require cache maintenance. Let's make it conditional
> >> on not being a permission fault (and thus a translation fault).
> > 
> > Why do we actually need to do any dcache maintenance when faulting in a
> > page?
> > 
> > Is this for the case when the stage 1 MMU is disabled, or to support
> > guest mappings using uncached attributes?
> 
> These are indeed the two cases that require cleaning the dcache to PoC.
> 
> > Can we do better, for example
> > by only flushing the cache if the guest MMU is disabled?
> 
> The guest MMU being disabled is easy. But the uncached mapping is much
> trickier, and would involve parsing the guest page tables. Not something
> I'm really eager to implement.
> 

Hmm, if the guest actually maps memory uncached, wouldn't it have to
invalidate caches itself, or is this the annoying thing where disabling
the MMU on hardware that doesn't have stage 2 would in fact always
completely bypass the cache, and therefore we have to do this work?

Sorry, I have forgotten all the details here, but wanted to make sure
we're not being overly careful.

Thanks,
-Christoffer


Re: [PATCH v3 22/28] arm64/sve: KVM: Prevent guests from using SVE

2017-10-17 Thread Dave Martin
On Tue, Oct 17, 2017 at 01:50:24PM +0200, Christoffer Dall wrote:
> On Tue, Oct 10, 2017 at 07:38:39PM +0100, Dave Martin wrote:
> > Until KVM has full SVE support, guests must not be allowed to
> > execute SVE instructions.
> > 
> > This patch enables the necessary traps, and also ensures that the
> > traps are disabled again on exit from the guest so that the host
> > can still use SVE if it wants to.
> > 
> > This patch introduces another instance of
> > __this_cpu_write(fpsimd_last_state, NULL), so this flush operation
> > is abstracted out as a separate helper fpsimd_flush_cpu_state().
> > Other instances are ported appropriately.
> 
> I don't understand this paragraph, beginning from ", so this...".
> 
> 
> From reading the code, what I think is the reason for having to flush
> the SVE state (and mark the host state invalid) is that even though we
> disallow SVE usage in the guest, the guest can use the normal FP state,
> and while we always fully preserve the host state, this could still
> corrupt some additional SVE state not properly preserved for the host.
> Is that correct?

Yes, that's right: the guest can't touch the SVE-specific registers
Pn/FFR, but FPSIMD accesses to Vn regs cause the high bits of the
corresponding SVE Zn registers to be clobbered.  In any case, the
FPSIMD restore done by KVM after guest exit is sufficient to clobber
those bits even if the guest didn't do it.

This is a band-aid for not making the KVM world switch code properly
SVE-aware yet.

Does the following wording sound better:

--8<--

On guest exit, high bits of the SVE Zn registers may have been
clobbered as a side-effect of the execution of FPSIMD instructions in
the guest.  The existing KVM host FPSIMD restore code is not
sufficient to restore these bits, so this patch explicitly marks
the CPU as not containing cached vector state for any task, thus
forcing a reload on the next return to userspace.  This is an
interim measure, in advance of adding full SVE awareness to KVM.

Because of the duplication of this operation
(__this_cpu_write(fpsimd_last_state, NULL)), it is factored out as
a new helper fpsimd_flush_cpu_state() to make the purpose clearer.

-->8--
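As a purely illustrative sketch of the helper being described (assuming the
existing fpsimd_last_state per-CPU variable; not necessarily the exact code in
the patch):

void fpsimd_flush_cpu_state(void)
{
	/*
	 * Invalidate this CPU's cached FPSIMD/SVE register state for all
	 * tasks, forcing a reload from memory on the next return to
	 * userspace.
	 */
	__this_cpu_write(fpsimd_last_state, NULL);
}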

> > 
> > As a side effect of this refactoring, a this_cpu_write() in
> > fpsimd_cpu_pm_notifier() is changed to __this_cpu_write().  This
> > should be fine, since cpu_pm_enter() is supposed to be called only
> > with interrupts disabled.
> 
> Otherwise the patch itself looks good to me.

Thanks, let me know about the above wording change though.

---Dave


Re: [PATCH v3 24/28] arm64/sve: KVM: Hide SVE from CPU features exposed to guests

2017-10-17 Thread Marc Zyngier
On 17/10/17 15:07, Dave Martin wrote:
> On Tue, Oct 17, 2017 at 06:58:16AM -0700, Christoffer Dall wrote:
>> On Tue, Oct 10, 2017 at 07:38:41PM +0100, Dave Martin wrote:
>>> KVM guests cannot currently use SVE, because SVE is always
>>> configured to trap to EL2.
>>>
>>> However, a guest that sees SVE reported as present in
>>> ID_AA64PFR0_EL1 may legitimately expect that SVE works and try to
>>> use it.  Instead of working, the guest will receive an injected
>>> undef exception, which may cause the guest to oops or go into a
>>> spin.
>>>
>>> To avoid misleading the guest into believing that SVE will work,
>>> this patch masks out the SVE field from ID_AA64PFR0_EL1 when a
>>> guest attempts to read this register.  No support is explicitly
>>> added for ID_AA64ZFR0_EL1 either, so that is still emulated as
>>> reading as zero, which is consistent with SVE not being
>>> implemented.
>>>
>>> This is a temporary measure, and will be removed in a later series
>>> when full KVM support for SVE is implemented.
>>>
>>> Signed-off-by: Dave Martin 
>>> Reviewed-by: Alex Bennée 
>>> Cc: Marc Zyngier 
>>> ---
>>>  arch/arm64/kvm/sys_regs.c | 12 +++-
>>>  1 file changed, 11 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
>>> index b1f7552..a0ee9b0 100644
>>> --- a/arch/arm64/kvm/sys_regs.c
>>> +++ b/arch/arm64/kvm/sys_regs.c
>>> @@ -23,6 +23,7 @@
>>>  #include 
>>>  #include 
>>>  #include 
>>> +#include 
>>>  #include 
>>>  
>>>  #include 
>>> @@ -897,8 +898,17 @@ static u64 read_id_reg(struct sys_reg_desc const *r, 
>>> bool raz)
>>>  {
>>> u32 id = sys_reg((u32)r->Op0, (u32)r->Op1,
>>>  (u32)r->CRn, (u32)r->CRm, (u32)r->Op2);
>>> +   u64 val = raz ? 0 : read_sanitised_ftr_reg(id);
>>>  
>>> -   return raz ? 0 : read_sanitised_ftr_reg(id);
>>> +   if (id == SYS_ID_AA64PFR0_EL1) {
>>> +   if (val & (0xfUL << ID_AA64PFR0_SVE_SHIFT))
>>> +   pr_err_once("kvm [%i]: SVE unsupported for guests, 
>>> suppressing\n",
>>> +   task_pid_nr(current));
>>
>> nit: does this really qualify as an error print?
> 
> I have no strong opinion on this: maz suggested I should add this --
> his concern was to make it difficult to ignore.
> 
> This is transitional: the main purpose is to circumvent bug reports from
> people who find that SVE doesn't work in their guests, in the interim
> before proper KVM support lands upstream.
> 
> Marc, do you still agree with this position?

As long as this is transitional, I'm OK with this.

M.
-- 
Jazz is not dead. It just smells funny...


Re: [PATCH 01/10] KVM: arm/arm64: Split dcache/icache flushing

2017-10-17 Thread Christoffer Dall
On Tue, Oct 17, 2017 at 09:57:34AM +0100, Marc Zyngier wrote:
> On 16/10/17 21:07, Christoffer Dall wrote:
> > On Mon, Oct 09, 2017 at 04:20:23PM +0100, Marc Zyngier wrote:
> >> As we're about to introduce opportunistic invalidation of the icache,
> >> let's split dcache and icache flushing.
> > 
> > I'm a little confused about the naming of these functions now,
> > because where I believe the current function ensures coherency between
> > the I-cache and D-cache (and overly so) if you just call one or the
> > other function after this change, what exactly is the coherency you get?
> 
> Yeah, in retrospect, this is a pretty stupid naming scheme. I guess I'll
> call them clean/invalidate, with the overarching caller still being
> called coherent_cache_guest for the time being.
> 

Sounds good.
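To illustrate the shape being discussed, the overarching caller could then be
little more than the following (a sketch; the __clean/__invalidate names are
assumptions following the convention just mentioned):

static void coherent_cache_guest_page(struct kvm_vcpu *vcpu, kvm_pfn_t pfn,
				      unsigned long size)
{
	/* d-side: clean to PoC so non-cacheable accesses see the data */
	__clean_dcache_guest_page(vcpu, pfn, size);

	/* i-side: invalidate so stale instructions are not executed */
	__invalidate_icache_guest_page(vcpu, pfn, size);
}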

> > 
> > 
> >>
> >> Signed-off-by: Marc Zyngier 
> >> ---
> >>  arch/arm/include/asm/kvm_mmu.h   | 60 
> >> 
> >>  arch/arm64/include/asm/kvm_mmu.h | 13 +++--
> >>  virt/kvm/arm/mmu.c   | 20 ++
> >>  3 files changed, 67 insertions(+), 26 deletions(-)
> >>
> >> diff --git a/arch/arm/include/asm/kvm_mmu.h 
> >> b/arch/arm/include/asm/kvm_mmu.h
> >> index fa6f2174276b..f553aa62d0c3 100644
> >> --- a/arch/arm/include/asm/kvm_mmu.h
> >> +++ b/arch/arm/include/asm/kvm_mmu.h
> >> @@ -126,21 +126,12 @@ static inline bool vcpu_has_cache_enabled(struct 
> >> kvm_vcpu *vcpu)
> >>return (vcpu_cp15(vcpu, c1_SCTLR) & 0b101) == 0b101;
> >>  }
> >>  
> >> -static inline void __coherent_cache_guest_page(struct kvm_vcpu *vcpu,
> >> - kvm_pfn_t pfn,
> >> - unsigned long size)
> >> +static inline void __coherent_dcache_guest_page(struct kvm_vcpu *vcpu,
> >> +  kvm_pfn_t pfn,
> >> +  unsigned long size)
> >>  {
> >>/*
> >> -   * If we are going to insert an instruction page and the icache is
> >> -   * either VIPT or PIPT, there is a potential problem where the host
> >> -   * (or another VM) may have used the same page as this guest, and we
> >> -   * read incorrect data from the icache.  If we're using a PIPT cache,
> >> -   * we can invalidate just that page, but if we are using a VIPT cache
> >> -   * we need to invalidate the entire icache - damn shame - as written
> >> -   * in the ARM ARM (DDI 0406C.b - Page B3-1393).
> >> -   *
> >> -   * VIVT caches are tagged using both the ASID and the VMID and doesn't
> >> -   * need any kind of flushing (DDI 0406C.b - Page B3-1392).
> >> +   * Clean the dcache to the Point of Coherency.
> >> *
> >> * We need to do this through a kernel mapping (using the
> >> * user-space mapping has proved to be the wrong
> >> @@ -155,19 +146,52 @@ static inline void 
> >> __coherent_cache_guest_page(struct kvm_vcpu *vcpu,
> >>  
> >>kvm_flush_dcache_to_poc(va, PAGE_SIZE);
> >>  
> >> -  if (icache_is_pipt())
> >> -  __cpuc_coherent_user_range((unsigned long)va,
> >> - (unsigned long)va + 
> >> PAGE_SIZE);
> >> -
> >>size -= PAGE_SIZE;
> >>pfn++;
> >>  
> >>kunmap_atomic(va);
> >>}
> >> +}
> >>  
> >> -  if (!icache_is_pipt() && !icache_is_vivt_asid_tagged()) {
> >> +static inline void __coherent_icache_guest_page(struct kvm_vcpu *vcpu,
> >> +  kvm_pfn_t pfn,
> >> +  unsigned long size)
> >> +{
> >> +  /*
> >> +   * If we are going to insert an instruction page and the icache is
> >> +   * either VIPT or PIPT, there is a potential problem where the host
> >> +   * (or another VM) may have used the same page as this guest, and we
> >> +   * read incorrect data from the icache.  If we're using a PIPT cache,
> >> +   * we can invalidate just that page, but if we are using a VIPT cache
> >> +   * we need to invalidate the entire icache - damn shame - as written
> >> +   * in the ARM ARM (DDI 0406C.b - Page B3-1393).
> >> +   *
> >> +   * VIVT caches are tagged using both the ASID and the VMID and doesn't
> >> +   * need any kind of flushing (DDI 0406C.b - Page B3-1392).
> >> +   */
> >> +
> >> +  VM_BUG_ON(size & ~PAGE_MASK);
> >> +
> >> +  if (icache_is_vivt_asid_tagged())
> >> +  return;
> >> +
> >> +  if (!icache_is_pipt()) {
> >>/* any kind of VIPT cache */
> >>__flush_icache_all();
> >> +  return;
> >> +  }
> >> +
> >> +  /* PIPT cache. As for the d-side, use a temporary kernel mapping. */
> >> +  while (size) {
> >> +  void *va = kmap_atomic_pfn(pfn);
> >> +
> >> +  __cpuc_coherent_user_range((unsigned long)va,
> >> + (unsigned long)va + PAGE_SIZE);
> >> +
> >> +  size -= PAGE_SIZE;
> >> +  pfn++;
> >> +
> >> +  kunmap_atomic(va);

Re: [PATCH v3 02/28] arm64: KVM: Hide unsupported AArch64 CPU features from guests

2017-10-17 Thread Marc Zyngier
On 17/10/17 14:51, Christoffer Dall wrote:
> On Tue, Oct 10, 2017 at 07:38:19PM +0100, Dave Martin wrote:
>> Currently, a guest kernel sees the true CPU feature registers
>> (ID_*_EL1) when it reads them using MRS instructions.  This means
>> that the guest will observe features that are present in the
>> hardware but the host doesn't understand or doesn't provide support
>> for.  A guest may legitimately try to use such a feature as per the
>> architecture, but use of the feature may trap instead of working
>> normally, triggering undef injection into the guest.
>>
>> This is not a problem for the host, but the guest may go wrong when
>> running on newer hardware than the host knows about.
>>
>> This patch hides from guest VMs any AArch64-specific CPU features
>> that the host doesn't support, by exposing to the guest the
>> sanitised versions of the registers computed by the cpufeatures
>> framework, instead of the true hardware registers.  To achieve
>> this, HCR_EL2.TID3 is now set for AArch64 guests, and emulation
>> code is added to KVM to report the sanitised versions of the
>> affected registers in response to MRS and register reads from
>> userspace.
>>
>> The affected registers are removed from invariant_sys_regs[] (since
>> the invariant_sys_regs handling is no longer quite correct for
>> them) and added to sys_reg_descs[], with appropriate access(),
>> get_user() and set_user() methods.  No runtime vcpu storage is
>> allocated for the registers: instead, they are read on demand from
>> the cpufeatures framework.  This may need modification in the
>> future if there is a need for userspace to customise the features
>> visible to the guest.
>>
>> Attempts by userspace to write the registers are handled similarly
>> to the current invariant_sys_regs handling: writes are permitted,
>> but only if they don't attempt to change the value.  This is
>> sufficient to support VM snapshot/restore from userspace.
>>
>> Because of the additional registers, restoring a VM on an older
>> kernel may not work unless userspace knows how to handle the extra
>> VM registers exposed to the KVM user ABI by this patch.
>>
>> Under the principle of least damage, this patch makes no attempt to
>> handle any of the other registers currently in
>> invariant_sys_regs[], or to emulate registers for AArch32: however,
>> these could be handled in a similar way in future, as necessary.
>>
>> Signed-off-by: Dave Martin 
>> Cc: Marc Zyngier 
>> ---
>>  arch/arm64/include/asm/sysreg.h |   3 +
>>  arch/arm64/kvm/hyp/switch.c |   6 +
>>  arch/arm64/kvm/sys_regs.c   | 282 
>> +---
>>  3 files changed, 246 insertions(+), 45 deletions(-)
>>
>> diff --git a/arch/arm64/include/asm/sysreg.h 
>> b/arch/arm64/include/asm/sysreg.h
>> index f707fed..480ecd6 100644
>> --- a/arch/arm64/include/asm/sysreg.h
>> +++ b/arch/arm64/include/asm/sysreg.h
>> @@ -149,6 +149,9 @@
>>  #define SYS_ID_AA64DFR0_EL1 sys_reg(3, 0, 0, 5, 0)
>>  #define SYS_ID_AA64DFR1_EL1 sys_reg(3, 0, 0, 5, 1)
>>  
>> +#define SYS_ID_AA64AFR0_EL1 sys_reg(3, 0, 0, 5, 4)
>> +#define SYS_ID_AA64AFR1_EL1 sys_reg(3, 0, 0, 5, 5)
>> +
>>  #define SYS_ID_AA64ISAR0_EL1sys_reg(3, 0, 0, 6, 0)
>>  #define SYS_ID_AA64ISAR1_EL1sys_reg(3, 0, 0, 6, 1)
>>  
>> diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
>> index 945e79c..35a90b8 100644
>> --- a/arch/arm64/kvm/hyp/switch.c
>> +++ b/arch/arm64/kvm/hyp/switch.c
>> @@ -81,11 +81,17 @@ static void __hyp_text __activate_traps(struct kvm_vcpu 
>> *vcpu)
>>   * it will cause an exception.
>>   */
>>  val = vcpu->arch.hcr_el2;
>> +
>>  if (!(val & HCR_RW) && system_supports_fpsimd()) {
>>  write_sysreg(1 << 30, fpexc32_el2);
>>  isb();
>>  }
>> +
>> +if (val & HCR_RW) /* for AArch64 only: */
>> +val |= HCR_TID3; /* TID3: trap feature register accesses */
>> +
> 
> Since we're setting this for all 64-bit VMs, can we not set this in
> vcpu_reset_hcr instead?
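A rough sketch of that alternative, based on the existing vcpu_reset_hcr()
(illustrative only, not a tested change):

static inline void vcpu_reset_hcr(struct kvm_vcpu *vcpu)
{
	vcpu->arch.hcr_el2 = HCR_GUEST_FLAGS;
	if (is_kernel_in_hyp_mode())
		vcpu->arch.hcr_el2 |= HCR_E2H;
	if (test_bit(KVM_ARM_VCPU_EL1_32BIT, vcpu->arch.features))
		vcpu->arch.hcr_el2 &= ~HCR_RW;		/* AArch32 guest */
	else
		vcpu->arch.hcr_el2 |= HCR_TID3;		/* AArch64: trap ID regs */
}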
> 
>>  write_sysreg(val, hcr_el2);
>> +
>>  /* Trap on AArch32 cp15 c15 accesses (EL1 or EL0) */
>>  write_sysreg(1 << 15, hstr_el2);
>>  /*
>> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
>> index 2e070d3..b1f7552 100644
>> --- a/arch/arm64/kvm/sys_regs.c
>> +++ b/arch/arm64/kvm/sys_regs.c
>> @@ -892,6 +892,137 @@ static bool access_cntp_cval(struct kvm_vcpu *vcpu,
>>  return true;
>>  }
>>  
>> +/* Read a sanitised cpufeature ID register by sys_reg_desc */
>> +static u64 read_id_reg(struct sys_reg_desc const *r, bool raz)
>> +{
>> +u32 id = sys_reg((u32)r->Op0, (u32)r->Op1,
>> + (u32)r->CRn, (u32)r->CRm, (u32)r->Op2);
>> +
>> +return raz ? 0 : read_sanitised_ftr_reg(id);
>> +}
>> +
>> +/* cpufeature ID register access trap handlers */
>> +
>> +static bool __access_id_reg(struct kvm_vcpu *vcpu,

Re: [PATCH v3 24/28] arm64/sve: KVM: Hide SVE from CPU features exposed to guests

2017-10-17 Thread Dave Martin
On Tue, Oct 17, 2017 at 06:58:16AM -0700, Christoffer Dall wrote:
> On Tue, Oct 10, 2017 at 07:38:41PM +0100, Dave Martin wrote:
> > KVM guests cannot currently use SVE, because SVE is always
> > configured to trap to EL2.
> > 
> > However, a guest that sees SVE reported as present in
> > ID_AA64PFR0_EL1 may legitimately expect that SVE works and try to
> > use it.  Instead of working, the guest will receive an injected
> > undef exception, which may cause the guest to oops or go into a
> > spin.
> > 
> > To avoid misleading the guest into believing that SVE will work,
> > this patch masks out the SVE field from ID_AA64PFR0_EL1 when a
> > guest attempts to read this register.  No support is explicitly
> > added for ID_AA64ZFR0_EL1 either, so that is still emulated as
> > reading as zero, which is consistent with SVE not being
> > implemented.
> > 
> > This is a temporary measure, and will be removed in a later series
> > when full KVM support for SVE is implemented.
> > 
> > Signed-off-by: Dave Martin 
> > Reviewed-by: Alex Bennée 
> > Cc: Marc Zyngier 
> > ---
> >  arch/arm64/kvm/sys_regs.c | 12 +++-
> >  1 file changed, 11 insertions(+), 1 deletion(-)
> > 
> > diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> > index b1f7552..a0ee9b0 100644
> > --- a/arch/arm64/kvm/sys_regs.c
> > +++ b/arch/arm64/kvm/sys_regs.c
> > @@ -23,6 +23,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >  #include 
> >  
> >  #include 
> > @@ -897,8 +898,17 @@ static u64 read_id_reg(struct sys_reg_desc const *r, 
> > bool raz)
> >  {
> > u32 id = sys_reg((u32)r->Op0, (u32)r->Op1,
> >  (u32)r->CRn, (u32)r->CRm, (u32)r->Op2);
> > +   u64 val = raz ? 0 : read_sanitised_ftr_reg(id);
> >  
> > -   return raz ? 0 : read_sanitised_ftr_reg(id);
> > +   if (id == SYS_ID_AA64PFR0_EL1) {
> > +   if (val & (0xfUL << ID_AA64PFR0_SVE_SHIFT))
> > +   pr_err_once("kvm [%i]: SVE unsupported for guests, 
> > suppressing\n",
> > +   task_pid_nr(current));
> 
> nit: does this really qualify as an error print?

I have no strong opinion on this: maz suggested I should add this --
his concern was to make it difficult to ignore.

This is transitional: the main purpose is to circumvent bug reports from
people who find that SVE doesn't work in their guests, in the interim
before proper KVM support lands upstream.

Marc, do you still agree with this position?

> > +
> > +   val &= ~(0xfUL << ID_AA64PFR0_SVE_SHIFT);
> > +   }
> > +
> > +   return val;
> >  }
> >  
> >  /* cpufeature ID register access trap handlers */
> > -- 
> > 2.1.4
> > 
> Otherwise:
> 
> Acked-by: Christoffer Dall 

Thanks -- I'll wait for Marc's response before applying.

Cheers
---Dave


Re: [PATCH] arm64: KVM: set right LR register value for 32 bit guest when inject abort

2017-10-17 Thread gengdongjiu
Hi Christoffer

On 2017/10/17 3:59, Christoffer Dall wrote:
> On Mon, Oct 16, 2017 at 04:10:01PM +, gengdongjiu wrote:
>> Hi Marc,
>>
>>>
>>> Please also update the 32bit code accordingly, as it looks broken too.
>>
>> I have updated the 32-bit code accordingly. I have no arm32 host environment
>> at hand, so there is no way to verify it on an arm32 host; I have only
>> verified the patch on an arm64 host.
>>
>> Anyway I firstly send the patch out for review. Thanks.
>>
> In this case, if you just clearly specify in the patches you send out
> that the 32-bit one is untested, you can ask someone to test it for you.
> 

Thanks for your reminder; today I found an arm32 board and tested it.
So I have now tested on both arm32 and arm64. Please review it. Thanks.


> Thanks,
> -Christoffer
> 
> .
> 



Re: [PATCH v3 23/28] arm64/sve: KVM: Treat guest SVE use as undefined instruction execution

2017-10-17 Thread Christoffer Dall
On Tue, Oct 10, 2017 at 07:38:40PM +0100, Dave Martin wrote:
> When trapping forbidden attempts by a guest to use SVE, we want the
> guest to see a trap consistent with SVE not being implemented.
> 
> This patch injects an undefined instruction exception into the
> guest in response to such an exception.
> 
> Signed-off-by: Dave Martin 
> Reviewed-by: Alex Bennée 

Acked-by: Christoffer Dall 

> ---
>  arch/arm64/kvm/handle_exit.c | 8 
>  1 file changed, 8 insertions(+)
> 
> diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
> index 7debb74..b712479 100644
> --- a/arch/arm64/kvm/handle_exit.c
> +++ b/arch/arm64/kvm/handle_exit.c
> @@ -147,6 +147,13 @@ static int kvm_handle_unknown_ec(struct kvm_vcpu *vcpu, 
> struct kvm_run *run)
>   return 1;
>  }
>  
> +static int handle_sve(struct kvm_vcpu *vcpu, struct kvm_run *run)
> +{
> + /* Until SVE is supported for guests: */
> + kvm_inject_undefined(vcpu);
> + return 1;
> +}
> +
>  static exit_handle_fn arm_exit_handlers[] = {
>   [0 ... ESR_ELx_EC_MAX]  = kvm_handle_unknown_ec,
>   [ESR_ELx_EC_WFx]= kvm_handle_wfx,
> @@ -160,6 +167,7 @@ static exit_handle_fn arm_exit_handlers[] = {
>   [ESR_ELx_EC_HVC64]  = handle_hvc,
>   [ESR_ELx_EC_SMC64]  = handle_smc,
>   [ESR_ELx_EC_SYS64]  = kvm_handle_sys_reg,
> + [ESR_ELx_EC_SVE]= handle_sve,
>   [ESR_ELx_EC_IABT_LOW]   = kvm_handle_guest_abort,
>   [ESR_ELx_EC_DABT_LOW]   = kvm_handle_guest_abort,
>   [ESR_ELx_EC_SOFTSTP_LOW]= kvm_handle_guest_debug,
> -- 
> 2.1.4
> 


[PATCH RESEND v2] arm/arm64: KVM: set right LR register value for 32 bit guest when inject abort

2017-10-17 Thread Dongjiu Geng
When an exception is trapped to EL2, hardware uses ELR_ELx to hold
the faulting instruction address. If KVM wants to inject an
abort into a 32-bit guest, it needs to set the LR register for the
guest to emulate that the abort happened in the guest. Because the
ARM32 architecture uses pipelined execution, the LR value has an
offset from the faulting instruction address.

The offsets applied to the link value for each exception are shown
below; the relevant offset should be added to the ARM32 link
register (LR).

Table taken from ARMv8 ARM DDI0487B-B, table G1-10:
Exception               Offset, for PE state of:
                          A32     T32
Undefined Instruction     +4      +2
Prefetch Abort            +4      +4
Data Abort                +8      +8
IRQ or FIQ                +4      +4
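Expressed as a sketch in C (the helper name and parameters are illustrative;
the patch below instead indexes a small table by exception vector):

/* LR offset to apply when injecting an exception, per the table above */
static u32 lr_return_offset(bool is_thumb, bool is_pabt, bool is_dabt)
{
	if (is_dabt)
		return 8;		/* Data Abort: +8 for both A32 and T32 */
	if (is_pabt)
		return 4;		/* Prefetch Abort: +4 for both */
	return is_thumb ? 2 : 4;	/* Undefined Instruction */
}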

Signed-off-by: Dongjiu Geng 
Tested-by: Haibin Zhang 

---
Have tested in both arm32 and arm64

For example, on the arm64 platform, for undefined instruction injection:

1. The guest OS executes an SMC (Secure Monitor Call) instruction at address
0xc025405c, then the guest traps to the hypervisor

c0254050:   e59d5028    ldr     r5, [sp, #40]   ; 0x28
c0254054:   e3a03001    mov     r3, #1
c0254058:   e1a01003    mov     r1, r3
c025405c:   e1600070    smc     0
c0254060:   e30a0270    movw    r0, #41584      ; 0xa270
c0254064:   e34c00bf    movt    r0, #49343      ; 0xc0bf

2. KVM injects an undefined instruction abort into the guest
3. We find that the fault PC is 0xc0254058, not 0xc025405c.

[   12.348072] Internal error: Oops - undefined instruction: 0 [#1] SMP ARM
[   12.349786] Modules linked in:
[   12.350563] CPU: 1 PID: 71 Comm: cat Not tainted 4.1.0-dirty #25
[   12.352061] Hardware name: Generic DT based system
[   12.353275] task: d9d08000 ti: d9cfe000 task.ti: d9cfe000
[   12.354637] PC is at proc_dointvec+0x20/0x60
[   12.355717] LR is at proc_sys_call_handler+0xb0/0xc4
[   12.356972] pc : []lr : []psr: a0060013
[   12.356972] sp : d9cffe90  ip : c0254038  fp : 0001
[   12.359824] r10: d9cfff80  r9 : 0004  r8 : 
[   12.361132] r7 : bec21cb0  r6 : d9cffec4  r5 : d9cfff80  r4 : c0e82de0
[   12.362766] r3 : 0001  r2 : bec21cb0  r1 : 0001  r0 : c0e82de0
[   12.364400] Flags: NzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
[   12.366183] Control: 10c5383d  Table: 59d3406a  DAC: 0015
[   12.367623] Process cat (pid: 71, stack limit = 0xd9cfe220)

4. After correcting the LR register, it has the right value

[  125.763370] Internal error: Oops - undefined instruction: 0 [#2] SMP ARM
[  125.767010] Modules linked in:
[  125.768472] CPU: 1 PID: 74 Comm: cat Tainted: G  D 4.1.0-dirty 
#25
[  125.771854] Hardware name: Generic DT based system
[  125.774053] task: db0bb900 ti: d9d1 task.ti: d9d1
[  125.776821] PC is at proc_dointvec+0x24/0x60
[  125.778919] LR is at proc_sys_call_handler+0xb0/0xc4
[  125.781269] pc : []lr : []psr: a0060013
[  125.781269] sp : d9d11e90  ip : c0254038  fp : 0001
[  125.786581] r10: d9d11f80  r9 : 0004  r8 : 
[  125.789673] r7 : be92ccb0  r6 : d9d11ec4  r5 : d9d11f80  r4 : c0e82de0
[  125.792828] r3 : 0001  r2 : be92ccb0  r1 : 0001  r0 : c0e82de0
[  125.795890] Flags: NzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user

Other exception injections, such as Data/Prefetch aborts, also need this
correction.
---
 arch/arm/kvm/emulate.c|  4 ++--
 arch/arm64/kvm/inject_fault.c | 16 +++-
 2 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/arch/arm/kvm/emulate.c b/arch/arm/kvm/emulate.c
index 0064b86..2419328 100644
--- a/arch/arm/kvm/emulate.c
+++ b/arch/arm/kvm/emulate.c
@@ -227,7 +227,7 @@ void kvm_inject_undefined(struct kvm_vcpu *vcpu)
u32 return_offset = (is_thumb) ? 2 : 4;
 
kvm_update_psr(vcpu, UND_MODE);
-   *vcpu_reg(vcpu, 14) = *vcpu_pc(vcpu) - return_offset;
+   *vcpu_reg(vcpu, 14) = *vcpu_pc(vcpu) + return_offset;
 
/* Branch to exception vector */
*vcpu_pc(vcpu) = exc_vector_base(vcpu) + vect_offset;
@@ -242,7 +242,7 @@ static void inject_abt(struct kvm_vcpu *vcpu, bool is_pabt, 
unsigned long addr)
unsigned long cpsr = *vcpu_cpsr(vcpu);
bool is_thumb = (cpsr & PSR_T_BIT);
u32 vect_offset;
-   u32 return_offset = (is_thumb) ? 4 : 0;
+   u32 return_offset = (is_pabt) ? 4 : 8;
bool is_lpae;
 
kvm_update_psr(vcpu, ABT_MODE);
diff --git a/arch/arm64/kvm/inject_fault.c b/arch/arm64/kvm/inject_fault.c
index da6a8cf..0416f18 100644
--- a/arch/arm64/kvm/inject_fault.c
+++ b/arch/arm64/kvm/inject_fault.c
@@ -33,12 +33,26 @@
 #define LOWER_EL_AArch64_VECTOR0x400
 #define LOWER_EL_AArch32_VECTOR0x600
 
+/*
+ * Table taken from ARMv8 ARM DDI0487B-B, table G1-10.
+ */
+static u8 return_offsets[8][2] = {
+   [0] = { 0, 0 }, /* Reset, unused */
+   [1] = { 4, 2 }, /* Undefined */
+   [2] = { 0, 0 }, /* 

Re: [PATCH v3 24/28] arm64/sve: KVM: Hide SVE from CPU features exposed to guests

2017-10-17 Thread Christoffer Dall
On Tue, Oct 10, 2017 at 07:38:41PM +0100, Dave Martin wrote:
> KVM guests cannot currently use SVE, because SVE is always
> configured to trap to EL2.
> 
> However, a guest that sees SVE reported as present in
> ID_AA64PFR0_EL1 may legitimately expect that SVE works and try to
> use it.  Instead of working, the guest will receive an injected
> undef exception, which may cause the guest to oops or go into a
> spin.
> 
> To avoid misleading the guest into believing that SVE will work,
> this patch masks out the SVE field from ID_AA64PFR0_EL1 when a
> guest attempts to read this register.  No support is explicitly
> added for ID_AA64ZFR0_EL1 either, so that is still emulated as
> reading as zero, which is consistent with SVE not being
> implemented.
> 
> This is a temporary measure, and will be removed in a later series
> when full KVM support for SVE is implemented.
> 
> Signed-off-by: Dave Martin 
> Reviewed-by: Alex Bennée 
> Cc: Marc Zyngier 
> ---
>  arch/arm64/kvm/sys_regs.c | 12 +++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index b1f7552..a0ee9b0 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -23,6 +23,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  
>  #include 
> @@ -897,8 +898,17 @@ static u64 read_id_reg(struct sys_reg_desc const *r, 
> bool raz)
>  {
>   u32 id = sys_reg((u32)r->Op0, (u32)r->Op1,
>(u32)r->CRn, (u32)r->CRm, (u32)r->Op2);
> + u64 val = raz ? 0 : read_sanitised_ftr_reg(id);
>  
> - return raz ? 0 : read_sanitised_ftr_reg(id);
> + if (id == SYS_ID_AA64PFR0_EL1) {
> + if (val & (0xfUL << ID_AA64PFR0_SVE_SHIFT))
> + pr_err_once("kvm [%i]: SVE unsupported for guests, 
> suppressing\n",
> + task_pid_nr(current));

nit: does this really qualify as an error print?

> +
> + val &= ~(0xfUL << ID_AA64PFR0_SVE_SHIFT);
> + }
> +
> + return val;
>  }
>  
>  /* cpufeature ID register access trap handlers */
> -- 
> 2.1.4
> 
Otherwise:

Acked-by: Christoffer Dall 


Re: [PATCH v3 02/28] arm64: KVM: Hide unsupported AArch64 CPU features from guests

2017-10-17 Thread Christoffer Dall
On Tue, Oct 10, 2017 at 07:38:19PM +0100, Dave Martin wrote:
> Currently, a guest kernel sees the true CPU feature registers
> (ID_*_EL1) when it reads them using MRS instructions.  This means
> that the guest will observe features that are present in the
> hardware but the host doesn't understand or doesn't provide support
> for.  A guest may legitimately try to use such a feature as per the
> architecture, but use of the feature may trap instead of working
> normally, triggering undef injection into the guest.
> 
> This is not a problem for the host, but the guest may go wrong when
> running on newer hardware than the host knows about.
> 
> This patch hides from guest VMs any AArch64-specific CPU features
> that the host doesn't support, by exposing to the guest the
> sanitised versions of the registers computed by the cpufeatures
> framework, instead of the true hardware registers.  To achieve
> this, HCR_EL2.TID3 is now set for AArch64 guests, and emulation
> code is added to KVM to report the sanitised versions of the
> affected registers in response to MRS and register reads from
> userspace.
> 
> The affected registers are removed from invariant_sys_regs[] (since
> the invariant_sys_regs handling is no longer quite correct for
> them) and added to sys_reg_descs[], with appropriate access(),
> get_user() and set_user() methods.  No runtime vcpu storage is
> allocated for the registers: instead, they are read on demand from
> the cpufeatures framework.  This may need modification in the
> future if there is a need for userspace to customise the features
> visible to the guest.
> 
> Attempts by userspace to write the registers are handled similarly
> to the current invariant_sys_regs handling: writes are permitted,
> but only if they don't attempt to change the value.  This is
> sufficient to support VM snapshot/restore from userspace.
> 
> Because of the additional registers, restoring a VM on an older
> kernel may not work unless userspace knows how to handle the extra
> VM registers exposed to the KVM user ABI by this patch.
> 
> Under the principle of least damage, this patch makes no attempt to
> handle any of the other registers currently in
> invariant_sys_regs[], or to emulate registers for AArch32: however,
> these could be handled in a similar way in future, as necessary.
> 
> Signed-off-by: Dave Martin 
> Cc: Marc Zyngier 
> ---
>  arch/arm64/include/asm/sysreg.h |   3 +
>  arch/arm64/kvm/hyp/switch.c |   6 +
>  arch/arm64/kvm/sys_regs.c   | 282 
> +---
>  3 files changed, 246 insertions(+), 45 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
> index f707fed..480ecd6 100644
> --- a/arch/arm64/include/asm/sysreg.h
> +++ b/arch/arm64/include/asm/sysreg.h
> @@ -149,6 +149,9 @@
>  #define SYS_ID_AA64DFR0_EL1  sys_reg(3, 0, 0, 5, 0)
>  #define SYS_ID_AA64DFR1_EL1  sys_reg(3, 0, 0, 5, 1)
>  
> +#define SYS_ID_AA64AFR0_EL1  sys_reg(3, 0, 0, 5, 4)
> +#define SYS_ID_AA64AFR1_EL1  sys_reg(3, 0, 0, 5, 5)
> +
>  #define SYS_ID_AA64ISAR0_EL1 sys_reg(3, 0, 0, 6, 0)
>  #define SYS_ID_AA64ISAR1_EL1 sys_reg(3, 0, 0, 6, 1)
>  
> diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> index 945e79c..35a90b8 100644
> --- a/arch/arm64/kvm/hyp/switch.c
> +++ b/arch/arm64/kvm/hyp/switch.c
> @@ -81,11 +81,17 @@ static void __hyp_text __activate_traps(struct kvm_vcpu 
> *vcpu)
>* it will cause an exception.
>*/
>   val = vcpu->arch.hcr_el2;
> +
>   if (!(val & HCR_RW) && system_supports_fpsimd()) {
>   write_sysreg(1 << 30, fpexc32_el2);
>   isb();
>   }
> +
> + if (val & HCR_RW) /* for AArch64 only: */
> + val |= HCR_TID3; /* TID3: trap feature register accesses */
> +

Since we're setting this for all 64-bit VMs, can we not set this in
vcpu_reset_hcr instead?

>   write_sysreg(val, hcr_el2);
> +
>   /* Trap on AArch32 cp15 c15 accesses (EL1 or EL0) */
>   write_sysreg(1 << 15, hstr_el2);
>   /*
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index 2e070d3..b1f7552 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -892,6 +892,137 @@ static bool access_cntp_cval(struct kvm_vcpu *vcpu,
>   return true;
>  }
>  
> +/* Read a sanitised cpufeature ID register by sys_reg_desc */
> +static u64 read_id_reg(struct sys_reg_desc const *r, bool raz)
> +{
> + u32 id = sys_reg((u32)r->Op0, (u32)r->Op1,
> +  (u32)r->CRn, (u32)r->CRm, (u32)r->Op2);
> +
> + return raz ? 0 : read_sanitised_ftr_reg(id);
> +}
> +
> +/* cpufeature ID register access trap handlers */
> +
> +static bool __access_id_reg(struct kvm_vcpu *vcpu,
> + struct sys_reg_params *p,
> + const struct sys_reg_desc *r,
> + bool raz)
> 

[PATCH v7 1/4] arm64: kvm: route synchronous external abort exceptions to EL2

2017-10-17 Thread Dongjiu Geng
ARMv8.2 adds a new bit, HCR_EL2.TEA, which routes synchronous
external aborts to EL2, and a trap control bit, HCR_EL2.TERR,
which traps all Non-secure EL1&0 error record accesses to EL2.

This patch enables the two bits for the guest OS.
When a synchronous external abort is generated in the guest OS,
it traps to EL3 firmware; EL3 firmware then checks the
HCR_EL2.TEA value to decide whether to jump to the hypervisor or
the host OS. Enabling HCR_EL2.TERR makes error record accesses
from the guest trap to EL2.

Add some minimal emulation for the RAS Error Record registers.
In the emulation, ERRIDR_EL1 and ERRSELR_EL1 read as zero,
and the other ERX* registers are RAZ/WI.
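For reference, RAZ/WI emulation boils down to the following shape (a sketch
assuming the sys_reg_params layout used by sys_regs.c; the actual patch reuses
the existing trap_raz_wi() handler, as visible in the diff below):

static bool emulate_raz_wi(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
			   const struct sys_reg_desc *r)
{
	if (!p->is_write)
		p->regval = 0;	/* reads as zero */

	return true;		/* writes are ignored */
}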

Signed-off-by: Dongjiu Geng 
---
 arch/arm64/include/asm/kvm_arm.h |  2 ++
 arch/arm64/include/asm/kvm_emulate.h |  7 +++
 arch/arm64/include/asm/kvm_host.h|  2 ++
 arch/arm64/include/asm/sysreg.h  | 10 +
 arch/arm64/kvm/sys_regs.c| 40 
 5 files changed, 61 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
index 61d694c..1188272 100644
--- a/arch/arm64/include/asm/kvm_arm.h
+++ b/arch/arm64/include/asm/kvm_arm.h
@@ -23,6 +23,8 @@
 #include 
 
 /* Hyp Configuration Register (HCR) bits */
+#define HCR_TEA    (UL(1) << 37)
+#define HCR_TERR   (UL(1) << 36)
 #define HCR_E2H    (UL(1) << 34)
 #define HCR_ID (UL(1) << 33)
 #define HCR_CD (UL(1) << 32)
diff --git a/arch/arm64/include/asm/kvm_emulate.h 
b/arch/arm64/include/asm/kvm_emulate.h
index fe39e68..47983db 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -47,6 +47,13 @@ static inline void vcpu_reset_hcr(struct kvm_vcpu *vcpu)
vcpu->arch.hcr_el2 = HCR_GUEST_FLAGS;
if (is_kernel_in_hyp_mode())
vcpu->arch.hcr_el2 |= HCR_E2H;
+   if (cpus_have_const_cap(ARM64_HAS_RAS_EXTN)) {
+   /* route synchronous external abort exceptions to EL2 */
+   vcpu->arch.hcr_el2 |= HCR_TEA;
+   /* trap error record accesses */
+   vcpu->arch.hcr_el2 |= HCR_TERR;
+   }
+
if (test_bit(KVM_ARM_VCPU_EL1_32BIT, vcpu->arch.features))
vcpu->arch.hcr_el2 &= ~HCR_RW;
 }
diff --git a/arch/arm64/include/asm/kvm_host.h 
b/arch/arm64/include/asm/kvm_host.h
index d686300..af55b3bc 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -105,6 +105,8 @@ enum vcpu_sysreg {
TTBR1_EL1,  /* Translation Table Base Register 1 */
TCR_EL1,/* Translation Control Register */
ESR_EL1,/* Exception Syndrome Register */
+   ERRIDR_EL1, /* Error Record ID Register */
+   ERRSELR_EL1,/* Error Record Select Register */
AFSR0_EL1,  /* Auxiliary Fault Status Register 0 */
AFSR1_EL1,  /* Auxiliary Fault Status Register 1 */
FAR_EL1,/* Fault Address Register */
diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index 35b786b..bd11ca0 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -169,6 +169,16 @@
 #define SYS_AFSR0_EL1  sys_reg(3, 0, 5, 1, 0)
 #define SYS_AFSR1_EL1  sys_reg(3, 0, 5, 1, 1)
 #define SYS_ESR_EL1sys_reg(3, 0, 5, 2, 0)
+
+#define SYS_ERRIDR_EL1 sys_reg(3, 0, 5, 3, 0)
+#define SYS_ERRSELR_EL1sys_reg(3, 0, 5, 3, 1)
+#define SYS_ERXFR_EL1  sys_reg(3, 0, 5, 4, 0)
+#define SYS_ERXCTLR_EL1sys_reg(3, 0, 5, 4, 1)
+#define SYS_ERXSTATUS_EL1  sys_reg(3, 0, 5, 4, 2)
+#define SYS_ERXADDR_EL1sys_reg(3, 0, 5, 4, 3)
+#define SYS_ERXMISC0_EL1   sys_reg(3, 0, 5, 5, 0)
+#define SYS_ERXMISC1_EL1   sys_reg(3, 0, 5, 5, 1)
+
 #define SYS_FAR_EL1sys_reg(3, 0, 6, 0, 0)
 #define SYS_PAR_EL1sys_reg(3, 0, 7, 4, 0)
 
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 2e070d3..a74617b 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -775,6 +775,36 @@ static bool access_pmovs(struct kvm_vcpu *vcpu, struct 
sys_reg_params *p,
return true;
 }
 
+static bool access_error_reg(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
+const struct sys_reg_desc *r)
+{
+   /* accessing ERRIDR_EL1 */
+   if (r->CRm == 3 && r->Op2 == 0) {
+   if (p->is_write)
+   vcpu_sys_reg(vcpu, ERRIDR_EL1) = 0;
+
+   return trap_raz_wi(vcpu, p, r);
+   }
+
+   /* accessing ERRSELR_EL1 */
+   if (r->CRm == 3 && r->Op2 == 1) {
+   if (p->is_write)
+   vcpu_sys_reg(vcpu, ERRSELR_EL1) = 0;
+
+   return trap_raz_wi(vcpu, p, r);
+   }
+
+   /*
+* If 

[PATCH v7 4/4] arm64: kvm: handle SEI notification for guest

2017-10-17 Thread Dongjiu Geng
When KVM receives an SError, it first classifies the error
according to ESR_ELx.AET. If the SEI is Uncategorized or
IMPLEMENTATION DEFINED, directly inject a virtual SError with
an IMPLEMENTATION DEFINED syndrome.

If the SError is not propagated, then let the host handle it.
Because the address recorded by the APEI table is usually not
accurate, we can neither identify the address to hwpoison memory
nor notify the guest to do the recovery, so at the same time let
user space specify a valid ESR and inject a virtual SError.
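A sketch of the classification just described, assuming the ESR_ELx_AET_*
definitions added by this patch (the real handler in the diff below does more):

/* Should this SError be handed to host user space for recovery? */
static bool sei_defer_to_userspace(unsigned int esr)
{
	unsigned int aet = esr & ESR_ELx_AET;

	return aet == ESR_ELx_AET_UEU || aet == ESR_ELx_AET_UER;
}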

Signed-off-by: Dongjiu Geng 
Signed-off-by: Quanming Wu 
---
 arch/arm64/include/asm/esr.h | 15 +
 arch/arm64/include/asm/kvm_asm.h |  1 +
 arch/arm64/include/asm/kvm_host.h|  6 
 arch/arm64/include/asm/system_misc.h |  1 +
 arch/arm64/kvm/handle_exit.c | 62 +---
 arch/arm64/mm/fault.c| 16 ++
 6 files changed, 97 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/include/asm/esr.h b/arch/arm64/include/asm/esr.h
index 8cabd57..e5d23a8f 100644
--- a/arch/arm64/include/asm/esr.h
+++ b/arch/arm64/include/asm/esr.h
@@ -95,6 +95,7 @@
 #define ESR_ELx_FSC_ACCESS (0x08)
 #define ESR_ELx_FSC_FAULT  (0x04)
 #define ESR_ELx_FSC_PERM   (0x0C)
+#define ESR_ELx_FSC_SERROR (0x11)
 
 /* ISS field definitions for Data Aborts */
 #define ESR_ELx_ISV(UL(1) << 24)
@@ -107,6 +108,20 @@
 #define ESR_ELx_AR (UL(1) << 14)
 #define ESR_ELx_CM (UL(1) << 8)
 
+/* ISS field definitions for SError interrupt */
+#define ESR_ELx_AET_SHIFT  (10)
+#define ESR_ELx_AET        (UL(0x7) << ESR_ELx_AET_SHIFT)
+/* Uncontainable error */
+#define ESR_ELx_AET_UC (UL(0) << ESR_ELx_AET_SHIFT)
+/* Unrecoverable error */
+#define ESR_ELx_AET_UEU(UL(1) << ESR_ELx_AET_SHIFT)
+/* Restartable error */
+#define ESR_ELx_AET_UEO(UL(2) << ESR_ELx_AET_SHIFT)
+/* Recoverable error */
+#define ESR_ELx_AET_UER(UL(3) << ESR_ELx_AET_SHIFT)
+/* Corrected */
+#define ESR_ELx_AET_CE (UL(6) << ESR_ELx_AET_SHIFT)
+
 /* ISS field definitions for exceptions taken in to Hyp */
 #define ESR_ELx_CV (UL(1) << 24)
 #define ESR_ELx_COND_SHIFT (20)
diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index 26a64d0..d167ed3 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -27,6 +27,7 @@
 #define ARM_EXCEPTION_IRQ0
 #define ARM_EXCEPTION_EL1_SERROR  1
 #define ARM_EXCEPTION_TRAP   2
+
 /* The hyp-stub will return this for any kvm_call_hyp() call */
 #define ARM_EXCEPTION_HYP_GONE   HVC_STUB_ERR
 
diff --git a/arch/arm64/include/asm/kvm_host.h 
b/arch/arm64/include/asm/kvm_host.h
index 1a0c07e..c0da350 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -395,4 +395,10 @@ static inline void __cpu_init_stage2(void)
  "PARange is %d bits, unsupported configuration!", parange);
 }
 
+enum {
+   KVM_SEI_SEV_RECOVERABLE = 1,
+   KVM_SEI_SEV_FATAL,
+   KVM_SEI_SEV_CORRECTED,
+};
+
 #endif /* __ARM64_KVM_HOST_H__ */
diff --git a/arch/arm64/include/asm/system_misc.h 
b/arch/arm64/include/asm/system_misc.h
index e70cb61..73b45fb 100644
--- a/arch/arm64/include/asm/system_misc.h
+++ b/arch/arm64/include/asm/system_misc.h
@@ -57,6 +57,7 @@ extern void (*arm_pm_restart)(enum reboot_mode reboot_mode, 
const char *cmd);
 })
 
 int handle_guest_sea(phys_addr_t addr, unsigned int esr);
+int handle_guest_sei(unsigned int esr);
 #endif /* __ASSEMBLY__ */
 
 #endif /* __ASM_SYSTEM_MISC_H */
diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
index 17d8a16..d546d55 100644
--- a/arch/arm64/kvm/handle_exit.c
+++ b/arch/arm64/kvm/handle_exit.c
@@ -28,6 +28,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define CREATE_TRACE_POINTS
 #include "trace.h"
@@ -178,6 +179,61 @@ static exit_handle_fn kvm_get_exit_handler(struct kvm_vcpu 
*vcpu)
return arm_exit_handlers[hsr_ec];
 }
 
+/**
+ * kvm_handle_guest_sei - handles SError interrupt or asynchronous aborts
+ * @vcpu:  the VCPU pointer
+ *
+ * For RAS SError interrupt, if the AET is [ESR_ELx_AET_UEU] or
+ * [ESR_ELx_AET_UER], then let host user space do the recovery,
+ * otherwise, directly inject virtual SError to guest or panic.
+ */
+static int kvm_handle_guest_sei(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+   unsigned int esr = kvm_vcpu_get_hsr(vcpu);
+   bool impdef_syndrome =  esr & ESR_ELx_ISV;  /* aka IDS */
+   unsigned int aet = esr & ESR_ELx_AET;
+
+   run->exit_reason = KVM_EXIT_EXCEPTION;
+   run->ex.exception = ARM_EXCEPTION_EL1_SERROR;
+
+   /*
+* In below three conditions, it will directly inject the
+* virtual SError:
+* 1. Not support RAS extension
+* 2. The Syndrome is IMPLEMENTATION DEFINED
+* 

[PATCH v7 2/4] arm64: kvm: Introduce KVM_ARM_SET_SERROR_ESR ioctl

2017-10-17 Thread Dongjiu Geng
On ARM64, the SError syndrome value is specific to the model being
emulated for the guest, and user space needs a way to tell the
kernel this value. Userspace can specify different values to
affect the guest OS error recovery behaviour.

We make this API ARM-specific as we haven't yet reached a consensus
for a generic API for all KVM architectures that will allow us to
do something like this.

Signed-off-by: Dongjiu Geng 
Signed-off-by: Quanming Wu 
---
 Documentation/virtual/kvm/api.txt | 11 +++
 arch/arm/include/asm/kvm_host.h   |  1 +
 arch/arm/kvm/guest.c  |  9 +
 arch/arm64/include/asm/kvm_host.h |  2 ++
 arch/arm64/kvm/guest.c|  5 +
 arch/arm64/kvm/reset.c|  3 +++
 include/uapi/linux/kvm.h  |  3 +++
 virt/kvm/arm/arm.c|  7 +++
 8 files changed, 41 insertions(+)

diff --git a/Documentation/virtual/kvm/api.txt 
b/Documentation/virtual/kvm/api.txt
index e63a35f..b076fc8 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -4347,3 +4347,14 @@ This capability indicates that userspace can load 
HV_X64_MSR_VP_INDEX msr.  Its
 value is used to denote the target vcpu for a SynIC interrupt.  For
 compatibilty, KVM initializes this msr to KVM's internal vcpu index.  When this
 capability is absent, userspace can still query this msr's value.
+
+8.13 KVM_CAP_ARM_SET_SERROR_ESR
+
+Architectures: arm, arm64
+
+This capability indicates that userspace can specify syndrome value reported to
+software on taking a virtual SError interrupt exception.
+Userspace can only specify the ISS field for the ESR syndrome, EC field is set
+by hardware when the virtual SError interrupt is taken. If this virtual SError
+is taken to EL1 using AArch64, this value will be reported in ESR_EL1, 
otherwise
+if it is taken to EL1 using AArch32, this value will be reported in DFSR.{AET, 
ExT}.
diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 127e2dd..3723138 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -211,6 +211,7 @@ unsigned long kvm_arm_num_regs(struct kvm_vcpu *vcpu);
 int kvm_arm_copy_reg_indices(struct kvm_vcpu *vcpu, u64 __user *indices);
 int kvm_arm_get_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
 int kvm_arm_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
+int kvm_arm_set_sei_esr(struct kvm_vcpu *vcpu, u64 *syndrome);
 unsigned long kvm_call_hyp(void *hypfn, ...);
 void force_vm_exit(const cpumask_t *mask);
 
diff --git a/arch/arm/kvm/guest.c b/arch/arm/kvm/guest.c
index 1e0784e..309b236 100644
--- a/arch/arm/kvm/guest.c
+++ b/arch/arm/kvm/guest.c
@@ -248,6 +248,15 @@ int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
return -EINVAL;
 }
 
+/*
+ * We only support SEI injection with a specified syndrome on ARM64;
+ * it is not supported on ARM32.
+ */
+int kvm_arm_set_sei_esr(struct kvm_vcpu *vcpu, u64 *syndrome)
+{
+   return -EINVAL;
+}
+
 int __attribute_const__ kvm_target_cpu(void)
 {
switch (read_cpuid_part()) {
diff --git a/arch/arm64/include/asm/kvm_host.h 
b/arch/arm64/include/asm/kvm_host.h
index af55b3bc..dd4cb25 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -320,6 +320,8 @@ int kvm_arm_copy_reg_indices(struct kvm_vcpu *vcpu, u64 
__user *indices);
 int kvm_arm_get_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
 int kvm_arm_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
 
+int kvm_arm_set_sei_esr(struct kvm_vcpu *vcpu, u64 *syndrome);
+
 #define KVM_ARCH_WANT_MMU_NOTIFIER
 int kvm_unmap_hva(struct kvm *kvm, unsigned long hva);
 int kvm_unmap_hva_range(struct kvm *kvm,
diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
index 5c7f657..0a08b05 100644
--- a/arch/arm64/kvm/guest.c
+++ b/arch/arm64/kvm/guest.c
@@ -277,6 +277,11 @@ int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
return -EINVAL;
 }
 
+int kvm_arm_set_sei_esr(struct kvm_vcpu *vcpu, u64 *syndrome)
+{
+   return -EINVAL;
+}
+
 int __attribute_const__ kvm_target_cpu(void)
 {
unsigned long implementor = read_cpuid_implementor();
diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
index 3256b92..9163628 100644
--- a/arch/arm64/kvm/reset.c
+++ b/arch/arm64/kvm/reset.c
@@ -77,6 +77,9 @@ int kvm_arch_dev_ioctl_check_extension(struct kvm *kvm, long 
ext)
case KVM_CAP_ARM_PMU_V3:
r = kvm_arm_support_pmu_v3();
break;
+   case KVM_CAP_ARM_SET_SERROR_ESR:
+   r = cpus_have_const_cap(ARM64_HAS_RAS_EXTN);
+   break;
case KVM_CAP_SET_GUEST_DEBUG:
case KVM_CAP_VCPU_ATTRIBUTES:
r = 1;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 6cd63c1..aece611 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -929,6 +929,7 @@ struct 

[PATCH v7 3/4] arm64: kvm: Set Virtual SError Exception Syndrome for guest

2017-10-17 Thread Dongjiu Geng
The RAS Extension adds a VSESR_EL2 register which provides
the syndrome value reported to software on taking a virtual
SError interrupt exception. This patch adds support for
specifying this syndrome.

In the RAS Extensions we cannot set an all-zero syndrome value
for SError, which means 'RAS error: Uncategorized' instead of
'no valid ISS'. So set it to an IMPLEMENTATION DEFINED syndrome
by default.

We also need to allow userspace to specify a valid syndrome
value, because in some cases the recovery is driven by userspace.
This patch adds support for userspace to specify it.

In the guest/host world switch, restore this value to VSESR_EL2
only when HCR_EL2.VSE is set. This value does not need to be saved
because it is stale by the time the guest exits.

Signed-off-by: Dongjiu Geng 
Signed-off-by: Quanming Wu 
[Set an impdef ESR for Virtual-SError]
Signed-off-by: James Morse 
---
 arch/arm64/include/asm/kvm_emulate.h | 10 ++
 arch/arm64/include/asm/kvm_host.h|  1 +
 arch/arm64/include/asm/sysreg.h  |  3 +++
 arch/arm64/include/asm/system_misc.h |  1 -
 arch/arm64/kvm/guest.c   | 11 ++-
 arch/arm64/kvm/hyp/switch.c  | 15 +++
 arch/arm64/kvm/inject_fault.c| 13 -
 7 files changed, 51 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_emulate.h 
b/arch/arm64/include/asm/kvm_emulate.h
index 47983db..74213bd 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -155,6 +155,16 @@ static inline u32 kvm_vcpu_get_hsr(const struct kvm_vcpu 
*vcpu)
return vcpu->arch.fault.esr_el2;
 }
 
+static inline u32 kvm_vcpu_get_vsesr(const struct kvm_vcpu *vcpu)
+{
+   return vcpu->arch.fault.vsesr_el2;
+}
+
+static inline void kvm_vcpu_set_vsesr(struct kvm_vcpu *vcpu, unsigned long val)
+{
+   vcpu->arch.fault.vsesr_el2 = val;
+}
+
 static inline int kvm_vcpu_get_condition(const struct kvm_vcpu *vcpu)
 {
u32 esr = kvm_vcpu_get_hsr(vcpu);
diff --git a/arch/arm64/include/asm/kvm_host.h 
b/arch/arm64/include/asm/kvm_host.h
index dd4cb25..1a0c07e 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -88,6 +88,7 @@ struct kvm_vcpu_fault_info {
u32 esr_el2;/* Hyp Syndrom Register */
u64 far_el2;/* Hyp Fault Address Register */
u64 hpfar_el2;  /* Hyp IPA Fault Address Register */
+   u32 vsesr_el2;  /* Virtual SError Exception Syndrome Register */
 };
 
 /*
diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index bd11ca0..e6c7fb8 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -86,6 +86,9 @@
 #define REG_PSTATE_PAN_IMM sys_reg(0, 0, 4, 0, 4)
 #define REG_PSTATE_UAO_IMM sys_reg(0, 0, 4, 0, 3)
 
+/* virtual SError exception syndrome register in armv8.2 */
+#define REG_VSESR_EL2  sys_reg(3, 4, 5, 2, 3)
+
 #define SET_PSTATE_PAN(x) __emit_inst(0xd500 | REG_PSTATE_PAN_IMM |
\
  (!!x)<<8 | 0x1f)
 #define SET_PSTATE_UAO(x) __emit_inst(0xd500 | REG_PSTATE_UAO_IMM |
\
diff --git a/arch/arm64/include/asm/system_misc.h 
b/arch/arm64/include/asm/system_misc.h
index 07aa8e3..e70cb61 100644
--- a/arch/arm64/include/asm/system_misc.h
+++ b/arch/arm64/include/asm/system_misc.h
@@ -57,7 +57,6 @@ extern void (*arm_pm_restart)(enum reboot_mode reboot_mode, 
const char *cmd);
 })
 
 int handle_guest_sea(phys_addr_t addr, unsigned int esr);
-
 #endif /* __ASSEMBLY__ */
 
 #endif /* __ASM_SYSTEM_MISC_H */
diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
index 0a08b05..a9acbfd 100644
--- a/arch/arm64/kvm/guest.c
+++ b/arch/arm64/kvm/guest.c
@@ -279,7 +279,16 @@ int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
 
 int kvm_arm_set_sei_esr(struct kvm_vcpu *vcpu, u64 *syndrome)
 {
-   return -EINVAL;
+   u64 reg = *syndrome;
+
+   /* inject virtual system Error or asynchronous abort */
+   kvm_inject_vabt(vcpu);
+
+   if (reg)
+   /* set vsesr_el2[24:0] with value that user space specified */
+   kvm_vcpu_set_vsesr(vcpu, reg & ESR_ELx_ISS_MASK);
+
+   return 0;
 }
 
 int __attribute_const__ kvm_target_cpu(void)
diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index c6f17c7..7c0915e 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -67,6 +67,14 @@ static hyp_alternate_select(__activate_traps_arch,
__activate_traps_nvhe, __activate_traps_vhe,
ARM64_HAS_VIRT_HOST_EXTN);
 
+static void __hyp_text __sysreg_set_vsesr(struct kvm_vcpu *vcpu)
+{
+   if (cpus_have_const_cap(ARM64_HAS_RAS_EXTN) &&
+   (vcpu->arch.hcr_el2 & HCR_VSE))
+   write_sysreg_s(vcpu->arch.fault.vsesr_el2, REG_VSESR_EL2);
+}
+
+
 

Re: [PATCH 10/10] arm: KVM: Use common implementation for all flushes to PoC

2017-10-17 Thread Marc Zyngier
On 16/10/17 21:06, Christoffer Dall wrote:
> On Mon, Oct 09, 2017 at 04:20:32PM +0100, Marc Zyngier wrote:
>> We currently have no less than three implementations for the
>> "flush to PoC" code. Let standardize on a single one. This
>> requires a bit of unpleasant moving around, and relies on
>> __kvm_flush_dcache_pte and co being #defines so that they can
>> call into coherent_dcache_guest_page...
>>
>> Signed-off-by: Marc Zyngier 
>> ---
>>  arch/arm/include/asm/kvm_mmu.h | 28 
>>  virt/kvm/arm/mmu.c | 20 ++--
>>  2 files changed, 14 insertions(+), 34 deletions(-)
>>
>> diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
>> index 5f1ac88a5951..011b0db85c02 100644
>> --- a/arch/arm/include/asm/kvm_mmu.h
>> +++ b/arch/arm/include/asm/kvm_mmu.h
>> @@ -235,31 +235,11 @@ static inline void 
>> __coherent_icache_guest_page(kvm_pfn_t pfn,
>>  }
>>  }
>>  
>> -static inline void __kvm_flush_dcache_pte(pte_t pte)
>> -{
>> -void *va = kmap_atomic(pte_page(pte));
>> -
>> -kvm_flush_dcache_to_poc(va, PAGE_SIZE);
>> -
>> -kunmap_atomic(va);
>> -}
>> -
>> -static inline void __kvm_flush_dcache_pmd(pmd_t pmd)
>> -{
>> -unsigned long size = PMD_SIZE;
>> -kvm_pfn_t pfn = pmd_pfn(pmd);
>> -
>> -while (size) {
>> -void *va = kmap_atomic_pfn(pfn);
>> +#define __kvm_flush_dcache_pte(p)   \
>> +coherent_dcache_guest_page(pte_pfn((p)), PAGE_SIZE)
>>  
>> -kvm_flush_dcache_to_poc(va, PAGE_SIZE);
>> -
>> -pfn++;
>> -size -= PAGE_SIZE;
>> -
>> -kunmap_atomic(va);
>> -}
>> -}
>> +#define __kvm_flush_dcache_pmd(p)   \
>> +coherent_dcache_guest_page(pmd_pfn((p)), PMD_SIZE)
> 
> Why can't these just be static inlines which call
> __coherent_dcache_guest_page already in the header file directly?

Because if we do that, we get a significant code expansion in the
resulting binary (all the call sites end up having a copy of that function).

> I'm really not too crazy about these #defines.

Neither am I. But actually, this patch is completely wrong. Using the
same functions as the guest cleaning doesn't provide the guarantees
documented next to unmap_stage2_ptes, as we need a clean+invalidate, not
just a clean.

I'll rework this patch (or just drop it).

> In fact, why do we need the coherent_Xcache_guest_page static
> indirection functions in mmu.c in the first place?

Code expansion. That's the only reason.

Thanks,

M.
-- 
Jazz is not dead. It just smells funny...
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v3 22/28] arm64/sve: KVM: Prevent guests from using SVE

2017-10-17 Thread Christoffer Dall
On Tue, Oct 10, 2017 at 07:38:39PM +0100, Dave Martin wrote:
> Until KVM has full SVE support, guests must not be allowed to
> execute SVE instructions.
> 
> This patch enables the necessary traps, and also ensures that the
> traps are disabled again on exit from the guest so that the host
> can still use SVE if it wants to.
> 
> This patch introduces another instance of
> __this_cpu_write(fpsimd_last_state, NULL), so this flush operation
> is abstracted out as a separate helper fpsimd_flush_cpu_state().
> Other instances are ported appropriately.

I don't understand this paragraph, beginning from ", so this...".


From reading the code, what I think is the reason for having to flush
the SVE state (and mark the host state invalid) is that even though we
disallow SVE usage in the guest, the guest can use the normal FP state,
and while we always fully preserve the host state, this could still
corrupt some additional SVE state not properly preserved for the host.
Is that correct?

> 
> As a side effect of this refactoring, a this_cpu_write() in
> fpsimd_cpu_pm_notifier() is changed to __this_cpu_write().  This
> should be fine, since cpu_pm_enter() is supposed to be called only
> with interrupts disabled.

Otherwise the patch itself looks good to me.

Thanks,
-Christoffer

> 
> Signed-off-by: Dave Martin 
> Reviewed-by: Alex Bennée 
> Cc: Marc Zyngier 
> Cc: Ard Biesheuvel 
> ---
>  arch/arm/include/asm/kvm_host.h   |  3 +++
>  arch/arm64/include/asm/fpsimd.h   |  1 +
>  arch/arm64/include/asm/kvm_arm.h  |  4 +++-
>  arch/arm64/include/asm/kvm_host.h | 11 +++
>  arch/arm64/kernel/fpsimd.c| 31 +--
>  arch/arm64/kvm/hyp/switch.c   |  6 +++---
>  virt/kvm/arm/arm.c|  3 +++
>  7 files changed, 53 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> index 4a879f6..242151e 100644
> --- a/arch/arm/include/asm/kvm_host.h
> +++ b/arch/arm/include/asm/kvm_host.h
> @@ -293,4 +293,7 @@ int kvm_arm_vcpu_arch_get_attr(struct kvm_vcpu *vcpu,
>  int kvm_arm_vcpu_arch_has_attr(struct kvm_vcpu *vcpu,
>  struct kvm_device_attr *attr);
>  
> +/* All host FP/SIMD state is restored on guest exit, so nothing to save: */
> +static inline void kvm_fpsimd_flush_cpu_state(void) {}
> +
>  #endif /* __ARM_KVM_HOST_H__ */
> diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
> index 3cfdfbe..10b2824 100644
> --- a/arch/arm64/include/asm/fpsimd.h
> +++ b/arch/arm64/include/asm/fpsimd.h
> @@ -75,6 +75,7 @@ extern void fpsimd_restore_current_state(void);
>  extern void fpsimd_update_current_state(struct fpsimd_state *state);
>  
>  extern void fpsimd_flush_task_state(struct task_struct *target);
> +extern void sve_flush_cpu_state(void);
>  
>  /* Maximum VL that SVE VL-agnostic software can transparently support */
>  #define SVE_VL_ARCH_MAX 0x100
> diff --git a/arch/arm64/include/asm/kvm_arm.h 
> b/arch/arm64/include/asm/kvm_arm.h
> index dbf0537..7f069ff 100644
> --- a/arch/arm64/include/asm/kvm_arm.h
> +++ b/arch/arm64/include/asm/kvm_arm.h
> @@ -186,7 +186,8 @@
>  #define CPTR_EL2_TTA (1 << 20)
>  #define CPTR_EL2_TFP (1 << CPTR_EL2_TFP_SHIFT)
>  #define CPTR_EL2_TZ  (1 << 8)
> -#define CPTR_EL2_DEFAULT 0x33ff
> +#define CPTR_EL2_RES10x32ff /* known RES1 bits in CPTR_EL2 */
> +#define CPTR_EL2_DEFAULT CPTR_EL2_RES1
>  
>  /* Hyp Debug Configuration Register bits */
>  #define MDCR_EL2_TPMS(1 << 14)
> @@ -237,5 +238,6 @@
>  
>  #define CPACR_EL1_FPEN   (3 << 20)
>  #define CPACR_EL1_TTA(1 << 28)
> +#define CPACR_EL1_DEFAULT(CPACR_EL1_FPEN | CPACR_EL1_ZEN_EL1EN)
>  
>  #endif /* __ARM64_KVM_ARM_H__ */
> diff --git a/arch/arm64/include/asm/kvm_host.h 
> b/arch/arm64/include/asm/kvm_host.h
> index e923b58..674912d 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -25,6 +25,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -384,4 +385,14 @@ static inline void __cpu_init_stage2(void)
> "PARange is %d bits, unsupported configuration!", parange);
>  }
>  
> +/*
> + * All host FP/SIMD state is restored on guest exit, so nothing needs
> + * doing here except in the SVE case:
> +*/
> +static inline void kvm_fpsimd_flush_cpu_state(void)
> +{
> + if (system_supports_sve())
> + sve_flush_cpu_state();
> +}
> +
>  #endif /* __ARM64_KVM_HOST_H__ */
> diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
> index a9cb794..6ae3703 100644
> --- a/arch/arm64/kernel/fpsimd.c
> +++ b/arch/arm64/kernel/fpsimd.c
> @@ -1073,6 +1073,33 @@ void fpsimd_flush_task_state(struct task_struct *t)
>   t->thread.fpsimd_state.cpu = NR_CPUS;
>  }
>  
> +static inline void 

Re: [PATCH 07/10] KVM: arm/arm64: Preserve Exec permission across R/W permission faults

2017-10-17 Thread Marc Zyngier
On 16/10/17 21:08, Christoffer Dall wrote:
> On Mon, Oct 09, 2017 at 04:20:29PM +0100, Marc Zyngier wrote:
>> So far, we lose the Exec property whenever we take permission
>> faults, as we always reconstruct the PTE/PMD from scratch. This
>> can be counter-productive as we can end up with the following
>> fault sequence:
>>
>>  X -> RO -> ROX -> RW -> RWX
>>
>> Instead, we can lookup the existing PTE/PMD and clear the XN bit in the
>> new entry if it was already cleared in the old one, leading to a much
>> nicer fault sequence:
>>
>>  X -> ROX -> RWX
>>
>> Signed-off-by: Marc Zyngier 
>> ---
>>  arch/arm/include/asm/kvm_mmu.h   | 10 ++
>>  arch/arm64/include/asm/kvm_mmu.h | 10 ++
>>  virt/kvm/arm/mmu.c   | 25 +
>>  3 files changed, 45 insertions(+)
>>
>> diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
>> index bf76150aad5f..ad442d86c23e 100644
>> --- a/arch/arm/include/asm/kvm_mmu.h
>> +++ b/arch/arm/include/asm/kvm_mmu.h
>> @@ -107,6 +107,11 @@ static inline bool kvm_s2pte_readonly(pte_t *pte)
>>  return (pte_val(*pte) & L_PTE_S2_RDWR) == L_PTE_S2_RDONLY;
>>  }
>>  
>> +static inline bool kvm_s2pte_exec(pte_t *pte)
>> +{
>> +return !(pte_val(*pte) & L_PTE_XN);
>> +}
>> +
>>  static inline void kvm_set_s2pmd_readonly(pmd_t *pmd)
>>  {
>>  pmd_val(*pmd) = (pmd_val(*pmd) & ~L_PMD_S2_RDWR) | L_PMD_S2_RDONLY;
>> @@ -117,6 +122,11 @@ static inline bool kvm_s2pmd_readonly(pmd_t *pmd)
>>  return (pmd_val(*pmd) & L_PMD_S2_RDWR) == L_PMD_S2_RDONLY;
>>  }
>>  
>> +static inline bool kvm_s2pmd_exec(pmd_t *pmd)
>> +{
>> +return !(pmd_val(*pmd) & PMD_SECT_XN);
>> +}
>> +
>>  static inline bool kvm_page_empty(void *ptr)
>>  {
>>  struct page *ptr_page = virt_to_page(ptr);
>> diff --git a/arch/arm64/include/asm/kvm_mmu.h 
>> b/arch/arm64/include/asm/kvm_mmu.h
>> index 60c420a5ac0d..e7af74b8b51a 100644
>> --- a/arch/arm64/include/asm/kvm_mmu.h
>> +++ b/arch/arm64/include/asm/kvm_mmu.h
>> @@ -203,6 +203,11 @@ static inline bool kvm_s2pte_readonly(pte_t *pte)
>>  return (pte_val(*pte) & PTE_S2_RDWR) == PTE_S2_RDONLY;
>>  }
>>  
>> +static inline bool kvm_s2pte_exec(pte_t *pte)
>> +{
>> +return !(pte_val(*pte) & PTE_S2_XN);
>> +}
>> +
>>  static inline void kvm_set_s2pmd_readonly(pmd_t *pmd)
>>  {
>>  kvm_set_s2pte_readonly((pte_t *)pmd);
>> @@ -213,6 +218,11 @@ static inline bool kvm_s2pmd_readonly(pmd_t *pmd)
>>  return kvm_s2pte_readonly((pte_t *)pmd);
>>  }
>>  
>> +static inline bool kvm_s2pmd_exec(pmd_t *pmd)
>> +{
>> +return !(pmd_val(*pmd) & PMD_S2_XN);
>> +}
>> +
>>  static inline bool kvm_page_empty(void *ptr)
>>  {
>>  struct page *ptr_page = virt_to_page(ptr);
>> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
>> index 1911fadde88b..ccc6106764a6 100644
>> --- a/virt/kvm/arm/mmu.c
>> +++ b/virt/kvm/arm/mmu.c
>> @@ -926,6 +926,17 @@ static int stage2_set_pmd_huge(struct kvm *kvm, struct 
>> kvm_mmu_memory_cache
>>  return 0;
>>  }
>>  
>> +static pte_t *stage2_get_pte(struct kvm *kvm, phys_addr_t addr)
>> +{
>> +pmd_t *pmdp;
>> +
>> +pmdp = stage2_get_pmd(kvm, NULL, addr);
>> +if (!pmdp || pmd_none(*pmdp))
>> +return NULL;
>> +
>> +return pte_offset_kernel(pmdp, addr);
>> +}
>> +
> 
> nit, couldn't you change this to be
> 
> stage2_is_exec(struct kvm *kvm, phys_addr_t addr)
> 
> Which, if the pmd is a section mapping just checks that, and if we find
> a pte, we check that, and then we can have a simpler one-line call and
> check from both the pte and pmd paths below?

Yes, that's pretty neat. I've folded that in.
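
A minimal sketch of what such a combined helper could look like, reusing
stage2_get_pmd() and the kvm_s2pmd_exec()/kvm_s2pte_exec() accessors from
this series (the pmd_thp_or_huge() check for section mappings is an
assumption here, not something taken from the patch as posted):

static bool stage2_is_exec(struct kvm *kvm, phys_addr_t addr)
{
        pmd_t *pmdp;
        pte_t *ptep;

        pmdp = stage2_get_pmd(kvm, NULL, addr);
        if (!pmdp || pmd_none(*pmdp) || !pmd_present(*pmdp))
                return false;

        /* section mapping: the XN bit lives in the PMD itself */
        if (pmd_thp_or_huge(*pmdp))
                return kvm_s2pmd_exec(pmdp);

        ptep = pte_offset_kernel(pmdp, addr);
        if (!ptep || pte_none(*ptep) || !pte_present(*ptep))
                return false;

        return kvm_s2pte_exec(ptep);
}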

> 
>>  static int stage2_set_pte(struct kvm *kvm, struct kvm_mmu_memory_cache 
>> *cache,
>>phys_addr_t addr, const pte_t *new_pte,
>>unsigned long flags)
>> @@ -1407,6 +1418,13 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, 
>> phys_addr_t fault_ipa,
>>  if (exec_fault) {
>>  new_pmd = kvm_s2pmd_mkexec(new_pmd);
>>  coherent_icache_guest_page(vcpu, pfn, PMD_SIZE);
>> +} else if (fault_status == FSC_PERM) {
>> +/* Preserve execute if XN was already cleared */
>> +pmd_t *old_pmdp = stage2_get_pmd(kvm, NULL, fault_ipa);
>> +
>> +if (old_pmdp && pmd_present(*old_pmdp) &&
>> +kvm_s2pmd_exec(old_pmdp))
>> +new_pmd = kvm_s2pmd_mkexec(new_pmd);
> 
> Is the reverse case not also possible then?  That is, if we have an
> exec_fault, we could check if the entry is already writable and maintain
> the property as well.  Not sure how often that would get hit though, as
> a VM would only execute instructions on a page that has been written to,
> but is somehow read-only at stage2, meaning the host must have marked
> the page as read-only since content was written.  I think this could be
> a 

Re: [PATCH 06/10] KVM: arm/arm64: Only clean the dcache on translation fault

2017-10-17 Thread Marc Zyngier
On 16/10/17 21:08, Christoffer Dall wrote:
> On Mon, Oct 09, 2017 at 04:20:28PM +0100, Marc Zyngier wrote:
>> The only case where we actually need to perform a dache maintenance
>> is when we map the page for the first time, and subsequent permission
>> faults do not require cache maintenance. Let's make it conditional
>> on not being a permission fault (and thus a translation fault).
> 
> Why do we actually need to do any dcache maintenance when faulting in a
> page?
> 
> Is this for the case when the stage 1 MMU is disabled, or to support
> guest mappings using uncached attributes?

These are indeed the two cases that require cleaning the dcache to PoC.

> Can we do better, for example
> by only flushing the cache if the guest MMU is disabled?

The guest MMU being disabled is easy. But the uncached mapping is much
trickier, and would involve parsing the guest page tables. Not something
I'm really eager to implement.

> 
> Beyond that:
> 
> Reviewed-by: Christoffer Dall 
Thanks!

M.
-- 
Jazz is not dead. It just smells funny...
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCH v4 07/11] KVM: arm/arm64: vgic-its: Always attempt to save/restore device and collection tables

2017-10-17 Thread Eric Auger
In case the device table save fails, we currently do not
attempt to save the collection table. However it may
happen that the device table save fails because the structures
in memory are inconsistent with the device GITS_BASER; this
does not mean the collection backup cannot or should not be
performed. The same must happen on the restore path.

Without this patch, after a reset and early state backup,
the device table restore may fail due to an invalid L1 entry.
If we don't restore the collection table, the guest does
not reboot properly.

Signed-off-by: Eric Auger 

---

candidate to be CC'ed stable
---
 virt/kvm/arm/vgic/vgic-its.c | 11 +++
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/virt/kvm/arm/vgic/vgic-its.c b/virt/kvm/arm/vgic/vgic-its.c
index e18f1e4..1c3e83f 100644
--- a/virt/kvm/arm/vgic/vgic-its.c
+++ b/virt/kvm/arm/vgic/vgic-its.c
@@ -2381,12 +2381,9 @@ static int vgic_its_save_tables_v0(struct vgic_its *its)
}
 
ret = vgic_its_save_device_tables(its);
-   if (ret)
-   goto out;
 
-   ret = vgic_its_save_collection_table(its);
+   ret |= vgic_its_save_collection_table(its);
 
-out:
unlock_all_vcpus(kvm);
mutex_unlock(&its->its_lock);
mutex_unlock(&kvm->lock);
@@ -2413,11 +2410,9 @@ static int vgic_its_restore_tables_v0(struct vgic_its 
*its)
}
 
ret = vgic_its_restore_collection_table(its);
-   if (ret)
-   goto out;
 
-   ret = vgic_its_restore_device_tables(its);
-out:
+   ret |= vgic_its_restore_device_tables(its);
+
unlock_all_vcpus(kvm);
mutex_unlock(&its->its_lock);
mutex_unlock(&kvm->lock);
-- 
2.5.5

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCH v4 11/11] KVM: arm/arm64: vgic-its: Implement KVM_DEV_ARM_ITS_CTRL_RESET

2017-10-17 Thread Eric Auger
On reset we clear the valid bits of GITS_CBASER and GITS_BASER.
We also clear command queue registers and free the cache (device,
collection, and lpi lists).
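
As an illustration, a minimal userspace sketch of how this reset would be
triggered through the KVM device control API (its_fd is assumed to be the
file descriptor returned by KVM_CREATE_DEVICE for the ITS device; sketch
only, not part of this patch):

#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Ask the kernel to reset an already created ITS device. */
static int its_reset(int its_fd)
{
        struct kvm_device_attr attr = {
                .group = KVM_DEV_ARM_VGIC_GRP_CTRL,
                .attr  = KVM_DEV_ARM_ITS_CTRL_RESET,
                /* no additional parameter: .addr is left at 0 */
        };

        return ioctl(its_fd, KVM_SET_DEVICE_ATTR, &attr);
}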

Signed-off-by: Eric Auger 
Reviewed-by: Christoffer Dall 

---

v2 -> v3:
- added Christoffer's R-b
---
 arch/arm/include/uapi/asm/kvm.h   |  1 +
 arch/arm64/include/uapi/asm/kvm.h |  1 +
 virt/kvm/arm/vgic/vgic-its.c  | 18 ++
 3 files changed, 20 insertions(+)

diff --git a/arch/arm/include/uapi/asm/kvm.h b/arch/arm/include/uapi/asm/kvm.h
index 5db2d4c..7ef0c06 100644
--- a/arch/arm/include/uapi/asm/kvm.h
+++ b/arch/arm/include/uapi/asm/kvm.h
@@ -215,6 +215,7 @@ struct kvm_arch_memory_slot {
 #define   KVM_DEV_ARM_ITS_SAVE_TABLES  1
 #define   KVM_DEV_ARM_ITS_RESTORE_TABLES   2
 #define   KVM_DEV_ARM_VGIC_SAVE_PENDING_TABLES 3
+#define   KVM_DEV_ARM_ITS_CTRL_RESET   4
 
 /* KVM_IRQ_LINE irq field index values */
 #define KVM_ARM_IRQ_TYPE_SHIFT 24
diff --git a/arch/arm64/include/uapi/asm/kvm.h 
b/arch/arm64/include/uapi/asm/kvm.h
index 9f3ca24..b5306ce 100644
--- a/arch/arm64/include/uapi/asm/kvm.h
+++ b/arch/arm64/include/uapi/asm/kvm.h
@@ -227,6 +227,7 @@ struct kvm_arch_memory_slot {
 #define   KVM_DEV_ARM_ITS_SAVE_TABLES   1
 #define   KVM_DEV_ARM_ITS_RESTORE_TABLES2
 #define   KVM_DEV_ARM_VGIC_SAVE_PENDING_TABLES 3
+#define   KVM_DEV_ARM_ITS_CTRL_RESET   4
 
 /* Device Control API on vcpu fd */
 #define KVM_ARM_VCPU_PMU_V3_CTRL   0
diff --git a/virt/kvm/arm/vgic/vgic-its.c b/virt/kvm/arm/vgic/vgic-its.c
index 084239c..9ceb348 100644
--- a/virt/kvm/arm/vgic/vgic-its.c
+++ b/virt/kvm/arm/vgic/vgic-its.c
@@ -2457,6 +2457,19 @@ static int vgic_its_commit_v0(struct vgic_its *its)
return 0;
 }
 
+static void vgic_its_reset(struct kvm *kvm, struct vgic_its *its)
+{
+   /* We need to keep the ABI specific field values */
+   its->baser_coll_table &= ~GITS_BASER_VALID;
+   its->baser_device_table &= ~GITS_BASER_VALID;
+   its->cbaser = 0;
+   its->creadr = 0;
+   its->cwriter = 0;
+   its->enabled = 0;
+   vgic_its_free_device_list(kvm, its);
+   vgic_its_free_collection_list(kvm, its);
+}
+
 static int vgic_its_has_attr(struct kvm_device *dev,
 struct kvm_device_attr *attr)
 {
@@ -2471,6 +2484,8 @@ static int vgic_its_has_attr(struct kvm_device *dev,
switch (attr->attr) {
case KVM_DEV_ARM_VGIC_CTRL_INIT:
return 0;
+   case KVM_DEV_ARM_ITS_CTRL_RESET:
+   return 0;
case KVM_DEV_ARM_ITS_SAVE_TABLES:
return 0;
case KVM_DEV_ARM_ITS_RESTORE_TABLES:
@@ -2515,6 +2530,9 @@ static int vgic_its_set_attr(struct kvm_device *dev,
case KVM_DEV_ARM_VGIC_CTRL_INIT:
/* Nothing to do */
return 0;
+   case KVM_DEV_ARM_ITS_CTRL_RESET:
+   vgic_its_reset(dev->kvm, its);
+   return 0;
case KVM_DEV_ARM_ITS_SAVE_TABLES:
return abi->save_tables(its);
case KVM_DEV_ARM_ITS_RESTORE_TABLES:
-- 
2.5.5

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCH v4 09/11] KVM: arm/arm64: vgic-its: free caches when GITS_BASER Valid bit is cleared

2017-10-17 Thread Eric Auger
When the GITS_BASER.Valid gets cleared, the data structures in
guest RAM are not valid anymore. The device, collection
and LPI lists stored in the in-kernel ITS represent the same
information in some form of cache. So let's void the cache.

Signed-off-by: Eric Auger 

---

v2 -> v3:
- add a comment and clear cache in if block
---
 virt/kvm/arm/vgic/vgic-its.c | 23 +++
 1 file changed, 19 insertions(+), 4 deletions(-)

diff --git a/virt/kvm/arm/vgic/vgic-its.c b/virt/kvm/arm/vgic/vgic-its.c
index f3f0026f..084239c 100644
--- a/virt/kvm/arm/vgic/vgic-its.c
+++ b/virt/kvm/arm/vgic/vgic-its.c
@@ -1471,8 +1471,9 @@ static void vgic_mmio_write_its_baser(struct kvm *kvm,
  unsigned long val)
 {
const struct vgic_its_abi *abi = vgic_its_get_abi(its);
-   u64 entry_size, device_type;
+   u64 entry_size;
u64 reg, *regptr, clearbits = 0;
+   int type;
 
/* When GITS_CTLR.Enable is 1, we ignore write accesses. */
if (its->enabled)
@@ -1482,12 +1483,12 @@ static void vgic_mmio_write_its_baser(struct kvm *kvm,
case 0:
regptr = &its->baser_device_table;
entry_size = abi->dte_esz;
-   device_type = GITS_BASER_TYPE_DEVICE;
+   type = GITS_BASER_TYPE_DEVICE;
break;
case 1:
regptr = &its->baser_coll_table;
entry_size = abi->cte_esz;
-   device_type = GITS_BASER_TYPE_COLLECTION;
+   type = GITS_BASER_TYPE_COLLECTION;
clearbits = GITS_BASER_INDIRECT;
break;
default:
@@ -1499,10 +1500,24 @@ static void vgic_mmio_write_its_baser(struct kvm *kvm,
reg &= ~clearbits;
 
reg |= (entry_size - 1) << GITS_BASER_ENTRY_SIZE_SHIFT;
-   reg |= device_type << GITS_BASER_TYPE_SHIFT;
+   reg |= (u64)type << GITS_BASER_TYPE_SHIFT;
reg = vgic_sanitise_its_baser(reg);
 
*regptr = reg;
+
+   /* Table no longer valid: clear cached data */
+   if (!(reg & GITS_BASER_VALID)) {
+   switch (type) {
+   case GITS_BASER_TYPE_DEVICE:
+   vgic_its_free_device_list(kvm, its);
+   break;
+   case GITS_BASER_TYPE_COLLECTION:
+   vgic_its_free_collection_list(kvm, its);
+   break;
+   default:
+   break;
+   }
+   }
 }
 
 static unsigned long vgic_mmio_read_its_ctlr(struct kvm *vcpu,
-- 
2.5.5

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCH v4 10/11] KVM: arm/arm64: Document KVM_DEV_ARM_ITS_CTRL_RESET

2017-10-17 Thread Eric Auger
At the moment, the in-kernel emulated ITS is not properly reset.
On guest restart/reset some registers keep their old values and
internal structures like device, ITE, and collection lists are not
freed.

This may lead to various bugs. Among them, we can have incorrect state
backup or failure when saving the ITS state at early guest boot stage.

This patch documents a new attribute, KVM_DEV_ARM_ITS_CTRL_RESET in
the KVM_DEV_ARM_VGIC_GRP_CTRL group.

Upon this action, we can reset registers and especially those
pointing to tables previously allocated by the guest and free
the internal data structures storing the list of devices, collections
and lpis.

The usual approach for device reset of having userspace write
the reset values of the registers to the kernel via the register
read/write APIs doesn't work for the ITS because it has some
internal state (caches) which is not exposed as registers,
and there is no register interface for "drop cached data without
writing it back to RAM". So we need a KVM API which mimics the
hardware's reset line, to provide the equivalent behaviour to
a "pull the power cord out of the back of the machine" reset.

Signed-off-by: Eric Auger 
Reported-by: wanghaibin 

---
v2 -> v3:
- reword commit message, credit to Peter Maydell.
- take into account Christoffer rewording comments but still
  kept details. Added Peter's comment but still kept details.
  Peter may disagree.

v1 -> v2:
- Describe architecturally-defined reset values
---
 Documentation/virtual/kvm/devices/arm-vgic-its.txt | 20 
 1 file changed, 20 insertions(+)

diff --git a/Documentation/virtual/kvm/devices/arm-vgic-its.txt 
b/Documentation/virtual/kvm/devices/arm-vgic-its.txt
index eb06beb..4e9bb6f 100644
--- a/Documentation/virtual/kvm/devices/arm-vgic-its.txt
+++ b/Documentation/virtual/kvm/devices/arm-vgic-its.txt
@@ -33,6 +33,10 @@ Groups:
   request the initialization of the ITS, no additional parameter in
   kvm_device_attr.addr.
 
+KVM_DEV_ARM_ITS_CTRL_RESET
+  reset the ITS, no additional parameter in kvm_device_attr.addr.
+  See "ITS Reset State" section.
+
 KVM_DEV_ARM_ITS_SAVE_TABLES
   save the ITS table data into guest RAM, at the location provisioned
   by the guest in corresponding registers/table entries.
@@ -157,3 +161,19 @@ Then vcpus can be started.
  - pINTID is the physical LPI ID; if zero, it means the entry is not valid
and other fields are not meaningful.
  - ICID is the collection ID
+
+ ITS Reset State:
+ 
+
+RESET returns the ITS to the same state that it was when first created and
+initialized:
+
+- The ITS is not enabled and quiescent
+  GITS_CTLR.Enabled = 0 .Quiescent=1
+- There is no internally cached state
+- No collection or device table are used
+  GITS_BASER.Valid = 0
+- The command queue is not allocated:
+  GITS_CBASER = 0, GITS_CREADR = 0, GITS_CWRITER = 0
+- The ABI version is unchanged and remains the one set when the ITS
+  device was first created.
-- 
2.5.5

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCH v4 08/11] KVM: arm/arm64: vgic-its: new helper functions to free the caches

2017-10-17 Thread Eric Auger
From: wanghaibin 

We create two new functions that free the device and
collection lists. These are currently called by vgic_its_destroy()
and we will add other callers in subsequent patches.

We also remove the check on its->device_list.next as it looks
unnecessary. Indeed, the device list is always initialized
when vgic_its_destroy gets called: the kvm device is removed
by kvm_destroy_devices() which loops on all the devices
added to kvm->devices. kvm_ioctl_create_device() only adds
the device to kvm_devices once the lists have been initialized
(in vgic_create_its).

We also move vgic_its_free_device to prepare for new callers.

Signed-off-by: wanghaibin 
Signed-off-by: Eric Auger 

---
[Eric] removed its->device_list.next which is not needed as
pointed out by Wanghaibin. Reword the commit message
---
 virt/kvm/arm/vgic/vgic-its.c | 76 
 1 file changed, 41 insertions(+), 35 deletions(-)

diff --git a/virt/kvm/arm/vgic/vgic-its.c b/virt/kvm/arm/vgic/vgic-its.c
index 1c3e83f..f3f0026f 100644
--- a/virt/kvm/arm/vgic/vgic-its.c
+++ b/virt/kvm/arm/vgic/vgic-its.c
@@ -611,6 +611,45 @@ static void its_free_ite(struct kvm *kvm, struct its_ite 
*ite)
kfree(ite);
 }
 
+static void vgic_its_free_device(struct kvm *kvm, struct its_device *dev)
+{
+   struct its_ite *ite, *tmp;
+
list_for_each_entry_safe(ite, tmp, &dev->itt_head, ite_list)
its_free_ite(kvm, ite);
list_del(&dev->dev_list);
+   kfree(dev);
+}
+
+static void vgic_its_free_device_list(struct kvm *kvm, struct vgic_its *its)
+{
+   struct list_head *cur, *temp;
+
mutex_lock(&its->its_lock);
list_for_each_safe(cur, temp, &its->device_list) {
+   struct its_device *dev;
+
+   dev = list_entry(cur, struct its_device, dev_list);
+   vgic_its_free_device(kvm, dev);
+   }
mutex_unlock(&its->its_lock);
+}
+
+static void vgic_its_free_collection_list(struct kvm *kvm, struct vgic_its 
*its)
+{
+   struct list_head *cur, *temp;
+
list_for_each_safe(cur, temp, &its->collection_list) {
+   struct its_collection *coll;
+
+   coll = list_entry(cur, struct its_collection, coll_list);
+   list_del(cur);
+   kfree(coll);
+   }
mutex_unlock(&its->its_lock);
+}
+
+
 static u64 its_cmd_mask_field(u64 *its_cmd, int word, int shift, int size)
 {
return (le64_to_cpu(its_cmd[word]) >> shift) & (BIT_ULL(size) - 1);
@@ -1644,46 +1683,13 @@ static int vgic_its_create(struct kvm_device *dev, u32 
type)
return vgic_its_set_abi(its, NR_ITS_ABIS - 1);
 }
 
-static void vgic_its_free_device(struct kvm *kvm, struct its_device *dev)
-{
-   struct its_ite *ite, *tmp;
-
-   list_for_each_entry_safe(ite, tmp, &dev->itt_head, ite_list)
-   its_free_ite(kvm, ite);
-   list_del(&dev->dev_list);
-   kfree(dev);
-}
-
 static void vgic_its_destroy(struct kvm_device *kvm_dev)
 {
struct kvm *kvm = kvm_dev->kvm;
struct vgic_its *its = kvm_dev->private;
-   struct list_head *cur, *temp;
-
-   /*
-* We may end up here without the lists ever having been initialized.
-* Check this and bail out early to avoid dereferencing a NULL pointer.
-*/
-   if (!its->device_list.next)
-   return;
-
-   mutex_lock(&its->its_lock);
-   list_for_each_safe(cur, temp, &its->device_list) {
-   struct its_device *dev;
-
-   dev = list_entry(cur, struct its_device, dev_list);
-   vgic_its_free_device(kvm, dev);
-   }
-
-   list_for_each_safe(cur, temp, &its->collection_list) {
-   struct its_collection *coll;
-
-   coll = list_entry(cur, struct its_collection, coll_list);
-   list_del(cur);
-   kfree(coll);
-   }
-   mutex_unlock(&its->its_lock);
 
+   vgic_its_free_device_list(kvm, its);
+   vgic_its_free_collection_list(kvm, its);
kfree(its);
 }
 
-- 
2.5.5

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCH v4 06/11] KVM: arm/arm64: vgic-its: Check GITS_CBASER validity before processing commands

2017-10-17 Thread Eric Auger
At the moment vgic_its_process_commands() does not
check the CBASER is valid before processing any command.
Let's fix that.

Also rename the cbaser local variable to cbaser_pa to avoid
any confusion with the full register.

Signed-off-by: Eric Auger 
---
 virt/kvm/arm/vgic/vgic-its.c | 13 -
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/virt/kvm/arm/vgic/vgic-its.c b/virt/kvm/arm/vgic/vgic-its.c
index 3b539d4..e18f1e4 100644
--- a/virt/kvm/arm/vgic/vgic-its.c
+++ b/virt/kvm/arm/vgic/vgic-its.c
@@ -1301,17 +1301,20 @@ static void vgic_mmio_write_its_cbaser(struct kvm *kvm, 
struct vgic_its *its,
 /* Must be called with the cmd_lock held. */
 static void vgic_its_process_commands(struct kvm *kvm, struct vgic_its *its)
 {
-   gpa_t cbaser;
+   gpa_t cbaser_pa;
u64 cmd_buf[4];
 
-   /* Commands are only processed when the ITS is enabled. */
-   if (!its->enabled)
+   /*
+* Commands are only processed when the ITS is enabled and
+* CBASER is valid
+*/
+   if (!its->enabled || (!(its->cbaser & GITS_CBASER_VALID)))
return;
 
-   cbaser = CBASER_ADDRESS(its->cbaser);
+   cbaser_pa = CBASER_ADDRESS(its->cbaser);
 
while (its->cwriter != its->creadr) {
-   int ret = kvm_read_guest(kvm, cbaser + its->creadr,
+   int ret = kvm_read_guest(kvm, cbaser_pa + its->creadr,
 cmd_buf, ITS_CMD_SIZE);
/*
 * If kvm_read_guest() fails, this could be due to the guest
-- 
2.5.5

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCH v4 03/11] KVM: arm/arm64: vgic-its: Improve error reporting on device table save

2017-10-17 Thread Eric Auger
At the moment the device table save() returns -EINVAL if
vgic_its_check_id() fails to return the gpa of the entry
associated with the device/collection id. Let vgic_its_check_id()
return an int instead of a bool and return a more precise
error value:
- EINVAL in case the id is out of range
- EFAULT if the gpa is not provisioned or is not valid

We also check first that the GITS_BASER Valid bit is set.

This allows userspace to discriminate between failure reasons.
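
For illustration, a hypothetical userspace caller of the table save path
could then discriminate the two failure reasons roughly like this (its_fd
is assumed to come from KVM_CREATE_DEVICE; sketch only):

#include <errno.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

static void its_save_tables(int its_fd)
{
        struct kvm_device_attr attr = {
                .group = KVM_DEV_ARM_VGIC_GRP_CTRL,
                .attr  = KVM_DEV_ARM_ITS_SAVE_TABLES,
        };

        if (ioctl(its_fd, KVM_SET_DEVICE_ATTR, &attr) < 0) {
                if (errno == EINVAL)
                        fprintf(stderr, "ITS save: an id was out of range\n");
                else if (errno == EFAULT)
                        fprintf(stderr, "ITS save: table address not provisioned or not valid\n");
        }
}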

Signed-off-by: Eric Auger 

---

need to CC stable

v2 -> v3:
- correct kerneldoc comment
---
 virt/kvm/arm/vgic/vgic-its.c | 55 +---
 1 file changed, 37 insertions(+), 18 deletions(-)

diff --git a/virt/kvm/arm/vgic/vgic-its.c b/virt/kvm/arm/vgic/vgic-its.c
index a4ff8f7..e59363e 100644
--- a/virt/kvm/arm/vgic/vgic-its.c
+++ b/virt/kvm/arm/vgic/vgic-its.c
@@ -687,15 +687,25 @@ static int vgic_its_cmd_handle_movi(struct kvm *kvm, 
struct vgic_its *its,
return 0;
 }
 
-/*
- * Check whether an ID can be stored into the corresponding guest table.
+/**
+ * vgic_its_check_id - Check whether an ID can be stored into
+ * the corresponding guest table.
+ *
+ * @its: its handle
+ * @baser: GITS_BASER register
+ * @id: id of the device/collection
+ * @eaddr: output gpa of the corresponding table entry
+ *
  * For a direct table this is pretty easy, but gets a bit nasty for
  * indirect tables. We check whether the resulting guest physical address
  * is actually valid (covered by a memslot and guest accessible).
  * For this we have to read the respective first level entry.
+ *
+ * Return: 0 on success, -EINVAL if @id is out of range, -EFAULT if
+ * the address cannot be computed or is not valid
  */
-static bool vgic_its_check_id(struct vgic_its *its, u64 baser, u32 id,
- gpa_t *eaddr)
+static int vgic_its_check_id(struct vgic_its *its, u64 baser, u32 id,
+gpa_t *eaddr)
 {
int l1_tbl_size = GITS_BASER_NR_PAGES(baser) * SZ_64K;
u64 indirect_ptr, type = GITS_BASER_TYPE(baser);
@@ -703,50 +713,56 @@ static bool vgic_its_check_id(struct vgic_its *its, u64 
baser, u32 id,
int index;
gfn_t gfn;
 
+   if (!(baser & GITS_BASER_VALID))
+   return -EFAULT;
+
switch (type) {
case GITS_BASER_TYPE_DEVICE:
if (id >= BIT_ULL(VITS_TYPER_DEVBITS))
-   return false;
+   return -EINVAL;
break;
case GITS_BASER_TYPE_COLLECTION:
/* as GITS_TYPER.CIL == 0, ITS supports 16-bit collection ID */
if (id >= BIT_ULL(16))
-   return false;
+   return -EINVAL;
break;
default:
-   return false;
+   return -EINVAL;
}
 
if (!(baser & GITS_BASER_INDIRECT)) {
phys_addr_t addr;
 
if (id >= (l1_tbl_size / esz))
-   return false;
+   return -EINVAL;
 
addr = BASER_ADDRESS(baser) + id * esz;
gfn = addr >> PAGE_SHIFT;
 
if (eaddr)
*eaddr = addr;
-   return kvm_is_visible_gfn(its->dev->kvm, gfn);
+   if (kvm_is_visible_gfn(its->dev->kvm, gfn))
+   return 0;
+   else
+   return -EFAULT;
}
 
/* calculate and check the index into the 1st level */
index = id / (SZ_64K / esz);
if (index >= (l1_tbl_size / sizeof(u64)))
-   return false;
+   return -EINVAL;
 
/* Each 1st level entry is represented by a 64-bit value. */
if (kvm_read_guest(its->dev->kvm,
   BASER_ADDRESS(baser) + index * sizeof(indirect_ptr),
   &indirect_ptr, sizeof(indirect_ptr)))
-   return false;
+   return -EFAULT;
 
indirect_ptr = le64_to_cpu(indirect_ptr);
 
/* check the valid bit of the first level entry */
if (!(indirect_ptr & BIT_ULL(63)))
-   return false;
+   return -EFAULT;
 
/*
 * Mask the guest physical address and calculate the frame number.
@@ -762,7 +778,10 @@ static bool vgic_its_check_id(struct vgic_its *its, u64 
baser, u32 id,
 
if (eaddr)
*eaddr = indirect_ptr;
-   return kvm_is_visible_gfn(its->dev->kvm, gfn);
+   if (kvm_is_visible_gfn(its->dev->kvm, gfn))
+   return 0;
+   else
+   return -EFAULT;
 }
 
 static int vgic_its_alloc_collection(struct vgic_its *its,
@@ -771,7 +790,7 @@ static int vgic_its_alloc_collection(struct vgic_its *its,
 {
struct its_collection *collection;
 
-   if (!vgic_its_check_id(its, its->baser_coll_table, coll_id, NULL))
+   if (vgic_its_check_id(its, its->baser_coll_table, coll_id, NULL))

[PATCH v4 02/11] KVM: arm/arm64: vgic-its: fix vgic_its_restore_collection_table returned value

2017-10-17 Thread Eric Auger
vgic_its_restore_cte returns +1 if the collection table entry
is valid and properly decoded. As a consequence, if the
collection table is fully filled with valid data that are
decoded without error, vgic_its_restore_collection_table()
returns +1.  This is wrong.

Let's use the standard C convention for both vgic_its_restore_cte
and vgic_its_restore_collection_table. vgic_its_restore_cte now
returns whether we have reached the end of the table in the @last
output parameter. vgic_its_restore_collection_table aborts in case
of error or if the end is found. Otherwise, it now returns 0.

Fixes: ea1ad53e1e31a3 (KVM: arm64: vgic-its: Collection table save/restore)
Signed-off-by: Eric Auger 

---

v2 -> v3: creation
---
 virt/kvm/arm/vgic/vgic-its.c | 40 +++-
 1 file changed, 31 insertions(+), 9 deletions(-)

diff --git a/virt/kvm/arm/vgic/vgic-its.c b/virt/kvm/arm/vgic/vgic-its.c
index eea14a1..a4ff8f7 100644
--- a/virt/kvm/arm/vgic/vgic-its.c
+++ b/virt/kvm/arm/vgic/vgic-its.c
@@ -2204,7 +2204,19 @@ static int vgic_its_save_cte(struct vgic_its *its,
return kvm_write_guest(its->dev->kvm, gpa, &val, esz);
 }
 
-static int vgic_its_restore_cte(struct vgic_its *its, gpa_t gpa, int esz)
+/**
+ * vgic_its_restore_cte - Restore a collection table entry
+ *
+ * @its: its handle
+ * @gpa: guest physical address of the entry
+ * @esz: entry size
+ * @last: output boolean indicating whether we have reached the
+ *   end of the collection table (ie. an invalid entry was decoded)
+ *
+ * Return: 0 upon success, < 0 on error
+ */
+static int vgic_its_restore_cte(struct vgic_its *its, gpa_t gpa, int esz,
+   bool *last)
 {
struct its_collection *collection;
struct kvm *kvm = its->dev->kvm;
@@ -2217,7 +2229,8 @@ static int vgic_its_restore_cte(struct vgic_its *its, 
gpa_t gpa, int esz)
if (ret)
return ret;
val = le64_to_cpu(val);
-   if (!(val & KVM_ITS_CTE_VALID_MASK))
+   *last = !(val & KVM_ITS_CTE_VALID_MASK);
+   if (*last)
return 0;
 
target_addr = (u32)(val >> KVM_ITS_CTE_RDBASE_SHIFT);
@@ -2233,7 +2246,7 @@ static int vgic_its_restore_cte(struct vgic_its *its, 
gpa_t gpa, int esz)
if (ret)
return ret;
collection->target_addr = target_addr;
-   return 1;
+   return 0;
 }
 
 /**
@@ -2278,8 +2291,13 @@ static int vgic_its_save_collection_table(struct 
vgic_its *its)
 
 /**
  * vgic_its_restore_collection_table - reads the collection table
- * in guest memory and restores the ITS internal state. Requires the
- * BASER registers to be restored before.
+ * in guest memory and restores the ITS collection cache.
+ *
+ * @its: its handle
+ *
+ * Requires the Collection BASER to be previously restored.
+ *
+ * Returns 0 on success or < 0 on error
  */
 static int vgic_its_restore_collection_table(struct vgic_its *its)
 {
@@ -2297,13 +2315,17 @@ static int vgic_its_restore_collection_table(struct 
vgic_its *its)
max_size = GITS_BASER_NR_PAGES(its->baser_coll_table) * SZ_64K;
 
while (read < max_size) {
-   ret = vgic_its_restore_cte(its, gpa, cte_esz);
-   if (ret <= 0)
-   break;
+   bool last;
+
ret = vgic_its_restore_cte(its, gpa, cte_esz, &last);
+   if (ret < 0 || last)
+   return ret;
gpa += cte_esz;
read += cte_esz;
}
-   return ret;
+
+   /* table was fully filled with valid entries, decoded without error */
+   return 0;
 }
 
 /**
-- 
2.5.5

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCH v4 04/11] KVM: arm/arm64: vgic-its: Check CBASER/BASER validity before enabling the ITS

2017-10-17 Thread Eric Auger
The spec says it is UNPREDICTABLE to enable the ITS
if any of the following conditions are true:

- GITS_CBASER.Valid == 0.
- GITS_BASER.Valid == 0, for any GITS_BASER register
  where the Type field indicates Device.
- GITS_BASER.Valid == 0, for any GITS_BASER register
  where the Type field indicates Interrupt Collection and
  GITS_TYPER.HCC == 0.

In that case, let's keep the ITS disabled.

Signed-off-by: Eric Auger 
Reported-by: Andre Przywara 

---

v3: creation
---
 virt/kvm/arm/vgic/vgic-its.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/virt/kvm/arm/vgic/vgic-its.c b/virt/kvm/arm/vgic/vgic-its.c
index e59363e..e61736b 100644
--- a/virt/kvm/arm/vgic/vgic-its.c
+++ b/virt/kvm/arm/vgic/vgic-its.c
@@ -1488,6 +1488,16 @@ static void vgic_mmio_write_its_ctlr(struct kvm *kvm, 
struct vgic_its *its,
its->enabled = !!(val & GITS_CTLR_ENABLE);
 
/*
+* It is UNPREDICTABLE to enable the ITS if any of the CBASER or
+* device/collection BASER are invalid
+*/
+   if (its->enabled &&
+   (!(its->baser_device_table & GITS_BASER_VALID) ||
+!(its->baser_coll_table & GITS_BASER_VALID) ||
+!(its->cbaser & GITS_CBASER_VALID)))
+   its->enabled = false;
+
+   /*
 * Try to process any pending commands. This function bails out early
 * if the ITS is disabled or no commands have been queued.
 */
-- 
2.5.5

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCH v4 05/11] KVM: arm/arm64: vgic-its: Check GITS_BASER Valid bit before saving tables

2017-10-17 Thread Eric Auger
At the moment we don't properly check the GITS_BASER.Valid
bit before saving the collection and device tables.

On vgic_its_save_collection_table() we use the GITS_BASER gpa
field whereas the Valid bit should be used.

On vgic_its_save_device_tables() there is no check. This can
cause various bugs, among which a subsequent fault when accessing
the table in guest memory.

Let's systematically check the Valid bit before doing anything.

We also uniformize the code between save and restore.

Signed-off-by: Eric Auger 
Reviewed-by: Andre Przywara 
Reviewed-by: Christoffer Dall 

---

v2 -> v3:
- slight rewording of the commit message
- added Andre's and Christoffer's R-b
---
 virt/kvm/arm/vgic/vgic-its.c | 20 +++-
 1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/virt/kvm/arm/vgic/vgic-its.c b/virt/kvm/arm/vgic/vgic-its.c
index e61736b..3b539d4 100644
--- a/virt/kvm/arm/vgic/vgic-its.c
+++ b/virt/kvm/arm/vgic/vgic-its.c
@@ -2110,11 +2110,12 @@ static int vgic_its_device_cmp(void *priv, struct 
list_head *a,
 static int vgic_its_save_device_tables(struct vgic_its *its)
 {
const struct vgic_its_abi *abi = vgic_its_get_abi(its);
+   u64 baser = its->baser_device_table;
struct its_device *dev;
int dte_esz = abi->dte_esz;
-   u64 baser;
 
-   baser = its->baser_device_table;
+   if (!(baser & GITS_BASER_VALID))
+   return 0;
 
list_sort(NULL, &its->device_list, vgic_its_device_cmp);
 
@@ -2285,17 +2286,17 @@ static int vgic_its_restore_cte(struct vgic_its *its, 
gpa_t gpa, int esz,
 static int vgic_its_save_collection_table(struct vgic_its *its)
 {
const struct vgic_its_abi *abi = vgic_its_get_abi(its);
+   u64 baser = its->baser_coll_table;
+   gpa_t gpa = BASER_ADDRESS(baser);
struct its_collection *collection;
u64 val;
-   gpa_t gpa;
size_t max_size, filled = 0;
int ret, cte_esz = abi->cte_esz;
 
-   gpa = BASER_ADDRESS(its->baser_coll_table);
-   if (!gpa)
+   if (!(baser & GITS_BASER_VALID))
return 0;
 
-   max_size = GITS_BASER_NR_PAGES(its->baser_coll_table) * SZ_64K;
+   max_size = GITS_BASER_NR_PAGES(baser) * SZ_64K;
 
list_for_each_entry(collection, &its->collection_list, coll_list) {
ret = vgic_its_save_cte(its, collection, gpa, cte_esz);
@@ -2331,17 +2332,18 @@ static int vgic_its_save_collection_table(struct 
vgic_its *its)
 static int vgic_its_restore_collection_table(struct vgic_its *its)
 {
const struct vgic_its_abi *abi = vgic_its_get_abi(its);
+   u64 baser = its->baser_coll_table;
int cte_esz = abi->cte_esz;
size_t max_size, read = 0;
gpa_t gpa;
int ret;
 
-   if (!(its->baser_coll_table & GITS_BASER_VALID))
+   if (!(baser & GITS_BASER_VALID))
return 0;
 
-   gpa = BASER_ADDRESS(its->baser_coll_table);
+   gpa = BASER_ADDRESS(baser);
 
-   max_size = GITS_BASER_NR_PAGES(its->baser_coll_table) * SZ_64K;
+   max_size = GITS_BASER_NR_PAGES(baser) * SZ_64K;
 
while (read < max_size) {
bool last;
-- 
2.5.5

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCH v4 01/11] KVM: arm/arm64: vgic-its: fix return value for device table restore

2017-10-17 Thread Eric Auger
At the moment, if the ITT only contains invalid entries,
vgic_its_restore_itt returns 1 and this is considered
an error in vgic_its_restore_dte.

Also, in case the device table only contains invalid entries,
the table restore fails, which is not correct.

This patch fully revisits the error handling while fixing those
two bugs.

- entry_fn_t now takes a valid output parameter
- scan_its_table() now returns <= 0 values and outputs 2 booleans,
  valid and last.
- vgic_its_restore_itt() now returns <= 0 values.
- vgic_its_restore_device_tables() also returns <= 0 values.

With that patch we are able to properly handle the case where
all data are invalid but we still are able to detect the case
where a next entry was referenced by some valid entry and
never found.

Fixes: 57a9a117154c93 (KVM: arm64: vgic-its: Device table save/restore)
Fixes: eff484e0298da5 (KVM: arm64: vgic-its: ITT save and restore)
Signed-off-by: Eric Auger 
Reported-by: wanghaibin 

---

need to CC stable

v3 -> v4:
- set *valid at beginning of handle_l1_dte

v2 -> v3:
- add comments
- added valid parameter
- vgic_its_restore_itt don't return +1 anymore
- reword the commit message

v1 -> v2:
- if (ret > 0) ret = 0
---
 virt/kvm/arm/vgic/vgic-its.c | 95 
 1 file changed, 70 insertions(+), 25 deletions(-)

diff --git a/virt/kvm/arm/vgic/vgic-its.c b/virt/kvm/arm/vgic/vgic-its.c
index f51c1e1..eea14a1 100644
--- a/virt/kvm/arm/vgic/vgic-its.c
+++ b/virt/kvm/arm/vgic/vgic-its.c
@@ -1772,16 +1772,20 @@ static u32 compute_next_eventid_offset(struct list_head 
*h, struct its_ite *ite)
 
 /**
  * entry_fn_t - Callback called on a table entry restore path
+ *
  * @its: its handle
  * @id: id of the entry
  * @entry: pointer to the entry
  * @opaque: pointer to an opaque data
+ * @valid: indicates whether valid data is associated to this entry
+ * (the entry itself in case of linear table or entries in the next level,
+ * in case of hierarchical tables)
  *
  * Return: < 0 on error, 0 if last element was identified, id offset to next
  * element otherwise
  */
 typedef int (*entry_fn_t)(struct vgic_its *its, u32 id, void *entry,
- void *opaque);
+ void *opaque, bool *valid);
 
 /**
  * scan_its_table - Scan a contiguous table in guest RAM and applies a function
@@ -1794,29 +1798,34 @@ typedef int (*entry_fn_t)(struct vgic_its *its, u32 id, 
void *entry,
  * @start_id: the ID of the first entry in the table
  * (non zero for 2d level tables)
  * @fn: function to apply on each entry
+ * @opaque: opaque data passed to @fn
+ * @valid: indicates whether the table contains any valid data
+ * @last: returns whether the last valid entry was decoded
  *
- * Return: < 0 on error, 0 if last element was identified, 1 otherwise
- * (the last element may not be found on second level tables)
+ * Return: < 0 on error, 0 on success
  */
 static int scan_its_table(struct vgic_its *its, gpa_t base, int size, int esz,
- int start_id, entry_fn_t fn, void *opaque)
+ int start_id, entry_fn_t fn, void *opaque,
+ bool *valid, bool *last)
 {
void *entry = kzalloc(esz, GFP_KERNEL);
struct kvm *kvm = its->dev->kvm;
unsigned long len = size;
int id = start_id;
gpa_t gpa = base;
+   int next_offset = 0;
int ret;
 
while (len > 0) {
-   int next_offset;
size_t byte_offset;
+   bool entry_valid;
 
ret = kvm_read_guest(kvm, gpa, entry, esz);
if (ret)
goto out;
 
-   next_offset = fn(its, id, entry, opaque);
+   next_offset = fn(its, id, entry, opaque, &entry_valid);
+   *valid |= entry_valid;
if (next_offset <= 0) {
ret = next_offset;
goto out;
@@ -1827,9 +1836,15 @@ static int scan_its_table(struct vgic_its *its, gpa_t 
base, int size, int esz,
gpa += byte_offset;
len -= byte_offset;
}
-   ret =  1;
-
+   /*
+* the table lookup was completed without identifying the
+* last valid entry (ie. next_offset > 0).
+*/
+   ret = 0;
 out:
+   if (!next_offset)
+   *last = true;
+
kfree(entry);
return ret;
 }
@@ -1854,12 +1869,14 @@ static int vgic_its_save_ite(struct vgic_its *its, 
struct its_device *dev,
 
 /**
  * vgic_its_restore_ite - restore an interrupt translation entry
+ *
  * @event_id: id used for indexing
  * @ptr: pointer to the ITE entry
  * @opaque: pointer to the its_device
+ * @valid: indicates whether the ite is valid
  */
 static int vgic_its_restore_ite(struct vgic_its *its, u32 event_id,
-   void *ptr, void *opaque)
+   void *ptr, void *opaque, bool 

[PATCH v4 00/11] vITS Migration fixes and reset

2017-10-17 Thread Eric Auger
This series fixes various bugs observed when saving/restoring the
ITS state before the guest writes the ITS registers (on first boot or
after reset/reboot).

This is a follow up of Wanghaibin's series [1] plus additional
patches following additional code review. It also proposes one
ITS reset implementation.

Currently, the in-kernel emulated ITS is not reset. After a
reset/reboot, the ITS register values and caches are left
unchanged. Registers may point to some tables in guest memory
which do not exist anymore. If an ITS state backup is initiated
before the guest re-writes the registers, the save fails
because inconsistencies are detected. Restoring data saved at
such a moment also fails.

Patches [1-7] attempt to fix the migration issues without
implementing the reset.
As such they may be candidates for stable:
- handle case where all collection, device and ITT entries are
  invalid on restore (which shouldn't be an error)
- limit the cases where we return -EINVAL in table save()
- never attempt to use GITS_BASER and GITS_CBASER if they are
  not valid.
- Check GITS_BASER and GITS_CBASER on ITS enable
- systematically do both device and collection save/restore even if
  one fails.

Patches [8-11] implement a new ITS reset IOCTL

Best Regards

Eric

Git: complete series available at
https://github.com/eauger/linux/tree/v4.14-rc5-its-reset-v4

* Testing:
- on Cavium using a virtio-net-pci guest and various sequences of
  guest shutdown -r now, virsh reset, virsh suspend/resume,
  virsh reboot, virsh save/restore, virsh shutdown

References:
[1] [RFC PATCH 0/3] fix migrate failed when vm is in booting
https://www.spinics.net/lists/kvm-arm/msg27121.html

History:
v3 -> v4:
- fixes a bug in indirect mode: in handle_l1_dte, set
  *valid at the beginning of the function

v2 -> v3:
- Revisited error handling in restore functions
- Added "KVM: arm/arm64: vgic-its: fix
vgic_its_restore_collection_table returned value"
- Added "KVM: arm/arm64: vgic-its: Check CBASER/BASER validity
  before enabling the ITS"
- Removed KVM: arm/arm64: vgic-its: Always allow clearing
  GITS_CREADR/CWRITER
- Reworded documentation according to Christoffer's comments

v1 -> v2:
- added KVM: arm/arm64: vgic-its: Always attempt to save/restore
  device and collection tables

PATCH v1
- series including 2 modified patches of Wanghaibin


Eric Auger (10):
  KVM: arm/arm64: vgic-its: fix return value for device table restore
  KVM: arm/arm64: vgic-its: fix vgic_its_restore_collection_table
returned value
  KVM: arm/arm64: vgic-its: Improve error reporting on device table save
  KVM: arm/arm64: vgic-its: Check CBASER/BASER validity before enabling
the ITS
  KVM: arm/arm64: vgic-its: Check GITS_BASER Valid bit before saving
tables
  KVM: arm/arm64: vgic-its: Check GITS_CBASER validity before processing
commands
  KVM: arm/arm64: vgic-its: Always attempt to save/restore device and
collection tables
  KVM: arm/arm64: vgic-its: free caches when GITS_BASER Valid bit is
cleared
  KVM: arm/arm64: Document KVM_DEV_ARM_ITS_CTRL_RESET
  KVM: arm/arm64: vgic-its: Implement KVM_DEV_ARM_ITS_CTRL_RESET

wanghaibin (1):
  KVM: arm/arm64: vgic-its: new helper functions to free the caches

 Documentation/virtual/kvm/devices/arm-vgic-its.txt |  20 ++
 arch/arm/include/uapi/asm/kvm.h|   1 +
 arch/arm64/include/uapi/asm/kvm.h  |   1 +
 virt/kvm/arm/vgic/vgic-its.c   | 361 ++---
 4 files changed, 270 insertions(+), 113 deletions(-)

-- 
2.5.5

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 01/10] KVM: arm/arm64: Split dcache/icache flushing

2017-10-17 Thread Christoffer Dall
On Mon, Oct 16, 2017 at 02:35:47PM -0700, Roy Franz (Cavium) wrote:
> On Mon, Oct 9, 2017 at 8:20 AM, Marc Zyngier  wrote:
> > As we're about to introduce opportunistic invalidation of the icache,
> > let's split dcache and icache flushing.
> >
> > Signed-off-by: Marc Zyngier 
> > ---
> >  arch/arm/include/asm/kvm_mmu.h   | 60 
> > 
> >  arch/arm64/include/asm/kvm_mmu.h | 13 +++--
> >  virt/kvm/arm/mmu.c   | 20 ++
> >  3 files changed, 67 insertions(+), 26 deletions(-)
> >
> > diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
> > index fa6f2174276b..f553aa62d0c3 100644
> > --- a/arch/arm/include/asm/kvm_mmu.h
> > +++ b/arch/arm/include/asm/kvm_mmu.h
> > @@ -126,21 +126,12 @@ static inline bool vcpu_has_cache_enabled(struct 
> > kvm_vcpu *vcpu)
> > return (vcpu_cp15(vcpu, c1_SCTLR) & 0b101) == 0b101;
> >  }
> >
> > -static inline void __coherent_cache_guest_page(struct kvm_vcpu *vcpu,
> > -  kvm_pfn_t pfn,
> > -  unsigned long size)
> > +static inline void __coherent_dcache_guest_page(struct kvm_vcpu *vcpu,
> > +   kvm_pfn_t pfn,
> > +   unsigned long size)
> >  {
> > /*
> > -* If we are going to insert an instruction page and the icache is
> > -* either VIPT or PIPT, there is a potential problem where the host
> > -* (or another VM) may have used the same page as this guest, and we
> > -* read incorrect data from the icache.  If we're using a PIPT 
> > cache,
> > -* we can invalidate just that page, but if we are using a VIPT 
> > cache
> > -* we need to invalidate the entire icache - damn shame - as written
> > -* in the ARM ARM (DDI 0406C.b - Page B3-1393).
> > -*
> > -* VIVT caches are tagged using both the ASID and the VMID and 
> > doesn't
> > -* need any kind of flushing (DDI 0406C.b - Page B3-1392).
> > +* Clean the dcache to the Point of Coherency.
> >  *
> >  * We need to do this through a kernel mapping (using the
> >  * user-space mapping has proved to be the wrong
> > @@ -155,19 +146,52 @@ static inline void __coherent_cache_guest_page(struct 
> > kvm_vcpu *vcpu,
> >
> > kvm_flush_dcache_to_poc(va, PAGE_SIZE);
> >
> > -   if (icache_is_pipt())
> > -   __cpuc_coherent_user_range((unsigned long)va,
> > -  (unsigned long)va + 
> > PAGE_SIZE);
> > -
> > size -= PAGE_SIZE;
> > pfn++;
> >
> > kunmap_atomic(va);
> > }
> > +}
> >
> > -   if (!icache_is_pipt() && !icache_is_vivt_asid_tagged()) {
> > +static inline void __coherent_icache_guest_page(struct kvm_vcpu *vcpu,
> > +   kvm_pfn_t pfn,
> > +   unsigned long size)
> > +{
> > +   /*
> > +* If we are going to insert an instruction page and the icache is
> > +* either VIPT or PIPT, there is a potential problem where the host
> > +* (or another VM) may have used the same page as this guest, and we
> > +* read incorrect data from the icache.  If we're using a PIPT 
> > cache,
> > +* we can invalidate just that page, but if we are using a VIPT 
> > cache
> > +* we need to invalidate the entire icache - damn shame - as written
> > +* in the ARM ARM (DDI 0406C.b - Page B3-1393).
> > +*
> > +* VIVT caches are tagged using both the ASID and the VMID and 
> > doesn't
> > +* need any kind of flushing (DDI 0406C.b - Page B3-1392).
> > +*/
> > +
> > +   VM_BUG_ON(size & ~PAGE_MASK);
> > +
> > +   if (icache_is_vivt_asid_tagged())
> > +   return;
> > +
> > +   if (!icache_is_pipt()) {
> > /* any kind of VIPT cache */
> > __flush_icache_all();
> > +   return;
> > +   }
> How does cache_is_vivt() fit into these checks?   From my digging it looks 
> like
> that is ARMv5 and earlier only, so am I right in thinking those don't support
> virtualization?  It looks like this code properly handles all the cache types
> described in the ARM ARM that you referenced, and that the 'extra' cache
> types in Linux are for older spec chips.
> 
> 
That's certainly my understanding.  From the ARMv7 ARM the only types of
instruction caches we should worry about are:

 - PIPT instruction caches
 - Virtually-indexed, physically-tagged (VIPT) instruction caches
 - ASID and VMID tagged Virtually-indexed, virtually-tagged (VIVT)
   instruction caches.

And I think that's covered here.

Thanks,
-Christoffer