Re: [PATCH] vfio-pci: Fix MSI IRQ forwarding for devices without per-vector masking

2019-03-21 Thread Leo Yan
On Fri, Mar 22, 2019 at 01:23:08PM +0800, Leo Yan wrote:
> If MSI doesn't support the per-vector masking capability and
> PCI_MSI_FLAGS_MASKBIT isn't set in the message control field, the function
> vfio_pci_msi_vector_write() bails out directly, so every vector's
> 'virt_state' keeps the VFIO_PCI_MSI_STATE_MASKED bit set.
> 
> As a result, the state maintained in 'virt_state' no longer reflects the
> MSI hardware state, which misleads the function
> vfio_pci_update_msi_entry() into skipping IRQ forwarding in the flow below:
> 
> vfio_pci_update_msi_entry() {
> 
>   [...]
> 
>   if (msi_is_masked(entry->virt_state) == msi_is_masked(entry->phys_state))
>   return 0;  ==> skip IRQ forwarding
> 
>   [...]
> }
> 
> To fix this issue, when PCI_MSI_FLAGS_MASKBIT is not set in the message
> control field, this patch simply clears the VFIO_PCI_MSI_STATE_MASKED bit
> in every vector's 'virt_state'; vfio_pci_update_msi_entry() can then
> forward the MSI IRQ successfully.

Just a reminder: this patch is for kvmtool, not for the kernel.  Sorry, I
forgot to mention that in the subject.

Thanks,
Leo Yan

> Signed-off-by: Leo Yan 
> ---
>  vfio/pci.c | 12 +++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/vfio/pci.c b/vfio/pci.c
> index ba971eb..4fd24ac 100644
> --- a/vfio/pci.c
> +++ b/vfio/pci.c
> @@ -363,8 +363,18 @@ static int vfio_pci_msi_vector_write(struct kvm *kvm, struct vfio_device *vdev,
>   struct vfio_pci_device *pdev = &vdev->pci;
>   struct msi_cap_64 *msi_cap_64 = PCI_CAP(&pdev->hdr, pdev->msi.pos);
>  
> - if (!(msi_cap_64->ctrl & PCI_MSI_FLAGS_MASKBIT))
> + if (!(msi_cap_64->ctrl & PCI_MSI_FLAGS_MASKBIT)) {
> + /*
> +  * If MSI doesn't support per-vector masking capability,
> +  * simply unmask for all vectors.
> +  */
> + for (i = 0; i < pdev->msi.nr_entries; i++) {
> + entry = &pdev->msi.entries[i];
> + msi_set_masked(entry->virt_state, false);
> + }
> +
>   return 0;
> + }
>  
>   if (msi_cap_64->ctrl & PCI_MSI_FLAGS_64BIT)
>   mask_pos = PCI_MSI_MASK_64;
> -- 
> 2.19.1
> 


[PATCH] vfio-pci: Fix MSI IRQ forwarding for devices without per-vector masking

2019-03-21 Thread Leo Yan
If MSI doesn't support the per-vector masking capability and
PCI_MSI_FLAGS_MASKBIT isn't set in the message control field, the function
vfio_pci_msi_vector_write() bails out directly, so every vector's
'virt_state' keeps the VFIO_PCI_MSI_STATE_MASKED bit set.

As a result, the state maintained in 'virt_state' no longer reflects the
MSI hardware state, which misleads the function
vfio_pci_update_msi_entry() into skipping IRQ forwarding in the flow below:

vfio_pci_update_msi_entry() {

  [...]

  if (msi_is_masked(entry->virt_state) == msi_is_masked(entry->phys_state))
  return 0;  ==> skip IRQ forwarding

  [...]
}

To fix this issue, when PCI_MSI_FLAGS_MASKBIT is not set in the message
control field, this patch simply clears the VFIO_PCI_MSI_STATE_MASKED bit
in every vector's 'virt_state'; vfio_pci_update_msi_entry() can then
forward the MSI IRQ successfully.
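
To make the bookkeeping above concrete, here is a minimal, self-contained
sketch of how a masked bit in 'virt_state' interacts with the comparison
quoted from vfio_pci_update_msi_entry(). The helper and flag names mirror
the patch, but their definitions here are assumptions for illustration,
not kvmtool's actual code.

#include <stdbool.h>
#include <stdio.h>

/* Assumed flag value; the real one lives in kvmtool's vfio headers. */
#define VFIO_PCI_MSI_STATE_MASKED	(1 << 0)

#define msi_is_masked(state)	(!!((state) & VFIO_PCI_MSI_STATE_MASKED))
#define msi_set_masked(state, masked)	do {				\
		if (masked)						\
			(state) |= VFIO_PCI_MSI_STATE_MASKED;		\
		else							\
			(state) &= ~VFIO_PCI_MSI_STATE_MASKED;		\
	} while (0)

int main(void)
{
	/*
	 * Assume the physical vector starts out masked, like the stale
	 * virtual state did before this fix.
	 */
	unsigned int virt_state = VFIO_PCI_MSI_STATE_MASKED;
	unsigned int phys_state = VFIO_PCI_MSI_STATE_MASKED;

	/* Mirrors: if (msi_is_masked(virt) == msi_is_masked(phys)) return 0; */
	printf("before fix, update skipped: %d\n",
	       msi_is_masked(virt_state) == msi_is_masked(phys_state));

	/* The fix: without per-vector masking, unmask the virtual state. */
	msi_set_masked(virt_state, false);
	printf("after fix, update skipped:  %d\n",
	       msi_is_masked(virt_state) == msi_is_masked(phys_state));

	return 0;
}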

Signed-off-by: Leo Yan 
---
 vfio/pci.c | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/vfio/pci.c b/vfio/pci.c
index ba971eb..4fd24ac 100644
--- a/vfio/pci.c
+++ b/vfio/pci.c
@@ -363,8 +363,18 @@ static int vfio_pci_msi_vector_write(struct kvm *kvm, struct vfio_device *vdev,
	struct vfio_pci_device *pdev = &vdev->pci;
	struct msi_cap_64 *msi_cap_64 = PCI_CAP(&pdev->hdr, pdev->msi.pos);
 
-   if (!(msi_cap_64->ctrl & PCI_MSI_FLAGS_MASKBIT))
+   if (!(msi_cap_64->ctrl & PCI_MSI_FLAGS_MASKBIT)) {
+   /*
+* If MSI doesn't support per-vector masking capability,
+* simply unmask for all vectors.
+*/
+   for (i = 0; i < pdev->msi.nr_entries; i++) {
+   entry = &pdev->msi.entries[i];
+   msi_set_masked(entry->virt_state, false);
+   }
+
return 0;
+   }
 
if (msi_cap_64->ctrl & PCI_MSI_FLAGS_64BIT)
mask_pos = PCI_MSI_MASK_64;
-- 
2.19.1



Re: [PATCH v6 09/22] vfio: VFIO_IOMMU_BIND/UNBIND_MSI

2019-03-21 Thread Alex Williamson
On Sun, 17 Mar 2019 18:22:19 +0100
Eric Auger  wrote:

> This patch adds the VFIO_IOMMU_BIND/UNBIND_MSI ioctls, which aim
> to pass/withdraw the guest MSI binding to/from the host.
> 
> Signed-off-by: Eric Auger 
> 
> ---
> v3 -> v4:
> - add UNBIND
> - unwind on BIND error
> 
> v2 -> v3:
> - adapt to new proto of bind_guest_msi
> - directly use vfio_iommu_for_each_dev
> 
> v1 -> v2:
> - s/vfio_iommu_type1_guest_msi_binding/vfio_iommu_type1_bind_guest_msi
> ---
>  drivers/vfio/vfio_iommu_type1.c | 58 +
>  include/uapi/linux/vfio.h   | 29 +
>  2 files changed, 87 insertions(+)
> 
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 12a40b9db6aa..66513679081b 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -1710,6 +1710,25 @@ static int vfio_cache_inv_fn(struct device *dev, void 
> *data)
>   return iommu_cache_invalidate(d, dev, &ustruct->info);
>  }
>  
> +static int vfio_bind_msi_fn(struct device *dev, void *data)
> +{
> + struct vfio_iommu_type1_bind_msi *ustruct =
> + (struct vfio_iommu_type1_bind_msi *)data;
> + struct iommu_domain *d = iommu_get_domain_for_dev(dev);
> +
> + return iommu_bind_guest_msi(d, dev, ustruct->iova,
> + ustruct->gpa, ustruct->size);
> +}
> +
> +static int vfio_unbind_msi_fn(struct device *dev, void *data)
> +{
> + dma_addr_t *iova = (dma_addr_t *)data;
> + struct iommu_domain *d = iommu_get_domain_for_dev(dev);

Same as previous, we can encapsulate domain in our own struct to avoid
a lookup.

> +
> + iommu_unbind_guest_msi(d, dev, *iova);

Is it strange that iommu-core is exposing these interfaces at a device
level if every one of them requires us to walk all the devices?  Thanks,

Alex

> + return 0;
> +}
> +
>  static long vfio_iommu_type1_ioctl(void *iommu_data,
>  unsigned int cmd, unsigned long arg)
>  {
> @@ -1814,6 +1833,45 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
> vfio_cache_inv_fn);
>   mutex_unlock(&iommu->lock);
>   return ret;
> + } else if (cmd == VFIO_IOMMU_BIND_MSI) {
> + struct vfio_iommu_type1_bind_msi ustruct;
> + int ret;
> +
> + minsz = offsetofend(struct vfio_iommu_type1_bind_msi,
> + size);
> +
> + if (copy_from_user(&ustruct, (void __user *)arg, minsz))
> + return -EFAULT;
> +
> + if (ustruct.argsz < minsz || ustruct.flags)
> + return -EINVAL;
> +
> + mutex_lock(&iommu->lock);
> + ret = vfio_iommu_for_each_dev(iommu, &ustruct,
> +   vfio_bind_msi_fn);
> + if (ret)
> + vfio_iommu_for_each_dev(iommu, &ustruct.iova,
> + vfio_unbind_msi_fn);
> + mutex_unlock(&iommu->lock);
> + return ret;
> + } else if (cmd == VFIO_IOMMU_UNBIND_MSI) {
> + struct vfio_iommu_type1_unbind_msi ustruct;
> + int ret;
> +
> + minsz = offsetofend(struct vfio_iommu_type1_unbind_msi,
> + iova);
> +
> + if (copy_from_user(&ustruct, (void __user *)arg, minsz))
> + return -EFAULT;
> +
> + if (ustruct.argsz < minsz || ustruct.flags)
> + return -EINVAL;
> +
> + mutex_lock(&iommu->lock);
> + ret = vfio_iommu_for_each_dev(iommu, &ustruct.iova,
> +   vfio_unbind_msi_fn);
> + mutex_unlock(&iommu->lock);
> + return ret;
>   }
>  
>   return -ENOTTY;
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 29f0ef2d805d..6763389b6adc 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -789,6 +789,35 @@ struct vfio_iommu_type1_cache_invalidate {
>  };
>  #define VFIO_IOMMU_CACHE_INVALIDATE  _IO(VFIO_TYPE, VFIO_BASE + 24)
>  
> +/**
> + * VFIO_IOMMU_BIND_MSI - _IOWR(VFIO_TYPE, VFIO_BASE + 25,
> + *   struct vfio_iommu_type1_bind_msi)
> + *
> + * Pass a stage 1 MSI doorbell mapping to the host so that this
> + * latter can build a nested stage2 mapping
> + */
> +struct vfio_iommu_type1_bind_msi {
> + __u32   argsz;
> + __u32   flags;
> + __u64   iova;
> + __u64   gpa;
> + __u64   size;
> +};
> +#define VFIO_IOMMU_BIND_MSI  _IO(VFIO_TYPE, VFIO_BASE + 25)
> +
> +/**
> + * VFIO_IOMMU_UNBIND_MSI - _IOWR(VFIO_TYPE, VFIO_BASE + 26,
> + *   struct vfio_iommu_type1_unbind_msi)
> + *
> + * Unregister an MSI mapping
> + */
> +struct vfio_iommu_type1_unbind_msi {
> + __u32   argsz;
> + __u32   flags;
> + __u64   iova;
> +};
> +#define VFIO_IOMMU_UNBIND_MSI  _IO(VFIO_TYPE, VFIO_BASE + 26)
> +
>  /*  Additional API for SPAPR TCE (Server 
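
For context, a hedged sketch of how userspace (e.g. a VMM) might drive the
proposed ioctl. The structure layout and ioctl number are taken from the
uapi hunk above; since this is an RFC they are redefined locally rather
than taken from an installed header, and the function name is made up.

#include <stdint.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/* Proposed additions from the patch above; not in mainline linux/vfio.h. */
struct vfio_iommu_type1_bind_msi {
	uint32_t argsz;
	uint32_t flags;
	uint64_t iova;
	uint64_t gpa;
	uint64_t size;
};
#define VFIO_IOMMU_BIND_MSI	_IO(VFIO_TYPE, VFIO_BASE + 25)

/* container_fd: an open, configured VFIO container file descriptor. */
static int bind_guest_msi_doorbell(int container_fd, uint64_t iova,
				   uint64_t gpa, uint64_t size)
{
	struct vfio_iommu_type1_bind_msi bind = {
		.argsz = sizeof(bind),
		.flags = 0,
		.iova = iova,	/* guest IOVA of the MSI doorbell */
		.gpa = gpa,	/* guest PA it maps to at stage 1 */
		.size = size,
	};

	if (ioctl(container_fd, VFIO_IOMMU_BIND_MSI, &bind)) {
		perror("VFIO_IOMMU_BIND_MSI");
		return -1;
	}
	return 0;
}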

Re: [PATCH v6 08/22] vfio: VFIO_IOMMU_CACHE_INVALIDATE

2019-03-21 Thread Alex Williamson
On Sun, 17 Mar 2019 18:22:18 +0100
Eric Auger  wrote:

> From: "Liu, Yi L" 
> 
> When the guest "owns" the stage 1 translation structures,  the host
> IOMMU driver has no knowledge of caching structure updates unless
> the guest invalidation requests are trapped and passed down to the
> host.
> 
> This patch adds the VFIO_IOMMU_CACHE_INVALIDATE ioctl, which aims
> at propagating guest stage1 IOMMU cache invalidations to the host.
> 
> Signed-off-by: Liu, Yi L 
> Signed-off-by: Eric Auger 
> 
> ---
> 
> v2 -> v3:
> - introduce vfio_iommu_for_each_dev back in this patch
> 
> v1 -> v2:
> - s/TLB/CACHE
> - remove vfio_iommu_task usage
> - commit message rewording
> ---
>  drivers/vfio/vfio_iommu_type1.c | 47 +
>  include/uapi/linux/vfio.h   | 13 +
>  2 files changed, 60 insertions(+)
> 
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 222e9199edbf..12a40b9db6aa 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -113,6 +113,26 @@ struct vfio_regions {
>  #define IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu)  \
>   (!list_empty(&iommu->domain_list))
>  

static struct foo {
struct iommu_domain *domain;
void *data;
};

> +/* iommu->lock must be held */
> +static int
> +vfio_iommu_for_each_dev(struct vfio_iommu *iommu, void *data,
> + int (*fn)(struct device *, void *))
> +{
> + struct vfio_domain *d;
> + struct vfio_group *g;
> + int ret = 0;

struct foo bar = { .data = data };

> +
> + list_for_each_entry(d, &iommu->domain_list, next) {

bar.domain = d->domain;

> + list_for_each_entry(g, &d->group_list, next) {
> + ret = iommu_group_for_each_dev(g->iommu_group,
> +data, fn);

s/data/&bar/

> + if (ret)
> + break;
> + }
> + }
> + return ret;
> +}
> +
>  static int put_pfn(unsigned long pfn, int prot);
>  
>  /*
> @@ -1681,6 +1701,15 @@ vfio_attach_pasid_table(struct vfio_iommu *iommu,
>   return ret;
>  }
>  
> +static int vfio_cache_inv_fn(struct device *dev, void *data)
> +{

struct foo *bar = data;

> + struct vfio_iommu_type1_cache_invalidate *ustruct =
> + (struct vfio_iommu_type1_cache_invalidate *)data;

... = bar->data;

> + struct iommu_domain *d = iommu_get_domain_for_dev(dev);

... = bar->domain;

¯\_(ツ)_/¯ seems more efficient than doing a lookup.

> +
> + return iommu_cache_invalidate(d, dev, &ustruct->info);
> +}
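
Pulling Alex's inline suggestions together, the iteration helper and the
callback could look roughly like the sketch below: the wrapper struct
carries both the vfio_domain's iommu_domain and the caller's payload, so
the callback no longer needs iommu_get_domain_for_dev(). This is only an
illustration in the same context as the patch; the struct/field names
(vfio_iommu_fn_arg) are made up.

/* Sketch only, same context as drivers/vfio/vfio_iommu_type1.c above. */
struct vfio_iommu_fn_arg {
	struct iommu_domain *domain;	/* filled in per vfio_domain */
	void *data;			/* caller's payload */
};

/* iommu->lock must be held */
static int vfio_iommu_for_each_dev(struct vfio_iommu *iommu, void *data,
				   int (*fn)(struct device *, void *))
{
	struct vfio_iommu_fn_arg arg = { .data = data };
	struct vfio_domain *d;
	struct vfio_group *g;
	int ret = 0;

	list_for_each_entry(d, &iommu->domain_list, next) {
		arg.domain = d->domain;
		list_for_each_entry(g, &d->group_list, next) {
			ret = iommu_group_for_each_dev(g->iommu_group,
						       &arg, fn);
			if (ret)
				return ret;
		}
	}
	return ret;
}

static int vfio_cache_inv_fn(struct device *dev, void *data)
{
	struct vfio_iommu_fn_arg *arg = data;
	struct vfio_iommu_type1_cache_invalidate *ustruct = arg->data;

	/* No per-device domain lookup needed any more. */
	return iommu_cache_invalidate(arg->domain, dev, &ustruct->info);
}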
> +
>  static long vfio_iommu_type1_ioctl(void *iommu_data,
>  unsigned int cmd, unsigned long arg)
>  {
> @@ -1767,6 +1796,24 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
>   } else if (cmd == VFIO_IOMMU_DETACH_PASID_TABLE) {
>   vfio_detach_pasid_table(iommu);
>   return 0;
> + } else if (cmd == VFIO_IOMMU_CACHE_INVALIDATE) {
> + struct vfio_iommu_type1_cache_invalidate ustruct;
> + int ret;
> +
> + minsz = offsetofend(struct vfio_iommu_type1_cache_invalidate,
> + info);
> +
> + if (copy_from_user(&ustruct, (void __user *)arg, minsz))
> + return -EFAULT;
> +
> + if (ustruct.argsz < minsz || ustruct.flags)
> + return -EINVAL;
> +
> + mutex_lock(&iommu->lock);
> + ret = vfio_iommu_for_each_dev(iommu, &ustruct,
> +   vfio_cache_inv_fn);

Guess what has a version field that never gets checked ;)

Thanks,
Alex
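
The field in question is iommu_cache_invalidate_info.version; one
illustrative way to close the gap, using the VERSION_1 constant introduced
earlier in the series, would be an extra check after the argsz/flags
validation above:

		/* Possible addition (sketch): refuse versions we don't know. */
		if (ustruct.info.version != IOMMU_CACHE_INVALIDATE_INFO_VERSION_1)
			return -EINVAL;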

> + mutex_unlock(&iommu->lock);
> + return ret;
>   }
>  
>   return -ENOTTY;
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 329d378565d9..29f0ef2d805d 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -776,6 +776,19 @@ struct vfio_iommu_type1_attach_pasid_table {
>  #define VFIO_IOMMU_ATTACH_PASID_TABLE_IO(VFIO_TYPE, VFIO_BASE + 22)
>  #define VFIO_IOMMU_DETACH_PASID_TABLE_IO(VFIO_TYPE, VFIO_BASE + 23)
>  
> +/**
> + * VFIO_IOMMU_CACHE_INVALIDATE - _IOWR(VFIO_TYPE, VFIO_BASE + 24,
> + *   struct vfio_iommu_type1_cache_invalidate)
> + *
> + * Propagate guest IOMMU cache invalidation to the host.
> + */
> +struct vfio_iommu_type1_cache_invalidate {
> + __u32   argsz;
> + __u32   flags;
> + struct iommu_cache_invalidate_info info;
> +};
> +#define VFIO_IOMMU_CACHE_INVALIDATE  _IO(VFIO_TYPE, VFIO_BASE + 24)
> +
>  /*  Additional API for SPAPR TCE (Server POWERPC) IOMMU  */
>  
>  /*



Re: [PATCH v6 07/22] vfio: VFIO_IOMMU_ATTACH/DETACH_PASID_TABLE

2019-03-21 Thread Alex Williamson
On Sun, 17 Mar 2019 18:22:17 +0100
Eric Auger  wrote:

> From: "Liu, Yi L" 
> 
> This patch adds the VFIO_IOMMU_ATTACH/DETACH_PASID_TABLE ioctls,
> which aim to pass/withdraw the virtual iommu guest configuration
> to/from the VFIO driver, down to the iommu subsystem.
> 
> Signed-off-by: Jacob Pan 
> Signed-off-by: Liu, Yi L 
> Signed-off-by: Eric Auger 
> 
> ---
> v3 -> v4:
> - restore ATTACH/DETACH
> - add unwind on failure
> 
> v2 -> v3:
> - s/BIND_PASID_TABLE/SET_PASID_TABLE
> 
> v1 -> v2:
> - s/BIND_GUEST_STAGE/BIND_PASID_TABLE
> - remove the struct device arg
> ---
>  drivers/vfio/vfio_iommu_type1.c | 53 +
>  include/uapi/linux/vfio.h   | 17 +++
>  2 files changed, 70 insertions(+)
> 
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 73652e21efec..222e9199edbf 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -1644,6 +1644,43 @@ static int vfio_domains_have_iommu_cache(struct 
> vfio_iommu *iommu)
>   return ret;
>  }
>  
> +static void
> +vfio_detach_pasid_table(struct vfio_iommu *iommu)
> +{
> + struct vfio_domain *d;
> +
> + mutex_lock(&iommu->lock);
> +
> + list_for_each_entry(d, &iommu->domain_list, next) {
> + iommu_detach_pasid_table(d->domain);
> + }
> + mutex_unlock(&iommu->lock);
> +}
> +
> +static int
> +vfio_attach_pasid_table(struct vfio_iommu *iommu,
> + struct vfio_iommu_type1_attach_pasid_table *ustruct)
> +{
> + struct vfio_domain *d;
> + int ret = 0;
> +
> + mutex_lock(&iommu->lock);
> +
> + list_for_each_entry(d, &iommu->domain_list, next) {
> + ret = iommu_attach_pasid_table(d->domain, &ustruct->config);
> + if (ret)
> + goto unwind;
> + }
> + goto unlock;
> +unwind:
> + list_for_each_entry_continue_reverse(d, &iommu->domain_list, next) {
> + iommu_detach_pasid_table(d->domain);
> + }
> +unlock:
> + mutex_unlock(&iommu->lock);
> + return ret;
> +}
> +
>  static long vfio_iommu_type1_ioctl(void *iommu_data,
>  unsigned int cmd, unsigned long arg)
>  {
> @@ -1714,6 +1751,22 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
>  
>   return copy_to_user((void __user *)arg, &unmap, minsz) ?
>   -EFAULT : 0;
> + } else if (cmd == VFIO_IOMMU_ATTACH_PASID_TABLE) {
> + struct vfio_iommu_type1_attach_pasid_table ustruct;
> +
> + minsz = offsetofend(struct vfio_iommu_type1_attach_pasid_table,
> + config);
> +
> + if (copy_from_user(&ustruct, (void __user *)arg, minsz))
> + return -EFAULT;
> +
> + if (ustruct.argsz < minsz || ustruct.flags)
> + return -EINVAL;

Who is responsible for validating the ustruct.config?
arm_smmu_attach_pasid_table() only looks at the format, ignoring the
version field :-\  In fact, where is struct iommu_pasid_smmuv3 ever used
from the config?  Should the padding fields be forced to zero?  We
don't have flags to incorporate use of them with future extensions, so
maybe we don't care?
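
One illustrative way to tighten this up, checking only the fields mentioned
in this discussion; the constant and helper names below are assumptions,
not part of the posted series:

		/*
		 * Sketch only: PASID_TABLE_CFG_VERSION_1 and
		 * iommu_pasid_format_supported() are assumed names. The point
		 * is to reject configs we cannot parse, and padding could
		 * likewise be required to be zero to keep it reclaimable.
		 */
		if (ustruct.config.version != PASID_TABLE_CFG_VERSION_1 ||
		    !iommu_pasid_format_supported(ustruct.config.format))
			return -EINVAL;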

> +
> + return vfio_attach_pasid_table(iommu, &ustruct);
> + } else if (cmd == VFIO_IOMMU_DETACH_PASID_TABLE) {
> + vfio_detach_pasid_table(iommu);
> + return 0;
>   }
>  
>   return -ENOTTY;
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 02bb7ad6e986..329d378565d9 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -14,6 +14,7 @@
>  
>  #include 
>  #include 
> +#include 
>  
>  #define VFIO_API_VERSION 0
>  
> @@ -759,6 +760,22 @@ struct vfio_iommu_type1_dma_unmap {
>  #define VFIO_IOMMU_ENABLE_IO(VFIO_TYPE, VFIO_BASE + 15)
>  #define VFIO_IOMMU_DISABLE   _IO(VFIO_TYPE, VFIO_BASE + 16)
>  
> +/**
> + * VFIO_IOMMU_ATTACH_PASID_TABLE - _IOWR(VFIO_TYPE, VFIO_BASE + 22,
> + *   struct vfio_iommu_type1_attach_pasid_table)
> + *
> + * Passes the PASID table to the host. Calling ATTACH_PASID_TABLE
> + * while a table is already installed is allowed: it replaces the old
> + * table. DETACH does a comprehensive tear down of the nested mode.
> + */
> +struct vfio_iommu_type1_attach_pasid_table {
> + __u32   argsz;
> + __u32   flags;
> + struct iommu_pasid_table_config config;
> +};
> +#define VFIO_IOMMU_ATTACH_PASID_TABLE_IO(VFIO_TYPE, VFIO_BASE + 22)
> +#define VFIO_IOMMU_DETACH_PASID_TABLE_IO(VFIO_TYPE, VFIO_BASE + 23)
> +

DETACH should also be documented, it's not clear from the uapi that it
requires no parameters.  Thanks,

Alex
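
Something along the lines of the comment below in the uapi header would
close that documentation gap; the wording is only a suggestion, based on
the tear-down behaviour already described in the ATTACH comment above.

/**
 * VFIO_IOMMU_DETACH_PASID_TABLE - _IO(VFIO_TYPE, VFIO_BASE + 23)
 *
 * Detach the PASID table previously set with VFIO_IOMMU_ATTACH_PASID_TABLE
 * and do a comprehensive tear down of the nested mode. Takes no argument.
 */
#define VFIO_IOMMU_DETACH_PASID_TABLE	_IO(VFIO_TYPE, VFIO_BASE + 23)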


Re: [PATCH v6 05/22] iommu: Introduce cache_invalidate API

2019-03-21 Thread Jacob Pan
On Thu, 21 Mar 2019 15:32:45 +0100
Auger Eric  wrote:

> Hi jean, Jacob,
> 
> On 3/21/19 3:13 PM, Jean-Philippe Brucker wrote:
> > On 21/03/2019 13:54, Auger Eric wrote:  
> >> Hi Jacob, Jean-Philippe,
> >>
> >> On 3/20/19 5:50 PM, Jean-Philippe Brucker wrote:  
> >>> On 20/03/2019 16:37, Jacob Pan wrote:
> >>> [...]  
> > +struct iommu_inv_addr_info {
> > +#define IOMMU_INV_ADDR_FLAGS_PASID (1 << 0)
> > +#define IOMMU_INV_ADDR_FLAGS_ARCHID(1 << 1)
> > +#define IOMMU_INV_ADDR_FLAGS_LEAF  (1 << 2)
> > +   __u32   flags;
> > +   __u32   archid;
> > +   __u64   pasid;
> > +   __u64   addr;
> > +   __u64   granule_size;
> > +   __u64   nb_granules;
> > +};
> > +
> > +/**
> > + * First level/stage invalidation information
> > + * @cache: bitfield that allows to select which caches to invalidate
> > + * @granularity: defines the lowest granularity used for the invalidation:
> > + * domain > pasid > addr
> > + *
> > + * Not all the combinations of cache/granularity make sense:
> > + *
> > + * type        |   DEV_IOTLB   |     IOTLB     |      PASID     |
> > + * granularity |               |               |      cache     |
> > + * ------------+---------------+---------------+----------------+
> > + * DOMAIN      |      N/A      |       Y       |        Y       |
> > + * PASID       |       Y       |       Y       |        Y       |
> > + * ADDR        |       Y       |       Y       |       N/A      |
> > + */
> > +struct iommu_cache_invalidate_info {
> > +#define IOMMU_CACHE_INVALIDATE_INFO_VERSION_1 1
> > +   __u32   version;
> > +/* IOMMU paging structure cache */
> > +#define IOMMU_CACHE_INV_TYPE_IOTLB     (1 << 0) /* IOMMU IOTLB */
> > +#define IOMMU_CACHE_INV_TYPE_DEV_IOTLB (1 << 1) /* Device IOTLB */
> > +#define IOMMU_CACHE_INV_TYPE_PASID     (1 << 2) /* PASID cache */
>  Just a clarification, this used to be an enum. You do intend to
>  issue a single invalidation request on multiple cache types?
>  Perhaps for virtio-IOMMU? I only see a single cache type in your
>  patch #14. For VT-d we plan to issue one cache type at a time
>  for now. So this format works for us.  
> >>>
> >>> Yes for virtio-iommu I'd like as little overhead as possible,
> >>> which means a single invalidation message to hit both IOTLB and
> >>> ATC at once, and the ability to specify multiple pages with
> >>> @nb_granules.  
> >> The original request/explanation from Jean-Philippe can be found
> >> here: https://lkml.org/lkml/2019/1/28/1497
> >>  
> >>>  
>  However, if multiple cache types are issued in a single
>  invalidation, they must share a single granularity; not all
>  combinations are valid. E.g. dev IOTLB does not support domain
>  granularity. Just a reminder, not an issue. The driver could filter
>  out invalid combinations.
> >> Sure I will add a comment about this restriction.  
> >>>
> >>> Agreed. Even the core could filter out invalid combinations based
> >>> on the table above: IOTLB and domain granularity are N/A.  
> >> I don't get this sentence. What about vtd IOTLB domain-selective
> >> invalidation:  
> > 
> > My mistake: I meant dev-IOTLB and domain granularity are N/A  
> 
> Ah OK, no worries.
> 
> How do we proceed further with those user APIs? Besides the comment to
> be added above and previous suggestion from Jean ("Invalidations by
> @granularity use field ...), have we reached a consensus now on:
> 
> - attach/detach_pasid_table
> - cache_invalidate
> - fault data and fault report API?
> 
These APIs are sufficient for today's VT-d use and leave enough space
for extension. E.g. new fault reasons.

I have cherry picked the above APIs from your patchset to enable VT-d
nested translation. Going forward, I will reuse these until they get
merged.
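
A sketch of the kind of filtering discussed above, treating the
cache/granularity table as data. This is illustrative only: the cache type
bits come from the proposed uapi, while the granularity constants are
assumed names, since the enum is not shown in this excerpt.

/* Illustrative only; granularity names are assumptions. */
enum {
	INV_GRANU_DOMAIN,
	INV_GRANU_PASID,
	INV_GRANU_ADDR,
	INV_GRANU_NR,
};

static const unsigned int valid_caches[INV_GRANU_NR] = {
	/* dev-IOTLB does not support domain granularity (N/A in the table) */
	[INV_GRANU_DOMAIN] = IOMMU_CACHE_INV_TYPE_IOTLB |
			     IOMMU_CACHE_INV_TYPE_PASID,
	[INV_GRANU_PASID]  = IOMMU_CACHE_INV_TYPE_DEV_IOTLB |
			     IOMMU_CACHE_INV_TYPE_IOTLB |
			     IOMMU_CACHE_INV_TYPE_PASID,
	/* PASID cache cannot be invalidated by address (N/A in the table) */
	[INV_GRANU_ADDR]   = IOMMU_CACHE_INV_TYPE_DEV_IOTLB |
			     IOMMU_CACHE_INV_TYPE_IOTLB,
};

static bool cache_inv_combination_valid(unsigned int granularity,
					unsigned int cache)
{
	if (granularity >= INV_GRANU_NR)
		return false;
	/* Every requested cache type must be allowed at this granularity. */
	return cache && !(cache & ~valid_caches[granularity]);
}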

> If not, please let me know.
> 
> Thanks
> 
> Eric
> 
> 
> > 
> > Thanks,
> > Jean
> >   
> >> "
> >> • IOTLB entries caching mappings associated with the specified
> >> domain-id are invalidated.
> >> • Paging-structure-cache entries caching mappings associated with
> >> the specified domain-id are invalidated.
> >> "
> >>
> >> Thanks
> >>
> >> Eric
> >>  
> >>>
> >>> Thanks,
> >>> Jean
> >>>  
>   
> > +   __u8cache;
> > +   __u8granularity;
> > +   __u8padding[2];
> > +   union {
> > +   __u64   pasid;
> > +   struct iommu_inv_addr_info addr_info;
> > +   };
> > +};
> > +
> > +
> >  #endif /* _UAPI_IOMMU_H */  
> 
>  [Jacob Pan]
>   
> >>>  

Re: [PATCH v6 02/22] iommu: introduce device fault data

2019-03-21 Thread Jacob Pan
On Sun, 17 Mar 2019 18:22:12 +0100
Eric Auger  wrote:

> From: Jacob Pan 
> 
> Device faults detected by IOMMU can be reported outside the IOMMU
> subsystem for further processing. This patch introduces
> a generic device fault data structure.
> 
> The fault can be either an unrecoverable fault or a page request,
> also referred to as a recoverable fault.
> 
> We only care about non-internal faults that are likely to be reported
> to an external subsystem.
> 
> Signed-off-by: Jacob Pan 
> Signed-off-by: Jean-Philippe Brucker 
> Signed-off-by: Liu, Yi L 
> Signed-off-by: Ashok Raj 
> Signed-off-by: Eric Auger 
> 
> ---
> v4 -> v5:
> - simplified struct iommu_fault_event comment
> - Moved IOMMU_FAULT_PERM outside of the struct
> - Removed IOMMU_FAULT_PERM_INST
> - s/IOMMU_FAULT_PAGE_REQUEST_PASID_PRESENT/
>   IOMMU_FAULT_PAGE_REQUEST_PASID_VALID
> 
> v3 -> v4:
> - use a union containing aither an unrecoverable fault or a page
>   request message. Move the device private data in the page request
>   structure. Reshuffle the fields and use flags.
> - move fault perm attributes to the uapi
> - remove a bunch of iommu_fault_reason enum values that were related
>   to internal errors
> ---
>  include/linux/iommu.h      |  44 ++
>  include/uapi/linux/iommu.h | 115 +
>  2 files changed, 159 insertions(+)
>  create mode 100644 include/uapi/linux/iommu.h
> 
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index ffbbc7e39cee..c6f398f7e6e0 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -25,6 +25,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #define IOMMU_READ   (1 << 0)
>  #define IOMMU_WRITE  (1 << 1)
> @@ -48,6 +49,7 @@ struct bus_type;
>  struct device;
>  struct iommu_domain;
>  struct notifier_block;
> +struct iommu_fault_event;
>  
>  /* iommu fault flags */
>  #define IOMMU_FAULT_READ 0x0
> @@ -55,6 +57,7 @@ struct notifier_block;
>  
>  typedef int (*iommu_fault_handler_t)(struct iommu_domain *,
>   struct device *, unsigned long, int, void *);
> +typedef int (*iommu_dev_fault_handler_t)(struct iommu_fault_event *,
> void *); 
>  struct iommu_domain_geometry {
>   dma_addr_t aperture_start; /* First address that can be mapped */
> @@ -247,6 +250,46 @@ struct iommu_device {
>   struct device *dev;
>  };
>  
> +/**
> + * struct iommu_fault_event - Generic fault event
> + *
> + * Can represent recoverable faults such as a page requests or
> + * unrecoverable faults such as DMA or IRQ remapping faults.
> + *
> + * @fault: fault descriptor
> + * @iommu_private: used by the IOMMU driver for storing
> fault-specific
> + * data. Users should not modify this field before
> + * sending the fault response.
> + */
> +struct iommu_fault_event {
> + struct iommu_fault fault;
> + u64 iommu_private;
> +};
> +
> +/**
> + * struct iommu_fault_param - per-device IOMMU fault data
> + * @dev_fault_handler: Callback function to handle IOMMU faults at
> device level
> + * @data: handler private data
> + *
> + */
> +struct iommu_fault_param {
> + iommu_dev_fault_handler_t handler;
> + void *data;
> +};
> +
> +/**
> + * struct iommu_param - collection of per-device IOMMU data
> + *
> + * @fault_param: IOMMU detected device fault reporting data
> + *
> + * TODO: migrate other per device data pointers under
> iommu_dev_data, e.g.
> + *   struct iommu_group  *iommu_group;
> + *   struct iommu_fwspec *iommu_fwspec;
> + */
> +struct iommu_param {
> + struct iommu_fault_param *fault_param;
> +};
> +
>  int  iommu_device_register(struct iommu_device *iommu);
>  void iommu_device_unregister(struct iommu_device *iommu);
>  int  iommu_device_sysfs_add(struct iommu_device *iommu,
> @@ -422,6 +465,7 @@ struct iommu_ops {};
>  struct iommu_group {};
>  struct iommu_fwspec {};
>  struct iommu_device {};
> +struct iommu_fault_param {};
>  
>  static inline bool iommu_present(struct bus_type *bus)
>  {
> diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
> new file mode 100644
> index ..edcc0dda7993
> --- /dev/null
> +++ b/include/uapi/linux/iommu.h
> @@ -0,0 +1,115 @@
> +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
> +/*
> + * IOMMU user API definitions
> + */
> +
> +#ifndef _UAPI_IOMMU_H
> +#define _UAPI_IOMMU_H
> +
> +#include 
> +
> +#define IOMMU_FAULT_PERM_WRITE   (1 << 0) /* write */
> +#define IOMMU_FAULT_PERM_EXEC(1 << 1) /* exec */
> +#define IOMMU_FAULT_PERM_PRIV(1 << 2) /* privileged */
> +
> +/*  Generic fault types, can be expanded IRQ remapping fault */
> +enum iommu_fault_type {
> + IOMMU_FAULT_DMA_UNRECOV = 1,/* unrecoverable fault */
> + IOMMU_FAULT_PAGE_REQ,   /* page request fault */
> +};
> +
> +enum iommu_fault_reason {
> + IOMMU_FAULT_REASON_UNKNOWN = 0,
> +
> + /* Could not access the PASID table (fetch caused external
> abort) */
> + 

Re: [PATCH v6 03/22] iommu: introduce device fault report API

2019-03-21 Thread Alex Williamson
On Sun, 17 Mar 2019 18:22:13 +0100
Eric Auger  wrote:

> From: Jacob Pan 
> 
> Traditionally, device specific faults are detected and handled within
> their own device drivers. When IOMMU is enabled, faults such as DMA
> related transactions are detected by IOMMU. There is no generic
> reporting mechanism to report faults back to the in-kernel device
> driver or the guest OS in case of assigned devices.
> 
> This patch introduces a registration API for device specific fault
> handlers. This differs from the existing iommu_set_fault_handler/
> report_iommu_fault infrastructures in several ways:
> - it allows reporting more sophisticated fault events (both
>   unrecoverable faults and page request faults) due to the nature
>   of the iommu_fault struct
> - it is device specific and not domain specific.
> 
> The current iommu_report_device_fault() implementation only handles
> the "shoot and forget" unrecoverable fault case. Handling of page
> request faults or stalled faults will come later.
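
A hedged example of how a consumer might use the new registration API; the
handler signature follows the iommu_dev_fault_handler_t typedef from patch
02 and the register/unregister calls from the diff below, while the driver
hooks themselves are made up for illustration.

/* Illustrative consumer only; driver structure is made up. */
static int demo_iommu_fault_handler(struct iommu_fault_event *evt, void *data)
{
	struct device *dev = data;

	/* "Shoot and forget" for now: just report that a fault arrived. */
	dev_warn(dev, "IOMMU reported a device fault\n");
	return 0;
}

static int demo_probe(struct device *dev)
{
	/* Only one handler may be registered per device (-EBUSY otherwise). */
	return iommu_register_device_fault_handler(dev,
						   demo_iommu_fault_handler,
						   dev);
}

static void demo_remove(struct device *dev)
{
	iommu_unregister_device_fault_handler(dev);
}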
> 
> Signed-off-by: Jacob Pan 
> Signed-off-by: Ashok Raj 
> Signed-off-by: Jean-Philippe Brucker 
> Signed-off-by: Eric Auger 
> 
> ---
> 
> v4 -> v5:
> - remove stuff related to recoverable faults
> ---
>  drivers/iommu/iommu.c | 134 +-
>  include/linux/iommu.h |  36 +++-
>  2 files changed, 168 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 33a982e33716..56d5bf68de53 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -648,6 +648,13 @@ int iommu_group_add_device(struct iommu_group *group, 
> struct device *dev)
>   goto err_free_name;
>   }
>  
> + dev->iommu_param = kzalloc(sizeof(*dev->iommu_param), GFP_KERNEL);
> + if (!dev->iommu_param) {
> + ret = -ENOMEM;
> + goto err_free_name;
> + }
> + mutex_init(&dev->iommu_param->lock);
> +
>   kobject_get(group->devices_kobj);
>  
>   dev->iommu_group = group;
> @@ -678,6 +685,7 @@ int iommu_group_add_device(struct iommu_group *group, 
> struct device *dev)
>   mutex_unlock(&group->mutex);
>   dev->iommu_group = NULL;
>   kobject_put(group->devices_kobj);
> + kfree(dev->iommu_param);
>  err_free_name:
>   kfree(device->name);
>  err_remove_link:
> @@ -724,7 +732,7 @@ void iommu_group_remove_device(struct device *dev)
>   sysfs_remove_link(&dev->kobj, "iommu_group");
>  
>   trace_remove_device_from_group(group->id, dev);
> -
> + kfree(dev->iommu_param);
>   kfree(device->name);
>   kfree(device);
>   dev->iommu_group = NULL;
> @@ -858,6 +866,130 @@ int iommu_group_unregister_notifier(struct iommu_group 
> *group,
>  }
>  EXPORT_SYMBOL_GPL(iommu_group_unregister_notifier);
>  
> +/**
> + * iommu_register_device_fault_handler() - Register a device fault handler
> + * @dev: the device
> + * @handler: the fault handler
> + * @data: private data passed as argument to the handler
> + *
> + * When an IOMMU fault event is received, this handler gets called with the
> + * fault event and data as argument.
> + *
> + * Return 0 if the fault handler was installed successfully, or an error.
> + */
> +int iommu_register_device_fault_handler(struct device *dev,
> + iommu_dev_fault_handler_t handler,
> + void *data)
> +{
> + struct iommu_param *param = dev->iommu_param;
> + int ret = 0;
> +
> + /*
> +  * Device iommu_param should have been allocated when device is
> +  * added to its iommu_group.
> +  */
> + if (!param)
> + return -EINVAL;
> +
> + mutex_lock(&param->lock);
> + /* Only allow one fault handler registered for each device */
> + if (param->fault_param) {
> + ret = -EBUSY;
> + goto done_unlock;
> + }
> +
> + get_device(dev);
> + param->fault_param =
> + kzalloc(sizeof(struct iommu_fault_param), GFP_KERNEL);
> + if (!param->fault_param) {
> + put_device(dev);
> + ret = -ENOMEM;
> + goto done_unlock;
> + }
> + mutex_init(&param->fault_param->lock);
> + param->fault_param->handler = handler;
> + param->fault_param->data = data;
> + INIT_LIST_HEAD(&param->fault_param->faults);
> +
> +done_unlock:
> + mutex_unlock(&param->lock);
> +
> + return ret;
> +}
> +EXPORT_SYMBOL_GPL(iommu_register_device_fault_handler);
> +
> +/**
> + * iommu_unregister_device_fault_handler() - Unregister the device fault 
> handler
> + * @dev: the device
> + *
> + * Remove the device fault handler installed with
> + * iommu_register_device_fault_handler().
> + *
> + * Return 0 on success, or an error.
> + */
> +int iommu_unregister_device_fault_handler(struct device *dev)
> +{
> + struct iommu_param *param = dev->iommu_param;
> + int ret = 0;
> +
> + if (!param)
> + return -EINVAL;
> +
> + mutex_lock(&param->lock);
> +
> + if (!param->fault_param)
> + 

Re: [PATCH v4 1/2] KVM: arm/arm64: Add save/restore support for firmware workaround state

2019-03-21 Thread Auger Eric
Hi Andre,

On 3/21/19 6:35 PM, Andre Przywara wrote:
> On Thu, 21 Mar 2019 13:54:07 +0100
> Auger Eric  wrote:
> 
> Hi Eric,
> 
> many thanks for looking at this!
> I was about to prepare a new revision.
> 
>> Hi Andre,
>>
>> On 3/1/19 12:43 AM, Andre Przywara wrote:
>>> KVM implements the firmware interface for mitigating cache speculation
>>> vulnerabilities. Guests may use this interface to ensure mitigation is
>>> active.
>>> If we want to migrate such a guest to a host with a different support
>>> level for those workarounds, migration might need to fail, to ensure that
>>> critical guests don't lose their protection.
>>>
>>> Introduce a way for userland to save and restore the workarounds state.
>>> On restoring we do checks that make sure we don't downgrade our
>>> mitigation level.
>>>
>>> Signed-off-by: Andre Przywara 
>>> ---
>>>  arch/arm/include/asm/kvm_emulate.h   |  10 +++
>>>  arch/arm/include/uapi/asm/kvm.h  |  10 +++
>>>  arch/arm64/include/asm/kvm_emulate.h |  14 +++
>>>  arch/arm64/include/uapi/asm/kvm.h|   9 ++
>>>  virt/kvm/arm/psci.c  | 128 +++
>>>  5 files changed, 155 insertions(+), 16 deletions(-)
>>>
>>> diff --git a/arch/arm/include/asm/kvm_emulate.h 
>>> b/arch/arm/include/asm/kvm_emulate.h
>>> index 8927cae7c966..663a02d7e6f4 100644
>>> --- a/arch/arm/include/asm/kvm_emulate.h
>>> +++ b/arch/arm/include/asm/kvm_emulate.h
>>> @@ -283,6 +283,16 @@ static inline unsigned long 
>>> kvm_vcpu_get_mpidr_aff(struct kvm_vcpu *vcpu)
>>> return vcpu_cp15(vcpu, c0_MPIDR) & MPIDR_HWID_BITMASK;
>>>  }
>>>  
>>> +static inline bool kvm_arm_get_vcpu_workaround_2_flag(struct kvm_vcpu 
>>> *vcpu)
>>> +{
>>> +   return false;
>>> +}
>>> +
>>> +static inline void kvm_arm_set_vcpu_workaround_2_flag(struct kvm_vcpu 
>>> *vcpu,
>>> + bool flag)
>>> +{
>>> +}
>>> +
>>>  static inline void kvm_vcpu_set_be(struct kvm_vcpu *vcpu)
>>>  {
>>> *vcpu_cpsr(vcpu) |= PSR_E_BIT;
>>> diff --git a/arch/arm/include/uapi/asm/kvm.h 
>>> b/arch/arm/include/uapi/asm/kvm.h
>>> index 4602464ebdfb..ba4d2afe65e3 100644
>>> --- a/arch/arm/include/uapi/asm/kvm.h
>>> +++ b/arch/arm/include/uapi/asm/kvm.h
>>> @@ -214,6 +214,16 @@ struct kvm_vcpu_events {
>>>  #define KVM_REG_ARM_FW_REG(r)  (KVM_REG_ARM | KVM_REG_SIZE_U64 
>>> | \
>>>  KVM_REG_ARM_FW | ((r) & 0x))
>>>  #define KVM_REG_ARM_PSCI_VERSION   KVM_REG_ARM_FW_REG(0)
>>> +#define KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1    KVM_REG_ARM_FW_REG(1)
>>> +#define KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1_NOT_AVAIL  0
>>> +#define KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1_AVAIL  1
>>> +#define KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1_UNAFFECTED 2
>>> +#define KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2    KVM_REG_ARM_FW_REG(2)
>>> +#define KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_NOT_AVAIL  0
>>> +#define KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_UNKNOWN    1
>> Would be worth adding a comment saying that values are chosen so that
>> higher values mean better protection. Otherwise it looks strange
>> NOT_AVAIL/AVAIL/UNAFFECTED values are not the same for both workarounds. 
>>> +#define KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_AVAIL  2
>>> +#define KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_UNAFFECTED 3
>>> +#define KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_ENABLED    (1U << 4)
>>
>>>  
>>>  /* Device Control API: ARM VGIC */
>>>  #define KVM_DEV_ARM_VGIC_GRP_ADDR  0
>>> diff --git a/arch/arm64/include/asm/kvm_emulate.h 
>>> b/arch/arm64/include/asm/kvm_emulate.h
>>> index d3842791e1c4..c00c17c9adb6 100644
>>> --- a/arch/arm64/include/asm/kvm_emulate.h
>>> +++ b/arch/arm64/include/asm/kvm_emulate.h
>>> @@ -348,6 +348,20 @@ static inline unsigned long 
>>> kvm_vcpu_get_mpidr_aff(struct kvm_vcpu *vcpu)
>>> return vcpu_read_sys_reg(vcpu, MPIDR_EL1) & MPIDR_HWID_BITMASK;
>>>  }
>>>  
>>> +static inline bool kvm_arm_get_vcpu_workaround_2_flag(struct kvm_vcpu 
>>> *vcpu)
>>> +{
>>> +   return vcpu->arch.workaround_flags & VCPU_WORKAROUND_2_FLAG;
>>> +}
>>> +
>>> +static inline void kvm_arm_set_vcpu_workaround_2_flag(struct kvm_vcpu 
>>> *vcpu,
>>> + bool flag)
>>> +{
>>> +   if (flag)
>>> +   vcpu->arch.workaround_flags |= VCPU_WORKAROUND_2_FLAG;
>>> +   else
>>> +   vcpu->arch.workaround_flags &= ~VCPU_WORKAROUND_2_FLAG;
>>> +}
>>> +
>>>  static inline void kvm_vcpu_set_be(struct kvm_vcpu *vcpu)
>>>  {
>>> if (vcpu_mode_is_32bit(vcpu)) {
>>> diff --git a/arch/arm64/include/uapi/asm/kvm.h 
>>> b/arch/arm64/include/uapi/asm/kvm.h
>>> index 97c3478ee6e7..367e96fe654e 100644
>>> --- a/arch/arm64/include/uapi/asm/kvm.h
>>> +++ b/arch/arm64/include/uapi/asm/kvm.h
>>> @@ -225,6 +225,15 @@ struct kvm_vcpu_events {
>>>  #define KVM_REG_ARM_FW_REG(r)  (KVM_REG_ARM64 | 
>>> KVM_REG_SIZE_U64 | \
>>>  KVM_REG_ARM_FW | ((r) & 0x))
>>>  

Re: [PATCH v4 1/2] KVM: arm/arm64: Add save/restore support for firmware workaround state

2019-03-21 Thread Andre Przywara
On Thu, 21 Mar 2019 13:54:07 +0100
Auger Eric  wrote:

Hi Eric,

many thanks for looking at this!
I was about to prepare a new revision.

> Hi Andre,
> 
> On 3/1/19 12:43 AM, Andre Przywara wrote:
> > KVM implements the firmware interface for mitigating cache speculation
> > vulnerabilities. Guests may use this interface to ensure mitigation is
> > active.
> > If we want to migrate such a guest to a host with a different support
> > level for those workarounds, migration might need to fail, to ensure that
> > critical guests don't lose their protection.
> > 
> > Introduce a way for userland to save and restore the workarounds state.
> > On restoring we do checks that make sure we don't downgrade our
> > mitigation level.
> > 
> > Signed-off-by: Andre Przywara 
> > ---
> >  arch/arm/include/asm/kvm_emulate.h   |  10 +++
> >  arch/arm/include/uapi/asm/kvm.h  |  10 +++
> >  arch/arm64/include/asm/kvm_emulate.h |  14 +++
> >  arch/arm64/include/uapi/asm/kvm.h|   9 ++
> >  virt/kvm/arm/psci.c  | 128 +++
> >  5 files changed, 155 insertions(+), 16 deletions(-)
> > 
> > diff --git a/arch/arm/include/asm/kvm_emulate.h 
> > b/arch/arm/include/asm/kvm_emulate.h
> > index 8927cae7c966..663a02d7e6f4 100644
> > --- a/arch/arm/include/asm/kvm_emulate.h
> > +++ b/arch/arm/include/asm/kvm_emulate.h
> > @@ -283,6 +283,16 @@ static inline unsigned long 
> > kvm_vcpu_get_mpidr_aff(struct kvm_vcpu *vcpu)
> > return vcpu_cp15(vcpu, c0_MPIDR) & MPIDR_HWID_BITMASK;
> >  }
> >  
> > +static inline bool kvm_arm_get_vcpu_workaround_2_flag(struct kvm_vcpu 
> > *vcpu)
> > +{
> > +   return false;
> > +}
> > +
> > +static inline void kvm_arm_set_vcpu_workaround_2_flag(struct kvm_vcpu 
> > *vcpu,
> > + bool flag)
> > +{
> > +}
> > +
> >  static inline void kvm_vcpu_set_be(struct kvm_vcpu *vcpu)
> >  {
> > *vcpu_cpsr(vcpu) |= PSR_E_BIT;
> > diff --git a/arch/arm/include/uapi/asm/kvm.h 
> > b/arch/arm/include/uapi/asm/kvm.h
> > index 4602464ebdfb..ba4d2afe65e3 100644
> > --- a/arch/arm/include/uapi/asm/kvm.h
> > +++ b/arch/arm/include/uapi/asm/kvm.h
> > @@ -214,6 +214,16 @@ struct kvm_vcpu_events {
> >  #define KVM_REG_ARM_FW_REG(r)  (KVM_REG_ARM | KVM_REG_SIZE_U64 
> > | \
> >  KVM_REG_ARM_FW | ((r) & 0x))
> >  #define KVM_REG_ARM_PSCI_VERSION   KVM_REG_ARM_FW_REG(0)
> > +#define KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1    KVM_REG_ARM_FW_REG(1)
> > +#define KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1_NOT_AVAIL  0
> > +#define KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1_AVAIL  1
> > +#define KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1_UNAFFECTED 2
> > +#define KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2    KVM_REG_ARM_FW_REG(2)
> > +#define KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_NOT_AVAIL  0
> > +#define KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_UNKNOWN    1
> Would be worth adding a comment saying that values are chosen so that
> higher values mean better protection. Otherwise it looks strange
> NOT_AVAIL/AVAIL/UNAFFECTED values are not the same for both workarounds.  
> > +#define KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_AVAIL  2
> > +#define KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_UNAFFECTED 3
> > +#define KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_ENABLED    (1U << 4)
> 
> >  
> >  /* Device Control API: ARM VGIC */
> >  #define KVM_DEV_ARM_VGIC_GRP_ADDR  0
> > diff --git a/arch/arm64/include/asm/kvm_emulate.h 
> > b/arch/arm64/include/asm/kvm_emulate.h
> > index d3842791e1c4..c00c17c9adb6 100644
> > --- a/arch/arm64/include/asm/kvm_emulate.h
> > +++ b/arch/arm64/include/asm/kvm_emulate.h
> > @@ -348,6 +348,20 @@ static inline unsigned long 
> > kvm_vcpu_get_mpidr_aff(struct kvm_vcpu *vcpu)
> > return vcpu_read_sys_reg(vcpu, MPIDR_EL1) & MPIDR_HWID_BITMASK;
> >  }
> >  
> > +static inline bool kvm_arm_get_vcpu_workaround_2_flag(struct kvm_vcpu 
> > *vcpu)
> > +{
> > +   return vcpu->arch.workaround_flags & VCPU_WORKAROUND_2_FLAG;
> > +}
> > +
> > +static inline void kvm_arm_set_vcpu_workaround_2_flag(struct kvm_vcpu 
> > *vcpu,
> > + bool flag)
> > +{
> > +   if (flag)
> > +   vcpu->arch.workaround_flags |= VCPU_WORKAROUND_2_FLAG;
> > +   else
> > +   vcpu->arch.workaround_flags &= ~VCPU_WORKAROUND_2_FLAG;
> > +}
> > +
> >  static inline void kvm_vcpu_set_be(struct kvm_vcpu *vcpu)
> >  {
> > if (vcpu_mode_is_32bit(vcpu)) {
> > diff --git a/arch/arm64/include/uapi/asm/kvm.h 
> > b/arch/arm64/include/uapi/asm/kvm.h
> > index 97c3478ee6e7..367e96fe654e 100644
> > --- a/arch/arm64/include/uapi/asm/kvm.h
> > +++ b/arch/arm64/include/uapi/asm/kvm.h
> > @@ -225,6 +225,15 @@ struct kvm_vcpu_events {
> >  #define KVM_REG_ARM_FW_REG(r)  (KVM_REG_ARM64 | 
> > KVM_REG_SIZE_U64 | \
> >  KVM_REG_ARM_FW | ((r) & 0x))
> >  #define KVM_REG_ARM_PSCI_VERSION   KVM_REG_ARM_FW_REG(0)
> > +#define 
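
For reference, a hedged userspace sketch of how a VMM on an arm64 host
could save and restore the firmware registers introduced above through the
standard KVM one-reg interface. The register IDs are spelled out from the
patch's KVM_REG_ARM_FW_REG() definition; everything else is illustrative.

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* KVM_REG_ARM_FW_REG(1)/(2) from the patch, expanded for illustration. */
#define REG_SMCCC_WA1	(KVM_REG_ARM64 | KVM_REG_SIZE_U64 | KVM_REG_ARM_FW | 1)
#define REG_SMCCC_WA2	(KVM_REG_ARM64 | KVM_REG_SIZE_U64 | KVM_REG_ARM_FW | 2)

static int get_fw_reg(int vcpu_fd, uint64_t id, uint64_t *val)
{
	struct kvm_one_reg reg = { .id = id, .addr = (uint64_t)val };

	return ioctl(vcpu_fd, KVM_GET_ONE_REG, &reg);
}

static int set_fw_reg(int vcpu_fd, uint64_t id, uint64_t val)
{
	struct kvm_one_reg reg = { .id = id, .addr = (uint64_t)&val };

	return ioctl(vcpu_fd, KVM_SET_ONE_REG, &reg);
}

/*
 * Save on the source, restore on the destination; per the commit message,
 * KVM itself rejects a restore that would downgrade the mitigation level.
 */
static int migrate_workaround_state(int src_vcpu, int dst_vcpu)
{
	uint64_t wa1, wa2;

	if (get_fw_reg(src_vcpu, REG_SMCCC_WA1, &wa1) ||
	    get_fw_reg(src_vcpu, REG_SMCCC_WA2, &wa2))
		return -1;

	if (set_fw_reg(dst_vcpu, REG_SMCCC_WA1, wa1) ||
	    set_fw_reg(dst_vcpu, REG_SMCCC_WA2, wa2))
		return -1;

	return 0;
}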

Re: [PATCH RFC 01/14] arm64/mm: Introduce asid_info structure and move asid_generation/asid_map to it

2019-03-21 Thread Julien Grall

On 3/21/19 5:03 PM, Suzuki K Poulose wrote:

Hi Julien,


Hi Suzuki,


On 21/03/2019 16:36, Julien Grall wrote:

In an attempt to make the ASID allocator generic, create a new structure
asid_info to store all the information necessary for the allocator.

For now, move the variables asid_generation and asid_map to the new 
structure

asid_info. Follow-up patches will move more variables.

Note to avoid more renaming afterwards, a local variable 'info' has been
created and is a pointer to the ASID allocator structure.

Signed-off-by: Julien Grall 
---
  arch/arm64/mm/context.c | 46 
++

  1 file changed, 26 insertions(+), 20 deletions(-)

diff --git a/arch/arm64/mm/context.c b/arch/arm64/mm/context.c
index 1f0ea2facf24..34db54f1a39a 100644
--- a/arch/arm64/mm/context.c
+++ b/arch/arm64/mm/context.c
@@ -30,8 +30,11 @@
  static u32 asid_bits;
  static DEFINE_RAW_SPINLOCK(cpu_asid_lock);
-static atomic64_t asid_generation;
-static unsigned long *asid_map;
+struct asid_info
+{
+    atomic64_t    generation;
+    unsigned long    *map;
+} asid_info;


Shouldn't this be static ? Rest looks fine.


Yes it should be static. I have updated my code.

Thank you for the review!

Cheers,



Cheers
Suzuki


--
Julien Grall


Re: [PATCH RFC 01/14] arm64/mm: Introduce asid_info structure and move asid_generation/asid_map to it

2019-03-21 Thread Suzuki K Poulose

Hi Julien,

On 21/03/2019 16:36, Julien Grall wrote:

In an attempt to make the ASID allocator generic, create a new structure
asid_info to store all the information necessary for the allocator.

For now, move the variables asid_generation and asid_map to the new structure
asid_info. Follow-up patches will move more variables.

Note to avoid more renaming afterwards, a local variable 'info' has been
created and is a pointer to the ASID allocator structure.

Signed-off-by: Julien Grall 
---
  arch/arm64/mm/context.c | 46 ++
  1 file changed, 26 insertions(+), 20 deletions(-)

diff --git a/arch/arm64/mm/context.c b/arch/arm64/mm/context.c
index 1f0ea2facf24..34db54f1a39a 100644
--- a/arch/arm64/mm/context.c
+++ b/arch/arm64/mm/context.c
@@ -30,8 +30,11 @@
  static u32 asid_bits;
  static DEFINE_RAW_SPINLOCK(cpu_asid_lock);
  
-static atomic64_t asid_generation;

-static unsigned long *asid_map;
+struct asid_info
+{
+   atomic64_t  generation;
+   unsigned long   *map;
+} asid_info;


Shouldn't this be static ? Rest looks fine.

Cheers
Suzuki


[PATCH RFC 10/14] arm64/mm: Introduce a callback to flush the local context

2019-03-21 Thread Julien Grall
Flushing the local context will vary depending on the actual user of the ASID
allocator. Introduce a new callback to flush the local context and move
the call that flushes the local TLB into it.

Signed-off-by: Julien Grall 
---
 arch/arm64/mm/context.c | 16 +---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/mm/context.c b/arch/arm64/mm/context.c
index cbf1c24cb3ee..678a57b77c91 100644
--- a/arch/arm64/mm/context.c
+++ b/arch/arm64/mm/context.c
@@ -39,6 +39,8 @@ struct asid_info
cpumask_t   flush_pending;
/* Number of ASID allocated by context (shift value) */
unsigned intctxt_shift;
+   /* Callback to locally flush the context. */
+   void(*flush_cpu_ctxt_cb)(void);
 } asid_info;
 
 #define active_asid(info, cpu) *per_cpu_ptr((info)->active, cpu)
@@ -266,7 +268,7 @@ static void asid_new_context(struct asid_info *info, 
atomic64_t *pasid,
}
 
	if (cpumask_test_and_clear_cpu(cpu, &info->flush_pending))
-   local_flush_tlb_all();
+   info->flush_cpu_ctxt_cb();
 
	atomic64_set(&active_asid(info, cpu), asid);
	raw_spin_unlock_irqrestore(&info->lock, flags);
@@ -298,6 +300,11 @@ asmlinkage void post_ttbr_update_workaround(void)
CONFIG_CAVIUM_ERRATUM_27456));
 }
 
+static void asid_flush_cpu_ctxt(void)
+{
+   local_flush_tlb_all();
+}
+
 /*
  * Initialize the ASID allocator
  *
@@ -308,10 +315,12 @@ asmlinkage void post_ttbr_update_workaround(void)
  * 2.
  */
 static int asid_allocator_init(struct asid_info *info,
-  u32 bits, unsigned int asid_per_ctxt)
+  u32 bits, unsigned int asid_per_ctxt,
+  void (*flush_cpu_ctxt_cb)(void))
 {
info->bits = bits;
info->ctxt_shift = ilog2(asid_per_ctxt);
+   info->flush_cpu_ctxt_cb = flush_cpu_ctxt_cb;
/*
 * Expect allocation after rollover to fail if we don't have at least
 * one more ASID than CPUs. ASID #0 is always reserved.
@@ -332,7 +341,8 @@ static int asids_init(void)
 {
u32 bits = get_cpu_asid_bits();
 
-   if (!asid_allocator_init(&asid_info, bits, ASID_PER_CONTEXT))
+   if (!asid_allocator_init(&asid_info, bits, ASID_PER_CONTEXT,
+                            asid_flush_cpu_ctxt))
panic("Unable to initialize ASID allocator for %lu ASIDs\n",
  1UL << bits);
 
-- 
2.11.0



[PATCH RFC 09/14] arm64/mm: Split the function check_and_switch_context in 3 parts

2019-03-21 Thread Julien Grall
The function check_and_switch_context is used to:
1) Check whether the ASID is still valid
2) Generate a new one if it is not valid
3) Switch the context

While the latter is specific to the MM subsystem, the rest could be part
of the generic ASID allocator.

After this patch, the function is now split in 3 parts which corresponds
to the use of the functions:
1) asid_check_context: Check if the ASID is still valid
2) asid_new_context: Generate a new ASID for the context
3) check_and_switch_context: Call 1) and 2) and switch the context

1) and 2) have not been merged into a single function because we want to
avoid adding a branch when the ASID is still valid. This will matter
when the code is moved to a separate file later on, as 1) will reside
in the header as a static inline function.

Signed-off-by: Julien Grall 

---

Will wants to avoid adding a branch when the ASID is still valid, so
1) and 2) are kept in separate functions. The former will be moved to a
new header and made static inline.
---
 arch/arm64/mm/context.c | 51 +
 1 file changed, 39 insertions(+), 12 deletions(-)

diff --git a/arch/arm64/mm/context.c b/arch/arm64/mm/context.c
index b071a1b3469e..cbf1c24cb3ee 100644
--- a/arch/arm64/mm/context.c
+++ b/arch/arm64/mm/context.c
@@ -204,16 +204,21 @@ static u64 new_context(struct asid_info *info, atomic64_t 
*pasid)
return idx2asid(info, asid) | generation;
 }
 
-void check_and_switch_context(struct mm_struct *mm, unsigned int cpu)
+static void asid_new_context(struct asid_info *info, atomic64_t *pasid,
+unsigned int cpu);
+
+/*
+ * Check the ASID is still valid for the context. If not generate a new ASID.
+ *
+ * @pasid: Pointer to the current ASID batch
+ * @cpu: current CPU ID. Must have been acquired throught get_cpu()
+ */
+static void asid_check_context(struct asid_info *info,
+  atomic64_t *pasid, unsigned int cpu)
 {
-   unsigned long flags;
u64 asid, old_active_asid;
-   struct asid_info *info = &asid_info;
 
-   if (system_supports_cnp())
-   cpu_set_reserved_ttbr0();
-
-   asid = atomic64_read(&mm->context.id);
+   asid = atomic64_read(pasid);
 
/*
 * The memory ordering here is subtle.
@@ -234,14 +239,30 @@ void check_and_switch_context(struct mm_struct *mm, 
unsigned int cpu)
	!((asid ^ atomic64_read(&info->generation)) >> info->bits) &&
	atomic64_cmpxchg_relaxed(&active_asid(info, cpu),
				 old_active_asid, asid))
-   goto switch_mm_fastpath;
+   return;
+
+   asid_new_context(info, pasid, cpu);
+}
+
+/*
+ * Generate a new ASID for the context.
+ *
+ * @pasid: Pointer to the current ASID batch allocated. It will be updated
+ * with the new ASID batch.
+ * @cpu: current CPU ID. Must have been acquired through get_cpu()
+ */
+static void asid_new_context(struct asid_info *info, atomic64_t *pasid,
+unsigned int cpu)
+{
+   unsigned long flags;
+   u64 asid;
 
	raw_spin_lock_irqsave(&info->lock, flags);
	/* Check that our ASID belongs to the current generation. */
-   asid = atomic64_read(&mm->context.id);
+   asid = atomic64_read(pasid);
	if ((asid ^ atomic64_read(&info->generation)) >> info->bits) {
-   asid = new_context(info, &mm->context.id);
-   atomic64_set(&mm->context.id, asid);
+   asid = new_context(info, pasid);
+   atomic64_set(pasid, asid);
}
 
	if (cpumask_test_and_clear_cpu(cpu, &info->flush_pending))
@@ -249,8 +270,14 @@ void check_and_switch_context(struct mm_struct *mm, unsigned int cpu)
 
	atomic64_set(&active_asid(info, cpu), asid);
	raw_spin_unlock_irqrestore(&info->lock, flags);
+}
+
+void check_and_switch_context(struct mm_struct *mm, unsigned int cpu)
+{
+   if (system_supports_cnp())
+   cpu_set_reserved_ttbr0();
 
-switch_mm_fastpath:
+   asid_check_context(&asid_info, &mm->context.id, cpu);
 
arm64_apply_bp_hardening();
 
-- 
2.11.0



[PATCH RFC 05/14] arm64/mm: Remove dependency on MM in new_context

2019-03-21 Thread Julien Grall
The function new_context will be part of a generic ASID allocator. At
the moment, the MM structure is only used to fetch the ASID.

To remove the dependency on MM, it is possible to just pass a pointer to
the current ASID.

Signed-off-by: Julien Grall 
---
 arch/arm64/mm/context.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/mm/context.c b/arch/arm64/mm/context.c
index e98ab348b9cb..488845c39c39 100644
--- a/arch/arm64/mm/context.c
+++ b/arch/arm64/mm/context.c
@@ -151,10 +151,10 @@ static bool check_update_reserved_asid(struct asid_info 
*info, u64 asid,
return hit;
 }
 
-static u64 new_context(struct asid_info *info, struct mm_struct *mm)
+static u64 new_context(struct asid_info *info, atomic64_t *pasid)
 {
static u32 cur_idx = 1;
-   u64 asid = atomic64_read(&mm->context.id);
+   u64 asid = atomic64_read(pasid);
	u64 generation = atomic64_read(&info->generation);
 
if (asid != 0) {
@@ -236,7 +236,7 @@ void check_and_switch_context(struct mm_struct *mm, 
unsigned int cpu)
/* Check that our ASID belongs to the current generation. */
	asid = atomic64_read(&mm->context.id);
	if ((asid ^ atomic64_read(&info->generation)) >> info->bits) {
-   asid = new_context(info, mm);
+   asid = new_context(info, &mm->context.id);
	atomic64_set(&mm->context.id, asid);
}
 
-- 
2.11.0



[PATCH RFC 03/14] arm64/mm: Move bits to asid_info

2019-03-21 Thread Julien Grall
The variable 'bits' holds information for a given ASID allocator, so move
it to the asid_info structure.

Because most of the macros were relying on bits, they are now taking an
extra parameter that is a pointer to the asid_info structure.

Signed-off-by: Julien Grall 
---
 arch/arm64/mm/context.c | 59 +
 1 file changed, 30 insertions(+), 29 deletions(-)

diff --git a/arch/arm64/mm/context.c b/arch/arm64/mm/context.c
index cfe4c5f7abf3..da17ed6c7117 100644
--- a/arch/arm64/mm/context.c
+++ b/arch/arm64/mm/context.c
@@ -27,7 +27,6 @@
 #include 
 #include 
 
-static u32 asid_bits;
 static DEFINE_RAW_SPINLOCK(cpu_asid_lock);
 
 struct asid_info
@@ -36,6 +35,7 @@ struct asid_info
unsigned long   *map;
atomic64_t __percpu *active;
u64 __percpu*reserved;
+   u32 bits;
 } asid_info;
 
 #define active_asid(info, cpu) *per_cpu_ptr((info)->active, cpu)
@@ -46,17 +46,17 @@ static DEFINE_PER_CPU(u64, reserved_asids);
 
 static cpumask_t tlb_flush_pending;
 
-#define ASID_MASK  (~GENMASK(asid_bits - 1, 0))
-#define ASID_FIRST_VERSION (1UL << asid_bits)
+#define ASID_MASK(info)(~GENMASK((info)->bits - 1, 0))
+#define ASID_FIRST_VERSION(info)   (1UL << ((info)->bits))
 
 #ifdef CONFIG_UNMAP_KERNEL_AT_EL0
-#define NUM_USER_ASIDS (ASID_FIRST_VERSION >> 1)
-#define asid2idx(asid) (((asid) & ~ASID_MASK) >> 1)
-#define idx2asid(idx)  (((idx) << 1) & ~ASID_MASK)
+#define NUM_USER_ASIDS(info)   (ASID_FIRST_VERSION(info) >> 1)
+#define asid2idx(info, asid)   (((asid) & ~ASID_MASK(info)) >> 1)
+#define idx2asid(info, idx)(((idx) << 1) & ~ASID_MASK(info))
 #else
-#define NUM_USER_ASIDS (ASID_FIRST_VERSION)
-#define asid2idx(asid) ((asid) & ~ASID_MASK)
-#define idx2asid(idx)  asid2idx(idx)
+#define NUM_USER_ASIDS(info)   (ASID_FIRST_VERSION(info))
+#define asid2idx(info, asid)   ((asid) & ~ASID_MASK(info))
+#define idx2asid(info, idx)asid2idx(info, idx)
 #endif
 
 /* Get the ASIDBits supported by the current CPU */
@@ -86,13 +86,13 @@ void verify_cpu_asid_bits(void)
 {
u32 asid = get_cpu_asid_bits();
 
-   if (asid < asid_bits) {
+   if (asid < asid_info.bits) {
/*
 * We cannot decrease the ASID size at runtime, so panic if we 
support
 * fewer ASID bits than the boot CPU.
 */
pr_crit("CPU%d: smaller ASID size(%u) than boot CPU (%u)\n",
-   smp_processor_id(), asid, asid_bits);
+   smp_processor_id(), asid, asid_info.bits);
cpu_panic_kernel();
}
 }
@@ -103,7 +103,7 @@ static void flush_context(struct asid_info *info)
u64 asid;
 
/* Update the list of reserved ASIDs and the ASID bitmap. */
-   bitmap_clear(info->map, 0, NUM_USER_ASIDS);
+   bitmap_clear(info->map, 0, NUM_USER_ASIDS(info));
 
for_each_possible_cpu(i) {
	asid = atomic64_xchg_relaxed(&active_asid(info, i), 0);
@@ -116,7 +116,7 @@ static void flush_context(struct asid_info *info)
 */
if (asid == 0)
asid = reserved_asid(info, i);
-   __set_bit(asid2idx(asid), info->map);
+   __set_bit(asid2idx(info, asid), info->map);
reserved_asid(info, i) = asid;
}
 
@@ -159,7 +159,7 @@ static u64 new_context(struct asid_info *info, struct 
mm_struct *mm)
	u64 generation = atomic64_read(&info->generation);
 
if (asid != 0) {
-   u64 newasid = generation | (asid & ~ASID_MASK);
+   u64 newasid = generation | (asid & ~ASID_MASK(info));
 
/*
 * If our current ASID was active during a rollover, we
@@ -172,7 +172,7 @@ static u64 new_context(struct asid_info *info, struct 
mm_struct *mm)
 * We had a valid ASID in a previous life, so try to re-use
 * it if possible.
 */
-   if (!__test_and_set_bit(asid2idx(asid), info->map))
+   if (!__test_and_set_bit(asid2idx(info, asid), info->map))
return newasid;
}
 
@@ -183,22 +183,22 @@ static u64 new_context(struct asid_info *info, struct 
mm_struct *mm)
 * a reserved TTBR0 for the init_mm and we allocate ASIDs in even/odd
 * pairs.
 */
-   asid = find_next_zero_bit(info->map, NUM_USER_ASIDS, cur_idx);
-   if (asid != NUM_USER_ASIDS)
+   asid = find_next_zero_bit(info->map, NUM_USER_ASIDS(info), cur_idx);
+   if (asid != NUM_USER_ASIDS(info))
goto set_asid;
 
/* We're out of ASIDs, so increment the global generation count */
-   generation = atomic64_add_return_relaxed(ASID_FIRST_VERSION,
+   generation = 

[PATCH RFC 07/14] arm64/mm: Introduce NUM_ASIDS

2019-03-21 Thread Julien Grall
At the moment ASID_FIRST_VERSION is used to know the number of ASIDs
supported. As we are going to move the ASID allocator into a separate file,
it would be better to use a different name for external users.

This patch adds NUM_ASIDS and implements ASID_FIRST_VERSION using it.

Signed-off-by: Julien Grall 
---
 arch/arm64/mm/context.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/mm/context.c b/arch/arm64/mm/context.c
index 5a4c2b1aac71..fb13bc249951 100644
--- a/arch/arm64/mm/context.c
+++ b/arch/arm64/mm/context.c
@@ -48,7 +48,9 @@ static DEFINE_PER_CPU(atomic64_t, active_asids);
 static DEFINE_PER_CPU(u64, reserved_asids);
 
 #define ASID_MASK(info)(~GENMASK((info)->bits - 1, 0))
-#define ASID_FIRST_VERSION(info)   (1UL << ((info)->bits))
+#define NUM_ASIDS(info)(1UL << ((info)->bits))
+
+#define ASID_FIRST_VERSION(info)   NUM_ASIDS(info)
 
 #ifdef CONFIG_UNMAP_KERNEL_AT_EL0
 #define ASID_PER_CONTEXT   2
@@ -56,7 +58,7 @@ static DEFINE_PER_CPU(u64, reserved_asids);
 #define ASID_PER_CONTEXT   1
 #endif
 
-#define NUM_CTXT_ASIDS(info)   (ASID_FIRST_VERSION(info) >> 
(info)->ctxt_shift)
+#define NUM_CTXT_ASIDS(info)   (NUM_ASIDS(info) >> (info)->ctxt_shift)
 #define asid2idx(info, asid)   (((asid) & ~ASID_MASK(info)) >> 
(info)->ctxt_shift)
 #define idx2asid(info, idx)(((idx) << (info)->ctxt_shift) & 
~ASID_MASK(info))
 
-- 
2.11.0



[PATCH RFC 08/14] arm64/mm: Split asid_inits in 2 parts

2019-03-21 Thread Julien Grall
Move the common initialization of the ASID allocator out into a separate
function.

Signed-off-by: Julien Grall 
---
 arch/arm64/mm/context.c | 43 +++
 1 file changed, 31 insertions(+), 12 deletions(-)

diff --git a/arch/arm64/mm/context.c b/arch/arm64/mm/context.c
index fb13bc249951..b071a1b3469e 100644
--- a/arch/arm64/mm/context.c
+++ b/arch/arm64/mm/context.c
@@ -271,31 +271,50 @@ asmlinkage void post_ttbr_update_workaround(void)
CONFIG_CAVIUM_ERRATUM_27456));
 }
 
-static int asids_init(void)
+/*
+ * Initialize the ASID allocator
+ *
+ * @info: Pointer to the asid allocator structure
+ * @bits: Number of ASIDs available
+ * @asid_per_ctxt: Number of ASIDs to allocate per-context. ASIDs are
+ * allocated contiguously for a given context. This value should be a power of
+ * 2.
+ */
+static int asid_allocator_init(struct asid_info *info,
+  u32 bits, unsigned int asid_per_ctxt)
 {
-   struct asid_info *info = _info;
-
-   info->bits = get_cpu_asid_bits();
-   info->ctxt_shift = ilog2(ASID_PER_CONTEXT);
+   info->bits = bits;
+   info->ctxt_shift = ilog2(asid_per_ctxt);
/*
 * Expect allocation after rollover to fail if we don't have at least
-* one more ASID than CPUs. ASID #0 is reserved for init_mm.
+* one more ASID than CPUs. ASID #0 is always reserved.
 */
WARN_ON(NUM_CTXT_ASIDS(info) - 1 <= num_possible_cpus());
atomic64_set(>generation, ASID_FIRST_VERSION(info));
info->map = kcalloc(BITS_TO_LONGS(NUM_CTXT_ASIDS(info)),
sizeof(*info->map), GFP_KERNEL);
if (!info->map)
-   panic("Failed to allocate bitmap for %lu ASIDs\n",
- NUM_CTXT_ASIDS(info));
-
-   info->active = _asids;
-   info->reserved = _asids;
+   return -ENOMEM;
 
raw_spin_lock_init(>lock);
 
+   return 0;
+}
+
+static int asids_init(void)
+{
+   u32 bits = get_cpu_asid_bits();
+
+   if (!asid_allocator_init(_info, bits, ASID_PER_CONTEXT))
+   panic("Unable to initialize ASID allocator for %lu ASIDs\n",
+ 1UL << bits);
+
+   asid_info.active = _asids;
+   asid_info.reserved = _asids;
+
pr_info("ASID allocator initialised with %lu entries\n",
-   NUM_CTXT_ASIDS(info));
+   NUM_CTXT_ASIDS(_info));
+
return 0;
 }
 early_initcall(asids_init);
-- 
2.11.0



[PATCH RFC 14/14] kvm/arm: Align the VMID allocation with the arm64 ASID one

2019-03-21 Thread Julien Grall
At the moment, the VMID algorithm will send an SGI to all the CPUs to
force an exit and then broadcast a full TLB flush and I-Cache
invalidation.

This patch re-uses the new ASID allocator. The
benefits are:
- CPUs are not forced to exit at roll-over. Instead the VMID will be
marked reserved and the context will be flushed at the next exit. This
will reduce the IPI traffic.
- Context invalidation is now per-CPU rather than broadcast.

With the new algorithm, the code is adapted as follows:
- The function __kvm_flush_vm_context() has been renamed to
__kvm_flush_cpu_vmid_context() and now only flushes the current CPU context.
- The call to update_vttbr() will be done with preemption disabled,
as the new algorithm requires storing information per-CPU.
- The TLBs associated with EL1 will be flushed when booting a CPU to
deal with stale information. This was previously done on the
allocation of the first VMID of a new generation.

The measurement was made on a Seattle based SoC (8 CPUs), with the
number of VMID limited to 4-bit. The test involves running concurrently 40
guests with 2 vCPUs. Each guest will then execute hackbench 5 times
before exiting.

The performance differences between the current algorithm and the new one are:
- 2.5% fewer exits from the guest
- 22.4% more flushes, although they are now local rather than
broadcast
- 0.11% faster (just for the record)

Signed-off-by: Julien Grall 


Looking at __kvm_flush_vm_context(), it might be possible to
reduce the overhead further by removing the I-Cache flush for
caches other than VIPT. This has been left aside for now.
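
As an illustration of the flow described above (not taken from the diff
below), the guest-entry side could end up looking roughly like the sketch
here. The names vmid_info and kvm_arm_update_vmid(), as well as the
kvm->arch.vmid placement, are assumptions for this sketch only:

static struct asid_info vmid_info;

static void kvm_arm_update_vmid(struct kvm_vcpu *vcpu)
{
	struct kvm_vmid *vmid = &vcpu->kvm->arch.vmid;
	int cpu = get_cpu();	/* the new algorithm needs preemption disabled */

	/*
	 * Fast path if the VMID is still in the current generation;
	 * otherwise a new VMID is taken under the allocator lock, the
	 * local TLBs are flushed through flush_cpu_ctxt_cb, and
	 * update_ctxt_cb can refresh the 32-bit vmid->vmid field.
	 */
	asid_check_context(&vmid_info, &vmid->asid, cpu, vmid);

	put_cpu();
}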
---
 arch/arm/include/asm/kvm_asm.h|   2 +-
 arch/arm/include/asm/kvm_host.h   |   5 +-
 arch/arm/include/asm/kvm_hyp.h|   1 +
 arch/arm/kvm/hyp/tlb.c|   8 +--
 arch/arm64/include/asm/kvm_asid.h |   8 +++
 arch/arm64/include/asm/kvm_asm.h  |   2 +-
 arch/arm64/include/asm/kvm_host.h |   5 +-
 arch/arm64/kvm/hyp/tlb.c  |  10 ++--
 virt/kvm/arm/arm.c| 112 +-
 9 files changed, 61 insertions(+), 92 deletions(-)
 create mode 100644 arch/arm64/include/asm/kvm_asid.h

diff --git a/arch/arm/include/asm/kvm_asm.h b/arch/arm/include/asm/kvm_asm.h
index 35491af87985..ce60a4a46fcc 100644
--- a/arch/arm/include/asm/kvm_asm.h
+++ b/arch/arm/include/asm/kvm_asm.h
@@ -65,7 +65,7 @@ struct kvm_vcpu;
 extern char __kvm_hyp_init[];
 extern char __kvm_hyp_init_end[];
 
-extern void __kvm_flush_vm_context(void);
+extern void __kvm_flush_cpu_vmid_context(void);
 extern void __kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa);
 extern void __kvm_tlb_flush_vmid(struct kvm *kvm);
 extern void __kvm_tlb_flush_local_vmid(struct kvm_vcpu *vcpu);
diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 770d73257ad9..e2c3a4a7b020 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -59,8 +59,8 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu);
 void kvm_reset_coprocs(struct kvm_vcpu *vcpu);
 
 struct kvm_vmid {
-   /* The VMID generation used for the virt. memory system */
-   u64vmid_gen;
+   /* The ASID used for the ASID allocator */
+   atomic64_t asid;
u32vmid;
 };
 
@@ -264,7 +264,6 @@ unsigned long __kvm_call_hyp(void *hypfn, ...);
ret;\
})
 
-void force_vm_exit(const cpumask_t *mask);
 int __kvm_arm_vcpu_get_events(struct kvm_vcpu *vcpu,
  struct kvm_vcpu_events *events);
 
diff --git a/arch/arm/include/asm/kvm_hyp.h b/arch/arm/include/asm/kvm_hyp.h
index 87bcd18df8d5..c3d1011ca1bf 100644
--- a/arch/arm/include/asm/kvm_hyp.h
+++ b/arch/arm/include/asm/kvm_hyp.h
@@ -75,6 +75,7 @@
 #define TLBIALLIS  __ACCESS_CP15(c8, 0, c3, 0)
 #define TLBIALL__ACCESS_CP15(c8, 0, c7, 0)
 #define TLBIALLNSNHIS  __ACCESS_CP15(c8, 4, c3, 4)
+#define TLBIALLNSNH__ACCESS_CP15(c8, 4, c7, 4)
 #define PRRR   __ACCESS_CP15(c10, 0, c2, 0)
 #define NMRR   __ACCESS_CP15(c10, 0, c2, 1)
 #define AMAIR0 __ACCESS_CP15(c10, 0, c3, 0)
diff --git a/arch/arm/kvm/hyp/tlb.c b/arch/arm/kvm/hyp/tlb.c
index 8e4afba73635..42b9ab47fc94 100644
--- a/arch/arm/kvm/hyp/tlb.c
+++ b/arch/arm/kvm/hyp/tlb.c
@@ -71,9 +71,9 @@ void __hyp_text __kvm_tlb_flush_local_vmid(struct kvm_vcpu 
*vcpu)
write_sysreg(0, VTTBR);
 }
 
-void __hyp_text __kvm_flush_vm_context(void)
+void __hyp_text __kvm_flush_cpu_vmid_context(void)
 {
-   write_sysreg(0, TLBIALLNSNHIS);
-   write_sysreg(0, ICIALLUIS);
-   dsb(ish);
+   write_sysreg(0, TLBIALLNSNH);
+   write_sysreg(0, ICIALLU);
+   dsb(nsh);
 }
diff --git a/arch/arm64/include/asm/kvm_asid.h 
b/arch/arm64/include/asm/kvm_asid.h
new file mode 100644
index ..8b586e43c094
--- /dev/null
+++ b/arch/arm64/include/asm/kvm_asid.h
@@ -0,0 +1,8 @@
+/* SPDX-License-Identifier: 

[PATCH RFC 02/14] arm64/mm: Move active_asids and reserved_asids to asid_info

2019-03-21 Thread Julien Grall
The variables active_asids and reserved_asids hold information for a
given ASID allocator. So move them to the structure asid_info.

At the same time, introduce wrappers to access the active and reserved
ASIDs to make the code clearer.

Signed-off-by: Julien Grall 
---
 arch/arm64/mm/context.c | 34 ++
 1 file changed, 22 insertions(+), 12 deletions(-)

diff --git a/arch/arm64/mm/context.c b/arch/arm64/mm/context.c
index 34db54f1a39a..cfe4c5f7abf3 100644
--- a/arch/arm64/mm/context.c
+++ b/arch/arm64/mm/context.c
@@ -34,10 +34,16 @@ struct asid_info
 {
atomic64_t  generation;
unsigned long   *map;
+   atomic64_t __percpu *active;
+   u64 __percpu*reserved;
 } asid_info;
 
+#define active_asid(info, cpu) *per_cpu_ptr((info)->active, cpu)
+#define reserved_asid(info, cpu) *per_cpu_ptr((info)->reserved, cpu)
+
 static DEFINE_PER_CPU(atomic64_t, active_asids);
 static DEFINE_PER_CPU(u64, reserved_asids);
+
 static cpumask_t tlb_flush_pending;
 
 #define ASID_MASK  (~GENMASK(asid_bits - 1, 0))
@@ -100,7 +106,7 @@ static void flush_context(struct asid_info *info)
bitmap_clear(info->map, 0, NUM_USER_ASIDS);
 
for_each_possible_cpu(i) {
-   asid = atomic64_xchg_relaxed(_cpu(active_asids, i), 0);
+   asid = atomic64_xchg_relaxed(_asid(info, i), 0);
/*
 * If this CPU has already been through a
 * rollover, but hasn't run another task in
@@ -109,9 +115,9 @@ static void flush_context(struct asid_info *info)
 * the process it is still running.
 */
if (asid == 0)
-   asid = per_cpu(reserved_asids, i);
+   asid = reserved_asid(info, i);
__set_bit(asid2idx(asid), info->map);
-   per_cpu(reserved_asids, i) = asid;
+   reserved_asid(info, i) = asid;
}
 
/*
@@ -121,7 +127,8 @@ static void flush_context(struct asid_info *info)
cpumask_setall(_flush_pending);
 }
 
-static bool check_update_reserved_asid(u64 asid, u64 newasid)
+static bool check_update_reserved_asid(struct asid_info *info, u64 asid,
+  u64 newasid)
 {
int cpu;
bool hit = false;
@@ -136,9 +143,9 @@ static bool check_update_reserved_asid(u64 asid, u64 
newasid)
 * generation.
 */
for_each_possible_cpu(cpu) {
-   if (per_cpu(reserved_asids, cpu) == asid) {
+   if (reserved_asid(info, cpu) == asid) {
hit = true;
-   per_cpu(reserved_asids, cpu) = newasid;
+   reserved_asid(info, cpu) = newasid;
}
}
 
@@ -158,7 +165,7 @@ static u64 new_context(struct asid_info *info, struct 
mm_struct *mm)
 * If our current ASID was active during a rollover, we
 * can continue to use it and this was just a false alarm.
 */
-   if (check_update_reserved_asid(asid, newasid))
+   if (check_update_reserved_asid(info, asid, newasid))
return newasid;
 
/*
@@ -207,8 +214,8 @@ void check_and_switch_context(struct mm_struct *mm, 
unsigned int cpu)
 
/*
 * The memory ordering here is subtle.
-* If our active_asids is non-zero and the ASID matches the current
-* generation, then we update the active_asids entry with a relaxed
+* If our active_asid is non-zero and the ASID matches the current
+* generation, then we update the active_asid entry with a relaxed
 * cmpxchg. Racing with a concurrent rollover means that either:
 *
 * - We get a zero back from the cmpxchg and end up waiting on the
@@ -219,10 +226,10 @@ void check_and_switch_context(struct mm_struct *mm, 
unsigned int cpu)
 *   relaxed xchg in flush_context will treat us as reserved
 *   because atomic RmWs are totally ordered for a given location.
 */
-   old_active_asid = atomic64_read(_cpu(active_asids, cpu));
+   old_active_asid = atomic64_read(_asid(info, cpu));
if (old_active_asid &&
!((asid ^ atomic64_read(>generation)) >> asid_bits) &&
-   atomic64_cmpxchg_relaxed(_cpu(active_asids, cpu),
+   atomic64_cmpxchg_relaxed(_asid(info, cpu),
 old_active_asid, asid))
goto switch_mm_fastpath;
 
@@ -237,7 +244,7 @@ void check_and_switch_context(struct mm_struct *mm, 
unsigned int cpu)
if (cpumask_test_and_clear_cpu(cpu, _flush_pending))
local_flush_tlb_all();
 
-   atomic64_set(_cpu(active_asids, cpu), asid);
+   atomic64_set(_asid(info, cpu), asid);
raw_spin_unlock_irqrestore(_asid_lock, flags);
 
 switch_mm_fastpath:
@@ -278,6 +285,9 @@ static int asids_init(void)

[PATCH RFC 12/14] arm64/lib: asid: Allow user to update the context under the lock

2019-03-21 Thread Julien Grall
Some users of the ASID allocator (e.g. VMID) will need to update the
context when a new ASID is generated. This has to be protected by a lock
to prevent concurrent modification.

Rather than introducing yet another lock, it is possible to re-use the
allocator lock for that purpose. This patch introduces a new callback
that will be called when the context is updated.

Signed-off-by: Julien Grall 
---
 arch/arm64/include/asm/asid.h | 12 
 arch/arm64/lib/asid.c | 10 --
 arch/arm64/mm/context.c   | 11 ---
 3 files changed, 24 insertions(+), 9 deletions(-)

diff --git a/arch/arm64/include/asm/asid.h b/arch/arm64/include/asm/asid.h
index bb62b587f37f..d8d9dc875bec 100644
--- a/arch/arm64/include/asm/asid.h
+++ b/arch/arm64/include/asm/asid.h
@@ -23,6 +23,8 @@ struct asid_info
unsigned intctxt_shift;
/* Callback to locally flush the context. */
void(*flush_cpu_ctxt_cb)(void);
+   /* Callback to call when a context is updated */
+   void(*update_ctxt_cb)(void *ctxt);
 };
 
 #define NUM_ASIDS(info)(1UL << ((info)->bits))
@@ -31,7 +33,7 @@ struct asid_info
 #define active_asid(info, cpu) *per_cpu_ptr((info)->active, cpu)
 
 void asid_new_context(struct asid_info *info, atomic64_t *pasid,
- unsigned int cpu);
+ unsigned int cpu, void *ctxt);
 
 /*
  * Check the ASID is still valid for the context. If not generate a new ASID.
@@ -40,7 +42,8 @@ void asid_new_context(struct asid_info *info, atomic64_t 
*pasid,
  * @cpu: current CPU ID. Must have been acquired through get_cpu()
  */
 static inline void asid_check_context(struct asid_info *info,
- atomic64_t *pasid, unsigned int cpu)
+  atomic64_t *pasid, unsigned int cpu,
+  void *ctxt)
 {
u64 asid, old_active_asid;
 
@@ -67,11 +70,12 @@ static inline void asid_check_context(struct asid_info 
*info,
 old_active_asid, asid))
return;
 
-   asid_new_context(info, pasid, cpu);
+   asid_new_context(info, pasid, cpu, ctxt);
 }
 
 int asid_allocator_init(struct asid_info *info,
u32 bits, unsigned int asid_per_ctxt,
-   void (*flush_cpu_ctxt_cb)(void));
+   void (*flush_cpu_ctxt_cb)(void),
+   void (*update_ctxt_cb)(void *ctxt));
 
 #endif
diff --git a/arch/arm64/lib/asid.c b/arch/arm64/lib/asid.c
index 72b71bfb32be..b47e6769c1bc 100644
--- a/arch/arm64/lib/asid.c
+++ b/arch/arm64/lib/asid.c
@@ -130,9 +130,10 @@ static u64 new_context(struct asid_info *info, atomic64_t 
*pasid)
  * @pasid: Pointer to the current ASID batch allocated. It will be updated
  * with the new ASID batch.
  * @cpu: current CPU ID. Must have been acquired through get_cpu()
+ * @ctxt: Context to update when calling update_context
  */
 void asid_new_context(struct asid_info *info, atomic64_t *pasid,
- unsigned int cpu)
+ unsigned int cpu, void *ctxt)
 {
unsigned long flags;
u64 asid;
@@ -149,6 +150,9 @@ void asid_new_context(struct asid_info *info, atomic64_t 
*pasid,
info->flush_cpu_ctxt_cb();
 
atomic64_set(_asid(info, cpu), asid);
+
+   info->update_ctxt_cb(ctxt);
+
raw_spin_unlock_irqrestore(>lock, flags);
 }
 
@@ -163,11 +167,13 @@ void asid_new_context(struct asid_info *info, atomic64_t 
*pasid,
  */
 int asid_allocator_init(struct asid_info *info,
u32 bits, unsigned int asid_per_ctxt,
-   void (*flush_cpu_ctxt_cb)(void))
+   void (*flush_cpu_ctxt_cb)(void),
+   void (*update_ctxt_cb)(void *ctxt))
 {
info->bits = bits;
info->ctxt_shift = ilog2(asid_per_ctxt);
info->flush_cpu_ctxt_cb = flush_cpu_ctxt_cb;
+   info->update_ctxt_cb = update_ctxt_cb;
/*
 * Expect allocation after rollover to fail if we don't have at least
 * one more ASID than CPUs. ASID #0 is always reserved.
diff --git a/arch/arm64/mm/context.c b/arch/arm64/mm/context.c
index 95ee7711a2ef..737b4bd7bbe7 100644
--- a/arch/arm64/mm/context.c
+++ b/arch/arm64/mm/context.c
@@ -82,7 +82,7 @@ void check_and_switch_context(struct mm_struct *mm, unsigned 
int cpu)
if (system_supports_cnp())
cpu_set_reserved_ttbr0();
 
-   asid_check_context(_info, >context.id, cpu);
+   asid_check_context(_info, >context.id, cpu, mm);
 
arm64_apply_bp_hardening();
 
@@ -108,12 +108,17 @@ static void asid_flush_cpu_ctxt(void)
local_flush_tlb_all();
 }
 
+static void asid_update_ctxt(void *ctxt)
+{
+   /* Nothing to do */
+}
+
 static int asids_init(void)
 {
u32 bits = get_cpu_asid_bits();
 
-   if (!asid_allocator_init(_info, bits, 

[PATCH RFC 04/14] arm64/mm: Move the variable lock and tlb_flush_pending to asid_info

2019-03-21 Thread Julien Grall
The variables lock and tlb_flush_pending holds information for a given
ASID allocator. So move them to the asid_info structure.

Signed-off-by: Julien Grall 
---
 arch/arm64/mm/context.c | 17 +
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/mm/context.c b/arch/arm64/mm/context.c
index da17ed6c7117..e98ab348b9cb 100644
--- a/arch/arm64/mm/context.c
+++ b/arch/arm64/mm/context.c
@@ -27,8 +27,6 @@
 #include 
 #include 
 
-static DEFINE_RAW_SPINLOCK(cpu_asid_lock);
-
 struct asid_info
 {
atomic64_t  generation;
@@ -36,6 +34,9 @@ struct asid_info
atomic64_t __percpu *active;
u64 __percpu*reserved;
u32 bits;
+   raw_spinlock_t  lock;
+   /* Which CPU requires context flush on next call */
+   cpumask_t   flush_pending;
 } asid_info;
 
 #define active_asid(info, cpu) *per_cpu_ptr((info)->active, cpu)
@@ -44,8 +45,6 @@ struct asid_info
 static DEFINE_PER_CPU(atomic64_t, active_asids);
 static DEFINE_PER_CPU(u64, reserved_asids);
 
-static cpumask_t tlb_flush_pending;
-
 #define ASID_MASK(info)(~GENMASK((info)->bits - 1, 0))
 #define ASID_FIRST_VERSION(info)   (1UL << ((info)->bits))
 
@@ -124,7 +123,7 @@ static void flush_context(struct asid_info *info)
 * Queue a TLB invalidation for each CPU to perform on next
 * context-switch
 */
-   cpumask_setall(_flush_pending);
+   cpumask_setall(>flush_pending);
 }
 
 static bool check_update_reserved_asid(struct asid_info *info, u64 asid,
@@ -233,7 +232,7 @@ void check_and_switch_context(struct mm_struct *mm, 
unsigned int cpu)
 old_active_asid, asid))
goto switch_mm_fastpath;
 
-   raw_spin_lock_irqsave(_asid_lock, flags);
+   raw_spin_lock_irqsave(>lock, flags);
/* Check that our ASID belongs to the current generation. */
asid = atomic64_read(>context.id);
if ((asid ^ atomic64_read(>generation)) >> info->bits) {
@@ -241,11 +240,11 @@ void check_and_switch_context(struct mm_struct *mm, 
unsigned int cpu)
atomic64_set(>context.id, asid);
}
 
-   if (cpumask_test_and_clear_cpu(cpu, _flush_pending))
+   if (cpumask_test_and_clear_cpu(cpu, >flush_pending))
local_flush_tlb_all();
 
atomic64_set(_asid(info, cpu), asid);
-   raw_spin_unlock_irqrestore(_asid_lock, flags);
+   raw_spin_unlock_irqrestore(>lock, flags);
 
 switch_mm_fastpath:
 
@@ -288,6 +287,8 @@ static int asids_init(void)
info->active = _asids;
info->reserved = _asids;
 
+   raw_spin_lock_init(>lock);
+
pr_info("ASID allocator initialised with %lu entries\n",
NUM_USER_ASIDS(info));
return 0;
-- 
2.11.0



[PATCH RFC 13/14] arm/kvm: Introduce a new VMID allocator

2019-03-21 Thread Julien Grall
A follow-up patch will replace the KVM VMID allocator with the arm64 ASID
allocator. It is not yet clear how the code can be shared between arm
and arm64, so this is a verbatim copy of arch/arm64/lib/asid.c.

Signed-off-by: Julien Grall 
---
 arch/arm/include/asm/kvm_asid.h |  81 +
 arch/arm/kvm/Makefile   |   1 +
 arch/arm/kvm/asid.c | 191 
 3 files changed, 273 insertions(+)
 create mode 100644 arch/arm/include/asm/kvm_asid.h
 create mode 100644 arch/arm/kvm/asid.c

diff --git a/arch/arm/include/asm/kvm_asid.h b/arch/arm/include/asm/kvm_asid.h
new file mode 100644
index ..f312a6d7543c
--- /dev/null
+++ b/arch/arm/include/asm/kvm_asid.h
@@ -0,0 +1,81 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __ARM_KVM_ASID_H__
+#define __ARM_KVM_ASID_H__
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+struct asid_info
+{
+   atomic64_t  generation;
+   unsigned long   *map;
+   atomic64_t __percpu *active;
+   u64 __percpu*reserved;
+   u32 bits;
+   /* Lock protecting the structure */
+   raw_spinlock_t  lock;
+   /* Which CPU requires context flush on next call */
+   cpumask_t   flush_pending;
+   /* Number of ASID allocated by context (shift value) */
+   unsigned intctxt_shift;
+   /* Callback to locally flush the context. */
+   void(*flush_cpu_ctxt_cb)(void);
+   /* Callback to call when a context is updated */
+   void(*update_ctxt_cb)(void *ctxt);
+};
+
+#define NUM_ASIDS(info)(1UL << ((info)->bits))
+#define NUM_CTXT_ASIDS(info)   (NUM_ASIDS(info) >> (info)->ctxt_shift)
+
+#define active_asid(info, cpu) *per_cpu_ptr((info)->active, cpu)
+
+void asid_new_context(struct asid_info *info, atomic64_t *pasid,
+ unsigned int cpu, void *ctxt);
+
+/*
+ * Check the ASID is still valid for the context. If not generate a new ASID.
+ *
+ * @pasid: Pointer to the current ASID batch
+ * @cpu: current CPU ID. Must have been acquired through get_cpu()
+ */
+static inline void asid_check_context(struct asid_info *info,
+  atomic64_t *pasid, unsigned int cpu,
+  void *ctxt)
+{
+   u64 asid, old_active_asid;
+
+   asid = atomic64_read(pasid);
+
+   /*
+* The memory ordering here is subtle.
+* If our active_asid is non-zero and the ASID matches the current
+* generation, then we update the active_asid entry with a relaxed
+* cmpxchg. Racing with a concurrent rollover means that either:
+*
+* - We get a zero back from the cmpxchg and end up waiting on the
+*   lock. Taking the lock synchronises with the rollover and so
+*   we are forced to see the updated generation.
+*
+* - We get a valid ASID back from the cmpxchg, which means the
+*   relaxed xchg in flush_context will treat us as reserved
+*   because atomic RmWs are totally ordered for a given location.
+*/
+   old_active_asid = atomic64_read(_asid(info, cpu));
+   if (old_active_asid &&
+   !((asid ^ atomic64_read(>generation)) >> info->bits) &&
+   atomic64_cmpxchg_relaxed(_asid(info, cpu),
+old_active_asid, asid))
+   return;
+
+   asid_new_context(info, pasid, cpu, ctxt);
+}
+
+int asid_allocator_init(struct asid_info *info,
+   u32 bits, unsigned int asid_per_ctxt,
+   void (*flush_cpu_ctxt_cb)(void),
+   void (*update_ctxt_cb)(void *ctxt));
+
+#endif /* __ARM_KVM_ASID_H__ */
diff --git a/arch/arm/kvm/Makefile b/arch/arm/kvm/Makefile
index 531e59f5be9c..35d2d4c67827 100644
--- a/arch/arm/kvm/Makefile
+++ b/arch/arm/kvm/Makefile
@@ -21,6 +21,7 @@ obj-$(CONFIG_KVM_ARM_HOST) += hyp/
 
 obj-y += kvm-arm.o init.o interrupts.o
 obj-y += handle_exit.o guest.o emulate.o reset.o
+obj-y += asid.o
 obj-y += coproc.o coproc_a15.o coproc_a7.o   vgic-v3-coproc.o
 obj-y += $(KVM)/arm/arm.o $(KVM)/arm/mmu.o $(KVM)/arm/mmio.o
 obj-y += $(KVM)/arm/psci.o $(KVM)/arm/perf.o
diff --git a/arch/arm/kvm/asid.c b/arch/arm/kvm/asid.c
new file mode 100644
index ..60a25270163a
--- /dev/null
+++ b/arch/arm/kvm/asid.c
@@ -0,0 +1,191 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Generic ASID allocator.
+ *
+ * Based on arch/arm/mm/context.c
+ *
+ * Copyright (C) 2002-2003 Deep Blue Solutions Ltd, all rights reserved.
+ * Copyright (C) 2012 ARM Ltd.
+ */
+
+#include 
+
+#include 
+
+#define reserved_asid(info, cpu) *per_cpu_ptr((info)->reserved, cpu)
+
+#define ASID_MASK(info)(~GENMASK((info)->bits - 1, 0))
+#define ASID_FIRST_VERSION(info)   (1UL << ((info)->bits))
+
+#define asid2idx(info, asid)   (((asid) 

[PATCH RFC 11/14] arm64: Move the ASID allocator code in a separate file

2019-03-21 Thread Julien Grall
We will want to re-use the ASID allocator in a separate context (e.g.
allocating VMIDs). So move the code to a new file.

The function asid_check_context() has been moved into the header as a static
inline function because we want to avoid adding a branch when checking if the
ASID is still valid.

Signed-off-by: Julien Grall 

---

This code will be used in the virt code for allocating VMIDs. I am not
entirely sure where to place it. lib/ could potentially be a good place, but I
am not entirely convinced the algorithm as it is could be used by other
architectures.

Looking at x86, it seems that it will not be possible to re-use it because
the number of PCIDs (aka ASIDs) could be smaller than the number of CPUs.
See commit 10af6235e0d327d42e1bad974385197817923dc1 "x86/mm:
Implement PCID based optimization: try to preserve old TLB entries using
PCID".
---
 arch/arm64/include/asm/asid.h |  77 ++
 arch/arm64/lib/Makefile   |   2 +
 arch/arm64/lib/asid.c | 185 +
 arch/arm64/mm/context.c   | 235 +-
 4 files changed, 267 insertions(+), 232 deletions(-)
 create mode 100644 arch/arm64/include/asm/asid.h
 create mode 100644 arch/arm64/lib/asid.c

diff --git a/arch/arm64/include/asm/asid.h b/arch/arm64/include/asm/asid.h
new file mode 100644
index ..bb62b587f37f
--- /dev/null
+++ b/arch/arm64/include/asm/asid.h
@@ -0,0 +1,77 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __ASM_ASM_ASID_H
+#define __ASM_ASM_ASID_H
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+struct asid_info
+{
+   atomic64_t  generation;
+   unsigned long   *map;
+   atomic64_t __percpu *active;
+   u64 __percpu*reserved;
+   u32 bits;
+   /* Lock protecting the structure */
+   raw_spinlock_t  lock;
+   /* Which CPU requires context flush on next call */
+   cpumask_t   flush_pending;
+   /* Number of ASID allocated by context (shift value) */
+   unsigned intctxt_shift;
+   /* Callback to locally flush the context. */
+   void(*flush_cpu_ctxt_cb)(void);
+};
+
+#define NUM_ASIDS(info)(1UL << ((info)->bits))
+#define NUM_CTXT_ASIDS(info)   (NUM_ASIDS(info) >> (info)->ctxt_shift)
+
+#define active_asid(info, cpu) *per_cpu_ptr((info)->active, cpu)
+
+void asid_new_context(struct asid_info *info, atomic64_t *pasid,
+ unsigned int cpu);
+
+/*
+ * Check the ASID is still valid for the context. If not generate a new ASID.
+ *
+ * @pasid: Pointer to the current ASID batch
+ * @cpu: current CPU ID. Must have been acquired through get_cpu()
+ */
+static inline void asid_check_context(struct asid_info *info,
+ atomic64_t *pasid, unsigned int cpu)
+{
+   u64 asid, old_active_asid;
+
+   asid = atomic64_read(pasid);
+
+   /*
+* The memory ordering here is subtle.
+* If our active_asid is non-zero and the ASID matches the current
+* generation, then we update the active_asid entry with a relaxed
+* cmpxchg. Racing with a concurrent rollover means that either:
+*
+* - We get a zero back from the cmpxchg and end up waiting on the
+*   lock. Taking the lock synchronises with the rollover and so
+*   we are forced to see the updated generation.
+*
+* - We get a valid ASID back from the cmpxchg, which means the
+*   relaxed xchg in flush_context will treat us as reserved
+*   because atomic RmWs are totally ordered for a given location.
+*/
+   old_active_asid = atomic64_read(_asid(info, cpu));
+   if (old_active_asid &&
+   !((asid ^ atomic64_read(>generation)) >> info->bits) &&
+   atomic64_cmpxchg_relaxed(_asid(info, cpu),
+old_active_asid, asid))
+   return;
+
+   asid_new_context(info, pasid, cpu);
+}
+
+int asid_allocator_init(struct asid_info *info,
+   u32 bits, unsigned int asid_per_ctxt,
+   void (*flush_cpu_ctxt_cb)(void));
+
+#endif
diff --git a/arch/arm64/lib/Makefile b/arch/arm64/lib/Makefile
index 5540a1638baf..720df5ee2aa2 100644
--- a/arch/arm64/lib/Makefile
+++ b/arch/arm64/lib/Makefile
@@ -5,6 +5,8 @@ lib-y   := clear_user.o delay.o copy_from_user.o
\
   memcmp.o strcmp.o strncmp.o strlen.o strnlen.o   \
   strchr.o strrchr.o tishift.o
 
+lib-y  += asid.o
+
 ifeq ($(CONFIG_KERNEL_MODE_NEON), y)
 obj-$(CONFIG_XOR_BLOCKS)   += xor-neon.o
 CFLAGS_REMOVE_xor-neon.o   += -mgeneral-regs-only
diff --git a/arch/arm64/lib/asid.c b/arch/arm64/lib/asid.c
new file mode 100644
index ..72b71bfb32be
--- /dev/null
+++ b/arch/arm64/lib/asid.c
@@ -0,0 +1,185 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * 

[PATCH RFC 06/14] arm64/mm: Store the number of asid allocated per context

2019-03-21 Thread Julien Grall
Currently the number of ASIDs allocated per context is determined at
compilation time. As the algorithm is becoming generic, the user may
want to instantiate the ASID allocator multiple times with a different
number of ASIDs allocated.

Add a field in asid_info to track the number of ASIDs allocated per context.
This is stored in terms of a shift amount to avoid division in the code.

This means the number of ASIDs allocated per context should be a power of
two.

At the same time, rename NUM_USER_ASIDS to NUM_CTXT_ASIDS to make the
name more generic.

Signed-off-by: Julien Grall 
---
 arch/arm64/mm/context.c | 31 +--
 1 file changed, 17 insertions(+), 14 deletions(-)

diff --git a/arch/arm64/mm/context.c b/arch/arm64/mm/context.c
index 488845c39c39..5a4c2b1aac71 100644
--- a/arch/arm64/mm/context.c
+++ b/arch/arm64/mm/context.c
@@ -37,6 +37,8 @@ struct asid_info
raw_spinlock_t  lock;
/* Which CPU requires context flush on next call */
cpumask_t   flush_pending;
+   /* Number of ASID allocated by context (shift value) */
+   unsigned intctxt_shift;
 } asid_info;
 
 #define active_asid(info, cpu) *per_cpu_ptr((info)->active, cpu)
@@ -49,15 +51,15 @@ static DEFINE_PER_CPU(u64, reserved_asids);
 #define ASID_FIRST_VERSION(info)   (1UL << ((info)->bits))
 
 #ifdef CONFIG_UNMAP_KERNEL_AT_EL0
-#define NUM_USER_ASIDS(info)   (ASID_FIRST_VERSION(info) >> 1)
-#define asid2idx(info, asid)   (((asid) & ~ASID_MASK(info)) >> 1)
-#define idx2asid(info, idx)(((idx) << 1) & ~ASID_MASK(info))
+#define ASID_PER_CONTEXT   2
 #else
-#define NUM_USER_ASIDS(info)   (ASID_FIRST_VERSION(info))
-#define asid2idx(info, asid)   ((asid) & ~ASID_MASK(info))
-#define idx2asid(info, idx)asid2idx(info, idx)
+#define ASID_PER_CONTEXT   1
 #endif
 
+#define NUM_CTXT_ASIDS(info)   (ASID_FIRST_VERSION(info) >> 
(info)->ctxt_shift)
+#define asid2idx(info, asid)   (((asid) & ~ASID_MASK(info)) >> 
(info)->ctxt_shift)
+#define idx2asid(info, idx)(((idx) << (info)->ctxt_shift) & 
~ASID_MASK(info))
+
 /* Get the ASIDBits supported by the current CPU */
 static u32 get_cpu_asid_bits(void)
 {
@@ -102,7 +104,7 @@ static void flush_context(struct asid_info *info)
u64 asid;
 
/* Update the list of reserved ASIDs and the ASID bitmap. */
-   bitmap_clear(info->map, 0, NUM_USER_ASIDS(info));
+   bitmap_clear(info->map, 0, NUM_CTXT_ASIDS(info));
 
for_each_possible_cpu(i) {
asid = atomic64_xchg_relaxed(_asid(info, i), 0);
@@ -182,8 +184,8 @@ static u64 new_context(struct asid_info *info, atomic64_t 
*pasid)
 * a reserved TTBR0 for the init_mm and we allocate ASIDs in even/odd
 * pairs.
 */
-   asid = find_next_zero_bit(info->map, NUM_USER_ASIDS(info), cur_idx);
-   if (asid != NUM_USER_ASIDS(info))
+   asid = find_next_zero_bit(info->map, NUM_CTXT_ASIDS(info), cur_idx);
+   if (asid != NUM_CTXT_ASIDS(info))
goto set_asid;
 
/* We're out of ASIDs, so increment the global generation count */
@@ -192,7 +194,7 @@ static u64 new_context(struct asid_info *info, atomic64_t 
*pasid)
flush_context(info);
 
/* We have more ASIDs than CPUs, so this will always succeed */
-   asid = find_next_zero_bit(info->map, NUM_USER_ASIDS(info), 1);
+   asid = find_next_zero_bit(info->map, NUM_CTXT_ASIDS(info), 1);
 
 set_asid:
__set_bit(asid, info->map);
@@ -272,17 +274,18 @@ static int asids_init(void)
struct asid_info *info = _info;
 
info->bits = get_cpu_asid_bits();
+   info->ctxt_shift = ilog2(ASID_PER_CONTEXT);
/*
 * Expect allocation after rollover to fail if we don't have at least
 * one more ASID than CPUs. ASID #0 is reserved for init_mm.
 */
-   WARN_ON(NUM_USER_ASIDS(info) - 1 <= num_possible_cpus());
+   WARN_ON(NUM_CTXT_ASIDS(info) - 1 <= num_possible_cpus());
atomic64_set(>generation, ASID_FIRST_VERSION(info));
-   info->map = kcalloc(BITS_TO_LONGS(NUM_USER_ASIDS(info)),
+   info->map = kcalloc(BITS_TO_LONGS(NUM_CTXT_ASIDS(info)),
sizeof(*info->map), GFP_KERNEL);
if (!info->map)
panic("Failed to allocate bitmap for %lu ASIDs\n",
- NUM_USER_ASIDS(info));
+ NUM_CTXT_ASIDS(info));
 
info->active = _asids;
info->reserved = _asids;
@@ -290,7 +293,7 @@ static int asids_init(void)
raw_spin_lock_init(>lock);
 
pr_info("ASID allocator initialised with %lu entries\n",
-   NUM_USER_ASIDS(info));
+   NUM_CTXT_ASIDS(info));
return 0;
 }
 early_initcall(asids_init);
-- 
2.11.0
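
Side note (not part of the patch): with CONFIG_UNMAP_KERNEL_AT_EL0,
ASID_PER_CONTEXT is 2, so ctxt_shift is 1 and each context consumes an
even/odd ASID pair. A standalone sketch of the asid2idx()/idx2asid()
mapping, assuming 16-bit ASIDs purely for illustration:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
	const unsigned int bits = 16;        /* example ASID width */
	const unsigned int ctxt_shift = 1;   /* ilog2(ASID_PER_CONTEXT) with kpti */
	const uint64_t asid_mask = ~((1ULL << bits) - 1);  /* ASID_MASK(info) */
	uint64_t idx;

	for (idx = 1; idx <= 3; idx++) {
		uint64_t asid = (idx << ctxt_shift) & ~asid_mask;  /* idx2asid() */
		uint64_t back = (asid & ~asid_mask) >> ctxt_shift; /* asid2idx() */

		/* Bitmap index 1 -> ASID 2 (ASID 3 is the paired odd one), ... */
		printf("idx=%llu -> asid=%llu -> idx=%llu\n",
		       (unsigned long long)idx, (unsigned long long)asid,
		       (unsigned long long)back);
	}
	return 0;
}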


[PATCH RFC 00/14] kvm/arm: Align the VMID allocation with the arm64 ASID one

2019-03-21 Thread Julien Grall
This patch series moves the ASID allocator out into a separate file in order
to re-use it for the VMID. The benefits are:
- CPUs are not forced to exit at a roll-over.
- Context invalidation is now per-CPU rather than
  broadcast.

There is no performance regression on the fast path for ASID allocation.
Actually, on the hackbench measurement (300 hackbench) it was 0.7% faster.

The measurement was made on a Seattle based SoC (8 CPUs), with the
number of VMID limited to 4-bit. The test involves running concurrently 40
guests with 2 vCPUs. Each guest will then execute hackbench 5 times
before exiting.

The performance differences between the current algorithm and the new one are:
- 2.5% fewer exits from the guest
- 22.4% more flushes, although they are now local rather than
broadcast
- 0.11% faster (just for the record)

The ASID allocator rework to make it generic has been divided in multiple
patches to make the review easier.

A branch with the patch based on 5.1-rc1 can be found:

http://xenbits.xen.org/gitweb/?p=people/julieng/linux-arm.git;a=shortlog;h=refs/heads/vmid-rework/rfc

Cheers,

Julien Grall (14):
  arm64/mm: Introduce asid_info structure and move
asid_generation/asid_map to it
  arm64/mm: Move active_asids and reserved_asids to asid_info
  arm64/mm: Move bits to asid_info
  arm64/mm: Move the variable lock and tlb_flush_pending to asid_info
  arm64/mm: Remove dependency on MM in new_context
  arm64/mm: Store the number of asid allocated per context
  arm64/mm: Introduce NUM_ASIDS
  arm64/mm: Split asid_inits in 2 parts
  arm64/mm: Split the function check_and_switch_context in 3 parts
  arm64/mm: Introduce a callback to flush the local context
  arm64: Move the ASID allocator code in a separate file
  arm64/lib: asid: Allow user to update the context under the lock
  arm/kvm: Introduce a new VMID allocator
  kvm/arm: Align the VMID allocation with the arm64 ASID one

 arch/arm/include/asm/kvm_asid.h   |  81 +++
 arch/arm/include/asm/kvm_asm.h|   2 +-
 arch/arm/include/asm/kvm_host.h   |   5 +-
 arch/arm/include/asm/kvm_hyp.h|   1 +
 arch/arm/kvm/Makefile |   1 +
 arch/arm/kvm/asid.c   | 191 +++
 arch/arm/kvm/hyp/tlb.c|   8 +-
 arch/arm64/include/asm/asid.h |  81 +++
 arch/arm64/include/asm/kvm_asid.h |   8 ++
 arch/arm64/include/asm/kvm_asm.h  |   2 +-
 arch/arm64/include/asm/kvm_host.h |   5 +-
 arch/arm64/kvm/hyp/tlb.c  |  10 +-
 arch/arm64/lib/Makefile   |   2 +
 arch/arm64/lib/asid.c | 191 +++
 arch/arm64/mm/context.c   | 205 ++
 virt/kvm/arm/arm.c| 112 +++--
 16 files changed, 638 insertions(+), 267 deletions(-)
 create mode 100644 arch/arm/include/asm/kvm_asid.h
 create mode 100644 arch/arm/kvm/asid.c
 create mode 100644 arch/arm64/include/asm/asid.h
 create mode 100644 arch/arm64/include/asm/kvm_asid.h
 create mode 100644 arch/arm64/lib/asid.c

-- 
2.11.0



[PATCH RFC 01/14] arm64/mm: Introduce asid_info structure and move asid_generation/asid_map to it

2019-03-21 Thread Julien Grall
In an attempt to make the ASID allocator generic, create a new structure
asid_info to store all the information necessary for the allocator.

For now, move the variables asid_generation and asid_map to the new structure
asid_info. Follow-up patches will move more variables.

Note that, to avoid more renaming afterwards, a local variable 'info' has been
created; it is a pointer to the ASID allocator structure.

Signed-off-by: Julien Grall 
---
 arch/arm64/mm/context.c | 46 ++
 1 file changed, 26 insertions(+), 20 deletions(-)

diff --git a/arch/arm64/mm/context.c b/arch/arm64/mm/context.c
index 1f0ea2facf24..34db54f1a39a 100644
--- a/arch/arm64/mm/context.c
+++ b/arch/arm64/mm/context.c
@@ -30,8 +30,11 @@
 static u32 asid_bits;
 static DEFINE_RAW_SPINLOCK(cpu_asid_lock);
 
-static atomic64_t asid_generation;
-static unsigned long *asid_map;
+struct asid_info
+{
+   atomic64_t  generation;
+   unsigned long   *map;
+} asid_info;
 
 static DEFINE_PER_CPU(atomic64_t, active_asids);
 static DEFINE_PER_CPU(u64, reserved_asids);
@@ -88,13 +91,13 @@ void verify_cpu_asid_bits(void)
}
 }
 
-static void flush_context(void)
+static void flush_context(struct asid_info *info)
 {
int i;
u64 asid;
 
/* Update the list of reserved ASIDs and the ASID bitmap. */
-   bitmap_clear(asid_map, 0, NUM_USER_ASIDS);
+   bitmap_clear(info->map, 0, NUM_USER_ASIDS);
 
for_each_possible_cpu(i) {
asid = atomic64_xchg_relaxed(_cpu(active_asids, i), 0);
@@ -107,7 +110,7 @@ static void flush_context(void)
 */
if (asid == 0)
asid = per_cpu(reserved_asids, i);
-   __set_bit(asid2idx(asid), asid_map);
+   __set_bit(asid2idx(asid), info->map);
per_cpu(reserved_asids, i) = asid;
}
 
@@ -142,11 +145,11 @@ static bool check_update_reserved_asid(u64 asid, u64 
newasid)
return hit;
 }
 
-static u64 new_context(struct mm_struct *mm)
+static u64 new_context(struct asid_info *info, struct mm_struct *mm)
 {
static u32 cur_idx = 1;
u64 asid = atomic64_read(>context.id);
-   u64 generation = atomic64_read(_generation);
+   u64 generation = atomic64_read(>generation);
 
if (asid != 0) {
u64 newasid = generation | (asid & ~ASID_MASK);
@@ -162,7 +165,7 @@ static u64 new_context(struct mm_struct *mm)
 * We had a valid ASID in a previous life, so try to re-use
 * it if possible.
 */
-   if (!__test_and_set_bit(asid2idx(asid), asid_map))
+   if (!__test_and_set_bit(asid2idx(asid), info->map))
return newasid;
}
 
@@ -173,20 +176,20 @@ static u64 new_context(struct mm_struct *mm)
 * a reserved TTBR0 for the init_mm and we allocate ASIDs in even/odd
 * pairs.
 */
-   asid = find_next_zero_bit(asid_map, NUM_USER_ASIDS, cur_idx);
+   asid = find_next_zero_bit(info->map, NUM_USER_ASIDS, cur_idx);
if (asid != NUM_USER_ASIDS)
goto set_asid;
 
/* We're out of ASIDs, so increment the global generation count */
generation = atomic64_add_return_relaxed(ASID_FIRST_VERSION,
-_generation);
-   flush_context();
+>generation);
+   flush_context(info);
 
/* We have more ASIDs than CPUs, so this will always succeed */
-   asid = find_next_zero_bit(asid_map, NUM_USER_ASIDS, 1);
+   asid = find_next_zero_bit(info->map, NUM_USER_ASIDS, 1);
 
 set_asid:
-   __set_bit(asid, asid_map);
+   __set_bit(asid, info->map);
cur_idx = asid;
return idx2asid(asid) | generation;
 }
@@ -195,6 +198,7 @@ void check_and_switch_context(struct mm_struct *mm, 
unsigned int cpu)
 {
unsigned long flags;
u64 asid, old_active_asid;
+   struct asid_info *info = _info;
 
if (system_supports_cnp())
cpu_set_reserved_ttbr0();
@@ -217,7 +221,7 @@ void check_and_switch_context(struct mm_struct *mm, 
unsigned int cpu)
 */
old_active_asid = atomic64_read(_cpu(active_asids, cpu));
if (old_active_asid &&
-   !((asid ^ atomic64_read(_generation)) >> asid_bits) &&
+   !((asid ^ atomic64_read(>generation)) >> asid_bits) &&
atomic64_cmpxchg_relaxed(_cpu(active_asids, cpu),
 old_active_asid, asid))
goto switch_mm_fastpath;
@@ -225,8 +229,8 @@ void check_and_switch_context(struct mm_struct *mm, 
unsigned int cpu)
raw_spin_lock_irqsave(_asid_lock, flags);
/* Check that our ASID belongs to the current generation. */
asid = atomic64_read(>context.id);
-   if ((asid ^ atomic64_read(_generation)) >> asid_bits) {
-   asid = new_context(mm);
+   if ((asid ^ 

Re: [PATCH v6 05/22] iommu: Introduce cache_invalidate API

2019-03-21 Thread Auger Eric
Hi jean, Jacob,

On 3/21/19 3:13 PM, Jean-Philippe Brucker wrote:
> On 21/03/2019 13:54, Auger Eric wrote:
>> Hi Jacob, Jean-Philippe,
>>
>> On 3/20/19 5:50 PM, Jean-Philippe Brucker wrote:
>>> On 20/03/2019 16:37, Jacob Pan wrote:
>>> [...]
> +struct iommu_inv_addr_info {
> +#define IOMMU_INV_ADDR_FLAGS_PASID   (1 << 0)
> +#define IOMMU_INV_ADDR_FLAGS_ARCHID  (1 << 1)
> +#define IOMMU_INV_ADDR_FLAGS_LEAF(1 << 2)
> + __u32   flags;
> + __u32   archid;
> + __u64   pasid;
> + __u64   addr;
> + __u64   granule_size;
> + __u64   nb_granules;
> +};
> +
> +/**
> + * First level/stage invalidation information
> + * @cache: bitfield that allows to select which caches to invalidate
> + * @granularity: defines the lowest granularity used for the
> invalidation:
> + * domain > pasid > addr
> + *
> + * Not all the combinations of cache/granularity make sense:
> + *
> + * type |   DEV_IOTLB   | IOTLB |  PASID|
> + * granularity   |   |   |
> cache |
> + * -+---+---+---+
> + * DOMAIN|   N/A |   Y   |
> Y |
> + * PASID |   Y   |   Y   |
> Y |
> + * ADDR  |   Y   |   Y   |
> N/A   |
> + */
> +struct iommu_cache_invalidate_info {
> +#define IOMMU_CACHE_INVALIDATE_INFO_VERSION_1 1
> + __u32   version;
> +/* IOMMU paging structure cache */
> +#define IOMMU_CACHE_INV_TYPE_IOTLB   (1 << 0) /* IOMMU IOTLB */
> +#define IOMMU_CACHE_INV_TYPE_DEV_IOTLB   (1 << 1) /* Device
> IOTLB */ +#define IOMMU_CACHE_INV_TYPE_PASID  (1 << 2) /* PASID
> cache */
 Just a clarification, this used to be an enum. You do intend to issue a
 single invalidation request on multiple cache types? Perhaps for
 virtio-IOMMU? I only see a single cache type in your patch #14. For VT-d
 we plan to issue one cache type at a time for now. So this format works
 for us.
>>>
>>> Yes for virtio-iommu I'd like as little overhead as possible, which
>>> means a single invalidation message to hit both IOTLB and ATC at once,
>>> and the ability to specify multiple pages with @nb_granules.
>> The original request/explanation from Jean-Philippe can be found here:
>> https://lkml.org/lkml/2019/1/28/1497
>>
>>>
 However, if multiple cache types are issued in a single invalidation.
 They must share a single granularity, not all combinations are valid.
 e.g. dev IOTLB does not support domain granularity. Just a reminder,
 not an issue. Driver could filter out invalid combinations.
>> Sure I will add a comment about this restriction.
>>>
>>> Agreed. Even the core could filter out invalid combinations based on the
>>> table above: IOTLB and domain granularity are N/A.
>> I don't get this sentence. What about vtd IOTLB domain-selective
>> invalidation:
> 
> My mistake: I meant dev-IOTLB and domain granularity are N/A

Ah OK, no worries.

How do we proceed further with those user APIs? Besides the comment to
be added above and the previous suggestion from Jean ("Invalidations by
@granularity use field ..."), have we reached a consensus now on:

- attach/detach_pasid_table
- cache_invalidate
- fault data and fault report API?

If not, please let me know.

Thanks

Eric


> 
> Thanks,
> Jean
> 
>> "
>> • IOTLB entries caching mappings associated with the specified domain-id
>> are invalidated.
>> • Paging-structure-cache entries caching mappings associated with the
>> specified domain-id are invalidated.
>> "
>>
>> Thanks
>>
>> Eric
>>
>>>
>>> Thanks,
>>> Jean
>>>

> + __u8cache;
> + __u8granularity;
> + __u8padding[2];
> + union {
> + __u64   pasid;
> + struct iommu_inv_addr_info addr_info;
> + };
> +};
> +
> +
>  #endif /* _UAPI_IOMMU_H */

 [Jacob Pan]
 ___
 iommu mailing list
 io...@lists.linux-foundation.org
 https://lists.linuxfoundation.org/mailman/listinfo/iommu

>>>
>> ___
>> iommu mailing list
>> io...@lists.linux-foundation.org
>> https://lists.linuxfoundation.org/mailman/listinfo/iommu
>>
> 


Re: [PATCH v6 05/22] iommu: Introduce cache_invalidate API

2019-03-21 Thread Jean-Philippe Brucker
On 21/03/2019 13:54, Auger Eric wrote:
> Hi Jacob, Jean-Philippe,
> 
> On 3/20/19 5:50 PM, Jean-Philippe Brucker wrote:
>> On 20/03/2019 16:37, Jacob Pan wrote:
>> [...]
 +struct iommu_inv_addr_info {
 +#define IOMMU_INV_ADDR_FLAGS_PASID(1 << 0)
 +#define IOMMU_INV_ADDR_FLAGS_ARCHID   (1 << 1)
 +#define IOMMU_INV_ADDR_FLAGS_LEAF (1 << 2)
 +  __u32   flags;
 +  __u32   archid;
 +  __u64   pasid;
 +  __u64   addr;
 +  __u64   granule_size;
 +  __u64   nb_granules;
 +};
 +
 +/**
 + * First level/stage invalidation information
 + * @cache: bitfield that allows to select which caches to invalidate
 + * @granularity: defines the lowest granularity used for the
 invalidation:
 + * domain > pasid > addr
 + *
 + * Not all the combinations of cache/granularity make sense:
 + *
 + * type |   DEV_IOTLB   | IOTLB |  PASID|
 + * granularity|   |   |
 cache  |
 + * -+---+---+---+
 + * DOMAIN |   N/A |   Y   |
 Y  |
 + * PASID  |   Y   |   Y   |
 Y  |
 + * ADDR   |   Y   |   Y   |
 N/A|
 + */
 +struct iommu_cache_invalidate_info {
 +#define IOMMU_CACHE_INVALIDATE_INFO_VERSION_1 1
 +  __u32   version;
 +/* IOMMU paging structure cache */
 +#define IOMMU_CACHE_INV_TYPE_IOTLB(1 << 0) /* IOMMU IOTLB */
 +#define IOMMU_CACHE_INV_TYPE_DEV_IOTLB(1 << 1) /* Device
 IOTLB */ +#define IOMMU_CACHE_INV_TYPE_PASID   (1 << 2) /* PASID
 cache */
>>> Just a clarification, this used to be an enum. You do intend to issue a
>>> single invalidation request on multiple cache types? Perhaps for
>>> virtio-IOMMU? I only see a single cache type in your patch #14. For VT-d
>>> we plan to issue one cache type at a time for now. So this format works
>>> for us.
>>
>> Yes for virtio-iommu I'd like as little overhead as possible, which
>> means a single invalidation message to hit both IOTLB and ATC at once,
>> and the ability to specify multiple pages with @nb_granules.
> The original request/explanation from Jean-Philippe can be found here:
> https://lkml.org/lkml/2019/1/28/1497
> 
>>
>>> However, if multiple cache types are issued in a single invalidation.
>>> They must share a single granularity, not all combinations are valid.
>>> e.g. dev IOTLB does not support domain granularity. Just a reminder,
>>> not an issue. Driver could filter out invalid combinations.
> Sure I will add a comment about this restriction.
>>
>> Agreed. Even the core could filter out invalid combinations based on the
>> table above: IOTLB and domain granularity are N/A.
> I don't get this sentence. What about vtd IOTLB domain-selective
> invalidation:

My mistake: I meant dev-IOTLB and domain granularity are N/A

Thanks,
Jean

> "
> • IOTLB entries caching mappings associated with the specified domain-id
> are invalidated.
> • Paging-structure-cache entries caching mappings associated with the
> specified domain-id are invalidated.
> "
> 
> Thanks
> 
> Eric
> 
>>
>> Thanks,
>> Jean
>>
>>>
 +  __u8cache;
 +  __u8granularity;
 +  __u8padding[2];
 +  union {
 +  __u64   pasid;
 +  struct iommu_inv_addr_info addr_info;
 +  };
 +};
 +
 +
  #endif /* _UAPI_IOMMU_H */
>>>
>>> [Jacob Pan]
>>> ___
>>> iommu mailing list
>>> io...@lists.linux-foundation.org
>>> https://lists.linuxfoundation.org/mailman/listinfo/iommu
>>>
>>
> ___
> iommu mailing list
> io...@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/iommu
> 
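
For what it's worth, a rough userspace-side sketch (not from the series) of
the "single message for both caches" case discussed above, using only the
fields quoted in this thread. The types are the userspace equivalents of the
__u* ones, and the exact enum value for the ADDR granularity is not visible
in this excerpt, so a placeholder constant is used:

#include <stdint.h>
#include <string.h>

struct iommu_inv_addr_info {
#define IOMMU_INV_ADDR_FLAGS_PASID	(1 << 0)
#define IOMMU_INV_ADDR_FLAGS_ARCHID	(1 << 1)
#define IOMMU_INV_ADDR_FLAGS_LEAF	(1 << 2)
	uint32_t flags;
	uint32_t archid;
	uint64_t pasid;
	uint64_t addr;
	uint64_t granule_size;
	uint64_t nb_granules;
};

struct iommu_cache_invalidate_info {
#define IOMMU_CACHE_INVALIDATE_INFO_VERSION_1 1
	uint32_t version;
#define IOMMU_CACHE_INV_TYPE_IOTLB	(1 << 0)
#define IOMMU_CACHE_INV_TYPE_DEV_IOTLB	(1 << 1)
#define IOMMU_CACHE_INV_TYPE_PASID	(1 << 2)
	uint8_t cache;
	uint8_t granularity;
	uint8_t padding[2];
	union {
		uint64_t pasid;
		struct iommu_inv_addr_info addr_info;
	};
};

#define GRANU_ADDR_PLACEHOLDER 2 /* stand-in for the ADDR granularity value */

int main(void)
{
	struct iommu_cache_invalidate_info inv;

	memset(&inv, 0, sizeof(inv));
	inv.version = IOMMU_CACHE_INVALIDATE_INFO_VERSION_1;
	/* Hit the IOTLB and the device IOTLB (ATC) with one request. */
	inv.cache = IOMMU_CACHE_INV_TYPE_IOTLB | IOMMU_CACHE_INV_TYPE_DEV_IOTLB;
	inv.granularity = GRANU_ADDR_PLACEHOLDER;
	inv.addr_info.flags = IOMMU_INV_ADDR_FLAGS_PASID;
	inv.addr_info.pasid = 1;
	inv.addr_info.addr = 0x100000;
	inv.addr_info.granule_size = 4096;
	inv.addr_info.nb_granules = 16;	/* 16 pages in a single message */
	/* The filled structure would then be passed down (e.g. via VFIO). */
	return 0;
}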



Re: [PATCH v6 05/22] iommu: Introduce cache_invalidate API

2019-03-21 Thread Auger Eric
Hi Jacob, Jean-Philippe,

On 3/20/19 5:50 PM, Jean-Philippe Brucker wrote:
> On 20/03/2019 16:37, Jacob Pan wrote:
> [...]
>>> +struct iommu_inv_addr_info {
>>> +#define IOMMU_INV_ADDR_FLAGS_PASID (1 << 0)
>>> +#define IOMMU_INV_ADDR_FLAGS_ARCHID(1 << 1)
>>> +#define IOMMU_INV_ADDR_FLAGS_LEAF  (1 << 2)
>>> +   __u32   flags;
>>> +   __u32   archid;
>>> +   __u64   pasid;
>>> +   __u64   addr;
>>> +   __u64   granule_size;
>>> +   __u64   nb_granules;
>>> +};
>>> +
>>> +/**
>>> + * First level/stage invalidation information
>>> + * @cache: bitfield that allows to select which caches to invalidate
>>> + * @granularity: defines the lowest granularity used for the
>>> invalidation:
>>> + * domain > pasid > addr
>>> + *
>>> + * Not all the combinations of cache/granularity make sense:
>>> + *
>>> + * type |   DEV_IOTLB   | IOTLB |  PASID|
>>> + * granularity |   |   |
>>> cache   |
>>> + * -+---+---+---+
>>> + * DOMAIN  |   N/A |   Y   |
>>> Y   |
>>> + * PASID   |   Y   |   Y   |
>>> Y   |
>>> + * ADDR|   Y   |   Y   |
>>> N/A |
>>> + */
>>> +struct iommu_cache_invalidate_info {
>>> +#define IOMMU_CACHE_INVALIDATE_INFO_VERSION_1 1
>>> +   __u32   version;
>>> +/* IOMMU paging structure cache */
>>> +#define IOMMU_CACHE_INV_TYPE_IOTLB (1 << 0) /* IOMMU IOTLB */
>>> +#define IOMMU_CACHE_INV_TYPE_DEV_IOTLB (1 << 1) /* Device
>>> IOTLB */ +#define IOMMU_CACHE_INV_TYPE_PASID(1 << 2) /* PASID
>>> cache */
>> Just a clarification, this used to be an enum. You do intend to issue a
>> single invalidation request on multiple cache types? Perhaps for
>> virtio-IOMMU? I only see a single cache type in your patch #14. For VT-d
>> we plan to issue one cache type at a time for now. So this format works
>> for us.
> 
> Yes for virtio-iommu I'd like as little overhead as possible, which
> means a single invalidation message to hit both IOTLB and ATC at once,
> and the ability to specify multiple pages with @nb_granules.
The original request/explanation from Jean-Philippe can be found here:
https://lkml.org/lkml/2019/1/28/1497

> 
>> However, if multiple cache types are issued in a single invalidation.
>> They must share a single granularity, not all combinations are valid.
>> e.g. dev IOTLB does not support domain granularity. Just a reminder,
>> not an issue. Driver could filter out invalid combinations.
Sure I will add a comment about this restriction.
> 
> Agreed. Even the core could filter out invalid combinations based on the
> table above: IOTLB and domain granularity are N/A.
I don't get this sentence. What about vtd IOTLB domain-selective
invalidation:
"
• IOTLB entries caching mappings associated with the specified domain-id
are invalidated.
• Paging-structure-cache entries caching mappings associated with the
specified domain-id are invalidated.
"

Thanks

Eric

> 
> Thanks,
> Jean
> 
>>
>>> +   __u8cache;
>>> +   __u8granularity;
>>> +   __u8padding[2];
>>> +   union {
>>> +   __u64   pasid;
>>> +   struct iommu_inv_addr_info addr_info;
>>> +   };
>>> +};
>>> +
>>> +
>>>  #endif /* _UAPI_IOMMU_H */
>>
>> [Jacob Pan]
>> ___
>> iommu mailing list
>> io...@lists.linux-foundation.org
>> https://lists.linuxfoundation.org/mailman/listinfo/iommu
>>
> 


Re: [PATCH v4 1/2] KVM: arm/arm64: Add save/restore support for firmware workaround state

2019-03-21 Thread Auger Eric
Hi Andre,

On 3/1/19 12:43 AM, Andre Przywara wrote:
> KVM implements the firmware interface for mitigating cache speculation
> vulnerabilities. Guests may use this interface to ensure mitigation is
> active.
> If we want to migrate such a guest to a host with a different support
> level for those workarounds, migration might need to fail, to ensure that
> critical guests don't lose their protection.
> 
> Introduce a way for userland to save and restore the workarounds state.
> On restoring we do checks that make sure we don't downgrade our
> mitigation level.
> 
> Signed-off-by: Andre Przywara 
> ---
>  arch/arm/include/asm/kvm_emulate.h   |  10 +++
>  arch/arm/include/uapi/asm/kvm.h  |  10 +++
>  arch/arm64/include/asm/kvm_emulate.h |  14 +++
>  arch/arm64/include/uapi/asm/kvm.h|   9 ++
>  virt/kvm/arm/psci.c  | 128 +++
>  5 files changed, 155 insertions(+), 16 deletions(-)
> 
> diff --git a/arch/arm/include/asm/kvm_emulate.h 
> b/arch/arm/include/asm/kvm_emulate.h
> index 8927cae7c966..663a02d7e6f4 100644
> --- a/arch/arm/include/asm/kvm_emulate.h
> +++ b/arch/arm/include/asm/kvm_emulate.h
> @@ -283,6 +283,16 @@ static inline unsigned long 
> kvm_vcpu_get_mpidr_aff(struct kvm_vcpu *vcpu)
>   return vcpu_cp15(vcpu, c0_MPIDR) & MPIDR_HWID_BITMASK;
>  }
>  
> +static inline bool kvm_arm_get_vcpu_workaround_2_flag(struct kvm_vcpu *vcpu)
> +{
> + return false;
> +}
> +
> +static inline void kvm_arm_set_vcpu_workaround_2_flag(struct kvm_vcpu *vcpu,
> +   bool flag)
> +{
> +}
> +
>  static inline void kvm_vcpu_set_be(struct kvm_vcpu *vcpu)
>  {
>   *vcpu_cpsr(vcpu) |= PSR_E_BIT;
> diff --git a/arch/arm/include/uapi/asm/kvm.h b/arch/arm/include/uapi/asm/kvm.h
> index 4602464ebdfb..ba4d2afe65e3 100644
> --- a/arch/arm/include/uapi/asm/kvm.h
> +++ b/arch/arm/include/uapi/asm/kvm.h
> @@ -214,6 +214,16 @@ struct kvm_vcpu_events {
>  #define KVM_REG_ARM_FW_REG(r)(KVM_REG_ARM | KVM_REG_SIZE_U64 
> | \
>KVM_REG_ARM_FW | ((r) & 0x))
>  #define KVM_REG_ARM_PSCI_VERSION KVM_REG_ARM_FW_REG(0)
> +#define KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1  KVM_REG_ARM_FW_REG(1)
> +#define KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1_NOT_AVAIL0
> +#define KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1_AVAIL1
> +#define KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1_UNAFFECTED   2
> +#define KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2  KVM_REG_ARM_FW_REG(2)
> +#define KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_NOT_AVAIL0
> +#define KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_UNKNOWN  1
Would be worth adding a comment saying that values are chosen so that
higher values mean better protection. Otherwise it looks strange
NOT_AVAIL/AVAIL/UNAFFECTED values are not the same for both workarounds.
> +#define KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_AVAIL2
> +#define KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_UNAFFECTED   3
> +#define KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_ENABLED  (1U << 4)

>  
>  /* Device Control API: ARM VGIC */
>  #define KVM_DEV_ARM_VGIC_GRP_ADDR0
> diff --git a/arch/arm64/include/asm/kvm_emulate.h 
> b/arch/arm64/include/asm/kvm_emulate.h
> index d3842791e1c4..c00c17c9adb6 100644
> --- a/arch/arm64/include/asm/kvm_emulate.h
> +++ b/arch/arm64/include/asm/kvm_emulate.h
> @@ -348,6 +348,20 @@ static inline unsigned long 
> kvm_vcpu_get_mpidr_aff(struct kvm_vcpu *vcpu)
>   return vcpu_read_sys_reg(vcpu, MPIDR_EL1) & MPIDR_HWID_BITMASK;
>  }
>  
> +static inline bool kvm_arm_get_vcpu_workaround_2_flag(struct kvm_vcpu *vcpu)
> +{
> + return vcpu->arch.workaround_flags & VCPU_WORKAROUND_2_FLAG;
> +}
> +
> +static inline void kvm_arm_set_vcpu_workaround_2_flag(struct kvm_vcpu *vcpu,
> +   bool flag)
> +{
> + if (flag)
> + vcpu->arch.workaround_flags |= VCPU_WORKAROUND_2_FLAG;
> + else
> + vcpu->arch.workaround_flags &= ~VCPU_WORKAROUND_2_FLAG;
> +}
> +
>  static inline void kvm_vcpu_set_be(struct kvm_vcpu *vcpu)
>  {
>   if (vcpu_mode_is_32bit(vcpu)) {
> diff --git a/arch/arm64/include/uapi/asm/kvm.h 
> b/arch/arm64/include/uapi/asm/kvm.h
> index 97c3478ee6e7..367e96fe654e 100644
> --- a/arch/arm64/include/uapi/asm/kvm.h
> +++ b/arch/arm64/include/uapi/asm/kvm.h
> @@ -225,6 +225,15 @@ struct kvm_vcpu_events {
>  #define KVM_REG_ARM_FW_REG(r)(KVM_REG_ARM64 | 
> KVM_REG_SIZE_U64 | \
>KVM_REG_ARM_FW | ((r) & 0x))
>  #define KVM_REG_ARM_PSCI_VERSION KVM_REG_ARM_FW_REG(0)
> +#define KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1  KVM_REG_ARM_FW_REG(1)
> +#define KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1_NOT_AVAIL0
> +#define KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1_AVAIL1
> +#define KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2  KVM_REG_ARM_FW_REG(2)
> +#define KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_NOT_AVAIL0
> +#define 

Re: [PATCH v4 2/2] KVM: doc: add API documentation on the KVM_REG_ARM_WORKAROUNDS register

2019-03-21 Thread Auger Eric
Hi Andre,

On 3/1/19 12:43 AM, Andre Przywara wrote:
> Add documentation for the newly defined firmware registers to save and
> restore any vulnerability migitation status.
s/migitation/mitigation
> 
> Signed-off-by: Andre Przywara 
> ---
>  Documentation/virtual/kvm/arm/psci.txt | 25 +
>  1 file changed, 25 insertions(+)
> 
> diff --git a/Documentation/virtual/kvm/arm/psci.txt 
> b/Documentation/virtual/kvm/arm/psci.txt
> index aafdab887b04..1ed0f0515cd8 100644
> --- a/Documentation/virtual/kvm/arm/psci.txt
> +++ b/Documentation/virtual/kvm/arm/psci.txt
> @@ -28,3 +28,28 @@ The following register is defined:
>- Allows any PSCI version implemented by KVM and compatible with
>  v0.2 to be set with SET_ONE_REG
>- Affects the whole VM (even if the register view is per-vcpu)
> +
> +* KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1:
> +  Holds the state of the firmware controlled workaround to mitigate
> +  CVE-2017-5715, as described under SMCCC_ARCH_WORKAROUND_1 in [1].> +  
> Accepted values are:
> +KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1_NOT_AVAIL: Workaround not available.
> +  The mitigation status for the guest is unknown.
> +KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1_AVAIL: The workaround HVC call is
> +  available to the guest and required for the mitigation.
It is unclear to me why we don't report the actual enable status too as
for WA2. Assuming the mitigation is not applied on the source (ie. AVAIL
but SMCCC_ARCH_WORKAROUND_1 not called), do we need it on the destination?
> +KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1_UNAFFECTED: The workaround HVC call
> +  is available to the guest, but it is not needed on this VCPU.
> +
> +* KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2:
> +  Holds the state of the firmware controlled workaround to mitigate
> +  CVE-2018-3639, as described under SMCCC_ARCH_WORKAROUND_2 in [1].
This does not really match the terminology used in [1]. At least it is
not straightforward to me.
> +  Accepted values are:
> +KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_NOT_AVAIL: Workaround not available.
> +KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_UNKNOWN: Workaround state unknown.
Both are included in NOT_SUPPORTED. By the way, why do we need to be more
precise here? It seems WA1 also has the UNKNOWN state as part of
NOT_SUPPORTED. Why isn't it homogeneous?
> +KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_AVAIL: Workaround available, and can
> +  be disabled by a vCPU. If KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_ENABLED is
> +  set, it is active for this vCPU.
> +KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_UNAFFECTED: Workaround always active
NOT_REQUIRED or 1?
> +  or not needed.
> +
> +[1] https://developer.arm.com/-/media/developer/pdf/ARM_DEN_0070A_Firmware_interfaces_for_mitigating_CVE-2017-5715.pdf
> 
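
For reference, saving and restoring this state across a migration comes down
to the generic ONE_REG interface on each vcpu fd. A minimal userspace sketch
(illustrative only: error handling is trimmed and the helper names are made
up, but KVM_GET_ONE_REG/KVM_SET_ONE_REG and the register ID come from the
uapi header quoted earlier in this series):

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Read the WORKAROUND_1 firmware register on the source host. */
static int wa1_save(int vcpu_fd, uint64_t *val)
{
	struct kvm_one_reg reg = {
		.id   = KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1,
		.addr = (uintptr_t)val,
	};

	return ioctl(vcpu_fd, KVM_GET_ONE_REG, &reg);
}

/* Write it back on the destination before the guest first runs. */
static int wa1_restore(int vcpu_fd, uint64_t val)
{
	struct kvm_one_reg reg = {
		.id   = KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1,
		.addr = (uintptr_t)&val,
	};

	return ioctl(vcpu_fd, KVM_SET_ONE_REG, &reg);
}

If the destination cannot honour the value, the SET is expected to fail,
which is what lets an incompatible migration be refused.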
Thanks

Eric
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH -next] KVM: arm: vgic: Make two functions static

2019-03-21 Thread Mukesh Ojha



On 3/20/2019 7:48 PM, Yue Haibing wrote:

From: YueHaibing 

Fix sparse warnings:

arch/arm64/kvm/../../../virt/kvm/arm/vgic/vgic-its.c:1732:5: warning:
  symbol 'vgic_its_has_attr_regs' was not declared. Should it be static?
arch/arm64/kvm/../../../virt/kvm/arm/vgic/vgic-its.c:1753:5: warning:
  symbol 'vgic_its_attr_regs_access' was not declared. Should it be static?

Signed-off-by: YueHaibing 
---
  virt/kvm/arm/vgic/vgic-its.c | 10 +-
  1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/virt/kvm/arm/vgic/vgic-its.c b/virt/kvm/arm/vgic/vgic-its.c
index ab3f477..a97db69 100644
--- a/virt/kvm/arm/vgic/vgic-its.c
+++ b/virt/kvm/arm/vgic/vgic-its.c
@@ -1729,8 +1729,8 @@ static void vgic_its_destroy(struct kvm_device *kvm_dev)
kfree(its);
  }
  
-int vgic_its_has_attr_regs(struct kvm_device *dev,
-			   struct kvm_device_attr *attr)
+static int vgic_its_has_attr_regs(struct kvm_device *dev,
+				  struct kvm_device_attr *attr)
  {
const struct vgic_register_region *region;
gpa_t offset = attr->attr;
@@ -1750,9 +1750,9 @@ int vgic_its_has_attr_regs(struct kvm_device *dev,
return 0;
  }
  
-int vgic_its_attr_regs_access(struct kvm_device *dev,
-			      struct kvm_device_attr *attr,
-			      u64 *reg, bool is_write)
+static int vgic_its_attr_regs_access(struct kvm_device *dev,
+				     struct kvm_device_attr *attr,
+				     u64 *reg, bool is_write)
  {
const struct vgic_register_region *region;
struct vgic_its *its;



Acked-by: Mukesh Ojha 

Cheers,
Mukesh

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v7 7/10] KVM: arm/arm64: context-switch ptrauth registers

2019-03-21 Thread Julien Thierry


On 21/03/2019 06:08, Amit Daniel Kachhap wrote:
> Hi Julien,
> 
> On 3/20/19 5:43 PM, Julien Thierry wrote:
>> Hi Amit,
>>
>> On 19/03/2019 08:30, Amit Daniel Kachhap wrote:
>>> From: Mark Rutland 
>>>
>>> When pointer authentication is supported, a guest may wish to use it.
>>> This patch adds the necessary KVM infrastructure for this to work, with
>>> a semi-lazy context switch of the pointer auth state.
>>>
>>> Pointer authentication feature is only enabled when VHE is built
>>> in the kernel and present in the CPU implementation so only VHE code
>>> paths are modified.
>>>
>>> When we schedule a vcpu, we disable guest usage of pointer
>>> authentication instructions and accesses to the keys. While these are
>>> disabled, we avoid context-switching the keys. When we trap the guest
>>> trying to use pointer authentication functionality, we change to eagerly
>>> context-switching the keys, and enable the feature. The next time the
>>> vcpu is scheduled out/in, we start again. However the host key save is
>>> optimized and implemented inside ptrauth instruction/register access
>>> trap.
>>>
>>> Pointer authentication consists of address authentication and generic
>>> authentication, and CPUs in a system might have varied support for
>>> either. Where support for either feature is not uniform, it is hidden
>>> from guests via ID register emulation, as a result of the cpufeature
>>> framework in the host.
>>>
>>> Unfortunately, address authentication and generic authentication cannot
>>> be trapped separately, as the architecture provides a single EL2 trap
>>> covering both. If we wish to expose one without the other, we cannot
>>> prevent a (badly-written) guest from intermittently using a feature
>>> which is not uniformly supported (when scheduled on a physical CPU which
>>> supports the relevant feature). Hence, this patch expects both types of
>>> authentication to be present in a CPU.
>>>
>>> This switch of keys is done from guest enter/exit assembly as preparation
>>> for the upcoming in-kernel pointer authentication support. Hence, these
>>> key switching routines are not implemented in C code, as they may cause
>>> pointer authentication key signing errors in some situations.
>>>
>>> Signed-off-by: Mark Rutland 
>>> [Only VHE, key switch in full assembly, vcpu_has_ptrauth checks
>>> , save host key in ptrauth exception trap]
>>> Signed-off-by: Amit Daniel Kachhap 
>>> Reviewed-by: Julien Thierry 
>>> Cc: Marc Zyngier 
>>> Cc: Christoffer Dall 
>>> Cc: kvmarm@lists.cs.columbia.edu
>>> ---
>>>   arch/arm64/include/asm/kvm_host.h    |  17 ++
>>>   arch/arm64/include/asm/kvm_ptrauth_asm.h | 100 +++
>>>   arch/arm64/kernel/asm-offsets.c  |   6 ++
>>>   arch/arm64/kvm/guest.c   |  14 +
>>>   arch/arm64/kvm/handle_exit.c |  24 +---
>>>   arch/arm64/kvm/hyp/entry.S   |   7 +++
>>>   arch/arm64/kvm/reset.c   |   7 +++
>>>   arch/arm64/kvm/sys_regs.c    |  46 +-
>>>   virt/kvm/arm/arm.c   |   2 +
>>>   9 files changed, 212 insertions(+), 11 deletions(-)
>>>   create mode 100644 arch/arm64/include/asm/kvm_ptrauth_asm.h
>>>
> [...]
>>> +#ifdef    CONFIG_ARM64_PTR_AUTH
>>> +
>>> +#define PTRAUTH_REG_OFFSET(x)    (x - CPU_APIAKEYLO_EL1)
>>
>> I don't really see the point of this macro. You move the pointers of
>> kvm_cpu_contexts to point to where the ptr auth registers are (which is
>> in the middle of an array) by adding the offset of APIAKEYLO and then we
>> have to recompute all offsets with this macro.
>>
>> Why not just pass the kvm_cpu_context pointers to
>> ptrauth_save/restore_state and use the already defined offsets
>> (CPU_AP*_EL1) directly?
>>
>> I think this would also allow to use one less register for the
>> ptrauth_switch_to_* macros.
> Actually the values of CPU_AP*_EL1 exceed the immediate range
> (i.e. 512), so this was done to keep the immediate offset within the range.
> The other way would have been to compute the destination address into a
> register, but that would add one more add instruction everywhere.
> I should have mentioned this in a comment somewhere.

Oh, I see. Yes, it would definitely be worth a comment.
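
Something along these lines might do, assuming the LDP/STP scaled 7-bit
immediate (roughly +/-512 bytes for 64-bit registers) is indeed the limit
being hit here:

/*
 * The CPU_AP*_EL1 byte offsets into struct kvm_cpu_context are larger
 * than the signed, scaled 7-bit immediate accepted by LDP/STP, so rebase
 * on CPU_APIAKEYLO_EL1 and express all key offsets relative to it.
 */
#define PTRAUTH_REG_OFFSET(x)	(x - CPU_APIAKEYLO_EL1)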

Thanks,

-- 
Julien Thierry
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v7 9/10] KVM: arm64: docs: document KVM support of pointer authentication

2019-03-21 Thread Amit Daniel Kachhap

Hi Julien/Kristina,

On 3/21/19 2:26 AM, Kristina Martsenko wrote:

On 20/03/2019 18:06, Julien Thierry wrote:



On 20/03/2019 15:04, Kristina Martsenko wrote:

On 20/03/2019 13:37, Julien Thierry wrote:

Hi Amit,

On 19/03/2019 08:30, Amit Daniel Kachhap wrote:

This adds sections for KVM API extension for pointer authentication.
A brief description about usage of pointer authentication for KVM guests
is added in the arm64 documentations.


[...]


diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 7de9eee..b5c66bc 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -2659,6 +2659,12 @@ Possible features:
  Depends on KVM_CAP_ARM_PSCI_0_2.
- KVM_ARM_VCPU_PMU_V3: Emulate PMUv3 for the CPU.
  Depends on KVM_CAP_ARM_PMU_V3.
+   - KVM_ARM_VCPU_PTRAUTH_ADDRESS:
+   - KVM_ARM_VCPU_PTRAUTH_GENERIC:
+ Enables Pointer authentication for the CPU.
+ Depends on KVM_CAP_ARM_PTRAUTH and only on arm64 architecture. If
+ set, then the KVM guest allows the execution of pointer authentication
+ instructions. Otherwise, KVM treats these instructions as undefined.
  


Overall I feel one could easily get confused as to whether
PTRAUTH_ADDRESS/GENERIC are two individual features, whether one is a
superset of the other, whether the names are just aliases of one another, etc.

I think the doc should at least stress that *both* flags are
required to enable ptrauth in a guest. However, it raises the question:
if we don't plan to support the features individually (because we
can't), should we really expose two feature flags? It seems odd to
introduce two flags that only do something if used together...


Why can't we support the features individually? For example, if we ever
get a system where all CPUs support address authentication and none of
them support generic authentication, then we could still support address
authentication in the guest.




That's a good point, I didn't think of that.

Although, currently we don't have a way to detect that we are in such a
configuration. So as is, both flags are required to enable either
feature, and I feel the documentation should be clear on that aspect.


For now we only support enabling both features together, so both flags
need to be set. I agree that the documentation should be made clear on this.

In the future, if we need to, we can add "negative" cpucaps to detect
that a feature is absent on all CPUs.



Another option would be to introduce a flag that enables both for now,
and if one day we decide to support the configuration you mentioned we
could add "more modular" flags that allow you to control those features
individually. While a bit cumbersome, I would find that less awkward
than having two flags that only do something if both are present.


That would work too.

I find it more logical to have two flags since there are two features
(two ID register fields), and KVM enables two features for the guest.
The fact that KVM does not currently support enabling them separately is
a KVM implementation choice, and could change in the future.
Kristina, this comment of yours is actually what this patch series is
trying to do. I should have added more details about the necessity of
keeping two flags; enhancing them individually is actually future work.
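
For illustration, requesting the feature from a VMM would then look roughly
like this (a sketch only: the KVM_CAP_ARM_PTRAUTH check and most error
handling are omitted, and the two flag names are the ones introduced by this
series):

#include <sys/ioctl.h>
#include <linux/kvm.h>

static int vcpu_init_with_ptrauth(int vm_fd, int vcpu_fd)
{
	struct kvm_vcpu_init init;
	int ret;

	/* Ask KVM for the preferred target for this host. */
	ret = ioctl(vm_fd, KVM_ARM_PREFERRED_TARGET, &init);
	if (ret)
		return ret;

	/* With the current proposal, both flags must be set together. */
	init.features[0] |= 1U << KVM_ARM_VCPU_PTRAUTH_ADDRESS;
	init.features[0] |= 1U << KVM_ARM_VCPU_PTRAUTH_GENERIC;

	return ioctl(vcpu_fd, KVM_ARM_VCPU_INIT, &init);
}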


Thanks,
Amit Daniel


Thanks,
Kristina


___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v7 7/10] KVM: arm/arm64: context-switch ptrauth registers

2019-03-21 Thread Amit Daniel Kachhap

Hi Julien,

On 3/20/19 5:43 PM, Julien Thierry wrote:

Hi Amit,

On 19/03/2019 08:30, Amit Daniel Kachhap wrote:

From: Mark Rutland 

When pointer authentication is supported, a guest may wish to use it.
This patch adds the necessary KVM infrastructure for this to work, with
a semi-lazy context switch of the pointer auth state.

Pointer authentication feature is only enabled when VHE is built
in the kernel and present in the CPU implementation so only VHE code
paths are modified.

When we schedule a vcpu, we disable guest usage of pointer
authentication instructions and accesses to the keys. While these are
disabled, we avoid context-switching the keys. When we trap the guest
trying to use pointer authentication functionality, we change to eagerly
context-switching the keys, and enable the feature. The next time the
vcpu is scheduled out/in, we start again. However the host key save is
optimized and implemented inside ptrauth instruction/register access
trap.

Pointer authentication consists of address authentication and generic
authentication, and CPUs in a system might have varied support for
either. Where support for either feature is not uniform, it is hidden
from guests via ID register emulation, as a result of the cpufeature
framework in the host.

Unfortunately, address authentication and generic authentication cannot
be trapped separately, as the architecture provides a single EL2 trap
covering both. If we wish to expose one without the other, we cannot
prevent a (badly-written) guest from intermittently using a feature
which is not uniformly supported (when scheduled on a physical CPU which
supports the relevant feature). Hence, this patch expects both types of
authentication to be present in a CPU.

This switch of keys is done from guest enter/exit assembly as preparation
for the upcoming in-kernel pointer authentication support. Hence, these
key switching routines are not implemented in C code, as they may cause
pointer authentication key signing errors in some situations.

Signed-off-by: Mark Rutland 
[Only VHE, key switch in full assembly, vcpu_has_ptrauth checks
, save host key in ptrauth exception trap]
Signed-off-by: Amit Daniel Kachhap 
Reviewed-by: Julien Thierry 
Cc: Marc Zyngier 
Cc: Christoffer Dall 
Cc: kvmarm@lists.cs.columbia.edu
---
  arch/arm64/include/asm/kvm_host.h|  17 ++
  arch/arm64/include/asm/kvm_ptrauth_asm.h | 100 +++
  arch/arm64/kernel/asm-offsets.c  |   6 ++
  arch/arm64/kvm/guest.c   |  14 +
  arch/arm64/kvm/handle_exit.c |  24 +---
  arch/arm64/kvm/hyp/entry.S   |   7 +++
  arch/arm64/kvm/reset.c   |   7 +++
  arch/arm64/kvm/sys_regs.c|  46 +-
  virt/kvm/arm/arm.c   |   2 +
  9 files changed, 212 insertions(+), 11 deletions(-)
  create mode 100644 arch/arm64/include/asm/kvm_ptrauth_asm.h
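
As a side note, the semi-lazy flow described in the commit message can be
summarised in C roughly as below. This is only a sketch: the helper names are
illustrative, and the actual key save/restore lives in the assembly macros
discussed further down, not in C, for the reason given above.

/* On vcpu load: trap guest ptrauth so the first use faults to EL2. */
static void vcpu_ptrauth_disable(struct kvm_vcpu *vcpu)
{
	vcpu->arch.hcr_el2 &= ~(HCR_API | HCR_APK);
}

/* Stop trapping once the keys have been switched in. */
static void vcpu_ptrauth_enable(struct kvm_vcpu *vcpu)
{
	vcpu->arch.hcr_el2 |= (HCR_API | HCR_APK);
}

/* Handler for the ptrauth instruction/register access trap. */
static void kvm_arm_vcpu_ptrauth_trap(struct kvm_vcpu *vcpu)
{
	if (vcpu_has_ptrauth(vcpu)) {
		/*
		 * Hypothetical helper standing in for the assembly
		 * ptrauth_save_state/ptrauth_restore_state sequence:
		 * save the host keys, load the guest keys.
		 */
		kvm_arm_vcpu_ptrauth_switch_keys(vcpu);
		vcpu_ptrauth_enable(vcpu);
	}
	/* Otherwise KVM injects an UNDEF into the guest (not shown). */
}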


[...]

+#ifdef CONFIG_ARM64_PTR_AUTH
+
+#define PTRAUTH_REG_OFFSET(x)  (x - CPU_APIAKEYLO_EL1)


I don't really see the point of this macro. You move the pointers of
kvm_cpu_contexts to point to where the ptr auth registers are (which is
in the middle of an array) by adding the offset of APIAKEYLO and then we
have to recompute all offsets with this macro.

Why not just pass the kvm_cpu_context pointers to
ptrauth_save/restore_state and use the already defined offsets
(CPU_AP*_EL1) directly?

I think this would also allow to use one less register for the
ptrauth_switch_to_* macros.
Actually the values of CPU_AP*_EL1 exceed the immediate range
(i.e. 512), so this was done to keep the immediate offset within the range.
The other way would have been to compute the destination address into a
register, but that would add one more add instruction everywhere.

I should have mentioned this in a comment somewhere.



+
+.macro ptrauth_save_state base, reg1, reg2
+   mrs_s   \reg1, SYS_APIAKEYLO_EL1
+   mrs_s   \reg2, SYS_APIAKEYHI_EL1
+   stp \reg1, \reg2, [\base, #PTRAUTH_REG_OFFSET(CPU_APIAKEYLO_EL1)]
+   mrs_s   \reg1, SYS_APIBKEYLO_EL1
+   mrs_s   \reg2, SYS_APIBKEYHI_EL1
+   stp \reg1, \reg2, [\base, #PTRAUTH_REG_OFFSET(CPU_APIBKEYLO_EL1)]
+   mrs_s   \reg1, SYS_APDAKEYLO_EL1
+   mrs_s   \reg2, SYS_APDAKEYHI_EL1
+   stp \reg1, \reg2, [\base, #PTRAUTH_REG_OFFSET(CPU_APDAKEYLO_EL1)]
+   mrs_s   \reg1, SYS_APDBKEYLO_EL1
+   mrs_s   \reg2, SYS_APDBKEYHI_EL1
+   stp \reg1, \reg2, [\base, #PTRAUTH_REG_OFFSET(CPU_APDBKEYLO_EL1)]
+   mrs_s   \reg1, SYS_APGAKEYLO_EL1
+   mrs_s   \reg2, SYS_APGAKEYHI_EL1
+   stp \reg1, \reg2, [\base, #PTRAUTH_REG_OFFSET(CPU_APGAKEYLO_EL1)]
+.endm
+
+.macro ptrauth_restore_state base, reg1, reg2
+   ldp \reg1, \reg2, [\base, #PTRAUTH_REG_OFFSET(CPU_APIAKEYLO_EL1)]
+   msr_s   SYS_APIAKEYLO_EL1, \reg1
+   msr_s   SYS_APIAKEYHI_EL1, \reg2
+   ldp \reg1, \reg2, [\base, #PTRAUTH_REG_OFFSET(CPU_APIBKEYLO_EL1)]
+