Re: [PATCH v3 03/16] iommu: introduce iommu invalidate API function

2017-12-15 Thread Jean-Philippe Brucker
A quick update on invalidations before I leave for holidays, since we're
struggling to define useful semantics. I worked on the virtio-iommu
prototype for vSVA, so I tried to pin down what I think is needed for vSVA
invalidation in the host. I don't know whether the VT-d and AMD emulations
can translate all of this from guest commands.

Scope selects which entries are invalidated, and flags cherry-pick what
caches to invalidate. For example a guest might remove GBs of sparse
mappings, and decide that it would be quicker to invalidate the whole
context instead of one at a time. Then it would set only flags = (TLB |
DEV_TLB) with scope = PASID. If the guest clears one entry in the PASID
table, then it would send scope = PASID and flags = (LEAF | CONFIG | TLB |
DEV_TLB). On an ARM system the guest can invalidate TLBs with CPU
instructions, but can't invalidate ATCs. So it would send an invalidate
with flags = (LEAF | TLB) and scope = VA.

enum iommu_sva_inval_scope {
IOMMU_INVALIDATE_DOMAIN = 1,
IOMMU_INVALIDATE_PASID,
IOMMU_INVALIDATE_VA,
};

/* Only invalidate leaf entry. Applies to PASID table if scope == PASID or
 * page tables if scope == VA. */
#define IOMMU_INVALIDATE_LEAF   (1 << 0)
/* Invalidate cached PASID table configuration */
#define IOMMU_INVALIDATE_CONFIG (1 << 1)
/* Invalidate IOTLBs */
#define IOMMU_INVALIDATE_TLB(1 << 2)
/* Invalidate ATCs */
#define IOMMU_INVALIDATE_DEV_TLB(1 << 3)
/* + Need a global flag? */

struct iommu_sva_invalidate {
enum iommu_sva_inval_scope  scope;
u32 flags;
u32 pasid;
u64 iova;
u64 size;
/* Arch-specific, format is determined at bind time */
union {
struct {
u16 asid;
u8  granule;
} arm;
}
};

ARM needs two more fields. A 16-bit @asid (Address Space ID) targets TLB
entries and may be different from the PASID (up to the guest to decide),
which targets ATC and config entries.

@granule is the TLB granule that we're invalidating. For instance if the
guest just unmapped a few 2M huge pages, it sets @granule to 21 bits, so
we issue less invalidation commands, since we only need to evict huge TLB
entries. I'm not sure about other architecture but I'd be surprised if
this wasn't more common. Should we move it to the common part?


int iommu_sva_invalidate(struct iommu_domain *domain,
 struct iommu_sva_invalidate *inval);

And so the host driver implementation is roughly:
--
bool leaf   = flags & IOMMU_INVALIDATE_LEAF;
bool config = flags & IOMMU_INVALIDATE_CONFIG;
bool tlb= flags & IOMMU_INVALIDATE_TLB;
bool atc= flags & IOMMU_INVALIDATE_DEV_TLB;

if (config) {
switch (scope) {
case IOMMU_INVALIDATE_PASID:
inval_cached_pasid_entry(domain, pasid, leaf);
break;
case IOMMU_INVALIDATE_DOMAIN:
inval_all_cached_pasid_entries(domain);
break;
default:
return -EINVAL;
}

/* Wait for caches to be clean, then invalidate TLBs */
sync_commands();
}

if (tlb) {
switch (scope) {
case IOMMU_INVALIDATE_VA:
inval_tlb_entries(domain, asid, iova, size, granule,
  leaf);
break;
case IOMMU_INVALIDATE_PASID:
inval_all_tlb_entries_for_asid(domain, asid);
break;
case IOMMU_INVALIDATE_DOMAIN:
inval_all_tlb_entries(domain);
break;
default:
return -EINVAL;
}

/* Wait for TLBs to be clean, then invalidate ATCs. */
sync_commands();
}

if (atc) {
/* ATC invalidations are sent to all devices in the domain */
switch (scope) {
case IOMMU_INVALIDATE_VA:
inval_atc_entries(domain, pasid, iova, size);
break;
case IOMMU_INVALIDATE_PASID:
/* Covers the full address space */
inval_all_atc_entries_for_pasid(domain, pasid);
break;
case IOMMU_INVALIDATE_DOMAIN:
/* Set Global Invalidate */
inval_all_atc_entries(domain);
break;
default:
return -EINVAL;
}

sync_commands();
}

/* Then return to guest. */
--

I think this covers what we need and allows userspace or the guest to
gather multiple invalidations into a single request/ioctl.

I don't think per-device ATC invalidation is needed, but might be wrong.
According to ATS it is implicit when the 

RE: [PATCH v12 0/3] iommu/smmu-v3: Workaround for hisilicon 161010801 erratum(reserve HW MSI)

2017-12-15 Thread Shameerali Kolothum Thodi
Hi Lorenzo/Robin/Will,

Since this has all the necessary reviewed-by from all concerned now(Thanks to 
all),
just wondering how this will be picked up? through iort or iommu?

Please let me know.

Thanks,
Shameer

> -Original Message-
> From: Shameerali Kolothum Thodi
> Sent: Thursday, December 14, 2017 4:10 PM
> To: lorenzo.pieral...@arm.com; robin.mur...@arm.com;
> marc.zyng...@arm.com; will.dea...@arm.com
> Cc: j...@8bytes.org; John Garry ; xuwei (O)
> ; Guohanjun (Hanjun Guo) ;
> iommu@lists.linux-foundation.org; linux-arm-ker...@lists.infradead.org; linux-
> a...@vger.kernel.org; devicet...@vger.kernel.org; Linuxarm
> ; Shameerali Kolothum Thodi
> 
> Subject: [PATCH v12 0/3] iommu/smmu-v3: Workaround for hisilicon
> 161010801 erratum(reserve HW MSI)
> 
> On certain HiSilicon platforms (hip06/hip07) the GIC ITS and PCIe RC
> deviates from the standard implementation and this breaks PCIe MSI
> functionality when SMMU is enabled.
> 
> The HiSilicon erratum 161010801 describes this limitation of certain
> HiSilicon platforms to support the SMMU mappings for MSI transactions.
> On these platforms GICv3 ITS translator is presented with the deviceID
> by extending the MSI payload data to 64 bits to include the deviceID.
> Hence, the PCIe controller on this platforms has to differentiate the MSI
> payload against other DMA payload and has to modify the MSI payload.
> This basically makes it difficult for this platforms to have a SMMU
> translation for MSI.
> 
> This patch implements an ACPI based quirk to reserve the hw msi regions
> in the smmu-v3 driver which means these address regions will not be
> translated and will be excluded from iova allocations.
> 
> To implement this quirk, the following changes are incorporated:
> 1. Added a generic helper function to IORT code to retrieve and reserve
>the associated ITS base address from a device IORT node. The function
>has a check for smmu model to determine whether the platform requires
>the HW MSI reservation or not.
> 2. Added smmu node entries and explicitly disabled them in hip06/hip07
> dts files so that users are warned about the non-DT support for this
> erratum.
> 
> Changelog:
> 
> v11--> v12
> -Thanks to Lorenzo, Fixed !CONFIG_IOMMU_API compile error(patch #1).
> 
> v10 --> v11
> -Addressed comments from Lorenzo(patch#1)
> -Added Robin's Reviewed-by to patch #2
> 
> v9 --> v10
> Addressed comments:
> -Moved smmu model check to iort helper function to selectively apply
>  the msi reservation which will make the fn call generic from iommu-dma.
> -Removed PCI blacklisting patch, instead added smmu nodes(disabled)
>  with comments to hip06/hip07 dts file.
> 
> v8 --> v9
> -Thanks to Marc, fixed IORT helper function to reserve the ITS
>  translater region only.
> -Removed the DT support for MSI reservation and blacklisted
>  HiSilicon PCIe controllers on DT based systems when SMMUv3 is
>  enabled.
> 
> v7 --> v8
> Addressed comments from Rob and Lorenzo:
>  -Modified to use DT compatible string for errata.
>  -Changed logic to retrieve the msi-parent for DT case.
> 
> v6 --> v7
> Addressed request from Will to add DT support for the erratum:
>  - added bt binding
>  - add of_iommu_msi_get_resv_regions()
> New arm64 silicon errata entry
> Rename iort_iommu_{its->msi}_get_resv_regions
> 
> v5 --> v6
> Addressed comments from Robin and Lorenzo:
> -No change to patch#1 .
> -Reverted v5 patch#2 as this might break the platforms where this quirk
>   is not applicable. Provided a generic function in iommu code and added
>   back the quirk implementation in SMMU v3 driver(patch#3)
> 
> v4 --> v5
> Addressed comments from Robin and Lorenzo:
> -Added a comment to make it clear that, for now, only straightforward
>   HW topologies are handled while reserving ITS regions(patch #1).
> 
> v3 --> v4
> Rebased on 4.13-rc1.
> Addressed comments from Robin, Will and Lorenzo:
> -As suggested by Robin, moved the ITS msi reservation into
>   iommu_dma_get_resv_regions().
> -Added its_count != resv region failure case(patch #1).
> 
> v2 --> v3
> Addressed comments from Lorenzo and Robin:
> -Removed dev_is_pci() check in smmuV3 driver.
> -Don't treat device not having an ITS mapping as an error in
>   iort helper function.
> 
> v1 --> v2
> -patch 2/2: Invoke iort helper fn based on fwnode type(acpi).
> 
> RFCv2 -->PATCH
> -Incorporated Lorenzo's review comments.
> 
> RFC v1 --> RFC v2
> Based on Robin's review comments,
> -Removed  the generic erratum framework.
> -Using IORT/MADT tables to retrieve the ITS base addr instead
>  of vendor specific CSRT table.
> 
> Shameer Kolothum (3):
>   ACPI/IORT: Add msi address regions reservation helper
>   iommu/dma: Add HW MSI(GICv3 ITS) address regions reservation
>   arm64:dts:hisilicon Disable hisilicon smmu node on hip06/hip07
> 
>  arch/arm64/boot/dts/hisilicon/hip06.dtsi |  56 

Re: [PATCH v12 1/3] ACPI/IORT: Add msi address regions reservation helper

2017-12-15 Thread Marc Zyngier
On Thu, 14 Dec 2017 16:09:55 +,
Shameer Kolothum wrote:
> 
> On some platforms msi parent address regions have to be excluded from
> normal IOVA allocation in that they are detected and decoded in a HW
> specific way by system components and so they cannot be considered normal
> IOVA address space.
> 
> Add a helper function that retrieves ITS address regions - the msi
> parent - through IORT device <-> ITS mappings and reserves it so that
> these regions will not be translated by IOMMU and will be excluded from
> IOVA allocations. The function checks for the smmu model number and
> only applies the msi reservation if the platform requires it.
> 
> Signed-off-by: Shameer Kolothum 
> ---
>  drivers/acpi/arm64/iort.c| 111 
> +--
>  drivers/irqchip/irq-gic-v3-its.c |   3 +-
>  include/linux/acpi_iort.h|   7 ++-
>  3 files changed, 116 insertions(+), 5 deletions(-)

For the ITS part:

Reviewed-by: Marc Zyngier 

M.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v12 1/3] ACPI/IORT: Add msi address regions reservation helper

2017-12-15 Thread Lorenzo Pieralisi
On Thu, Dec 14, 2017 at 04:09:55PM +, Shameer Kolothum wrote:
> On some platforms msi parent address regions have to be excluded from
> normal IOVA allocation in that they are detected and decoded in a HW
> specific way by system components and so they cannot be considered normal
> IOVA address space.
> 
> Add a helper function that retrieves ITS address regions - the msi
> parent - through IORT device <-> ITS mappings and reserves it so that
> these regions will not be translated by IOMMU and will be excluded from
> IOVA allocations. The function checks for the smmu model number and
> only applies the msi reservation if the platform requires it.
> 
> Signed-off-by: Shameer Kolothum 
> ---
>  drivers/acpi/arm64/iort.c| 111 
> +--
>  drivers/irqchip/irq-gic-v3-its.c |   3 +-
>  include/linux/acpi_iort.h|   7 ++-
>  3 files changed, 116 insertions(+), 5 deletions(-)

Reviewed-by: Lorenzo Pieralisi 

> diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
> index 95255ec..e2f7bdd 100644
> --- a/drivers/acpi/arm64/iort.c
> +++ b/drivers/acpi/arm64/iort.c
> @@ -39,6 +39,7 @@
>  struct iort_its_msi_chip {
>   struct list_headlist;
>   struct fwnode_handle*fw_node;
> + phys_addr_t base_addr;
>   u32 translation_id;
>  };
>  
> @@ -161,14 +162,16 @@ typedef acpi_status (*iort_find_node_callback)
>  static DEFINE_SPINLOCK(iort_msi_chip_lock);
>  
>  /**
> - * iort_register_domain_token() - register domain token and related ITS ID
> - * to the list from where we can get it back later on.
> + * iort_register_domain_token() - register domain token along with related
> + * ITS ID and base address to the list from where we can get it back later 
> on.
>   * @trans_id: ITS ID.
> + * @base: ITS base address.
>   * @fw_node: Domain token.
>   *
>   * Returns: 0 on success, -ENOMEM if no memory when allocating list element
>   */
> -int iort_register_domain_token(int trans_id, struct fwnode_handle *fw_node)
> +int iort_register_domain_token(int trans_id, phys_addr_t base,
> +struct fwnode_handle *fw_node)
>  {
>   struct iort_its_msi_chip *its_msi_chip;
>  
> @@ -178,6 +181,7 @@ int iort_register_domain_token(int trans_id, struct 
> fwnode_handle *fw_node)
>  
>   its_msi_chip->fw_node = fw_node;
>   its_msi_chip->translation_id = trans_id;
> + its_msi_chip->base_addr = base;
>  
>   spin_lock(_msi_chip_lock);
>   list_add(_msi_chip->list, _msi_chip_list);
> @@ -581,6 +585,24 @@ int iort_pmsi_get_dev_id(struct device *dev, u32 *dev_id)
>   return -ENODEV;
>  }
>  
> +static int __maybe_unused iort_find_its_base(u32 its_id, phys_addr_t *base)
> +{
> + struct iort_its_msi_chip *its_msi_chip;
> + int ret = -ENODEV;
> +
> + spin_lock(_msi_chip_lock);
> + list_for_each_entry(its_msi_chip, _msi_chip_list, list) {
> + if (its_msi_chip->translation_id == its_id) {
> + *base = its_msi_chip->base_addr;
> + ret = 0;
> + break;
> + }
> + }
> + spin_unlock(_msi_chip_lock);
> +
> + return ret;
> +}
> +
>  /**
>   * iort_dev_find_its_id() - Find the ITS identifier for a device
>   * @dev: The device.
> @@ -766,6 +788,24 @@ static inline bool iort_iommu_driver_enabled(u8 type)
>  }
>  
>  #ifdef CONFIG_IOMMU_API
> +static struct acpi_iort_node *iort_get_msi_resv_iommu(struct device *dev)
> +{
> + struct acpi_iort_node *iommu;
> + struct iommu_fwspec *fwspec = dev->iommu_fwspec;
> +
> + iommu = iort_get_iort_node(fwspec->iommu_fwnode);
> +
> + if (iommu && (iommu->type == ACPI_IORT_NODE_SMMU_V3)) {
> + struct acpi_iort_smmu_v3 *smmu;
> +
> + smmu = (struct acpi_iort_smmu_v3 *)iommu->node_data;
> + if (smmu->model == ACPI_IORT_SMMU_V3_HISILICON_HI161X)
> + return iommu;
> + }
> +
> + return NULL;
> +}
> +
>  static inline const struct iommu_ops *iort_fwspec_iommu_ops(
>   struct iommu_fwspec *fwspec)
>  {
> @@ -782,6 +822,69 @@ static inline int iort_add_device_replay(const struct 
> iommu_ops *ops,
>  
>   return err;
>  }
> +
> +/**
> + * iort_iommu_msi_get_resv_regions - Reserved region driver helper
> + * @dev: Device from iommu_get_resv_regions()
> + * @head: Reserved region list from iommu_get_resv_regions()
> + *
> + * Returns: Number of msi reserved regions on success (0 if platform
> + *  doesn't require the reservation or no associated msi regions),
> + *  appropriate error value otherwise. The ITS interrupt translation
> + *  spaces (ITS_base + SZ_64K, SZ_64K) associated with the device
> + *  are the msi reserved regions.
> + */
> +int iort_iommu_msi_get_resv_regions(struct device *dev, struct list_head 
> *head)
> +{
> + struct