Re: [PATCH 11/16] iommu/dma: Support PCI P2PDMA pages in dma-iommu map_sg

2021-05-06 Thread Logan Gunthorpe
Sorry, I think I missed responding to this one so here are the answers:

On 2021-05-02 7:14 p.m., John Hubbard wrote:
> On 4/8/21 10:01 AM, Logan Gunthorpe wrote:
>> When a PCI P2PDMA page is seen, set the IOVA length of the segment
>> to zero so that it is not mapped into the IOVA. Then, in finalise_sg(),
>> apply the appropriate bus address to the segment. The IOVA is not
>> created if the scatterlist only consists of P2PDMA pages.
>>
>> Similar to dma-direct, the sg_mark_pci_p2pdma() flag is used to
>> indicate bus address segments. On unmap, P2PDMA segments are skipped
>> over when determining the start and end IOVA addresses.
>>
>> With this change, the flags variable in the dma_map_ops is
>> set to DMA_F_PCI_P2PDMA_SUPPORTED to indicate support for
>> P2PDMA pages.
>>
>> Signed-off-by: Logan Gunthorpe 
>> ---
>>   drivers/iommu/dma-iommu.c | 66 ++-
>>   1 file changed, 58 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
>> index af765c813cc8..ef49635f9819 100644
>> --- a/drivers/iommu/dma-iommu.c
>> +++ b/drivers/iommu/dma-iommu.c
>> @@ -20,6 +20,7 @@
>>   #include 
>>   #include 
>>   #include 
>> +#include 
>>   #include 
>>   #include 
>>   #include 
>> @@ -864,6 +865,16 @@ static int __finalise_sg(struct device *dev,
>> struct scatterlist *sg, int nents,
>>   sg_dma_address(s) = DMA_MAPPING_ERROR;
>>   sg_dma_len(s) = 0;
>>   +    if (is_pci_p2pdma_page(sg_page(s)) && !s_iova_len) {
> 
> Newbie question: I'm in the dark as to why the !s_iova_len check is there,
> can you please enlighten me?

The loop in iommu_dma_map_sg() will decide what to do with P2PDMA pages.
If it is to map it with the bus address it will set s_iova_len to zero
so that no space is allocated in the IOVA. If it is to map it through
the host bridge, then it it will leave s_iova_len alone and create the
appropriate mapping with the CPU physical address.

This condition notices that s_iova_len was set to zero and fills in a SG
segment with the PCI bus address for that region.


> 
>> +    if (i > 0)
>> +    cur = sg_next(cur);
>> +
>> +    pci_p2pdma_map_bus_segment(s, cur);
>> +    count++;
>> +    cur_len = 0;
>> +    continue;
>> +    }
>> +
> 
> This is really an if/else condition. And arguably, it would be better
> to split out two subroutines, and call one or the other depending on
> the result of if is_pci_p2pdma_page(), instead of this "continue" approach.

I really disagree here. Putting the exceptional condition in it's own if
statement and leaving the normal case un-indented is easier to read and
understand. It also saves an extra level of indentation in code that is
already starting to look a little squished.


>>   /*
>>    * Now fill in the real DMA data. If...
>>    * - there is a valid output segment to append to
>> @@ -961,10 +972,12 @@ static int iommu_dma_map_sg(struct device *dev,
>> struct scatterlist *sg,
>>   struct iova_domain *iovad = >iovad;
>>   struct scatterlist *s, *prev = NULL;
>>   int prot = dma_info_to_prot(dir, dev_is_dma_coherent(dev), attrs);
>> +    struct dev_pagemap *pgmap = NULL;
>> +    enum pci_p2pdma_map_type map_type;
>>   dma_addr_t iova;
>>   size_t iova_len = 0;
>>   unsigned long mask = dma_get_seg_boundary(dev);
>> -    int i;
>> +    int i, ret = 0;
>>     if (static_branch_unlikely(_deferred_attach_enabled) &&
>>   iommu_deferred_attach(dev, domain))
>> @@ -993,6 +1006,31 @@ static int iommu_dma_map_sg(struct device *dev,
>> struct scatterlist *sg,
>>   s_length = iova_align(iovad, s_length + s_iova_off);
>>   s->length = s_length;
>>   +    if (is_pci_p2pdma_page(sg_page(s))) {
>> +    if (sg_page(s)->pgmap != pgmap) {
>> +    pgmap = sg_page(s)->pgmap;
>> +    map_type = pci_p2pdma_map_type(pgmap, dev,
>> +   attrs);
>> +    }
>> +
>> +    switch (map_type) {
>> +    case PCI_P2PDMA_MAP_BUS_ADDR:
>> +    /*
>> + * A zero length will be ignored by
>> + * iommu_map_sg() and then can be detected
>> + * in __finalise_sg() to actually map the
>> + * bus address.
>> + */
>> +    s->length = 0;
>> +    continue;
>> +    case PCI_P2PDMA_MAP_THRU_HOST_BRIDGE:
>> +    break;
>> +    default:
>> +    ret = -EREMOTEIO;
>> +    goto out_restore_sg;
>> +    }
>> +    }
>> +
>>   /*
>>    * Due to the alignment of our single IOVA allocation, we can
>>    * depend on these assumptions about the segment boundary mask:
>> @@ -1015,6 +1053,9 @@ static int iommu_dma_map_sg(struct device *dev,
>> struct scatterlist *sg,
>>   prev = s;
>>   }
>>   +    if 

Re: [PATCH v3 2/4] dt-bindings: iommu: rockchip: Add compatible for v2

2021-05-06 Thread Rob Herring
On Tue, 04 May 2021 10:41:22 +0200, Benjamin Gaignard wrote:
> Add compatible for the second version of IOMMU hardware block.
> RK356x IOMMU can also be link to a power domain.
> 
> Signed-off-by: Benjamin Gaignard 
> ---
> version 3:
>  - Rename compatible with SoC name
> 
> version 2:
>  - Add power-domains property
> 
>  .../devicetree/bindings/iommu/rockchip,iommu.yaml  | 7 ++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 

Reviewed-by: Rob Herring 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v3 1/4] dt-bindings: iommu: rockchip: Convert IOMMU to DT schema

2021-05-06 Thread Rob Herring
On Tue, May 04, 2021 at 10:41:21AM +0200, Benjamin Gaignard wrote:
> Convert Rockchip IOMMU to DT schema
> 
> Signed-off-by: Benjamin Gaignard 
> ---
> version 2:
>  - Change maintainer
>  - Change reg maxItems
>  - Change interrupt maxItems
> 
>  .../bindings/iommu/rockchip,iommu.txt | 38 -
>  .../bindings/iommu/rockchip,iommu.yaml| 79 +++
>  2 files changed, 79 insertions(+), 38 deletions(-)
>  delete mode 100644 Documentation/devicetree/bindings/iommu/rockchip,iommu.txt
>  create mode 100644 
> Documentation/devicetree/bindings/iommu/rockchip,iommu.yaml
> 
> diff --git a/Documentation/devicetree/bindings/iommu/rockchip,iommu.txt 
> b/Documentation/devicetree/bindings/iommu/rockchip,iommu.txt
> deleted file mode 100644
> index 6ecefea1c6f9..
> --- a/Documentation/devicetree/bindings/iommu/rockchip,iommu.txt
> +++ /dev/null
> @@ -1,38 +0,0 @@
> -Rockchip IOMMU
> -==
> -
> -A Rockchip DRM iommu translates io virtual addresses to physical addresses 
> for
> -its master device.  Each slave device is bound to a single master device, and
> -shares its clocks, power domain and irq.
> -
> -Required properties:
> -- compatible  : Should be "rockchip,iommu"
> -- reg : Address space for the configuration registers
> -- interrupts  : Interrupt specifier for the IOMMU instance
> -- interrupt-names : Interrupt name for the IOMMU instance
> -- #iommu-cells: Should be <0>.  This indicates the iommu is a
> -"single-master" device, and needs no additional 
> information
> -to associate with its master device.  See:
> -Documentation/devicetree/bindings/iommu/iommu.txt
> -- clocks  : A list of clocks required for the IOMMU to be accessible 
> by
> -the host CPU.
> -- clock-names : Should contain the following:
> - "iface" - Main peripheral bus clock (PCLK/HCL) (required)
> - "aclk"  - AXI bus clock (required)
> -
> -Optional properties:
> -- rockchip,disable-mmu-reset : Don't use the mmu reset operation.
> -Some mmu instances may produce unexpected results
> -when the reset operation is used.
> -
> -Example:
> -
> - vopl_mmu: iommu@ff940300 {
> - compatible = "rockchip,iommu";
> - reg = <0xff940300 0x100>;
> - interrupts = ;
> - interrupt-names = "vopl_mmu";
> - clocks = < ACLK_VOP1>, < HCLK_VOP1>;
> - clock-names = "aclk", "iface";
> - #iommu-cells = <0>;
> - };
> diff --git a/Documentation/devicetree/bindings/iommu/rockchip,iommu.yaml 
> b/Documentation/devicetree/bindings/iommu/rockchip,iommu.yaml
> new file mode 100644
> index ..0db208cf724a
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/iommu/rockchip,iommu.yaml
> @@ -0,0 +1,79 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +%YAML 1.2
> +---
> +$id: http://devicetree.org/schemas/iommu/rockchip,iommu.yaml#
> +$schema: http://devicetree.org/meta-schemas/core.yaml#
> +
> +title: Rockchip IOMMU
> +
> +maintainers:
> +  - Heiko Stuebner 
> +
> +description: |+
> +  A Rockchip DRM iommu translates io virtual addresses to physical addresses 
> for
> +  its master device. Each slave device is bound to a single master device and
> +  shares its clocks, power domain and irq.
> +
> +  For information on assigning IOMMU controller to its peripheral devices,
> +  see generic IOMMU bindings.
> +
> +properties:
> +  compatible:
> +const: rockchip,iommu
> +
> +  reg:
> +minItems: 1
> +maxItems: 2

What's the 2nd entry? If there's only 1 entry, then you don't have to 
describe what it is. If more than 1, then each entry has to be defined.

> +
> +  interrupts:
> +minItems: 1
> +maxItems: 2

Same here, though if interrupt-names defines them, that's good enough.

> +
> +  interrupt-names:
> +minItems: 1
> +maxItems: 2

Here we need the values.

> +
> +  clocks:
> +items:
> +  - description: Core clock
> +  - description: Interface clock
> +
> +  clock-names:
> +items:
> +  - const: aclk
> +  - const: iface
> +
> +  "#iommu-cells":
> +const: 0
> +
> +  rockchip,disable-mmu-reset:
> +$ref: /schemas/types.yaml#/definitions/flag
> +description: |
> +  Do not use the mmu reset operation.
> +  Some mmu instances may produce unexpected results
> +  when the reset operation is used.
> +
> +required:
> +  - compatible
> +  - reg
> +  - interrupts
> +  - clocks
> +  - clock-names
> +  - "#iommu-cells"
> +
> +additionalProperties: false
> +
> +examples:
> +  - |
> +#include 
> +#include 
> +
> +vopl_mmu: iommu@ff940300 {
> +  compatible = "rockchip,iommu";
> +  reg = <0xff940300 0x100>;
> +  interrupts = ;
> +  interrupt-names = "vopl_mmu";
> +  clocks = < ACLK_VOP1>, < HCLK_VOP1>;
> +  clock-names = "aclk", "iface";
> +  

Re: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs

2021-05-06 Thread Raj, Ashok
Hi Jason

On Thu, May 06, 2021 at 09:27:30AM -0300, Jason Gunthorpe wrote:
> On Thu, May 06, 2021 at 09:23:48AM +0200, Jean-Philippe Brucker wrote:
> > On Wed, May 05, 2021 at 01:04:46PM -0700, Jacob Pan wrote:
> > > > > For ARM, since the guest owns the per device PASID table. There is no
> > > > > need to allocate PASIDs from the host nor the hypervisor. Without SWQ,
> > > > > there is no need for global PASID/SSID either. So PASID being global
> > > > > for ARM is for simplicity in case of host PASID/SSID.  
> > > > 
> > > > It isn't clear how ARM can support PASID and mdev but that is an
> > > > unrelated issue..
> > > > 
> > > AFAIK, the current SMMU device assignment is per RID, since only one 
> > > stage2
> > > page tables per RID, not per PASID. This is equivalent to the older VT-d
> > > spec. prior to scalable mode.
> > 
> > Yes that's right. Since SMMUv3 has a single level-2 page table per RID, it
> > doesn't support assigning level-1 page tables to guests for mdevs (sub-VF
> > devices). So no PASIDs for mdevs, which also means each guest has its own
> > PASID space and the host doesn't track guest PASIDs.
> 
> Basically it means when the guest's top level IOASID is created for
> nesting that IOASID claims all PASID's on the RID and excludes any
> PASID IOASIDs from existing on the RID now or in future.

The way to look at it this is as follows:

For platforms that do not have a need to support shared work queue model
support for ENQCMD or similar, PASID space is naturally per RID. There is no
complication with this. Every RID has the full range of PASID's and no need
for host to track which PASIDs are allocated now or in future in the guest.

For platforms that support ENQCMD, it is required to mandate PASIDs are
global across the entire system. Maybe its better to call them gPASID for
guest and hPASID for host. Short reason being gPASID->hPASID is a guest
wide mapping for ENQCMD and not a per-RID based mapping. (We covered that
in earlier responses)

In our current implementation we actually don't separate this space, and
gPASID == hPASID. The iommu driver enforces that by using the custom
allocator and the architected interface that allows all guest vIOMMU
allocations to be proxied to host. Nothing but a glorified hypercall like
interface. In fact some OS's do use hypercall to get a hPASID vs using
the vCMD style interface.

For cases where there is full PASID range for every RID and completely
managed by the guest that requires no assist from host to ensure
uniqueness, they don't need to have a custom allocator. Maybe the general
allocator can have ways to ensure global uniqueness vs. RID wide
uniqueness. This is still managed by the iommu driver (vIOMMU) + the
backend for vCMD in the host IOMMU driver.

> 
> Which would be a different behavior than something like Intel's top
> level IOASID that doesn't claim all the PASIDs.

isn't this simple, if we can say ioasid allocator can provide 

- system wide PASID
- RID local PASID

Based on platform capabilities that require such differentiation?

And based on the other threads, if ioasid is just a pgtable representation,
it doesn't need a PASID per-se. But when you want to use SVM or such, you
can associate a PASID with it for the IOMMU to plumb things with hardware.

Cheers,
Ashok
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v3 09/10] iommu/arm-smmu: Get associated RMR info and install bypass SMR

2021-05-06 Thread Steven Price

On 20/04/2021 09:27, Shameer Kolothum wrote:

From: Jon Nettleton 

Check if there is any RMR info associated with the devices behind
the SMMU and if any, install bypass SMRs for them. This is to
keep any ongoing traffic associated with these devices alive
when we enable/reset SMMU during probe().

Signed-off-by: Jon Nettleton 
Signed-off-by: Shameer Kolothum 
---
  drivers/iommu/arm/arm-smmu/arm-smmu.c | 42 +++
  drivers/iommu/arm/arm-smmu/arm-smmu.h |  2 ++
  2 files changed, 44 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c 
b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index d8c6bfde6a61..4d2f91626d87 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -2102,6 +2102,43 @@ err_reset_platform_ops: __maybe_unused;
return err;
  }
  
+static void arm_smmu_rmr_install_bypass_smr(struct arm_smmu_device *smmu)

+{
+   struct iommu_rmr *e;
+   int i, cnt = 0;
+   u32 smr;
+
+   for (i = 0; i < smmu->num_mapping_groups; i++) {
+   smr = arm_smmu_gr0_read(smmu, ARM_SMMU_GR0_SMR(i));
+   if (!FIELD_GET(ARM_SMMU_SMR_VALID, smr))
+   continue;
+
+   list_for_each_entry(e, >rmr_list, list) {
+   if (FIELD_GET(ARM_SMMU_SMR_ID, smr) != e->sid)
+   continue;
+
+   smmu->smrs[i].id = FIELD_GET(ARM_SMMU_SMR_ID, smr);
+   smmu->smrs[i].mask = FIELD_GET(ARM_SMMU_SMR_MASK, smr);
+   smmu->smrs[i].valid = true;
+
+   smmu->s2crs[i].type = S2CR_TYPE_BYPASS;
+   smmu->s2crs[i].privcfg = S2CR_PRIVCFG_DEFAULT;
+   smmu->s2crs[i].cbndx = 0xff;
+
+   cnt++;
+   }
+   }


If I understand this correctly - this is looking at the current
(hardware) configuration of the SMMU and attempting to preserve any
bypass SMRs. However from what I can tell it suffers from the following
two problems:

 (a) Only the ID of the SMR is being checked, not the MASK. So if the
firmware has setup an SMR matching a number of streams this will break.

 (b) The SMMU might not be enabled at all (CLIENTPD==1) or bypass
enabled for unmatched streams (USFCFG==0).

Certainly in my test setup case (b) applies and so this doesn't work.
Perhaps something like the below would work better? (It works in the
case of the SMMU not enabled - I've not tested case (a)).

Steve

8<
static void arm_smmu_rmr_install_bypass_smr(struct arm_smmu_device *smmu)
{
struct iommu_rmr *e;
int i, cnt = 0;
u32 smr;
u32 reg;

reg = arm_smmu_gr0_read(smmu, ARM_SMMU_GR0_sCR0);

if ((reg & ARM_SMMU_sCR0_USFCFG) && !(reg & ARM_SMMU_sCR0_CLIENTPD)) {
/*
 * SMMU is already enabled and disallowing bypass, so preserve
 * the existing SMRs
 */
for (i = 0; i < smmu->num_mapping_groups; i++) {
smr = arm_smmu_gr0_read(smmu, ARM_SMMU_GR0_SMR(i));
if (!FIELD_GET(ARM_SMMU_SMR_VALID, smr))
continue;
smmu->smrs[i].id = FIELD_GET(ARM_SMMU_SMR_ID, smr);
smmu->smrs[i].mask = FIELD_GET(ARM_SMMU_SMR_MASK, smr);
smmu->smrs[i].valid = true;
}
}

list_for_each_entry(e, >rmr_list, list) {
u32 sid = e->sid;

i = arm_smmu_find_sme(smmu, sid, ~0);
if (i < 0)
continue;
if (smmu->s2crs[i].count == 0) {
smmu->smrs[i].id = sid;
smmu->smrs[i].mask = ~0;
smmu->smrs[i].valid = true;
}
smmu->s2crs[i].count++;
smmu->s2crs[i].type = S2CR_TYPE_BYPASS;
smmu->s2crs[i].privcfg = S2CR_PRIVCFG_DEFAULT;
smmu->s2crs[i].cbndx = 0xff;

cnt++;
}

if ((reg & ARM_SMMU_sCR0_USFCFG) && !(reg & ARM_SMMU_sCR0_CLIENTPD)) {
/* Remove the valid bit for unused SMRs */
for (i = 0; i < smmu->num_mapping_groups; i++) {
if (smmu->s2crs[i].count == 0)
smmu->smrs[i].valid = false;
}
}

dev_notice(smmu->dev, "\tpreserved %d boot mapping%s\n", cnt,
   cnt == 1 ? "" : "s");
}
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs

2021-05-06 Thread Jason Gunthorpe
On Thu, May 06, 2021 at 09:23:48AM +0200, Jean-Philippe Brucker wrote:
> On Wed, May 05, 2021 at 01:04:46PM -0700, Jacob Pan wrote:
> > > > For ARM, since the guest owns the per device PASID table. There is no
> > > > need to allocate PASIDs from the host nor the hypervisor. Without SWQ,
> > > > there is no need for global PASID/SSID either. So PASID being global
> > > > for ARM is for simplicity in case of host PASID/SSID.  
> > > 
> > > It isn't clear how ARM can support PASID and mdev but that is an
> > > unrelated issue..
> > > 
> > AFAIK, the current SMMU device assignment is per RID, since only one stage2
> > page tables per RID, not per PASID. This is equivalent to the older VT-d
> > spec. prior to scalable mode.
> 
> Yes that's right. Since SMMUv3 has a single level-2 page table per RID, it
> doesn't support assigning level-1 page tables to guests for mdevs (sub-VF
> devices). So no PASIDs for mdevs, which also means each guest has its own
> PASID space and the host doesn't track guest PASIDs.

Basically it means when the guest's top level IOASID is created for
nesting that IOASID claims all PASID's on the RID and excludes any
PASID IOASIDs from existing on the RID now or in future.

Which would be a different behavior than something like Intel's top
level IOASID that doesn't claim all the PASIDs.

Lots of little special flags in here :|

Jason
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs

2021-05-06 Thread Jason Gunthorpe
On Wed, May 05, 2021 at 04:23:19PM -0700, Raj, Ashok wrote:
> > Which implies the API to the iommu driver should be more like:
> > 
> >   'assign an IOASID to this RID and return the PASID'
> >   'reserve a PASID from every RID'
> 
> I don't think this has any decent change of success. Its rather round about
> way to get a global PASID namespace.
> 
> >   'assign an IOASID to this RID and use this specific PASID'
> 
> This seems a bit complicated. Another way to specify this.

Maybe, but I don't like that the driver-based iommu API has been
corrupted by injecting a global 'first driver to claim it'
resource. It is not properly layered anymore.

Jason
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs

2021-05-06 Thread Jean-Philippe Brucker
On Wed, May 05, 2021 at 01:04:46PM -0700, Jacob Pan wrote:
> > > For ARM, since the guest owns the per device PASID table. There is no
> > > need to allocate PASIDs from the host nor the hypervisor. Without SWQ,
> > > there is no need for global PASID/SSID either. So PASID being global
> > > for ARM is for simplicity in case of host PASID/SSID.  
> > 
> > It isn't clear how ARM can support PASID and mdev but that is an
> > unrelated issue..
> > 
> AFAIK, the current SMMU device assignment is per RID, since only one stage2
> page tables per RID, not per PASID. This is equivalent to the older VT-d
> spec. prior to scalable mode.

Yes that's right. Since SMMUv3 has a single level-2 page table per RID, it
doesn't support assigning level-1 page tables to guests for mdevs (sub-VF
devices). So no PASIDs for mdevs, which also means each guest has its own
PASID space and the host doesn't track guest PASIDs.

Thanks,
Jean
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu