Re: [PATCH v2 0/3] PCIe Host request to reserve IOVA

2018-12-13 Thread poza

On 2018-12-13 16:02, Srinath Mannam wrote:

A few SoCs have the limitation that their PCIe host cannot accept certain
inbound address ranges. The allowed inbound address ranges are listed in the
dma-ranges DT property, and only these ranges may be used for IOVA mapping;
the remaining address ranges have to be reserved in the IOVA space.

The PCIe host driver of such SoCs has to add every address range whose IOVAs
must be reserved to the PCIe host bridge resource entry list. The IOMMU
framework then reserves these IOVAs while initializing the IOMMU domain.
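
For reference, a minimal sketch of the reservation step described above,
assuming the new host bridge list is called dma_resv as in patch 1; the
helper name and the exact hook point in the IOMMU code are illustrative,
not taken from this series:

	#include <linux/iova.h>
	#include <linux/pci.h>

	/* Illustrative only: walk the host bridge's reserved-window list and
	 * mark each range as unusable in the device's IOVA domain. */
	static void iova_reserve_host_bridge_windows(struct pci_host_bridge *bridge,
						     struct iova_domain *iovad)
	{
		struct resource_entry *entry;

		resource_list_for_each_entry(entry, &bridge->dma_resv) {
			unsigned long lo = iova_pfn(iovad, entry->res->start);
			unsigned long hi = iova_pfn(iovad, entry->res->end);

			reserve_iova(iovad, lo, hi);
		}
	}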


This patch set is based on Linux-4.19-rc1.

Changes from v1:
  - Addressed Oza review comments.

Srinath Mannam (3):
  PCI: Add dma-resv window list
  iommu/dma: IOVA reserve for PCI host reserve address list
  PCI: iproc: Add dma reserve resources to host

 drivers/iommu/dma-iommu.c           |  8 ++++++++
 drivers/pci/controller/pcie-iproc.c | 51 ++++++++++++++++++++++++++++++++++++++++++++++++++-
 drivers/pci/probe.c                 |  3 +++
 include/linux/pci.h                 |  1 +
 4 files changed, 62 insertions(+), 1 deletion(-)


Looks good to me.

Reviewed-by: Oza Pawandeep 


Re: [RFC PATCH 3/3] PCI: iproc: Add dma reserve resources to host

2018-12-13 Thread poza

On 2018-12-13 14:47, Srinath Mannam wrote:

Hi Oza,

Thank you for the review.
Please find my comments inline.

On Thu, Dec 13, 2018 at 11:33 AM  wrote:


On 2018-12-12 11:16, Srinath Mannam wrote:
> The IPROC host has the limitation that it can use
> only those address ranges given by the dma-ranges
> property as inbound addresses, so the memory address
> holes in dma-ranges must be reserved and never
> allocated as DMA addresses.
>
> All such reserved addresses are created as resource
> entries and added to the dma_resv list of the PCI
> host bridge.
>
> These DMA reserve resources are created by parsing
> the dma-ranges property.
>
> Ex:
> dma-ranges = < \
>   0x43000000 0x00 0x80000000 0x00 0x80000000 0x00 0x80000000 \
>   0x43000000 0x08 0x00000000 0x08 0x00000000 0x08 0x00000000 \
>   0x43000000 0x80 0x00000000 0x80 0x00000000 0x40 0x00000000>
>
> In the above example of dma-ranges, the memory address ranges
> 0x0 - 0x80000000,
> 0x100000000 - 0x800000000,
> 0x1000000000 - 0x8000000000 and
> 0x10000000000 - 0xffffffffffffffff
> are not allowed to be used as inbound addresses,
> so these address ranges need to be added to the dma_resv
> list to reserve their IOVA address ranges.
>
> Signed-off-by: Srinath Mannam 
> ---
>  drivers/pci/controller/pcie-iproc.c | 49 +++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 49 insertions(+)
>
> diff --git a/drivers/pci/controller/pcie-iproc.c
> b/drivers/pci/controller/pcie-iproc.c
> index 3160e93..43e465a 100644
> --- a/drivers/pci/controller/pcie-iproc.c
> +++ b/drivers/pci/controller/pcie-iproc.c
> @@ -1154,25 +1154,74 @@ static int iproc_pcie_setup_ib(struct
> iproc_pcie *pcie,
>   return ret;
>  }
>
> +static int
> +iproc_pcie_add_dma_resv_range(struct device *dev, struct list_head *resources,
> +			       uint64_t start, uint64_t end)
> +{
> + struct resource *res;
> +
> + res = devm_kzalloc(dev, sizeof(struct resource), GFP_KERNEL);
> + if (!res)
> + return -ENOMEM;
> +
> + res->start = (resource_size_t)start;
> + res->end = (resource_size_t)end;
> + pci_add_resource_offset(resources, res, 0);
> +
> + return 0;
> +}
> +
>  static int iproc_pcie_map_dma_ranges(struct iproc_pcie *pcie)
>  {
> + struct pci_host_bridge *host = pci_host_bridge_from_priv(pcie);
>   struct of_pci_range range;
>   struct of_pci_range_parser parser;
>   int ret;
> + uint64_t start, end;
> + LIST_HEAD(resources);
>
>   /* Get the dma-ranges from DT */
>   ret = of_pci_dma_range_parser_init(&parser, pcie->dev->of_node);
>   if (ret)
>   return ret;
>
> + start = 0;
>   for_each_of_pci_range(&parser, &range) {
> + end = range.pci_addr;
> + /* dma-ranges list expected in sorted order */
> + if (end < start) {
> + ret = -EINVAL;
> + goto out;
> + }
>   /* Each range entry corresponds to an inbound mapping region */
>   ret = iproc_pcie_setup_ib(pcie, &range, IPROC_PCIE_IB_MAP_MEM);
>   if (ret)
>   return ret;
> +
> + if (end - start) {
> + ret = iproc_pcie_add_dma_resv_range(pcie->dev,
> + &resources,
> + start, end);
> + if (ret)
> + goto out;
> + }
> + start = range.pci_addr + range.size;
>   }
>
> + end = ~0;
Hi Srinath,

This series is based on the following patch sets:

https://lkml.org/lkml/2017/5/16/19
https://lkml.org/lkml/2017/5/16/23
https://lkml.org/lkml/2017/5/16/21,


Yes, this patch series was done based on the inputs from the patches you
sent earlier.


Some comments to be adapted from the patch set I did.

end = ~0;
You should consider the DMA mask, to see whether the iproc controller is in a
32-bit or 64-bit system.
Please check the following code snippet.

if (tmp_dma_addr < DMA_BIT_MASK(sizeof(dma_addr_t) * 8)) {
+	lo = iova_pfn(iovad, tmp_dma_addr);
+	hi = iova_pfn(iovad,
+		      DMA_BIT_MASK(sizeof(dma_addr_t) * 8) - 1);
+	reserve_iova(iovad, lo, hi);
+}

Also, if this controller is integrated into a 64-bit platform but decides to
restrict DMA to 32 bits for some reason, the code should address such
scenarios.
So it is always safe to do:

#define BITS_PER_BYTE 8
DMA_BIT_MASK(sizeof(dma_addr_t) * BITS_PER_BYTE)

So please use the kernel macro to find the end of the DMA region.
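
For illustration, the suggestion above could look roughly like this inside
iproc_pcie_map_dma_ranges() from the patch; dma_limit is a made-up local
here, and whether the host's coherent_dma_mask is the right bound is exactly
what is debated below:

u64 dma_limit = min_not_zero(pcie->dev->coherent_dma_mask,
			     DMA_BIT_MASK(sizeof(dma_addr_t) * BITS_PER_BYTE));

/* End the trailing reserved hole at the host's DMA limit instead of a
 * fixed ~0, so a 32-bit-restricted controller does not reserve IOVA
 * space it could never address anyway. */
if (start < dma_limit) {
	ret = iproc_pcie_add_dma_resv_range(pcie->dev, &resources,
					    start, dma_limit);
	if (ret)
		goto out;
}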


This change was done with the assumption that the end address is the maximum
bus address (~0) instead of the PCIe RC DMA mask.
Even when dma-ranges has a 64-bit size, the DMA mask of the PCIe host is
forced to 32-bit:

// in of_dma_configure function
dev->coherent_dma_mask = DMA_BIT_MASK(32);

And the DMA mask of the endpoint was set to 64-bit in their drivers; also the
SMMU-supported DMA mask is 48-bit.
But here the requirement is that all address ranges except 

Re: [RFC PATCH 3/3] PCI: iproc: Add dma reserve resources to host

2018-12-12 Thread poza

On 2018-12-12 11:16, Srinath Mannam wrote:

The IPROC host has the limitation that it can use
only those address ranges given by the dma-ranges
property as inbound addresses, so the memory address
holes in dma-ranges must be reserved and never
allocated as DMA addresses.

All such reserved addresses are created as resource
entries and added to the dma_resv list of the PCI
host bridge.

These DMA reserve resources are created by parsing
the dma-ranges property.

Ex:
dma-ranges = < \
  0x43000000 0x00 0x80000000 0x00 0x80000000 0x00 0x80000000 \
  0x43000000 0x08 0x00000000 0x08 0x00000000 0x08 0x00000000 \
  0x43000000 0x80 0x00000000 0x80 0x00000000 0x40 0x00000000>

In the above example of dma-ranges, the memory address ranges
0x0 - 0x80000000,
0x100000000 - 0x800000000,
0x1000000000 - 0x8000000000 and
0x10000000000 - 0xffffffffffffffff
are not allowed to be used as inbound addresses,
so these address ranges need to be added to the dma_resv
list to reserve their IOVA address ranges.

Signed-off-by: Srinath Mannam 
---
 drivers/pci/controller/pcie-iproc.c | 49 +++++++++++++++++++++++++++++++++++++++++++++++++

 1 file changed, 49 insertions(+)

diff --git a/drivers/pci/controller/pcie-iproc.c
b/drivers/pci/controller/pcie-iproc.c
index 3160e93..43e465a 100644
--- a/drivers/pci/controller/pcie-iproc.c
+++ b/drivers/pci/controller/pcie-iproc.c
@@ -1154,25 +1154,74 @@ static int iproc_pcie_setup_ib(struct 
iproc_pcie *pcie,

return ret;
 }

+static int
+iproc_pcie_add_dma_resv_range(struct device *dev, struct list_head *resources,
+			      uint64_t start, uint64_t end)
+{
+   struct resource *res;
+
+   res = devm_kzalloc(dev, sizeof(struct resource), GFP_KERNEL);
+   if (!res)
+   return -ENOMEM;
+
+   res->start = (resource_size_t)start;
+   res->end = (resource_size_t)end;
+   pci_add_resource_offset(resources, res, 0);
+
+   return 0;
+}
+
 static int iproc_pcie_map_dma_ranges(struct iproc_pcie *pcie)
 {
+   struct pci_host_bridge *host = pci_host_bridge_from_priv(pcie);
struct of_pci_range range;
struct of_pci_range_parser parser;
int ret;
+   uint64_t start, end;
+   LIST_HEAD(resources);

/* Get the dma-ranges from DT */
ret = of_pci_dma_range_parser_init(&parser, pcie->dev->of_node);
if (ret)
return ret;

+   start = 0;
for_each_of_pci_range(&parser, &range) {
+   end = range.pci_addr;
+   /* dma-ranges list expected in sorted order */
+   if (end < start) {
+   ret = -EINVAL;
+   goto out;
+   }
/* Each range entry corresponds to an inbound mapping region */
ret = iproc_pcie_setup_ib(pcie, &range, IPROC_PCIE_IB_MAP_MEM);
if (ret)
return ret;
+
+   if (end - start) {
+   ret = iproc_pcie_add_dma_resv_range(pcie->dev,
+   &resources,
+   start, end);
+   if (ret)
+   goto out;
+   }
+   start = range.pci_addr + range.size;
}

+   end = ~0;

Hi Srinath,

this series is based on following patch sets.

https://lkml.org/lkml/2017/5/16/19
https://lkml.org/lkml/2017/5/16/23
https://lkml.org/lkml/2017/5/16/21,

Some comments to be adapted from the patch set I did.

end = ~0;
You should consider the DMA mask, to see whether the iproc controller is in a
32-bit or 64-bit system.

Please check the following code snippet.

if (tmp_dma_addr < DMA_BIT_MASK(sizeof(dma_addr_t) * 8)) {
+	lo = iova_pfn(iovad, tmp_dma_addr);
+	hi = iova_pfn(iovad,
+		      DMA_BIT_MASK(sizeof(dma_addr_t) * 8) - 1);
+	reserve_iova(iovad, lo, hi);
+}

Also, if this controller is integrated into a 64-bit platform but decides to
restrict DMA to 32 bits for some reason, the code should address such
scenarios.

So it is always safe to do:

#define BITS_PER_BYTE 8
DMA_BIT_MASK(sizeof(dma_addr_t) * BITS_PER_BYTE)

So please use the kernel macro to find the end of the DMA region.


Also, ideally, according to SBSA v5:

8.3 PCI Express device view of memory

Transactions from a PCI express device will either directly address the 
memory system of the base server system
or be presented to a SMMU for optional address translation and 
permission policing.
In systems that are compatible with level 3 or above of the SBSA, the 
addresses sent by PCI express devices
must be presented to the memory system or SMMU unmodified. In a system 
where the PCI express does not use
an SMMU, the PCI express devices have the same view of physical memory 
as the PEs. In a system with a
SMMU for PCI express there are no transformations to addresses being 
sent by PCI express devices before they

are presented as an input address to the SMMU.


Re: Explicit IOVA management from a PCIe endpoint driver

2018-09-18 Thread poza



On 2018-09-18 03:06, Stephen Warren wrote:

Joerg, Christoph, Marek, Robin,

I believe that the driver for our PCIe endpoint controller hardware
will need to explicitly manage its IOVA space more than current APIs
allow. I'd like to discuss how to make that possible.

First some background on our hardware:

NVIDIA's Xavier SoC contains a Synopsys DesignWare PCIe controller.
This can operate in either root port or endpoint mode. I'm
particularly interested in endpoint mode.

Our particular instantiation of this controller exposes a single
function with a single software-controlled PCIe BAR to the PCIe bus
(there are also BARs for access to DMA controller registers and
outbound MSI configuration, which can both be enabled/disabled but not
used for any other purpose). When a transaction is received from the
PCIe bus, the following happens:

1) Transaction is matched against the BAR base/size (in PCIe address
space) to determine whether it "hits" this BAR or not.

2) The transaction's address is processed by the PCIe controller's ATU
(Address Translation Unit), which can re-write the address that the
transaction accesses.

Our particular instantiation of the hardware only has 2 entries in the
ATU mapping table, which gives very little flexibility in setting up a
mapping.

As an FYI, ATU entries can match PCIe transactions either:
a) Any transaction received on a particular BAR.
b) Any transaction received within a single contiguous window of PCIe
address space. This kind of mapping entry obviously has to be set up
after device enumeration is complete so that it can match the correct
PCIe address.

Each ATU entry maps a single contiguous set of PCIe addresses to a
single contiguous set of IOVAs which are passed to the IOMMU.
Transactions can pass through the ATU without being translated if
desired.

3) The transaction is passed to the IOMMU, which can again re-write
the address that the transaction accesses.

4) The transaction is passed to the memory controller and reads/writes 
DRAM.


In general, we want to be able to expose a large and dynamic set of
data buffers to the PCIe bus; certainly /far/ more than two separate
buffers (the number of ATU table entries). With current Linux APIs,
these buffers will not be located in contiguous or adjacent physical
(DRAM) or virtual (IOVA) addresses, nor in any particular window of
physical or IOVA addresses. However, the ATU's mapping from PCIe to
IOVA can only expose one or two contiguous ranges of IOVA space. These
two sets of requirements are at odds!

So, I'd like to propose some new APIs that the PCIe endpoint driver can 
use:


1) Allocate/reserve an IOVA range of specified size, but don't map
anything into the IOVA range.


I had done some work on this in the past; those patches were tested on
Broadcom HW:

https://lkml.org/lkml/2017/5/16/23
https://lkml.org/lkml/2017/5/16/21
https://lkml.org/lkml/2017/5/16/19

I could not pursue it further, since I do not have the same HW to test it.
Although we now use the Synopsys DesignWare PCIe controller in the Qualcomm
SoC, we don't restrict the inbound address range for our SoC.

Of course these patches can easily be ported and extended.
They basically reserve IOVA ranges based on the inbound dma-ranges DT
property.


Regards,
Oza.



2) De-allocate the IOVA range allocated in (1).

3) Map a specific set (scatter-gather list I suppose) of
already-allocated/extant physical addresses into part of an IOVA range
allocated in (1).

4) Unmap a portion of an IOVA range that was mapped by (3).
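
For concreteness, the four operations above might look something like the
following prototypes. These are purely hypothetical names and signatures,
sketched only to make the proposal tangible; nothing like this exists in the
DMA API as of this thread:

#include <linux/device.h>
#include <linux/dma-direction.h>
#include <linux/scatterlist.h>
#include <linux/types.h>

/* 1) Reserve an IOVA range of @size bytes; nothing is mapped yet. */
dma_addr_t dma_iova_alloc(struct device *dev, size_t size);

/* 2) Release a range previously obtained from dma_iova_alloc(). */
void dma_iova_free(struct device *dev, dma_addr_t iova, size_t size);

/*
 * 3) Map a scatter-gather list of already-existing physical addresses at
 *    @offset bytes into the reserved range.
 */
int dma_iova_map_sg(struct device *dev, dma_addr_t iova, size_t offset,
		    struct scatterlist *sg, int nents,
		    enum dma_data_direction dir);

/* 4) Unmap the portion of the range that was mapped by dma_iova_map_sg(). */
void dma_iova_unmap(struct device *dev, dma_addr_t iova, size_t offset,
		    size_t size, enum dma_data_direction dir);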

One final note:

The memory controller can translate accesses to a small region of DRAM
address space into accesses to an interrupt generation module. This
allows devices attached to the PCIe bus to generate interrupts to
software running on the system with the PCIe endpoint controller. Thus
I deliberately described API 3 above as mapping a specific physical
address into IOVA space, as opposed to mapping an existing DRAM
allocation into IOVA space, in order to allow mapping this interrupt
generation address space into IOVA space. If we needed separate APIs
to map physical addresses vs. DRAM allocations into IOVA space, that
would likely be fine too.

Does this API proposal sound reasonable?

I have heard from some NVIDIA developers that the above APIs rather go
against the principle that individual drivers should not be aware of
the presence/absence of an IOMMU, and hence direct management of IOVA
allocation/layout is deliberately avoided, and hence there hasn't been
a need/desire for this kind of API in the past. However, I think our
current hardware design and use-case rather requires it. Do you agree?
