from:"Oza Pawandeep"

[PATCH v6 3/3] PCI/of fix of_dma_get_range; get PCI specific dma-ranges

2017-05-15 Thread Oza Pawandeep

current device framework and OF framework integration assumes
dma-ranges in a way where memory-mapped devices define their
dma-ranges. (child-bus-address, parent-bus-address, length).

of_dma_configure is specifically written to take care of memory
mapped devices. but no implementation exists for pci to take
care of pcie based memory ranges.

for e.g. iproc based SOCs and other SOCs(suc as rcar) have PCI
world dma-ranges.
dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;

this patch fixes the following problems of_dma_get_range.
1) return of wrong size as 0.
2) not handling absence of dma-ranges which is valid for PCI master.
3) not handling multipe inbound windows.
4) in order to get largest possible dma_mask. this patch also
retuns the largest possible size based on dma-ranges,

for e.g.
dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;
we should get dev->coherent_dma_mask=0x7f.

based on which IOVA allocation space will honour PCI host
bridge limitations.

the implementation hooks bus specific callbacks for getting
dma-ranges.

Signed-off-by: Oza Pawandeep <oza@broadcom.com>

diff --git a/drivers/of/address.c b/drivers/of/address.c
index 02b2903..800731c 100644
--- a/drivers/of/address.c
+++ b/drivers/of/address.c
@@ -6,6 +6,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -46,6 +47,8 @@ struct of_bus {
int na, int ns, int pna);
int (*translate)(__be32 *addr, u64 offset, int na);
unsigned int(*get_flags)(const __be32 *addr);
+   int (*get_dma_ranges)(struct device_node *np,
+ u64 *dma_addr, u64 *paddr, u64 *size);
 };
 
 /*
@@ -171,6 +174,144 @@ static int of_bus_pci_translate(__be32 *addr, u64 offset, 
int na)
 {
return of_bus_default_translate(addr + 1, offset, na - 1);
 }
+
+static int of_bus_pci_get_dma_ranges(struct device_node *np, u64 *dma_addr,
+u64 *paddr, u64 *size)
+{
+   struct device_node *node = of_node_get(np);
+   int ret = 0;
+   struct resource_entry *window;
+   LIST_HEAD(res);
+
+   if (!node)
+   return -EINVAL;
+
+   *size = 0;
+   /*
+* PCI dma-ranges is not mandatory property.
+* many devices do no need to have it, since
+* host bridge does not require inbound memory
+* configuration or rather have design limitations.
+* so we look for dma-ranges, if missing we
+* just return the caller full size, and also
+* no dma-ranges suggests that, host bridge allows
+* whatever comes in, so we set dma_addr to 0.
+*/
+   ret = of_pci_get_dma_ranges(np, );
+   if (!ret) {
+   resource_list_for_each_entry(window, ) {
+   struct resource *res_dma = window->res;
+
+   if (*size < resource_size(res_dma)) {
+   *dma_addr = res_dma->start - window->offset;
+   *paddr = res_dma->start;
+   *size = resource_size(res_dma);
+   }
+   }
+   }
+   pci_free_resource_list();
+
+   /*
+* return the largest possible size,
+* since PCI master allows everything.
+*/
+   if (*size == 0) {
+   pr_debug("empty/zero size dma-ranges found for node(%s)\n",
+   np->full_name);
+   *size = DMA_BIT_MASK(sizeof(dma_addr_t) * 8) - 1;
+   *dma_addr = *paddr = 0;
+   ret = 0;
+   }
+
+   pr_debug("dma_addr(%llx) cpu_addr(%llx) size(%llx)\n",
+*dma_addr, *paddr, *size);
+
+   of_node_put(node);
+
+   return ret;
+}
+
+static int get_dma_ranges(struct device_node *np, u64 *dma_addr,
+   u64 *paddr, u64 *size)
+{
+   struct device_node *node = of_node_get(np);
+   const __be32 *ranges = NULL;
+   int len, naddr, nsize, pna;
+   int ret = 0;
+   u64 dmaaddr;
+
+   if (!node)
+   return -EINVAL;
+
+   while (1) {
+   naddr = of_n_addr_cells(node);
+   nsize = of_n_size_cells(node);
+   node = of_get_next_parent(node);
+   if (!node)
+   break;
+
+   ranges = of_get_property(node, "dma-ranges", );
+
+   /* Ignore empty ranges, they imply no translation required */
+   if (ranges && len > 0)
+   break;
+
+   /*
+* At least empty ranges has to be defined for parent node if
+* DMA is supported
+*/
+   if (!ranges)
+   break;
+   }
+
+   if (!ranges) {
+   pr_debug("no dma-ranges found for node(%s)\n", np->full_name);
+   r

[PATCH v6 3/3] PCI/of fix of_dma_get_range; get PCI specific dma-ranges

2017-05-15 Thread Oza Pawandeep

current device framework and OF framework integration assumes
dma-ranges in a way where memory-mapped devices define their
dma-ranges. (child-bus-address, parent-bus-address, length).

of_dma_configure is specifically written to take care of memory
mapped devices. but no implementation exists for pci to take
care of pcie based memory ranges.

for e.g. iproc based SOCs and other SOCs(suc as rcar) have PCI
world dma-ranges.
dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;

this patch fixes the following problems of_dma_get_range.
1) return of wrong size as 0.
2) not handling absence of dma-ranges which is valid for PCI master.
3) not handling multipe inbound windows.
4) in order to get largest possible dma_mask. this patch also
retuns the largest possible size based on dma-ranges,

for e.g.
dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;
we should get dev->coherent_dma_mask=0x7f.

based on which IOVA allocation space will honour PCI host
bridge limitations.

the implementation hooks bus specific callbacks for getting
dma-ranges.

Signed-off-by: Oza Pawandeep 

diff --git a/drivers/of/address.c b/drivers/of/address.c
index 02b2903..800731c 100644
--- a/drivers/of/address.c
+++ b/drivers/of/address.c
@@ -6,6 +6,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -46,6 +47,8 @@ struct of_bus {
int na, int ns, int pna);
int (*translate)(__be32 *addr, u64 offset, int na);
unsigned int(*get_flags)(const __be32 *addr);
+   int (*get_dma_ranges)(struct device_node *np,
+ u64 *dma_addr, u64 *paddr, u64 *size);
 };
 
 /*
@@ -171,6 +174,144 @@ static int of_bus_pci_translate(__be32 *addr, u64 offset, 
int na)
 {
return of_bus_default_translate(addr + 1, offset, na - 1);
 }
+
+static int of_bus_pci_get_dma_ranges(struct device_node *np, u64 *dma_addr,
+u64 *paddr, u64 *size)
+{
+   struct device_node *node = of_node_get(np);
+   int ret = 0;
+   struct resource_entry *window;
+   LIST_HEAD(res);
+
+   if (!node)
+   return -EINVAL;
+
+   *size = 0;
+   /*
+* PCI dma-ranges is not mandatory property.
+* many devices do no need to have it, since
+* host bridge does not require inbound memory
+* configuration or rather have design limitations.
+* so we look for dma-ranges, if missing we
+* just return the caller full size, and also
+* no dma-ranges suggests that, host bridge allows
+* whatever comes in, so we set dma_addr to 0.
+*/
+   ret = of_pci_get_dma_ranges(np, );
+   if (!ret) {
+   resource_list_for_each_entry(window, ) {
+   struct resource *res_dma = window->res;
+
+   if (*size < resource_size(res_dma)) {
+   *dma_addr = res_dma->start - window->offset;
+   *paddr = res_dma->start;
+   *size = resource_size(res_dma);
+   }
+   }
+   }
+   pci_free_resource_list();
+
+   /*
+* return the largest possible size,
+* since PCI master allows everything.
+*/
+   if (*size == 0) {
+   pr_debug("empty/zero size dma-ranges found for node(%s)\n",
+   np->full_name);
+   *size = DMA_BIT_MASK(sizeof(dma_addr_t) * 8) - 1;
+   *dma_addr = *paddr = 0;
+   ret = 0;
+   }
+
+   pr_debug("dma_addr(%llx) cpu_addr(%llx) size(%llx)\n",
+*dma_addr, *paddr, *size);
+
+   of_node_put(node);
+
+   return ret;
+}
+
+static int get_dma_ranges(struct device_node *np, u64 *dma_addr,
+   u64 *paddr, u64 *size)
+{
+   struct device_node *node = of_node_get(np);
+   const __be32 *ranges = NULL;
+   int len, naddr, nsize, pna;
+   int ret = 0;
+   u64 dmaaddr;
+
+   if (!node)
+   return -EINVAL;
+
+   while (1) {
+   naddr = of_n_addr_cells(node);
+   nsize = of_n_size_cells(node);
+   node = of_get_next_parent(node);
+   if (!node)
+   break;
+
+   ranges = of_get_property(node, "dma-ranges", );
+
+   /* Ignore empty ranges, they imply no translation required */
+   if (ranges && len > 0)
+   break;
+
+   /*
+* At least empty ranges has to be defined for parent node if
+* DMA is supported
+*/
+   if (!ranges)
+   break;
+   }
+
+   if (!ranges) {
+   pr_debug("no dma-ranges found for node(%s)\n", np->full_name);
+   ret = -ENODEV;
+

[PATCH v5 3/3] PCI/of fix of_dma_get_range; get PCI specific dma-ranges

2017-05-05 Thread Oza Pawandeep

current device framework and OF framework integration assumes
dma-ranges in a way where memory-mapped devices define their
dma-ranges. (child-bus-address, parent-bus-address, length).

of_dma_configure is specifically written to take care of memory
mapped devices. but no implementation exists for pci to take
care of pcie based memory ranges.

for e.g. iproc based SOCs and other SOCs(suc as rcar) have PCI
world dma-ranges.
dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;

this patch fixes the bug in of_dma_get_range, which with as is,
parses the PCI memory ranges and return wrong size as 0.

in order to get largest possible dma_mask. this patch also
retuns the largest possible size based on dma-ranges,

for e.g.
dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;
we should get dev->coherent_dma_mask=0x7f.

based on which IOVA allocation space will honour PCI host
bridge limitations.

the implementation hooks bus specific callbacks for getting
dma-ranges.

Signed-off-by: Oza Pawandeep <oza@broadcom.com>

diff --git a/drivers/of/address.c b/drivers/of/address.c
index 02b2903..b43e347 100644
--- a/drivers/of/address.c
+++ b/drivers/of/address.c
@@ -6,6 +6,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -46,6 +47,8 @@ struct of_bus {
int na, int ns, int pna);
int (*translate)(__be32 *addr, u64 offset, int na);
unsigned int(*get_flags)(const __be32 *addr);
+   int (*get_dma_ranges)(struct device_node *np,
+ u64 *dma_addr, u64 *paddr, u64 *size);
 };
 
 /*
@@ -171,6 +174,144 @@ static int of_bus_pci_translate(__be32 *addr, u64 offset, 
int na)
 {
return of_bus_default_translate(addr + 1, offset, na - 1);
 }
+
+static int of_bus_pci_get_dma_ranges(struct device_node *np, u64 *dma_addr,
+u64 *paddr, u64 *size)
+{
+   struct device_node *node = of_node_get(np);
+   int ret = 0;
+   struct resource_entry *window;
+   LIST_HEAD(res);
+
+   if (!node)
+   return -EINVAL;
+
+   *size = 0;
+   /*
+* PCI dma-ranges is not mandatory property.
+* many devices do no need to have it, since
+* host bridge does not require inbound memory
+* configuration or rather have design limitations.
+* so we look for dma-ranges, if missing we
+* just return the caller full size, and also
+* no dma-ranges suggests that, host bridge allows
+* whatever comes in, so we set dma_addr to 0.
+*/
+   ret = of_pci_get_dma_ranges(np, );
+   if (!ret) {
+   resource_list_for_each_entry(window, ) {
+   struct resource *res_dma = window->res;
+
+   if (*size < resource_size(res_dma)) {
+   *dma_addr = res_dma->start - window->offset;
+   *paddr = res_dma->start;
+   *size = resource_size(res_dma);
+   }
+   }
+   }
+   pci_free_resource_list();
+
+   /*
+* return the largest possible size,
+* since PCI master allows everything.
+*/
+   if (*size == 0) {
+   pr_debug("empty/zero size dma-ranges found for node(%s)\n",
+   np->full_name);
+   *size = DMA_BIT_MASK(sizeof(dma_addr_t) * 8) - 1;
+   *dma_addr = *paddr = 0;
+   ret = 0;
+   }
+
+   pr_debug("dma_addr(%llx) cpu_addr(%llx) size(%llx)\n",
+*dma_addr, *paddr, *size);
+
+   of_node_put(node);
+
+   return ret;
+}
+
+static int get_dma_ranges(struct device_node *np, u64 *dma_addr,
+ u64 *paddr, u64 *size)
+{
+   struct device_node *node = of_node_get(np);
+   const __be32 *ranges = NULL;
+   int len, naddr, nsize, pna;
+   int ret = 0;
+   u64 dmaaddr;
+
+   if (!node)
+   return -EINVAL;
+
+   while (1) {
+   naddr = of_n_addr_cells(node);
+   nsize = of_n_size_cells(node);
+   node = of_get_next_parent(node);
+   if (!node)
+   break;
+
+   ranges = of_get_property(node, "dma-ranges", );
+
+   /* Ignore empty ranges, they imply no translation required */
+   if (ranges && len > 0)
+   break;
+
+   /*
+* At least empty ranges has to be defined for parent node if
+* DMA is supported
+*/
+   if (!ranges)
+   break;
+   }
+
+   if (!ranges) {
+   pr_debug("no dma-ranges found for node(%s)\n", np->full_name);
+   ret = -ENODEV;
+   goto out;
+   }
+
+   len /= sizeof(u32);
+
+

[PATCH v5 2/3] iommu/pci: reserve IOVA for PCI masters

2017-05-05 Thread Oza Pawandeep

this patch reserves the IOVA for PCI masters.
ARM64 based SOCs may have scattered memory banks.
such as iproc based SOC has

<0x 0x8000 0x0 0x8000>, /* 2G @ 2G */
<0x0008 0x8000 0x3 0x8000>, /* 14G @ 34G */
<0x0090 0x 0x4 0x>, /* 16G @ 576G */
<0x00a0 0x 0x4 0x>; /* 16G @ 640G */

but incoming PCI transcation addressing capability is limited
by host bridge, for example if max incoming window capability
is 512 GB, then 0x0090 and 0x00a0 will fall beyond it.

to address this problem, iommu has to avoid allocating IOVA which
are reserved. which inturn does not allocate IOVA if it falls into hole.

Signed-off-by: Oza Pawandeep <oza@broadcom.com>

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 48d36ce..08764b0 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -27,6 +27,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -171,8 +172,12 @@ static void iova_reserve_pci_windows(struct pci_dev *dev,
struct iova_domain *iovad)
 {
struct pci_host_bridge *bridge = pci_find_host_bridge(dev->bus);
+   struct device_node *np = bridge->dev.parent->of_node;
struct resource_entry *window;
unsigned long lo, hi;
+   int ret;
+   dma_addr_t tmp_dma_addr = 0, dma_addr;
+   LIST_HEAD(res);
 
resource_list_for_each_entry(window, >windows) {
if (resource_type(window->res) != IORESOURCE_MEM &&
@@ -183,6 +188,36 @@ static void iova_reserve_pci_windows(struct pci_dev *dev,
hi = iova_pfn(iovad, window->res->end - window->offset);
reserve_iova(iovad, lo, hi);
}
+
+   /* PCI inbound memory reservation. */
+   ret = of_pci_get_dma_ranges(np, );
+   if (!ret) {
+   resource_list_for_each_entry(window, ) {
+   struct resource *res_dma = window->res;
+
+   dma_addr = res_dma->start - window->offset;
+   if (tmp_dma_addr > dma_addr) {
+   pr_warn("PCI: failed to reserve iovas; ranges 
should be sorted\n");
+   return;
+   }
+   if (tmp_dma_addr != dma_addr) {
+   lo = iova_pfn(iovad, tmp_dma_addr);
+   hi = iova_pfn(iovad, dma_addr - 1);
+   reserve_iova(iovad, lo, hi);
+   }
+   tmp_dma_addr = window->res->end - window->offset;
+   }
+   /*
+* the last dma-range should honour based on the
+* 32/64-bit dma addresses.
+*/
+   if (tmp_dma_addr < DMA_BIT_MASK(sizeof(dma_addr_t) * 8)) {
+   lo = iova_pfn(iovad, tmp_dma_addr);
+   hi = iova_pfn(iovad,
+ DMA_BIT_MASK(sizeof(dma_addr_t) * 8) - 1);
+   reserve_iova(iovad, lo, hi);
+   }
+   }
 }
 
 /**
-- 
1.9.1

[PATCH v5 3/3] PCI/of fix of_dma_get_range; get PCI specific dma-ranges

2017-05-05 Thread Oza Pawandeep

current device framework and OF framework integration assumes
dma-ranges in a way where memory-mapped devices define their
dma-ranges. (child-bus-address, parent-bus-address, length).

of_dma_configure is specifically written to take care of memory
mapped devices. but no implementation exists for pci to take
care of pcie based memory ranges.

for e.g. iproc based SOCs and other SOCs(suc as rcar) have PCI
world dma-ranges.
dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;

this patch fixes the bug in of_dma_get_range, which with as is,
parses the PCI memory ranges and return wrong size as 0.

in order to get largest possible dma_mask. this patch also
retuns the largest possible size based on dma-ranges,

for e.g.
dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;
we should get dev->coherent_dma_mask=0x7f.

based on which IOVA allocation space will honour PCI host
bridge limitations.

the implementation hooks bus specific callbacks for getting
dma-ranges.

Signed-off-by: Oza Pawandeep 

diff --git a/drivers/of/address.c b/drivers/of/address.c
index 02b2903..b43e347 100644
--- a/drivers/of/address.c
+++ b/drivers/of/address.c
@@ -6,6 +6,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -46,6 +47,8 @@ struct of_bus {
int na, int ns, int pna);
int (*translate)(__be32 *addr, u64 offset, int na);
unsigned int(*get_flags)(const __be32 *addr);
+   int (*get_dma_ranges)(struct device_node *np,
+ u64 *dma_addr, u64 *paddr, u64 *size);
 };
 
 /*
@@ -171,6 +174,144 @@ static int of_bus_pci_translate(__be32 *addr, u64 offset, 
int na)
 {
return of_bus_default_translate(addr + 1, offset, na - 1);
 }
+
+static int of_bus_pci_get_dma_ranges(struct device_node *np, u64 *dma_addr,
+u64 *paddr, u64 *size)
+{
+   struct device_node *node = of_node_get(np);
+   int ret = 0;
+   struct resource_entry *window;
+   LIST_HEAD(res);
+
+   if (!node)
+   return -EINVAL;
+
+   *size = 0;
+   /*
+* PCI dma-ranges is not mandatory property.
+* many devices do no need to have it, since
+* host bridge does not require inbound memory
+* configuration or rather have design limitations.
+* so we look for dma-ranges, if missing we
+* just return the caller full size, and also
+* no dma-ranges suggests that, host bridge allows
+* whatever comes in, so we set dma_addr to 0.
+*/
+   ret = of_pci_get_dma_ranges(np, );
+   if (!ret) {
+   resource_list_for_each_entry(window, ) {
+   struct resource *res_dma = window->res;
+
+   if (*size < resource_size(res_dma)) {
+   *dma_addr = res_dma->start - window->offset;
+   *paddr = res_dma->start;
+   *size = resource_size(res_dma);
+   }
+   }
+   }
+   pci_free_resource_list();
+
+   /*
+* return the largest possible size,
+* since PCI master allows everything.
+*/
+   if (*size == 0) {
+   pr_debug("empty/zero size dma-ranges found for node(%s)\n",
+   np->full_name);
+   *size = DMA_BIT_MASK(sizeof(dma_addr_t) * 8) - 1;
+   *dma_addr = *paddr = 0;
+   ret = 0;
+   }
+
+   pr_debug("dma_addr(%llx) cpu_addr(%llx) size(%llx)\n",
+*dma_addr, *paddr, *size);
+
+   of_node_put(node);
+
+   return ret;
+}
+
+static int get_dma_ranges(struct device_node *np, u64 *dma_addr,
+ u64 *paddr, u64 *size)
+{
+   struct device_node *node = of_node_get(np);
+   const __be32 *ranges = NULL;
+   int len, naddr, nsize, pna;
+   int ret = 0;
+   u64 dmaaddr;
+
+   if (!node)
+   return -EINVAL;
+
+   while (1) {
+   naddr = of_n_addr_cells(node);
+   nsize = of_n_size_cells(node);
+   node = of_get_next_parent(node);
+   if (!node)
+   break;
+
+   ranges = of_get_property(node, "dma-ranges", );
+
+   /* Ignore empty ranges, they imply no translation required */
+   if (ranges && len > 0)
+   break;
+
+   /*
+* At least empty ranges has to be defined for parent node if
+* DMA is supported
+*/
+   if (!ranges)
+   break;
+   }
+
+   if (!ranges) {
+   pr_debug("no dma-ranges found for node(%s)\n", np->full_name);
+   ret = -ENODEV;
+   goto out;
+   }
+
+   len /= sizeof(u32);
+
+   pna = of_n_addr_cells(node

[PATCH v5 2/3] iommu/pci: reserve IOVA for PCI masters

2017-05-05 Thread Oza Pawandeep

this patch reserves the IOVA for PCI masters.
ARM64 based SOCs may have scattered memory banks.
such as iproc based SOC has

<0x 0x8000 0x0 0x8000>, /* 2G @ 2G */
<0x0008 0x8000 0x3 0x8000>, /* 14G @ 34G */
<0x0090 0x 0x4 0x>, /* 16G @ 576G */
<0x00a0 0x 0x4 0x>; /* 16G @ 640G */

but incoming PCI transcation addressing capability is limited
by host bridge, for example if max incoming window capability
is 512 GB, then 0x0090 and 0x00a0 will fall beyond it.

to address this problem, iommu has to avoid allocating IOVA which
are reserved. which inturn does not allocate IOVA if it falls into hole.

Signed-off-by: Oza Pawandeep 

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 48d36ce..08764b0 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -27,6 +27,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -171,8 +172,12 @@ static void iova_reserve_pci_windows(struct pci_dev *dev,
struct iova_domain *iovad)
 {
struct pci_host_bridge *bridge = pci_find_host_bridge(dev->bus);
+   struct device_node *np = bridge->dev.parent->of_node;
struct resource_entry *window;
unsigned long lo, hi;
+   int ret;
+   dma_addr_t tmp_dma_addr = 0, dma_addr;
+   LIST_HEAD(res);
 
resource_list_for_each_entry(window, >windows) {
if (resource_type(window->res) != IORESOURCE_MEM &&
@@ -183,6 +188,36 @@ static void iova_reserve_pci_windows(struct pci_dev *dev,
hi = iova_pfn(iovad, window->res->end - window->offset);
reserve_iova(iovad, lo, hi);
}
+
+   /* PCI inbound memory reservation. */
+   ret = of_pci_get_dma_ranges(np, );
+   if (!ret) {
+   resource_list_for_each_entry(window, ) {
+   struct resource *res_dma = window->res;
+
+   dma_addr = res_dma->start - window->offset;
+   if (tmp_dma_addr > dma_addr) {
+   pr_warn("PCI: failed to reserve iovas; ranges 
should be sorted\n");
+   return;
+   }
+   if (tmp_dma_addr != dma_addr) {
+   lo = iova_pfn(iovad, tmp_dma_addr);
+   hi = iova_pfn(iovad, dma_addr - 1);
+   reserve_iova(iovad, lo, hi);
+   }
+   tmp_dma_addr = window->res->end - window->offset;
+   }
+   /*
+* the last dma-range should honour based on the
+* 32/64-bit dma addresses.
+*/
+   if (tmp_dma_addr < DMA_BIT_MASK(sizeof(dma_addr_t) * 8)) {
+   lo = iova_pfn(iovad, tmp_dma_addr);
+   hi = iova_pfn(iovad,
+ DMA_BIT_MASK(sizeof(dma_addr_t) * 8) - 1);
+   reserve_iova(iovad, lo, hi);
+   }
+   }
 }
 
 /**
-- 
1.9.1

[PATCH v5 1/3] of/pci/dma: fix DMA configuration for PCI masters

2017-05-05 Thread Oza Pawandeep

current device framework and OF framework integration assumes
dma-ranges in a way where memory-mapped devices define their
dma-ranges. (child-bus-address, parent-bus-address, length).

of_dma_configure is specifically written to take care of memory
mapped devices. but no implementation exists for pci to take
care of pcie based memory ranges.

for e.g. iproc based SOCs and other SOCs(suc as rcar) have PCI
world dma-ranges.
dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;

this patch serves following:

1) exposes interface to the pci host driver for their
inbound memory ranges

2) provide an interface to callers such as of_dma_get_ranges.
so then the returned size get best possible (largest) dma_mask.
because PCI RC drivers do not call APIs such as
dma_set_coherent_mask() and hence rather it shows its addressing
capabilities based on dma-ranges.
for e.g.
dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;
we should get dev->coherent_dma_mask=0x7f.

3) this patch handles multiple inbound windows and dma-ranges.
it is left to the caller, how it wants to use them.
the new function returns the resources in a standard and unform way

4) this way the callers of for e.g. of_dma_get_ranges
does not need to change.

5) leaves scope of adding PCI flag handling for inbound memory
by the new function.

Signed-off-by: Oza Pawandeep <oza@broadcom.com>

diff --git a/drivers/of/of_pci.c b/drivers/of/of_pci.c
index 0ee42c3..ed6e69a 100644
--- a/drivers/of/of_pci.c
+++ b/drivers/of/of_pci.c
@@ -283,6 +283,83 @@ int of_pci_get_host_bridge_resources(struct device_node 
*dev,
return err;
 }
 EXPORT_SYMBOL_GPL(of_pci_get_host_bridge_resources);
+
+/**
+ * of_pci_get_dma_ranges - Parse PCI host bridge inbound resources from DT
+ * @np: device node of the host bridge having the dma-ranges property
+ * @resources: list where the range of resources will be added after DT parsing
+ *
+ * It is the caller's job to free the @resources list.
+ *
+ * This function will parse the "dma-ranges" property of a
+ * PCI host bridge device node and setup the resource mapping based
+ * on its content.
+ *
+ * It returns zero if the range parsing has been successful or a standard error
+ * value if it failed.
+ */
+
+int of_pci_get_dma_ranges(struct device_node *np, struct list_head *resources)
+{
+   struct device_node *node = of_node_get(np);
+   int rlen;
+   int ret = 0;
+   const int na = 3, ns = 2;
+   struct resource *res;
+   struct of_pci_range_parser parser;
+   struct of_pci_range range;
+
+   if (!node)
+   return -EINVAL;
+
+   parser.node = node;
+   parser.pna = of_n_addr_cells(node);
+   parser.np = parser.pna + na + ns;
+
+   parser.range = of_get_property(node, "dma-ranges", );
+
+   if (!parser.range) {
+   pr_debug("pcie device has no dma-ranges defined for node(%s)\n",
+ np->full_name);
+   ret = -EINVAL;
+   goto out;
+   }
+
+   parser.end = parser.range + rlen / sizeof(__be32);
+
+   for_each_of_pci_range(, ) {
+   /*
+* If we failed translation or got a zero-sized region
+* then skip this range
+*/
+   if (range.cpu_addr == OF_BAD_ADDR || range.size == 0)
+   continue;
+
+   res = kzalloc(sizeof(struct resource), GFP_KERNEL);
+   if (!res) {
+   ret = -ENOMEM;
+   goto parse_failed;
+   }
+
+   ret = of_pci_range_to_resource(, np, res);
+   if (ret) {
+   kfree(res);
+   continue;
+   }
+
+   pci_add_resource_offset(resources, res,
+   res->start - range.pci_addr);
+   }
+
+   return ret;
+
+parse_failed:
+   pci_free_resource_list(resources);
+out:
+   of_node_put(node);
+   return ret;
+}
+EXPORT_SYMBOL_GPL(of_pci_get_dma_ranges);
 #endif /* CONFIG_OF_ADDRESS */
 
 #ifdef CONFIG_PCI_MSI
diff --git a/include/linux/of_pci.h b/include/linux/of_pci.h
index 0e0974e..617b90d 100644
--- a/include/linux/of_pci.h
+++ b/include/linux/of_pci.h
@@ -76,6 +76,7 @@ static inline void of_pci_check_probe_only(void) { }
 int of_pci_get_host_bridge_resources(struct device_node *dev,
unsigned char busno, unsigned char bus_max,
struct list_head *resources, resource_size_t *io_base);
+int of_pci_get_dma_ranges(struct device_node *np, struct list_head *resources);
 #else
 static inline int of_pci_get_host_bridge_resources(struct device_node *dev,
unsigned char busno, unsigned char bus_max,
@@ -83,6 +84,12 @@ static inline int of_pci_get_host_bridge_resources(struct 
device_node *dev,
 {
return -EINVAL;
 }
+
+stati

[PATCH v5 1/3] of/pci/dma: fix DMA configuration for PCI masters

2017-05-05 Thread Oza Pawandeep

current device framework and OF framework integration assumes
dma-ranges in a way where memory-mapped devices define their
dma-ranges. (child-bus-address, parent-bus-address, length).

of_dma_configure is specifically written to take care of memory
mapped devices. but no implementation exists for pci to take
care of pcie based memory ranges.

for e.g. iproc based SOCs and other SOCs(suc as rcar) have PCI
world dma-ranges.
dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;

this patch serves following:

1) exposes interface to the pci host driver for their
inbound memory ranges

2) provide an interface to callers such as of_dma_get_ranges.
so then the returned size get best possible (largest) dma_mask.
because PCI RC drivers do not call APIs such as
dma_set_coherent_mask() and hence rather it shows its addressing
capabilities based on dma-ranges.
for e.g.
dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;
we should get dev->coherent_dma_mask=0x7f.

3) this patch handles multiple inbound windows and dma-ranges.
it is left to the caller, how it wants to use them.
the new function returns the resources in a standard and unform way

4) this way the callers of for e.g. of_dma_get_ranges
does not need to change.

5) leaves scope of adding PCI flag handling for inbound memory
by the new function.

Signed-off-by: Oza Pawandeep 

diff --git a/drivers/of/of_pci.c b/drivers/of/of_pci.c
index 0ee42c3..ed6e69a 100644
--- a/drivers/of/of_pci.c
+++ b/drivers/of/of_pci.c
@@ -283,6 +283,83 @@ int of_pci_get_host_bridge_resources(struct device_node 
*dev,
return err;
 }
 EXPORT_SYMBOL_GPL(of_pci_get_host_bridge_resources);
+
+/**
+ * of_pci_get_dma_ranges - Parse PCI host bridge inbound resources from DT
+ * @np: device node of the host bridge having the dma-ranges property
+ * @resources: list where the range of resources will be added after DT parsing
+ *
+ * It is the caller's job to free the @resources list.
+ *
+ * This function will parse the "dma-ranges" property of a
+ * PCI host bridge device node and setup the resource mapping based
+ * on its content.
+ *
+ * It returns zero if the range parsing has been successful or a standard error
+ * value if it failed.
+ */
+
+int of_pci_get_dma_ranges(struct device_node *np, struct list_head *resources)
+{
+   struct device_node *node = of_node_get(np);
+   int rlen;
+   int ret = 0;
+   const int na = 3, ns = 2;
+   struct resource *res;
+   struct of_pci_range_parser parser;
+   struct of_pci_range range;
+
+   if (!node)
+   return -EINVAL;
+
+   parser.node = node;
+   parser.pna = of_n_addr_cells(node);
+   parser.np = parser.pna + na + ns;
+
+   parser.range = of_get_property(node, "dma-ranges", );
+
+   if (!parser.range) {
+   pr_debug("pcie device has no dma-ranges defined for node(%s)\n",
+ np->full_name);
+   ret = -EINVAL;
+   goto out;
+   }
+
+   parser.end = parser.range + rlen / sizeof(__be32);
+
+   for_each_of_pci_range(, ) {
+   /*
+* If we failed translation or got a zero-sized region
+* then skip this range
+*/
+   if (range.cpu_addr == OF_BAD_ADDR || range.size == 0)
+   continue;
+
+   res = kzalloc(sizeof(struct resource), GFP_KERNEL);
+   if (!res) {
+   ret = -ENOMEM;
+   goto parse_failed;
+   }
+
+   ret = of_pci_range_to_resource(, np, res);
+   if (ret) {
+   kfree(res);
+   continue;
+   }
+
+   pci_add_resource_offset(resources, res,
+   res->start - range.pci_addr);
+   }
+
+   return ret;
+
+parse_failed:
+   pci_free_resource_list(resources);
+out:
+   of_node_put(node);
+   return ret;
+}
+EXPORT_SYMBOL_GPL(of_pci_get_dma_ranges);
 #endif /* CONFIG_OF_ADDRESS */
 
 #ifdef CONFIG_PCI_MSI
diff --git a/include/linux/of_pci.h b/include/linux/of_pci.h
index 0e0974e..617b90d 100644
--- a/include/linux/of_pci.h
+++ b/include/linux/of_pci.h
@@ -76,6 +76,7 @@ static inline void of_pci_check_probe_only(void) { }
 int of_pci_get_host_bridge_resources(struct device_node *dev,
unsigned char busno, unsigned char bus_max,
struct list_head *resources, resource_size_t *io_base);
+int of_pci_get_dma_ranges(struct device_node *np, struct list_head *resources);
 #else
 static inline int of_pci_get_host_bridge_resources(struct device_node *dev,
unsigned char busno, unsigned char bus_max,
@@ -83,6 +84,12 @@ static inline int of_pci_get_host_bridge_resources(struct 
device_node *dev,
 {
return -EINVAL;
 }
+
+static inli

[PATCH v5 0/3] OF/PCI address PCI inbound memory limitations

2017-05-05 Thread Oza Pawandeep

It is possible that PCI device supports 64-bit DMA addressing,
and thus it's driver sets device's dma_mask to DMA_BIT_MASK(64),
however PCI host bridge may have limitations on the inbound 
transaction addressing.

This is particularly problematic on ARM/ARM64 SOCs where the 
IOMMU (i.e. SMMU) translates IOVA to PA for in-bound transactions
only after PCI Host has forwarded these transactions on SOC IO bus. 

This means, on such ARM/ARM64 SOCs the IOVA of in-bound transactions
has to honor the addressing restrictions of the PCI Host.

Current device framework and OF framework integration assumes
dma-ranges in a way where memory-mapped devices define their
dma-ranges. (child-bus-address, parent-bus-address, length).

of_dma_configure is specifically written to take care of memory
mapped devices but no implementation exists for pci devices.

For e.g. iproc based SOCs and other SOCs (such as rcar) have
PCI world dma-ranges.
dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;

This patchset
reserves the IOVA ranges for PCI masters based
on the PCI world dma-ranges.
fix of_dma_get_range to cater to PCI dma-ranges.
fix of_dma_get_range which currently returns size 0 for PCI devices.

IOVA allocation patch:
[PATCH 2/3] iommu/pci: reserve iova for PCI masters

Fix of_dma_get_range bug and address PCI master.
[PATCH 3/3] PCI/of fix of_dma_get_range; get PCI specific

Base patch for both of the above patches:
[PATCH 1/3] of/pci/dma: fix DMA configuration for PCI masters

Changes since v4:
- something wrong with my mail client: not all teh patch went out, attempting 
again:
Changes since v3:
- minor change, redudant checkes removed
Changes since v2:
- removed internal review
Changes since v1:
- Remove internal GERRIT details from patch descriptions
- address Rob's comments.
- Add a get_dma_ranges() function to of_bus struct..
- Convert existing contents of this function to of_bus_default_dma_get_ranges
  and adding that to the default of_bus struct.
- Make of_dma_get_range call of_bus_match() and then bus->get_dma_ranges.
- no revison for [PATCH 2/3] iommu/pci: reserve iova for PCI masters; since 
under discussion with Robin

Oza Pawandeep (3):
  of/pci/dma: fix DMA configuration for PCI masters
  iommu/pci: reserve IOVA for PCI masters
  PCI/of fix of_dma_get_range; get PCI specific dma-ranges

 drivers/iommu/dma-iommu.c |  35 
 drivers/of/address.c  | 216 --
 drivers/of/of_pci.c   |  77 +
 include/linux/of_pci.h|   7 ++
 4 files changed, 272 insertions(+), 63 deletions(-)

-- 
1.9.1

[PATCH v5 0/3] OF/PCI address PCI inbound memory limitations

2017-05-05 Thread Oza Pawandeep

It is possible that PCI device supports 64-bit DMA addressing,
and thus it's driver sets device's dma_mask to DMA_BIT_MASK(64),
however PCI host bridge may have limitations on the inbound 
transaction addressing.

This is particularly problematic on ARM/ARM64 SOCs where the 
IOMMU (i.e. SMMU) translates IOVA to PA for in-bound transactions
only after PCI Host has forwarded these transactions on SOC IO bus. 

This means, on such ARM/ARM64 SOCs the IOVA of in-bound transactions
has to honor the addressing restrictions of the PCI Host.

Current device framework and OF framework integration assumes
dma-ranges in a way where memory-mapped devices define their
dma-ranges. (child-bus-address, parent-bus-address, length).

of_dma_configure is specifically written to take care of memory
mapped devices but no implementation exists for pci devices.

For e.g. iproc based SOCs and other SOCs (such as rcar) have
PCI world dma-ranges.
dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;

This patchset
reserves the IOVA ranges for PCI masters based
on the PCI world dma-ranges.
fix of_dma_get_range to cater to PCI dma-ranges.
fix of_dma_get_range which currently returns size 0 for PCI devices.

IOVA allocation patch:
[PATCH 2/3] iommu/pci: reserve iova for PCI masters

Fix of_dma_get_range bug and address PCI master.
[PATCH 3/3] PCI/of fix of_dma_get_range; get PCI specific

Base patch for both of the above patches:
[PATCH 1/3] of/pci/dma: fix DMA configuration for PCI masters

Changes since v4:
- something wrong with my mail client: not all teh patch went out, attempting 
again:
Changes since v3:
- minor change, redudant checkes removed
Changes since v2:
- removed internal review
Changes since v1:
- Remove internal GERRIT details from patch descriptions
- address Rob's comments.
- Add a get_dma_ranges() function to of_bus struct..
- Convert existing contents of this function to of_bus_default_dma_get_ranges
  and adding that to the default of_bus struct.
- Make of_dma_get_range call of_bus_match() and then bus->get_dma_ranges.
- no revison for [PATCH 2/3] iommu/pci: reserve iova for PCI masters; since 
under discussion with Robin

Oza Pawandeep (3):
  of/pci/dma: fix DMA configuration for PCI masters
  iommu/pci: reserve IOVA for PCI masters
  PCI/of fix of_dma_get_range; get PCI specific dma-ranges

 drivers/iommu/dma-iommu.c |  35 
 drivers/of/address.c  | 216 --
 drivers/of/of_pci.c   |  77 +
 include/linux/of_pci.h|   7 ++
 4 files changed, 272 insertions(+), 63 deletions(-)

-- 
1.9.1

[PATCH v4 0/3] OF/PCI address PCI inbound memory limitations

2017-05-05 Thread Oza Pawandeep

It is possible that PCI device supports 64-bit DMA addressing,
and thus it's driver sets device's dma_mask to DMA_BIT_MASK(64),
however PCI host bridge may have limitations on the inbound 
transaction addressing.

This is particularly problematic on ARM/ARM64 SOCs where the 
IOMMU (i.e. SMMU) translates IOVA to PA for in-bound transactions
only after PCI Host has forwarded these transactions on SOC IO bus. 

This means, on such ARM/ARM64 SOCs the IOVA of in-bound transactions
has to honor the addressing restrictions of the PCI Host.

Current device framework and OF framework integration assumes
dma-ranges in a way where memory-mapped devices define their
dma-ranges. (child-bus-address, parent-bus-address, length).

of_dma_configure is specifically written to take care of memory
mapped devices but no implementation exists for pci devices.

For e.g. iproc based SOCs and other SOCs (such as rcar) have
PCI world dma-ranges.
dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;

This patchset
reserves the IOVA ranges for PCI masters based
on the PCI world dma-ranges.
fix of_dma_get_range to cater to PCI dma-ranges.
fix of_dma_get_range which currently returns size 0 for PCI devices.

IOVA allocation patch:
[PATCH 2/3] iommu/pci: reserve iova for PCI masters

Fix of_dma_get_range bug and address PCI master.
[PATCH 3/3] PCI/of fix of_dma_get_range; get PCI specific

Base patch for both of the above patches:
[PATCH 1/3] of/pci/dma: fix DMA configuration for PCI masters

Changes since v3:
- redudant check removed
- no revison for [PATCH 2/3] iommu/pci: reserve iova for PCI masters; since 
under discussion with Robin
Changes since v2:
- internal review names removed.
- no revison for [PATCH 2/3] iommu/pci: reserve iova for PCI masters; since 
under discussion with Robin
Changes since v1:
- Remove internal GERRIT details from patch descriptions
- address Rob's comments.
- Add a get_dma_ranges() function to of_bus struct..
- Convert existing contents of this function to of_bus_default_dma_get_ranges
  and adding that to the default of_bus struct.
- Make of_dma_get_range call of_bus_match() and then bus->get_dma_ranges.
- no revison for [PATCH 2/3] iommu/pci: reserve iova for PCI masters; since 
under discussion with Robin
  
Oza Pawandeep (3):
  of/pci/dma: fix DMA configuration for PCI masters
  iommu/pci: reserve iova for PCI masters
  PCI/of fix of_dma_get_range; get PCI specific dma-ranges

 drivers/iommu/dma-iommu.c | 35 +
 drivers/of/address.c  | 52 
 drivers/of/of_pci.c   | 77 +++
 include/linux/of_pci.h|  7 +
 4 files changed, 171 insertions(+)

-- 
1.9.1

[PATCH v4 0/3] OF/PCI address PCI inbound memory limitations

2017-05-05 Thread Oza Pawandeep

It is possible that PCI device supports 64-bit DMA addressing,
and thus it's driver sets device's dma_mask to DMA_BIT_MASK(64),
however PCI host bridge may have limitations on the inbound 
transaction addressing.

This is particularly problematic on ARM/ARM64 SOCs where the 
IOMMU (i.e. SMMU) translates IOVA to PA for in-bound transactions
only after PCI Host has forwarded these transactions on SOC IO bus. 

This means, on such ARM/ARM64 SOCs the IOVA of in-bound transactions
has to honor the addressing restrictions of the PCI Host.

Current device framework and OF framework integration assumes
dma-ranges in a way where memory-mapped devices define their
dma-ranges. (child-bus-address, parent-bus-address, length).

of_dma_configure is specifically written to take care of memory
mapped devices but no implementation exists for pci devices.

For e.g. iproc based SOCs and other SOCs (such as rcar) have
PCI world dma-ranges.
dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;

This patchset
reserves the IOVA ranges for PCI masters based
on the PCI world dma-ranges.
fix of_dma_get_range to cater to PCI dma-ranges.
fix of_dma_get_range which currently returns size 0 for PCI devices.

IOVA allocation patch:
[PATCH 2/3] iommu/pci: reserve iova for PCI masters

Fix of_dma_get_range bug and address PCI master.
[PATCH 3/3] PCI/of fix of_dma_get_range; get PCI specific

Base patch for both of the above patches:
[PATCH 1/3] of/pci/dma: fix DMA configuration for PCI masters

Changes since v3:
- redudant check removed
- no revison for [PATCH 2/3] iommu/pci: reserve iova for PCI masters; since 
under discussion with Robin
Changes since v2:
- internal review names removed.
- no revison for [PATCH 2/3] iommu/pci: reserve iova for PCI masters; since 
under discussion with Robin
Changes since v1:
- Remove internal GERRIT details from patch descriptions
- address Rob's comments.
- Add a get_dma_ranges() function to of_bus struct..
- Convert existing contents of this function to of_bus_default_dma_get_ranges
  and adding that to the default of_bus struct.
- Make of_dma_get_range call of_bus_match() and then bus->get_dma_ranges.
- no revison for [PATCH 2/3] iommu/pci: reserve iova for PCI masters; since 
under discussion with Robin
  
Oza Pawandeep (3):
  of/pci/dma: fix DMA configuration for PCI masters
  iommu/pci: reserve iova for PCI masters
  PCI/of fix of_dma_get_range; get PCI specific dma-ranges

 drivers/iommu/dma-iommu.c | 35 +
 drivers/of/address.c  | 52 
 drivers/of/of_pci.c   | 77 +++
 include/linux/of_pci.h|  7 +
 4 files changed, 171 insertions(+)

-- 
1.9.1

[PATCH v4 3/3] PCI/of fix of_dma_get_range; get PCI specific dma-ranges

2017-05-05 Thread Oza Pawandeep

current device framework and OF framework integration assumes
dma-ranges in a way where memory-mapped devices define their
dma-ranges. (child-bus-address, parent-bus-address, length).

of_dma_configure is specifically written to take care of memory
mapped devices. but no implementation exists for pci to take
care of pcie based memory ranges.

for e.g. iproc based SOCs and other SOCs(suc as rcar) have PCI
world dma-ranges.
dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;

this patch fixes the bug in of_dma_get_range, which with as is,
parses the PCI memory ranges and return wrong size as 0.

in order to get largest possible dma_mask. this patch also
retuns the largest possible size based on dma-ranges,

for e.g.
dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;
we should get dev->coherent_dma_mask=0x7f.

based on which IOVA allocation space will honour PCI host
bridge limitations.

the implementation hooks bus specific callbacks for getting
dma-ranges.

Signed-off-by: Oza Pawandeep <oza@broadcom.com>

diff --git a/drivers/of/address.c b/drivers/of/address.c
index 02b2903..b43e347 100644
--- a/drivers/of/address.c
+++ b/drivers/of/address.c
@@ -6,6 +6,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -46,6 +47,8 @@ struct of_bus {
int na, int ns, int pna);
int (*translate)(__be32 *addr, u64 offset, int na);
unsigned int(*get_flags)(const __be32 *addr);
+   int (*get_dma_ranges)(struct device_node *np,
+ u64 *dma_addr, u64 *paddr, u64 *size);
 };
 
 /*
@@ -171,6 +174,144 @@ static int of_bus_pci_translate(__be32 *addr, u64 offset, 
int na)
 {
return of_bus_default_translate(addr + 1, offset, na - 1);
 }
+
+static int of_bus_pci_get_dma_ranges(struct device_node *np, u64 *dma_addr,
+u64 *paddr, u64 *size)
+{
+   struct device_node *node = of_node_get(np);
+   int ret = 0;
+   struct resource_entry *window;
+   LIST_HEAD(res);
+
+   if (!node)
+   return -EINVAL;
+
+   *size = 0;
+   /*
+* PCI dma-ranges is not mandatory property.
+* many devices do no need to have it, since
+* host bridge does not require inbound memory
+* configuration or rather have design limitations.
+* so we look for dma-ranges, if missing we
+* just return the caller full size, and also
+* no dma-ranges suggests that, host bridge allows
+* whatever comes in, so we set dma_addr to 0.
+*/
+   ret = of_pci_get_dma_ranges(np, );
+   if (!ret) {
+   resource_list_for_each_entry(window, ) {
+   struct resource *res_dma = window->res;
+
+   if (*size < resource_size(res_dma)) {
+   *dma_addr = res_dma->start - window->offset;
+   *paddr = res_dma->start;
+   *size = resource_size(res_dma);
+   }
+   }
+   }
+   pci_free_resource_list();
+
+   /*
+* return the largest possible size,
+* since PCI master allows everything.
+*/
+   if (*size == 0) {
+   pr_debug("empty/zero size dma-ranges found for node(%s)\n",
+   np->full_name);
+   *size = DMA_BIT_MASK(sizeof(dma_addr_t) * 8) - 1;
+   *dma_addr = *paddr = 0;
+   ret = 0;
+   }
+
+   pr_debug("dma_addr(%llx) cpu_addr(%llx) size(%llx)\n",
+*dma_addr, *paddr, *size);
+
+   of_node_put(node);
+
+   return ret;
+}
+
+static int get_dma_ranges(struct device_node *np, u64 *dma_addr,
+ u64 *paddr, u64 *size)
+{
+   struct device_node *node = of_node_get(np);
+   const __be32 *ranges = NULL;
+   int len, naddr, nsize, pna;
+   int ret = 0;
+   u64 dmaaddr;
+
+   if (!node)
+   return -EINVAL;
+
+   while (1) {
+   naddr = of_n_addr_cells(node);
+   nsize = of_n_size_cells(node);
+   node = of_get_next_parent(node);
+   if (!node)
+   break;
+
+   ranges = of_get_property(node, "dma-ranges", );
+
+   /* Ignore empty ranges, they imply no translation required */
+   if (ranges && len > 0)
+   break;
+
+   /*
+* At least empty ranges has to be defined for parent node if
+* DMA is supported
+*/
+   if (!ranges)
+   break;
+   }
+
+   if (!ranges) {
+   pr_debug("no dma-ranges found for node(%s)\n", np->full_name);
+   ret = -ENODEV;
+   goto out;
+   }
+
+   len /= sizeof(u32);
+
+

[PATCH v4 2/3] iommu/pci: reserve IOVA for PCI masters

2017-05-05 Thread Oza Pawandeep

this patch reserves the IOVA for PCI masters.
ARM64 based SOCs may have scattered memory banks.
such as iproc based SOC has

<0x 0x8000 0x0 0x8000>, /* 2G @ 2G */
<0x0008 0x8000 0x3 0x8000>, /* 14G @ 34G */
<0x0090 0x 0x4 0x>, /* 16G @ 576G */
<0x00a0 0x 0x4 0x>; /* 16G @ 640G */

but incoming PCI transcation addressing capability is limited
by host bridge, for example if max incoming window capability
is 512 GB, then 0x0090 and 0x00a0 will fall beyond it.

to address this problem, iommu has to avoid allocating IOVA which
are reserved. which inturn does not allocate IOVA if it falls into hole.

Signed-off-by: Oza Pawandeep <oza@broadcom.com>

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 48d36ce..08764b0 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -27,6 +27,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -171,8 +172,12 @@ static void iova_reserve_pci_windows(struct pci_dev *dev,
struct iova_domain *iovad)
 {
struct pci_host_bridge *bridge = pci_find_host_bridge(dev->bus);
+   struct device_node *np = bridge->dev.parent->of_node;
struct resource_entry *window;
unsigned long lo, hi;
+   int ret;
+   dma_addr_t tmp_dma_addr = 0, dma_addr;
+   LIST_HEAD(res);
 
resource_list_for_each_entry(window, >windows) {
if (resource_type(window->res) != IORESOURCE_MEM &&
@@ -183,6 +188,36 @@ static void iova_reserve_pci_windows(struct pci_dev *dev,
hi = iova_pfn(iovad, window->res->end - window->offset);
reserve_iova(iovad, lo, hi);
}
+
+   /* PCI inbound memory reservation. */
+   ret = of_pci_get_dma_ranges(np, );
+   if (!ret) {
+   resource_list_for_each_entry(window, ) {
+   struct resource *res_dma = window->res;
+
+   dma_addr = res_dma->start - window->offset;
+   if (tmp_dma_addr > dma_addr) {
+   pr_warn("PCI: failed to reserve iovas; ranges 
should be sorted\n");
+   return;
+   }
+   if (tmp_dma_addr != dma_addr) {
+   lo = iova_pfn(iovad, tmp_dma_addr);
+   hi = iova_pfn(iovad, dma_addr - 1);
+   reserve_iova(iovad, lo, hi);
+   }
+   tmp_dma_addr = window->res->end - window->offset;
+   }
+   /*
+* the last dma-range should honour based on the
+* 32/64-bit dma addresses.
+*/
+   if (tmp_dma_addr < DMA_BIT_MASK(sizeof(dma_addr_t) * 8)) {
+   lo = iova_pfn(iovad, tmp_dma_addr);
+   hi = iova_pfn(iovad,
+ DMA_BIT_MASK(sizeof(dma_addr_t) * 8) - 1);
+   reserve_iova(iovad, lo, hi);
+   }
+   }
 }
 
 /**
-- 
1.9.1

[PATCH v4 3/3] PCI/of fix of_dma_get_range; get PCI specific dma-ranges

2017-05-05 Thread Oza Pawandeep

current device framework and OF framework integration assumes
dma-ranges in a way where memory-mapped devices define their
dma-ranges. (child-bus-address, parent-bus-address, length).

of_dma_configure is specifically written to take care of memory
mapped devices. but no implementation exists for pci to take
care of pcie based memory ranges.

for e.g. iproc based SOCs and other SOCs(suc as rcar) have PCI
world dma-ranges.
dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;

this patch fixes the bug in of_dma_get_range, which with as is,
parses the PCI memory ranges and return wrong size as 0.

in order to get largest possible dma_mask. this patch also
retuns the largest possible size based on dma-ranges,

for e.g.
dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;
we should get dev->coherent_dma_mask=0x7f.

based on which IOVA allocation space will honour PCI host
bridge limitations.

the implementation hooks bus specific callbacks for getting
dma-ranges.

Signed-off-by: Oza Pawandeep 

diff --git a/drivers/of/address.c b/drivers/of/address.c
index 02b2903..b43e347 100644
--- a/drivers/of/address.c
+++ b/drivers/of/address.c
@@ -6,6 +6,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -46,6 +47,8 @@ struct of_bus {
int na, int ns, int pna);
int (*translate)(__be32 *addr, u64 offset, int na);
unsigned int(*get_flags)(const __be32 *addr);
+   int (*get_dma_ranges)(struct device_node *np,
+ u64 *dma_addr, u64 *paddr, u64 *size);
 };
 
 /*
@@ -171,6 +174,144 @@ static int of_bus_pci_translate(__be32 *addr, u64 offset, 
int na)
 {
return of_bus_default_translate(addr + 1, offset, na - 1);
 }
+
+static int of_bus_pci_get_dma_ranges(struct device_node *np, u64 *dma_addr,
+u64 *paddr, u64 *size)
+{
+   struct device_node *node = of_node_get(np);
+   int ret = 0;
+   struct resource_entry *window;
+   LIST_HEAD(res);
+
+   if (!node)
+   return -EINVAL;
+
+   *size = 0;
+   /*
+* PCI dma-ranges is not mandatory property.
+* many devices do no need to have it, since
+* host bridge does not require inbound memory
+* configuration or rather have design limitations.
+* so we look for dma-ranges, if missing we
+* just return the caller full size, and also
+* no dma-ranges suggests that, host bridge allows
+* whatever comes in, so we set dma_addr to 0.
+*/
+   ret = of_pci_get_dma_ranges(np, );
+   if (!ret) {
+   resource_list_for_each_entry(window, ) {
+   struct resource *res_dma = window->res;
+
+   if (*size < resource_size(res_dma)) {
+   *dma_addr = res_dma->start - window->offset;
+   *paddr = res_dma->start;
+   *size = resource_size(res_dma);
+   }
+   }
+   }
+   pci_free_resource_list();
+
+   /*
+* return the largest possible size,
+* since PCI master allows everything.
+*/
+   if (*size == 0) {
+   pr_debug("empty/zero size dma-ranges found for node(%s)\n",
+   np->full_name);
+   *size = DMA_BIT_MASK(sizeof(dma_addr_t) * 8) - 1;
+   *dma_addr = *paddr = 0;
+   ret = 0;
+   }
+
+   pr_debug("dma_addr(%llx) cpu_addr(%llx) size(%llx)\n",
+*dma_addr, *paddr, *size);
+
+   of_node_put(node);
+
+   return ret;
+}
+
+static int get_dma_ranges(struct device_node *np, u64 *dma_addr,
+ u64 *paddr, u64 *size)
+{
+   struct device_node *node = of_node_get(np);
+   const __be32 *ranges = NULL;
+   int len, naddr, nsize, pna;
+   int ret = 0;
+   u64 dmaaddr;
+
+   if (!node)
+   return -EINVAL;
+
+   while (1) {
+   naddr = of_n_addr_cells(node);
+   nsize = of_n_size_cells(node);
+   node = of_get_next_parent(node);
+   if (!node)
+   break;
+
+   ranges = of_get_property(node, "dma-ranges", );
+
+   /* Ignore empty ranges, they imply no translation required */
+   if (ranges && len > 0)
+   break;
+
+   /*
+* At least empty ranges has to be defined for parent node if
+* DMA is supported
+*/
+   if (!ranges)
+   break;
+   }
+
+   if (!ranges) {
+   pr_debug("no dma-ranges found for node(%s)\n", np->full_name);
+   ret = -ENODEV;
+   goto out;
+   }
+
+   len /= sizeof(u32);
+
+   pna = of_n_addr_cells(node

[PATCH v4 2/3] iommu/pci: reserve IOVA for PCI masters

2017-05-05 Thread Oza Pawandeep

this patch reserves the IOVA for PCI masters.
ARM64 based SOCs may have scattered memory banks.
such as iproc based SOC has

<0x 0x8000 0x0 0x8000>, /* 2G @ 2G */
<0x0008 0x8000 0x3 0x8000>, /* 14G @ 34G */
<0x0090 0x 0x4 0x>, /* 16G @ 576G */
<0x00a0 0x 0x4 0x>; /* 16G @ 640G */

but incoming PCI transcation addressing capability is limited
by host bridge, for example if max incoming window capability
is 512 GB, then 0x0090 and 0x00a0 will fall beyond it.

to address this problem, iommu has to avoid allocating IOVA which
are reserved. which inturn does not allocate IOVA if it falls into hole.

Signed-off-by: Oza Pawandeep 

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 48d36ce..08764b0 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -27,6 +27,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -171,8 +172,12 @@ static void iova_reserve_pci_windows(struct pci_dev *dev,
struct iova_domain *iovad)
 {
struct pci_host_bridge *bridge = pci_find_host_bridge(dev->bus);
+   struct device_node *np = bridge->dev.parent->of_node;
struct resource_entry *window;
unsigned long lo, hi;
+   int ret;
+   dma_addr_t tmp_dma_addr = 0, dma_addr;
+   LIST_HEAD(res);
 
resource_list_for_each_entry(window, >windows) {
if (resource_type(window->res) != IORESOURCE_MEM &&
@@ -183,6 +188,36 @@ static void iova_reserve_pci_windows(struct pci_dev *dev,
hi = iova_pfn(iovad, window->res->end - window->offset);
reserve_iova(iovad, lo, hi);
}
+
+   /* PCI inbound memory reservation. */
+   ret = of_pci_get_dma_ranges(np, );
+   if (!ret) {
+   resource_list_for_each_entry(window, ) {
+   struct resource *res_dma = window->res;
+
+   dma_addr = res_dma->start - window->offset;
+   if (tmp_dma_addr > dma_addr) {
+   pr_warn("PCI: failed to reserve iovas; ranges 
should be sorted\n");
+   return;
+   }
+   if (tmp_dma_addr != dma_addr) {
+   lo = iova_pfn(iovad, tmp_dma_addr);
+   hi = iova_pfn(iovad, dma_addr - 1);
+   reserve_iova(iovad, lo, hi);
+   }
+   tmp_dma_addr = window->res->end - window->offset;
+   }
+   /*
+* the last dma-range should honour based on the
+* 32/64-bit dma addresses.
+*/
+   if (tmp_dma_addr < DMA_BIT_MASK(sizeof(dma_addr_t) * 8)) {
+   lo = iova_pfn(iovad, tmp_dma_addr);
+   hi = iova_pfn(iovad,
+ DMA_BIT_MASK(sizeof(dma_addr_t) * 8) - 1);
+   reserve_iova(iovad, lo, hi);
+   }
+   }
 }
 
 /**
-- 
1.9.1

[PATCH v4 1/3] of/pci/dma: fix DMA configuration for PCI masters

2017-05-05 Thread Oza Pawandeep

current device framework and OF framework integration assumes
dma-ranges in a way where memory-mapped devices define their
dma-ranges. (child-bus-address, parent-bus-address, length).

of_dma_configure is specifically written to take care of memory
mapped devices. but no implementation exists for pci to take
care of pcie based memory ranges.

for e.g. iproc based SOCs and other SOCs(suc as rcar) have PCI
world dma-ranges.
dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;

this patch serves following:

1) exposes interface to the pci host driver for their
inbound memory ranges

2) provide an interface to callers such as of_dma_get_ranges.
so then the returned size get best possible (largest) dma_mask.
because PCI RC drivers do not call APIs such as
dma_set_coherent_mask() and hence rather it shows its addressing
capabilities based on dma-ranges.
for e.g.
dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;
we should get dev->coherent_dma_mask=0x7f.

3) this patch handles multiple inbound windows and dma-ranges.
it is left to the caller, how it wants to use them.
the new function returns the resources in a standard and unform way

4) this way the callers of for e.g. of_dma_get_ranges
does not need to change.

5) leaves scope of adding PCI flag handling for inbound memory
by the new function.

Signed-off-by: Oza Pawandeep <oza@broadcom.com>

diff --git a/drivers/of/of_pci.c b/drivers/of/of_pci.c
index 0ee42c3..ed6e69a 100644
--- a/drivers/of/of_pci.c
+++ b/drivers/of/of_pci.c
@@ -283,6 +283,83 @@ int of_pci_get_host_bridge_resources(struct device_node 
*dev,
return err;
 }
 EXPORT_SYMBOL_GPL(of_pci_get_host_bridge_resources);
+
+/**
+ * of_pci_get_dma_ranges - Parse PCI host bridge inbound resources from DT
+ * @np: device node of the host bridge having the dma-ranges property
+ * @resources: list where the range of resources will be added after DT parsing
+ *
+ * It is the caller's job to free the @resources list.
+ *
+ * This function will parse the "dma-ranges" property of a
+ * PCI host bridge device node and setup the resource mapping based
+ * on its content.
+ *
+ * It returns zero if the range parsing has been successful or a standard error
+ * value if it failed.
+ */
+
+int of_pci_get_dma_ranges(struct device_node *np, struct list_head *resources)
+{
+   struct device_node *node = of_node_get(np);
+   int rlen;
+   int ret = 0;
+   const int na = 3, ns = 2;
+   struct resource *res;
+   struct of_pci_range_parser parser;
+   struct of_pci_range range;
+
+   if (!node)
+   return -EINVAL;
+
+   parser.node = node;
+   parser.pna = of_n_addr_cells(node);
+   parser.np = parser.pna + na + ns;
+
+   parser.range = of_get_property(node, "dma-ranges", );
+
+   if (!parser.range) {
+   pr_debug("pcie device has no dma-ranges defined for node(%s)\n",
+ np->full_name);
+   ret = -EINVAL;
+   goto out;
+   }
+
+   parser.end = parser.range + rlen / sizeof(__be32);
+
+   for_each_of_pci_range(, ) {
+   /*
+* If we failed translation or got a zero-sized region
+* then skip this range
+*/
+   if (range.cpu_addr == OF_BAD_ADDR || range.size == 0)
+   continue;
+
+   res = kzalloc(sizeof(struct resource), GFP_KERNEL);
+   if (!res) {
+   ret = -ENOMEM;
+   goto parse_failed;
+   }
+
+   ret = of_pci_range_to_resource(, np, res);
+   if (ret) {
+   kfree(res);
+   continue;
+   }
+
+   pci_add_resource_offset(resources, res,
+   res->start - range.pci_addr);
+   }
+
+   return ret;
+
+parse_failed:
+   pci_free_resource_list(resources);
+out:
+   of_node_put(node);
+   return ret;
+}
+EXPORT_SYMBOL_GPL(of_pci_get_dma_ranges);
 #endif /* CONFIG_OF_ADDRESS */
 
 #ifdef CONFIG_PCI_MSI
diff --git a/include/linux/of_pci.h b/include/linux/of_pci.h
index 0e0974e..617b90d 100644
--- a/include/linux/of_pci.h
+++ b/include/linux/of_pci.h
@@ -76,6 +76,7 @@ static inline void of_pci_check_probe_only(void) { }
 int of_pci_get_host_bridge_resources(struct device_node *dev,
unsigned char busno, unsigned char bus_max,
struct list_head *resources, resource_size_t *io_base);
+int of_pci_get_dma_ranges(struct device_node *np, struct list_head *resources);
 #else
 static inline int of_pci_get_host_bridge_resources(struct device_node *dev,
unsigned char busno, unsigned char bus_max,
@@ -83,6 +84,12 @@ static inline int of_pci_get_host_bridge_resources(struct 
device_node *dev,
 {
return -EINVAL;
 }
+
+stati

[PATCH v4 1/3] of/pci/dma: fix DMA configuration for PCI masters

2017-05-05 Thread Oza Pawandeep

current device framework and OF framework integration assumes
dma-ranges in a way where memory-mapped devices define their
dma-ranges. (child-bus-address, parent-bus-address, length).

of_dma_configure is specifically written to take care of memory
mapped devices. but no implementation exists for pci to take
care of pcie based memory ranges.

for e.g. iproc based SOCs and other SOCs(suc as rcar) have PCI
world dma-ranges.
dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;

this patch serves following:

1) exposes interface to the pci host driver for their
inbound memory ranges

2) provide an interface to callers such as of_dma_get_ranges.
so then the returned size get best possible (largest) dma_mask.
because PCI RC drivers do not call APIs such as
dma_set_coherent_mask() and hence rather it shows its addressing
capabilities based on dma-ranges.
for e.g.
dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;
we should get dev->coherent_dma_mask=0x7f.

3) this patch handles multiple inbound windows and dma-ranges.
it is left to the caller, how it wants to use them.
the new function returns the resources in a standard and unform way

4) this way the callers of for e.g. of_dma_get_ranges
does not need to change.

5) leaves scope of adding PCI flag handling for inbound memory
by the new function.

Signed-off-by: Oza Pawandeep 

diff --git a/drivers/of/of_pci.c b/drivers/of/of_pci.c
index 0ee42c3..ed6e69a 100644
--- a/drivers/of/of_pci.c
+++ b/drivers/of/of_pci.c
@@ -283,6 +283,83 @@ int of_pci_get_host_bridge_resources(struct device_node 
*dev,
return err;
 }
 EXPORT_SYMBOL_GPL(of_pci_get_host_bridge_resources);
+
+/**
+ * of_pci_get_dma_ranges - Parse PCI host bridge inbound resources from DT
+ * @np: device node of the host bridge having the dma-ranges property
+ * @resources: list where the range of resources will be added after DT parsing
+ *
+ * It is the caller's job to free the @resources list.
+ *
+ * This function will parse the "dma-ranges" property of a
+ * PCI host bridge device node and setup the resource mapping based
+ * on its content.
+ *
+ * It returns zero if the range parsing has been successful or a standard error
+ * value if it failed.
+ */
+
+int of_pci_get_dma_ranges(struct device_node *np, struct list_head *resources)
+{
+   struct device_node *node = of_node_get(np);
+   int rlen;
+   int ret = 0;
+   const int na = 3, ns = 2;
+   struct resource *res;
+   struct of_pci_range_parser parser;
+   struct of_pci_range range;
+
+   if (!node)
+   return -EINVAL;
+
+   parser.node = node;
+   parser.pna = of_n_addr_cells(node);
+   parser.np = parser.pna + na + ns;
+
+   parser.range = of_get_property(node, "dma-ranges", );
+
+   if (!parser.range) {
+   pr_debug("pcie device has no dma-ranges defined for node(%s)\n",
+ np->full_name);
+   ret = -EINVAL;
+   goto out;
+   }
+
+   parser.end = parser.range + rlen / sizeof(__be32);
+
+   for_each_of_pci_range(, ) {
+   /*
+* If we failed translation or got a zero-sized region
+* then skip this range
+*/
+   if (range.cpu_addr == OF_BAD_ADDR || range.size == 0)
+   continue;
+
+   res = kzalloc(sizeof(struct resource), GFP_KERNEL);
+   if (!res) {
+   ret = -ENOMEM;
+   goto parse_failed;
+   }
+
+   ret = of_pci_range_to_resource(, np, res);
+   if (ret) {
+   kfree(res);
+   continue;
+   }
+
+   pci_add_resource_offset(resources, res,
+   res->start - range.pci_addr);
+   }
+
+   return ret;
+
+parse_failed:
+   pci_free_resource_list(resources);
+out:
+   of_node_put(node);
+   return ret;
+}
+EXPORT_SYMBOL_GPL(of_pci_get_dma_ranges);
 #endif /* CONFIG_OF_ADDRESS */
 
 #ifdef CONFIG_PCI_MSI
diff --git a/include/linux/of_pci.h b/include/linux/of_pci.h
index 0e0974e..617b90d 100644
--- a/include/linux/of_pci.h
+++ b/include/linux/of_pci.h
@@ -76,6 +76,7 @@ static inline void of_pci_check_probe_only(void) { }
 int of_pci_get_host_bridge_resources(struct device_node *dev,
unsigned char busno, unsigned char bus_max,
struct list_head *resources, resource_size_t *io_base);
+int of_pci_get_dma_ranges(struct device_node *np, struct list_head *resources);
 #else
 static inline int of_pci_get_host_bridge_resources(struct device_node *dev,
unsigned char busno, unsigned char bus_max,
@@ -83,6 +84,12 @@ static inline int of_pci_get_host_bridge_resources(struct 
device_node *dev,
 {
return -EINVAL;
 }
+
+static inli

[PATCH v3 0/3] OF/PCI address PCI inbound memory limitations

2017-05-05 Thread Oza Pawandeep

It is possible that PCI device supports 64-bit DMA addressing,
and thus it's driver sets device's dma_mask to DMA_BIT_MASK(64),
however PCI host bridge may have limitations on the inbound 
transaction addressing.

This is particularly problematic on ARM/ARM64 SOCs where the 
IOMMU (i.e. SMMU) translates IOVA to PA for in-bound transactions
only after PCI Host has forwarded these transactions on SOC IO bus. 

This means, on such ARM/ARM64 SOCs the IOVA of in-bound transactions
has to honor the addressing restrictions of the PCI Host.

Current device framework and OF framework integration assumes
dma-ranges in a way where memory-mapped devices define their
dma-ranges. (child-bus-address, parent-bus-address, length).

of_dma_configure is specifically written to take care of memory
mapped devices but no implementation exists for pci devices.

For e.g. iproc based SOCs and other SOCs (such as rcar) have
PCI world dma-ranges.
dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;

This patchset
reserves the IOVA ranges for PCI masters based
on the PCI world dma-ranges.
fix of_dma_get_range to cater to PCI dma-ranges.
fix of_dma_get_range which currently returns size 0 for PCI devices.

IOVA allocation patch:
[PATCH 2/3] iommu/pci: reserve iova for PCI masters

Fix of_dma_get_range bug and address PCI master.
[PATCH 3/3] PCI/of fix of_dma_get_range; get PCI specific

Base patch for both of the above patches:
[PATCH 1/3] of/pci/dma: fix DMA configuration for PCI masters

Changes since v2:
- Remove internal detailes such as reviewed by  

Changes since v1:
- Remove internal GERRIT details from patch descriptions
- address Rob's comments.
- Add a get_dma_ranges() function to of_bus struct..
- Convert existing contents of this function to of_bus_default_dma_get_ranges
  and adding that to the default of_bus struct.
- Make of_dma_get_range call of_bus_match() and then bus->get_dma_ranges.
- no revison for [PATCH 2/3] iommu/pci: reserve iova for PCI masters; since 
under discussion with Robin
  
Oza Pawandeep (3):
  of/pci/dma: fix DMA configuration for PCI masters
  iommu/pci: reserve iova for PCI masters
  PCI/of fix of_dma_get_range; get PCI specific dma-ranges

 drivers/iommu/dma-iommu.c | 35 +
 drivers/of/address.c  | 52 
 drivers/of/of_pci.c   | 77 +++
 include/linux/of_pci.h|  7 +
 4 files changed, 171 insertions(+)

-- 
1.9.1

[PATCH v3 0/3] OF/PCI address PCI inbound memory limitations

2017-05-05 Thread Oza Pawandeep

It is possible that PCI device supports 64-bit DMA addressing,
and thus it's driver sets device's dma_mask to DMA_BIT_MASK(64),
however PCI host bridge may have limitations on the inbound 
transaction addressing.

This is particularly problematic on ARM/ARM64 SOCs where the 
IOMMU (i.e. SMMU) translates IOVA to PA for in-bound transactions
only after PCI Host has forwarded these transactions on SOC IO bus. 

This means, on such ARM/ARM64 SOCs the IOVA of in-bound transactions
has to honor the addressing restrictions of the PCI Host.

Current device framework and OF framework integration assumes
dma-ranges in a way where memory-mapped devices define their
dma-ranges. (child-bus-address, parent-bus-address, length).

of_dma_configure is specifically written to take care of memory
mapped devices but no implementation exists for pci devices.

For e.g. iproc based SOCs and other SOCs (such as rcar) have
PCI world dma-ranges.
dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;

This patchset
reserves the IOVA ranges for PCI masters based
on the PCI world dma-ranges.
fix of_dma_get_range to cater to PCI dma-ranges.
fix of_dma_get_range which currently returns size 0 for PCI devices.

IOVA allocation patch:
[PATCH 2/3] iommu/pci: reserve iova for PCI masters

Fix of_dma_get_range bug and address PCI master.
[PATCH 3/3] PCI/of fix of_dma_get_range; get PCI specific

Base patch for both of the above patches:
[PATCH 1/3] of/pci/dma: fix DMA configuration for PCI masters

Changes since v2:
- Remove internal detailes such as reviewed by  

Changes since v1:
- Remove internal GERRIT details from patch descriptions
- address Rob's comments.
- Add a get_dma_ranges() function to of_bus struct..
- Convert existing contents of this function to of_bus_default_dma_get_ranges
  and adding that to the default of_bus struct.
- Make of_dma_get_range call of_bus_match() and then bus->get_dma_ranges.
- no revison for [PATCH 2/3] iommu/pci: reserve iova for PCI masters; since 
under discussion with Robin
  
Oza Pawandeep (3):
  of/pci/dma: fix DMA configuration for PCI masters
  iommu/pci: reserve iova for PCI masters
  PCI/of fix of_dma_get_range; get PCI specific dma-ranges

 drivers/iommu/dma-iommu.c | 35 +
 drivers/of/address.c  | 52 
 drivers/of/of_pci.c   | 77 +++
 include/linux/of_pci.h|  7 +
 4 files changed, 171 insertions(+)

-- 
1.9.1

[PATCH v3 3/3] PCI/of fix of_dma_get_range; get PCI specific dma-ranges

2017-05-05 Thread Oza Pawandeep

current device framework and OF framework integration assumes
dma-ranges in a way where memory-mapped devices define their
dma-ranges. (child-bus-address, parent-bus-address, length).

of_dma_configure is specifically written to take care of memory
mapped devices. but no implementation exists for pci to take
care of pcie based memory ranges.

for e.g. iproc based SOCs and other SOCs(suc as rcar) have PCI
world dma-ranges.
dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;

this patch fixes the bug in of_dma_get_range, which with as is,
parses the PCI memory ranges and return wrong size as 0.

in order to get largest possible dma_mask. this patch also
retuns the largest possible size based on dma-ranges,

for e.g.
dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;
we should get dev->coherent_dma_mask=0x7f.

based on which IOVA allocation space will honour PCI host
bridge limitations.

the implementation hooks bus specific callbacks for getting
dma-ranges.

Signed-off-by: Oza Pawandeep <oza@broadcom.com>

diff --git a/drivers/of/address.c b/drivers/of/address.c
index 02b2903..cc0fc28 100644
--- a/drivers/of/address.c
+++ b/drivers/of/address.c
@@ -6,6 +6,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -46,6 +47,8 @@ struct of_bus {
int na, int ns, int pna);
int (*translate)(__be32 *addr, u64 offset, int na);
unsigned int(*get_flags)(const __be32 *addr);
+   int (*get_dma_ranges)(struct device_node *np,
+ u64 *dma_addr, u64 *paddr, u64 *size);
 };
 
 /*
@@ -171,6 +174,146 @@ static int of_bus_pci_translate(__be32 *addr, u64 offset, 
int na)
 {
return of_bus_default_translate(addr + 1, offset, na - 1);
 }
+
+static int of_bus_pci_get_dma_ranges(struct device_node *np, u64 *dma_addr,
+u64 *paddr, u64 *size)
+{
+   struct device_node *node = of_node_get(np);
+   int ret = 0;
+   struct resource_entry *window;
+   LIST_HEAD(res);
+
+   if (!node)
+   return -EINVAL;
+
+   if (of_bus_pci_match(np)) {
+   *size = 0;
+   /*
+* PCI dma-ranges is not mandatory property.
+* many devices do no need to have it, since
+* host bridge does not require inbound memory
+* configuration or rather have design limitations.
+* so we look for dma-ranges, if missing we
+* just return the caller full size, and also
+* no dma-ranges suggests that, host bridge allows
+* whatever comes in, so we set dma_addr to 0.
+*/
+   ret = of_pci_get_dma_ranges(np, );
+   if (!ret) {
+   resource_list_for_each_entry(window, ) {
+   struct resource *res_dma = window->res;
+
+   if (*size < resource_size(res_dma)) {
+   *dma_addr = res_dma->start - window->offset;
+   *paddr = res_dma->start;
+   *size = resource_size(res_dma);
+   }
+   }
+   }
+   pci_free_resource_list();
+
+   /*
+* return the largest possible size,
+* since PCI master allows everything.
+*/
+   if (*size == 0) {
+   pr_debug("empty/zero size dma-ranges found for 
node(%s)\n",
+   np->full_name);
+   *size = DMA_BIT_MASK(sizeof(dma_addr_t) * 8) - 1;
+   *dma_addr = *paddr = 0;
+   ret = 0;
+   }
+
+   pr_debug("dma_addr(%llx) cpu_addr(%llx) size(%llx)\n",
+*dma_addr, *paddr, *size);
+   }
+
+   of_node_put(node);
+
+   return ret;
+}
+
+static int get_dma_ranges(struct device_node *np, u64 *dma_addr,
+ u64 *paddr, u64 *size)
+{
+   struct device_node *node = of_node_get(np);
+   const __be32 *ranges = NULL;
+   int len, naddr, nsize, pna;
+   int ret = 0;
+   u64 dmaaddr;
+
+   if (!node)
+   return -EINVAL;
+
+   while (1) {
+   naddr = of_n_addr_cells(node);
+   nsize = of_n_size_cells(node);
+   node = of_get_next_parent(node);
+   if (!node)
+   break;
+
+   ranges = of_get_property(node, "dma-ranges", );
+
+   /* Ignore empty ranges, they imply no translation required */
+   if (ranges && len > 0)
+   break;
+
+   /*
+* At least empty ranges has to be defined for parent node if
+

[PATCH v3 3/3] PCI/of fix of_dma_get_range; get PCI specific dma-ranges

2017-05-05 Thread Oza Pawandeep

current device framework and OF framework integration assumes
dma-ranges in a way where memory-mapped devices define their
dma-ranges. (child-bus-address, parent-bus-address, length).

of_dma_configure is specifically written to take care of memory
mapped devices. but no implementation exists for pci to take
care of pcie based memory ranges.

for e.g. iproc based SOCs and other SOCs(suc as rcar) have PCI
world dma-ranges.
dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;

this patch fixes the bug in of_dma_get_range, which with as is,
parses the PCI memory ranges and return wrong size as 0.

in order to get largest possible dma_mask. this patch also
retuns the largest possible size based on dma-ranges,

for e.g.
dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;
we should get dev->coherent_dma_mask=0x7f.

based on which IOVA allocation space will honour PCI host
bridge limitations.

the implementation hooks bus specific callbacks for getting
dma-ranges.

Signed-off-by: Oza Pawandeep 

diff --git a/drivers/of/address.c b/drivers/of/address.c
index 02b2903..cc0fc28 100644
--- a/drivers/of/address.c
+++ b/drivers/of/address.c
@@ -6,6 +6,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -46,6 +47,8 @@ struct of_bus {
int na, int ns, int pna);
int (*translate)(__be32 *addr, u64 offset, int na);
unsigned int(*get_flags)(const __be32 *addr);
+   int (*get_dma_ranges)(struct device_node *np,
+ u64 *dma_addr, u64 *paddr, u64 *size);
 };
 
 /*
@@ -171,6 +174,146 @@ static int of_bus_pci_translate(__be32 *addr, u64 offset, 
int na)
 {
return of_bus_default_translate(addr + 1, offset, na - 1);
 }
+
+static int of_bus_pci_get_dma_ranges(struct device_node *np, u64 *dma_addr,
+u64 *paddr, u64 *size)
+{
+   struct device_node *node = of_node_get(np);
+   int ret = 0;
+   struct resource_entry *window;
+   LIST_HEAD(res);
+
+   if (!node)
+   return -EINVAL;
+
+   if (of_bus_pci_match(np)) {
+   *size = 0;
+   /*
+* PCI dma-ranges is not mandatory property.
+* many devices do no need to have it, since
+* host bridge does not require inbound memory
+* configuration or rather have design limitations.
+* so we look for dma-ranges, if missing we
+* just return the caller full size, and also
+* no dma-ranges suggests that, host bridge allows
+* whatever comes in, so we set dma_addr to 0.
+*/
+   ret = of_pci_get_dma_ranges(np, );
+   if (!ret) {
+   resource_list_for_each_entry(window, ) {
+   struct resource *res_dma = window->res;
+
+   if (*size < resource_size(res_dma)) {
+   *dma_addr = res_dma->start - window->offset;
+   *paddr = res_dma->start;
+   *size = resource_size(res_dma);
+   }
+   }
+   }
+   pci_free_resource_list();
+
+   /*
+* return the largest possible size,
+* since PCI master allows everything.
+*/
+   if (*size == 0) {
+   pr_debug("empty/zero size dma-ranges found for 
node(%s)\n",
+   np->full_name);
+   *size = DMA_BIT_MASK(sizeof(dma_addr_t) * 8) - 1;
+   *dma_addr = *paddr = 0;
+   ret = 0;
+   }
+
+   pr_debug("dma_addr(%llx) cpu_addr(%llx) size(%llx)\n",
+*dma_addr, *paddr, *size);
+   }
+
+   of_node_put(node);
+
+   return ret;
+}
+
+static int get_dma_ranges(struct device_node *np, u64 *dma_addr,
+ u64 *paddr, u64 *size)
+{
+   struct device_node *node = of_node_get(np);
+   const __be32 *ranges = NULL;
+   int len, naddr, nsize, pna;
+   int ret = 0;
+   u64 dmaaddr;
+
+   if (!node)
+   return -EINVAL;
+
+   while (1) {
+   naddr = of_n_addr_cells(node);
+   nsize = of_n_size_cells(node);
+   node = of_get_next_parent(node);
+   if (!node)
+   break;
+
+   ranges = of_get_property(node, "dma-ranges", );
+
+   /* Ignore empty ranges, they imply no translation required */
+   if (ranges && len > 0)
+   break;
+
+   /*
+* At least empty ranges has to be defined for parent node if
+* DMA is supported
+

[PATCH v3 1/3] of/pci/dma: fix DMA configuration for PCI masters

2017-05-05 Thread Oza Pawandeep

current device framework and OF framework integration assumes
dma-ranges in a way where memory-mapped devices define their
dma-ranges. (child-bus-address, parent-bus-address, length).

of_dma_configure is specifically written to take care of memory
mapped devices. but no implementation exists for pci to take
care of pcie based memory ranges.

for e.g. iproc based SOCs and other SOCs(suc as rcar) have PCI
world dma-ranges.
dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;

this patch serves following:

1) exposes interface to the pci host driver for their
inbound memory ranges

2) provide an interface to callers such as of_dma_get_ranges.
so then the returned size get best possible (largest) dma_mask.
because PCI RC drivers do not call APIs such as
dma_set_coherent_mask() and hence rather it shows its addressing
capabilities based on dma-ranges.
for e.g.
dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;
we should get dev->coherent_dma_mask=0x7f.

3) this patch handles multiple inbound windows and dma-ranges.
it is left to the caller, how it wants to use them.
the new function returns the resources in a standard and unform way

4) this way the callers of for e.g. of_dma_get_ranges
does not need to change.

5) leaves scope of adding PCI flag handling for inbound memory
by the new function.

Signed-off-by: Oza Pawandeep <oza@broadcom.com>

diff --git a/drivers/of/of_pci.c b/drivers/of/of_pci.c
index 0ee42c3..ed6e69a 100644
--- a/drivers/of/of_pci.c
+++ b/drivers/of/of_pci.c
@@ -283,6 +283,83 @@ int of_pci_get_host_bridge_resources(struct device_node 
*dev,
return err;
 }
 EXPORT_SYMBOL_GPL(of_pci_get_host_bridge_resources);
+
+/**
+ * of_pci_get_dma_ranges - Parse PCI host bridge inbound resources from DT
+ * @np: device node of the host bridge having the dma-ranges property
+ * @resources: list where the range of resources will be added after DT parsing
+ *
+ * It is the caller's job to free the @resources list.
+ *
+ * This function will parse the "dma-ranges" property of a
+ * PCI host bridge device node and setup the resource mapping based
+ * on its content.
+ *
+ * It returns zero if the range parsing has been successful or a standard error
+ * value if it failed.
+ */
+
+int of_pci_get_dma_ranges(struct device_node *np, struct list_head *resources)
+{
+   struct device_node *node = of_node_get(np);
+   int rlen;
+   int ret = 0;
+   const int na = 3, ns = 2;
+   struct resource *res;
+   struct of_pci_range_parser parser;
+   struct of_pci_range range;
+
+   if (!node)
+   return -EINVAL;
+
+   parser.node = node;
+   parser.pna = of_n_addr_cells(node);
+   parser.np = parser.pna + na + ns;
+
+   parser.range = of_get_property(node, "dma-ranges", );
+
+   if (!parser.range) {
+   pr_debug("pcie device has no dma-ranges defined for node(%s)\n",
+ np->full_name);
+   ret = -EINVAL;
+   goto out;
+   }
+
+   parser.end = parser.range + rlen / sizeof(__be32);
+
+   for_each_of_pci_range(, ) {
+   /*
+* If we failed translation or got a zero-sized region
+* then skip this range
+*/
+   if (range.cpu_addr == OF_BAD_ADDR || range.size == 0)
+   continue;
+
+   res = kzalloc(sizeof(struct resource), GFP_KERNEL);
+   if (!res) {
+   ret = -ENOMEM;
+   goto parse_failed;
+   }
+
+   ret = of_pci_range_to_resource(, np, res);
+   if (ret) {
+   kfree(res);
+   continue;
+   }
+
+   pci_add_resource_offset(resources, res,
+   res->start - range.pci_addr);
+   }
+
+   return ret;
+
+parse_failed:
+   pci_free_resource_list(resources);
+out:
+   of_node_put(node);
+   return ret;
+}
+EXPORT_SYMBOL_GPL(of_pci_get_dma_ranges);
 #endif /* CONFIG_OF_ADDRESS */
 
 #ifdef CONFIG_PCI_MSI
diff --git a/include/linux/of_pci.h b/include/linux/of_pci.h
index 0e0974e..617b90d 100644
--- a/include/linux/of_pci.h
+++ b/include/linux/of_pci.h
@@ -76,6 +76,7 @@ static inline void of_pci_check_probe_only(void) { }
 int of_pci_get_host_bridge_resources(struct device_node *dev,
unsigned char busno, unsigned char bus_max,
struct list_head *resources, resource_size_t *io_base);
+int of_pci_get_dma_ranges(struct device_node *np, struct list_head *resources);
 #else
 static inline int of_pci_get_host_bridge_resources(struct device_node *dev,
unsigned char busno, unsigned char bus_max,
@@ -83,6 +84,12 @@ static inline int of_pci_get_host_bridge_resources(struct 
device_node *dev,
 {
return -EINVAL;
 }
+
+stati

[PATCH v3 1/3] of/pci/dma: fix DMA configuration for PCI masters

2017-05-05 Thread Oza Pawandeep

current device framework and OF framework integration assumes
dma-ranges in a way where memory-mapped devices define their
dma-ranges. (child-bus-address, parent-bus-address, length).

of_dma_configure is specifically written to take care of memory
mapped devices. but no implementation exists for pci to take
care of pcie based memory ranges.

for e.g. iproc based SOCs and other SOCs(suc as rcar) have PCI
world dma-ranges.
dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;

this patch serves following:

1) exposes interface to the pci host driver for their
inbound memory ranges

2) provide an interface to callers such as of_dma_get_ranges.
so then the returned size get best possible (largest) dma_mask.
because PCI RC drivers do not call APIs such as
dma_set_coherent_mask() and hence rather it shows its addressing
capabilities based on dma-ranges.
for e.g.
dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;
we should get dev->coherent_dma_mask=0x7f.

3) this patch handles multiple inbound windows and dma-ranges.
it is left to the caller, how it wants to use them.
the new function returns the resources in a standard and unform way

4) this way the callers of for e.g. of_dma_get_ranges
does not need to change.

5) leaves scope of adding PCI flag handling for inbound memory
by the new function.

Signed-off-by: Oza Pawandeep 

diff --git a/drivers/of/of_pci.c b/drivers/of/of_pci.c
index 0ee42c3..ed6e69a 100644
--- a/drivers/of/of_pci.c
+++ b/drivers/of/of_pci.c
@@ -283,6 +283,83 @@ int of_pci_get_host_bridge_resources(struct device_node 
*dev,
return err;
 }
 EXPORT_SYMBOL_GPL(of_pci_get_host_bridge_resources);
+
+/**
+ * of_pci_get_dma_ranges - Parse PCI host bridge inbound resources from DT
+ * @np: device node of the host bridge having the dma-ranges property
+ * @resources: list where the range of resources will be added after DT parsing
+ *
+ * It is the caller's job to free the @resources list.
+ *
+ * This function will parse the "dma-ranges" property of a
+ * PCI host bridge device node and setup the resource mapping based
+ * on its content.
+ *
+ * It returns zero if the range parsing has been successful or a standard error
+ * value if it failed.
+ */
+
+int of_pci_get_dma_ranges(struct device_node *np, struct list_head *resources)
+{
+   struct device_node *node = of_node_get(np);
+   int rlen;
+   int ret = 0;
+   const int na = 3, ns = 2;
+   struct resource *res;
+   struct of_pci_range_parser parser;
+   struct of_pci_range range;
+
+   if (!node)
+   return -EINVAL;
+
+   parser.node = node;
+   parser.pna = of_n_addr_cells(node);
+   parser.np = parser.pna + na + ns;
+
+   parser.range = of_get_property(node, "dma-ranges", );
+
+   if (!parser.range) {
+   pr_debug("pcie device has no dma-ranges defined for node(%s)\n",
+ np->full_name);
+   ret = -EINVAL;
+   goto out;
+   }
+
+   parser.end = parser.range + rlen / sizeof(__be32);
+
+   for_each_of_pci_range(, ) {
+   /*
+* If we failed translation or got a zero-sized region
+* then skip this range
+*/
+   if (range.cpu_addr == OF_BAD_ADDR || range.size == 0)
+   continue;
+
+   res = kzalloc(sizeof(struct resource), GFP_KERNEL);
+   if (!res) {
+   ret = -ENOMEM;
+   goto parse_failed;
+   }
+
+   ret = of_pci_range_to_resource(, np, res);
+   if (ret) {
+   kfree(res);
+   continue;
+   }
+
+   pci_add_resource_offset(resources, res,
+   res->start - range.pci_addr);
+   }
+
+   return ret;
+
+parse_failed:
+   pci_free_resource_list(resources);
+out:
+   of_node_put(node);
+   return ret;
+}
+EXPORT_SYMBOL_GPL(of_pci_get_dma_ranges);
 #endif /* CONFIG_OF_ADDRESS */
 
 #ifdef CONFIG_PCI_MSI
diff --git a/include/linux/of_pci.h b/include/linux/of_pci.h
index 0e0974e..617b90d 100644
--- a/include/linux/of_pci.h
+++ b/include/linux/of_pci.h
@@ -76,6 +76,7 @@ static inline void of_pci_check_probe_only(void) { }
 int of_pci_get_host_bridge_resources(struct device_node *dev,
unsigned char busno, unsigned char bus_max,
struct list_head *resources, resource_size_t *io_base);
+int of_pci_get_dma_ranges(struct device_node *np, struct list_head *resources);
 #else
 static inline int of_pci_get_host_bridge_resources(struct device_node *dev,
unsigned char busno, unsigned char bus_max,
@@ -83,6 +84,12 @@ static inline int of_pci_get_host_bridge_resources(struct 
device_node *dev,
 {
return -EINVAL;
 }
+
+static inli

[PATCH v2 0/3] OF/PCI address PCI inbound memory limitations

2017-05-05 Thread Oza Pawandeep

It is possible that PCI device supports 64-bit DMA addressing,
and thus it's driver sets device's dma_mask to DMA_BIT_MASK(64),
however PCI host bridge may have limitations on the inbound 
transaction addressing.

This is particularly problematic on ARM/ARM64 SOCs where the 
IOMMU (i.e. SMMU) translates IOVA to PA for in-bound transactions
only after PCI Host has forwarded these transactions on SOC IO bus. 

This means, on such ARM/ARM64 SOCs the IOVA of in-bound transactions
has to honor the addressing restrictions of the PCI Host.

Current device framework and OF framework integration assumes
dma-ranges in a way where memory-mapped devices define their
dma-ranges. (child-bus-address, parent-bus-address, length).

of_dma_configure is specifically written to take care of memory
mapped devices but no implementation exists for pci devices.

For e.g. iproc based SOCs and other SOCs (such as rcar) have
PCI world dma-ranges.
dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;

This patchset
reserves the IOVA ranges for PCI masters based
on the PCI world dma-ranges.
fix of_dma_get_range to cater to PCI dma-ranges.
fix of_dma_get_range which currently returns size 0 for PCI devices.

IOVA allocation patch:
[PATCH 2/3] iommu/pci: reserve iova for PCI masters

Fix of_dma_get_range bug and address PCI master.
[PATCH 3/3] PCI/of fix of_dma_get_range; get PCI specific

Base patch for both of the above patches:
[PATCH 1/3] of/pci/dma: fix DMA configuration for PCI masters

Changes since v2:
- Remove internal detailes such as reviewed by  

Changes since v1:
- Remove internal GERRIT details from patch descriptions
- address Rob's comments.
- Add a get_dma_ranges() function to of_bus struct..
- Convert existing contents of this function to of_bus_default_dma_get_ranges
  and adding that to the default of_bus struct.
- Make of_dma_get_range call of_bus_match() and then bus->get_dma_ranges.
- no revison for [PATCH 2/3] iommu/pci: reserve iova for PCI masters; since 
under discussion with Robin
  
Oza Pawandeep (3):
  of/pci/dma: fix DMA configuration for PCI masters
  iommu/pci: reserve iova for PCI masters
  PCI/of fix of_dma_get_range; get PCI specific dma-ranges

 drivers/iommu/dma-iommu.c | 35 +
 drivers/of/address.c  | 52 
 drivers/of/of_pci.c   | 77 +++
 include/linux/of_pci.h|  7 +
 4 files changed, 171 insertions(+)

-- 
1.9.1

[PATCH v2 0/3] OF/PCI address PCI inbound memory limitations

2017-05-05 Thread Oza Pawandeep

It is possible that PCI device supports 64-bit DMA addressing,
and thus it's driver sets device's dma_mask to DMA_BIT_MASK(64),
however PCI host bridge may have limitations on the inbound 
transaction addressing.

This is particularly problematic on ARM/ARM64 SOCs where the 
IOMMU (i.e. SMMU) translates IOVA to PA for in-bound transactions
only after PCI Host has forwarded these transactions on SOC IO bus. 

This means, on such ARM/ARM64 SOCs the IOVA of in-bound transactions
has to honor the addressing restrictions of the PCI Host.

Current device framework and OF framework integration assumes
dma-ranges in a way where memory-mapped devices define their
dma-ranges. (child-bus-address, parent-bus-address, length).

of_dma_configure is specifically written to take care of memory
mapped devices but no implementation exists for pci devices.

For e.g. iproc based SOCs and other SOCs (such as rcar) have
PCI world dma-ranges.
dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;

This patchset
reserves the IOVA ranges for PCI masters based
on the PCI world dma-ranges.
fix of_dma_get_range to cater to PCI dma-ranges.
fix of_dma_get_range which currently returns size 0 for PCI devices.

IOVA allocation patch:
[PATCH 2/3] iommu/pci: reserve iova for PCI masters

Fix of_dma_get_range bug and address PCI master.
[PATCH 3/3] PCI/of fix of_dma_get_range; get PCI specific

Base patch for both of the above patches:
[PATCH 1/3] of/pci/dma: fix DMA configuration for PCI masters

Changes since v2:
- Remove internal detailes such as reviewed by  

Changes since v1:
- Remove internal GERRIT details from patch descriptions
- address Rob's comments.
- Add a get_dma_ranges() function to of_bus struct..
- Convert existing contents of this function to of_bus_default_dma_get_ranges
  and adding that to the default of_bus struct.
- Make of_dma_get_range call of_bus_match() and then bus->get_dma_ranges.
- no revison for [PATCH 2/3] iommu/pci: reserve iova for PCI masters; since 
under discussion with Robin
  
Oza Pawandeep (3):
  of/pci/dma: fix DMA configuration for PCI masters
  iommu/pci: reserve iova for PCI masters
  PCI/of fix of_dma_get_range; get PCI specific dma-ranges

 drivers/iommu/dma-iommu.c | 35 +
 drivers/of/address.c  | 52 
 drivers/of/of_pci.c   | 77 +++
 include/linux/of_pci.h|  7 +
 4 files changed, 171 insertions(+)

-- 
1.9.1

[PATCH v2 1/3] of/pci/dma: fix DMA configuration for PCI masters

2017-05-05 Thread Oza Pawandeep

current device framework and OF framework integration assumes
dma-ranges in a way where memory-mapped devices define their
dma-ranges. (child-bus-address, parent-bus-address, length).

of_dma_configure is specifically written to take care of memory
mapped devices. but no implementation exists for pci to take
care of pcie based memory ranges.

for e.g. iproc based SOCs and other SOCs(suc as rcar) have PCI
world dma-ranges.
dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;

this patch serves following:

1) exposes interface to the pci host driver for their
inbound memory ranges

2) provide an interface to callers such as of_dma_get_ranges.
so then the returned size get best possible (largest) dma_mask.
because PCI RC drivers do not call APIs such as
dma_set_coherent_mask() and hence rather it shows its addressing
capabilities based on dma-ranges.
for e.g.
dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;
we should get dev->coherent_dma_mask=0x7f.

3) this patch handles multiple inbound windows and dma-ranges.
it is left to the caller, how it wants to use them.
the new function returns the resources in a standard and unform way

4) this way the callers of for e.g. of_dma_get_ranges
does not need to change.

5) leaves scope of adding PCI flag handling for inbound memory
by the new function.

Signed-off-by: Oza Pawandeep <oza@broadcom.com>

diff --git a/drivers/of/of_pci.c b/drivers/of/of_pci.c
index 0ee42c3..ed6e69a 100644
--- a/drivers/of/of_pci.c
+++ b/drivers/of/of_pci.c
@@ -283,6 +283,83 @@ int of_pci_get_host_bridge_resources(struct device_node 
*dev,
return err;
 }
 EXPORT_SYMBOL_GPL(of_pci_get_host_bridge_resources);
+
+/**
+ * of_pci_get_dma_ranges - Parse PCI host bridge inbound resources from DT
+ * @np: device node of the host bridge having the dma-ranges property
+ * @resources: list where the range of resources will be added after DT parsing
+ *
+ * It is the caller's job to free the @resources list.
+ *
+ * This function will parse the "dma-ranges" property of a
+ * PCI host bridge device node and setup the resource mapping based
+ * on its content.
+ *
+ * It returns zero if the range parsing has been successful or a standard error
+ * value if it failed.
+ */
+
+int of_pci_get_dma_ranges(struct device_node *np, struct list_head *resources)
+{
+   struct device_node *node = of_node_get(np);
+   int rlen;
+   int ret = 0;
+   const int na = 3, ns = 2;
+   struct resource *res;
+   struct of_pci_range_parser parser;
+   struct of_pci_range range;
+
+   if (!node)
+   return -EINVAL;
+
+   parser.node = node;
+   parser.pna = of_n_addr_cells(node);
+   parser.np = parser.pna + na + ns;
+
+   parser.range = of_get_property(node, "dma-ranges", );
+
+   if (!parser.range) {
+   pr_debug("pcie device has no dma-ranges defined for node(%s)\n",
+ np->full_name);
+   ret = -EINVAL;
+   goto out;
+   }
+
+   parser.end = parser.range + rlen / sizeof(__be32);
+
+   for_each_of_pci_range(, ) {
+   /*
+* If we failed translation or got a zero-sized region
+* then skip this range
+*/
+   if (range.cpu_addr == OF_BAD_ADDR || range.size == 0)
+   continue;
+
+   res = kzalloc(sizeof(struct resource), GFP_KERNEL);
+   if (!res) {
+   ret = -ENOMEM;
+   goto parse_failed;
+   }
+
+   ret = of_pci_range_to_resource(, np, res);
+   if (ret) {
+   kfree(res);
+   continue;
+   }
+
+   pci_add_resource_offset(resources, res,
+   res->start - range.pci_addr);
+   }
+
+   return ret;
+
+parse_failed:
+   pci_free_resource_list(resources);
+out:
+   of_node_put(node);
+   return ret;
+}
+EXPORT_SYMBOL_GPL(of_pci_get_dma_ranges);
 #endif /* CONFIG_OF_ADDRESS */
 
 #ifdef CONFIG_PCI_MSI
diff --git a/include/linux/of_pci.h b/include/linux/of_pci.h
index 0e0974e..617b90d 100644
--- a/include/linux/of_pci.h
+++ b/include/linux/of_pci.h
@@ -76,6 +76,7 @@ static inline void of_pci_check_probe_only(void) { }
 int of_pci_get_host_bridge_resources(struct device_node *dev,
unsigned char busno, unsigned char bus_max,
struct list_head *resources, resource_size_t *io_base);
+int of_pci_get_dma_ranges(struct device_node *np, struct list_head *resources);
 #else
 static inline int of_pci_get_host_bridge_resources(struct device_node *dev,
unsigned char busno, unsigned char bus_max,
@@ -83,6 +84,12 @@ static inline int of_pci_get_host_bridge_resources(struct 
device_node *dev,
 {
return -EINVAL;
 }
+
+stati

[PATCH v2 3/3] PCI/of fix of_dma_get_range; get PCI specific dma-ranges

2017-05-05 Thread Oza Pawandeep

current device framework and OF framework integration assumes
dma-ranges in a way where memory-mapped devices define their
dma-ranges. (child-bus-address, parent-bus-address, length).

of_dma_configure is specifically written to take care of memory
mapped devices. but no implementation exists for pci to take
care of pcie based memory ranges.

for e.g. iproc based SOCs and other SOCs(suc as rcar) have PCI
world dma-ranges.
dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;

this patch fixes the bug in of_dma_get_range, which with as is,
parses the PCI memory ranges and return wrong size as 0.

in order to get largest possible dma_mask. this patch also
retuns the largest possible size based on dma-ranges,

for e.g.
dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;
we should get dev->coherent_dma_mask=0x7f.

based on which IOVA allocation space will honour PCI host
bridge limitations.

the implementation hooks bus specific callbacks for getting
dma-ranges.

Signed-off-by: Oza Pawandeep <oza@broadcom.com>

diff --git a/drivers/of/address.c b/drivers/of/address.c
index 02b2903..cc0fc28 100644
--- a/drivers/of/address.c
+++ b/drivers/of/address.c
@@ -6,6 +6,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -46,6 +47,8 @@ struct of_bus {
int na, int ns, int pna);
int (*translate)(__be32 *addr, u64 offset, int na);
unsigned int(*get_flags)(const __be32 *addr);
+   int (*get_dma_ranges)(struct device_node *np,
+ u64 *dma_addr, u64 *paddr, u64 *size);
 };
 
 /*
@@ -171,6 +174,146 @@ static int of_bus_pci_translate(__be32 *addr, u64 offset, 
int na)
 {
return of_bus_default_translate(addr + 1, offset, na - 1);
 }
+
+static int of_bus_pci_get_dma_ranges(struct device_node *np, u64 *dma_addr,
+u64 *paddr, u64 *size)
+{
+   struct device_node *node = of_node_get(np);
+   int ret = 0;
+   struct resource_entry *window;
+   LIST_HEAD(res);
+
+   if (!node)
+   return -EINVAL;
+
+   if (of_bus_pci_match(np)) {
+   *size = 0;
+   /*
+* PCI dma-ranges is not mandatory property.
+* many devices do no need to have it, since
+* host bridge does not require inbound memory
+* configuration or rather have design limitations.
+* so we look for dma-ranges, if missing we
+* just return the caller full size, and also
+* no dma-ranges suggests that, host bridge allows
+* whatever comes in, so we set dma_addr to 0.
+*/
+   ret = of_pci_get_dma_ranges(np, );
+   if (!ret) {
+   resource_list_for_each_entry(window, ) {
+   struct resource *res_dma = window->res;
+
+   if (*size < resource_size(res_dma)) {
+   *dma_addr = res_dma->start - window->offset;
+   *paddr = res_dma->start;
+   *size = resource_size(res_dma);
+   }
+   }
+   }
+   pci_free_resource_list();
+
+   /*
+* return the largest possible size,
+* since PCI master allows everything.
+*/
+   if (*size == 0) {
+   pr_debug("empty/zero size dma-ranges found for 
node(%s)\n",
+   np->full_name);
+   *size = DMA_BIT_MASK(sizeof(dma_addr_t) * 8) - 1;
+   *dma_addr = *paddr = 0;
+   ret = 0;
+   }
+
+   pr_debug("dma_addr(%llx) cpu_addr(%llx) size(%llx)\n",
+*dma_addr, *paddr, *size);
+   }
+
+   of_node_put(node);
+
+   return ret;
+}
+
+static int get_dma_ranges(struct device_node *np, u64 *dma_addr,
+ u64 *paddr, u64 *size)
+{
+   struct device_node *node = of_node_get(np);
+   const __be32 *ranges = NULL;
+   int len, naddr, nsize, pna;
+   int ret = 0;
+   u64 dmaaddr;
+
+   if (!node)
+   return -EINVAL;
+
+   while (1) {
+   naddr = of_n_addr_cells(node);
+   nsize = of_n_size_cells(node);
+   node = of_get_next_parent(node);
+   if (!node)
+   break;
+
+   ranges = of_get_property(node, "dma-ranges", );
+
+   /* Ignore empty ranges, they imply no translation required */
+   if (ranges && len > 0)
+   break;
+
+   /*
+* At least empty ranges has to be defined for parent node if
+

[PATCH v2 1/3] of/pci/dma: fix DMA configuration for PCI masters

2017-05-05 Thread Oza Pawandeep

current device framework and OF framework integration assumes
dma-ranges in a way where memory-mapped devices define their
dma-ranges. (child-bus-address, parent-bus-address, length).

of_dma_configure is specifically written to take care of memory
mapped devices. but no implementation exists for pci to take
care of pcie based memory ranges.

for e.g. iproc based SOCs and other SOCs(suc as rcar) have PCI
world dma-ranges.
dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;

this patch serves following:

1) exposes interface to the pci host driver for their
inbound memory ranges

2) provide an interface to callers such as of_dma_get_ranges.
so then the returned size get best possible (largest) dma_mask.
because PCI RC drivers do not call APIs such as
dma_set_coherent_mask() and hence rather it shows its addressing
capabilities based on dma-ranges.
for e.g.
dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;
we should get dev->coherent_dma_mask=0x7f.

3) this patch handles multiple inbound windows and dma-ranges.
it is left to the caller, how it wants to use them.
the new function returns the resources in a standard and unform way

4) this way the callers of for e.g. of_dma_get_ranges
does not need to change.

5) leaves scope of adding PCI flag handling for inbound memory
by the new function.

Signed-off-by: Oza Pawandeep 

diff --git a/drivers/of/of_pci.c b/drivers/of/of_pci.c
index 0ee42c3..ed6e69a 100644
--- a/drivers/of/of_pci.c
+++ b/drivers/of/of_pci.c
@@ -283,6 +283,83 @@ int of_pci_get_host_bridge_resources(struct device_node 
*dev,
return err;
 }
 EXPORT_SYMBOL_GPL(of_pci_get_host_bridge_resources);
+
+/**
+ * of_pci_get_dma_ranges - Parse PCI host bridge inbound resources from DT
+ * @np: device node of the host bridge having the dma-ranges property
+ * @resources: list where the range of resources will be added after DT parsing
+ *
+ * It is the caller's job to free the @resources list.
+ *
+ * This function will parse the "dma-ranges" property of a
+ * PCI host bridge device node and setup the resource mapping based
+ * on its content.
+ *
+ * It returns zero if the range parsing has been successful or a standard error
+ * value if it failed.
+ */
+
+int of_pci_get_dma_ranges(struct device_node *np, struct list_head *resources)
+{
+   struct device_node *node = of_node_get(np);
+   int rlen;
+   int ret = 0;
+   const int na = 3, ns = 2;
+   struct resource *res;
+   struct of_pci_range_parser parser;
+   struct of_pci_range range;
+
+   if (!node)
+   return -EINVAL;
+
+   parser.node = node;
+   parser.pna = of_n_addr_cells(node);
+   parser.np = parser.pna + na + ns;
+
+   parser.range = of_get_property(node, "dma-ranges", );
+
+   if (!parser.range) {
+   pr_debug("pcie device has no dma-ranges defined for node(%s)\n",
+ np->full_name);
+   ret = -EINVAL;
+   goto out;
+   }
+
+   parser.end = parser.range + rlen / sizeof(__be32);
+
+   for_each_of_pci_range(, ) {
+   /*
+* If we failed translation or got a zero-sized region
+* then skip this range
+*/
+   if (range.cpu_addr == OF_BAD_ADDR || range.size == 0)
+   continue;
+
+   res = kzalloc(sizeof(struct resource), GFP_KERNEL);
+   if (!res) {
+   ret = -ENOMEM;
+   goto parse_failed;
+   }
+
+   ret = of_pci_range_to_resource(, np, res);
+   if (ret) {
+   kfree(res);
+   continue;
+   }
+
+   pci_add_resource_offset(resources, res,
+   res->start - range.pci_addr);
+   }
+
+   return ret;
+
+parse_failed:
+   pci_free_resource_list(resources);
+out:
+   of_node_put(node);
+   return ret;
+}
+EXPORT_SYMBOL_GPL(of_pci_get_dma_ranges);
 #endif /* CONFIG_OF_ADDRESS */
 
 #ifdef CONFIG_PCI_MSI
diff --git a/include/linux/of_pci.h b/include/linux/of_pci.h
index 0e0974e..617b90d 100644
--- a/include/linux/of_pci.h
+++ b/include/linux/of_pci.h
@@ -76,6 +76,7 @@ static inline void of_pci_check_probe_only(void) { }
 int of_pci_get_host_bridge_resources(struct device_node *dev,
unsigned char busno, unsigned char bus_max,
struct list_head *resources, resource_size_t *io_base);
+int of_pci_get_dma_ranges(struct device_node *np, struct list_head *resources);
 #else
 static inline int of_pci_get_host_bridge_resources(struct device_node *dev,
unsigned char busno, unsigned char bus_max,
@@ -83,6 +84,12 @@ static inline int of_pci_get_host_bridge_resources(struct 
device_node *dev,
 {
return -EINVAL;
 }
+
+static inli

[PATCH v2 3/3] PCI/of fix of_dma_get_range; get PCI specific dma-ranges

2017-05-05 Thread Oza Pawandeep

current device framework and OF framework integration assumes
dma-ranges in a way where memory-mapped devices define their
dma-ranges. (child-bus-address, parent-bus-address, length).

of_dma_configure is specifically written to take care of memory
mapped devices. but no implementation exists for pci to take
care of pcie based memory ranges.

for e.g. iproc based SOCs and other SOCs(suc as rcar) have PCI
world dma-ranges.
dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;

this patch fixes the bug in of_dma_get_range, which with as is,
parses the PCI memory ranges and return wrong size as 0.

in order to get largest possible dma_mask. this patch also
retuns the largest possible size based on dma-ranges,

for e.g.
dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;
we should get dev->coherent_dma_mask=0x7f.

based on which IOVA allocation space will honour PCI host
bridge limitations.

the implementation hooks bus specific callbacks for getting
dma-ranges.

Signed-off-by: Oza Pawandeep 

diff --git a/drivers/of/address.c b/drivers/of/address.c
index 02b2903..cc0fc28 100644
--- a/drivers/of/address.c
+++ b/drivers/of/address.c
@@ -6,6 +6,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -46,6 +47,8 @@ struct of_bus {
int na, int ns, int pna);
int (*translate)(__be32 *addr, u64 offset, int na);
unsigned int(*get_flags)(const __be32 *addr);
+   int (*get_dma_ranges)(struct device_node *np,
+ u64 *dma_addr, u64 *paddr, u64 *size);
 };
 
 /*
@@ -171,6 +174,146 @@ static int of_bus_pci_translate(__be32 *addr, u64 offset, 
int na)
 {
return of_bus_default_translate(addr + 1, offset, na - 1);
 }
+
+static int of_bus_pci_get_dma_ranges(struct device_node *np, u64 *dma_addr,
+u64 *paddr, u64 *size)
+{
+   struct device_node *node = of_node_get(np);
+   int ret = 0;
+   struct resource_entry *window;
+   LIST_HEAD(res);
+
+   if (!node)
+   return -EINVAL;
+
+   if (of_bus_pci_match(np)) {
+   *size = 0;
+   /*
+* PCI dma-ranges is not mandatory property.
+* many devices do no need to have it, since
+* host bridge does not require inbound memory
+* configuration or rather have design limitations.
+* so we look for dma-ranges, if missing we
+* just return the caller full size, and also
+* no dma-ranges suggests that, host bridge allows
+* whatever comes in, so we set dma_addr to 0.
+*/
+   ret = of_pci_get_dma_ranges(np, );
+   if (!ret) {
+   resource_list_for_each_entry(window, ) {
+   struct resource *res_dma = window->res;
+
+   if (*size < resource_size(res_dma)) {
+   *dma_addr = res_dma->start - window->offset;
+   *paddr = res_dma->start;
+   *size = resource_size(res_dma);
+   }
+   }
+   }
+   pci_free_resource_list();
+
+   /*
+* return the largest possible size,
+* since PCI master allows everything.
+*/
+   if (*size == 0) {
+   pr_debug("empty/zero size dma-ranges found for 
node(%s)\n",
+   np->full_name);
+   *size = DMA_BIT_MASK(sizeof(dma_addr_t) * 8) - 1;
+   *dma_addr = *paddr = 0;
+   ret = 0;
+   }
+
+   pr_debug("dma_addr(%llx) cpu_addr(%llx) size(%llx)\n",
+*dma_addr, *paddr, *size);
+   }
+
+   of_node_put(node);
+
+   return ret;
+}
+
+static int get_dma_ranges(struct device_node *np, u64 *dma_addr,
+ u64 *paddr, u64 *size)
+{
+   struct device_node *node = of_node_get(np);
+   const __be32 *ranges = NULL;
+   int len, naddr, nsize, pna;
+   int ret = 0;
+   u64 dmaaddr;
+
+   if (!node)
+   return -EINVAL;
+
+   while (1) {
+   naddr = of_n_addr_cells(node);
+   nsize = of_n_size_cells(node);
+   node = of_get_next_parent(node);
+   if (!node)
+   break;
+
+   ranges = of_get_property(node, "dma-ranges", );
+
+   /* Ignore empty ranges, they imply no translation required */
+   if (ranges && len > 0)
+   break;
+
+   /*
+* At least empty ranges has to be defined for parent node if
+* DMA is supported
+

[PATCH v2 0/3] OF/PCI address PCI inbound memory limitations

2017-05-05 Thread Oza Pawandeep

It is possible that PCI device supports 64-bit DMA addressing,
and thus it's driver sets device's dma_mask to DMA_BIT_MASK(64),
however PCI host bridge may have limitations on the inbound 
transaction addressing.

This is particularly problematic on ARM/ARM64 SOCs where the 
IOMMU (i.e. SMMU) translates IOVA to PA for in-bound transactions
only after PCI Host has forwarded these transactions on SOC IO bus. 

This means, on such ARM/ARM64 SOCs the IOVA of in-bound transactions
has to honor the addressing restrictions of the PCI Host.

Current device framework and OF framework integration assumes
dma-ranges in a way where memory-mapped devices define their
dma-ranges. (child-bus-address, parent-bus-address, length).

of_dma_configure is specifically written to take care of memory
mapped devices but no implementation exists for pci devices.

For e.g. iproc based SOCs and other SOCs (such as rcar) have
PCI world dma-ranges.
dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;

This patchset
reserves the IOVA ranges for PCI masters based
on the PCI world dma-ranges.
fix of_dma_get_range to cater to PCI dma-ranges.
fix of_dma_get_range which currently returns size 0 for PCI devices.

IOVA allocation patch:
[PATCH 2/3] iommu/pci: reserve iova for PCI masters

Fix of_dma_get_range bug and address PCI master.
[PATCH 3/3] PCI/of fix of_dma_get_range; get PCI specific

Base patch for both of the above patches:
[PATCH 1/3] of/pci/dma: fix DMA configuration for PCI masters

Changes since v1:
- Remove internal GERRIT details from patch descriptions
- address Rob's comments.
- Add a get_dma_ranges() function to of_bus struct..
- Convert existing contents of this function to of_bus_default_dma_get_ranges
  and adding that to the default of_bus struct.
- Make of_dma_get_range call of_bus_match() and then bus->get_dma_ranges.
- no revison for [PATCH 2/3] iommu/pci: reserve iova for PCI masters; since 
under discussion with Robin
  
Oza Pawandeep (3):
  of/pci/dma: fix DMA configuration for PCI masters
  iommu/pci: reserve iova for PCI masters
  PCI/of fix of_dma_get_range; get PCI specific dma-ranges

 drivers/iommu/dma-iommu.c | 35 +
 drivers/of/address.c  | 52 
 drivers/of/of_pci.c   | 77 +++
 include/linux/of_pci.h|  7 +
 4 files changed, 171 insertions(+)

-- 
1.9.1

[PATCH v2 0/3] OF/PCI address PCI inbound memory limitations

2017-05-05 Thread Oza Pawandeep

It is possible that PCI device supports 64-bit DMA addressing,
and thus it's driver sets device's dma_mask to DMA_BIT_MASK(64),
however PCI host bridge may have limitations on the inbound 
transaction addressing.

This is particularly problematic on ARM/ARM64 SOCs where the 
IOMMU (i.e. SMMU) translates IOVA to PA for in-bound transactions
only after PCI Host has forwarded these transactions on SOC IO bus. 

This means, on such ARM/ARM64 SOCs the IOVA of in-bound transactions
has to honor the addressing restrictions of the PCI Host.

Current device framework and OF framework integration assumes
dma-ranges in a way where memory-mapped devices define their
dma-ranges. (child-bus-address, parent-bus-address, length).

of_dma_configure is specifically written to take care of memory
mapped devices but no implementation exists for pci devices.

For e.g. iproc based SOCs and other SOCs (such as rcar) have
PCI world dma-ranges.
dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;

This patchset
reserves the IOVA ranges for PCI masters based
on the PCI world dma-ranges.
fix of_dma_get_range to cater to PCI dma-ranges.
fix of_dma_get_range which currently returns size 0 for PCI devices.

IOVA allocation patch:
[PATCH 2/3] iommu/pci: reserve iova for PCI masters

Fix of_dma_get_range bug and address PCI master.
[PATCH 3/3] PCI/of fix of_dma_get_range; get PCI specific

Base patch for both of the above patches:
[PATCH 1/3] of/pci/dma: fix DMA configuration for PCI masters

Changes since v1:
- Remove internal GERRIT details from patch descriptions
- address Rob's comments.
- Add a get_dma_ranges() function to of_bus struct..
- Convert existing contents of this function to of_bus_default_dma_get_ranges
  and adding that to the default of_bus struct.
- Make of_dma_get_range call of_bus_match() and then bus->get_dma_ranges.
- no revison for [PATCH 2/3] iommu/pci: reserve iova for PCI masters; since 
under discussion with Robin
  
Oza Pawandeep (3):
  of/pci/dma: fix DMA configuration for PCI masters
  iommu/pci: reserve iova for PCI masters
  PCI/of fix of_dma_get_range; get PCI specific dma-ranges

 drivers/iommu/dma-iommu.c | 35 +
 drivers/of/address.c  | 52 
 drivers/of/of_pci.c   | 77 +++
 include/linux/of_pci.h|  7 +
 4 files changed, 171 insertions(+)

-- 
1.9.1

[PATCH v2 0/3] OF/PCI address PCI inbound memory limitations

2017-05-05 Thread Oza Pawandeep

It is possible that PCI device supports 64-bit DMA addressing,
and thus it's driver sets device's dma_mask to DMA_BIT_MASK(64),
however PCI host bridge may have limitations on the inbound 
transaction addressing.

This is particularly problematic on ARM/ARM64 SOCs where the 
IOMMU (i.e. SMMU) translates IOVA to PA for in-bound transactions
only after PCI Host has forwarded these transactions on SOC IO bus. 

This means, on such ARM/ARM64 SOCs the IOVA of in-bound transactions
has to honor the addressing restrictions of the PCI Host.

Current device framework and OF framework integration assumes
dma-ranges in a way where memory-mapped devices define their
dma-ranges. (child-bus-address, parent-bus-address, length).

of_dma_configure is specifically written to take care of memory
mapped devices but no implementation exists for pci devices.

For e.g. iproc based SOCs and other SOCs (such as rcar) have
PCI world dma-ranges.
dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;

This patchset
reserves the IOVA ranges for PCI masters based
on the PCI world dma-ranges.
fix of_dma_get_range to cater to PCI dma-ranges.
fix of_dma_get_range which currently returns size 0 for PCI devices.

IOVA allocation patch:
[PATCH 2/3] iommu/pci: reserve iova for PCI masters

Fix of_dma_get_range bug and address PCI master.
[PATCH 3/3] PCI/of fix of_dma_get_range; get PCI specific

Base patch for both of the above patches:
[PATCH 1/3] of/pci/dma: fix DMA configuration for PCI masters

Changes since v1:
- Remove internal GERRIT details from patch descriptions
- address Rob's comments.
- Add a get_dma_ranges() function to of_bus struct..
- Convert existing contents of this function to of_bus_default_dma_get_ranges
  and adding that to the default of_bus struct.
- Make of_dma_get_range call of_bus_match() and then bus->get_dma_ranges.
- no revison for [PATCH 2/3] iommu/pci: reserve iova for PCI masters; since 
under discussion with Robin
  
Oza Pawandeep (3):
  of/pci/dma: fix DMA configuration for PCI masters
  iommu/pci: reserve iova for PCI masters
  PCI/of fix of_dma_get_range; get PCI specific dma-ranges

 drivers/iommu/dma-iommu.c | 35 +
 drivers/of/address.c  | 52 
 drivers/of/of_pci.c   | 77 +++
 include/linux/of_pci.h|  7 +
 4 files changed, 171 insertions(+)

-- 
1.9.1

[PATCH v2 3/3] PCI/of fix of_dma_get_range; get PCI specific dma-ranges

2017-05-05 Thread Oza Pawandeep

current device framework and OF framework integration assumes
dma-ranges in a way where memory-mapped devices define their
dma-ranges. (child-bus-address, parent-bus-address, length).

of_dma_configure is specifically written to take care of memory
mapped devices. but no implementation exists for pci to take
care of pcie based memory ranges.

for e.g. iproc based SOCs and other SOCs(suc as rcar) have PCI
world dma-ranges.
dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;

this patch fixes the bug in of_dma_get_range, which with as is,
parses the PCI memory ranges and return wrong size as 0.

in order to get largest possible dma_mask. this patch also
retuns the largest possible size based on dma-ranges,

for e.g.
dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;
we should get dev->coherent_dma_mask=0x7f.

based on which IOVA allocation space will honour PCI host
bridge limitations.

the implementation hooks bus specific callbacks for getting
dma-ranges.

Signed-off-by: Oza Pawandeep <oza@broadcom.com>

diff --git a/drivers/of/address.c b/drivers/of/address.c
index 02b2903..cc0fc28 100644
--- a/drivers/of/address.c
+++ b/drivers/of/address.c
@@ -6,6 +6,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -46,6 +47,8 @@ struct of_bus {
int na, int ns, int pna);
int (*translate)(__be32 *addr, u64 offset, int na);
unsigned int(*get_flags)(const __be32 *addr);
+   int (*get_dma_ranges)(struct device_node *np,
+ u64 *dma_addr, u64 *paddr, u64 *size);
 };
 
 /*
@@ -171,6 +174,146 @@ static int of_bus_pci_translate(__be32 *addr, u64 offset, 
int na)
 {
return of_bus_default_translate(addr + 1, offset, na - 1);
 }
+
+static int of_bus_pci_get_dma_ranges(struct device_node *np, u64 *dma_addr,
+u64 *paddr, u64 *size)
+{
+   struct device_node *node = of_node_get(np);
+   int ret = 0;
+   struct resource_entry *window;
+   LIST_HEAD(res);
+
+   if (!node)
+   return -EINVAL;
+
+   if (of_bus_pci_match(np)) {
+   *size = 0;
+   /*
+* PCI dma-ranges is not mandatory property.
+* many devices do no need to have it, since
+* host bridge does not require inbound memory
+* configuration or rather have design limitations.
+* so we look for dma-ranges, if missing we
+* just return the caller full size, and also
+* no dma-ranges suggests that, host bridge allows
+* whatever comes in, so we set dma_addr to 0.
+*/
+   ret = of_pci_get_dma_ranges(np, );
+   if (!ret) {
+   resource_list_for_each_entry(window, ) {
+   struct resource *res_dma = window->res;
+
+   if (*size < resource_size(res_dma)) {
+   *dma_addr = res_dma->start - window->offset;
+   *paddr = res_dma->start;
+   *size = resource_size(res_dma);
+   }
+   }
+   }
+   pci_free_resource_list();
+
+   /*
+* return the largest possible size,
+* since PCI master allows everything.
+*/
+   if (*size == 0) {
+   pr_debug("empty/zero size dma-ranges found for 
node(%s)\n",
+   np->full_name);
+   *size = DMA_BIT_MASK(sizeof(dma_addr_t) * 8) - 1;
+   *dma_addr = *paddr = 0;
+   ret = 0;
+   }
+
+   pr_debug("dma_addr(%llx) cpu_addr(%llx) size(%llx)\n",
+*dma_addr, *paddr, *size);
+   }
+
+   of_node_put(node);
+
+   return ret;
+}
+
+static int get_dma_ranges(struct device_node *np, u64 *dma_addr,
+ u64 *paddr, u64 *size)
+{
+   struct device_node *node = of_node_get(np);
+   const __be32 *ranges = NULL;
+   int len, naddr, nsize, pna;
+   int ret = 0;
+   u64 dmaaddr;
+
+   if (!node)
+   return -EINVAL;
+
+   while (1) {
+   naddr = of_n_addr_cells(node);
+   nsize = of_n_size_cells(node);
+   node = of_get_next_parent(node);
+   if (!node)
+   break;
+
+   ranges = of_get_property(node, "dma-ranges", );
+
+   /* Ignore empty ranges, they imply no translation required */
+   if (ranges && len > 0)
+   break;
+
+   /*
+* At least empty ranges has to be defined for parent node if
+

[PATCH v2 0/3] OF/PCI address PCI inbound memory limitations

2017-05-05 Thread Oza Pawandeep

It is possible that PCI device supports 64-bit DMA addressing,
and thus it's driver sets device's dma_mask to DMA_BIT_MASK(64),
however PCI host bridge may have limitations on the inbound 
transaction addressing.

This is particularly problematic on ARM/ARM64 SOCs where the 
IOMMU (i.e. SMMU) translates IOVA to PA for in-bound transactions
only after PCI Host has forwarded these transactions on SOC IO bus. 

This means, on such ARM/ARM64 SOCs the IOVA of in-bound transactions
has to honor the addressing restrictions of the PCI Host.

Current device framework and OF framework integration assumes
dma-ranges in a way where memory-mapped devices define their
dma-ranges. (child-bus-address, parent-bus-address, length).

of_dma_configure is specifically written to take care of memory
mapped devices but no implementation exists for pci devices.

For e.g. iproc based SOCs and other SOCs (such as rcar) have
PCI world dma-ranges.
dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;

This patchset
reserves the IOVA ranges for PCI masters based
on the PCI world dma-ranges.
fix of_dma_get_range to cater to PCI dma-ranges.
fix of_dma_get_range which currently returns size 0 for PCI devices.

IOVA allocation patch:
[PATCH 2/3] iommu/pci: reserve iova for PCI masters

Fix of_dma_get_range bug and address PCI master.
[PATCH 3/3] PCI/of fix of_dma_get_range; get PCI specific

Base patch for both of the above patches:
[PATCH 1/3] of/pci/dma: fix DMA configuration for PCI masters

Changes since v1:
- Remove internal GERRIT details from patch descriptions
- address Rob's comments.
- Add a get_dma_ranges() function to of_bus struct..
- Convert existing contents of this function to of_bus_default_dma_get_ranges
  and adding that to the default of_bus struct.
- Make of_dma_get_range call of_bus_match() and then bus->get_dma_ranges.
- no revison for [PATCH 2/3] iommu/pci: reserve iova for PCI masters; since 
under discussion with Robin
  
Oza Pawandeep (3):
  of/pci/dma: fix DMA configuration for PCI masters
  iommu/pci: reserve iova for PCI masters
  PCI/of fix of_dma_get_range; get PCI specific dma-ranges

 drivers/iommu/dma-iommu.c | 35 +
 drivers/of/address.c  | 52 
 drivers/of/of_pci.c   | 77 +++
 include/linux/of_pci.h|  7 +
 4 files changed, 171 insertions(+)

-- 
1.9.1

[PATCH v2 3/3] PCI/of fix of_dma_get_range; get PCI specific dma-ranges

2017-05-05 Thread Oza Pawandeep

current device framework and OF framework integration assumes
dma-ranges in a way where memory-mapped devices define their
dma-ranges. (child-bus-address, parent-bus-address, length).

of_dma_configure is specifically written to take care of memory
mapped devices. but no implementation exists for pci to take
care of pcie based memory ranges.

for e.g. iproc based SOCs and other SOCs(suc as rcar) have PCI
world dma-ranges.
dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;

this patch fixes the bug in of_dma_get_range, which with as is,
parses the PCI memory ranges and return wrong size as 0.

in order to get largest possible dma_mask. this patch also
retuns the largest possible size based on dma-ranges,

for e.g.
dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;
we should get dev->coherent_dma_mask=0x7f.

based on which IOVA allocation space will honour PCI host
bridge limitations.

the implementation hooks bus specific callbacks for getting
dma-ranges.

Signed-off-by: Oza Pawandeep 

diff --git a/drivers/of/address.c b/drivers/of/address.c
index 02b2903..cc0fc28 100644
--- a/drivers/of/address.c
+++ b/drivers/of/address.c
@@ -6,6 +6,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -46,6 +47,8 @@ struct of_bus {
int na, int ns, int pna);
int (*translate)(__be32 *addr, u64 offset, int na);
unsigned int(*get_flags)(const __be32 *addr);
+   int (*get_dma_ranges)(struct device_node *np,
+ u64 *dma_addr, u64 *paddr, u64 *size);
 };
 
 /*
@@ -171,6 +174,146 @@ static int of_bus_pci_translate(__be32 *addr, u64 offset, 
int na)
 {
return of_bus_default_translate(addr + 1, offset, na - 1);
 }
+
+static int of_bus_pci_get_dma_ranges(struct device_node *np, u64 *dma_addr,
+u64 *paddr, u64 *size)
+{
+   struct device_node *node = of_node_get(np);
+   int ret = 0;
+   struct resource_entry *window;
+   LIST_HEAD(res);
+
+   if (!node)
+   return -EINVAL;
+
+   if (of_bus_pci_match(np)) {
+   *size = 0;
+   /*
+* PCI dma-ranges is not mandatory property.
+* many devices do no need to have it, since
+* host bridge does not require inbound memory
+* configuration or rather have design limitations.
+* so we look for dma-ranges, if missing we
+* just return the caller full size, and also
+* no dma-ranges suggests that, host bridge allows
+* whatever comes in, so we set dma_addr to 0.
+*/
+   ret = of_pci_get_dma_ranges(np, );
+   if (!ret) {
+   resource_list_for_each_entry(window, ) {
+   struct resource *res_dma = window->res;
+
+   if (*size < resource_size(res_dma)) {
+   *dma_addr = res_dma->start - window->offset;
+   *paddr = res_dma->start;
+   *size = resource_size(res_dma);
+   }
+   }
+   }
+   pci_free_resource_list();
+
+   /*
+* return the largest possible size,
+* since PCI master allows everything.
+*/
+   if (*size == 0) {
+   pr_debug("empty/zero size dma-ranges found for 
node(%s)\n",
+   np->full_name);
+   *size = DMA_BIT_MASK(sizeof(dma_addr_t) * 8) - 1;
+   *dma_addr = *paddr = 0;
+   ret = 0;
+   }
+
+   pr_debug("dma_addr(%llx) cpu_addr(%llx) size(%llx)\n",
+*dma_addr, *paddr, *size);
+   }
+
+   of_node_put(node);
+
+   return ret;
+}
+
+static int get_dma_ranges(struct device_node *np, u64 *dma_addr,
+ u64 *paddr, u64 *size)
+{
+   struct device_node *node = of_node_get(np);
+   const __be32 *ranges = NULL;
+   int len, naddr, nsize, pna;
+   int ret = 0;
+   u64 dmaaddr;
+
+   if (!node)
+   return -EINVAL;
+
+   while (1) {
+   naddr = of_n_addr_cells(node);
+   nsize = of_n_size_cells(node);
+   node = of_get_next_parent(node);
+   if (!node)
+   break;
+
+   ranges = of_get_property(node, "dma-ranges", );
+
+   /* Ignore empty ranges, they imply no translation required */
+   if (ranges && len > 0)
+   break;
+
+   /*
+* At least empty ranges has to be defined for parent node if
+* DMA is supported
+

[PATCH v2 1/3] of/pci/dma: fix DMA configuration for PCI masters

2017-05-05 Thread Oza Pawandeep

current device framework and OF framework integration assumes
dma-ranges in a way where memory-mapped devices define their
dma-ranges. (child-bus-address, parent-bus-address, length).

of_dma_configure is specifically written to take care of memory
mapped devices. but no implementation exists for pci to take
care of pcie based memory ranges.

for e.g. iproc based SOCs and other SOCs(suc as rcar) have PCI
world dma-ranges.
dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;

this patch serves following:

1) exposes interface to the pci host driver for their
inbound memory ranges

2) provide an interface to callers such as of_dma_get_ranges.
so then the returned size get best possible (largest) dma_mask.
because PCI RC drivers do not call APIs such as
dma_set_coherent_mask() and hence rather it shows its addressing
capabilities based on dma-ranges.
for e.g.
dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;
we should get dev->coherent_dma_mask=0x7f.

3) this patch handles multiple inbound windows and dma-ranges.
it is left to the caller, how it wants to use them.
the new function returns the resources in a standard and unform way

4) this way the callers of for e.g. of_dma_get_ranges
does not need to change.

5) leaves scope of adding PCI flag handling for inbound memory
by the new function.

Signed-off-by: Oza Pawandeep <oza@broadcom.com>
Reviewed-by: Ray Jui <ray@broadcom.com>
Reviewed-by: Scott Branden <scott.bran...@broadcom.com>

diff --git a/drivers/of/of_pci.c b/drivers/of/of_pci.c
index 0ee42c3..ed6e69a 100644
--- a/drivers/of/of_pci.c
+++ b/drivers/of/of_pci.c
@@ -283,6 +283,83 @@ int of_pci_get_host_bridge_resources(struct device_node 
*dev,
return err;
 }
 EXPORT_SYMBOL_GPL(of_pci_get_host_bridge_resources);
+
+/**
+ * of_pci_get_dma_ranges - Parse PCI host bridge inbound resources from DT
+ * @np: device node of the host bridge having the dma-ranges property
+ * @resources: list where the range of resources will be added after DT parsing
+ *
+ * It is the caller's job to free the @resources list.
+ *
+ * This function will parse the "dma-ranges" property of a
+ * PCI host bridge device node and setup the resource mapping based
+ * on its content.
+ *
+ * It returns zero if the range parsing has been successful or a standard error
+ * value if it failed.
+ */
+
+int of_pci_get_dma_ranges(struct device_node *np, struct list_head *resources)
+{
+   struct device_node *node = of_node_get(np);
+   int rlen;
+   int ret = 0;
+   const int na = 3, ns = 2;
+   struct resource *res;
+   struct of_pci_range_parser parser;
+   struct of_pci_range range;
+
+   if (!node)
+   return -EINVAL;
+
+   parser.node = node;
+   parser.pna = of_n_addr_cells(node);
+   parser.np = parser.pna + na + ns;
+
+   parser.range = of_get_property(node, "dma-ranges", );
+
+   if (!parser.range) {
+   pr_debug("pcie device has no dma-ranges defined for node(%s)\n",
+ np->full_name);
+   ret = -EINVAL;
+   goto out;
+   }
+
+   parser.end = parser.range + rlen / sizeof(__be32);
+
+   for_each_of_pci_range(, ) {
+   /*
+* If we failed translation or got a zero-sized region
+* then skip this range
+*/
+   if (range.cpu_addr == OF_BAD_ADDR || range.size == 0)
+   continue;
+
+   res = kzalloc(sizeof(struct resource), GFP_KERNEL);
+   if (!res) {
+   ret = -ENOMEM;
+   goto parse_failed;
+   }
+
+   ret = of_pci_range_to_resource(, np, res);
+   if (ret) {
+   kfree(res);
+   continue;
+   }
+
+   pci_add_resource_offset(resources, res,
+   res->start - range.pci_addr);
+   }
+
+   return ret;
+
+parse_failed:
+   pci_free_resource_list(resources);
+out:
+   of_node_put(node);
+   return ret;
+}
+EXPORT_SYMBOL_GPL(of_pci_get_dma_ranges);
 #endif /* CONFIG_OF_ADDRESS */
 
 #ifdef CONFIG_PCI_MSI
diff --git a/include/linux/of_pci.h b/include/linux/of_pci.h
index 0e0974e..617b90d 100644
--- a/include/linux/of_pci.h
+++ b/include/linux/of_pci.h
@@ -76,6 +76,7 @@ static inline void of_pci_check_probe_only(void) { }
 int of_pci_get_host_bridge_resources(struct device_node *dev,
unsigned char busno, unsigned char bus_max,
struct list_head *resources, resource_size_t *io_base);
+int of_pci_get_dma_ranges(struct device_node *np, struct list_head *resources);
 #else
 static inline int of_pci_get_host_bridge_resources(struct device_node *dev,
unsigned char busno, unsigned char bus_max,
@@ -83,6 +84,12 @@ static inline int of_pci_

[PATCH v2 1/3] of/pci/dma: fix DMA configuration for PCI masters

2017-05-05 Thread Oza Pawandeep

current device framework and OF framework integration assumes
dma-ranges in a way where memory-mapped devices define their
dma-ranges. (child-bus-address, parent-bus-address, length).

of_dma_configure is specifically written to take care of memory
mapped devices. but no implementation exists for pci to take
care of pcie based memory ranges.

for e.g. iproc based SOCs and other SOCs(suc as rcar) have PCI
world dma-ranges.
dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;

this patch serves following:

1) exposes interface to the pci host driver for their
inbound memory ranges

2) provide an interface to callers such as of_dma_get_ranges.
so then the returned size get best possible (largest) dma_mask.
because PCI RC drivers do not call APIs such as
dma_set_coherent_mask() and hence rather it shows its addressing
capabilities based on dma-ranges.
for e.g.
dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;
we should get dev->coherent_dma_mask=0x7f.

3) this patch handles multiple inbound windows and dma-ranges.
it is left to the caller, how it wants to use them.
the new function returns the resources in a standard and unform way

4) this way the callers of for e.g. of_dma_get_ranges
does not need to change.

5) leaves scope of adding PCI flag handling for inbound memory
by the new function.

Signed-off-by: Oza Pawandeep 
Reviewed-by: Ray Jui 
Reviewed-by: Scott Branden 

diff --git a/drivers/of/of_pci.c b/drivers/of/of_pci.c
index 0ee42c3..ed6e69a 100644
--- a/drivers/of/of_pci.c
+++ b/drivers/of/of_pci.c
@@ -283,6 +283,83 @@ int of_pci_get_host_bridge_resources(struct device_node 
*dev,
return err;
 }
 EXPORT_SYMBOL_GPL(of_pci_get_host_bridge_resources);
+
+/**
+ * of_pci_get_dma_ranges - Parse PCI host bridge inbound resources from DT
+ * @np: device node of the host bridge having the dma-ranges property
+ * @resources: list where the range of resources will be added after DT parsing
+ *
+ * It is the caller's job to free the @resources list.
+ *
+ * This function will parse the "dma-ranges" property of a
+ * PCI host bridge device node and setup the resource mapping based
+ * on its content.
+ *
+ * It returns zero if the range parsing has been successful or a standard error
+ * value if it failed.
+ */
+
+int of_pci_get_dma_ranges(struct device_node *np, struct list_head *resources)
+{
+   struct device_node *node = of_node_get(np);
+   int rlen;
+   int ret = 0;
+   const int na = 3, ns = 2;
+   struct resource *res;
+   struct of_pci_range_parser parser;
+   struct of_pci_range range;
+
+   if (!node)
+   return -EINVAL;
+
+   parser.node = node;
+   parser.pna = of_n_addr_cells(node);
+   parser.np = parser.pna + na + ns;
+
+   parser.range = of_get_property(node, "dma-ranges", );
+
+   if (!parser.range) {
+   pr_debug("pcie device has no dma-ranges defined for node(%s)\n",
+ np->full_name);
+   ret = -EINVAL;
+   goto out;
+   }
+
+   parser.end = parser.range + rlen / sizeof(__be32);
+
+   for_each_of_pci_range(, ) {
+   /*
+* If we failed translation or got a zero-sized region
+* then skip this range
+*/
+   if (range.cpu_addr == OF_BAD_ADDR || range.size == 0)
+   continue;
+
+   res = kzalloc(sizeof(struct resource), GFP_KERNEL);
+   if (!res) {
+   ret = -ENOMEM;
+   goto parse_failed;
+   }
+
+   ret = of_pci_range_to_resource(, np, res);
+   if (ret) {
+   kfree(res);
+   continue;
+   }
+
+   pci_add_resource_offset(resources, res,
+   res->start - range.pci_addr);
+   }
+
+   return ret;
+
+parse_failed:
+   pci_free_resource_list(resources);
+out:
+   of_node_put(node);
+   return ret;
+}
+EXPORT_SYMBOL_GPL(of_pci_get_dma_ranges);
 #endif /* CONFIG_OF_ADDRESS */
 
 #ifdef CONFIG_PCI_MSI
diff --git a/include/linux/of_pci.h b/include/linux/of_pci.h
index 0e0974e..617b90d 100644
--- a/include/linux/of_pci.h
+++ b/include/linux/of_pci.h
@@ -76,6 +76,7 @@ static inline void of_pci_check_probe_only(void) { }
 int of_pci_get_host_bridge_resources(struct device_node *dev,
unsigned char busno, unsigned char bus_max,
struct list_head *resources, resource_size_t *io_base);
+int of_pci_get_dma_ranges(struct device_node *np, struct list_head *resources);
 #else
 static inline int of_pci_get_host_bridge_resources(struct device_node *dev,
unsigned char busno, unsigned char bus_max,
@@ -83,6 +84,12 @@ static inline int of_pci_get_host_bridge_resources(struct 
device_node *dev,
 {
return -EINVAL;
 }
+
+static

[PATCH 2/3] iommu/pci: reserve iova for PCI masters

2017-05-02 Thread Oza Pawandeep

this patch reserves the iova for PCI masters.
ARM64 based SOCs may have scattered memory banks.
such as iproc based SOC has

<0x 0x8000 0x0 0x8000>, /* 2G @ 2G */
<0x0008 0x8000 0x3 0x8000>, /* 14G @ 34G */
<0x0090 0x 0x4 0x>, /* 16G @ 576G */
<0x00a0 0x 0x4 0x>; /* 16G @ 640G */

but incoming PCI transcation addressing capability is limited
by host bridge, for example if max incoming window capability
is 512 GB, then 0x0090 and 0x00a0 will fall beyond it.

to address this problem, iommu has to avoid allocating iovas which
are reserved. which inturn does not allocate iova if it falls into hole.

Bug: SOC-5216
Change-Id: Icbfc99a045d730be143fef427098c937b9d46353
Signed-off-by: Oza Pawandeep <oza@broadcom.com>
Reviewed-on: http://gerrit-ccxsw.broadcom.net/40760
Reviewed-by: vpx_checkpatch status <vpx_checkpa...@broadcom.com>
Reviewed-by: CCXSW <ccxswbu...@broadcom.com>
Tested-by: vpx_autobuild status <vpx_autobu...@broadcom.com>
Tested-by: vpx_smoketest status <vpx_smoket...@broadcom.com>
Tested-by: CCXSW <ccxswbu...@broadcom.com>
Reviewed-by: Scott Branden <scott.bran...@broadcom.com>

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 48d36ce..08764b0 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -27,6 +27,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -171,8 +172,12 @@ static void iova_reserve_pci_windows(struct pci_dev *dev,
struct iova_domain *iovad)
 {
struct pci_host_bridge *bridge = pci_find_host_bridge(dev->bus);
+   struct device_node *np = bridge->dev.parent->of_node;
struct resource_entry *window;
unsigned long lo, hi;
+   int ret;
+   dma_addr_t tmp_dma_addr = 0, dma_addr;
+   LIST_HEAD(res);
 
resource_list_for_each_entry(window, >windows) {
if (resource_type(window->res) != IORESOURCE_MEM &&
@@ -183,6 +188,36 @@ static void iova_reserve_pci_windows(struct pci_dev *dev,
hi = iova_pfn(iovad, window->res->end - window->offset);
reserve_iova(iovad, lo, hi);
}
+
+   /* PCI inbound memory reservation. */
+   ret = of_pci_get_dma_ranges(np, );
+   if (!ret) {
+   resource_list_for_each_entry(window, ) {
+   struct resource *res_dma = window->res;
+
+   dma_addr = res_dma->start - window->offset;
+   if (tmp_dma_addr > dma_addr) {
+   pr_warn("PCI: failed to reserve iovas; ranges 
should be sorted\n");
+   return;
+   }
+   if (tmp_dma_addr != dma_addr) {
+   lo = iova_pfn(iovad, tmp_dma_addr);
+   hi = iova_pfn(iovad, dma_addr - 1);
+   reserve_iova(iovad, lo, hi);
+   }
+   tmp_dma_addr = window->res->end - window->offset;
+   }
+   /*
+* the last dma-range should honour based on the
+* 32/64-bit dma addresses.
+*/
+   if (tmp_dma_addr < DMA_BIT_MASK(sizeof(dma_addr_t) * 8)) {
+   lo = iova_pfn(iovad, tmp_dma_addr);
+   hi = iova_pfn(iovad,
+ DMA_BIT_MASK(sizeof(dma_addr_t) * 8) - 1);
+   reserve_iova(iovad, lo, hi);
+   }
+   }
 }
 
 /**
-- 
1.9.1

[PATCH 3/3] PCI/of fix of_dma_get_range; get PCI specific dma-ranges

2017-05-02 Thread Oza Pawandeep

current device framework and of framework integration assumes
dma-ranges in a way where memory-mapped devices define their
dma-ranges. (child-bus-address, parent-bus-address, length).

of_dma_configure is specifically written to take care of memory
mapped devices. but no implementation exists for pci to take
care of pcie based memory ranges.

for e.g. iproc based SOCs and other SOCs(suc as rcar) have PCI
world dma-ranges.
dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;

this patch fixes this patch fixes the bug in of_dma_get_range,
which with as is, parses the PCI memory ranges and return wrong
size as 0.

in order to get largest possible dma_mask. this patch also
retuns the largest possible size based on dma-ranges,

for e.g.
dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;
we should get dev->coherent_dma_mask=0x7f.

based on which iova allocation space will honour PCI host
bridge limitations.

Bug: SOC-5216
Change-Id: I4c534bdd17e70c6b27327d39d1656e8ed0cf56d6
Signed-off-by: Oza Pawandeep <oza@broadcom.com>
Reviewed-on: http://gerrit-ccxsw.broadcom.net/40762
Reviewed-by: vpx_checkpatch status <vpx_checkpa...@broadcom.com>
Reviewed-by: CCXSW <ccxswbu...@broadcom.com>
Reviewed-by: Scott Branden <scott.bran...@broadcom.com>
Tested-by: vpx_autobuild status <vpx_autobu...@broadcom.com>
Tested-by: vpx_smoketest status <vpx_smoket...@broadcom.com>

diff --git a/drivers/of/address.c b/drivers/of/address.c
index 02b2903..f7734fc 100644
--- a/drivers/of/address.c
+++ b/drivers/of/address.c
@@ -6,6 +6,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -830,6 +831,54 @@ int of_dma_get_range(struct device_node *np, u64 
*dma_addr, u64 *paddr, u64 *siz
int ret = 0;
u64 dmaaddr;
 
+#ifdef CONFIG_PCI
+   struct resource_entry *window;
+   LIST_HEAD(res);
+
+   if (!node)
+   return -EINVAL;
+
+   if (of_bus_pci_match(np)) {
+   *size = 0;
+   /*
+* PCI dma-ranges is not mandatory property.
+* many devices do no need to have it, since
+* host bridge does not require inbound memory
+* configuration or rather have design limitations.
+* so we look for dma-ranges, if missing we
+* just return the caller full size, and also
+* no dma-ranges suggests that, host bridge allows
+* whatever comes in, so we set dma_addr to 0.
+*/
+   ret = of_pci_get_dma_ranges(np, );
+   if (!ret) {
+   resource_list_for_each_entry(window, ) {
+   struct resource *res_dma = window->res;
+
+   if (*size < resource_size(res_dma)) {
+   *dma_addr = res_dma->start - window->offset;
+   *paddr = res_dma->start;
+   *size = resource_size(res_dma);
+   }
+   }
+   }
+   pci_free_resource_list();
+
+   /* ignore the empty ranges. */
+   if (*size == 0) {
+   pr_debug("empty/zero size dma-ranges found for 
node(%s)\n",
+   np->full_name);
+   *size = DMA_BIT_MASK(sizeof(dma_addr_t) * 8);
+   *dma_addr = *paddr = 0;
+   ret = 0;
+   }
+
+   pr_err("dma_addr(%llx) cpu_addr(%llx) size(%llx)\n",
+*dma_addr, *paddr, *size);
+   goto out;
+   }
+#endif
+
if (!node)
return -EINVAL;
 
-- 
1.9.1

[PATCH 1/3] of/pci/dma: fix DMA configuration for PCI masters

2017-05-02 Thread Oza Pawandeep

current device framework and of framework integration assumes
dma-ranges in a way where memory-mapped devices define their
dma-ranges. (child-bus-address, parent-bus-address, length).

of_dma_configure is specifically written to take care of memory
mapped devices. but no implementation exists for pci to take
care of pcie based memory ranges.

for e.g. iproc based SOCs and other SOCs(suc as rcar) have PCI
world dma-ranges.
dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;

this patch serves following:

1) exposes interface to the pci host driver for their
inbound memory ranges

2) provide an interface to callers such as of_dma_get_ranges.
so then the returned size get best possible (largest) dma_mask.
because PCI RC drivers do not call APIs such as
dma_set_coherent_mask() and hence rather it shows its addressing
capabilities based on dma-ranges.
for e.g.
dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;
we should get dev->coherent_dma_mask=0x7f.

3) this patch handles multiple inbound windows and dma-ranges.
it is left to the caller, how it wants to use them.
the new function returns the resources in a standard and unform way

4) this way the callers of for e.g. of_dma_get_ranges
does not need to change.

5) leaves scope of adding PCI flag handling for inbound memory
by the new function.

Bug: SOC-5216
Change-Id: Ie045386df91e1e0587846bb147ae40d96f6d7d2e
Signed-off-by: Oza Pawandeep <oza@broadcom.com>
Reviewed-on: http://gerrit-ccxsw.broadcom.net/40428
Reviewed-by: vpx_checkpatch status <vpx_checkpa...@broadcom.com>
Reviewed-by: CCXSW <ccxswbu...@broadcom.com>
Reviewed-by: Ray Jui <ray@broadcom.com>
Tested-by: vpx_autobuild status <vpx_autobu...@broadcom.com>
Tested-by: vpx_smoketest status <vpx_smoket...@broadcom.com>
Tested-by: CCXSW <ccxswbu...@broadcom.com>
Reviewed-by: Scott Branden <scott.bran...@broadcom.com>

diff --git a/drivers/of/of_pci.c b/drivers/of/of_pci.c
index 0ee42c3..ed6e69a 100644
--- a/drivers/of/of_pci.c
+++ b/drivers/of/of_pci.c
@@ -283,6 +283,83 @@ int of_pci_get_host_bridge_resources(struct device_node 
*dev,
return err;
 }
 EXPORT_SYMBOL_GPL(of_pci_get_host_bridge_resources);
+
+/**
+ * of_pci_get_dma_ranges - Parse PCI host bridge inbound resources from DT
+ * @np: device node of the host bridge having the dma-ranges property
+ * @resources: list where the range of resources will be added after DT parsing
+ *
+ * It is the caller's job to free the @resources list.
+ *
+ * This function will parse the "dma-ranges" property of a
+ * PCI host bridge device node and setup the resource mapping based
+ * on its content.
+ *
+ * It returns zero if the range parsing has been successful or a standard error
+ * value if it failed.
+ */
+
+int of_pci_get_dma_ranges(struct device_node *np, struct list_head *resources)
+{
+   struct device_node *node = of_node_get(np);
+   int rlen;
+   int ret = 0;
+   const int na = 3, ns = 2;
+   struct resource *res;
+   struct of_pci_range_parser parser;
+   struct of_pci_range range;
+
+   if (!node)
+   return -EINVAL;
+
+   parser.node = node;
+   parser.pna = of_n_addr_cells(node);
+   parser.np = parser.pna + na + ns;
+
+   parser.range = of_get_property(node, "dma-ranges", );
+
+   if (!parser.range) {
+   pr_debug("pcie device has no dma-ranges defined for node(%s)\n",
+ np->full_name);
+   ret = -EINVAL;
+   goto out;
+   }
+
+   parser.end = parser.range + rlen / sizeof(__be32);
+
+   for_each_of_pci_range(, ) {
+   /*
+* If we failed translation or got a zero-sized region
+* then skip this range
+*/
+   if (range.cpu_addr == OF_BAD_ADDR || range.size == 0)
+   continue;
+
+   res = kzalloc(sizeof(struct resource), GFP_KERNEL);
+   if (!res) {
+   ret = -ENOMEM;
+   goto parse_failed;
+   }
+
+   ret = of_pci_range_to_resource(, np, res);
+   if (ret) {
+   kfree(res);
+   continue;
+   }
+
+   pci_add_resource_offset(resources, res,
+   res->start - range.pci_addr);
+   }
+
+   return ret;
+
+parse_failed:
+   pci_free_resource_list(resources);
+out:
+   of_node_put(node);
+   return ret;
+}
+EXPORT_SYMBOL_GPL(of_pci_get_dma_ranges);
 #endif /* CONFIG_OF_ADDRESS */
 
 #ifdef CONFIG_PCI_MSI
diff --git a/include/linux/of_pci.h b/include/linux/of_pci.h
index 0e0974e..617b90d 100644
--- a/include/linux/of_pci.h
+++ b/include/linux/of_pci.h
@@ -76,6 +76,7 @@ static inline void of_pci_check_probe_only(void) { }
 int of_pci_get_host_bridge_resources(struct device_node *dev,

[PATCH 2/3] iommu/pci: reserve iova for PCI masters

2017-05-02 Thread Oza Pawandeep

this patch reserves the iova for PCI masters.
ARM64 based SOCs may have scattered memory banks.
such as iproc based SOC has

<0x 0x8000 0x0 0x8000>, /* 2G @ 2G */
<0x0008 0x8000 0x3 0x8000>, /* 14G @ 34G */
<0x0090 0x 0x4 0x>, /* 16G @ 576G */
<0x00a0 0x 0x4 0x>; /* 16G @ 640G */

but incoming PCI transcation addressing capability is limited
by host bridge, for example if max incoming window capability
is 512 GB, then 0x0090 and 0x00a0 will fall beyond it.

to address this problem, iommu has to avoid allocating iovas which
are reserved. which inturn does not allocate iova if it falls into hole.

Bug: SOC-5216
Change-Id: Icbfc99a045d730be143fef427098c937b9d46353
Signed-off-by: Oza Pawandeep 
Reviewed-on: http://gerrit-ccxsw.broadcom.net/40760
Reviewed-by: vpx_checkpatch status 
Reviewed-by: CCXSW 
Tested-by: vpx_autobuild status 
Tested-by: vpx_smoketest status 
Tested-by: CCXSW 
Reviewed-by: Scott Branden 

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 48d36ce..08764b0 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -27,6 +27,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -171,8 +172,12 @@ static void iova_reserve_pci_windows(struct pci_dev *dev,
struct iova_domain *iovad)
 {
struct pci_host_bridge *bridge = pci_find_host_bridge(dev->bus);
+   struct device_node *np = bridge->dev.parent->of_node;
struct resource_entry *window;
unsigned long lo, hi;
+   int ret;
+   dma_addr_t tmp_dma_addr = 0, dma_addr;
+   LIST_HEAD(res);
 
resource_list_for_each_entry(window, >windows) {
if (resource_type(window->res) != IORESOURCE_MEM &&
@@ -183,6 +188,36 @@ static void iova_reserve_pci_windows(struct pci_dev *dev,
hi = iova_pfn(iovad, window->res->end - window->offset);
reserve_iova(iovad, lo, hi);
}
+
+   /* PCI inbound memory reservation. */
+   ret = of_pci_get_dma_ranges(np, );
+   if (!ret) {
+   resource_list_for_each_entry(window, ) {
+   struct resource *res_dma = window->res;
+
+   dma_addr = res_dma->start - window->offset;
+   if (tmp_dma_addr > dma_addr) {
+   pr_warn("PCI: failed to reserve iovas; ranges 
should be sorted\n");
+   return;
+   }
+   if (tmp_dma_addr != dma_addr) {
+   lo = iova_pfn(iovad, tmp_dma_addr);
+   hi = iova_pfn(iovad, dma_addr - 1);
+   reserve_iova(iovad, lo, hi);
+   }
+   tmp_dma_addr = window->res->end - window->offset;
+   }
+   /*
+* the last dma-range should honour based on the
+* 32/64-bit dma addresses.
+*/
+   if (tmp_dma_addr < DMA_BIT_MASK(sizeof(dma_addr_t) * 8)) {
+   lo = iova_pfn(iovad, tmp_dma_addr);
+   hi = iova_pfn(iovad,
+ DMA_BIT_MASK(sizeof(dma_addr_t) * 8) - 1);
+   reserve_iova(iovad, lo, hi);
+   }
+   }
 }
 
 /**
-- 
1.9.1

[PATCH 3/3] PCI/of fix of_dma_get_range; get PCI specific dma-ranges

2017-05-02 Thread Oza Pawandeep

current device framework and of framework integration assumes
dma-ranges in a way where memory-mapped devices define their
dma-ranges. (child-bus-address, parent-bus-address, length).

of_dma_configure is specifically written to take care of memory
mapped devices. but no implementation exists for pci to take
care of pcie based memory ranges.

for e.g. iproc based SOCs and other SOCs(suc as rcar) have PCI
world dma-ranges.
dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;

this patch fixes this patch fixes the bug in of_dma_get_range,
which with as is, parses the PCI memory ranges and return wrong
size as 0.

in order to get largest possible dma_mask. this patch also
retuns the largest possible size based on dma-ranges,

for e.g.
dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;
we should get dev->coherent_dma_mask=0x7f.

based on which iova allocation space will honour PCI host
bridge limitations.

Bug: SOC-5216
Change-Id: I4c534bdd17e70c6b27327d39d1656e8ed0cf56d6
Signed-off-by: Oza Pawandeep 
Reviewed-on: http://gerrit-ccxsw.broadcom.net/40762
Reviewed-by: vpx_checkpatch status 
Reviewed-by: CCXSW 
Reviewed-by: Scott Branden 
Tested-by: vpx_autobuild status 
Tested-by: vpx_smoketest status 

diff --git a/drivers/of/address.c b/drivers/of/address.c
index 02b2903..f7734fc 100644
--- a/drivers/of/address.c
+++ b/drivers/of/address.c
@@ -6,6 +6,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -830,6 +831,54 @@ int of_dma_get_range(struct device_node *np, u64 
*dma_addr, u64 *paddr, u64 *siz
int ret = 0;
u64 dmaaddr;
 
+#ifdef CONFIG_PCI
+   struct resource_entry *window;
+   LIST_HEAD(res);
+
+   if (!node)
+   return -EINVAL;
+
+   if (of_bus_pci_match(np)) {
+   *size = 0;
+   /*
+* PCI dma-ranges is not mandatory property.
+* many devices do no need to have it, since
+* host bridge does not require inbound memory
+* configuration or rather have design limitations.
+* so we look for dma-ranges, if missing we
+* just return the caller full size, and also
+* no dma-ranges suggests that, host bridge allows
+* whatever comes in, so we set dma_addr to 0.
+*/
+   ret = of_pci_get_dma_ranges(np, );
+   if (!ret) {
+   resource_list_for_each_entry(window, ) {
+   struct resource *res_dma = window->res;
+
+   if (*size < resource_size(res_dma)) {
+   *dma_addr = res_dma->start - window->offset;
+   *paddr = res_dma->start;
+   *size = resource_size(res_dma);
+   }
+   }
+   }
+   pci_free_resource_list();
+
+   /* ignore the empty ranges. */
+   if (*size == 0) {
+   pr_debug("empty/zero size dma-ranges found for 
node(%s)\n",
+   np->full_name);
+   *size = DMA_BIT_MASK(sizeof(dma_addr_t) * 8);
+   *dma_addr = *paddr = 0;
+   ret = 0;
+   }
+
+   pr_err("dma_addr(%llx) cpu_addr(%llx) size(%llx)\n",
+*dma_addr, *paddr, *size);
+   goto out;
+   }
+#endif
+
if (!node)
return -EINVAL;
 
-- 
1.9.1

[PATCH 1/3] of/pci/dma: fix DMA configuration for PCI masters

2017-05-02 Thread Oza Pawandeep

current device framework and of framework integration assumes
dma-ranges in a way where memory-mapped devices define their
dma-ranges. (child-bus-address, parent-bus-address, length).

of_dma_configure is specifically written to take care of memory
mapped devices. but no implementation exists for pci to take
care of pcie based memory ranges.

for e.g. iproc based SOCs and other SOCs(suc as rcar) have PCI
world dma-ranges.
dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;

this patch serves following:

1) exposes interface to the pci host driver for their
inbound memory ranges

2) provide an interface to callers such as of_dma_get_ranges.
so then the returned size get best possible (largest) dma_mask.
because PCI RC drivers do not call APIs such as
dma_set_coherent_mask() and hence rather it shows its addressing
capabilities based on dma-ranges.
for e.g.
dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;
we should get dev->coherent_dma_mask=0x7f.

3) this patch handles multiple inbound windows and dma-ranges.
it is left to the caller, how it wants to use them.
the new function returns the resources in a standard and unform way

4) this way the callers of for e.g. of_dma_get_ranges
does not need to change.

5) leaves scope of adding PCI flag handling for inbound memory
by the new function.

Bug: SOC-5216
Change-Id: Ie045386df91e1e0587846bb147ae40d96f6d7d2e
Signed-off-by: Oza Pawandeep 
Reviewed-on: http://gerrit-ccxsw.broadcom.net/40428
Reviewed-by: vpx_checkpatch status 
Reviewed-by: CCXSW 
Reviewed-by: Ray Jui 
Tested-by: vpx_autobuild status 
Tested-by: vpx_smoketest status 
Tested-by: CCXSW 
Reviewed-by: Scott Branden 

diff --git a/drivers/of/of_pci.c b/drivers/of/of_pci.c
index 0ee42c3..ed6e69a 100644
--- a/drivers/of/of_pci.c
+++ b/drivers/of/of_pci.c
@@ -283,6 +283,83 @@ int of_pci_get_host_bridge_resources(struct device_node 
*dev,
return err;
 }
 EXPORT_SYMBOL_GPL(of_pci_get_host_bridge_resources);
+
+/**
+ * of_pci_get_dma_ranges - Parse PCI host bridge inbound resources from DT
+ * @np: device node of the host bridge having the dma-ranges property
+ * @resources: list where the range of resources will be added after DT parsing
+ *
+ * It is the caller's job to free the @resources list.
+ *
+ * This function will parse the "dma-ranges" property of a
+ * PCI host bridge device node and setup the resource mapping based
+ * on its content.
+ *
+ * It returns zero if the range parsing has been successful or a standard error
+ * value if it failed.
+ */
+
+int of_pci_get_dma_ranges(struct device_node *np, struct list_head *resources)
+{
+   struct device_node *node = of_node_get(np);
+   int rlen;
+   int ret = 0;
+   const int na = 3, ns = 2;
+   struct resource *res;
+   struct of_pci_range_parser parser;
+   struct of_pci_range range;
+
+   if (!node)
+   return -EINVAL;
+
+   parser.node = node;
+   parser.pna = of_n_addr_cells(node);
+   parser.np = parser.pna + na + ns;
+
+   parser.range = of_get_property(node, "dma-ranges", );
+
+   if (!parser.range) {
+   pr_debug("pcie device has no dma-ranges defined for node(%s)\n",
+ np->full_name);
+   ret = -EINVAL;
+   goto out;
+   }
+
+   parser.end = parser.range + rlen / sizeof(__be32);
+
+   for_each_of_pci_range(, ) {
+   /*
+* If we failed translation or got a zero-sized region
+* then skip this range
+*/
+   if (range.cpu_addr == OF_BAD_ADDR || range.size == 0)
+   continue;
+
+   res = kzalloc(sizeof(struct resource), GFP_KERNEL);
+   if (!res) {
+   ret = -ENOMEM;
+   goto parse_failed;
+   }
+
+   ret = of_pci_range_to_resource(, np, res);
+   if (ret) {
+   kfree(res);
+   continue;
+   }
+
+   pci_add_resource_offset(resources, res,
+   res->start - range.pci_addr);
+   }
+
+   return ret;
+
+parse_failed:
+   pci_free_resource_list(resources);
+out:
+   of_node_put(node);
+   return ret;
+}
+EXPORT_SYMBOL_GPL(of_pci_get_dma_ranges);
 #endif /* CONFIG_OF_ADDRESS */
 
 #ifdef CONFIG_PCI_MSI
diff --git a/include/linux/of_pci.h b/include/linux/of_pci.h
index 0e0974e..617b90d 100644
--- a/include/linux/of_pci.h
+++ b/include/linux/of_pci.h
@@ -76,6 +76,7 @@ static inline void of_pci_check_probe_only(void) { }
 int of_pci_get_host_bridge_resources(struct device_node *dev,
unsigned char busno, unsigned char bus_max,
struct list_head *resources, resource_size_t *io_base);
+int of_pci_get_dma_ranges(struct device_node *np, struct list_head *resources);
 #else
 static inline int

[RFC PATH] of/pci/dma: fix DMA configruation for PCI masters

2017-04-22 Thread Oza Pawandeep

current device frmework and of framework integration assumes dma-ranges
in a way where memory-mapped devices define their dma-ranges.
dma-ranges: (child-bus-address, parent-bus-address, length).

but iproc based SOCs and other SOCs(suc as rcar) have PCI world dma-ranges.
dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;

of_dma_configure is specifically witten to take care of memory mapped devices.
but no implementation exists for pci to take care of pcie based memory ranges.
in fact pci world doesnt seem to define standard dma-ranges

this patch served following purposes

1) exposes intrface to the pci host driver for thir inbound memory ranges

2) provide an interface to callers such as of_dma_get_ranges.
so then the returned size get best possible (largest) dma_mask.
for e.g.
dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;
we should get dev->coherent_dma_mask=0x7f.

3) this patch handles multiple inbound windows and dma-ranges.
it is left to the caller, how it wants to use them.
the new function returns the resources in a standard and unform way

4) this way the callers of of_dma_get_ranges does not need to change.
and

5) leaves scope of adding PCI flag handling for inbound memory
by the new function.

Signed-off-by: Oza Pawandeep <oza@broadcom.com>

diff --git a/drivers/of/address.c b/drivers/of/address.c
index 02b2903..ec21191 100644
--- a/drivers/of/address.c
+++ b/drivers/of/address.c
@@ -6,6 +6,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -829,10 +830,30 @@ int of_dma_get_range(struct device_node *np, u64 
*dma_addr, u64 *paddr, u64 *siz
int len, naddr, nsize, pna;
int ret = 0;
u64 dmaaddr;
+   struct resource_entry *window;
+   LIST_HEAD(res);
 
if (!node)
return -EINVAL;
 
+   if (strcmp(np->name, "pci")) {
+   *size = 0;
+   ret = of_pci_get_dma_ranges(np, );
+   if (!ret) {
+   resource_list_for_each_entry(window, ) {
+   struct resource *res_dma = window->res;
+
+   if (*size < resource_size(res_dma)) {
+   *dma_addr = res_dma->start - window->offset;
+   *paddr = res_dma->start;
+   *size = resource_size(res_dma);
+   }
+   }
+   }
+   pci_free_resource_list();
+   goto out;
+   }
+
while (1) {
naddr = of_n_addr_cells(node);
nsize = of_n_size_cells(node);
diff --git a/drivers/of/of_pci.c b/drivers/of/of_pci.c
index 0ee42c3..2aa9401 100644
--- a/drivers/of/of_pci.c
+++ b/drivers/of/of_pci.c
@@ -283,6 +283,83 @@ int of_pci_get_host_bridge_resources(struct device_node 
*dev,
return err;
 }
 EXPORT_SYMBOL_GPL(of_pci_get_host_bridge_resources);
+
+/**
+ * of_pci_get_dma_ranges - Parse PCI host bridge inbound resources from DT
+ * @np: device node of the host bridge having the dma-ranges property
+ * @resources: list where the range of resources will be added after DT parsing
+ *
+ * It is the caller's job to free the @resources list.
+ *
+ * This function will parse the "dma-ranges" property of a
+ * PCI host bridge device node and setup the resource mapping based
+ * on its content.
+ *
+ * It returns zero if the range parsing has been successful or a standard error
+ * value if it failed.
+ */
+
+int of_pci_get_dma_ranges(struct device_node *np, struct list_head *resources)
+{
+   struct device_node *node = of_node_get(np);
+   int rlen;
+   int ret = 0;
+   const int na = 3, ns = 2;
+   struct resource *res;
+   struct of_pci_range_parser parser;
+   struct of_pci_range range;
+
+   if (!node)
+   return -EINVAL;
+
+   parser.node = node;
+   parser.pna = of_n_addr_cells(node);
+   parser.np = parser.pna + na + ns;
+
+   parser.range = of_get_property(node, "dma-ranges", );
+
+   if (!parser.range) {
+   pr_debug("pcie device has no dma-ranges defined for node(%s)\n",
+ np->full_name);
+   ret = -ENODEV;
+   goto out;
+   }
+
+   parser.end = parser.range + rlen / sizeof(__be32);
+
+   for_each_of_pci_range(, ) {
+   /*
+* If we failed translation or got a zero-sized region
+* then skip this range
+*/
+   if (range.cpu_addr == OF_BAD_ADDR || range.size == 0)
+   continue;
+
+   res = kzalloc(sizeof(struct resource), GFP_KERNEL);
+   if (!res) {
+   ret = -ENOMEM;
+   goto parse_failed;
+   }
+
+   ret = of_pci_range_to_resource(, np, res);
+

[RFC PATH] of/pci/dma: fix DMA configruation for PCI masters

2017-04-22 Thread Oza Pawandeep

current device frmework and of framework integration assumes dma-ranges
in a way where memory-mapped devices define their dma-ranges.
dma-ranges: (child-bus-address, parent-bus-address, length).

but iproc based SOCs and other SOCs(suc as rcar) have PCI world dma-ranges.
dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;

of_dma_configure is specifically witten to take care of memory mapped devices.
but no implementation exists for pci to take care of pcie based memory ranges.
in fact pci world doesnt seem to define standard dma-ranges

this patch served following purposes

1) exposes intrface to the pci host driver for thir inbound memory ranges

2) provide an interface to callers such as of_dma_get_ranges.
so then the returned size get best possible (largest) dma_mask.
for e.g.
dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;
we should get dev->coherent_dma_mask=0x7f.

3) this patch handles multiple inbound windows and dma-ranges.
it is left to the caller, how it wants to use them.
the new function returns the resources in a standard and unform way

4) this way the callers of of_dma_get_ranges does not need to change.
and

5) leaves scope of adding PCI flag handling for inbound memory
by the new function.

Signed-off-by: Oza Pawandeep 

diff --git a/drivers/of/address.c b/drivers/of/address.c
index 02b2903..ec21191 100644
--- a/drivers/of/address.c
+++ b/drivers/of/address.c
@@ -6,6 +6,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -829,10 +830,30 @@ int of_dma_get_range(struct device_node *np, u64 
*dma_addr, u64 *paddr, u64 *siz
int len, naddr, nsize, pna;
int ret = 0;
u64 dmaaddr;
+   struct resource_entry *window;
+   LIST_HEAD(res);
 
if (!node)
return -EINVAL;
 
+   if (strcmp(np->name, "pci")) {
+   *size = 0;
+   ret = of_pci_get_dma_ranges(np, );
+   if (!ret) {
+   resource_list_for_each_entry(window, ) {
+   struct resource *res_dma = window->res;
+
+   if (*size < resource_size(res_dma)) {
+   *dma_addr = res_dma->start - window->offset;
+   *paddr = res_dma->start;
+   *size = resource_size(res_dma);
+   }
+   }
+   }
+   pci_free_resource_list();
+   goto out;
+   }
+
while (1) {
naddr = of_n_addr_cells(node);
nsize = of_n_size_cells(node);
diff --git a/drivers/of/of_pci.c b/drivers/of/of_pci.c
index 0ee42c3..2aa9401 100644
--- a/drivers/of/of_pci.c
+++ b/drivers/of/of_pci.c
@@ -283,6 +283,83 @@ int of_pci_get_host_bridge_resources(struct device_node 
*dev,
return err;
 }
 EXPORT_SYMBOL_GPL(of_pci_get_host_bridge_resources);
+
+/**
+ * of_pci_get_dma_ranges - Parse PCI host bridge inbound resources from DT
+ * @np: device node of the host bridge having the dma-ranges property
+ * @resources: list where the range of resources will be added after DT parsing
+ *
+ * It is the caller's job to free the @resources list.
+ *
+ * This function will parse the "dma-ranges" property of a
+ * PCI host bridge device node and setup the resource mapping based
+ * on its content.
+ *
+ * It returns zero if the range parsing has been successful or a standard error
+ * value if it failed.
+ */
+
+int of_pci_get_dma_ranges(struct device_node *np, struct list_head *resources)
+{
+   struct device_node *node = of_node_get(np);
+   int rlen;
+   int ret = 0;
+   const int na = 3, ns = 2;
+   struct resource *res;
+   struct of_pci_range_parser parser;
+   struct of_pci_range range;
+
+   if (!node)
+   return -EINVAL;
+
+   parser.node = node;
+   parser.pna = of_n_addr_cells(node);
+   parser.np = parser.pna + na + ns;
+
+   parser.range = of_get_property(node, "dma-ranges", );
+
+   if (!parser.range) {
+   pr_debug("pcie device has no dma-ranges defined for node(%s)\n",
+ np->full_name);
+   ret = -ENODEV;
+   goto out;
+   }
+
+   parser.end = parser.range + rlen / sizeof(__be32);
+
+   for_each_of_pci_range(, ) {
+   /*
+* If we failed translation or got a zero-sized region
+* then skip this range
+*/
+   if (range.cpu_addr == OF_BAD_ADDR || range.size == 0)
+   continue;
+
+   res = kzalloc(sizeof(struct resource), GFP_KERNEL);
+   if (!res) {
+   ret = -ENOMEM;
+   goto parse_failed;
+   }
+
+   ret = of_pci_range_to_resource(, np, res);
+   if

[RFC PATCH 2/2] of/pci: call pci specific dma-ranges instead of memory-mapped.

2017-03-30 Thread Oza Pawandeep

it is possible that PCI device supports 64-bit DMA addressing,
and thus it's driver sets device's dma_mask to DMA_BIT_MASK(64),
however PCI host bridge may have limitations on the inbound
transaction addressing. As an example, consider NVME SSD device
connected to iproc-PCIe controller.

Currently, the IOMMU DMA ops only considers PCI device dma_mask
when allocating an IOVA. This is particularly problematic on
ARM/ARM64 SOCs where the IOMMU (i.e. SMMU) translates IOVA to
PA for in-bound transactions only after PCI Host has forwarded
these transactions on SOC IO bus. This means on such ARM/ARM64
SOCs the IOVA of in-bound transactions has to honor the addressing
restrictions of the PCI Host.

this patch calls pci specific of_pci_dma_get_ranges, instead of
calling memory-mapped one, which returns wrong size and also
not meant for PCI world.

with this, it gets accurate resources back, and largest possible
inbound window size. with that largest possible dma_mask can be
generated.

Signed-off-by: Oza Pawandeep <oza@broadcom.com>

diff --git a/drivers/of/device.c b/drivers/of/device.c
index b1e6beb..d6a8dde 100644
--- a/drivers/of/device.c
+++ b/drivers/of/device.c
@@ -9,6 +9,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include "of_private.h"
@@ -89,6 +90,8 @@ void of_dma_configure(struct device *dev, struct device_node 
*np)
bool coherent;
unsigned long offset;
const struct iommu_ops *iommu;
+   struct resource_entry *window;
+   LIST_HEAD(res);
 
/*
 * Set default coherent_dma_mask to 32 bit.  Drivers are expected to
@@ -104,7 +107,24 @@ void of_dma_configure(struct device *dev, struct 
device_node *np)
if (!dev->dma_mask)
dev->dma_mask = >coherent_dma_mask;
 
-   ret = of_dma_get_range(np, _addr, , );
+   if (dev_is_pci(dev)) {
+   size = 0;
+   ret = of_pci_get_dma_ranges(np, );
+   if (!ret) {
+   resource_list_for_each_entry(window, ) {
+   struct resource *res_dma = window->res;
+   if (size < resource_size(res_dma)) {
+   dma_addr = res_dma->start - 
window->offset;
+   paddr = res_dma->start;
+   size = resource_size(res_dma);
+   }
+   }
+   }
+   pci_free_resource_list();
+   }
+   else
+   ret = of_dma_get_range(np, _addr, , );
+
if (ret < 0) {
dma_addr = offset = 0;
size = dev->coherent_dma_mask + 1;
-- 
1.9.1

[RFC PATCH 2/2] of/pci: call pci specific dma-ranges instead of memory-mapped.

2017-03-30 Thread Oza Pawandeep

it is possible that PCI device supports 64-bit DMA addressing,
and thus it's driver sets device's dma_mask to DMA_BIT_MASK(64),
however PCI host bridge may have limitations on the inbound
transaction addressing. As an example, consider NVME SSD device
connected to iproc-PCIe controller.

Currently, the IOMMU DMA ops only considers PCI device dma_mask
when allocating an IOVA. This is particularly problematic on
ARM/ARM64 SOCs where the IOMMU (i.e. SMMU) translates IOVA to
PA for in-bound transactions only after PCI Host has forwarded
these transactions on SOC IO bus. This means on such ARM/ARM64
SOCs the IOVA of in-bound transactions has to honor the addressing
restrictions of the PCI Host.

this patch calls pci specific of_pci_dma_get_ranges, instead of
calling memory-mapped one, which returns wrong size and also
not meant for PCI world.

with this, it gets accurate resources back, and largest possible
inbound window size. with that largest possible dma_mask can be
generated.

Signed-off-by: Oza Pawandeep 

diff --git a/drivers/of/device.c b/drivers/of/device.c
index b1e6beb..d6a8dde 100644
--- a/drivers/of/device.c
+++ b/drivers/of/device.c
@@ -9,6 +9,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include "of_private.h"
@@ -89,6 +90,8 @@ void of_dma_configure(struct device *dev, struct device_node 
*np)
bool coherent;
unsigned long offset;
const struct iommu_ops *iommu;
+   struct resource_entry *window;
+   LIST_HEAD(res);
 
/*
 * Set default coherent_dma_mask to 32 bit.  Drivers are expected to
@@ -104,7 +107,24 @@ void of_dma_configure(struct device *dev, struct 
device_node *np)
if (!dev->dma_mask)
dev->dma_mask = >coherent_dma_mask;
 
-   ret = of_dma_get_range(np, _addr, , );
+   if (dev_is_pci(dev)) {
+   size = 0;
+   ret = of_pci_get_dma_ranges(np, );
+   if (!ret) {
+   resource_list_for_each_entry(window, ) {
+   struct resource *res_dma = window->res;
+   if (size < resource_size(res_dma)) {
+   dma_addr = res_dma->start - 
window->offset;
+   paddr = res_dma->start;
+   size = resource_size(res_dma);
+   }
+   }
+   }
+   pci_free_resource_list();
+   }
+   else
+   ret = of_dma_get_range(np, _addr, , );
+
if (ret < 0) {
dma_addr = offset = 0;
size = dev->coherent_dma_mask + 1;
-- 
1.9.1

[RFC PATCH 1/2] of/pci: implement inbound dma-ranges for PCI

2017-03-30 Thread Oza Pawandeep

current device frmework and of framework integration assumes dma-ranges
in a way where memory-mapped devices define their dma-ranges.
dma-ranges: (child-bus-address, parent-bus-address, length).

but iproc based SOCs and other SOCs(rcar) have PCI world dma-ranges.
dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;

of_dma_configure is specifically witten to take care of memory mapped devices.
but no implementation exists for pci to take care of pcie based memory ranges.
in fact pci world doesnt seem to define standard dma-ranges

this exposes intrface not only to the pci host driver, but also
it aims to provide an interface to callers such as of_dma_configure.
so then the returned size get best possible (largest) dma_mask.
for e.g.
dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;
we should get dev->coherent_dma_mask=0x7f.

also this patch intends to handle multiple inbound windows and dma-ranges.
it is left to the caller, how it wants to use them.

Signed-off-by: Oza Pawandeep <oza@broadcom.com>

diff --git a/drivers/of/of_pci.c b/drivers/of/of_pci.c
index 0ee42c3..5299438 100644
--- a/drivers/of/of_pci.c
+++ b/drivers/of/of_pci.c
@@ -283,6 +283,80 @@ int of_pci_get_host_bridge_resources(struct device_node 
*dev,
return err;
 }
 EXPORT_SYMBOL_GPL(of_pci_get_host_bridge_resources);
+
+/**
+ * of_pci_get_dma_ranges - Parse PCI host bridge inbound resources from DT
+ * @np: device node of the host bridge having the dma-ranges property
+ * @resources: list where the range of resources will be added after DT parsing
+ *
+ * It is the caller's job to free the @resources list.
+ *
+ * This function will parse the "dma-ranges" property of a PCI host bridge 
device
+ * node and setup the resource mapping based on its content.
+ *
+ * It returns zero if the range parsing has been successful or a standard error
+ * value if it failed.
+ */
+
+int of_pci_get_dma_ranges(struct device_node *np, struct list_head *resources)
+{
+   struct device_node *node = of_node_get(np);
+   int rlen;
+   int ret = 0;
+   const int na = 3, ns = 2;
+   struct resource *res;
+   struct of_pci_range_parser parser;
+   struct of_pci_range range;
+
+   if (!node)
+   return -EINVAL;
+
+   parser.node = node;
+   parser.pna = of_n_addr_cells(node);
+   parser.np = parser.pna + na + ns;
+
+   parser.range = of_get_property(node, "dma-ranges", );
+
+   if (!parser.range) {
+   pr_debug("pcie device has no dma-ranges defined for 
node(%s)\n", np->full_name);
+   ret = -ENODEV;
+   goto out;
+   }
+
+   parser.end = parser.range + rlen / sizeof(__be32);
+
+   for_each_of_pci_range(, ) {
+   /*
+* If we failed translation or got a zero-sized region
+* then skip this range
+*/
+   if (range.cpu_addr == OF_BAD_ADDR || range.size == 0)
+   continue;
+
+   res = kzalloc(sizeof(struct resource), GFP_KERNEL);
+   if (!res) {
+   ret = -ENOMEM;
+   goto parse_failed;
+   }
+
+   ret = of_pci_range_to_resource(, np, res);
+   if (ret) {
+   kfree(res);
+   continue;
+   }
+
+   pci_add_resource_offset(resources, res, res->start - 
range.pci_addr);
+}
+
+   return ret;
+
+parse_failed:
+   pci_free_resource_list(resources);
+out:
+   of_node_put(node);
+   return ret;
+}
+EXPORT_SYMBOL_GPL(of_pci_get_dma_ranges);
 #endif /* CONFIG_OF_ADDRESS */
 
 #ifdef CONFIG_PCI_MSI
diff --git a/include/linux/of_pci.h b/include/linux/of_pci.h
index 0e0974e..8509e3d 100644
--- a/include/linux/of_pci.h
+++ b/include/linux/of_pci.h
@@ -76,6 +76,7 @@ static inline void of_pci_check_probe_only(void) { }
 int of_pci_get_host_bridge_resources(struct device_node *dev,
unsigned char busno, unsigned char bus_max,
struct list_head *resources, resource_size_t *io_base);
+int of_pci_get_dma_ranges(struct device_node *np, struct list_head *resources);
 #else
 static inline int of_pci_get_host_bridge_resources(struct device_node *dev,
unsigned char busno, unsigned char bus_max,
@@ -83,6 +84,11 @@ static inline int of_pci_get_host_bridge_resources(struct 
device_node *dev,
 {
return -EINVAL;
 }
+
+static inline int of_pci_get_dma_ranges(struct device_node *np, struct 
list_head *resources)
+{
+   return -EINVAL;
+}
 #endif
 
 #if defined(CONFIG_OF) && defined(CONFIG_PCI_MSI)
-- 
1.9.1

[RFC PATCH 1/2] of/pci: implement inbound dma-ranges for PCI

2017-03-30 Thread Oza Pawandeep

current device frmework and of framework integration assumes dma-ranges
in a way where memory-mapped devices define their dma-ranges.
dma-ranges: (child-bus-address, parent-bus-address, length).

but iproc based SOCs and other SOCs(rcar) have PCI world dma-ranges.
dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;

of_dma_configure is specifically witten to take care of memory mapped devices.
but no implementation exists for pci to take care of pcie based memory ranges.
in fact pci world doesnt seem to define standard dma-ranges

this exposes intrface not only to the pci host driver, but also
it aims to provide an interface to callers such as of_dma_configure.
so then the returned size get best possible (largest) dma_mask.
for e.g.
dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;
we should get dev->coherent_dma_mask=0x7f.

also this patch intends to handle multiple inbound windows and dma-ranges.
it is left to the caller, how it wants to use them.

Signed-off-by: Oza Pawandeep 

diff --git a/drivers/of/of_pci.c b/drivers/of/of_pci.c
index 0ee42c3..5299438 100644
--- a/drivers/of/of_pci.c
+++ b/drivers/of/of_pci.c
@@ -283,6 +283,80 @@ int of_pci_get_host_bridge_resources(struct device_node 
*dev,
return err;
 }
 EXPORT_SYMBOL_GPL(of_pci_get_host_bridge_resources);
+
+/**
+ * of_pci_get_dma_ranges - Parse PCI host bridge inbound resources from DT
+ * @np: device node of the host bridge having the dma-ranges property
+ * @resources: list where the range of resources will be added after DT parsing
+ *
+ * It is the caller's job to free the @resources list.
+ *
+ * This function will parse the "dma-ranges" property of a PCI host bridge 
device
+ * node and setup the resource mapping based on its content.
+ *
+ * It returns zero if the range parsing has been successful or a standard error
+ * value if it failed.
+ */
+
+int of_pci_get_dma_ranges(struct device_node *np, struct list_head *resources)
+{
+   struct device_node *node = of_node_get(np);
+   int rlen;
+   int ret = 0;
+   const int na = 3, ns = 2;
+   struct resource *res;
+   struct of_pci_range_parser parser;
+   struct of_pci_range range;
+
+   if (!node)
+   return -EINVAL;
+
+   parser.node = node;
+   parser.pna = of_n_addr_cells(node);
+   parser.np = parser.pna + na + ns;
+
+   parser.range = of_get_property(node, "dma-ranges", );
+
+   if (!parser.range) {
+   pr_debug("pcie device has no dma-ranges defined for 
node(%s)\n", np->full_name);
+   ret = -ENODEV;
+   goto out;
+   }
+
+   parser.end = parser.range + rlen / sizeof(__be32);
+
+   for_each_of_pci_range(, ) {
+   /*
+* If we failed translation or got a zero-sized region
+* then skip this range
+*/
+   if (range.cpu_addr == OF_BAD_ADDR || range.size == 0)
+   continue;
+
+   res = kzalloc(sizeof(struct resource), GFP_KERNEL);
+   if (!res) {
+   ret = -ENOMEM;
+   goto parse_failed;
+   }
+
+   ret = of_pci_range_to_resource(, np, res);
+   if (ret) {
+   kfree(res);
+   continue;
+   }
+
+   pci_add_resource_offset(resources, res, res->start - 
range.pci_addr);
+}
+
+   return ret;
+
+parse_failed:
+   pci_free_resource_list(resources);
+out:
+   of_node_put(node);
+   return ret;
+}
+EXPORT_SYMBOL_GPL(of_pci_get_dma_ranges);
 #endif /* CONFIG_OF_ADDRESS */
 
 #ifdef CONFIG_PCI_MSI
diff --git a/include/linux/of_pci.h b/include/linux/of_pci.h
index 0e0974e..8509e3d 100644
--- a/include/linux/of_pci.h
+++ b/include/linux/of_pci.h
@@ -76,6 +76,7 @@ static inline void of_pci_check_probe_only(void) { }
 int of_pci_get_host_bridge_resources(struct device_node *dev,
unsigned char busno, unsigned char bus_max,
struct list_head *resources, resource_size_t *io_base);
+int of_pci_get_dma_ranges(struct device_node *np, struct list_head *resources);
 #else
 static inline int of_pci_get_host_bridge_resources(struct device_node *dev,
unsigned char busno, unsigned char bus_max,
@@ -83,6 +84,11 @@ static inline int of_pci_get_host_bridge_resources(struct 
device_node *dev,
 {
return -EINVAL;
 }
+
+static inline int of_pci_get_dma_ranges(struct device_node *np, struct 
list_head *resources)
+{
+   return -EINVAL;
+}
 #endif
 
 #if defined(CONFIG_OF) && defined(CONFIG_PCI_MSI)
-- 
1.9.1

[RFC PATCH 2/3] iommu/dma: account pci host bridge dma_mask for IOVA allocation

2017-03-24 Thread Oza Pawandeep

it is possible that PCI device supports 64-bit DMA addressing,
and thus it's driver sets device's dma_mask to DMA_BIT_MASK(64),
however PCI host bridge may have limitations on the inbound
transaction addressing. As an example, consider NVME SSD device
connected to iproc-PCIe controller.

Currently, the IOMMU DMA ops only considers PCI device dma_mask
when allocating an IOVA. This is particularly problematic on
ARM/ARM64 SOCs where the IOMMU (i.e. SMMU) translates IOVA to
PA for in-bound transactions only after PCI Host has forwarded
these transactions on SOC IO bus. This means on such ARM/ARM64
SOCs the IOVA of in-bound transactions has to honor the addressing
restrictions of the PCI Host.

this patch is inspired by
http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1306545.html
http://www.spinics.net/lists/arm-kernel/msg566947.html

but above inspiraiton solves the half of the problem.
the rest of the problem is descrbied below, what we face on iproc based
SOCs.

current pcie frmework and of framework integration assumes dma-ranges
in a way where memory-mapped devices define their dma-ranges.
dma-ranges: (child-bus-address, parent-bus-address, length).

but iproc based SOCs and even Rcar based SOCs has PCI world dma-ranges.
dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;

of_dma_configure is specifically witten to take care of memory mapped devices.
but no implementation exists for pci to take care of pcie based memory ranges.
in fact pci world doesnt seem to define standard dma-ranges

this patch implements of_pci_get_dma_ranges to cater to pci world dma-ranges.
so then the returned size get best possible (largest) dma_mask.
for e.g.
dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;
we should get dev->coherent_dma_mask=0x7f.

Reviewed-by: Anup Patel <anup.pa...@broadcom.com>
Reviewed-by: Scott Branden <scott.bran...@broadcom.com>
Signed-off-by: Oza Pawandeep <oza@broadcom.com>

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 8c7c244..20cfff7 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -217,6 +217,9 @@ config NEED_DMA_MAP_STATE
 config NEED_SG_DMA_LENGTH
def_bool y
 
+config ARCH_HAS_DMA_SET_COHERENT_MASK
+   def_bool y
+
 config SMP
def_bool y
 
diff --git a/arch/arm64/include/asm/device.h b/arch/arm64/include/asm/device.h
index 73d5bab..64b4dc3 100644
--- a/arch/arm64/include/asm/device.h
+++ b/arch/arm64/include/asm/device.h
@@ -20,6 +20,7 @@ struct dev_archdata {
 #ifdef CONFIG_IOMMU_API
void *iommu;/* private IOMMU data */
 #endif
+   u64 parent_dma_mask;
bool dma_coherent;
 };
 
diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
index 81cdb2e..5845ecd 100644
--- a/arch/arm64/mm/dma-mapping.c
+++ b/arch/arm64/mm/dma-mapping.c
@@ -564,6 +564,7 @@ static void flush_page(struct device *dev, const void 
*virt, phys_addr_t phys)
__dma_flush_area(virt, PAGE_SIZE);
 }
 
+
 static void *__iommu_alloc_attrs(struct device *dev, size_t size,
 dma_addr_t *handle, gfp_t gfp,
 unsigned long attrs)
@@ -795,6 +796,20 @@ static void __iommu_unmap_sg_attrs(struct device *dev,
iommu_dma_unmap_sg(dev, sgl, nelems, dir, attrs);
 }
 
+static int __iommu_set_dma_mask(struct device *dev, u64 mask)
+{
+   /* device is not DMA capable */
+   if (!dev->dma_mask)
+   return -EIO;
+
+   if (mask > dev->archdata.parent_dma_mask)
+   mask = dev->archdata.parent_dma_mask;
+
+   *dev->dma_mask = mask;
+
+   return 0;
+}
+
 static const struct dma_map_ops iommu_dma_ops = {
.alloc = __iommu_alloc_attrs,
.free = __iommu_free_attrs,
@@ -811,8 +826,21 @@ static void __iommu_unmap_sg_attrs(struct device *dev,
.map_resource = iommu_dma_map_resource,
.unmap_resource = iommu_dma_unmap_resource,
.mapping_error = iommu_dma_mapping_error,
+   .set_dma_mask = __iommu_set_dma_mask,
 };
 
+int dma_set_coherent_mask(struct device *dev, u64 mask)
+{
+   if (get_dma_ops(dev) == _dma_ops &&
+   mask > dev->archdata.parent_dma_mask)
+   mask = dev->archdata.parent_dma_mask;
+
+   dev->coherent_dma_mask = mask;
+   return 0;
+}
+EXPORT_SYMBOL(dma_set_coherent_mask);
+
+
 /*
  * TODO: Right now __iommu_setup_dma_ops() gets called too early to do
  * everything it needs to - the device is only partially created and the
@@ -975,6 +1003,8 @@ void arch_setup_dma_ops(struct device *dev, u64 dma_base, 
u64 size,
if (!dev->dma_ops)
dev->dma_ops = _dma_ops;
 
+   dev->archdata.parent_dma_mask = size - 1;
+
dev->archdata.dma_coherent = coherent;
__iommu_setup_dma_ops(dev, dma_base, size, iommu);
 }
diff --git a/drivers/of/device.c b/drivers/of/device.c
index d362a98..471dcdf 100644
--- a/drivers/of/devi

[RFC PATCH 3/3] of: fix node traversing in of_dma_get_range

2017-03-24 Thread Oza Pawandeep

it jumps to the parent node without examining the child node.
also with that, it throws "no dma-ranges found for node"
for pci dma-ranges.

this patch fixes device node traversing for dma-ranges.

Reviewed-by: Anup Patel <anup.pa...@broadcom.com>
Signed-off-by: Oza Pawandeep <oza@broadcom.com>

diff --git a/drivers/of/address.c b/drivers/of/address.c
index 02b2903..3293d55 100644
--- a/drivers/of/address.c
+++ b/drivers/of/address.c
@@ -836,9 +836,6 @@ int of_dma_get_range(struct device_node *np, u64 *dma_addr, 
u64 *paddr, u64 *siz
while (1) {
naddr = of_n_addr_cells(node);
nsize = of_n_size_cells(node);
-   node = of_get_next_parent(node);
-   if (!node)
-   break;
 
ranges = of_get_property(node, "dma-ranges", );
 
@@ -852,6 +849,10 @@ int of_dma_get_range(struct device_node *np, u64 
*dma_addr, u64 *paddr, u64 *siz
 */
if (!ranges)
break;
+
+   node = of_get_next_parent(node);
+   if (!node)
+   break;
}
 
if (!ranges) {
-- 
1.9.1

[RFC PATCH 2/3] iommu/dma: account pci host bridge dma_mask for IOVA allocation

2017-03-24 Thread Oza Pawandeep

it is possible that PCI device supports 64-bit DMA addressing,
and thus it's driver sets device's dma_mask to DMA_BIT_MASK(64),
however PCI host bridge may have limitations on the inbound
transaction addressing. As an example, consider NVME SSD device
connected to iproc-PCIe controller.

Currently, the IOMMU DMA ops only considers PCI device dma_mask
when allocating an IOVA. This is particularly problematic on
ARM/ARM64 SOCs where the IOMMU (i.e. SMMU) translates IOVA to
PA for in-bound transactions only after PCI Host has forwarded
these transactions on SOC IO bus. This means on such ARM/ARM64
SOCs the IOVA of in-bound transactions has to honor the addressing
restrictions of the PCI Host.

this patch is inspired by
http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1306545.html
http://www.spinics.net/lists/arm-kernel/msg566947.html

but above inspiraiton solves the half of the problem.
the rest of the problem is descrbied below, what we face on iproc based
SOCs.

current pcie frmework and of framework integration assumes dma-ranges
in a way where memory-mapped devices define their dma-ranges.
dma-ranges: (child-bus-address, parent-bus-address, length).

but iproc based SOCs and even Rcar based SOCs has PCI world dma-ranges.
dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;

of_dma_configure is specifically witten to take care of memory mapped devices.
but no implementation exists for pci to take care of pcie based memory ranges.
in fact pci world doesnt seem to define standard dma-ranges

this patch implements of_pci_get_dma_ranges to cater to pci world dma-ranges.
so then the returned size get best possible (largest) dma_mask.
for e.g.
dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;
we should get dev->coherent_dma_mask=0x7f.

Reviewed-by: Anup Patel 
Reviewed-by: Scott Branden 
Signed-off-by: Oza Pawandeep 

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 8c7c244..20cfff7 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -217,6 +217,9 @@ config NEED_DMA_MAP_STATE
 config NEED_SG_DMA_LENGTH
def_bool y
 
+config ARCH_HAS_DMA_SET_COHERENT_MASK
+   def_bool y
+
 config SMP
def_bool y
 
diff --git a/arch/arm64/include/asm/device.h b/arch/arm64/include/asm/device.h
index 73d5bab..64b4dc3 100644
--- a/arch/arm64/include/asm/device.h
+++ b/arch/arm64/include/asm/device.h
@@ -20,6 +20,7 @@ struct dev_archdata {
 #ifdef CONFIG_IOMMU_API
void *iommu;/* private IOMMU data */
 #endif
+   u64 parent_dma_mask;
bool dma_coherent;
 };
 
diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
index 81cdb2e..5845ecd 100644
--- a/arch/arm64/mm/dma-mapping.c
+++ b/arch/arm64/mm/dma-mapping.c
@@ -564,6 +564,7 @@ static void flush_page(struct device *dev, const void 
*virt, phys_addr_t phys)
__dma_flush_area(virt, PAGE_SIZE);
 }
 
+
 static void *__iommu_alloc_attrs(struct device *dev, size_t size,
 dma_addr_t *handle, gfp_t gfp,
 unsigned long attrs)
@@ -795,6 +796,20 @@ static void __iommu_unmap_sg_attrs(struct device *dev,
iommu_dma_unmap_sg(dev, sgl, nelems, dir, attrs);
 }
 
+static int __iommu_set_dma_mask(struct device *dev, u64 mask)
+{
+   /* device is not DMA capable */
+   if (!dev->dma_mask)
+   return -EIO;
+
+   if (mask > dev->archdata.parent_dma_mask)
+   mask = dev->archdata.parent_dma_mask;
+
+   *dev->dma_mask = mask;
+
+   return 0;
+}
+
 static const struct dma_map_ops iommu_dma_ops = {
.alloc = __iommu_alloc_attrs,
.free = __iommu_free_attrs,
@@ -811,8 +826,21 @@ static void __iommu_unmap_sg_attrs(struct device *dev,
.map_resource = iommu_dma_map_resource,
.unmap_resource = iommu_dma_unmap_resource,
.mapping_error = iommu_dma_mapping_error,
+   .set_dma_mask = __iommu_set_dma_mask,
 };
 
+int dma_set_coherent_mask(struct device *dev, u64 mask)
+{
+   if (get_dma_ops(dev) == _dma_ops &&
+   mask > dev->archdata.parent_dma_mask)
+   mask = dev->archdata.parent_dma_mask;
+
+   dev->coherent_dma_mask = mask;
+   return 0;
+}
+EXPORT_SYMBOL(dma_set_coherent_mask);
+
+
 /*
  * TODO: Right now __iommu_setup_dma_ops() gets called too early to do
  * everything it needs to - the device is only partially created and the
@@ -975,6 +1003,8 @@ void arch_setup_dma_ops(struct device *dev, u64 dma_base, 
u64 size,
if (!dev->dma_ops)
dev->dma_ops = _dma_ops;
 
+   dev->archdata.parent_dma_mask = size - 1;
+
dev->archdata.dma_coherent = coherent;
__iommu_setup_dma_ops(dev, dma_base, size, iommu);
 }
diff --git a/drivers/of/device.c b/drivers/of/device.c
index d362a98..471dcdf 100644
--- a/drivers/of/device.c
+++ b/drivers/of/device.c
@@ -139,10 +139,8 @@ void of_dma_configure(struct device *d

[RFC PATCH 3/3] of: fix node traversing in of_dma_get_range

2017-03-24 Thread Oza Pawandeep

it jumps to the parent node without examining the child node.
also with that, it throws "no dma-ranges found for node"
for pci dma-ranges.

this patch fixes device node traversing for dma-ranges.

Reviewed-by: Anup Patel 
Signed-off-by: Oza Pawandeep 

diff --git a/drivers/of/address.c b/drivers/of/address.c
index 02b2903..3293d55 100644
--- a/drivers/of/address.c
+++ b/drivers/of/address.c
@@ -836,9 +836,6 @@ int of_dma_get_range(struct device_node *np, u64 *dma_addr, 
u64 *paddr, u64 *siz
while (1) {
naddr = of_n_addr_cells(node);
nsize = of_n_size_cells(node);
-   node = of_get_next_parent(node);
-   if (!node)
-   break;
 
ranges = of_get_property(node, "dma-ranges", );
 
@@ -852,6 +849,10 @@ int of_dma_get_range(struct device_node *np, u64 
*dma_addr, u64 *paddr, u64 *siz
 */
if (!ranges)
break;
+
+   node = of_get_next_parent(node);
+   if (!node)
+   break;
}
 
if (!ranges) {
-- 
1.9.1

[RFC PATCH 1/3] of/pci: dma-ranges to account highest possible host bridge dma_mask

2017-03-24 Thread Oza Pawandeep

it is possible that PCI device supports 64-bit DMA addressing,
and thus it's driver sets device's dma_mask to DMA_BIT_MASK(64),
however PCI host bridge may have limitations on the inbound
transaction addressing. As an example, consider NVME SSD device
connected to iproc-PCIe controller.

Currently, the IOMMU DMA ops only considers PCI device dma_mask
when allocating an IOVA. This is particularly problematic on
ARM/ARM64 SOCs where the IOMMU (i.e. SMMU) translates IOVA to
PA for in-bound transactions only after PCI Host has forwarded
these transactions on SOC IO bus. This means on such ARM/ARM64
SOCs the IOVA of in-bound transactions has to honor the addressing
restrictions of the PCI Host.

current pcie frmework and of framework integration assumes dma-ranges
in a way where memory-mapped devices define their dma-ranges.
dma-ranges: (child-bus-address, parent-bus-address, length).

but iproc based SOCs and even Rcar based SOCs has PCI world dma-ranges.
dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;

of_dma_configure is specifically witten to take care of memory mapped devices.
but no implementation exists for pci to take care of pcie based memory ranges.
in fact pci world doesnt seem to define standard dma-ranges

this patch implements of_pci_get_dma_ranges to cater to pci world dma-ranges.
so then the returned size get best possible (largest) dma_mask.
for e.g.
dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;
we should get dev->coherent_dma_mask=0x7f.

Reviewed-by: Anup Patel <anup.pa...@broadcom.com>
Reviewed-by: Scott Branden <scott.bran...@broadcom.com>
Signed-off-by: Oza Pawandeep <oza@broadcom.com>

Signed-off-by: Oza Pawandeep <oza@broadcom.com>

diff --git a/drivers/of/device.c b/drivers/of/device.c
index b1e6beb..d362a98 100644
--- a/drivers/of/device.c
+++ b/drivers/of/device.c
@@ -9,6 +9,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include "of_private.h"
@@ -104,7 +105,11 @@ void of_dma_configure(struct device *dev, struct 
device_node *np)
if (!dev->dma_mask)
dev->dma_mask = >coherent_dma_mask;
 
-   ret = of_dma_get_range(np, _addr, , );
+   if (dev_is_pci(dev))
+   ret = of_pci_get_dma_ranges(np, _addr, , );
+   else
+   ret = of_dma_get_range(np, _addr, , );
+
if (ret < 0) {
dma_addr = offset = 0;
size = dev->coherent_dma_mask + 1;
diff --git a/drivers/of/of_pci.c b/drivers/of/of_pci.c
index 0ee42c3..c7f8626 100644
--- a/drivers/of/of_pci.c
+++ b/drivers/of/of_pci.c
@@ -283,6 +283,52 @@ int of_pci_get_host_bridge_resources(struct device_node 
*dev,
return err;
 }
 EXPORT_SYMBOL_GPL(of_pci_get_host_bridge_resources);
+
+int of_pci_get_dma_ranges(struct device_node *np, u64 *dma_addr, u64 *paddr, 
u64 *size)
+{
+   struct device_node *node = of_node_get(np);
+   int rlen, ret = 0;
+   const int na = 3, ns = 2;
+   struct of_pci_range_parser parser;
+   struct of_pci_range range;
+
+   if (!node)
+   return -EINVAL;
+
+   parser.node = node;
+   parser.pna = of_n_addr_cells(node);
+   parser.np = parser.pna + na + ns;
+
+   parser.range = of_get_property(node, "dma-ranges", );
+
+   if (!parser.range) {
+   pr_debug("pcie device has no dma-ranges defined for 
node(%s)\n", np->full_name);
+   ret = -ENODEV;
+   goto out;
+   }
+
+   parser.end = parser.range + rlen / sizeof(__be32);
+   *size = 0;
+
+   for_each_of_pci_range(, ) {
+   if (*size < range.size) {
+   *dma_addr = range.pci_addr;
+   *size = range.size;
+   *paddr = range.cpu_addr;
+   }
+   }
+
+   pr_debug("dma_addr(%llx) cpu_addr(%llx) size(%llx)\n",
+*dma_addr, *paddr, *size);
+*dma_addr = range.pci_addr;
+*size = range.size;
+
+out:
+   of_node_put(node);
+   return ret;
+
+}
+EXPORT_SYMBOL_GPL(of_pci_get_dma_ranges);
 #endif /* CONFIG_OF_ADDRESS */
 
 #ifdef CONFIG_PCI_MSI
diff --git a/include/linux/of_pci.h b/include/linux/of_pci.h
index 0e0974e..907ace0 100644
--- a/include/linux/of_pci.h
+++ b/include/linux/of_pci.h
@@ -76,6 +76,7 @@ static inline void of_pci_check_probe_only(void) { }
 int of_pci_get_host_bridge_resources(struct device_node *dev,
unsigned char busno, unsigned char bus_max,
struct list_head *resources, resource_size_t *io_base);
+int of_pci_get_dma_ranges(struct device_node *np, u64 *dma_addr, u64 *paddr, 
u64 *size);
 #else
 static inline int of_pci_get_host_bridge_resources(struct device_node *dev,
unsigned char busno, unsigned char bus_max,
@@ -83,6 +84,11 @@ static inline int of_pci_get_host_bridge_resources(struct 
device_no

[RFC PATCH 1/3] of/pci: dma-ranges to account highest possible host bridge dma_mask

2017-03-24 Thread Oza Pawandeep

it is possible that PCI device supports 64-bit DMA addressing,
and thus it's driver sets device's dma_mask to DMA_BIT_MASK(64),
however PCI host bridge may have limitations on the inbound
transaction addressing. As an example, consider NVME SSD device
connected to iproc-PCIe controller.

Currently, the IOMMU DMA ops only considers PCI device dma_mask
when allocating an IOVA. This is particularly problematic on
ARM/ARM64 SOCs where the IOMMU (i.e. SMMU) translates IOVA to
PA for in-bound transactions only after PCI Host has forwarded
these transactions on SOC IO bus. This means on such ARM/ARM64
SOCs the IOVA of in-bound transactions has to honor the addressing
restrictions of the PCI Host.

current pcie frmework and of framework integration assumes dma-ranges
in a way where memory-mapped devices define their dma-ranges.
dma-ranges: (child-bus-address, parent-bus-address, length).

but iproc based SOCs and even Rcar based SOCs has PCI world dma-ranges.
dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;

of_dma_configure is specifically witten to take care of memory mapped devices.
but no implementation exists for pci to take care of pcie based memory ranges.
in fact pci world doesnt seem to define standard dma-ranges

this patch implements of_pci_get_dma_ranges to cater to pci world dma-ranges.
so then the returned size get best possible (largest) dma_mask.
for e.g.
dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;
we should get dev->coherent_dma_mask=0x7f.

Reviewed-by: Anup Patel 
Reviewed-by: Scott Branden 
Signed-off-by: Oza Pawandeep 

Signed-off-by: Oza Pawandeep 

diff --git a/drivers/of/device.c b/drivers/of/device.c
index b1e6beb..d362a98 100644
--- a/drivers/of/device.c
+++ b/drivers/of/device.c
@@ -9,6 +9,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include "of_private.h"
@@ -104,7 +105,11 @@ void of_dma_configure(struct device *dev, struct 
device_node *np)
if (!dev->dma_mask)
dev->dma_mask = >coherent_dma_mask;
 
-   ret = of_dma_get_range(np, _addr, , );
+   if (dev_is_pci(dev))
+   ret = of_pci_get_dma_ranges(np, _addr, , );
+   else
+   ret = of_dma_get_range(np, _addr, , );
+
if (ret < 0) {
dma_addr = offset = 0;
size = dev->coherent_dma_mask + 1;
diff --git a/drivers/of/of_pci.c b/drivers/of/of_pci.c
index 0ee42c3..c7f8626 100644
--- a/drivers/of/of_pci.c
+++ b/drivers/of/of_pci.c
@@ -283,6 +283,52 @@ int of_pci_get_host_bridge_resources(struct device_node 
*dev,
return err;
 }
 EXPORT_SYMBOL_GPL(of_pci_get_host_bridge_resources);
+
+int of_pci_get_dma_ranges(struct device_node *np, u64 *dma_addr, u64 *paddr, 
u64 *size)
+{
+   struct device_node *node = of_node_get(np);
+   int rlen, ret = 0;
+   const int na = 3, ns = 2;
+   struct of_pci_range_parser parser;
+   struct of_pci_range range;
+
+   if (!node)
+   return -EINVAL;
+
+   parser.node = node;
+   parser.pna = of_n_addr_cells(node);
+   parser.np = parser.pna + na + ns;
+
+   parser.range = of_get_property(node, "dma-ranges", );
+
+   if (!parser.range) {
+   pr_debug("pcie device has no dma-ranges defined for 
node(%s)\n", np->full_name);
+   ret = -ENODEV;
+   goto out;
+   }
+
+   parser.end = parser.range + rlen / sizeof(__be32);
+   *size = 0;
+
+   for_each_of_pci_range(, ) {
+   if (*size < range.size) {
+   *dma_addr = range.pci_addr;
+   *size = range.size;
+   *paddr = range.cpu_addr;
+   }
+   }
+
+   pr_debug("dma_addr(%llx) cpu_addr(%llx) size(%llx)\n",
+*dma_addr, *paddr, *size);
+*dma_addr = range.pci_addr;
+*size = range.size;
+
+out:
+   of_node_put(node);
+   return ret;
+
+}
+EXPORT_SYMBOL_GPL(of_pci_get_dma_ranges);
 #endif /* CONFIG_OF_ADDRESS */
 
 #ifdef CONFIG_PCI_MSI
diff --git a/include/linux/of_pci.h b/include/linux/of_pci.h
index 0e0974e..907ace0 100644
--- a/include/linux/of_pci.h
+++ b/include/linux/of_pci.h
@@ -76,6 +76,7 @@ static inline void of_pci_check_probe_only(void) { }
 int of_pci_get_host_bridge_resources(struct device_node *dev,
unsigned char busno, unsigned char bus_max,
struct list_head *resources, resource_size_t *io_base);
+int of_pci_get_dma_ranges(struct device_node *np, u64 *dma_addr, u64 *paddr, 
u64 *size);
 #else
 static inline int of_pci_get_host_bridge_resources(struct device_node *dev,
unsigned char busno, unsigned char bus_max,
@@ -83,6 +84,11 @@ static inline int of_pci_get_host_bridge_resources(struct 
device_node *dev,
 {
return -EINVAL;
 }
+
+static inline int of_pci_get_dma_ranges(struct device_node *np, u64 *dma_addr,

[RFC PATCH] iommu/dma: account pci host bridge dma_mask for IOVA allocation

2017-03-17 Thread Oza Pawandeep

It is possible that PCI device supports 64-bit DMA addressing,
and thus it's driver sets device's dma_mask to DMA_BIT_MASK(64),
however PCI host bridge may have limitations on the inbound
transaction addressing. As an example, consider NVME SSD device
connected to iproc-PCIe controller.

Currently, the IOMMU DMA ops only considers PCI device dma_mask
when allocating an IOVA. This is particularly problematic on
ARM/ARM64 SOCs where the IOMMU (i.e. SMMU) translates IOVA to
PA for in-bound transactions only after PCI Host has forwarded
these transactions on SOC IO bus. This means on such ARM/ARM64
SOCs the IOVA of in-bound transactions has to honor the addressing
restrictions of the PCI Host.

this patch is inspired by
http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1306545.html
http://www.spinics.net/lists/arm-kernel/msg566947.html

but above inspiraiton solves the half of the problem.
the rest of the problem is descrbied below, what we face on iproc based
SOCs.

current pcie frmework and of framework integration assumes dma-ranges
in a way where memory-mapped devices define their dma-ranges.
dma-ranges: (child-bus-address, parent-bus-address, length).

but iproc based SOCs and even Rcar based SOCs has PCI world dma-ranges.
dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;

of_dma_configure is specifically witten to take care of memory mapped devices.
but no implementation exists for pci to take care of pcie based memory ranges.
in fact pci world doesnt seem to define standard dma-ranges
since there is an absense of the same, the dma_mask used to remain 32bit 
because of
0 size return (parsed by of_dma_configure())

this patch also implements of_pci_get_dma_ranges to cater to pci world 
dma-ranges.
so then the returned size get best possible (largest) dma_mask.
for e.g.
dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;
we should get dev->coherent_dma_mask=0x7f.

conclusion: there are following problems
1) linux pci and iommu framework integration has glitches with respect to 
dma-ranges
2) pci linux framework look very uncertain about dma-ranges, rather binding is 
not defined
   the way it is defined for memory mapped devices.
   rcar and iproc based SOCs use their custom one dma-ranges
   (rather can be standard)
3) even if in case of default parser of_dma_get_ranges,:
   it throws and erro"
   "no dma-ranges found for node"
   because of the bug which exists.
   following lines should be moved to the end of while(1)
839 node = of_get_next_parent(node);
840 if (!node)
841 break;

Reviewed-by: Anup Patel <anup.pa...@broadcom.com>
Reviewed-by: Scott Branden <scott.bran...@broadcom.com>
Signed-off-by: Oza Pawandeep <oza@broadcom.com>

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 8c7c244..20cfff7 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -217,6 +217,9 @@ config NEED_DMA_MAP_STATE
 config NEED_SG_DMA_LENGTH
def_bool y
 
+config ARCH_HAS_DMA_SET_COHERENT_MASK
+   def_bool y
+
 config SMP
def_bool y
 
diff --git a/arch/arm64/include/asm/device.h b/arch/arm64/include/asm/device.h
index 73d5bab..64b4dc3 100644
--- a/arch/arm64/include/asm/device.h
+++ b/arch/arm64/include/asm/device.h
@@ -20,6 +20,7 @@ struct dev_archdata {
 #ifdef CONFIG_IOMMU_API
void *iommu;/* private IOMMU data */
 #endif
+   u64 parent_dma_mask;
bool dma_coherent;
 };
 
diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
index 81cdb2e..5845ecd 100644
--- a/arch/arm64/mm/dma-mapping.c
+++ b/arch/arm64/mm/dma-mapping.c
@@ -564,6 +564,7 @@ static void flush_page(struct device *dev, const void 
*virt, phys_addr_t phys)
__dma_flush_area(virt, PAGE_SIZE);
 }
 
+
 static void *__iommu_alloc_attrs(struct device *dev, size_t size,
 dma_addr_t *handle, gfp_t gfp,
 unsigned long attrs)
@@ -795,6 +796,20 @@ static void __iommu_unmap_sg_attrs(struct device *dev,
iommu_dma_unmap_sg(dev, sgl, nelems, dir, attrs);
 }
 
+static int __iommu_set_dma_mask(struct device *dev, u64 mask)
+{
+   /* device is not DMA capable */
+   if (!dev->dma_mask)
+   return -EIO;
+
+   if (mask > dev->archdata.parent_dma_mask)
+   mask = dev->archdata.parent_dma_mask;
+
+   *dev->dma_mask = mask;
+
+   return 0;
+}
+
 static const struct dma_map_ops iommu_dma_ops = {
.alloc = __iommu_alloc_attrs,
.free = __iommu_free_attrs,
@@ -811,8 +826,21 @@ static void __iommu_unmap_sg_attrs(struct device *dev,
.map_resource = iommu_dma_map_resource,
.unmap_resource = iommu_dma_unmap_resource,
.mapping_error = iommu_dma_mapping_error,
+   .set_dma_mask = __iommu_set_dma_mask,
 };
 
+int dma_set_coherent_mask(struct device *dev, u64 mask)

[RFC PATCH] iommu/dma: account pci host bridge dma_mask for IOVA allocation

2017-03-17 Thread Oza Pawandeep

It is possible that PCI device supports 64-bit DMA addressing,
and thus it's driver sets device's dma_mask to DMA_BIT_MASK(64),
however PCI host bridge may have limitations on the inbound
transaction addressing. As an example, consider NVME SSD device
connected to iproc-PCIe controller.

Currently, the IOMMU DMA ops only considers PCI device dma_mask
when allocating an IOVA. This is particularly problematic on
ARM/ARM64 SOCs where the IOMMU (i.e. SMMU) translates IOVA to
PA for in-bound transactions only after PCI Host has forwarded
these transactions on SOC IO bus. This means on such ARM/ARM64
SOCs the IOVA of in-bound transactions has to honor the addressing
restrictions of the PCI Host.

this patch is inspired by
http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1306545.html
http://www.spinics.net/lists/arm-kernel/msg566947.html

but above inspiraiton solves the half of the problem.
the rest of the problem is descrbied below, what we face on iproc based
SOCs.

current pcie frmework and of framework integration assumes dma-ranges
in a way where memory-mapped devices define their dma-ranges.
dma-ranges: (child-bus-address, parent-bus-address, length).

but iproc based SOCs and even Rcar based SOCs has PCI world dma-ranges.
dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;

of_dma_configure is specifically witten to take care of memory mapped devices.
but no implementation exists for pci to take care of pcie based memory ranges.
in fact pci world doesnt seem to define standard dma-ranges
since there is an absense of the same, the dma_mask used to remain 32bit 
because of
0 size return (parsed by of_dma_configure())

this patch also implements of_pci_get_dma_ranges to cater to pci world 
dma-ranges.
so then the returned size get best possible (largest) dma_mask.
for e.g.
dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;
we should get dev->coherent_dma_mask=0x7f.

conclusion: there are following problems
1) linux pci and iommu framework integration has glitches with respect to 
dma-ranges
2) pci linux framework look very uncertain about dma-ranges, rather binding is 
not defined
   the way it is defined for memory mapped devices.
   rcar and iproc based SOCs use their custom one dma-ranges
   (rather can be standard)
3) even if in case of default parser of_dma_get_ranges,:
   it throws and erro"
   "no dma-ranges found for node"
   because of the bug which exists.
   following lines should be moved to the end of while(1)
839 node = of_get_next_parent(node);
840 if (!node)
841 break;

Reviewed-by: Anup Patel 
Reviewed-by: Scott Branden 
Signed-off-by: Oza Pawandeep 

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 8c7c244..20cfff7 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -217,6 +217,9 @@ config NEED_DMA_MAP_STATE
 config NEED_SG_DMA_LENGTH
def_bool y
 
+config ARCH_HAS_DMA_SET_COHERENT_MASK
+   def_bool y
+
 config SMP
def_bool y
 
diff --git a/arch/arm64/include/asm/device.h b/arch/arm64/include/asm/device.h
index 73d5bab..64b4dc3 100644
--- a/arch/arm64/include/asm/device.h
+++ b/arch/arm64/include/asm/device.h
@@ -20,6 +20,7 @@ struct dev_archdata {
 #ifdef CONFIG_IOMMU_API
void *iommu;/* private IOMMU data */
 #endif
+   u64 parent_dma_mask;
bool dma_coherent;
 };
 
diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
index 81cdb2e..5845ecd 100644
--- a/arch/arm64/mm/dma-mapping.c
+++ b/arch/arm64/mm/dma-mapping.c
@@ -564,6 +564,7 @@ static void flush_page(struct device *dev, const void 
*virt, phys_addr_t phys)
__dma_flush_area(virt, PAGE_SIZE);
 }
 
+
 static void *__iommu_alloc_attrs(struct device *dev, size_t size,
 dma_addr_t *handle, gfp_t gfp,
 unsigned long attrs)
@@ -795,6 +796,20 @@ static void __iommu_unmap_sg_attrs(struct device *dev,
iommu_dma_unmap_sg(dev, sgl, nelems, dir, attrs);
 }
 
+static int __iommu_set_dma_mask(struct device *dev, u64 mask)
+{
+   /* device is not DMA capable */
+   if (!dev->dma_mask)
+   return -EIO;
+
+   if (mask > dev->archdata.parent_dma_mask)
+   mask = dev->archdata.parent_dma_mask;
+
+   *dev->dma_mask = mask;
+
+   return 0;
+}
+
 static const struct dma_map_ops iommu_dma_ops = {
.alloc = __iommu_alloc_attrs,
.free = __iommu_free_attrs,
@@ -811,8 +826,21 @@ static void __iommu_unmap_sg_attrs(struct device *dev,
.map_resource = iommu_dma_map_resource,
.unmap_resource = iommu_dma_unmap_resource,
.mapping_error = iommu_dma_mapping_error,
+   .set_dma_mask = __iommu_set_dma_mask,
 };
 
+int dma_set_coherent_mask(struct device *dev, u64 mask)
+{
+   if (get_dma_ops(dev) == _dma_ops &&
+   mask > d

[RFC PATCH] iommu/dma: check pci host bridge dma_mask for IOVA allocation

2017-03-14 Thread Oza Pawandeep

It is possible that PCI device supports 64-bit DMA addressing,
and thus it's driver sets device's dma_mask to DMA_BIT_MASK(64),
however PCI host bridge may have limitations on the inbound
transaction addressing. As an example, consider NVME SSD device
connected to iproc-PCIe controller.

Currently, the IOMMU DMA ops only considers PCI device dma_mask
when allocating an IOVA. This is particularly problematic on
ARM/ARM64 SOCs where the IOMMU (i.e. SMMU) translates IOVA to
PA for in-bound transactions only after PCI Host has forwarded
these transactions on SOC IO bus. This means on such ARM/ARM64
SOCs the IOVA of in-bound transactions has to honor the addressing
restrictions of the PCI Host.

This patch tries to solve above described IOVA allocation
problem by:
1. Adding iommu_get_dma_mask() to get dma_mask of any device
2. For PCI device, iommu_get_dma_mask() compare dma_mask of
PCI device and corresponding PCI Host dma_mask (if set).
3. Use iommu_get_dma_mask() in IOMMU DMA ops implementation
instead of dma_get_mask()

Signed-off-by: Oza Pawandeep <oza@broadcom.com>
Reviewed-by: Anup Patel <anup.pa...@broadcom.com>
Reviewed-by: Scott Branden <scott.bran...@broadcom.com>
---
 drivers/iommu/dma-iommu.c | 44 
 1 file changed, 40 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 48d36ce..e93e536 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -108,6 +108,42 @@ int iommu_get_dma_cookie(struct iommu_domain *domain)
 }
 EXPORT_SYMBOL(iommu_get_dma_cookie);
 
+static u64 __iommu_dma_mask(struct device *dev, bool is_coherent)
+{
+#ifdef CONFIG_PCI
+   u64 pci_hb_dma_mask;
+
+   if (dev_is_pci(dev)) {
+   struct pci_dev *pdev = to_pci_dev(dev);
+   struct pci_host_bridge *br = pci_find_host_bridge(pdev->bus);
+
+   if ((!is_coherent) && !(br->dev.dma_mask))
+   goto default_dev_dma_mask;
+
+   /* pci host bridge dma-mask. */
+   pci_hb_dma_mask = (!is_coherent) ? *br->dev.dma_mask :
+  br->dev.coherent_dma_mask;
+
+   if (pci_hb_dma_mask && ((pci_hb_dma_mask) < (*dev->dma_mask)))
+   return pci_hb_dma_mask;
+   }
+default_dev_dma_mask:
+#endif
+   return (!is_coherent) ? dma_get_mask(dev) :
+   dev->coherent_dma_mask;
+}
+
+static u64 __iommu_dma_get_coherent_mask(struct device *dev)
+{
+   return __iommu_dma_mask(dev, true);
+}
+
+static u64 __iommu_dma_get_mask(struct device *dev)
+{
+   return __iommu_dma_mask(dev, false);
+}
+
+
 /**
  * iommu_get_msi_cookie - Acquire just MSI remapping resources
  * @domain: IOMMU domain to prepare
@@ -461,7 +497,7 @@ struct page **iommu_dma_alloc(struct device *dev, size_t 
size, gfp_t gfp,
if (!pages)
return NULL;
 
-   iova = __alloc_iova(domain, size, dev->coherent_dma_mask, dev);
+   iova = __alloc_iova(domain, size, __iommu_dma_get_coherent_mask(dev), 
dev);
if (!iova)
goto out_free_pages;
 
@@ -532,7 +568,7 @@ static dma_addr_t __iommu_dma_map(struct device *dev, 
phys_addr_t phys,
struct iova_domain *iovad = cookie_iovad(domain);
size_t iova_off = iova_offset(iovad, phys);
size_t len = iova_align(iovad, size + iova_off);
-   struct iova *iova = __alloc_iova(domain, len, dma_get_mask(dev), dev);
+   struct iova *iova = __alloc_iova(domain, len, 
__iommu_dma_get_mask(dev), dev);
 
if (!iova)
return DMA_ERROR_CODE;
@@ -690,7 +726,7 @@ int iommu_dma_map_sg(struct device *dev, struct scatterlist 
*sg,
prev = s;
}
 
-   iova = __alloc_iova(domain, iova_len, dma_get_mask(dev), dev);
+   iova = __alloc_iova(domain, iova_len, __iommu_dma_get_mask(dev), dev);
if (!iova)
goto out_restore_sg;
 
@@ -760,7 +796,7 @@ static struct iommu_dma_msi_page 
*iommu_dma_get_msi_page(struct device *dev,
 
msi_page->phys = msi_addr;
if (iovad) {
-   iova = __alloc_iova(domain, size, dma_get_mask(dev), dev);
+   iova = __alloc_iova(domain, size, __iommu_dma_get_mask(dev), 
dev);
if (!iova)
goto out_free_page;
msi_page->iova = iova_dma_addr(iovad, iova);
-- 
1.9.1

[RFC PATCH] iommu/dma: check pci host bridge dma_mask for IOVA allocation

2017-03-14 Thread Oza Pawandeep

It is possible that PCI device supports 64-bit DMA addressing,
and thus it's driver sets device's dma_mask to DMA_BIT_MASK(64),
however PCI host bridge may have limitations on the inbound
transaction addressing. As an example, consider NVME SSD device
connected to iproc-PCIe controller.

Currently, the IOMMU DMA ops only considers PCI device dma_mask
when allocating an IOVA. This is particularly problematic on
ARM/ARM64 SOCs where the IOMMU (i.e. SMMU) translates IOVA to
PA for in-bound transactions only after PCI Host has forwarded
these transactions on SOC IO bus. This means on such ARM/ARM64
SOCs the IOVA of in-bound transactions has to honor the addressing
restrictions of the PCI Host.

This patch tries to solve above described IOVA allocation
problem by:
1. Adding iommu_get_dma_mask() to get dma_mask of any device
2. For PCI device, iommu_get_dma_mask() compare dma_mask of
PCI device and corresponding PCI Host dma_mask (if set).
3. Use iommu_get_dma_mask() in IOMMU DMA ops implementation
instead of dma_get_mask()

Signed-off-by: Oza Pawandeep 
Reviewed-by: Anup Patel 
Reviewed-by: Scott Branden 
---
 drivers/iommu/dma-iommu.c | 44 
 1 file changed, 40 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 48d36ce..e93e536 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -108,6 +108,42 @@ int iommu_get_dma_cookie(struct iommu_domain *domain)
 }
 EXPORT_SYMBOL(iommu_get_dma_cookie);
 
+static u64 __iommu_dma_mask(struct device *dev, bool is_coherent)
+{
+#ifdef CONFIG_PCI
+   u64 pci_hb_dma_mask;
+
+   if (dev_is_pci(dev)) {
+   struct pci_dev *pdev = to_pci_dev(dev);
+   struct pci_host_bridge *br = pci_find_host_bridge(pdev->bus);
+
+   if ((!is_coherent) && !(br->dev.dma_mask))
+   goto default_dev_dma_mask;
+
+   /* pci host bridge dma-mask. */
+   pci_hb_dma_mask = (!is_coherent) ? *br->dev.dma_mask :
+  br->dev.coherent_dma_mask;
+
+   if (pci_hb_dma_mask && ((pci_hb_dma_mask) < (*dev->dma_mask)))
+   return pci_hb_dma_mask;
+   }
+default_dev_dma_mask:
+#endif
+   return (!is_coherent) ? dma_get_mask(dev) :
+   dev->coherent_dma_mask;
+}
+
+static u64 __iommu_dma_get_coherent_mask(struct device *dev)
+{
+   return __iommu_dma_mask(dev, true);
+}
+
+static u64 __iommu_dma_get_mask(struct device *dev)
+{
+   return __iommu_dma_mask(dev, false);
+}
+
+
 /**
  * iommu_get_msi_cookie - Acquire just MSI remapping resources
  * @domain: IOMMU domain to prepare
@@ -461,7 +497,7 @@ struct page **iommu_dma_alloc(struct device *dev, size_t 
size, gfp_t gfp,
if (!pages)
return NULL;
 
-   iova = __alloc_iova(domain, size, dev->coherent_dma_mask, dev);
+   iova = __alloc_iova(domain, size, __iommu_dma_get_coherent_mask(dev), 
dev);
if (!iova)
goto out_free_pages;
 
@@ -532,7 +568,7 @@ static dma_addr_t __iommu_dma_map(struct device *dev, 
phys_addr_t phys,
struct iova_domain *iovad = cookie_iovad(domain);
size_t iova_off = iova_offset(iovad, phys);
size_t len = iova_align(iovad, size + iova_off);
-   struct iova *iova = __alloc_iova(domain, len, dma_get_mask(dev), dev);
+   struct iova *iova = __alloc_iova(domain, len, 
__iommu_dma_get_mask(dev), dev);
 
if (!iova)
return DMA_ERROR_CODE;
@@ -690,7 +726,7 @@ int iommu_dma_map_sg(struct device *dev, struct scatterlist 
*sg,
prev = s;
}
 
-   iova = __alloc_iova(domain, iova_len, dma_get_mask(dev), dev);
+   iova = __alloc_iova(domain, iova_len, __iommu_dma_get_mask(dev), dev);
if (!iova)
goto out_restore_sg;
 
@@ -760,7 +796,7 @@ static struct iommu_dma_msi_page 
*iommu_dma_get_msi_page(struct device *dev,
 
msi_page->phys = msi_addr;
if (iovad) {
-   iova = __alloc_iova(domain, size, dma_get_mask(dev), dev);
+   iova = __alloc_iova(domain, size, __iommu_dma_get_mask(dev), 
dev);
if (!iova)
goto out_free_page;
msi_page->iova = iova_dma_addr(iovad, iova);
-- 
1.9.1

RE: [KERNEL BUG] do_timer/tick_handover_do_timer 3.10.17

2015-05-07 Thread Oza (Pawandeep) Oza

It seems odd to me to use BUG() for what you appear to be using it for..
not that I know exactly what that it mind you, but when you said when
some other gizmo in your box has a problem you crash the kernel, my head
tilted to the side - surely there's a more controlled response possible
than poking the big red self destruct button ;-)

Oza: 
We have to place red button as our last resort, if we don’t press we pass the 
time or miss the point where we can go back and debug.
So that is something by design.

Regards,
-Oza

-Original Message-
From: Mike Galbraith [mailto:umgwanakikb...@gmail.com] 
Sent: Friday, May 08, 2015 10:42 AM
To: Oza (Pawandeep) Oza
Cc: pawandeep oza; linux-kernel@vger.kernel.org; malayasen rout
Subject: Re: [KERNEL BUG] do_timer/tick_handover_do_timer 3.10.17

On Fri, 2015-05-08 at 04:16 +, Oza (Pawandeep) Oza wrote:
> So Mike, is this reason strong enough for you ?

Nope.  I think you did the right thing in removing your dependency on
jiffies reliability in a dying box.  You don't have to convince me of
anything though, CC timer subsystem maintainer, see what he says.

> I understand your point: solve the BUG, and I do tend to agree with you.
> 
> But by design and implementation, the BUG() is just a beginning of the end 
> for dying kernel.
> And what happens in between this 'the beginning' and 'the end' is not less 
> important. 
> (because say,  on our platform we want to get clean RAMDUMP to analyze what 
> happened, and for that we want to get clean reboot)

I don't see anybody else having any trouble getting crash dumps.  I
spent yet another long day just yesterday, rummaging through one.

> Also,
> If somebody's design is to legally Crash the kernel (e.g. where kernel is 
> actually not faulty).
> Then, I do expect that tick/timekeeping framework do its job as long as it 
> can do, and it should do, because kernel is not faulty.
> But in this case it doesn’t handover jiffies incrementing job sanely.

It seems odd to me to use BUG() for what you appear to be using it for..
not that I know exactly what that it mind you, but when you said when
some other gizmo in your box has a problem you crash the kernel, my head
tilted to the side - surely there's a more controlled response possible
than poking the big red self destruct button ;-)

-Mike

RE: [KERNEL BUG] do_timer/tick_handover_do_timer 3.10.17

2015-05-07 Thread Oza (Pawandeep) Oza

So Mike, is this reason strong enough for you ?

I understand your point: solve the BUG, and I do tend to agree with you.

But by design and implementation, the BUG() is just a beginning of the end for 
dying kernel.
And what happens in between this 'the beginning' and 'the end' is not less 
important. 
(because say,  on our platform we want to get clean RAMDUMP to analyze what 
happened, and for that we want to get clean reboot)

Also,
If somebody's design is to legally Crash the kernel (e.g. where kernel is 
actually not faulty).
Then, I do expect that tick/timekeeping framework do its job as long as it can 
do, and it should do, because kernel is not faulty.
But in this case it doesn’t handover jiffies incrementing job sanely.

In other words, 
"no one can relies on jiffies, or rather the code which is based on jiffies 
will never forward progress in this path"

Regards,
-Oza

-Original Message-----
From: Oza (Pawandeep) Oza 
Sent: Thursday, May 07, 2015 2:17 PM
To: 'Mike Galbraith'
Cc: pawandeep oza; linux-kernel@vger.kernel.org; malayasen rout
Subject: RE: [KERNEL BUG] do_timer/tick_handover_do_timer 3.10.17

Oh ok.
So the reason why I cared was:

There is a code in our base which relies on jiffies, but since jiffies are not 
incrementing, the code waits there and loops forever.
And forward progress is on halt. (on cpu0, since that is the only cpu, which is 
alive)

We have changed the code to use mdelay and things move on.

But that means that in the patch which I mentioned, 
any code which relies on jiffies will stuck forever and will not allow rest of 
the code to get executed and hence no forward progress.
specially if that code is running with preempt_disable();

Regards,
-Oza

-Original Message-
From: Mike Galbraith [mailto:umgwanakikb...@gmail.com] 
Sent: Thursday, May 07, 2015 2:00 PM
To: Oza (Pawandeep) Oza
Cc: pawandeep oza; linux-kernel@vger.kernel.org; malayasen rout
Subject: Re: [KERNEL BUG] do_timer/tick_handover_do_timer 3.10.17

On Thu, 2015-05-07 at 07:05 +0000, Oza (Pawandeep) Oza wrote:
> : )
> 
> Well, I am not sure, if problem was communicated clearly from my side.

I understood.  I just don't understand why you'd care deeply whether
CPU0 halts or eternally waits.  Both render it harmless and useless.

-Mike

N�r��yb�X��ǧv�^�)޺{.n�+{zX����ܨ}���Ơz�:+v���zZ+��+zf���h���~i���z��w���?�&�)ߢf��^jǫy�m��@A�a���
0��h���i

RE: [KERNEL BUG] do_timer/tick_handover_do_timer 3.10.17

2015-05-07 Thread Oza (Pawandeep) Oza

Oh ok.
So the reason why I cared was:

There is a code in our base which relies on jiffies, but since jiffies are not 
incrementing, the code waits there and loops forever.
And forward progress is on halt. (on cpu0, since that is the only cpu, which is 
alive)

We have changed the code to use mdelay and things move on.

But that means that in the patch which I mentioned, 
any code which relies on jiffies will stuck forever and will not allow rest of 
the code to get executed and hence no forward progress.
specially if that code is running with preempt_disable();

Regards,
-Oza

-Original Message-
From: Mike Galbraith [mailto:umgwanakikb...@gmail.com] 
Sent: Thursday, May 07, 2015 2:00 PM
To: Oza (Pawandeep) Oza
Cc: pawandeep oza; linux-kernel@vger.kernel.org; malayasen rout
Subject: Re: [KERNEL BUG] do_timer/tick_handover_do_timer 3.10.17

On Thu, 2015-05-07 at 07:05 +, Oza (Pawandeep) Oza wrote:
> : )
> 
> Well, I am not sure, if problem was communicated clearly from my side.

I understood.  I just don't understand why you'd care deeply whether
CPU0 halts or eternally waits.  Both render it harmless and useless.

-Mike

RE: [KERNEL BUG] do_timer/tick_handover_do_timer 3.10.17

2015-05-07 Thread Oza (Pawandeep) Oza

Mike,

Here is the code which will explain you what I meant to address.
The is just a WARN_ON in case if "any other cpu, other than this cpu, are all 
offline, and at the same time tick_do_timer_cpu is not set correctly)

Note: this patch is just to put forward the problem. (not an actual patch)

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 9142591..3aa4c8c 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -112,6 +112,7 @@ static ktime_t tick_init_jiffy_update(void)
 static void tick_sched_do_timer(ktime_t now)
 {
int cpu = smp_processor_id();
+   int other_cpu, is_cpu_online = 0;

 #ifdef CONFIG_NO_HZ_COMMON
/*
@@ -125,6 +126,11 @@ static void tick_sched_do_timer(ktime_t now)
&& !tick_nohz_full_cpu(cpu))
tick_do_timer_cpu = cpu;
 #endif
+   for (other_cpu = 0; other_cpu < nr_cpu_ids = 0; other_cpu++) {
+   if (other_cpu != cpu)
+   is_cpu_online += cpu_online(other_cpu);
+   }
+   WARN_ON((tick_do_timer_cpu != cpu) && !is_cpu_online)

/* Check, if the jiffies need an update */
if (tick_do_timer_cpu == cpu)

Regards,
-Oza

-Original Message-----
From: Oza (Pawandeep) Oza 
Sent: Thursday, May 07, 2015 12:36 PM
To: 'Mike Galbraith'
Cc: pawandeep oza; linux-kernel@vger.kernel.org; malayasen rout
Subject: RE: [KERNEL BUG] do_timer/tick_handover_do_timer 3.10.17

: )

Well, I am not sure, if problem was communicated clearly from my side.
Let me attempt it again.

If  variable tick_do_timer_cpu = 0. Things are fine.
If it is some other value say for e.g. 1, 2 or 3 then core0 does not increment 
jiffies.  (but say if it is set to tick_do_timer_cpu=1, then core1 will 
increment jiffies)

If cpu1 ,2 and 3 are sent smp_send_stop and as a result of that cpu1, 2 and 3 
will be stopped.

Now only cpu0 is alive, cpu0 should increment jiffies upon each time tick.
For that tick_do_timer_cpu should be set to 0.

Which is not happening.

Regards,
-Oza

-Original Message-
From: Mike Galbraith [mailto:umgwanakikb...@gmail.com] 
Sent: Thursday, May 07, 2015 12:25 PM
To: Oza (Pawandeep) Oza
Cc: pawandeep oza; linux-kernel@vger.kernel.org; malayasen rout
Subject: Re: [KERNEL BUG] do_timer/tick_handover_do_timer 3.10.17

On Thu, 2015-05-07 at 05:58 +, Oza (Pawandeep) Oza wrote:
> Yes.
> But dying kernel doesn’t mean it CAN NOT INCREMENT jiffies.
> do_timer should do the job until kernel takes its last breathe and more 
> precisely CPU0 take its last breathe by halting itself as its last 
> instruction.

Feel free to add a redundant timer subsystem lest we BUG() in there, and
whatever else you need to guarantee a perfect orderly death for your
box.  I prefer live boxen, would make that BUG() go away.

-Mike

RE: [KERNEL BUG] do_timer/tick_handover_do_timer 3.10.17

2015-05-07 Thread Oza (Pawandeep) Oza

: )

Well, I am not sure, if problem was communicated clearly from my side.
Let me attempt it again.

If  variable tick_do_timer_cpu = 0. Things are fine.
If it is some other value say for e.g. 1, 2 or 3 then core0 does not increment 
jiffies.  (but say if it is set to tick_do_timer_cpu=1, then core1 will 
increment jiffies)

If cpu1 ,2 and 3 are sent smp_send_stop and as a result of that cpu1, 2 and 3 
will be stopped.

Now only cpu0 is alive, cpu0 should increment jiffies upon each time tick.
For that tick_do_timer_cpu should be set to 0.

Which is not happening.

Regards,
-Oza

-Original Message-
From: Mike Galbraith [mailto:umgwanakikb...@gmail.com] 
Sent: Thursday, May 07, 2015 12:25 PM
To: Oza (Pawandeep) Oza
Cc: pawandeep oza; linux-kernel@vger.kernel.org; malayasen rout
Subject: Re: [KERNEL BUG] do_timer/tick_handover_do_timer 3.10.17

On Thu, 2015-05-07 at 05:58 +, Oza (Pawandeep) Oza wrote:
> Yes.
> But dying kernel doesn’t mean it CAN NOT INCREMENT jiffies.
> do_timer should do the job until kernel takes its last breathe and more 
> precisely CPU0 take its last breathe by halting itself as its last 
> instruction.

Feel free to add a redundant timer subsystem lest we BUG() in there, and
whatever else you need to guarantee a perfect orderly death for your
box.  I prefer live boxen, would make that BUG() go away.

-Mike

RE: [KERNEL BUG] do_timer/tick_handover_do_timer 3.10.17

2015-05-07 Thread Oza (Pawandeep) Oza

: )

Well, I am not sure, if problem was communicated clearly from my side.
Let me attempt it again.

If  variable tick_do_timer_cpu = 0. Things are fine.
If it is some other value say for e.g. 1, 2 or 3 then core0 does not increment 
jiffies.  (but say if it is set to tick_do_timer_cpu=1, then core1 will 
increment jiffies)

If cpu1 ,2 and 3 are sent smp_send_stop and as a result of that cpu1, 2 and 3 
will be stopped.

Now only cpu0 is alive, cpu0 should increment jiffies upon each time tick.
For that tick_do_timer_cpu should be set to 0.

Which is not happening.

Regards,
-Oza


-Original Message-
From: Mike Galbraith [mailto:umgwanakikb...@gmail.com] 
Sent: Thursday, May 07, 2015 12:25 PM
To: Oza (Pawandeep) Oza
Cc: pawandeep oza; linux-kernel@vger.kernel.org; malayasen rout
Subject: Re: [KERNEL BUG] do_timer/tick_handover_do_timer 3.10.17

On Thu, 2015-05-07 at 05:58 +, Oza (Pawandeep) Oza wrote:
 Yes.
 But dying kernel doesn’t mean it CAN NOT INCREMENT jiffies.
 do_timer should do the job until kernel takes its last breathe and more 
 precisely CPU0 take its last breathe by halting itself as its last 
 instruction.

Feel free to add a redundant timer subsystem lest we BUG() in there, and
whatever else you need to guarantee a perfect orderly death for your
box.  I prefer live boxen, would make that BUG() go away.

-Mike

RE: [KERNEL BUG] do_timer/tick_handover_do_timer 3.10.17

2015-05-07 Thread Oza (Pawandeep) Oza

Mike,

Here is the code which will explain you what I meant to address.
The is just a WARN_ON in case if any other cpu, other than this cpu, are all 
offline, and at the same time tick_do_timer_cpu is not set correctly)

Note: this patch is just to put forward the problem. (not an actual patch)

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 9142591..3aa4c8c 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -112,6 +112,7 @@ static ktime_t tick_init_jiffy_update(void)
 static void tick_sched_do_timer(ktime_t now)
 {
int cpu = smp_processor_id();
+   int other_cpu, is_cpu_online = 0;

 #ifdef CONFIG_NO_HZ_COMMON
/*
@@ -125,6 +126,11 @@ static void tick_sched_do_timer(ktime_t now)
 !tick_nohz_full_cpu(cpu))
tick_do_timer_cpu = cpu;
 #endif
+   for (other_cpu = 0; other_cpu  nr_cpu_ids = 0; other_cpu++) {
+   if (other_cpu != cpu)
+   is_cpu_online += cpu_online(other_cpu);
+   }
+   WARN_ON((tick_do_timer_cpu != cpu)  !is_cpu_online)

/* Check, if the jiffies need an update */
if (tick_do_timer_cpu == cpu)

Regards,
-Oza


-Original Message-
From: Oza (Pawandeep) Oza 
Sent: Thursday, May 07, 2015 12:36 PM
To: 'Mike Galbraith'
Cc: pawandeep oza; linux-kernel@vger.kernel.org; malayasen rout
Subject: RE: [KERNEL BUG] do_timer/tick_handover_do_timer 3.10.17

: )

Well, I am not sure, if problem was communicated clearly from my side.
Let me attempt it again.

If  variable tick_do_timer_cpu = 0. Things are fine.
If it is some other value say for e.g. 1, 2 or 3 then core0 does not increment 
jiffies.  (but say if it is set to tick_do_timer_cpu=1, then core1 will 
increment jiffies)

If cpu1 ,2 and 3 are sent smp_send_stop and as a result of that cpu1, 2 and 3 
will be stopped.

Now only cpu0 is alive, cpu0 should increment jiffies upon each time tick.
For that tick_do_timer_cpu should be set to 0.

Which is not happening.

Regards,
-Oza


-Original Message-
From: Mike Galbraith [mailto:umgwanakikb...@gmail.com] 
Sent: Thursday, May 07, 2015 12:25 PM
To: Oza (Pawandeep) Oza
Cc: pawandeep oza; linux-kernel@vger.kernel.org; malayasen rout
Subject: Re: [KERNEL BUG] do_timer/tick_handover_do_timer 3.10.17

On Thu, 2015-05-07 at 05:58 +, Oza (Pawandeep) Oza wrote:
 Yes.
 But dying kernel doesn’t mean it CAN NOT INCREMENT jiffies.
 do_timer should do the job until kernel takes its last breathe and more 
 precisely CPU0 take its last breathe by halting itself as its last 
 instruction.

Feel free to add a redundant timer subsystem lest we BUG() in there, and
whatever else you need to guarantee a perfect orderly death for your
box.  I prefer live boxen, would make that BUG() go away.

-Mike

RE: [KERNEL BUG] do_timer/tick_handover_do_timer 3.10.17

2015-05-07 Thread Oza (Pawandeep) Oza

Oh ok.
So the reason why I cared was:

There is a code in our base which relies on jiffies, but since jiffies are not 
incrementing, the code waits there and loops forever.
And forward progress is on halt. (on cpu0, since that is the only cpu, which is 
alive)

We have changed the code to use mdelay and things move on.

But that means that in the patch which I mentioned, 
any code which relies on jiffies will stuck forever and will not allow rest of 
the code to get executed and hence no forward progress.
specially if that code is running with preempt_disable();

Regards,
-Oza


-Original Message-
From: Mike Galbraith [mailto:umgwanakikb...@gmail.com] 
Sent: Thursday, May 07, 2015 2:00 PM
To: Oza (Pawandeep) Oza
Cc: pawandeep oza; linux-kernel@vger.kernel.org; malayasen rout
Subject: Re: [KERNEL BUG] do_timer/tick_handover_do_timer 3.10.17

On Thu, 2015-05-07 at 07:05 +, Oza (Pawandeep) Oza wrote:
 : )
 
 Well, I am not sure, if problem was communicated clearly from my side.

I understood.  I just don't understand why you'd care deeply whether
CPU0 halts or eternally waits.  Both render it harmless and useless.

-Mike

RE: [KERNEL BUG] do_timer/tick_handover_do_timer 3.10.17

2015-05-07 Thread Oza (Pawandeep) Oza

It seems odd to me to use BUG() for what you appear to be using it for..
not that I know exactly what that it mind you, but when you said when
some other gizmo in your box has a problem you crash the kernel, my head
tilted to the side - surely there's a more controlled response possible
than poking the big red self destruct button ;-)

Oza: 
We have to place red button as our last resort, if we don’t press we pass the 
time or miss the point where we can go back and debug.
So that is something by design.

Regards,
-Oza


-Original Message-
From: Mike Galbraith [mailto:umgwanakikb...@gmail.com] 
Sent: Friday, May 08, 2015 10:42 AM
To: Oza (Pawandeep) Oza
Cc: pawandeep oza; linux-kernel@vger.kernel.org; malayasen rout
Subject: Re: [KERNEL BUG] do_timer/tick_handover_do_timer 3.10.17

On Fri, 2015-05-08 at 04:16 +, Oza (Pawandeep) Oza wrote:
 So Mike, is this reason strong enough for you ?

Nope.  I think you did the right thing in removing your dependency on
jiffies reliability in a dying box.  You don't have to convince me of
anything though, CC timer subsystem maintainer, see what he says.

 I understand your point: solve the BUG, and I do tend to agree with you.
 
 But by design and implementation, the BUG() is just a beginning of the end 
 for dying kernel.
 And what happens in between this 'the beginning' and 'the end' is not less 
 important. 
 (because say,  on our platform we want to get clean RAMDUMP to analyze what 
 happened, and for that we want to get clean reboot)

I don't see anybody else having any trouble getting crash dumps.  I
spent yet another long day just yesterday, rummaging through one.

 Also,
 If somebody's design is to legally Crash the kernel (e.g. where kernel is 
 actually not faulty).
 Then, I do expect that tick/timekeeping framework do its job as long as it 
 can do, and it should do, because kernel is not faulty.
 But in this case it doesn’t handover jiffies incrementing job sanely.

It seems odd to me to use BUG() for what you appear to be using it for..
not that I know exactly what that it mind you, but when you said when
some other gizmo in your box has a problem you crash the kernel, my head
tilted to the side - surely there's a more controlled response possible
than poking the big red self destruct button ;-)

-Mike

RE: [KERNEL BUG] do_timer/tick_handover_do_timer 3.10.17

2015-05-07 Thread Oza (Pawandeep) Oza

So Mike, is this reason strong enough for you ?

I understand your point: solve the BUG, and I do tend to agree with you.

But by design and implementation, the BUG() is just a beginning of the end for 
dying kernel.
And what happens in between this 'the beginning' and 'the end' is not less 
important. 
(because say,  on our platform we want to get clean RAMDUMP to analyze what 
happened, and for that we want to get clean reboot)

Also,
If somebody's design is to legally Crash the kernel (e.g. where kernel is 
actually not faulty).
Then, I do expect that tick/timekeeping framework do its job as long as it can 
do, and it should do, because kernel is not faulty.
But in this case it doesn’t handover jiffies incrementing job sanely.

In other words, 
no one can relies on jiffies, or rather the code which is based on jiffies 
will never forward progress in this path

Regards,
-Oza


-Original Message-
From: Oza (Pawandeep) Oza 
Sent: Thursday, May 07, 2015 2:17 PM
To: 'Mike Galbraith'
Cc: pawandeep oza; linux-kernel@vger.kernel.org; malayasen rout
Subject: RE: [KERNEL BUG] do_timer/tick_handover_do_timer 3.10.17

Oh ok.
So the reason why I cared was:

There is a code in our base which relies on jiffies, but since jiffies are not 
incrementing, the code waits there and loops forever.
And forward progress is on halt. (on cpu0, since that is the only cpu, which is 
alive)

We have changed the code to use mdelay and things move on.

But that means that in the patch which I mentioned, 
any code which relies on jiffies will stuck forever and will not allow rest of 
the code to get executed and hence no forward progress.
specially if that code is running with preempt_disable();

Regards,
-Oza


-Original Message-
From: Mike Galbraith [mailto:umgwanakikb...@gmail.com] 
Sent: Thursday, May 07, 2015 2:00 PM
To: Oza (Pawandeep) Oza
Cc: pawandeep oza; linux-kernel@vger.kernel.org; malayasen rout
Subject: Re: [KERNEL BUG] do_timer/tick_handover_do_timer 3.10.17

On Thu, 2015-05-07 at 07:05 +, Oza (Pawandeep) Oza wrote:
 : )
 
 Well, I am not sure, if problem was communicated clearly from my side.

I understood.  I just don't understand why you'd care deeply whether
CPU0 halts or eternally waits.  Both render it harmless and useless.

-Mike

N�r��yb�X��ǧv�^�)޺{.n�+{zX����ܨ}���Ơz�j:+v���zZ+��+zf���h���~i���z��w���?��)ߢf��^jǫy�m��@A�a���
0��h���i

RE: [KERNEL BUG] do_timer/tick_handover_do_timer 3.10.17

2015-05-06 Thread Oza (Pawandeep) Oza

Yes.
But dying kernel doesn’t mean it CAN NOT INCREMENT jiffies.
do_timer should do the job until kernel takes its last breathe and more 
precisely CPU0 take its last breathe by halting itself as its last instruction.

Regards,
-Oza


-Original Message-
From: Mike Galbraith [mailto:umgwanakikb...@gmail.com] 
Sent: Thursday, May 07, 2015 11:25 AM
To: Oza (Pawandeep) Oza
Cc: pawandeep oza; linux-kernel@vger.kernel.org; malayasen rout
Subject: Re: [KERNEL BUG] do_timer/tick_handover_do_timer 3.10.17

On Thu, 2015-05-07 at 05:12 +, Oza (Pawandeep) Oza wrote:

> But after Crash, jiffies do not increment.

Your kernel said "I'M DEAD", that's a good reason to believe it.

-Mike

RE: [KERNEL BUG] do_timer/tick_handover_do_timer 3.10.17

2015-05-06 Thread Oza (Pawandeep) Oza

Solution Statement: Fix the UTTERLY DEADLY bug.

Oza: 
that BUG() is LEGAL. Kernel is not a problem there.
Somebody else outside of kernel/ARM (some other HW raises the bug), and send 
indication to kernel that I am not alive.
So kernel choose to CRASH ITSELF.

So that is legal crash and wanted Crash.

But after Crash, jiffies do not increment.

Regards,
-Oza

-Original Message-
From: Mike Galbraith [mailto:umgwanakikb...@gmail.com] 
Sent: Thursday, May 07, 2015 10:39 AM
To: Oza (Pawandeep) Oza
Cc: pawandeep oza; linux-kernel@vger.kernel.org; malayasen rout
Subject: Re: [KERNEL BUG] do_timer/tick_handover_do_timer 3.10.17

On Thu, 2015-05-07 at 04:37 +, Oza (Pawandeep) Oza wrote:
> Problem Statement: the timkeeping is stopped, do_timer is no more a
> job of cpu0.
> 
> The reason: the variable "tick_do_timer_cpu" is not set to correct CPU
> (cpu0)
> And when BUG() happens, the tick_do_timer_cpu variable stay set to 1,
> 2 or 3 (we have 4 cores)

Solution Statement: Fix the UTTERLY DEADLY bug.

-Mike

N�r��yb�X��ǧv�^�)޺{.n�+{zX����ܨ}���Ơz�:+v���zZ+��+zf���h���~i���z��w���?�&�)ߢf��^jǫy�m��@A�a���
0��h���i

RE: [KERNEL BUG] do_timer/tick_handover_do_timer 3.10.17

2015-05-06 Thread Oza (Pawandeep) Oza

Hi Mike,

Let me explain the problem again.

Problem Statement: the timkeeping is stopped, do_timer is no more a job of cpu0.

The reason: the variable "tick_do_timer_cpu" is not set to correct CPU (cpu0)
And when BUG() happens, the tick_do_timer_cpu variable stay set to 1, 2 or 3 
(we have 4 cores)
And finally any code running on core0 (which relies on jiffies incrementing) 
doesn’t work because there is nobody to increment jiffies.

There is tick_handover_do_timer, and if that is called then things are fine, 
but that is also not getting called because it is tightly coupled with hotplug.
since cpu_down is not getting called, this handover is not happening. and the 
last status of the variable tick_do_timer_cpu is always
pointing to DEAD cpu (1,2 or 3). and core0 waits forever (where if the code 
relies on the increment of jiffies).

Regards,
-Oza

-Original Message-
From: Mike Galbraith [mailto:umgwanakikb...@gmail.com] 
Sent: Thursday, May 07, 2015 8:53 AM
To: pawandeep oza
Cc: linux-kernel@vger.kernel.org; malayasen rout; Oza (Pawandeep) Oza
Subject: Re: [KERNEL BUG] do_timer/tick_handover_do_timer 3.10.17

On Wed, 2015-05-06 at 22:57 +0530, pawandeep oza wrote:

> but when say core0 has raised BUG..
...

> what is the right way to approach this problem

Look at the spot BUG() printed?  BUG() means "Way to go slick, the code
you fed me (file:line) is toxic.  Have a nice day, your ex-buddy core0".

-Mike

RE: [KERNEL BUG] do_timer/tick_handover_do_timer 3.10.17

2015-05-06 Thread Oza (Pawandeep) Oza

Yes.
But dying kernel doesn’t mean it CAN NOT INCREMENT jiffies.
do_timer should do the job until kernel takes its last breathe and more 
precisely CPU0 take its last breathe by halting itself as its last instruction.

Regards,
-Oza


-Original Message-
From: Mike Galbraith [mailto:umgwanakikb...@gmail.com] 
Sent: Thursday, May 07, 2015 11:25 AM
To: Oza (Pawandeep) Oza
Cc: pawandeep oza; linux-kernel@vger.kernel.org; malayasen rout
Subject: Re: [KERNEL BUG] do_timer/tick_handover_do_timer 3.10.17

On Thu, 2015-05-07 at 05:12 +, Oza (Pawandeep) Oza wrote:

 But after Crash, jiffies do not increment.

Your kernel said I'M DEAD, that's a good reason to believe it.

-Mike

RE: [KERNEL BUG] do_timer/tick_handover_do_timer 3.10.17

2015-05-06 Thread Oza (Pawandeep) Oza

Solution Statement: Fix the UTTERLY DEADLY bug.

Oza: 
that BUG() is LEGAL. Kernel is not a problem there.
Somebody else outside of kernel/ARM (some other HW raises the bug), and send 
indication to kernel that I am not alive.
So kernel choose to CRASH ITSELF.

So that is legal crash and wanted Crash.

But after Crash, jiffies do not increment.


Regards,
-Oza


-Original Message-
From: Mike Galbraith [mailto:umgwanakikb...@gmail.com] 
Sent: Thursday, May 07, 2015 10:39 AM
To: Oza (Pawandeep) Oza
Cc: pawandeep oza; linux-kernel@vger.kernel.org; malayasen rout
Subject: Re: [KERNEL BUG] do_timer/tick_handover_do_timer 3.10.17

On Thu, 2015-05-07 at 04:37 +, Oza (Pawandeep) Oza wrote:
 Problem Statement: the timkeeping is stopped, do_timer is no more a
 job of cpu0.
 
 The reason: the variable tick_do_timer_cpu is not set to correct CPU
 (cpu0)
 And when BUG() happens, the tick_do_timer_cpu variable stay set to 1,
 2 or 3 (we have 4 cores)

Solution Statement: Fix the UTTERLY DEADLY bug.

-Mike

N�r��yb�X��ǧv�^�)޺{.n�+{zX����ܨ}���Ơz�j:+v���zZ+��+zf���h���~i���z��w���?��)ߢf��^jǫy�m��@A�a���
0��h���i

RE: [KERNEL BUG] do_timer/tick_handover_do_timer 3.10.17

2015-05-06 Thread Oza (Pawandeep) Oza

Hi Mike,

Let me explain the problem again.

Problem Statement: the timkeeping is stopped, do_timer is no more a job of cpu0.

The reason: the variable tick_do_timer_cpu is not set to correct CPU (cpu0)
And when BUG() happens, the tick_do_timer_cpu variable stay set to 1, 2 or 3 
(we have 4 cores)
And finally any code running on core0 (which relies on jiffies incrementing) 
doesn’t work because there is nobody to increment jiffies.

There is tick_handover_do_timer, and if that is called then things are fine, 
but that is also not getting called because it is tightly coupled with hotplug.
since cpu_down is not getting called, this handover is not happening. and the 
last status of the variable tick_do_timer_cpu is always
pointing to DEAD cpu (1,2 or 3). and core0 waits forever (where if the code 
relies on the increment of jiffies).

Regards,
-Oza

-Original Message-
From: Mike Galbraith [mailto:umgwanakikb...@gmail.com] 
Sent: Thursday, May 07, 2015 8:53 AM
To: pawandeep oza
Cc: linux-kernel@vger.kernel.org; malayasen rout; Oza (Pawandeep) Oza
Subject: Re: [KERNEL BUG] do_timer/tick_handover_do_timer 3.10.17

On Wed, 2015-05-06 at 22:57 +0530, pawandeep oza wrote:

 but when say core0 has raised BUG..
...

 what is the right way to approach this problem

Look at the spot BUG() printed?  BUG() means Way to go slick, the code
you fed me (file:line) is toxic.  Have a nice day, your ex-buddy core0.

-Mike

< 1 2 3 4 5

401 - 476 of 476 matches

Mail list logo