[PATCH v6 3/3] PCI/of fix of_dma_get_range; get PCI specific dma-ranges
The current integration of the device framework with the OF framework assumes that dma-ranges are defined by memory-mapped devices, as (child-bus-address, parent-bus-address, length). of_dma_configure is written specifically for memory-mapped devices, and no implementation exists for PCI to handle PCIe-based memory ranges.

For example, iproc-based SoCs and other SoCs (such as rcar) have PCI-world dma-ranges:
dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;

This patch fixes the following problems in of_dma_get_range:
1) It returns a wrong size of 0.
2) It does not handle the absence of dma-ranges, which is valid for a PCI master.
3) It does not handle multiple inbound windows.
4) In order to get the largest possible dma_mask, this patch also returns the largest possible size based on dma-ranges.

For example, with
dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;
we should get dev->coherent_dma_mask=0x7f, based on which the IOVA allocation space will honour the PCI host bridge limitations.

The implementation hooks bus-specific callbacks for getting dma-ranges.

Signed-off-by: Oza Pawandeep <oza@broadcom.com>

diff --git a/drivers/of/address.c b/drivers/of/address.c
index 02b2903..800731c 100644
--- a/drivers/of/address.c
+++ b/drivers/of/address.c
@@ -6,6 +6,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -46,6 +47,8 @@ struct of_bus {
 				int na, int ns, int pna);
 	int (*translate)(__be32 *addr, u64 offset, int na);
 	unsigned int (*get_flags)(const __be32 *addr);
+	int (*get_dma_ranges)(struct device_node *np,
+			      u64 *dma_addr, u64 *paddr, u64 *size);
 };
 
 /*
@@ -171,6 +174,144 @@ static int of_bus_pci_translate(__be32 *addr, u64 offset, int na)
 {
 	return of_bus_default_translate(addr + 1, offset, na - 1);
 }
+
+static int of_bus_pci_get_dma_ranges(struct device_node *np, u64 *dma_addr,
+				     u64 *paddr, u64 *size)
+{
+	struct device_node *node = of_node_get(np);
+	int ret = 0;
+	struct resource_entry *window;
+	LIST_HEAD(res);
+
+	if (!node)
+		return -EINVAL;
+
+	*size = 0;
+	/*
+	 * PCI dma-ranges is not mandatory property.
+	 * many devices do not need to have it, since
+	 * host bridge does not require inbound memory
+	 * configuration or rather have design limitations.
+	 * so we look for dma-ranges, if missing we
+	 * just return the caller full size, and also
+	 * no dma-ranges suggests that, host bridge allows
+	 * whatever comes in, so we set dma_addr to 0.
+	 */
+	ret = of_pci_get_dma_ranges(np, &res);
+	if (!ret) {
+		resource_list_for_each_entry(window, &res) {
+			struct resource *res_dma = window->res;
+
+			if (*size < resource_size(res_dma)) {
+				*dma_addr = res_dma->start - window->offset;
+				*paddr = res_dma->start;
+				*size = resource_size(res_dma);
+			}
+		}
+	}
+	pci_free_resource_list(&res);
+
+	/*
+	 * return the largest possible size,
+	 * since PCI master allows everything.
+	 */
+	if (*size == 0) {
+		pr_debug("empty/zero size dma-ranges found for node(%s)\n",
+			np->full_name);
+		*size = DMA_BIT_MASK(sizeof(dma_addr_t) * 8) - 1;
+		*dma_addr = *paddr = 0;
+		ret = 0;
+	}
+
+	pr_debug("dma_addr(%llx) cpu_addr(%llx) size(%llx)\n",
+		 *dma_addr, *paddr, *size);
+
+	of_node_put(node);
+
+	return ret;
+}
+
+static int get_dma_ranges(struct device_node *np, u64 *dma_addr,
+			  u64 *paddr, u64 *size)
+{
+	struct device_node *node = of_node_get(np);
+	const __be32 *ranges = NULL;
+	int len, naddr, nsize, pna;
+	int ret = 0;
+	u64 dmaaddr;
+
+	if (!node)
+		return -EINVAL;
+
+	while (1) {
+		naddr = of_n_addr_cells(node);
+		nsize = of_n_size_cells(node);
+		node = of_get_next_parent(node);
+		if (!node)
+			break;
+
+		ranges = of_get_property(node, "dma-ranges", &len);
+
+		/* Ignore empty ranges, they imply no translation required */
+		if (ranges && len > 0)
+			break;
+
+		/*
+		 * At least empty ranges has to be defined for parent node if
+		 * DMA is supported
+		 */
+		if (!ranges)
+			break;
+	}
+
+	if (!ranges) {
+		pr_debug("no dma-ranges found for node(%s)\n", np->full_name);
+		ret = -ENODEV;
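
The window selection in of_bus_pci_get_dma_ranges() above can be sketched outside the kernel as follows. This is only an illustration under stated assumptions: struct inbound_window and pick_largest_window() are hypothetical names, not kernel types or functions, and the fallback size uses UINT64_MAX (assuming a 64-bit dma_addr_t), whereas the patch itself uses DMA_BIT_MASK(sizeof(dma_addr_t) * 8) - 1.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical stand-in for one parsed inbound window (not a kernel type). */
struct inbound_window {
	uint64_t cpu_start;	/* CPU physical start of the window */
	uint64_t offset;	/* cpu_start minus the PCI bus address */
	uint64_t size;		/* window length in bytes */
};

/*
 * Report the largest window, following the loop in the patch. If no window
 * has a non-zero size, fall back to address 0 and the widest size, because
 * a missing dma-ranges is treated as "no inbound restriction".
 */
static void pick_largest_window(const struct inbound_window *w, size_t n,
				uint64_t *dma_addr, uint64_t *paddr,
				uint64_t *size)
{
	size_t i;

	*size = 0;
	for (i = 0; i < n; i++) {
		if (*size < w[i].size) {
			*dma_addr = w[i].cpu_start - w[i].offset;
			*paddr = w[i].cpu_start;
			*size = w[i].size;
		}
	}

	if (*size == 0) {
		*dma_addr = 0;
		*paddr = 0;
		*size = UINT64_MAX;	/* assumption: 64-bit dma_addr_t */
	}
}
```

Only the single largest window is reported because the caller derives one DMA mask from one (dma_addr, size) pair; the remaining windows are left to other consumers of the parsed list.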
[PATCH v5 3/3] PCI/of fix of_dma_get_range; get PCI specific dma-ranges
The current integration of the device framework with the OF framework assumes that dma-ranges are defined by memory-mapped devices, as (child-bus-address, parent-bus-address, length). of_dma_configure is written specifically for memory-mapped devices, and no implementation exists for PCI to handle PCIe-based memory ranges.

For example, iproc-based SoCs and other SoCs (such as rcar) have PCI-world dma-ranges:
dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;

This patch fixes the bug in of_dma_get_range, which as it stands parses the PCI memory ranges and returns a wrong size of 0.

In order to get the largest possible dma_mask, this patch also returns the largest possible size based on dma-ranges. For example, with
dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;
we should get dev->coherent_dma_mask=0x7f, based on which the IOVA allocation space will honour the PCI host bridge limitations.

The implementation hooks bus-specific callbacks for getting dma-ranges.

Signed-off-by: Oza Pawandeep <oza@broadcom.com>

diff --git a/drivers/of/address.c b/drivers/of/address.c
index 02b2903..b43e347 100644
--- a/drivers/of/address.c
+++ b/drivers/of/address.c
@@ -6,6 +6,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -46,6 +47,8 @@ struct of_bus {
 				int na, int ns, int pna);
 	int (*translate)(__be32 *addr, u64 offset, int na);
 	unsigned int (*get_flags)(const __be32 *addr);
+	int (*get_dma_ranges)(struct device_node *np,
+			      u64 *dma_addr, u64 *paddr, u64 *size);
 };
 
 /*
@@ -171,6 +174,144 @@ static int of_bus_pci_translate(__be32 *addr, u64 offset, int na)
 {
 	return of_bus_default_translate(addr + 1, offset, na - 1);
 }
+
+static int of_bus_pci_get_dma_ranges(struct device_node *np, u64 *dma_addr,
+				     u64 *paddr, u64 *size)
+{
+	struct device_node *node = of_node_get(np);
+	int ret = 0;
+	struct resource_entry *window;
+	LIST_HEAD(res);
+
+	if (!node)
+		return -EINVAL;
+
+	*size = 0;
+	/*
+	 * PCI dma-ranges is not mandatory property.
+	 * many devices do not need to have it, since
+	 * host bridge does not require inbound memory
+	 * configuration or rather have design limitations.
+	 * so we look for dma-ranges, if missing we
+	 * just return the caller full size, and also
+	 * no dma-ranges suggests that, host bridge allows
+	 * whatever comes in, so we set dma_addr to 0.
+	 */
+	ret = of_pci_get_dma_ranges(np, &res);
+	if (!ret) {
+		resource_list_for_each_entry(window, &res) {
+			struct resource *res_dma = window->res;
+
+			if (*size < resource_size(res_dma)) {
+				*dma_addr = res_dma->start - window->offset;
+				*paddr = res_dma->start;
+				*size = resource_size(res_dma);
+			}
+		}
+	}
+	pci_free_resource_list(&res);
+
+	/*
+	 * return the largest possible size,
+	 * since PCI master allows everything.
+	 */
+	if (*size == 0) {
+		pr_debug("empty/zero size dma-ranges found for node(%s)\n",
+			np->full_name);
+		*size = DMA_BIT_MASK(sizeof(dma_addr_t) * 8) - 1;
+		*dma_addr = *paddr = 0;
+		ret = 0;
+	}
+
+	pr_debug("dma_addr(%llx) cpu_addr(%llx) size(%llx)\n",
+		 *dma_addr, *paddr, *size);
+
+	of_node_put(node);
+
+	return ret;
+}
+
+static int get_dma_ranges(struct device_node *np, u64 *dma_addr,
+			  u64 *paddr, u64 *size)
+{
+	struct device_node *node = of_node_get(np);
+	const __be32 *ranges = NULL;
+	int len, naddr, nsize, pna;
+	int ret = 0;
+	u64 dmaaddr;
+
+	if (!node)
+		return -EINVAL;
+
+	while (1) {
+		naddr = of_n_addr_cells(node);
+		nsize = of_n_size_cells(node);
+		node = of_get_next_parent(node);
+		if (!node)
+			break;
+
+		ranges = of_get_property(node, "dma-ranges", &len);
+
+		/* Ignore empty ranges, they imply no translation required */
+		if (ranges && len > 0)
+			break;
+
+		/*
+		 * At least empty ranges has to be defined for parent node if
+		 * DMA is supported
+		 */
+		if (!ranges)
+			break;
+	}
+
+	if (!ranges) {
+		pr_debug("no dma-ranges found for node(%s)\n", np->full_name);
+		ret = -ENODEV;
+		goto out;
+	}
+
+	len /= sizeof(u32);
+
+	pna = of_n_addr_cells(node
[PATCH v5 2/3] iommu/pci: reserve IOVA for PCI masters
This patch reserves the IOVA for PCI masters. ARM64-based SoCs may have scattered memory banks. For example, an iproc-based SoC has

<0x 0x8000 0x0 0x8000>, /* 2G @ 2G */
<0x0008 0x8000 0x3 0x8000>, /* 14G @ 34G */
<0x0090 0x 0x4 0x>, /* 16G @ 576G */
<0x00a0 0x 0x4 0x>; /* 16G @ 640G */

but the addressing capability of incoming PCI transactions is limited by the host bridge. For example, if the maximum incoming window capability is 512 GB, then 0x0090 and 0x00a0 fall beyond it.

To address this problem, the IOMMU has to avoid allocating IOVAs that are reserved, which in turn means no IOVA is allocated if it falls into a hole.

Signed-off-by: Oza Pawandeep <oza@broadcom.com>

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 48d36ce..08764b0 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -27,6 +27,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -171,8 +172,12 @@ static void iova_reserve_pci_windows(struct pci_dev *dev,
 		struct iova_domain *iovad)
 {
 	struct pci_host_bridge *bridge = pci_find_host_bridge(dev->bus);
+	struct device_node *np = bridge->dev.parent->of_node;
 	struct resource_entry *window;
 	unsigned long lo, hi;
+	int ret;
+	dma_addr_t tmp_dma_addr = 0, dma_addr;
+	LIST_HEAD(res);
 
 	resource_list_for_each_entry(window, &bridge->windows) {
 		if (resource_type(window->res) != IORESOURCE_MEM &&
@@ -183,6 +188,36 @@ static void iova_reserve_pci_windows(struct pci_dev *dev,
 		hi = iova_pfn(iovad, window->res->end - window->offset);
 		reserve_iova(iovad, lo, hi);
 	}
+
+	/* PCI inbound memory reservation. */
+	ret = of_pci_get_dma_ranges(np, &res);
+	if (!ret) {
+		resource_list_for_each_entry(window, &res) {
+			struct resource *res_dma = window->res;
+
+			dma_addr = res_dma->start - window->offset;
+			if (tmp_dma_addr > dma_addr) {
+				pr_warn("PCI: failed to reserve iovas; ranges should be sorted\n");
+				return;
+			}
+			if (tmp_dma_addr != dma_addr) {
+				lo = iova_pfn(iovad, tmp_dma_addr);
+				hi = iova_pfn(iovad, dma_addr - 1);
+				reserve_iova(iovad, lo, hi);
+			}
+			tmp_dma_addr = window->res->end - window->offset;
+		}
+		/*
+		 * the last dma-range should honour based on the
+		 * 32/64-bit dma addresses.
+		 */
+		if (tmp_dma_addr < DMA_BIT_MASK(sizeof(dma_addr_t) * 8)) {
+			lo = iova_pfn(iovad, tmp_dma_addr);
+			hi = iova_pfn(iovad,
+				      DMA_BIT_MASK(sizeof(dma_addr_t) * 8) - 1);
+			reserve_iova(iovad, lo, hi);
+		}
+	}
 }
 
 /**
-- 
1.9.1
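
The reservation logic above amounts to computing the holes between sorted inbound windows and marking them unusable. A minimal userspace sketch of that computation follows. Everything in it is an assumption made for illustration: struct dma_window, struct gap and find_dma_gaps() are hypothetical names, addresses are plain byte addresses rather than IOVA page frame numbers, and the sketch advances past each window with end + 1 (the patch stores the inclusive end in tmp_dma_addr, so the two may differ by one page at window boundaries).

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical types: inclusive bus-address ranges. */
struct dma_window { uint64_t start; uint64_t end; };
struct gap { uint64_t lo; uint64_t hi; };

/*
 * Write the address ranges NOT covered by any window into out[].
 * Windows must be sorted by start and must not overlap; limit is the
 * highest addressable bus address. Returns the number of gaps found,
 * or -1 on unsorted/invalid input or if out[] is too small.
 */
static int find_dma_gaps(const struct dma_window *w, size_t n,
			 uint64_t limit, struct gap *out, size_t max_out)
{
	uint64_t next = 0;	/* lowest address not yet covered */
	size_t count = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (w[i].start < next || w[i].end < w[i].start ||
		    w[i].end > limit)
			return -1;

		if (w[i].start > next) {
			if (count == max_out)
				return -1;
			out[count].lo = next;
			out[count].hi = w[i].start - 1;
			count++;
		}

		/* A window reaching the limit leaves nothing above it. */
		if (w[i].end == limit)
			return (i + 1 == n) ? (int)count : -1;

		next = w[i].end + 1;
	}

	/* Everything above the last window is also a hole. */
	if (count == max_out)
		return -1;
	out[count].lo = next;
	out[count].hi = limit;
	count++;

	return (int)count;
}
```

With an empty window list the sketch reports the whole address space as one gap; the patch avoids that case by reserving nothing when of_pci_get_dma_ranges() fails.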
[PATCH v5 1/3] of/pci/dma: fix DMA configuration for PCI masters
The current integration of the device framework with the OF framework assumes that dma-ranges are defined by memory-mapped devices, as (child-bus-address, parent-bus-address, length). of_dma_configure is written specifically for memory-mapped devices, and no implementation exists for PCI to handle PCIe-based memory ranges.

For example, iproc-based SoCs and other SoCs (such as rcar) have PCI-world dma-ranges:
dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;

This patch serves the following purposes:

1) Exposes an interface to PCI host drivers for their inbound memory ranges.

2) Provides an interface to callers such as of_dma_get_ranges, so that the returned size yields the best possible (largest) dma_mask. This matters because PCI RC drivers do not call APIs such as dma_set_coherent_mask(); the host bridge instead advertises its addressing capabilities through dma-ranges. For example, with
dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;
we should get dev->coherent_dma_mask=0x7f.

3) Handles multiple inbound windows and dma-ranges. It is left to the caller how it wants to use them; the new function returns the resources in a standard and uniform way.

4) This way, callers of e.g. of_dma_get_ranges do not need to change.

5) Leaves scope for the new function to add PCI flag handling for inbound memory.

Signed-off-by: Oza Pawandeep <oza@broadcom.com>

diff --git a/drivers/of/of_pci.c b/drivers/of/of_pci.c
index 0ee42c3..ed6e69a 100644
--- a/drivers/of/of_pci.c
+++ b/drivers/of/of_pci.c
@@ -283,6 +283,83 @@ int of_pci_get_host_bridge_resources(struct device_node *dev,
 	return err;
 }
 EXPORT_SYMBOL_GPL(of_pci_get_host_bridge_resources);
+
+/**
+ * of_pci_get_dma_ranges - Parse PCI host bridge inbound resources from DT
+ * @np: device node of the host bridge having the dma-ranges property
+ * @resources: list where the range of resources will be added after DT parsing
+ *
+ * It is the caller's job to free the @resources list.
+ *
+ * This function will parse the "dma-ranges" property of a
+ * PCI host bridge device node and setup the resource mapping based
+ * on its content.
+ *
+ * It returns zero if the range parsing has been successful or a standard error
+ * value if it failed.
+ */
+
+int of_pci_get_dma_ranges(struct device_node *np, struct list_head *resources)
+{
+	struct device_node *node = of_node_get(np);
+	int rlen;
+	int ret = 0;
+	const int na = 3, ns = 2;
+	struct resource *res;
+	struct of_pci_range_parser parser;
+	struct of_pci_range range;
+
+	if (!node)
+		return -EINVAL;
+
+	parser.node = node;
+	parser.pna = of_n_addr_cells(node);
+	parser.np = parser.pna + na + ns;
+
+	parser.range = of_get_property(node, "dma-ranges", &rlen);
+
+	if (!parser.range) {
+		pr_debug("pcie device has no dma-ranges defined for node(%s)\n",
+			 np->full_name);
+		ret = -EINVAL;
+		goto out;
+	}
+
+	parser.end = parser.range + rlen / sizeof(__be32);
+
+	for_each_of_pci_range(&parser, &range) {
+		/*
+		 * If we failed translation or got a zero-sized region
+		 * then skip this range
+		 */
+		if (range.cpu_addr == OF_BAD_ADDR || range.size == 0)
+			continue;
+
+		res = kzalloc(sizeof(struct resource), GFP_KERNEL);
+		if (!res) {
+			ret = -ENOMEM;
+			goto parse_failed;
+		}
+
+		ret = of_pci_range_to_resource(&range, np, res);
+		if (ret) {
+			kfree(res);
+			continue;
+		}
+
+		pci_add_resource_offset(resources, res,
+					res->start - range.pci_addr);
+	}
+
+	return ret;
+
+parse_failed:
+	pci_free_resource_list(resources);
+out:
+	of_node_put(node);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(of_pci_get_dma_ranges);
 #endif /* CONFIG_OF_ADDRESS */
 
 #ifdef CONFIG_PCI_MSI
diff --git a/include/linux/of_pci.h b/include/linux/of_pci.h
index 0e0974e..617b90d 100644
--- a/include/linux/of_pci.h
+++ b/include/linux/of_pci.h
@@ -76,6 +76,7 @@ static inline void of_pci_check_probe_only(void) { }
 int of_pci_get_host_bridge_resources(struct device_node *dev,
 			unsigned char busno, unsigned char bus_max,
 			struct list_head *resources, resource_size_t *io_base);
+int of_pci_get_dma_ranges(struct device_node *np, struct list_head *resources);
 #else
 static inline int of_pci_get_host_bridge_resources(struct device_node *dev,
 			unsigned char busno, unsigned char bus_max,
@@ -83,6 +84,12 @@ static inline int of_pci_get_host_bridge_resources(struct device_node *dev,
 {
 	return -EINVAL;
 }
+
+static inli
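
The cell layout that of_pci_get_dma_ranges() walks (na = 3 PCI address cells, pna parent address cells, ns = 2 size cells per entry) can be illustrated with a small standalone parser. This is a sketch under several assumptions: the cells are taken as host-endian uint32_t values rather than __be32, the parent address is copied as-is instead of being translated to a CPU address, and struct pci_dma_range, read_cells() and parse_pci_dma_ranges() are hypothetical names. The seven-value dma-ranges example in the commit message is consistent with pna = 2 (3 + 2 + 2 cells).

```c
#include <stdint.h>
#include <stddef.h>

#define PCI_NA 3	/* PCI address cells: flags, addr high, addr low */
#define PCI_NS 2	/* size cells */

/* Hypothetical result type for one dma-ranges entry. */
struct pci_dma_range {
	uint32_t flags;		/* first PCI address cell */
	uint64_t pci_addr;	/* 64-bit PCI bus address */
	uint64_t parent_addr;	/* untranslated parent bus address */
	uint64_t size;
};

/* Combine 1 or 2 consecutive 32-bit cells into one number. */
static uint64_t read_cells(const uint32_t *cell, int count)
{
	uint64_t v = 0;

	while (count--)
		v = (v << 32) | *cell++;
	return v;
}

/*
 * Parse ncells cells into out[]. Zero-sized entries are skipped, as in
 * the patch. Returns the number of entries stored, or -1 on malformed
 * input or if out[] is too small.
 */
static int parse_pci_dma_ranges(const uint32_t *cells, size_t ncells, int pna,
				struct pci_dma_range *out, size_t max_out)
{
	size_t stride, n = 0;

	if (pna < 1 || pna > 2)
		return -1;
	stride = PCI_NA + (size_t)pna + PCI_NS;
	if (ncells % stride)
		return -1;

	for (; ncells >= stride; cells += stride, ncells -= stride) {
		struct pci_dma_range r;

		r.flags = cells[0];
		r.pci_addr = read_cells(cells + 1, 2);
		r.parent_addr = read_cells(cells + PCI_NA, pna);
		r.size = read_cells(cells + PCI_NA + pna, PCI_NS);
		if (r.size == 0)
			continue;
		if (n == max_out)
			return -1;
		out[n++] = r;
	}

	return (int)n;
}
```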
[PATCH v5 0/3] OF/PCI address PCI inbound memory limitations
A PCI device may support 64-bit DMA addressing, and its driver may therefore set the device's dma_mask to DMA_BIT_MASK(64). However, the PCI host bridge may have limitations on inbound transaction addressing.

This is particularly problematic on ARM/ARM64 SoCs where the IOMMU (i.e. SMMU) translates IOVA to PA for inbound transactions only after the PCI host has forwarded these transactions onto the SoC IO bus. This means that, on such ARM/ARM64 SoCs, the IOVA of inbound transactions has to honor the addressing restrictions of the PCI host.

The current integration of the device framework with the OF framework assumes that dma-ranges are defined by memory-mapped devices, as (child-bus-address, parent-bus-address, length). of_dma_configure is written specifically for memory-mapped devices, and no implementation exists for PCI devices.

For example, iproc-based SoCs and other SoCs (such as rcar) have PCI-world dma-ranges:
dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;

This patchset:
- reserves the IOVA ranges for PCI masters based on the PCI-world dma-ranges;
- fixes of_dma_get_range to cater to PCI dma-ranges;
- fixes of_dma_get_range, which currently returns size 0 for PCI devices.

IOVA allocation patch:
[PATCH 2/3] iommu/pci: reserve iova for PCI masters

Fix of_dma_get_range bug and address PCI master:
[PATCH 3/3] PCI/of fix of_dma_get_range; get PCI specific dma-ranges

Base patch for both of the above patches:
[PATCH 1/3] of/pci/dma: fix DMA configuration for PCI masters

Changes since v4:
- Something was wrong with my mail client and not all of the patches went out; attempting again.

Changes since v3:
- Minor change: redundant checks removed.

Changes since v2:
- Removed internal review.

Changes since v1:
- Remove internal GERRIT details from patch descriptions.
- Address Rob's comments.
- Add a get_dma_ranges() function to the of_bus struct.
- Convert the existing contents of this function to of_bus_default_dma_get_ranges and add that to the default of_bus struct.
- Make of_dma_get_range call of_bus_match() and then bus->get_dma_ranges.
- No revision for [PATCH 2/3] iommu/pci: reserve iova for PCI masters, since it is under discussion with Robin.

Oza Pawandeep (3):
  of/pci/dma: fix DMA configuration for PCI masters
  iommu/pci: reserve IOVA for PCI masters
  PCI/of fix of_dma_get_range; get PCI specific dma-ranges

 drivers/iommu/dma-iommu.c |  35
 drivers/of/address.c      | 216 --
 drivers/of/of_pci.c       |  77 +
 include/linux/of_pci.h    |   7 ++
 4 files changed, 272 insertions(+), 63 deletions(-)

-- 
1.9.1
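
The restructuring listed under "Changes since v1" (a per-bus get_dma_ranges callback, selected through a bus match) follows a common function-pointer dispatch pattern. The sketch below shows only that pattern: struct node, struct bus_ops, the match rules and the placeholder return values are all hypothetical and do not reproduce the kernel's of_bus table or its matching logic.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a device tree node. */
struct node { const char *device_type; };

/* One entry per bus type, with a bus-specific dma-ranges callback. */
struct bus_ops {
	const char *name;
	int (*match)(const struct node *np);
	int (*get_dma_ranges)(const struct node *np, uint64_t *dma_addr,
			      uint64_t *paddr, uint64_t *size);
};

static int pci_match(const struct node *np)
{
	return np->device_type && strcmp(np->device_type, "pci") == 0;
}

static int default_match(const struct node *np)
{
	(void)np;
	return 1;	/* the default bus accepts any node */
}

/* Placeholder: pretend the PCI bridge has no inbound restriction. */
static int pci_get_dma_ranges(const struct node *np, uint64_t *dma_addr,
			      uint64_t *paddr, uint64_t *size)
{
	(void)np;
	*dma_addr = 0;
	*paddr = 0;
	*size = UINT64_MAX;
	return 0;
}

/* Placeholder: pretend the memory-mapped path found a 4 GiB window. */
static int default_get_dma_ranges(const struct node *np, uint64_t *dma_addr,
				  uint64_t *paddr, uint64_t *size)
{
	(void)np;
	*dma_addr = 0;
	*paddr = 0;
	*size = 0x100000000ULL;
	return 0;
}

/* More specific buses first; the catch-all default entry goes last. */
static const struct bus_ops buses[] = {
	{ "pci", pci_match, pci_get_dma_ranges },
	{ "default", default_match, default_get_dma_ranges },
};

static const struct bus_ops *bus_match(const struct node *np)
{
	size_t i;

	for (i = 0; i < sizeof(buses) / sizeof(buses[0]); i++)
		if (buses[i].match(np))
			return &buses[i];
	return NULL;
}

/* Single entry point: callers need not know which bus applies. */
static int dma_get_range(const struct node *np, uint64_t *dma_addr,
			 uint64_t *paddr, uint64_t *size)
{
	const struct bus_ops *bus = bus_match(np);

	if (!bus || !bus->get_dma_ranges)
		return -1;
	return bus->get_dma_ranges(np, dma_addr, paddr, size);
}
```

Keeping one entry point is what lets existing callers stay unchanged while the PCI-specific parsing lives behind the callback.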
[PATCH v5 0/3] OF/PCI address PCI inbound memory limitations
A PCI device may support 64-bit DMA addressing, and its driver therefore sets the device's dma_mask to DMA_BIT_MASK(64). However, the PCI host bridge may have limitations on inbound transaction addressing.

This is particularly problematic on ARM/ARM64 SoCs, where the IOMMU (i.e. SMMU) translates IOVA to PA for inbound transactions only after the PCI host has forwarded these transactions onto the SoC I/O bus. On such SoCs, the IOVA of inbound transactions therefore has to honor the addressing restrictions of the PCI host.

The current device framework and OF framework integration assumes that dma-ranges are defined the way memory-mapped devices define them: (child-bus-address, parent-bus-address, length).

of_dma_configure is written specifically for memory-mapped devices, and no implementation exists for PCI devices.

For example, iproc-based SoCs and other SoCs (such as rcar) have PCI-world dma-ranges:

dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;

This patchset:
- reserves the IOVA ranges for PCI masters based on the PCI-world dma-ranges;
- fixes of_dma_get_range to cater to PCI dma-ranges;
- fixes of_dma_get_range, which currently returns size 0 for PCI devices.

IOVA allocation patch:
[PATCH 2/3] iommu/pci: reserve iova for PCI masters

Fix of_dma_get_range bug and address PCI master:
[PATCH 3/3] PCI/of fix of_dma_get_range; get PCI specific

Base patch for both of the above patches:
[PATCH 1/3] of/pci/dma: fix DMA configuration for PCI masters

Changes since v4:
- something went wrong with my mail client and not all of the patches went out; attempting again.

Changes since v3:
- minor change: redundant checks removed.

Changes since v2:
- removed internal review.

Changes since v1:
- Remove internal GERRIT details from patch descriptions.
- Address Rob's comments.
- Add a get_dma_ranges() function to the of_bus struct.
- Convert the existing contents of this function to of_bus_default_dma_get_ranges and add that to the default of_bus struct.
- Make of_dma_get_range call of_bus_match() and then bus->get_dma_ranges.
- No revision for [PATCH 2/3] iommu/pci: reserve iova for PCI masters, since it is under discussion with Robin.

Oza Pawandeep (3):
  of/pci/dma: fix DMA configuration for PCI masters
  iommu/pci: reserve IOVA for PCI masters
  PCI/of fix of_dma_get_range; get PCI specific dma-ranges

 drivers/iommu/dma-iommu.c | 35
 drivers/of/address.c      | 216 --
 drivers/of/of_pci.c       | 77 +
 include/linux/of_pci.h    | 7 ++
 4 files changed, 272 insertions(+), 63 deletions(-)

-- 
1.9.1
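To make the size-to-mask relationship in the cover letter concrete, here is a minimal userspace sketch. It assumes the last two cells of the example dma-ranges entry (0x80 0x00) are the high and low halves of a 64-bit size, and that the mask is derived as DMA_BIT_MASK(ilog2(dma_addr + size)), which is my understanding of what of_dma_configure does. The helper names (cells_to_u64, ilog2_u64, dma_bit_mask, mask_from_window) are hypothetical stand-ins, not kernel API.

```c
#include <assert.h>
#include <stdint.h>

/* Combine two 32-bit device-tree cells (high, low) into one 64-bit value. */
static uint64_t cells_to_u64(uint32_t hi, uint32_t lo)
{
	return ((uint64_t)hi << 32) | lo;
}

/* Userspace stand-in for ilog2(): index of the highest set bit (0 for v <= 1). */
static unsigned int ilog2_u64(uint64_t v)
{
	unsigned int n = 0;

	while (v >>= 1)
		n++;
	return n;
}

/* Userspace stand-in for DMA_BIT_MASK(n): the low n bits set. */
static uint64_t dma_bit_mask(unsigned int n)
{
	assert(n <= 64);
	return n == 64 ? ~0ULL : (1ULL << n) - 1;
}

/*
 * Mask covering an inbound window that starts at dma_addr and spans size
 * bytes. Assumption: mirrors DMA_BIT_MASK(ilog2(dma_addr + size)).
 */
static uint64_t mask_from_window(uint64_t dma_addr, uint64_t size)
{
	return dma_bit_mask(ilog2_u64(dma_addr + size));
}
```

With size cells 0x80 0x00 the window is 0x8000000000 bytes (512 GB), and a window starting at bus address 0 yields a 39-bit mask under these assumptions.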
[PATCH v4 0/3] OF/PCI address PCI inbound memory limitations
A PCI device may support 64-bit DMA addressing, and its driver therefore sets the device's dma_mask to DMA_BIT_MASK(64). However, the PCI host bridge may have limitations on inbound transaction addressing.

This is particularly problematic on ARM/ARM64 SoCs, where the IOMMU (i.e. SMMU) translates IOVA to PA for inbound transactions only after the PCI host has forwarded these transactions onto the SoC I/O bus. On such SoCs, the IOVA of inbound transactions therefore has to honor the addressing restrictions of the PCI host.

The current device framework and OF framework integration assumes that dma-ranges are defined the way memory-mapped devices define them: (child-bus-address, parent-bus-address, length).

of_dma_configure is written specifically for memory-mapped devices, and no implementation exists for PCI devices.

For example, iproc-based SoCs and other SoCs (such as rcar) have PCI-world dma-ranges:

dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;

This patchset:
- reserves the IOVA ranges for PCI masters based on the PCI-world dma-ranges;
- fixes of_dma_get_range to cater to PCI dma-ranges;
- fixes of_dma_get_range, which currently returns size 0 for PCI devices.

IOVA allocation patch:
[PATCH 2/3] iommu/pci: reserve iova for PCI masters

Fix of_dma_get_range bug and address PCI master:
[PATCH 3/3] PCI/of fix of_dma_get_range; get PCI specific

Base patch for both of the above patches:
[PATCH 1/3] of/pci/dma: fix DMA configuration for PCI masters

Changes since v3:
- Redundant check removed.
- No revision for [PATCH 2/3] iommu/pci: reserve iova for PCI masters, since it is under discussion with Robin.

Changes since v2:
- Internal review names removed.
- No revision for [PATCH 2/3] iommu/pci: reserve iova for PCI masters, since it is under discussion with Robin.

Changes since v1:
- Remove internal GERRIT details from patch descriptions.
- Address Rob's comments.
- Add a get_dma_ranges() function to the of_bus struct.
- Convert the existing contents of this function to of_bus_default_dma_get_ranges and add that to the default of_bus struct.
- Make of_dma_get_range call of_bus_match() and then bus->get_dma_ranges.
- No revision for [PATCH 2/3] iommu/pci: reserve iova for PCI masters, since it is under discussion with Robin.

Oza Pawandeep (3):
  of/pci/dma: fix DMA configuration for PCI masters
  iommu/pci: reserve iova for PCI masters
  PCI/of fix of_dma_get_range; get PCI specific dma-ranges

 drivers/iommu/dma-iommu.c | 35 +
 drivers/of/address.c      | 52
 drivers/of/of_pci.c       | 77 +++
 include/linux/of_pci.h    | 7 +
 4 files changed, 171 insertions(+)

-- 
1.9.1
[PATCH v4 3/3] PCI/of fix of_dma_get_range; get PCI specific dma-ranges
The current device framework and OF framework integration assumes that dma-ranges are defined the way memory-mapped devices define them: (child-bus-address, parent-bus-address, length).

of_dma_configure is written specifically for memory-mapped devices, but no implementation exists for PCI to take care of PCIe-based memory ranges.

For example, iproc-based SoCs and other SoCs (such as rcar) have PCI-world dma-ranges:

dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;

This patch fixes a bug in of_dma_get_range, which as it stands parses the PCI memory ranges and returns a wrong size of 0.

To get the largest possible dma_mask, this patch also returns the largest possible size based on dma-ranges. For example, with

dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;

we should get dev->coherent_dma_mask=0x7f, based on which the IOVA allocation space will honour the PCI host bridge limitations.

The implementation hooks bus-specific callbacks for getting dma-ranges.

Signed-off-by: Oza Pawandeep <oza@broadcom.com>

diff --git a/drivers/of/address.c b/drivers/of/address.c
index 02b2903..b43e347 100644
--- a/drivers/of/address.c
+++ b/drivers/of/address.c
@@ -6,6 +6,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -46,6 +47,8 @@ struct of_bus {
 				int na, int ns, int pna);
 	int		(*translate)(__be32 *addr, u64 offset, int na);
 	unsigned int	(*get_flags)(const __be32 *addr);
+	int		(*get_dma_ranges)(struct device_node *np,
+					  u64 *dma_addr, u64 *paddr, u64 *size);
 };
 
 /*
@@ -171,6 +174,144 @@ static int of_bus_pci_translate(__be32 *addr, u64 offset, int na)
 {
 	return of_bus_default_translate(addr + 1, offset, na - 1);
 }
+
+static int of_bus_pci_get_dma_ranges(struct device_node *np, u64 *dma_addr,
+				     u64 *paddr, u64 *size)
+{
+	struct device_node *node = of_node_get(np);
+	int ret = 0;
+	struct resource_entry *window;
+	LIST_HEAD(res);
+
+	if (!node)
+		return -EINVAL;
+
+	*size = 0;
+	/*
+	 * PCI dma-ranges is not mandatory property.
+	 * many devices do not need to have it, since
+	 * host bridge does not require inbound memory
+	 * configuration or rather have design limitations.
+	 * so we look for dma-ranges, if missing we
+	 * just return the caller full size, and also
+	 * no dma-ranges suggests that, host bridge allows
+	 * whatever comes in, so we set dma_addr to 0.
+	 */
+	ret = of_pci_get_dma_ranges(np, &res);
+	if (!ret) {
+		resource_list_for_each_entry(window, &res) {
+			struct resource *res_dma = window->res;
+
+			if (*size < resource_size(res_dma)) {
+				*dma_addr = res_dma->start - window->offset;
+				*paddr = res_dma->start;
+				*size = resource_size(res_dma);
+			}
+		}
+	}
+	pci_free_resource_list(&res);
+
+	/*
+	 * return the largest possible size,
+	 * since PCI master allows everything.
+	 */
+	if (*size == 0) {
+		pr_debug("empty/zero size dma-ranges found for node(%s)\n",
+			 np->full_name);
+		*size = DMA_BIT_MASK(sizeof(dma_addr_t) * 8) - 1;
+		*dma_addr = *paddr = 0;
+		ret = 0;
+	}
+
+	pr_debug("dma_addr(%llx) cpu_addr(%llx) size(%llx)\n",
+		 *dma_addr, *paddr, *size);
+
+	of_node_put(node);
+
+	return ret;
+}
+
+static int get_dma_ranges(struct device_node *np, u64 *dma_addr,
+			  u64 *paddr, u64 *size)
+{
+	struct device_node *node = of_node_get(np);
+	const __be32 *ranges = NULL;
+	int len, naddr, nsize, pna;
+	int ret = 0;
+	u64 dmaaddr;
+
+	if (!node)
+		return -EINVAL;
+
+	while (1) {
+		naddr = of_n_addr_cells(node);
+		nsize = of_n_size_cells(node);
+		node = of_get_next_parent(node);
+		if (!node)
+			break;
+
+		ranges = of_get_property(node, "dma-ranges", &len);
+
+		/* Ignore empty ranges, they imply no translation required */
+		if (ranges && len > 0)
+			break;
+
+		/*
+		 * At least empty ranges has to be defined for parent node if
+		 * DMA is supported
+		 */
+		if (!ranges)
+			break;
+	}
+
+	if (!ranges) {
+		pr_debug("no dma-ranges found for node(%s)\n", np->full_name);
+		ret = -ENODEV;
+		goto out;
+	}
+
+	len /= sizeof(u32);
+
+
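The selection policy in of_bus_pci_get_dma_ranges (take the largest inbound window, and fall back to the whole address space when no usable window exists) can be tried outside the kernel with the sketch below. The struct and function names are hypothetical, and the fallback size assumes a 64-bit dma_addr_t, so that DMA_BIT_MASK(64) - 1 becomes UINT64_MAX - 1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flat view of one inbound window parsed from dma-ranges. */
struct dma_window {
	uint64_t pci_addr;	/* bus (DMA) address of the window */
	uint64_t cpu_addr;	/* CPU physical address it maps to */
	uint64_t size;		/* window length in bytes */
};

/*
 * Return the largest window in w[0..n). If there is none, or all are
 * zero-sized, return a window at address 0 with the fallback size.
 */
static struct dma_window pick_dma_window(const struct dma_window *w, size_t n)
{
	struct dma_window best = { 0, 0, 0 };
	size_t i;

	assert(w != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (best.size < w[i].size)
			best = w[i];
	}

	if (best.size == 0) {
		/* Assumption: 64-bit dma_addr_t, as on ARM64. */
		best.pci_addr = 0;
		best.cpu_addr = 0;
		best.size = UINT64_MAX - 1;
	}

	return best;
}
```

Only one (dma_addr, paddr, size) triple is reported this way, so a caller that needs every window would still have to walk the full resource list.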
[PATCH v4 2/3] iommu/pci: reserve IOVA for PCI masters
This patch reserves the IOVA for PCI masters. ARM64-based SoCs may have scattered memory banks. For example, an iproc-based SoC has:

<0x00000000 0x80000000 0x0 0x80000000>, /* 2G @ 2G */
<0x00000008 0x80000000 0x3 0x80000000>, /* 14G @ 34G */
<0x00000090 0x00000000 0x4 0x00000000>, /* 16G @ 576G */
<0x000000a0 0x00000000 0x4 0x00000000>; /* 16G @ 640G */

However, the incoming PCI transaction addressing capability is limited by the host bridge. For example, if the maximum incoming window capability is 512 GB, then 0x00000090 and 0x000000a0 will fall beyond it.

To address this problem, the IOMMU has to avoid allocating IOVAs which are reserved, which in turn means it does not allocate an IOVA if it falls into a hole.

Signed-off-by: Oza Pawandeep <oza@broadcom.com>

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 48d36ce..08764b0 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -27,6 +27,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -171,8 +172,12 @@ static void iova_reserve_pci_windows(struct pci_dev *dev,
 		struct iova_domain *iovad)
 {
 	struct pci_host_bridge *bridge = pci_find_host_bridge(dev->bus);
+	struct device_node *np = bridge->dev.parent->of_node;
 	struct resource_entry *window;
 	unsigned long lo, hi;
+	int ret;
+	dma_addr_t tmp_dma_addr = 0, dma_addr;
+	LIST_HEAD(res);
 
 	resource_list_for_each_entry(window, &bridge->windows) {
 		if (resource_type(window->res) != IORESOURCE_MEM &&
@@ -183,6 +188,36 @@ static void iova_reserve_pci_windows(struct pci_dev *dev,
 		hi = iova_pfn(iovad, window->res->end - window->offset);
 		reserve_iova(iovad, lo, hi);
 	}
+
+	/* PCI inbound memory reservation. */
+	ret = of_pci_get_dma_ranges(np, &res);
+	if (!ret) {
+		resource_list_for_each_entry(window, &res) {
+			struct resource *res_dma = window->res;
+
+			dma_addr = res_dma->start - window->offset;
+			if (tmp_dma_addr > dma_addr) {
+				pr_warn("PCI: failed to reserve iovas; ranges should be sorted\n");
+				return;
+			}
+			if (tmp_dma_addr != dma_addr) {
+				lo = iova_pfn(iovad, tmp_dma_addr);
+				hi = iova_pfn(iovad, dma_addr - 1);
+				reserve_iova(iovad, lo, hi);
+			}
+			tmp_dma_addr = window->res->end - window->offset;
+		}
+		/*
+		 * the last dma-range should honour based on the
+		 * 32/64-bit dma addresses.
+		 */
+		if (tmp_dma_addr < DMA_BIT_MASK(sizeof(dma_addr_t) * 8)) {
+			lo = iova_pfn(iovad, tmp_dma_addr);
+			hi = iova_pfn(iovad,
+				      DMA_BIT_MASK(sizeof(dma_addr_t) * 8) - 1);
+			reserve_iova(iovad, lo, hi);
+		}
+	}
 }
 
 /**
-- 
1.9.1
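The hole computation behind this reservation can be sketched in plain C on byte addresses, without the iova_pfn() conversion. This is an illustration under stated assumptions, not the patch's code: struct range and find_iova_holes are hypothetical names, windows are assumed sorted, non-overlapping and inside [0, limit], and the sketch tracks the first uncovered address (end + 1), whereas the patch keeps the inclusive end of the previous window in tmp_dma_addr.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Inclusive address range [start, end]. */
struct range {
	uint64_t start;
	uint64_t end;
};

/*
 * Given inbound windows sorted by start address, write the gaps between
 * them (and the tail up to limit) into holes[]. Returns the number of
 * holes, 0 when there are no windows (nothing to reserve), or -1 on
 * malformed, unsorted or overlapping input or when holes[] is too small.
 */
static int find_iova_holes(const struct range *win, size_t n, uint64_t limit,
			   struct range *holes, size_t max_holes)
{
	uint64_t next = 0;	/* lowest address not yet accounted for */
	size_t count = 0;
	size_t i;

	assert(holes != NULL || max_holes == 0);

	if (n == 0)
		return 0;

	for (i = 0; i < n; i++) {
		if (win[i].end < win[i].start || win[i].start < next ||
		    win[i].end > limit)
			return -1;
		if (win[i].start > next) {
			if (count == max_holes)
				return -1;
			holes[count].start = next;
			holes[count].end = win[i].start - 1;
			count++;
		}
		if (win[i].end == limit)
			return (int)count;	/* no room above the last window */
		next = win[i].end + 1;
	}

	/* Everything above the last window up to the limit is a hole. */
	if (count == max_holes)
		return -1;
	holes[count].start = next;
	holes[count].end = limit;
	count++;

	return (int)count;
}
```

Each returned hole is what a caller would hand to reserve_iova() after converting both ends to page frame numbers.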
[PATCH v4 1/3] of/pci/dma: fix DMA configuration for PCI masters
The current device framework and OF framework integration assumes that dma-ranges are defined the way memory-mapped devices define them: (child-bus-address, parent-bus-address, length).

of_dma_configure is written specifically for memory-mapped devices, but no implementation exists for PCI to take care of PCIe-based memory ranges.

For example, iproc-based SoCs and other SoCs (such as rcar) have PCI-world dma-ranges:

dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;

This patch serves the following:

1) Exposes an interface to PCI host drivers for their inbound memory ranges.

2) Provides an interface to callers such as of_dma_get_ranges, so that the returned size yields the best possible (largest) dma_mask. PCI RC drivers do not call APIs such as dma_set_coherent_mask(), so the host bridge instead shows its addressing capabilities through dma-ranges. For example, with

dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;

we should get dev->coherent_dma_mask=0x7f.

3) Handles multiple inbound windows and dma-ranges. It is left to the caller how it wants to use them. The new function returns the resources in a standard and uniform way.

4) This way, callers such as of_dma_get_ranges do not need to change.

5) Leaves scope for adding PCI flag handling for inbound memory in the new function.

Signed-off-by: Oza Pawandeep <oza@broadcom.com>

diff --git a/drivers/of/of_pci.c b/drivers/of/of_pci.c
index 0ee42c3..ed6e69a 100644
--- a/drivers/of/of_pci.c
+++ b/drivers/of/of_pci.c
@@ -283,6 +283,83 @@ int of_pci_get_host_bridge_resources(struct device_node *dev,
 	return err;
 }
 EXPORT_SYMBOL_GPL(of_pci_get_host_bridge_resources);
+
+/**
+ * of_pci_get_dma_ranges - Parse PCI host bridge inbound resources from DT
+ * @np: device node of the host bridge having the dma-ranges property
+ * @resources: list where the range of resources will be added after DT parsing
+ *
+ * It is the caller's job to free the @resources list.
+ *
+ * This function will parse the "dma-ranges" property of a
+ * PCI host bridge device node and setup the resource mapping based
+ * on its content.
+ *
+ * It returns zero if the range parsing has been successful or a standard error
+ * value if it failed.
+ */
+
+int of_pci_get_dma_ranges(struct device_node *np, struct list_head *resources)
+{
+	struct device_node *node = of_node_get(np);
+	int rlen;
+	int ret = 0;
+	const int na = 3, ns = 2;
+	struct resource *res;
+	struct of_pci_range_parser parser;
+	struct of_pci_range range;
+
+	if (!node)
+		return -EINVAL;
+
+	parser.node = node;
+	parser.pna = of_n_addr_cells(node);
+	parser.np = parser.pna + na + ns;
+
+	parser.range = of_get_property(node, "dma-ranges", &rlen);
+
+	if (!parser.range) {
+		pr_debug("pcie device has no dma-ranges defined for node(%s)\n",
+			 np->full_name);
+		ret = -EINVAL;
+		goto out;
+	}
+
+	parser.end = parser.range + rlen / sizeof(__be32);
+
+	for_each_of_pci_range(&parser, &range) {
+		/*
+		 * If we failed translation or got a zero-sized region
+		 * then skip this range
+		 */
+		if (range.cpu_addr == OF_BAD_ADDR || range.size == 0)
+			continue;
+
+		res = kzalloc(sizeof(struct resource), GFP_KERNEL);
+		if (!res) {
+			ret = -ENOMEM;
+			goto parse_failed;
+		}
+
+		ret = of_pci_range_to_resource(&range, np, res);
+		if (ret) {
+			kfree(res);
+			continue;
+		}
+
+		pci_add_resource_offset(resources, res,
+					res->start - range.pci_addr);
+	}
+
+	return ret;
+
+parse_failed:
+	pci_free_resource_list(resources);
+out:
+	of_node_put(node);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(of_pci_get_dma_ranges);
 #endif /* CONFIG_OF_ADDRESS */
 
 #ifdef CONFIG_PCI_MSI
diff --git a/include/linux/of_pci.h b/include/linux/of_pci.h
index 0e0974e..617b90d 100644
--- a/include/linux/of_pci.h
+++ b/include/linux/of_pci.h
@@ -76,6 +76,7 @@ static inline void of_pci_check_probe_only(void) { }
 int of_pci_get_host_bridge_resources(struct device_node *dev,
 			unsigned char busno, unsigned char bus_max,
 			struct list_head *resources, resource_size_t *io_base);
+int of_pci_get_dma_ranges(struct device_node *np, struct list_head *resources);
 #else
 static inline int of_pci_get_host_bridge_resources(struct device_node *dev,
 			unsigned char busno, unsigned char bus_max,
@@ -83,6 +84,12 @@ static inline int of_pci_get_host_bridge_resources(struct device_node *dev,
 {
 	return -EINVAL;
 }
+
+stati
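The cell layout that of_pci_get_dma_ranges relies on (na = 3 PCI address cells, then pna parent address cells, then ns = 2 size cells) can be illustrated with a small userspace decoder. This is only a sketch: the names are hypothetical, the cells are taken as already converted to CPU byte order (a real device tree stores them big-endian), and it does not translate the parent address through ancestor buses as the kernel's range parser does. The 0x43000000 first cell used in the test is my assumption of the usual phys.hi encoding for 64-bit prefetchable memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One decoded PCI dma-ranges entry (hypothetical userspace view). */
struct pci_dma_range {
	uint32_t flags;		/* phys.hi cell: space code and flags */
	uint64_t pci_addr;	/* phys.mid:phys.lo, the PCI bus address */
	uint64_t cpu_addr;	/* parent bus address (pna cells) */
	uint64_t size;		/* window length (2 cells) */
};

/* Fold up to two 32-bit cells, most significant first, into 64 bits. */
static uint64_t read_cells(const uint32_t *cells, unsigned int count)
{
	uint64_t v = 0;

	assert(count <= 2);
	while (count--)
		v = (v << 32) | *cells++;
	return v;
}

/*
 * Decode one entry laid out as <phys.hi phys.mid phys.lo parent... size size>.
 * Returns 0 on success, -1 if pna is unsupported or the entry is too short.
 */
static int decode_pci_dma_range(const uint32_t *cells, size_t ncells,
				unsigned int pna, struct pci_dma_range *out)
{
	const unsigned int na = 3, ns = 2;

	if (pna < 1 || pna > 2 || ncells < na + pna + ns)
		return -1;

	out->flags = cells[0];
	out->pci_addr = read_cells(cells + 1, na - 1);
	out->cpu_addr = read_cells(cells + na, pna);
	out->size = read_cells(cells + na + pna, ns);
	return 0;
}
```

A seven-cell entry with pna = 2 decodes into one flags cell, a 64-bit PCI address, a 64-bit parent address and a 64-bit size, which is why parser.np is computed as parser.pna + na + ns.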
[PATCH v3 0/3] OF/PCI address PCI inbound memory limitations
A PCI device may support 64-bit DMA addressing, and its driver therefore sets the device's dma_mask to DMA_BIT_MASK(64). However, the PCI host bridge may have limitations on inbound transaction addressing.

This is particularly problematic on ARM/ARM64 SoCs, where the IOMMU (i.e. SMMU) translates IOVA to PA for inbound transactions only after the PCI host has forwarded these transactions onto the SoC I/O bus. On such SoCs, the IOVA of inbound transactions therefore has to honor the addressing restrictions of the PCI host.

The current device framework and OF framework integration assumes that dma-ranges are defined the way memory-mapped devices define them: (child-bus-address, parent-bus-address, length).

of_dma_configure is written specifically for memory-mapped devices, and no implementation exists for PCI devices.

For example, iproc-based SoCs and other SoCs (such as rcar) have PCI-world dma-ranges:

dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;

This patchset:
- reserves the IOVA ranges for PCI masters based on the PCI-world dma-ranges;
- fixes of_dma_get_range to cater to PCI dma-ranges;
- fixes of_dma_get_range, which currently returns size 0 for PCI devices.

IOVA allocation patch:
[PATCH 2/3] iommu/pci: reserve iova for PCI masters

Fix of_dma_get_range bug and address PCI master:
[PATCH 3/3] PCI/of fix of_dma_get_range; get PCI specific

Base patch for both of the above patches:
[PATCH 1/3] of/pci/dma: fix DMA configuration for PCI masters

Changes since v2:
- Remove internal details such as Reviewed-by.

Changes since v1:
- Remove internal GERRIT details from patch descriptions.
- Address Rob's comments.
- Add a get_dma_ranges() function to the of_bus struct.
- Convert the existing contents of this function to of_bus_default_dma_get_ranges and add that to the default of_bus struct.
- Make of_dma_get_range call of_bus_match() and then bus->get_dma_ranges.
- No revision for [PATCH 2/3] iommu/pci: reserve iova for PCI masters, since it is under discussion with Robin.

Oza Pawandeep (3):
  of/pci/dma: fix DMA configuration for PCI masters
  iommu/pci: reserve iova for PCI masters
  PCI/of fix of_dma_get_range; get PCI specific dma-ranges

 drivers/iommu/dma-iommu.c | 35 +
 drivers/of/address.c      | 52
 drivers/of/of_pci.c       | 77 +++
 include/linux/of_pci.h    | 7 +
 4 files changed, 171 insertions(+)

-- 
1.9.1
[PATCH v3 3/3] PCI/of fix of_dma_get_range; get PCI specific dma-ranges
The current integration of the device framework and the OF framework assumes that dma-ranges are defined by memory-mapped devices, as (child-bus-address, parent-bus-address, length) tuples. of_dma_configure() is written specifically for memory-mapped devices, and no implementation exists to handle the PCIe-based memory ranges of a PCI host bridge.

For example, iproc-based SoCs and other SoCs (such as rcar) have PCI-world dma-ranges:

dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;

This patch fixes a bug in of_dma_get_range(), which as it stands parses the PCI memory ranges and returns a wrong size of 0.

To get the largest possible dma_mask, this patch also returns the largest possible size based on dma-ranges. For example, with

dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;

we should get dev->coherent_dma_mask=0x7f, based on which the IOVA allocation space will honour the PCI host bridge limitations.

The implementation hooks bus-specific callbacks for getting dma-ranges.

Signed-off-by: Oza Pawandeep <oza@broadcom.com>

diff --git a/drivers/of/address.c b/drivers/of/address.c
index 02b2903..cc0fc28 100644
--- a/drivers/of/address.c
+++ b/drivers/of/address.c
@@ -6,6 +6,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -46,6 +47,8 @@ struct of_bus {
 				int na, int ns, int pna);
 	int		(*translate)(__be32 *addr, u64 offset, int na);
 	unsigned int	(*get_flags)(const __be32 *addr);
+	int		(*get_dma_ranges)(struct device_node *np,
+					  u64 *dma_addr, u64 *paddr, u64 *size);
 };
 
 /*
@@ -171,6 +174,146 @@ static int of_bus_pci_translate(__be32 *addr, u64 offset, int na)
 {
 	return of_bus_default_translate(addr + 1, offset, na - 1);
 }
+
+static int of_bus_pci_get_dma_ranges(struct device_node *np, u64 *dma_addr,
+				     u64 *paddr, u64 *size)
+{
+	struct device_node *node = of_node_get(np);
+	int ret = 0;
+	struct resource_entry *window;
+	LIST_HEAD(res);
+
+	if (!node)
+		return -EINVAL;
+
+	if (of_bus_pci_match(np)) {
+		*size = 0;
+		/*
+		 * PCI dma-ranges is not mandatory property.
+		 * many devices do no need to have it, since
+		 * host bridge does not require inbound memory
+		 * configuration or rather have design limitations.
+		 * so we look for dma-ranges, if missing we
+		 * just return the caller full size, and also
+		 * no dma-ranges suggests that, host bridge allows
+		 * whatever comes in, so we set dma_addr to 0.
+		 */
+		ret = of_pci_get_dma_ranges(np, &res);
+		if (!ret) {
+			resource_list_for_each_entry(window, &res) {
+				struct resource *res_dma = window->res;
+
+				if (*size < resource_size(res_dma)) {
+					*dma_addr = res_dma->start - window->offset;
+					*paddr = res_dma->start;
+					*size = resource_size(res_dma);
+				}
+			}
+		}
+		pci_free_resource_list(&res);
+
+		/*
+		 * return the largest possible size,
+		 * since PCI master allows everything.
+		 */
+		if (*size == 0) {
+			pr_debug("empty/zero size dma-ranges found for node(%s)\n",
+				 np->full_name);
+			*size = DMA_BIT_MASK(sizeof(dma_addr_t) * 8) - 1;
+			*dma_addr = *paddr = 0;
+			ret = 0;
+		}
+
+		pr_debug("dma_addr(%llx) cpu_addr(%llx) size(%llx)\n",
+			 *dma_addr, *paddr, *size);
+	}
+
+	of_node_put(node);
+
+	return ret;
+}
+
+static int get_dma_ranges(struct device_node *np, u64 *dma_addr,
+			  u64 *paddr, u64 *size)
+{
+	struct device_node *node = of_node_get(np);
+	const __be32 *ranges = NULL;
+	int len, naddr, nsize, pna;
+	int ret = 0;
+	u64 dmaaddr;
+
+	if (!node)
+		return -EINVAL;
+
+	while (1) {
+		naddr = of_n_addr_cells(node);
+		nsize = of_n_size_cells(node);
+		node = of_get_next_parent(node);
+		if (!node)
+			break;
+
+		ranges = of_get_property(node, "dma-ranges", &len);
+
+		/* Ignore empty ranges, they imply no translation required */
+		if (ranges && len > 0)
+			break;
+
+		/*
+		 * At least empty ranges has to be defined for parent node if
+
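The window selection described in this patch (take the largest inbound window, and fall back to "everything allowed" when no usable window exists) can be sketched outside the kernel as follows. This is only an illustration under stated assumptions: `struct dma_window` and `pick_largest_window()` are hypothetical names that do not exist in the kernel, and `UINT64_MAX` stands in for the patch's own "full size" value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one parsed inbound window (not a kernel type). */
struct dma_window {
	uint64_t cpu_start;	/* CPU physical start address */
	uint64_t offset;	/* cpu_start minus the PCI bus address */
	uint64_t size;		/* length of the window in bytes */
};

/*
 * Report the largest window in w[0..n-1]. If no window has a non-zero
 * size, report address 0 with an unlimited size, mirroring the fallback
 * the patch applies when dma-ranges is missing or empty.
 */
static void pick_largest_window(const struct dma_window *w, size_t n,
				uint64_t *dma_addr, uint64_t *paddr,
				uint64_t *size)
{
	size_t i;

	assert(dma_addr && paddr && size);

	*size = 0;
	for (i = 0; i < n; i++) {
		if (*size < w[i].size) {
			*dma_addr = w[i].cpu_start - w[i].offset;
			*paddr = w[i].cpu_start;
			*size = w[i].size;
		}
	}

	if (*size == 0) {
		*dma_addr = 0;
		*paddr = 0;
		*size = UINT64_MAX;	/* assumption: "no limit" sentinel */
	}
}
```

Only one window survives this selection, so a caller that needs every inbound window would have to walk the full resource list itself.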
[PATCH v3 1/3] of/pci/dma: fix DMA configuration for PCI masters
The current integration of the device framework and the OF framework assumes that dma-ranges are defined by memory-mapped devices, as (child-bus-address, parent-bus-address, length) tuples. of_dma_configure() is written specifically for memory-mapped devices, and no implementation exists to handle the PCIe-based memory ranges of a PCI host bridge.

For example, iproc-based SoCs and other SoCs (such as rcar) have PCI-world dma-ranges:

dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;

This patch serves the following purposes:

1) It exposes an interface to PCI host drivers for their inbound memory ranges.

2) It provides an interface to callers such as of_dma_get_range(), so that the returned size yields the best possible (largest) dma_mask. PCI RC drivers do not call APIs such as dma_set_coherent_mask(); instead, the host bridge shows its addressing capabilities through dma-ranges. For example, with
dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;
we should get dev->coherent_dma_mask=0x7f.

3) It handles multiple inbound windows and dma-ranges. How to use them is left to the caller. The new function returns the resources in a standard and uniform way.

4) This way, callers such as of_dma_get_range() do not need to change.

5) It leaves scope for adding PCI flag handling for inbound memory in the new function.

Signed-off-by: Oza Pawandeep <oza@broadcom.com>

diff --git a/drivers/of/of_pci.c b/drivers/of/of_pci.c
index 0ee42c3..ed6e69a 100644
--- a/drivers/of/of_pci.c
+++ b/drivers/of/of_pci.c
@@ -283,6 +283,83 @@ int of_pci_get_host_bridge_resources(struct device_node *dev,
 	return err;
 }
 EXPORT_SYMBOL_GPL(of_pci_get_host_bridge_resources);
+
+/**
+ * of_pci_get_dma_ranges - Parse PCI host bridge inbound resources from DT
+ * @np: device node of the host bridge having the dma-ranges property
+ * @resources: list where the range of resources will be added after DT parsing
+ *
+ * It is the caller's job to free the @resources list.
+ *
+ * This function will parse the "dma-ranges" property of a
+ * PCI host bridge device node and setup the resource mapping based
+ * on its content.
+ *
+ * It returns zero if the range parsing has been successful or a standard error
+ * value if it failed.
+ */
+
+int of_pci_get_dma_ranges(struct device_node *np, struct list_head *resources)
+{
+	struct device_node *node = of_node_get(np);
+	int rlen;
+	int ret = 0;
+	const int na = 3, ns = 2;
+	struct resource *res;
+	struct of_pci_range_parser parser;
+	struct of_pci_range range;
+
+	if (!node)
+		return -EINVAL;
+
+	parser.node = node;
+	parser.pna = of_n_addr_cells(node);
+	parser.np = parser.pna + na + ns;
+
+	parser.range = of_get_property(node, "dma-ranges", &rlen);
+
+	if (!parser.range) {
+		pr_debug("pcie device has no dma-ranges defined for node(%s)\n",
+			 np->full_name);
+		ret = -EINVAL;
+		goto out;
+	}
+
+	parser.end = parser.range + rlen / sizeof(__be32);
+
+	for_each_of_pci_range(&parser, &range) {
+		/*
+		 * If we failed translation or got a zero-sized region
+		 * then skip this range
+		 */
+		if (range.cpu_addr == OF_BAD_ADDR || range.size == 0)
+			continue;
+
+		res = kzalloc(sizeof(struct resource), GFP_KERNEL);
+		if (!res) {
+			ret = -ENOMEM;
+			goto parse_failed;
+		}
+
+		ret = of_pci_range_to_resource(&range, np, res);
+		if (ret) {
+			kfree(res);
+			continue;
+		}
+
+		pci_add_resource_offset(resources, res,
+					res->start - range.pci_addr);
+	}
+
+	return ret;
+
+parse_failed:
+	pci_free_resource_list(resources);
+out:
+	of_node_put(node);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(of_pci_get_dma_ranges);
 #endif /* CONFIG_OF_ADDRESS */
 
 #ifdef CONFIG_PCI_MSI
diff --git a/include/linux/of_pci.h b/include/linux/of_pci.h
index 0e0974e..617b90d 100644
--- a/include/linux/of_pci.h
+++ b/include/linux/of_pci.h
@@ -76,6 +76,7 @@ static inline void of_pci_check_probe_only(void) { }
 int of_pci_get_host_bridge_resources(struct device_node *dev,
 			unsigned char busno, unsigned char bus_max,
 			struct list_head *resources, resource_size_t *io_base);
+int of_pci_get_dma_ranges(struct device_node *np, struct list_head *resources);
 #else
 static inline int of_pci_get_host_bridge_resources(struct device_node *dev,
 			unsigned char busno, unsigned char bus_max,
@@ -83,6 +84,12 @@ static inline int of_pci_get_host_bridge_resources(struct device_node *dev,
 {
 	return -EINVAL;
 }
+
+stati
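To make the cell layout behind `na = 3, ns = 2` concrete, here is a small userspace sketch that decodes one PCI dma-ranges entry from raw big-endian cells. It assumes the usual PCI layout of three child address cells (one flags cell followed by a 64-bit bus address), `pna` parent address cells, and two size cells. The names `struct pci_dma_range` and `decode_pci_dma_range()` are hypothetical, and the parent address is taken as-is rather than translated the way the kernel parser does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoded form of one dma-ranges entry (not a kernel type). */
struct pci_dma_range {
	uint32_t flags;		/* first PCI address cell */
	uint64_t pci_addr;	/* remaining two PCI address cells */
	uint64_t cpu_addr;	/* parent address, untranslated in this sketch */
	uint64_t size;		/* two size cells */
};

/* Read one big-endian 32-bit cell. */
static uint32_t read_cell(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Combine 'count' consecutive cells (1 or 2 here) into one number. */
static uint64_t read_number(const uint8_t *p, int count)
{
	uint64_t v = 0;

	while (count--) {
		v = (v << 32) | read_cell(p);
		p += 4;
	}
	return v;
}

/*
 * Decode one entry of na + pna + ns cells from buf.
 * Returns the number of bytes consumed, or 0 if buf is too short.
 */
static size_t decode_pci_dma_range(const uint8_t *buf, size_t len, int pna,
				   struct pci_dma_range *out)
{
	const int na = 3, ns = 2;
	size_t need;

	assert(buf && out);
	assert(pna >= 1 && pna <= 2);

	need = (size_t)(na + pna + ns) * 4;
	if (len < need)
		return 0;

	out->flags = read_cell(buf);
	out->pci_addr = read_number(buf + 4, na - 1);
	out->cpu_addr = read_number(buf + 4 * na, pna);
	out->size = read_number(buf + 4 * (na + pna), ns);
	return need;
}
```

Fed the seven cells quoted in the commit message exactly as printed (with `pna = 2`), this yields a window at PCI address 0 and CPU address 0 whose size is 0x80 in the upper size cell, that is 0x8000000000 bytes.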
[PATCH v2 0/3] OF/PCI address PCI inbound memory limitations
It is possible that a PCI device supports 64-bit DMA addressing, and thus its driver sets the device's dma_mask to DMA_BIT_MASK(64); however, the PCI host bridge may have limitations on inbound transaction addressing.

This is particularly problematic on ARM/ARM64 SoCs where the IOMMU (i.e. SMMU) translates IOVA to PA for inbound transactions only after the PCI host has forwarded these transactions on the SoC IO bus. This means that on such ARM/ARM64 SoCs the IOVA of inbound transactions has to honor the addressing restrictions of the PCI host.

The current integration of the device framework and the OF framework assumes that dma-ranges are defined by memory-mapped devices, as (child-bus-address, parent-bus-address, length) tuples. of_dma_configure() is written specifically for memory-mapped devices, and no implementation exists for PCI devices.

For example, iproc-based SoCs and other SoCs (such as rcar) have PCI-world dma-ranges:

dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;

This patchset:
- reserves the IOVA ranges for PCI masters based on the PCI-world dma-ranges.
- fixes of_dma_get_range to cater to PCI dma-ranges.
- fixes of_dma_get_range, which currently returns size 0 for PCI devices.

IOVA allocation patch:
[PATCH 2/3] iommu/pci: reserve iova for PCI masters

Fix of_dma_get_range bug and address PCI master:
[PATCH 3/3] PCI/of fix of_dma_get_range; get PCI specific

Base patch for both of the above patches:
[PATCH 1/3] of/pci/dma: fix DMA configuration for PCI masters

Changes since v2:
- Remove internal details such as reviewed by

Changes since v1:
- Remove internal GERRIT details from patch descriptions
- Address Rob's comments.
- Add a get_dma_ranges() function to the of_bus struct.
- Convert the existing contents of this function to of_bus_default_dma_get_ranges and add that to the default of_bus struct.
- Make of_dma_get_range call of_bus_match() and then bus->get_dma_ranges.
- No revision for [PATCH 2/3] iommu/pci: reserve iova for PCI masters, since it is under discussion with Robin

Oza Pawandeep (3):
  of/pci/dma: fix DMA configuration for PCI masters
  iommu/pci: reserve iova for PCI masters
  PCI/of fix of_dma_get_range; get PCI specific dma-ranges

 drivers/iommu/dma-iommu.c | 35 +
 drivers/of/address.c      | 52
 drivers/of/of_pci.c       | 77 +++
 include/linux/of_pci.h    |  7 +
 4 files changed, 171 insertions(+)

--
1.9.1
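The relation between an inbound window and the DMA mask that the IOVA allocator must honor can be illustrated with a short calculation. The sketch below is a userspace approximation, not the kernel's own computation, and `dma_mask_for_window()` is a hypothetical helper: it returns the smallest all-ones mask that still covers the last byte of a window.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Smallest mask of the form 2^n - 1 that covers the last address of the
 * window [dma_addr, dma_addr + size - 1]. The window must be non-empty
 * and must not wrap around the 64-bit address space.
 */
static uint64_t dma_mask_for_window(uint64_t dma_addr, uint64_t size)
{
	uint64_t last;
	uint64_t mask = 0;

	assert(size != 0);
	assert(size - 1 <= UINT64_MAX - dma_addr);

	last = dma_addr + size - 1;
	while (mask < last)
		mask = (mask << 1) | 1;

	return mask;
}
```

Assuming the example dma-ranges entry describes a window at address 0 whose size has 0x80 in its upper size cell (0x8000000000 bytes), the helper returns 0x7fffffffff, a 39-bit mask. IOVAs handed to a PCI master behind such a bridge would need to stay within that mask even if the endpoint itself advertises 64-bit DMA.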
[PATCH v2 1/3] of/pci/dma: fix DMA configuration for PCI masters
The current integration of the device framework and the OF framework assumes that dma-ranges are defined by memory-mapped devices, as (child-bus-address, parent-bus-address, length) tuples. of_dma_configure() is written specifically for memory-mapped devices, and no implementation exists to handle the PCIe-based memory ranges of a PCI host bridge.

For example, iproc-based SoCs and other SoCs (such as rcar) have PCI-world dma-ranges:

dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;

This patch serves the following purposes:

1) It exposes an interface to PCI host drivers for their inbound memory ranges.

2) It provides an interface to callers such as of_dma_get_range(), so that the returned size yields the best possible (largest) dma_mask. PCI RC drivers do not call APIs such as dma_set_coherent_mask(); instead, the host bridge shows its addressing capabilities through dma-ranges. For example, with
dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;
we should get dev->coherent_dma_mask=0x7f.

3) It handles multiple inbound windows and dma-ranges. How to use them is left to the caller. The new function returns the resources in a standard and uniform way.

4) This way, callers such as of_dma_get_range() do not need to change.

5) It leaves scope for adding PCI flag handling for inbound memory in the new function.

Signed-off-by: Oza Pawandeep <oza@broadcom.com>

diff --git a/drivers/of/of_pci.c b/drivers/of/of_pci.c
index 0ee42c3..ed6e69a 100644
--- a/drivers/of/of_pci.c
+++ b/drivers/of/of_pci.c
@@ -283,6 +283,83 @@ int of_pci_get_host_bridge_resources(struct device_node *dev,
 	return err;
 }
 EXPORT_SYMBOL_GPL(of_pci_get_host_bridge_resources);
+
+/**
+ * of_pci_get_dma_ranges - Parse PCI host bridge inbound resources from DT
+ * @np: device node of the host bridge having the dma-ranges property
+ * @resources: list where the range of resources will be added after DT parsing
+ *
+ * It is the caller's job to free the @resources list.
+ *
+ * This function will parse the "dma-ranges" property of a
+ * PCI host bridge device node and setup the resource mapping based
+ * on its content.
+ *
+ * It returns zero if the range parsing has been successful or a standard error
+ * value if it failed.
+ */
+
+int of_pci_get_dma_ranges(struct device_node *np, struct list_head *resources)
+{
+	struct device_node *node = of_node_get(np);
+	int rlen;
+	int ret = 0;
+	const int na = 3, ns = 2;
+	struct resource *res;
+	struct of_pci_range_parser parser;
+	struct of_pci_range range;
+
+	if (!node)
+		return -EINVAL;
+
+	parser.node = node;
+	parser.pna = of_n_addr_cells(node);
+	parser.np = parser.pna + na + ns;
+
+	parser.range = of_get_property(node, "dma-ranges", &rlen);
+
+	if (!parser.range) {
+		pr_debug("pcie device has no dma-ranges defined for node(%s)\n",
+			 np->full_name);
+		ret = -EINVAL;
+		goto out;
+	}
+
+	parser.end = parser.range + rlen / sizeof(__be32);
+
+	for_each_of_pci_range(&parser, &range) {
+		/*
+		 * If we failed translation or got a zero-sized region
+		 * then skip this range
+		 */
+		if (range.cpu_addr == OF_BAD_ADDR || range.size == 0)
+			continue;
+
+		res = kzalloc(sizeof(struct resource), GFP_KERNEL);
+		if (!res) {
+			ret = -ENOMEM;
+			goto parse_failed;
+		}
+
+		ret = of_pci_range_to_resource(&range, np, res);
+		if (ret) {
+			kfree(res);
+			continue;
+		}
+
+		pci_add_resource_offset(resources, res,
+					res->start - range.pci_addr);
+	}
+
+	return ret;
+
+parse_failed:
+	pci_free_resource_list(resources);
+out:
+	of_node_put(node);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(of_pci_get_dma_ranges);
 #endif /* CONFIG_OF_ADDRESS */
 
 #ifdef CONFIG_PCI_MSI
diff --git a/include/linux/of_pci.h b/include/linux/of_pci.h
index 0e0974e..617b90d 100644
--- a/include/linux/of_pci.h
+++ b/include/linux/of_pci.h
@@ -76,6 +76,7 @@ static inline void of_pci_check_probe_only(void) { }
 int of_pci_get_host_bridge_resources(struct device_node *dev,
 			unsigned char busno, unsigned char bus_max,
 			struct list_head *resources, resource_size_t *io_base);
+int of_pci_get_dma_ranges(struct device_node *np, struct list_head *resources);
 #else
 static inline int of_pci_get_host_bridge_resources(struct device_node *dev,
 			unsigned char busno, unsigned char bus_max,
@@ -83,6 +84,12 @@ static inline int of_pci_get_host_bridge_resources(struct device_node *dev,
 {
 	return -EINVAL;
 }
+
+stati
[PATCH v2 3/3] PCI/of fix of_dma_get_range; get PCI specific dma-ranges
The current integration of the device framework and the OF framework assumes that dma-ranges are defined by memory-mapped devices, as (child-bus-address, parent-bus-address, length) tuples. of_dma_configure() is written specifically for memory-mapped devices, and no implementation exists to handle the PCIe-based memory ranges of a PCI host bridge.

For example, iproc-based SoCs and other SoCs (such as rcar) have PCI-world dma-ranges:

dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;

This patch fixes a bug in of_dma_get_range(), which as it stands parses the PCI memory ranges and returns a wrong size of 0.

To get the largest possible dma_mask, this patch also returns the largest possible size based on dma-ranges. For example, with

dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;

we should get dev->coherent_dma_mask=0x7f, based on which the IOVA allocation space will honour the PCI host bridge limitations.

The implementation hooks bus-specific callbacks for getting dma-ranges.

Signed-off-by: Oza Pawandeep <oza@broadcom.com>

diff --git a/drivers/of/address.c b/drivers/of/address.c
index 02b2903..cc0fc28 100644
--- a/drivers/of/address.c
+++ b/drivers/of/address.c
@@ -6,6 +6,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -46,6 +47,8 @@ struct of_bus {
 				int na, int ns, int pna);
 	int		(*translate)(__be32 *addr, u64 offset, int na);
 	unsigned int	(*get_flags)(const __be32 *addr);
+	int		(*get_dma_ranges)(struct device_node *np,
+					  u64 *dma_addr, u64 *paddr, u64 *size);
 };
 
 /*
@@ -171,6 +174,146 @@ static int of_bus_pci_translate(__be32 *addr, u64 offset, int na)
 {
 	return of_bus_default_translate(addr + 1, offset, na - 1);
 }
+
+static int of_bus_pci_get_dma_ranges(struct device_node *np, u64 *dma_addr,
+				     u64 *paddr, u64 *size)
+{
+	struct device_node *node = of_node_get(np);
+	int ret = 0;
+	struct resource_entry *window;
+	LIST_HEAD(res);
+
+	if (!node)
+		return -EINVAL;
+
+	if (of_bus_pci_match(np)) {
+		*size = 0;
+		/*
+		 * PCI dma-ranges is not mandatory property.
+		 * many devices do no need to have it, since
+		 * host bridge does not require inbound memory
+		 * configuration or rather have design limitations.
+		 * so we look for dma-ranges, if missing we
+		 * just return the caller full size, and also
+		 * no dma-ranges suggests that, host bridge allows
+		 * whatever comes in, so we set dma_addr to 0.
+		 */
+		ret = of_pci_get_dma_ranges(np, &res);
+		if (!ret) {
+			resource_list_for_each_entry(window, &res) {
+				struct resource *res_dma = window->res;
+
+				if (*size < resource_size(res_dma)) {
+					*dma_addr = res_dma->start - window->offset;
+					*paddr = res_dma->start;
+					*size = resource_size(res_dma);
+				}
+			}
+		}
+		pci_free_resource_list(&res);
+
+		/*
+		 * return the largest possible size,
+		 * since PCI master allows everything.
+		 */
+		if (*size == 0) {
+			pr_debug("empty/zero size dma-ranges found for node(%s)\n",
+				 np->full_name);
+			*size = DMA_BIT_MASK(sizeof(dma_addr_t) * 8) - 1;
+			*dma_addr = *paddr = 0;
+			ret = 0;
+		}
+
+		pr_debug("dma_addr(%llx) cpu_addr(%llx) size(%llx)\n",
+			 *dma_addr, *paddr, *size);
+	}
+
+	of_node_put(node);
+
+	return ret;
+}
+
+static int get_dma_ranges(struct device_node *np, u64 *dma_addr,
+			  u64 *paddr, u64 *size)
+{
+	struct device_node *node = of_node_get(np);
+	const __be32 *ranges = NULL;
+	int len, naddr, nsize, pna;
+	int ret = 0;
+	u64 dmaaddr;
+
+	if (!node)
+		return -EINVAL;
+
+	while (1) {
+		naddr = of_n_addr_cells(node);
+		nsize = of_n_size_cells(node);
+		node = of_get_next_parent(node);
+		if (!node)
+			break;
+
+		ranges = of_get_property(node, "dma-ranges", &len);
+
+		/* Ignore empty ranges, they imply no translation required */
+		if (ranges && len > 0)
+			break;
+
+		/*
+		 * At least empty ranges has to be defined for parent node if
+
[PATCH v2 3/3] PCI/of fix of_dma_get_range; get PCI specific dma-ranges
current device framework and OF framework integration assumes dma-ranges in a way where memory-mapped devices define their dma-ranges. (child-bus-address, parent-bus-address, length). of_dma_configure is specifically written to take care of memory mapped devices. but no implementation exists for pci to take care of pcie based memory ranges. for e.g. iproc based SOCs and other SOCs(suc as rcar) have PCI world dma-ranges. dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>; this patch fixes the bug in of_dma_get_range, which with as is, parses the PCI memory ranges and return wrong size as 0. in order to get largest possible dma_mask. this patch also retuns the largest possible size based on dma-ranges, for e.g. dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>; we should get dev->coherent_dma_mask=0x7f. based on which IOVA allocation space will honour PCI host bridge limitations. the implementation hooks bus specific callbacks for getting dma-ranges. Signed-off-by: Oza Pawandeep diff --git a/drivers/of/address.c b/drivers/of/address.c index 02b2903..cc0fc28 100644 --- a/drivers/of/address.c +++ b/drivers/of/address.c @@ -6,6 +6,7 @@ #include #include #include +#include #include #include #include @@ -46,6 +47,8 @@ struct of_bus { int na, int ns, int pna); int (*translate)(__be32 *addr, u64 offset, int na); unsigned int(*get_flags)(const __be32 *addr); + int (*get_dma_ranges)(struct device_node *np, + u64 *dma_addr, u64 *paddr, u64 *size); }; /* @@ -171,6 +174,146 @@ static int of_bus_pci_translate(__be32 *addr, u64 offset, int na) { return of_bus_default_translate(addr + 1, offset, na - 1); } + +static int of_bus_pci_get_dma_ranges(struct device_node *np, u64 *dma_addr, +u64 *paddr, u64 *size) +{ + struct device_node *node = of_node_get(np); + int ret = 0; + struct resource_entry *window; + LIST_HEAD(res); + + if (!node) + return -EINVAL; + + if (of_bus_pci_match(np)) { + *size = 0; + /* +* PCI dma-ranges is not mandatory property. 
+* many devices do no need to have it, since +* host bridge does not require inbound memory +* configuration or rather have design limitations. +* so we look for dma-ranges, if missing we +* just return the caller full size, and also +* no dma-ranges suggests that, host bridge allows +* whatever comes in, so we set dma_addr to 0. +*/ + ret = of_pci_get_dma_ranges(np, ); + if (!ret) { + resource_list_for_each_entry(window, ) { + struct resource *res_dma = window->res; + + if (*size < resource_size(res_dma)) { + *dma_addr = res_dma->start - window->offset; + *paddr = res_dma->start; + *size = resource_size(res_dma); + } + } + } + pci_free_resource_list(); + + /* +* return the largest possible size, +* since PCI master allows everything. +*/ + if (*size == 0) { + pr_debug("empty/zero size dma-ranges found for node(%s)\n", + np->full_name); + *size = DMA_BIT_MASK(sizeof(dma_addr_t) * 8) - 1; + *dma_addr = *paddr = 0; + ret = 0; + } + + pr_debug("dma_addr(%llx) cpu_addr(%llx) size(%llx)\n", +*dma_addr, *paddr, *size); + } + + of_node_put(node); + + return ret; +} + +static int get_dma_ranges(struct device_node *np, u64 *dma_addr, + u64 *paddr, u64 *size) +{ + struct device_node *node = of_node_get(np); + const __be32 *ranges = NULL; + int len, naddr, nsize, pna; + int ret = 0; + u64 dmaaddr; + + if (!node) + return -EINVAL; + + while (1) { + naddr = of_n_addr_cells(node); + nsize = of_n_size_cells(node); + node = of_get_next_parent(node); + if (!node) + break; + + ranges = of_get_property(node, "dma-ranges", ); + + /* Ignore empty ranges, they imply no translation required */ + if (ranges && len > 0) + break; + + /* +* At least empty ranges has to be defined for parent node if +* DMA is supported +
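The commit message above ties the size of the largest inbound window to the resulting coherent DMA mask. A minimal userspace sketch of that arithmetic follows, assuming a single window starting at bus address 0 with the 512 GB size used as an example elsewhere in this series. The helper name dma_mask_from_window is hypothetical and is not part of the patch; the kernel's own mask computation may differ.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not taken from the patch: return the smallest
 * all-ones mask that covers the last bus address of an inbound window.
 */
static uint64_t dma_mask_from_window(uint64_t dma_addr, uint64_t size)
{
	uint64_t mask;

	assert(size != 0);		/* an empty window has no last address */
	mask = dma_addr + size - 1;	/* highest bus address in the window */

	/* Smear the top set bit downwards so every lower bit becomes 1. */
	mask |= mask >> 1;
	mask |= mask >> 2;
	mask |= mask >> 4;
	mask |= mask >> 8;
	mask |= mask >> 16;
	mask |= mask >> 32;

	return mask;
}
```

With a 512 GB window at address 0 (size 0x8000000000), the helper yields a 39-bit mask, 0x7fffffffff. A window whose last address is not of the form 2^n - 1 is rounded up to the next such mask, which is a simplification chosen for this sketch.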
[PATCH v2 0/3] OF/PCI address PCI inbound memory limitations
It is possible that a PCI device supports 64-bit DMA addressing, and thus its driver sets the device's dma_mask to DMA_BIT_MASK(64); however, the PCI host bridge may have limitations on inbound transaction addressing. This is particularly problematic on ARM/ARM64 SoCs where the IOMMU (i.e. SMMU) translates IOVA to PA for inbound transactions only after the PCI host has forwarded these transactions on the SoC IO bus. This means that on such ARM/ARM64 SoCs the IOVA of inbound transactions has to honor the addressing restrictions of the PCI host.

The current device framework and OF framework integration assumes dma-ranges in the form that memory-mapped devices use to define them: (child-bus-address, parent-bus-address, length). of_dma_configure is written specifically for memory-mapped devices, and no implementation exists for PCI devices. For example, iproc-based SoCs and other SoCs (such as rcar) have PCI-world dma-ranges:

dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;

This patchset reserves the IOVA ranges for PCI masters based on the PCI-world dma-ranges, and fixes of_dma_get_range, which currently returns size 0 for PCI devices, so that it caters to PCI dma-ranges.

IOVA allocation patch:
[PATCH 2/3] iommu/pci: reserve iova for PCI masters

Fix of_dma_get_range bug and address PCI master:
[PATCH 3/3] PCI/of fix of_dma_get_range; get PCI specific

Base patch for both of the above patches:
[PATCH 1/3] of/pci/dma: fix DMA configuration for PCI masters

Changes since v1:
- Remove internal GERRIT details from patch descriptions.
- Address Rob's comments.
- Add a get_dma_ranges() function to of_bus struct.
- Convert existing contents of this function to of_bus_default_dma_get_ranges and add that to the default of_bus struct.
- Make of_dma_get_range call of_bus_match() and then bus->get_dma_ranges.
- No revision for [PATCH 2/3] iommu/pci: reserve iova for PCI masters, since it is under discussion with Robin.

Oza Pawandeep (3):
  of/pci/dma: fix DMA configuration for PCI masters
  iommu/pci: reserve iova for PCI masters
  PCI/of fix of_dma_get_range; get PCI specific dma-ranges

 drivers/iommu/dma-iommu.c | 35 +
 drivers/of/address.c | 52
 drivers/of/of_pci.c | 77 +++
 include/linux/of_pci.h| 7 +
 4 files changed, 171 insertions(+)

--
1.9.1
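The "Changes since v1" list describes a dispatch pattern: match the node against a table of bus types, then call that bus's get_dma_ranges callback. The following is a userspace sketch of that pattern only. The struct node and struct bus_ops types, the function names, and the window sizes returned by the callbacks are all hypothetical stand-ins, not the kernel's struct device_node, struct of_bus, or real values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct device_node. */
struct node {
	const char *device_type;
};

/* Hypothetical stand-in for struct of_bus with the new callback. */
struct bus_ops {
	const char *name;
	int (*match)(const struct node *np);
	int (*get_dma_ranges)(const struct node *np, uint64_t *dma_addr,
			      uint64_t *paddr, uint64_t *size);
};

static int pci_match(const struct node *np)
{
	return np->device_type && strcmp(np->device_type, "pci") == 0;
}

static int default_match(const struct node *np)
{
	(void)np;
	return 1;	/* the default bus accepts every node */
}

/* Assumed illustrative values: a 512 GB inbound window at address 0. */
static int pci_get_dma_ranges(const struct node *np, uint64_t *dma_addr,
			      uint64_t *paddr, uint64_t *size)
{
	(void)np;
	*dma_addr = 0;
	*paddr = 0;
	*size = 0x8000000000ULL;
	return 0;
}

/* Assumed illustrative values: a 4 GB range for memory-mapped devices. */
static int default_get_dma_ranges(const struct node *np, uint64_t *dma_addr,
				  uint64_t *paddr, uint64_t *size)
{
	(void)np;
	*dma_addr = 0;
	*paddr = 0;
	*size = 0x100000000ULL;
	return 0;
}

/* More specific buses come first; the catch-all default comes last. */
static const struct bus_ops buses[] = {
	{ "pci", pci_match, pci_get_dma_ranges },
	{ "default", default_match, default_get_dma_ranges },
};

/* Find the first matching bus and delegate to its callback. */
static int dma_get_range(const struct node *np, uint64_t *dma_addr,
			 uint64_t *paddr, uint64_t *size)
{
	size_t i;

	assert(dma_addr && paddr && size);
	if (!np)
		return -1;

	for (i = 0; i < sizeof(buses) / sizeof(buses[0]); i++)
		if (buses[i].match(np))
			return buses[i].get_dma_ranges(np, dma_addr, paddr, size);

	return -1;
}
```

Keeping the PCI entry ahead of the default entry means callers of the top-level function do not need to know which bus they are on, which matches the stated goal that existing callers remain unchanged.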
[PATCH v2 1/3] of/pci/dma: fix DMA configuration for PCI masters
The current device framework and OF framework integration assumes dma-ranges in the form that memory-mapped devices use to define them: (child-bus-address, parent-bus-address, length). of_dma_configure is written specifically for memory-mapped devices, and no implementation exists for PCI to handle PCIe-based memory ranges. For example, iproc-based SoCs and other SoCs (such as rcar) have PCI-world dma-ranges:

dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;

This patch serves the following purposes:

1) It exposes an interface to the PCI host driver for its inbound memory ranges.
2) It provides an interface to callers such as of_dma_get_ranges, so that the returned size yields the best possible (largest) dma_mask. This is needed because PCI RC drivers do not call APIs such as dma_set_coherent_mask(); instead, the host bridge's addressing capability is described by dma-ranges. For example, with dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>; we should get dev->coherent_dma_mask=0x7f.
3) It handles multiple inbound windows and dma-ranges. How to use them is left to the caller; the new function returns the resources in a standard and uniform way.
4) This way, callers such as of_dma_get_ranges do not need to change.
5) It leaves scope for adding PCI flag handling for inbound memory in the new function.
Signed-off-by: Oza Pawandeep <oza@broadcom.com> Reviewed-by: Ray Jui <ray@broadcom.com> Reviewed-by: Scott Branden <scott.bran...@broadcom.com> diff --git a/drivers/of/of_pci.c b/drivers/of/of_pci.c index 0ee42c3..ed6e69a 100644 --- a/drivers/of/of_pci.c +++ b/drivers/of/of_pci.c @@ -283,6 +283,83 @@ int of_pci_get_host_bridge_resources(struct device_node *dev, return err; } EXPORT_SYMBOL_GPL(of_pci_get_host_bridge_resources); + +/** + * of_pci_get_dma_ranges - Parse PCI host bridge inbound resources from DT + * @np: device node of the host bridge having the dma-ranges property + * @resources: list where the range of resources will be added after DT parsing + * + * It is the caller's job to free the @resources list. + * + * This function will parse the "dma-ranges" property of a + * PCI host bridge device node and setup the resource mapping based + * on its content. + * + * It returns zero if the range parsing has been successful or a standard error + * value if it failed. + */ + +int of_pci_get_dma_ranges(struct device_node *np, struct list_head *resources) +{ + struct device_node *node = of_node_get(np); + int rlen; + int ret = 0; + const int na = 3, ns = 2; + struct resource *res; + struct of_pci_range_parser parser; + struct of_pci_range range; + + if (!node) + return -EINVAL; + + parser.node = node; + parser.pna = of_n_addr_cells(node); + parser.np = parser.pna + na + ns; + + parser.range = of_get_property(node, "dma-ranges", ); + + if (!parser.range) { + pr_debug("pcie device has no dma-ranges defined for node(%s)\n", + np->full_name); + ret = -EINVAL; + goto out; + } + + parser.end = parser.range + rlen / sizeof(__be32); + + for_each_of_pci_range(, ) { + /* +* If we failed translation or got a zero-sized region +* then skip this range +*/ + if (range.cpu_addr == OF_BAD_ADDR || range.size == 0) + continue; + + res = kzalloc(sizeof(struct resource), GFP_KERNEL); + if (!res) { + ret = -ENOMEM; + goto parse_failed; + } + + ret = of_pci_range_to_resource(, np, 
res); + if (ret) { + kfree(res); + continue; + } + + pci_add_resource_offset(resources, res, + res->start - range.pci_addr); + } + + return ret; + +parse_failed: + pci_free_resource_list(resources); +out: + of_node_put(node); + return ret; +} +EXPORT_SYMBOL_GPL(of_pci_get_dma_ranges); #endif /* CONFIG_OF_ADDRESS */ #ifdef CONFIG_PCI_MSI diff --git a/include/linux/of_pci.h b/include/linux/of_pci.h index 0e0974e..617b90d 100644 --- a/include/linux/of_pci.h +++ b/include/linux/of_pci.h @@ -76,6 +76,7 @@ static inline void of_pci_check_probe_only(void) { } int of_pci_get_host_bridge_resources(struct device_node *dev, unsigned char busno, unsigned char bus_max, struct list_head *resources, resource_size_t *io_base); +int of_pci_get_dma_ranges(struct device_node *np, struct list_head *resources); #else static inline int of_pci_get_host_bridge_resources(struct device_node *dev, unsigned char busno, unsigned char bus_max, @@ -83,6 +84,12 @@ static inline int of_pci_
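The patch above fixes the PCI side of a dma-ranges entry at three address cells and two size cells (na = 3, ns = 2), with the parent address width taken from the parent node. A simplified userspace sketch of decoding one such entry is shown here. The struct pci_dma_range type and both function names are hypothetical, the cells are assumed to be already converted to host byte order, and the parent address is read directly rather than translated through the device tree as the kernel parser would do. The cell values in the usage note are assumed for illustration and are not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoded form of one PCI dma-ranges entry. */
struct pci_dma_range {
	uint32_t flags;		/* first child cell: PCI space/flags */
	uint64_t pci_addr;	/* remaining two child cells */
	uint64_t cpu_addr;	/* parent address, pna cells */
	uint64_t size;		/* two size cells */
};

/* Combine n 32-bit cells (most significant first) into one value. */
static uint64_t read_cells(const uint32_t *cells, int n)
{
	uint64_t v = 0;

	while (n--)
		v = (v << 32) | *cells++;
	return v;
}

/*
 * Decode one entry laid out as: 3 child cells, pna parent cells, 2 size
 * cells. Returns 0 on success, -1 if the entry is too short or pna is
 * not 1 or 2.
 */
static int decode_pci_dma_range(const uint32_t *cells, size_t ncells,
				int pna, struct pci_dma_range *out)
{
	const int na = 3, ns = 2;

	assert(cells && out);
	if (pna < 1 || pna > 2 || ncells < (size_t)(na + pna + ns))
		return -1;

	out->flags = cells[0];
	out->pci_addr = read_cells(cells + 1, na - 1);
	out->cpu_addr = read_cells(cells + na, pna);
	out->size = read_cells(cells + na + pna, ns);
	return 0;
}
```

For an assumed seven-cell entry { 0x43000000, 0x0, 0x0, 0x0, 0x0, 0x80, 0x0 } with two parent address cells, the sketch decodes a PCI address of 0, a CPU address of 0, and a size of 0x8000000000 (512 GB).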
[PATCH 2/3] iommu/pci: reserve iova for PCI masters
This patch reserves the IOVA for PCI masters. ARM64-based SoCs may have scattered memory banks. For example, an iproc-based SoC has:

<0x 0x8000 0x0 0x8000>, /* 2G @ 2G */
<0x0008 0x8000 0x3 0x8000>, /* 14G @ 34G */
<0x0090 0x 0x4 0x>, /* 16G @ 576G */
<0x00a0 0x 0x4 0x>; /* 16G @ 640G */

However, the addressing capability for incoming PCI transactions is limited by the host bridge. For example, if the maximum incoming window capability is 512 GB, then 0x0090 and 0x00a0 will fall beyond it. To address this problem, the IOMMU has to avoid allocating IOVAs that are reserved, which in turn means that no IOVA is allocated if it falls into a hole.

Bug: SOC-5216
Change-Id: Icbfc99a045d730be143fef427098c937b9d46353
Signed-off-by: Oza Pawandeep <oza@broadcom.com>
Reviewed-on: http://gerrit-ccxsw.broadcom.net/40760
Reviewed-by: vpx_checkpatch status <vpx_checkpa...@broadcom.com>
Reviewed-by: CCXSW <ccxswbu...@broadcom.com>
Tested-by: vpx_autobuild status <vpx_autobu...@broadcom.com>
Tested-by: vpx_smoketest status <vpx_smoket...@broadcom.com>
Tested-by: CCXSW <ccxswbu...@broadcom.com>
Reviewed-by: Scott Branden <scott.bran...@broadcom.com>

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c index 48d36ce..08764b0 100644 --- a/drivers/iommu/dma-iommu.c +++ b/drivers/iommu/dma-iommu.c @@ -27,6 +27,7 @@ #include #include #include +#include #include #include #include @@ -171,8 +172,12 @@ static void iova_reserve_pci_windows(struct pci_dev *dev, struct iova_domain *iovad) { struct pci_host_bridge *bridge = pci_find_host_bridge(dev->bus); + struct device_node *np = bridge->dev.parent->of_node; struct resource_entry *window; unsigned long lo, hi; + int ret; + dma_addr_t tmp_dma_addr = 0, dma_addr; + LIST_HEAD(res); resource_list_for_each_entry(window, >windows) { if (resource_type(window->res) != IORESOURCE_MEM && @@ -183,6 +188,36 @@ static void iova_reserve_pci_windows(struct pci_dev *dev, hi = iova_pfn(iovad, window->res->end - window->offset); reserve_iova(iovad, lo, hi); } + + /* PCI inbound memory reservation. 
*/ + ret = of_pci_get_dma_ranges(np, ); + if (!ret) { + resource_list_for_each_entry(window, ) { + struct resource *res_dma = window->res; + + dma_addr = res_dma->start - window->offset; + if (tmp_dma_addr > dma_addr) { + pr_warn("PCI: failed to reserve iovas; ranges should be sorted\n"); + return; + } + if (tmp_dma_addr != dma_addr) { + lo = iova_pfn(iovad, tmp_dma_addr); + hi = iova_pfn(iovad, dma_addr - 1); + reserve_iova(iovad, lo, hi); + } + tmp_dma_addr = window->res->end - window->offset; + } + /* +* the last dma-range should honour based on the +* 32/64-bit dma addresses. +*/ + if (tmp_dma_addr < DMA_BIT_MASK(sizeof(dma_addr_t) * 8)) { + lo = iova_pfn(iovad, tmp_dma_addr); + hi = iova_pfn(iovad, + DMA_BIT_MASK(sizeof(dma_addr_t) * 8) - 1); + reserve_iova(iovad, lo, hi); + } + } } /** -- 1.9.1
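The reservation loop in this patch walks sorted inbound windows and reserves everything between them, plus the tail above the last window. A userspace sketch of that idea, computing the holes as inclusive address ranges, is given below. The struct range type and find_dma_holes are hypothetical names. Unlike the patch, the sketch starts each hole one byte past the previous window's inclusive end and works on byte addresses rather than IOVA page frame numbers; both are choices made for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical inclusive address range. */
struct range {
	uint64_t start;
	uint64_t end;
};

/*
 * Given inbound windows sorted by start address and lying within
 * [0, limit], write the uncovered holes into holes[].
 * Returns the number of holes, or -1 if the windows are unsorted,
 * overlapping, malformed, beyond limit, or holes[] is too small.
 */
static int find_dma_holes(const struct range *win, size_t nwin,
			  uint64_t limit, struct range *holes,
			  size_t max_holes)
{
	uint64_t next = 0;	/* lowest address not yet covered */
	int reached_limit = 0;	/* set once a window ends exactly at limit */
	size_t i, n = 0;

	assert(win && holes);

	for (i = 0; i < nwin; i++) {
		if (reached_limit || win[i].start < next ||
		    win[i].end < win[i].start || win[i].end > limit)
			return -1;

		/* Gap between the covered region and this window. */
		if (win[i].start > next) {
			if (n == max_holes)
				return -1;
			holes[n].start = next;
			holes[n].end = win[i].start - 1;
			n++;
		}

		if (win[i].end == limit)
			reached_limit = 1;
		else
			next = win[i].end + 1;
	}

	/* Tail above the last window, up to the addressing limit. */
	if (!reached_limit) {
		if (n == max_holes)
			return -1;
		holes[n].start = next;
		holes[n].end = limit;
		n++;
	}

	return (int)n;
}
```

For two assumed windows, 2 GB at 2 GB and 14 GB at 34 GB, under a 512 GB limit, the sketch reports three holes: below 2 GB, between 4 GB and 34 GB, and from 48 GB to the limit. Passing the same windows in reverse order returns -1, mirroring the patch's requirement that ranges be sorted.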
[PATCH 3/3] PCI/of fix of_dma_get_range; get PCI specific dma-ranges
The current device framework and OF framework integration assumes dma-ranges in the form that memory-mapped devices use to define them: (child-bus-address, parent-bus-address, length). of_dma_configure is written specifically for memory-mapped devices, and no implementation exists for PCI to handle PCIe-based memory ranges. For example, iproc-based SoCs and other SoCs (such as rcar) have PCI-world dma-ranges:

dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;

This patch fixes the bug in of_dma_get_range which, as it stands, parses the PCI memory ranges and returns a wrong size of 0. To get the largest possible dma_mask, this patch also returns the largest possible size based on dma-ranges. For example, with

dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;

we should get dev->coherent_dma_mask=0x7f, based on which the IOVA allocation space will honour the PCI host bridge limitations.

Bug: SOC-5216
Change-Id: I4c534bdd17e70c6b27327d39d1656e8ed0cf56d6
Signed-off-by: Oza Pawandeep <oza@broadcom.com>
Reviewed-on: http://gerrit-ccxsw.broadcom.net/40762
Reviewed-by: vpx_checkpatch status <vpx_checkpa...@broadcom.com>
Reviewed-by: CCXSW <ccxswbu...@broadcom.com>
Reviewed-by: Scott Branden <scott.bran...@broadcom.com>
Tested-by: vpx_autobuild status <vpx_autobu...@broadcom.com>
Tested-by: vpx_smoketest status <vpx_smoket...@broadcom.com>

diff --git a/drivers/of/address.c b/drivers/of/address.c index 02b2903..f7734fc 100644 --- a/drivers/of/address.c +++ b/drivers/of/address.c @@ -6,6 +6,7 @@ #include #include #include +#include #include #include #include @@ -830,6 +831,54 @@ int of_dma_get_range(struct device_node *np, u64 *dma_addr, u64 *paddr, u64 *siz int ret = 0; u64 dmaaddr; +#ifdef CONFIG_PCI + struct resource_entry *window; + LIST_HEAD(res); + + if (!node) + return -EINVAL; + + if (of_bus_pci_match(np)) { + *size = 0; + /* +* PCI dma-ranges is not mandatory property. 
+* many devices do no need to have it, since +* host bridge does not require inbound memory +* configuration or rather have design limitations. +* so we look for dma-ranges, if missing we +* just return the caller full size, and also +* no dma-ranges suggests that, host bridge allows +* whatever comes in, so we set dma_addr to 0. +*/ + ret = of_pci_get_dma_ranges(np, ); + if (!ret) { + resource_list_for_each_entry(window, ) { + struct resource *res_dma = window->res; + + if (*size < resource_size(res_dma)) { + *dma_addr = res_dma->start - window->offset; + *paddr = res_dma->start; + *size = resource_size(res_dma); + } + } + } + pci_free_resource_list(); + + /* ignore the empty ranges. */ + if (*size == 0) { + pr_debug("empty/zero size dma-ranges found for node(%s)\n", + np->full_name); + *size = DMA_BIT_MASK(sizeof(dma_addr_t) * 8); + *dma_addr = *paddr = 0; + ret = 0; + } + + pr_err("dma_addr(%llx) cpu_addr(%llx) size(%llx)\n", +*dma_addr, *paddr, *size); + goto out; + } +#endif + if (!node) return -EINVAL; -- 1.9.1
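As a reading aid for the patch above, here is a minimal userspace sketch of the selection it performs: walk the parsed inbound windows, keep the largest one, and fall back to "everything is allowed" when none is usable. The struct and function names are hypothetical stand-ins, not kernel types. Reading the two size cells 0x80 0x00 of the example as one 64-bit value gives 0x8000000000 (512 GiB), so the mask quoted as 0x7f in the message is presumably a truncated 0x7fffffffff; that reading is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one parsed inbound window (not a kernel type). */
struct dma_window {
	uint64_t pci_addr;	/* bus (DMA) address where the window starts */
	uint64_t cpu_addr;	/* CPU physical address it maps to */
	uint64_t size;		/* window length in bytes */
};

/*
 * Keep the largest window, as the loop in the patch does. With no usable
 * window, report address 0 and the maximum size, i.e. no restriction.
 */
static struct dma_window pick_largest(const struct dma_window *w, size_t n)
{
	struct dma_window best = { 0, 0, 0 };
	size_t i;

	assert(w != NULL || n == 0);
	for (i = 0; i < n; i++)
		if (best.size < w[i].size)
			best = w[i];

	if (best.size == 0)
		best.size = UINT64_MAX;	/* plays the role of DMA_BIT_MASK(64) */
	return best;
}

/* Mask covering [0, size); only meaningful when size is a power of two. */
static uint64_t mask_for_size(uint64_t size)
{
	return size - 1;
}
```

For a 1 GiB window and a 512 GiB window, pick_largest() returns the 512 GiB one and mask_for_size() on its size yields 0x7fffffffff.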
[PATCH 1/3] of/pci/dma: fix DMA configuration for PCI masters
The current integration of the device framework and the OF framework
assumes that dma-ranges are defined the way memory-mapped devices define
them: (child-bus-address, parent-bus-address, length).

of_dma_configure() is written specifically for memory-mapped devices;
no implementation exists that handles PCIe-based memory ranges.

For example, iProc-based SoCs and other SoCs (such as R-Car) have
PCI-style dma-ranges:

    dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;

This patch serves the following purposes:

1) It exposes an interface to the PCI host driver for its inbound
   memory ranges.

2) It provides an interface to callers such as of_dma_get_range(), so
   that the returned size yields the best possible (largest) dma_mask.
   PCI RC drivers do not call APIs such as dma_set_coherent_mask();
   instead, the RC advertises its addressing capability through
   dma-ranges. For example, with

    dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;

   we should get dev->coherent_dma_mask=0x7f.

3) It handles multiple inbound windows and dma-ranges. How they are
   used is left to the caller; the new function returns the resources
   in a standard and uniform way.

4) Callers of, for example, of_dma_get_range() do not need to change.

5) It leaves room for adding PCI flag handling for inbound memory in
   the new function.

Bug: SOC-5216
Change-Id: Ie045386df91e1e0587846bb147ae40d96f6d7d2e
Signed-off-by: Oza Pawandeep <oza@broadcom.com>
Reviewed-on: http://gerrit-ccxsw.broadcom.net/40428
Reviewed-by: vpx_checkpatch status <vpx_checkpa...@broadcom.com>
Reviewed-by: CCXSW <ccxswbu...@broadcom.com>
Reviewed-by: Ray Jui <ray@broadcom.com>
Tested-by: vpx_autobuild status <vpx_autobu...@broadcom.com>
Tested-by: vpx_smoketest status <vpx_smoket...@broadcom.com>
Tested-by: CCXSW <ccxswbu...@broadcom.com>
Reviewed-by: Scott Branden <scott.bran...@broadcom.com>

diff --git a/drivers/of/of_pci.c b/drivers/of/of_pci.c
index 0ee42c3..ed6e69a 100644
--- a/drivers/of/of_pci.c
+++ b/drivers/of/of_pci.c
@@ -283,6 +283,83 @@ int of_pci_get_host_bridge_resources(struct device_node *dev,
 	return err;
 }
 EXPORT_SYMBOL_GPL(of_pci_get_host_bridge_resources);
+
+/**
+ * of_pci_get_dma_ranges - Parse PCI host bridge inbound resources from DT
+ * @np: device node of the host bridge having the dma-ranges property
+ * @resources: list where the range of resources will be added after DT parsing
+ *
+ * It is the caller's job to free the @resources list.
+ *
+ * This function will parse the "dma-ranges" property of a
+ * PCI host bridge device node and setup the resource mapping based
+ * on its content.
+ *
+ * It returns zero if the range parsing has been successful or a standard error
+ * value if it failed.
+ */
+
+int of_pci_get_dma_ranges(struct device_node *np, struct list_head *resources)
+{
+	struct device_node *node = of_node_get(np);
+	int rlen;
+	int ret = 0;
+	const int na = 3, ns = 2;
+	struct resource *res;
+	struct of_pci_range_parser parser;
+	struct of_pci_range range;
+
+	if (!node)
+		return -EINVAL;
+
+	parser.node = node;
+	parser.pna = of_n_addr_cells(node);
+	parser.np = parser.pna + na + ns;
+
+	parser.range = of_get_property(node, "dma-ranges", &rlen);
+
+	if (!parser.range) {
+		pr_debug("pcie device has no dma-ranges defined for node(%s)\n",
+			  np->full_name);
+		ret = -EINVAL;
+		goto out;
+	}
+
+	parser.end = parser.range + rlen / sizeof(__be32);
+
+	for_each_of_pci_range(&parser, &range) {
+		/*
+		 * If we failed translation or got a zero-sized region
+		 * then skip this range
+		 */
+		if (range.cpu_addr == OF_BAD_ADDR || range.size == 0)
+			continue;
+
+		res = kzalloc(sizeof(struct resource), GFP_KERNEL);
+		if (!res) {
+			ret = -ENOMEM;
+			goto parse_failed;
+		}
+
+		ret = of_pci_range_to_resource(&range, np, res);
+		if (ret) {
+			kfree(res);
+			continue;
+		}
+
+		pci_add_resource_offset(resources, res,
+					res->start - range.pci_addr);
+	}
+
+	return ret;
+
+parse_failed:
+	pci_free_resource_list(resources);
+out:
+	of_node_put(node);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(of_pci_get_dma_ranges);
 #endif /* CONFIG_OF_ADDRESS */
 
 #ifdef CONFIG_PCI_MSI
diff --git a/include/linux/of_pci.h b/include/linux/of_pci.h
index 0e0974e..617b90d 100644
--- a/include/linux/of_pci.h
+++ b/include/linux/of_pci.h
@@ -76,6 +76,7 @@ static inline void of_pci_check_probe_only(void) { }
 int of_pci_get_host_bridge_resources(struct device_node *dev,
 			unsigned char busno, unsigned char bus_max,
 			struct list_head *resources, resource_size_t *io_base);
+int of_pci_get_dma_ranges(struct device_node *np, struct list_head *resources);
 #else
 static inline int
[PATCH 2/3] iommu/pci: reserve iova for PCI masters
This patch reserves IOVA ranges for PCI masters.

ARM64-based SoCs may have scattered memory banks. For example, an
iProc-based SoC has:

    <0x 0x8000 0x0 0x8000>, /* 2G @ 2G */
    <0x0008 0x8000 0x3 0x8000>, /* 14G @ 34G */
    <0x0090 0x 0x4 0x>, /* 16G @ 576G */
    <0x00a0 0x 0x4 0x>; /* 16G @ 640G */

However, the addressing capability of incoming PCI transactions is
limited by the host bridge. For example, if the maximum inbound window
is 512 GB, then the banks at 0x0090 and 0x00a0 fall beyond it.

To address this, the IOMMU has to avoid allocating IOVAs from reserved
regions, which in turn means no IOVA is allocated if it would fall into
a hole.

Bug: SOC-5216
Change-Id: Icbfc99a045d730be143fef427098c937b9d46353
Signed-off-by: Oza Pawandeep
Reviewed-on: http://gerrit-ccxsw.broadcom.net/40760
Reviewed-by: vpx_checkpatch status
Reviewed-by: CCXSW
Tested-by: vpx_autobuild status
Tested-by: vpx_smoketest status
Tested-by: CCXSW
Reviewed-by: Scott Branden

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 48d36ce..08764b0 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -27,6 +27,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -171,8 +172,12 @@ static void iova_reserve_pci_windows(struct pci_dev *dev,
 		struct iova_domain *iovad)
 {
 	struct pci_host_bridge *bridge = pci_find_host_bridge(dev->bus);
+	struct device_node *np = bridge->dev.parent->of_node;
 	struct resource_entry *window;
 	unsigned long lo, hi;
+	int ret;
+	dma_addr_t tmp_dma_addr = 0, dma_addr;
+	LIST_HEAD(res);
 
 	resource_list_for_each_entry(window, &bridge->windows) {
 		if (resource_type(window->res) != IORESOURCE_MEM &&
@@ -183,6 +188,36 @@ static void iova_reserve_pci_windows(struct pci_dev *dev,
 		hi = iova_pfn(iovad, window->res->end - window->offset);
 		reserve_iova(iovad, lo, hi);
 	}
+
+	/* PCI inbound memory reservation. */
+	ret = of_pci_get_dma_ranges(np, &res);
+	if (!ret) {
+		resource_list_for_each_entry(window, &res) {
+			struct resource *res_dma = window->res;
+
+			dma_addr = res_dma->start - window->offset;
+			if (tmp_dma_addr > dma_addr) {
+				pr_warn("PCI: failed to reserve iovas; ranges should be sorted\n");
+				return;
+			}
+			if (tmp_dma_addr != dma_addr) {
+				lo = iova_pfn(iovad, tmp_dma_addr);
+				hi = iova_pfn(iovad, dma_addr - 1);
+				reserve_iova(iovad, lo, hi);
+			}
+			tmp_dma_addr = window->res->end - window->offset;
+		}
+		/*
+		 * the last dma-range should honour based on the
+		 * 32/64-bit dma addresses.
+		 */
+		if (tmp_dma_addr < DMA_BIT_MASK(sizeof(dma_addr_t) * 8)) {
+			lo = iova_pfn(iovad, tmp_dma_addr);
+			hi = iova_pfn(iovad,
+				      DMA_BIT_MASK(sizeof(dma_addr_t) * 8) - 1);
+			reserve_iova(iovad, lo, hi);
+		}
+	}
 }
 
 /**
-- 
1.9.1
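The reservation walk in the patch above can be modelled outside the kernel as "find the gaps between sorted inbound windows". The sketch below is only an illustration under stated assumptions: struct range and find_holes() are hypothetical names, bounds are inclusive byte addresses rather than IOVA page frame numbers, and each gap starts at end + 1 of the previous window, which is a choice of this sketch (the posted code starts from the window's last address).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical address range with inclusive bounds (not a kernel type). */
struct range {
	uint64_t start;
	uint64_t end;
};

/*
 * Given inbound windows sorted by start address, store the gaps that a
 * caller would reserve in holes[] and return their count. Returns -1 if
 * the windows are unsorted or overlapping, or if holes[] is too small.
 * 'limit' is the highest DMA address the device could otherwise use.
 * No windows at all means no restriction, so nothing is reserved.
 */
static int find_holes(const struct range *win, size_t n, uint64_t limit,
		      struct range *holes, size_t max_holes)
{
	uint64_t next = 0;	/* lowest address not yet covered by a window */
	size_t i, count = 0;

	assert(holes != NULL);
	if (n == 0)
		return 0;

	for (i = 0; i < n; i++) {
		if (win[i].start < next)
			return -1;		/* ranges must be sorted */
		if (win[i].start > next) {
			if (count == max_holes)
				return -1;
			holes[count].start = next;
			holes[count].end = win[i].start - 1;
			count++;
		}
		if (win[i].end >= limit)
			return (int)count;	/* nothing left above this window */
		next = win[i].end + 1;
	}

	/* Everything above the last window is unreachable through the bridge. */
	if (count == max_holes)
		return -1;
	holes[count].start = next;
	holes[count].end = limit;
	return (int)(count + 1);
}
```

With windows [0, 0x7fffffff] and [0x100000000, 0x7fffffffff] and a 64-bit limit, the gaps are [0x80000000, 0xffffffff] and everything from 0x8000000000 upwards, which matches the 512 GB scenario described in the commit message.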
[RFC PATCH] of/pci/dma: fix DMA configuration for PCI masters
The current integration of the device framework and the OF framework
assumes that dma-ranges are defined the way memory-mapped devices define
them:

    dma-ranges: (child-bus-address, parent-bus-address, length).

However, iProc-based SoCs and other SoCs (such as R-Car) have PCI-style
dma-ranges:

    dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;

of_dma_configure() is written specifically for memory-mapped devices;
no implementation exists that handles PCIe-based memory ranges. In
fact, the PCI world does not seem to define standard dma-ranges.

This patch serves the following purposes:

1) It exposes an interface to the PCI host driver for its inbound
   memory ranges.

2) It provides an interface to callers such as of_dma_get_range(), so
   that the returned size yields the best possible (largest) dma_mask.
   For example, with

    dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;

   we should get dev->coherent_dma_mask=0x7f.

3) It handles multiple inbound windows and dma-ranges. How they are
   used is left to the caller; the new function returns the resources
   in a standard and uniform way.

4) Callers of of_dma_get_range() do not need to change.

5) It leaves room for adding PCI flag handling for inbound memory in
   the new function.

Signed-off-by: Oza Pawandeep <oza@broadcom.com>

diff --git a/drivers/of/address.c b/drivers/of/address.c
index 02b2903..ec21191 100644
--- a/drivers/of/address.c
+++ b/drivers/of/address.c
@@ -6,6 +6,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -829,10 +830,30 @@ int of_dma_get_range(struct device_node *np, u64 *dma_addr, u64 *paddr, u64 *siz
 	int len, naddr, nsize, pna;
 	int ret = 0;
 	u64 dmaaddr;
+	struct resource_entry *window;
+	LIST_HEAD(res);
 
 	if (!node)
 		return -EINVAL;
 
+	if (strcmp(np->name, "pci")) {
+		*size = 0;
+		ret = of_pci_get_dma_ranges(np, &res);
+		if (!ret) {
+			resource_list_for_each_entry(window, &res) {
+				struct resource *res_dma = window->res;
+
+				if (*size < resource_size(res_dma)) {
+					*dma_addr = res_dma->start - window->offset;
+					*paddr = res_dma->start;
+					*size = resource_size(res_dma);
+				}
+			}
+		}
+		pci_free_resource_list(&res);
+		goto out;
+	}
+
 	while (1) {
 		naddr = of_n_addr_cells(node);
 		nsize = of_n_size_cells(node);
diff --git a/drivers/of/of_pci.c b/drivers/of/of_pci.c
index 0ee42c3..2aa9401 100644
--- a/drivers/of/of_pci.c
+++ b/drivers/of/of_pci.c
@@ -283,6 +283,83 @@ int of_pci_get_host_bridge_resources(struct device_node *dev,
 	return err;
 }
 EXPORT_SYMBOL_GPL(of_pci_get_host_bridge_resources);
+
+/**
+ * of_pci_get_dma_ranges - Parse PCI host bridge inbound resources from DT
+ * @np: device node of the host bridge having the dma-ranges property
+ * @resources: list where the range of resources will be added after DT parsing
+ *
+ * It is the caller's job to free the @resources list.
+ *
+ * This function will parse the "dma-ranges" property of a
+ * PCI host bridge device node and setup the resource mapping based
+ * on its content.
+ *
+ * It returns zero if the range parsing has been successful or a standard error
+ * value if it failed.
+ */
+
+int of_pci_get_dma_ranges(struct device_node *np, struct list_head *resources)
+{
+	struct device_node *node = of_node_get(np);
+	int rlen;
+	int ret = 0;
+	const int na = 3, ns = 2;
+	struct resource *res;
+	struct of_pci_range_parser parser;
+	struct of_pci_range range;
+
+	if (!node)
+		return -EINVAL;
+
+	parser.node = node;
+	parser.pna = of_n_addr_cells(node);
+	parser.np = parser.pna + na + ns;
+
+	parser.range = of_get_property(node, "dma-ranges", &rlen);
+
+	if (!parser.range) {
+		pr_debug("pcie device has no dma-ranges defined for node(%s)\n",
+			  np->full_name);
+		ret = -ENODEV;
+		goto out;
+	}
+
+	parser.end = parser.range + rlen / sizeof(__be32);
+
+	for_each_of_pci_range(&parser, &range) {
+		/*
+		 * If we failed translation or got a zero-sized region
+		 * then skip this range
+		 */
+		if (range.cpu_addr == OF_BAD_ADDR || range.size == 0)
+			continue;
+
+		res = kzalloc(sizeof(struct resource), GFP_KERNEL);
+		if (!res) {
+			ret = -ENOMEM;
+			goto parse_failed;
+		}
+
+		ret = of_pci_range_to_resource(&range, np, res);
[RFC PATCH 2/2] of/pci: call pci specific dma-ranges instead of memory-mapped.
A PCI device may support 64-bit DMA addressing, so its driver sets the
device's dma_mask to DMA_BIT_MASK(64); however, the PCI host bridge may
have limitations on inbound transaction addressing. As an example,
consider an NVMe SSD connected to the iProc PCIe controller.

Currently, the IOMMU DMA ops only consider the PCI device's dma_mask
when allocating an IOVA. This is particularly problematic on ARM/ARM64
SoCs where the IOMMU (i.e. SMMU) translates IOVA to PA for inbound
transactions only after the PCI host has forwarded these transactions
onto the SoC I/O bus. This means that on such ARM/ARM64 SoCs the IOVA
of inbound transactions has to honor the addressing restrictions of the
PCI host.

This patch calls the PCI-specific of_pci_get_dma_ranges() instead of
the memory-mapped variant, which returns a wrong size and is not meant
for the PCI world. With this, accurate resources are returned along
with the largest possible inbound window size, from which the largest
possible dma_mask can be generated.

Signed-off-by: Oza Pawandeep <oza@broadcom.com>

diff --git a/drivers/of/device.c b/drivers/of/device.c
index b1e6beb..d6a8dde 100644
--- a/drivers/of/device.c
+++ b/drivers/of/device.c
@@ -9,6 +9,7 @@
 #include
 #include
 #include
+#include
 #include
 #include "of_private.h"
@@ -89,6 +90,8 @@ void of_dma_configure(struct device *dev, struct device_node *np)
 	bool coherent;
 	unsigned long offset;
 	const struct iommu_ops *iommu;
+	struct resource_entry *window;
+	LIST_HEAD(res);
 
 	/*
 	 * Set default coherent_dma_mask to 32 bit. Drivers are expected to
@@ -104,7 +107,24 @@ void of_dma_configure(struct device *dev, struct device_node *np)
 	if (!dev->dma_mask)
 		dev->dma_mask = &dev->coherent_dma_mask;
 
-	ret = of_dma_get_range(np, &dma_addr, &paddr, &size);
+	if (dev_is_pci(dev)) {
+		size = 0;
+		ret = of_pci_get_dma_ranges(np, &res);
+		if (!ret) {
+			resource_list_for_each_entry(window, &res) {
+				struct resource *res_dma = window->res;
+				if (size < resource_size(res_dma)) {
+					dma_addr = res_dma->start - window->offset;
+					paddr = res_dma->start;
+					size = resource_size(res_dma);
+				}
+			}
+		}
+		pci_free_resource_list(&res);
+	}
+	else
+		ret = of_dma_get_range(np, &dma_addr, &paddr, &size);
+
 	if (ret < 0) {
 		dma_addr = offset = 0;
 		size = dev->coherent_dma_mask + 1;
-- 
1.9.1
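The commit message above argues that a device advertising 64-bit DMA must still be limited by what the host bridge can accept. A small sketch of that clamp follows; it assumes the bridge limit is taken as the largest power-of-two boundary not exceeding dma_addr + size, and bridge_limited_mask() is a hypothetical helper, not a kernel function.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Combine the mask requested by the device driver with the limit implied
 * by the host bridge inbound window [dma_addr, dma_addr + size).
 * An empty or wrapping window is treated as "no bridge limit".
 */
static uint64_t bridge_limited_mask(uint64_t dev_mask, uint64_t dma_addr,
				    uint64_t size)
{
	uint64_t end = dma_addr + size;	/* exclusive upper bound */
	uint64_t bridge_mask;
	unsigned int bits = 0;

	assert(dev_mask != 0);	/* a zero mask would mean DMA was never set up */
	if (size == 0 || end < dma_addr)
		return dev_mask;

	/* bits = floor(log2(end)) */
	while (bits < 63 && (end >> (bits + 1)) != 0)
		bits++;

	bridge_mask = (UINT64_C(1) << bits) - 1;
	return dev_mask < bridge_mask ? dev_mask : bridge_mask;
}
```

A device asking for DMA_BIT_MASK(64) behind a bridge with a single 512 GiB window at address 0 ends up with 0x7fffffffff, while a 32-bit device keeps its own, smaller mask.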
[RFC PATCH 1/2] of/pci: implement inbound dma-ranges for PCI
The current integration of the device framework and the OF framework
assumes that dma-ranges are defined the way memory-mapped devices define
them:

    dma-ranges: (child-bus-address, parent-bus-address, length).

However, iProc-based SoCs and other SoCs (R-Car) have PCI-style
dma-ranges:

    dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;

of_dma_configure() is written specifically for memory-mapped devices;
no implementation exists that handles PCIe-based memory ranges. In
fact, the PCI world does not seem to define standard dma-ranges.

This patch exposes an interface not only to the PCI host driver; it
also aims to provide an interface to callers such as
of_dma_configure(), so that the returned size yields the best possible
(largest) dma_mask. For example, with

    dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;

we should get dev->coherent_dma_mask=0x7f.

This patch is also intended to handle multiple inbound windows and
dma-ranges. How they are used is left to the caller.

Signed-off-by: Oza Pawandeep <oza@broadcom.com>

diff --git a/drivers/of/of_pci.c b/drivers/of/of_pci.c
index 0ee42c3..5299438 100644
--- a/drivers/of/of_pci.c
+++ b/drivers/of/of_pci.c
@@ -283,6 +283,80 @@ int of_pci_get_host_bridge_resources(struct device_node *dev,
 	return err;
 }
 EXPORT_SYMBOL_GPL(of_pci_get_host_bridge_resources);
+
+/**
+ * of_pci_get_dma_ranges - Parse PCI host bridge inbound resources from DT
+ * @np: device node of the host bridge having the dma-ranges property
+ * @resources: list where the range of resources will be added after DT parsing
+ *
+ * It is the caller's job to free the @resources list.
+ *
+ * This function will parse the "dma-ranges" property of a PCI host bridge device
+ * node and setup the resource mapping based on its content.
+ *
+ * It returns zero if the range parsing has been successful or a standard error
+ * value if it failed.
+ */
+
+int of_pci_get_dma_ranges(struct device_node *np, struct list_head *resources)
+{
+	struct device_node *node = of_node_get(np);
+	int rlen;
+	int ret = 0;
+	const int na = 3, ns = 2;
+	struct resource *res;
+	struct of_pci_range_parser parser;
+	struct of_pci_range range;
+
+	if (!node)
+		return -EINVAL;
+
+	parser.node = node;
+	parser.pna = of_n_addr_cells(node);
+	parser.np = parser.pna + na + ns;
+
+	parser.range = of_get_property(node, "dma-ranges", &rlen);
+
+	if (!parser.range) {
+		pr_debug("pcie device has no dma-ranges defined for node(%s)\n", np->full_name);
+		ret = -ENODEV;
+		goto out;
+	}
+
+	parser.end = parser.range + rlen / sizeof(__be32);
+
+	for_each_of_pci_range(&parser, &range) {
+		/*
+		 * If we failed translation or got a zero-sized region
+		 * then skip this range
+		 */
+		if (range.cpu_addr == OF_BAD_ADDR || range.size == 0)
+			continue;
+
+		res = kzalloc(sizeof(struct resource), GFP_KERNEL);
+		if (!res) {
+			ret = -ENOMEM;
+			goto parse_failed;
+		}
+
+		ret = of_pci_range_to_resource(&range, np, res);
+		if (ret) {
+			kfree(res);
+			continue;
+		}
+
+		pci_add_resource_offset(resources, res, res->start - range.pci_addr);
+}
+
+	return ret;
+
+parse_failed:
+	pci_free_resource_list(resources);
+out:
+	of_node_put(node);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(of_pci_get_dma_ranges);
 #endif /* CONFIG_OF_ADDRESS */
 
 #ifdef CONFIG_PCI_MSI
diff --git a/include/linux/of_pci.h b/include/linux/of_pci.h
index 0e0974e..8509e3d 100644
--- a/include/linux/of_pci.h
+++ b/include/linux/of_pci.h
@@ -76,6 +76,7 @@ static inline void of_pci_check_probe_only(void) { }
 int of_pci_get_host_bridge_resources(struct device_node *dev,
 			unsigned char busno, unsigned char bus_max,
 			struct list_head *resources, resource_size_t *io_base);
+int of_pci_get_dma_ranges(struct device_node *np, struct list_head *resources);
 #else
 static inline int of_pci_get_host_bridge_resources(struct device_node *dev,
 			unsigned char busno, unsigned char bus_max,
@@ -83,6 +84,11 @@ static inline int of_pci_get_host_bridge_resources(struct device_node *dev,
 {
 	return -EINVAL;
 }
+
+static inline int of_pci_get_dma_ranges(struct device_node *np, struct list_head *resources)
+{
+	return -EINVAL;
+}
 #endif
 
 #if defined(CONFIG_OF) && defined(CONFIG_PCI_MSI)
-- 
1.9.1
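To make the cell layout used by of_pci_get_dma_ranges() concrete (na = 3 child address cells, pna parent address cells, ns = 2 size cells), here is a hedged userspace sketch that decodes one entry. It takes cells already converted to host byte order, whereas the real property holds big-endian __be32 values; struct pci_dma_range, read_cells() and decode_entry() are hypothetical names. Reading the first cell as 0x43000000 (64-bit prefetchable memory in the PCI bus binding) is an assumption, since the archive shows it truncated to 0x4300.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoded form of one PCI dma-ranges entry. */
struct pci_dma_range {
	uint32_t flags;		/* phys.hi cell: address space and flag bits */
	uint64_t pci_addr;	/* phys.mid:phys.lo, the bus address */
	uint64_t cpu_addr;	/* parent (CPU) address */
	uint64_t size;		/* window length in bytes */
};

/* Combine 'count' (1 or 2) 32-bit cells, most significant cell first. */
static uint64_t read_cells(const uint32_t *cell, int count)
{
	uint64_t v = 0;

	while (count-- > 0)
		v = (v << 32) | *cell++;
	return v;
}

/*
 * Decode one entry laid out as <flags pci_hi pci_lo | cpu... | size_hi size_lo>
 * and return the number of cells consumed, so a caller can step to the next
 * entry. pna is the parent's #address-cells.
 */
static size_t decode_entry(const uint32_t *cell, int pna,
			   struct pci_dma_range *out)
{
	const int na = 3, ns = 2;

	assert(pna == 1 || pna == 2);
	out->flags = cell[0];
	out->pci_addr = read_cells(cell + 1, na - 1);
	out->cpu_addr = read_cells(cell + na, pna);
	out->size = read_cells(cell + na + pna, ns);
	return (size_t)(na + pna + ns);
}
```

The seven-cell example from the commit message is consistent with pna = 2: it decodes to PCI address 0, CPU address 0 and a size of 0x8000000000 bytes.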
[RFC PATCH 1/2] of/pci: implement inbound dma-ranges for PCI
The current device framework and OF framework integration assumes
dma-ranges in the form used by memory-mapped devices:

dma-ranges: (child-bus-address, parent-bus-address, length).

But iproc based SoCs and other SoCs (rcar) have PCI world dma-ranges:

dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;

of_dma_configure is specifically written to take care of memory-mapped
devices, but no implementation exists for PCI to take care of PCIe based
memory ranges. In fact, the PCI world does not seem to define a standard
dma-ranges binding.

This patch exposes an interface not only to the PCI host driver, but also
to callers such as of_dma_configure, so that the returned size yields the
best possible (largest) dma_mask. For example, with

dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;

we should get dev->coherent_dma_mask=0x7f.

This patch also intends to handle multiple inbound windows and
dma-ranges. It is left to the caller how it wants to use them.

Signed-off-by: Oza Pawandeep

diff --git a/drivers/of/of_pci.c b/drivers/of/of_pci.c
index 0ee42c3..5299438 100644
--- a/drivers/of/of_pci.c
+++ b/drivers/of/of_pci.c
@@ -283,6 +283,80 @@ int of_pci_get_host_bridge_resources(struct device_node *dev,
 	return err;
 }
 EXPORT_SYMBOL_GPL(of_pci_get_host_bridge_resources);
+
+/**
+ * of_pci_get_dma_ranges - Parse PCI host bridge inbound resources from DT
+ * @np: device node of the host bridge having the dma-ranges property
+ * @resources: list where the range of resources will be added after DT parsing
+ *
+ * It is the caller's job to free the @resources list.
+ *
+ * This function will parse the "dma-ranges" property of a PCI host bridge device
+ * node and setup the resource mapping based on its content.
+ *
+ * It returns zero if the range parsing has been successful or a standard error
+ * value if it failed.
+ */
+
+int of_pci_get_dma_ranges(struct device_node *np, struct list_head *resources)
+{
+	struct device_node *node = of_node_get(np);
+	int rlen;
+	int ret = 0;
+	const int na = 3, ns = 2;
+	struct resource *res;
+	struct of_pci_range_parser parser;
+	struct of_pci_range range;
+
+	if (!node)
+		return -EINVAL;
+
+	parser.node = node;
+	parser.pna = of_n_addr_cells(node);
+	parser.np = parser.pna + na + ns;
+
+	parser.range = of_get_property(node, "dma-ranges", &rlen);
+
+	if (!parser.range) {
+		pr_debug("pcie device has no dma-ranges defined for node(%s)\n", np->full_name);
+		ret = -ENODEV;
+		goto out;
+	}
+
+	parser.end = parser.range + rlen / sizeof(__be32);
+
+	for_each_of_pci_range(&parser, &range) {
+		/*
+		 * If we failed translation or got a zero-sized region
+		 * then skip this range
+		 */
+		if (range.cpu_addr == OF_BAD_ADDR || range.size == 0)
+			continue;
+
+		res = kzalloc(sizeof(struct resource), GFP_KERNEL);
+		if (!res) {
+			ret = -ENOMEM;
+			goto parse_failed;
+		}
+
+		ret = of_pci_range_to_resource(&range, np, res);
+		if (ret) {
+			kfree(res);
+			continue;
+		}
+
+		pci_add_resource_offset(resources, res, res->start - range.pci_addr);
+	}
+
+	return ret;
+
+parse_failed:
+	pci_free_resource_list(resources);
+out:
+	of_node_put(node);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(of_pci_get_dma_ranges);
 #endif /* CONFIG_OF_ADDRESS */
 
 #ifdef CONFIG_PCI_MSI
diff --git a/include/linux/of_pci.h b/include/linux/of_pci.h
index 0e0974e..8509e3d 100644
--- a/include/linux/of_pci.h
+++ b/include/linux/of_pci.h
@@ -76,6 +76,7 @@ static inline void of_pci_check_probe_only(void) { }
 int of_pci_get_host_bridge_resources(struct device_node *dev,
 			unsigned char busno, unsigned char bus_max,
 			struct list_head *resources, resource_size_t *io_base);
+int of_pci_get_dma_ranges(struct device_node *np, struct list_head *resources);
 #else
 static inline int of_pci_get_host_bridge_resources(struct device_node *dev,
 			unsigned char busno, unsigned char bus_max,
@@ -83,6 +84,11 @@ static inline int of_pci_get_host_bridge_resources(struct device_node *dev,
 {
 	return -EINVAL;
 }
+
+static inline int of_pci_get_dma_ranges(struct device_node *np, struct list_head *resources)
+{
+	return -EINVAL;
+}
 #endif
 
 #if defined(CONFIG_OF) && defined(CONFIG_PCI_MSI)
-- 
1.9.1
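Since this version hands the caller a list of inbound windows and leaves the policy to it, one plausible policy is to pick the largest window. The sketch below shows that selection on a plain array rather than on the kernel resource list. The `inbound_window` struct and `largest_window` function are hypothetical illustrations, not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of one parsed inbound window. */
struct inbound_window {
	uint64_t pci_addr;
	uint64_t cpu_addr;
	uint64_t size;
};

/*
 * Return the largest window, or NULL if no usable window exists.
 * Zero-sized windows are skipped; ties keep the earliest entry.
 */
static const struct inbound_window *
largest_window(const struct inbound_window *w, size_t n)
{
	const struct inbound_window *best = NULL;
	size_t i;

	assert(w != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (w[i].size == 0)
			continue;
		if (!best || w[i].size > best->size)
			best = &w[i];
	}
	return best;
}
```

A caller such as of_dma_configure could then take the address and size of the returned window. Whether the largest window is the right choice for a given host bridge is a design decision this sketch does not settle.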
[RFC PATCH 2/3] iommu/dma: account pci host bridge dma_mask for IOVA allocation
It is possible that a PCI device supports 64-bit DMA addressing, and thus
its driver sets the device's dma_mask to DMA_BIT_MASK(64); however, the PCI
host bridge may have limitations on inbound transaction addressing. As an
example, consider an NVMe SSD device connected to the iproc-PCIe controller.

Currently, the IOMMU DMA ops only consider the PCI device dma_mask when
allocating an IOVA. This is particularly problematic on ARM/ARM64 SoCs
where the IOMMU (i.e. SMMU) translates IOVA to PA for inbound transactions
only after the PCI host has forwarded these transactions on the SoC IO bus.
This means that on such ARM/ARM64 SoCs the IOVA of inbound transactions has
to honor the addressing restrictions of the PCI host.

This patch is inspired by
http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1306545.html
http://www.spinics.net/lists/arm-kernel/msg566947.html
but the above solves only half of the problem. The rest of the problem, as
we face it on iproc based SoCs, is described below.

The current PCIe framework and OF framework integration assumes dma-ranges
in the form used by memory-mapped devices:

dma-ranges: (child-bus-address, parent-bus-address, length).

But iproc based SoCs and even Rcar based SoCs have PCI world dma-ranges:

dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;

of_dma_configure is specifically written to take care of memory-mapped
devices, but no implementation exists for PCI to take care of PCIe based
memory ranges. In fact, the PCI world does not seem to define a standard
dma-ranges binding.

This patch implements of_pci_get_dma_ranges to cater to PCI world
dma-ranges, so that the returned size yields the best possible (largest)
dma_mask. For example, with

dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;

we should get dev->coherent_dma_mask=0x7f.

Reviewed-by: Anup Patel <anup.pa...@broadcom.com>
Reviewed-by: Scott Branden <scott.bran...@broadcom.com>
Signed-off-by: Oza Pawandeep <oza@broadcom.com>

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 8c7c244..20cfff7 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -217,6 +217,9 @@ config NEED_DMA_MAP_STATE
 config NEED_SG_DMA_LENGTH
 	def_bool y
 
+config ARCH_HAS_DMA_SET_COHERENT_MASK
+	def_bool y
+
 config SMP
 	def_bool y
 
diff --git a/arch/arm64/include/asm/device.h b/arch/arm64/include/asm/device.h
index 73d5bab..64b4dc3 100644
--- a/arch/arm64/include/asm/device.h
+++ b/arch/arm64/include/asm/device.h
@@ -20,6 +20,7 @@ struct dev_archdata {
 #ifdef CONFIG_IOMMU_API
 	void *iommu;	/* private IOMMU data */
 #endif
+	u64 parent_dma_mask;
 	bool dma_coherent;
 };
 
diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
index 81cdb2e..5845ecd 100644
--- a/arch/arm64/mm/dma-mapping.c
+++ b/arch/arm64/mm/dma-mapping.c
@@ -564,6 +564,7 @@ static void flush_page(struct device *dev, const void *virt, phys_addr_t phys)
 	__dma_flush_area(virt, PAGE_SIZE);
 }
 
+
 static void *__iommu_alloc_attrs(struct device *dev, size_t size,
 				 dma_addr_t *handle, gfp_t gfp,
 				 unsigned long attrs)
@@ -795,6 +796,20 @@ static void __iommu_unmap_sg_attrs(struct device *dev,
 	iommu_dma_unmap_sg(dev, sgl, nelems, dir, attrs);
 }
 
+static int __iommu_set_dma_mask(struct device *dev, u64 mask)
+{
+	/* device is not DMA capable */
+	if (!dev->dma_mask)
+		return -EIO;
+
+	if (mask > dev->archdata.parent_dma_mask)
+		mask = dev->archdata.parent_dma_mask;
+
+	*dev->dma_mask = mask;
+
+	return 0;
+}
+
 static const struct dma_map_ops iommu_dma_ops = {
 	.alloc = __iommu_alloc_attrs,
 	.free = __iommu_free_attrs,
@@ -811,8 +826,21 @@ static void __iommu_unmap_sg_attrs(struct device *dev,
 	.map_resource = iommu_dma_map_resource,
 	.unmap_resource = iommu_dma_unmap_resource,
 	.mapping_error = iommu_dma_mapping_error,
+	.set_dma_mask = __iommu_set_dma_mask,
 };
 
+int dma_set_coherent_mask(struct device *dev, u64 mask)
+{
+	if (get_dma_ops(dev) == &iommu_dma_ops &&
+	    mask > dev->archdata.parent_dma_mask)
+		mask = dev->archdata.parent_dma_mask;
+
+	dev->coherent_dma_mask = mask;
+	return 0;
+}
+EXPORT_SYMBOL(dma_set_coherent_mask);
+
+
 /*
  * TODO: Right now __iommu_setup_dma_ops() gets called too early to do
  * everything it needs to - the device is only partially created and the
@@ -975,6 +1003,8 @@ void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
 	if (!dev->dma_ops)
 		dev->dma_ops = &swiotlb_dma_ops;
 
+	dev->archdata.parent_dma_mask = size - 1;
+
 	dev->archdata.dma_coherent = coherent;
 	__iommu_setup_dma_ops(dev, dma_base, size, iommu);
 }
diff --git a/drivers/of/device.c b/drivers/of/device.c
index d362a98..471dcdf 100644
--- a/drivers/of/devi
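The clamping rule this patch applies in __iommu_set_dma_mask and dma_set_coherent_mask can be stated in a few lines. The following is a user-space sketch of that arithmetic only, assuming the parent mask is derived as `size - 1` as in the arch_setup_dma_ops hunk. The helper names `parent_mask_from_size` and `clamp_dma_mask` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Mask for an inbound range of `size` bytes, i.e. `size - 1` as in the patch. */
static uint64_t parent_mask_from_size(uint64_t size)
{
	assert(size != 0);
	return size - 1;
}

/* Clamp the mask a driver requests to what the host bridge can address. */
static uint64_t clamp_dma_mask(uint64_t requested, uint64_t parent_mask)
{
	return requested > parent_mask ? parent_mask : requested;
}
```

With a size of 0x80 shifted left by 32 bits, a driver asking for a full 64-bit mask ends up with the host bridge limit, while a driver asking for a 32-bit mask keeps its own, smaller mask.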
[RFC PATCH 3/3] of: fix node traversing in of_dma_get_range
of_dma_get_range jumps to the parent node without examining the child
node. As a result, it throws "no dma-ranges found for node" for PCI
dma-ranges. This patch fixes the device node traversal for dma-ranges.

Reviewed-by: Anup Patel <anup.pa...@broadcom.com>
Signed-off-by: Oza Pawandeep <oza@broadcom.com>

diff --git a/drivers/of/address.c b/drivers/of/address.c
index 02b2903..3293d55 100644
--- a/drivers/of/address.c
+++ b/drivers/of/address.c
@@ -836,9 +836,6 @@ int of_dma_get_range(struct device_node *np, u64 *dma_addr, u64 *paddr, u64 *siz
 	while (1) {
 		naddr = of_n_addr_cells(node);
 		nsize = of_n_size_cells(node);
-		node = of_get_next_parent(node);
-		if (!node)
-			break;
 
 		ranges = of_get_property(node, "dma-ranges", &len);
 
@@ -852,6 +849,10 @@ int of_dma_get_range(struct device_node *np, u64 *dma_addr, u64 *paddr, u64 *siz
 		 */
 		if (!ranges)
 			break;
+
+		node = of_get_next_parent(node);
+		if (!node)
+			break;
 	}
 
 	if (!ranges) {
-- 
1.9.1
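To make the reordering concrete, here is a toy model of the walk after this patch: the starting node is examined first, an empty dma-ranges moves the search to the parent, and a missing property stops it. The `toy_node` type, the `dma_ranges_state` enum and `find_dma_ranges_node` are hypothetical and stand in for struct device_node and the property lookup. This is a sketch of the control flow, not the kernel code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical three-way state of the dma-ranges property on a node. */
enum dma_ranges_state { DR_MISSING, DR_EMPTY, DR_PRESENT };

/* Hypothetical toy node; the kernel uses struct device_node. */
struct toy_node {
	const char *name;
	const struct toy_node *parent;
	enum dma_ranges_state ranges;
};

/*
 * Return the first node, starting with `start` itself, that carries a
 * non-empty dma-ranges. A missing property ends the walk; an empty one
 * means "no translation here", so the parent is tried next.
 */
static const struct toy_node *find_dma_ranges_node(const struct toy_node *start)
{
	const struct toy_node *node = start;

	assert(start != NULL);
	while (node) {
		if (node->ranges == DR_PRESENT)
			return node;
		if (node->ranges == DR_MISSING)
			return NULL;
		node = node->parent;
	}
	return NULL;
}
```

Before the patch, the equivalent loop would have moved to the parent before the first lookup, so a host bridge node carrying its own dma-ranges was never inspected.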
[RFC PATCH 1/3] of/pci: dma-ranges to account highest possible host bridge dma_mask
It is possible that a PCI device supports 64-bit DMA addressing, and thus
its driver sets the device's dma_mask to DMA_BIT_MASK(64); however, the PCI
host bridge may have limitations on inbound transaction addressing. As an
example, consider an NVMe SSD device connected to the iproc-PCIe controller.

Currently, the IOMMU DMA ops only consider the PCI device dma_mask when
allocating an IOVA. This is particularly problematic on ARM/ARM64 SoCs
where the IOMMU (i.e. SMMU) translates IOVA to PA for inbound transactions
only after the PCI host has forwarded these transactions on the SoC IO bus.
This means that on such ARM/ARM64 SoCs the IOVA of inbound transactions has
to honor the addressing restrictions of the PCI host.

The current PCIe framework and OF framework integration assumes dma-ranges
in the form used by memory-mapped devices:

dma-ranges: (child-bus-address, parent-bus-address, length).

But iproc based SoCs and even Rcar based SoCs have PCI world dma-ranges:

dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;

of_dma_configure is specifically written to take care of memory-mapped
devices, but no implementation exists for PCI to take care of PCIe based
memory ranges. In fact, the PCI world does not seem to define a standard
dma-ranges binding.

This patch implements of_pci_get_dma_ranges to cater to PCI world
dma-ranges, so that the returned size yields the best possible (largest)
dma_mask. For example, with

dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;

we should get dev->coherent_dma_mask=0x7f.

Reviewed-by: Anup Patel <anup.pa...@broadcom.com>
Reviewed-by: Scott Branden <scott.bran...@broadcom.com>
Signed-off-by: Oza Pawandeep <oza@broadcom.com>
Signed-off-by: Oza Pawandeep <oza@broadcom.com>

diff --git a/drivers/of/device.c b/drivers/of/device.c
index b1e6beb..d362a98 100644
--- a/drivers/of/device.c
+++ b/drivers/of/device.c
@@ -9,6 +9,7 @@
 #include
 #include
 #include
+#include
 #include
 #include "of_private.h"
@@ -104,7 +105,11 @@ void of_dma_configure(struct device *dev, struct device_node *np)
 	if (!dev->dma_mask)
 		dev->dma_mask = &dev->coherent_dma_mask;
 
-	ret = of_dma_get_range(np, &dma_addr, &paddr, &size);
+	if (dev_is_pci(dev))
+		ret = of_pci_get_dma_ranges(np, &dma_addr, &paddr, &size);
+	else
+		ret = of_dma_get_range(np, &dma_addr, &paddr, &size);
+
 	if (ret < 0) {
 		dma_addr = offset = 0;
 		size = dev->coherent_dma_mask + 1;
diff --git a/drivers/of/of_pci.c b/drivers/of/of_pci.c
index 0ee42c3..c7f8626 100644
--- a/drivers/of/of_pci.c
+++ b/drivers/of/of_pci.c
@@ -283,6 +283,52 @@ int of_pci_get_host_bridge_resources(struct device_node *dev,
 	return err;
 }
 EXPORT_SYMBOL_GPL(of_pci_get_host_bridge_resources);
+
+int of_pci_get_dma_ranges(struct device_node *np, u64 *dma_addr, u64 *paddr, u64 *size)
+{
+	struct device_node *node = of_node_get(np);
+	int rlen, ret = 0;
+	const int na = 3, ns = 2;
+	struct of_pci_range_parser parser;
+	struct of_pci_range range;
+
+	if (!node)
+		return -EINVAL;
+
+	parser.node = node;
+	parser.pna = of_n_addr_cells(node);
+	parser.np = parser.pna + na + ns;
+
+	parser.range = of_get_property(node, "dma-ranges", &rlen);
+
+	if (!parser.range) {
+		pr_debug("pcie device has no dma-ranges defined for node(%s)\n", np->full_name);
+		ret = -ENODEV;
+		goto out;
+	}
+
+	parser.end = parser.range + rlen / sizeof(__be32);
+	*size = 0;
+
+	for_each_of_pci_range(&parser, &range) {
+		if (*size < range.size) {
+			*dma_addr = range.pci_addr;
+			*size = range.size;
+			*paddr = range.cpu_addr;
+		}
+	}
+
+	pr_debug("dma_addr(%llx) cpu_addr(%llx) size(%llx)\n",
+		 *dma_addr, *paddr, *size);
+		 *dma_addr = range.pci_addr;
+		 *size = range.size;
+
+out:
+	of_node_put(node);
+	return ret;
+
+}
+EXPORT_SYMBOL_GPL(of_pci_get_dma_ranges);
 #endif /* CONFIG_OF_ADDRESS */
 
 #ifdef CONFIG_PCI_MSI
diff --git a/include/linux/of_pci.h b/include/linux/of_pci.h
index 0e0974e..907ace0 100644
--- a/include/linux/of_pci.h
+++ b/include/linux/of_pci.h
@@ -76,6 +76,7 @@ static inline void of_pci_check_probe_only(void) { }
 int of_pci_get_host_bridge_resources(struct device_node *dev,
 			unsigned char busno, unsigned char bus_max,
 			struct list_head *resources, resource_size_t *io_base);
+int of_pci_get_dma_ranges(struct device_node *np, u64 *dma_addr, u64 *paddr, u64 *size);
 #else
 static inline int of_pci_get_host_bridge_resources(struct device_node *dev,
 			unsigned char busno, unsigned char bus_max,
@@ -83,6 +84,11 @@ static inline int of_pci_get_host_bridge_resources(struct device_no
[RFC PATCH] iommu/dma: account pci host bridge dma_mask for IOVA allocation
It is possible that a PCI device supports 64-bit DMA addressing, and thus
its driver sets the device's dma_mask to DMA_BIT_MASK(64); however, the PCI
host bridge may have limitations on inbound transaction addressing. As an
example, consider an NVMe SSD device connected to the iproc-PCIe controller.

Currently, the IOMMU DMA ops only consider the PCI device dma_mask when
allocating an IOVA. This is particularly problematic on ARM/ARM64 SoCs
where the IOMMU (i.e. SMMU) translates IOVA to PA for inbound transactions
only after the PCI host has forwarded these transactions on the SoC IO bus.
This means that on such ARM/ARM64 SoCs the IOVA of inbound transactions has
to honor the addressing restrictions of the PCI host.

This patch is inspired by
http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1306545.html
http://www.spinics.net/lists/arm-kernel/msg566947.html
but the above solves only half of the problem. The rest of the problem, as
we face it on iproc based SoCs, is described below.

The current PCIe framework and OF framework integration assumes dma-ranges
in the form used by memory-mapped devices:

dma-ranges: (child-bus-address, parent-bus-address, length).

But iproc based SoCs and even Rcar based SoCs have PCI world dma-ranges:

dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;

of_dma_configure is specifically written to take care of memory-mapped
devices, but no implementation exists for PCI to take care of PCIe based
memory ranges. In fact, the PCI world does not seem to define a standard
dma-ranges binding. Since such a definition is absent, the dma_mask used to
remain 32-bit because a size of 0 was returned (as parsed by
of_dma_configure()).

This patch also implements of_pci_get_dma_ranges to cater to PCI world
dma-ranges, so that the returned size yields the best possible (largest)
dma_mask. For example, with

dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;

we should get dev->coherent_dma_mask=0x7f.

Conclusion: there are the following problems.

1) Linux PCI and IOMMU framework integration has glitches with respect to
   dma-ranges.
2) The Linux PCI framework looks very uncertain about dma-ranges; the
   binding is not defined the way it is defined for memory-mapped devices.
   Rcar and iproc based SoCs use their own custom dma-ranges (which could
   rather be standard).
3) Even in the case of the default parser, of_dma_get_range, it throws the
   error "no dma-ranges found for node" because of an existing bug. The
   following lines should be moved to the end of the while(1) loop:
   839 node = of_get_next_parent(node);
   840 if (!node)
   841 break;

Reviewed-by: Anup Patel <anup.pa...@broadcom.com>
Reviewed-by: Scott Branden <scott.bran...@broadcom.com>
Signed-off-by: Oza Pawandeep <oza@broadcom.com>

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 8c7c244..20cfff7 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -217,6 +217,9 @@ config NEED_DMA_MAP_STATE
 config NEED_SG_DMA_LENGTH
 	def_bool y
 
+config ARCH_HAS_DMA_SET_COHERENT_MASK
+	def_bool y
+
 config SMP
 	def_bool y
 
diff --git a/arch/arm64/include/asm/device.h b/arch/arm64/include/asm/device.h
index 73d5bab..64b4dc3 100644
--- a/arch/arm64/include/asm/device.h
+++ b/arch/arm64/include/asm/device.h
@@ -20,6 +20,7 @@ struct dev_archdata {
 #ifdef CONFIG_IOMMU_API
 	void *iommu;	/* private IOMMU data */
 #endif
+	u64 parent_dma_mask;
 	bool dma_coherent;
 };
 
diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
index 81cdb2e..5845ecd 100644
--- a/arch/arm64/mm/dma-mapping.c
+++ b/arch/arm64/mm/dma-mapping.c
@@ -564,6 +564,7 @@ static void flush_page(struct device *dev, const void *virt, phys_addr_t phys)
 	__dma_flush_area(virt, PAGE_SIZE);
 }
 
+
 static void *__iommu_alloc_attrs(struct device *dev, size_t size,
 				 dma_addr_t *handle, gfp_t gfp,
 				 unsigned long attrs)
@@ -795,6 +796,20 @@ static void __iommu_unmap_sg_attrs(struct device *dev,
 	iommu_dma_unmap_sg(dev, sgl, nelems, dir, attrs);
 }
 
+static int __iommu_set_dma_mask(struct device *dev, u64 mask)
+{
+	/* device is not DMA capable */
+	if (!dev->dma_mask)
+		return -EIO;
+
+	if (mask > dev->archdata.parent_dma_mask)
+		mask = dev->archdata.parent_dma_mask;
+
+	*dev->dma_mask = mask;
+
+	return 0;
+}
+
 static const struct dma_map_ops iommu_dma_ops = {
 	.alloc = __iommu_alloc_attrs,
 	.free = __iommu_free_attrs,
@@ -811,8 +826,21 @@ static void __iommu_unmap_sg_attrs(struct device *dev,
 	.map_resource = iommu_dma_map_resource,
 	.unmap_resource = iommu_dma_unmap_resource,
 	.mapping_error = iommu_dma_mapping_error,
+	.set_dma_mask = __iommu_set_dma_mask,
 };
 
+int dma_set_coherent_mask(struct device *dev, u64 mask)
[RFC PATCH] iommu/dma: account pci host bridge dma_mask for IOVA allocation
It is possible that PCI device supports 64-bit DMA addressing, and thus it's driver sets device's dma_mask to DMA_BIT_MASK(64), however PCI host bridge may have limitations on the inbound transaction addressing. As an example, consider NVME SSD device connected to iproc-PCIe controller. Currently, the IOMMU DMA ops only considers PCI device dma_mask when allocating an IOVA. This is particularly problematic on ARM/ARM64 SOCs where the IOMMU (i.e. SMMU) translates IOVA to PA for in-bound transactions only after PCI Host has forwarded these transactions on SOC IO bus. This means on such ARM/ARM64 SOCs the IOVA of in-bound transactions has to honor the addressing restrictions of the PCI Host. this patch is inspired by http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1306545.html http://www.spinics.net/lists/arm-kernel/msg566947.html but above inspiraiton solves the half of the problem. the rest of the problem is descrbied below, what we face on iproc based SOCs. current pcie frmework and of framework integration assumes dma-ranges in a way where memory-mapped devices define their dma-ranges. dma-ranges: (child-bus-address, parent-bus-address, length). but iproc based SOCs and even Rcar based SOCs has PCI world dma-ranges. dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>; of_dma_configure is specifically witten to take care of memory mapped devices. but no implementation exists for pci to take care of pcie based memory ranges. in fact pci world doesnt seem to define standard dma-ranges since there is an absense of the same, the dma_mask used to remain 32bit because of 0 size return (parsed by of_dma_configure()) this patch also implements of_pci_get_dma_ranges to cater to pci world dma-ranges. so then the returned size get best possible (largest) dma_mask. for e.g. dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>; we should get dev->coherent_dma_mask=0x7f. 
Conclusion: there are the following problems:
1) Linux PCI and IOMMU framework integration has glitches with respect to dma-ranges.
2) The Linux PCI framework looks very uncertain about dma-ranges; the binding is not defined the way it is defined for memory-mapped devices. Rcar and iproc based SOCs use their own custom dma-ranges (which could rather be standard).
3) Even in the case of the default parser, of_dma_get_range, it throws the error "no dma-ranges found for node" because of a bug which exists there. The following lines should be moved to the end of the while(1) loop:
839 node = of_get_next_parent(node);
840 if (!node)
841 break;

Reviewed-by: Anup Patel
Reviewed-by: Scott Branden
Signed-off-by: Oza Pawandeep

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 8c7c244..20cfff7 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -217,6 +217,9 @@ config NEED_DMA_MAP_STATE
 config NEED_SG_DMA_LENGTH
 	def_bool y

+config ARCH_HAS_DMA_SET_COHERENT_MASK
+	def_bool y
+
 config SMP
 	def_bool y

diff --git a/arch/arm64/include/asm/device.h b/arch/arm64/include/asm/device.h
index 73d5bab..64b4dc3 100644
--- a/arch/arm64/include/asm/device.h
+++ b/arch/arm64/include/asm/device.h
@@ -20,6 +20,7 @@ struct dev_archdata {
 #ifdef CONFIG_IOMMU_API
 	void *iommu;			/* private IOMMU data */
 #endif
+	u64 parent_dma_mask;
 	bool dma_coherent;
 };

diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
index 81cdb2e..5845ecd 100644
--- a/arch/arm64/mm/dma-mapping.c
+++ b/arch/arm64/mm/dma-mapping.c
@@ -564,6 +564,7 @@ static void flush_page(struct device *dev, const void *virt, phys_addr_t phys)
 	__dma_flush_area(virt, PAGE_SIZE);
 }

+
 static void *__iommu_alloc_attrs(struct device *dev, size_t size,
 				 dma_addr_t *handle, gfp_t gfp,
 				 unsigned long attrs)
@@ -795,6 +796,20 @@ static void __iommu_unmap_sg_attrs(struct device *dev,
 	iommu_dma_unmap_sg(dev, sgl, nelems, dir, attrs);
 }

+static int __iommu_set_dma_mask(struct device *dev, u64 mask)
+{
+	/* device is not DMA capable */
+	if (!dev->dma_mask)
+		return -EIO;
+
+	if (mask > dev->archdata.parent_dma_mask)
+		mask = dev->archdata.parent_dma_mask;
+
+	*dev->dma_mask = mask;
+
+	return 0;
+}
+
 static const struct dma_map_ops iommu_dma_ops = {
 	.alloc = __iommu_alloc_attrs,
 	.free = __iommu_free_attrs,
@@ -811,8 +826,21 @@ static void __iommu_unmap_sg_attrs(struct device *dev,
 	.map_resource = iommu_dma_map_resource,
 	.unmap_resource = iommu_dma_unmap_resource,
 	.mapping_error = iommu_dma_mapping_error,
+	.set_dma_mask = __iommu_set_dma_mask,
 };

+int dma_set_coherent_mask(struct device *dev, u64 mask)
+{
+	if (get_dma_ops(dev) == &iommu_dma_ops &&
+	    mask > d
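The relationship between the dma-ranges size and the resulting mask, and the clamping that __iommu_set_dma_mask() performs, can be modelled in plain userspace C. This is only a sketch under assumptions: the helper names (cells_to_u64, ilog2_u64, size_to_dma_mask, clamp_dma_mask) are hypothetical and not part of the patch, the last two cells of the example dma-ranges (0x80 0x00) are assumed to be the high and low halves of a 64-bit size, and a size of 0 is assumed to mean "no restriction".

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: join two 32-bit device-tree cells, high cell first. */
static uint64_t cells_to_u64(uint32_t hi, uint32_t lo)
{
	return ((uint64_t)hi << 32) | lo;
}

/* floor(log2(v)) for v > 0, written out so the sketch has no dependencies. */
static unsigned int ilog2_u64(uint64_t v)
{
	unsigned int bits = 0;

	assert(v != 0);
	while (v >>= 1)
		bits++;
	return bits;
}

/*
 * Hypothetical helper: mask covering the largest power-of-two window
 * that fits in 'size'. A size of 0 is treated as "no restriction",
 * which is an assumption of this sketch.
 */
static uint64_t size_to_dma_mask(uint64_t size)
{
	if (size == 0)
		return UINT64_MAX;
	return (UINT64_C(1) << ilog2_u64(size)) - 1;
}

/* Same clamping idea as __iommu_set_dma_mask(): never exceed the parent. */
static uint64_t clamp_dma_mask(uint64_t requested, uint64_t parent)
{
	return requested > parent ? parent : requested;
}
```

Under these assumptions, a driver asking for a full 64-bit mask behind a bridge with a 0x80_0000_0000 byte inbound window would end up with a 39-bit mask, so IOVA allocation stays inside what the host bridge can forward.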
[RFC PATCH] iommu/dma: check pci host bridge dma_mask for IOVA allocation
It is possible that a PCI device supports 64-bit DMA addressing, and thus its driver sets the device's dma_mask to DMA_BIT_MASK(64); however, the PCI host bridge may have limitations on inbound transaction addressing. As an example, consider an NVMe SSD connected to the iproc-PCIe controller.

Currently, the IOMMU DMA ops only consider the PCI device dma_mask when allocating an IOVA. This is particularly problematic on ARM/ARM64 SOCs where the IOMMU (i.e. SMMU) translates IOVA to PA for in-bound transactions only after the PCI Host has forwarded these transactions on the SOC IO bus. This means that on such ARM/ARM64 SOCs the IOVA of in-bound transactions has to honor the addressing restrictions of the PCI Host.

This patch tries to solve the IOVA allocation problem described above by:
1. Adding iommu_get_dma_mask() to get the dma_mask of any device.
2. For a PCI device, having iommu_get_dma_mask() compare the dma_mask of the PCI device with the corresponding PCI Host dma_mask (if set).
3. Using iommu_get_dma_mask() in the IOMMU DMA ops implementation instead of dma_get_mask().

Signed-off-by: Oza Pawandeep <oza@broadcom.com>
Reviewed-by: Anup Patel <anup.pa...@broadcom.com>
Reviewed-by: Scott Branden <scott.bran...@broadcom.com>
---
 drivers/iommu/dma-iommu.c | 44
 1 file changed, 40 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 48d36ce..e93e536 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -108,6 +108,42 @@ int iommu_get_dma_cookie(struct iommu_domain *domain)
 }
 EXPORT_SYMBOL(iommu_get_dma_cookie);

+static u64 __iommu_dma_mask(struct device *dev, bool is_coherent)
+{
+#ifdef CONFIG_PCI
+	u64 pci_hb_dma_mask;
+
+	if (dev_is_pci(dev)) {
+		struct pci_dev *pdev = to_pci_dev(dev);
+		struct pci_host_bridge *br = pci_find_host_bridge(pdev->bus);
+
+		if ((!is_coherent) && !(br->dev.dma_mask))
+			goto default_dev_dma_mask;
+
+		/* pci host bridge dma-mask. */
+		pci_hb_dma_mask = (!is_coherent) ? *br->dev.dma_mask :
+						   br->dev.coherent_dma_mask;
+
+		if (pci_hb_dma_mask && ((pci_hb_dma_mask) < (*dev->dma_mask)))
+			return pci_hb_dma_mask;
+	}
+default_dev_dma_mask:
+#endif
+	return (!is_coherent) ? dma_get_mask(dev) :
+				dev->coherent_dma_mask;
+}
+
+static u64 __iommu_dma_get_coherent_mask(struct device *dev)
+{
+	return __iommu_dma_mask(dev, true);
+}
+
+static u64 __iommu_dma_get_mask(struct device *dev)
+{
+	return __iommu_dma_mask(dev, false);
+}
+
+
 /**
  * iommu_get_msi_cookie - Acquire just MSI remapping resources
  * @domain: IOMMU domain to prepare
@@ -461,7 +497,7 @@ struct page **iommu_dma_alloc(struct device *dev, size_t size, gfp_t gfp,
 	if (!pages)
 		return NULL;

-	iova = __alloc_iova(domain, size, dev->coherent_dma_mask, dev);
+	iova = __alloc_iova(domain, size, __iommu_dma_get_coherent_mask(dev), dev);
 	if (!iova)
 		goto out_free_pages;

@@ -532,7 +568,7 @@ static dma_addr_t __iommu_dma_map(struct device *dev, phys_addr_t phys,
 	struct iova_domain *iovad = cookie_iovad(domain);
 	size_t iova_off = iova_offset(iovad, phys);
 	size_t len = iova_align(iovad, size + iova_off);
-	struct iova *iova = __alloc_iova(domain, len, dma_get_mask(dev), dev);
+	struct iova *iova = __alloc_iova(domain, len, __iommu_dma_get_mask(dev), dev);

 	if (!iova)
 		return DMA_ERROR_CODE;
@@ -690,7 +726,7 @@ int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg,
 		prev = s;
 	}

-	iova = __alloc_iova(domain, iova_len, dma_get_mask(dev), dev);
+	iova = __alloc_iova(domain, iova_len, __iommu_dma_get_mask(dev), dev);
 	if (!iova)
 		goto out_restore_sg;

@@ -760,7 +796,7 @@ static struct iommu_dma_msi_page *iommu_dma_get_msi_page(struct device *dev,
 	msi_page->phys = msi_addr;
 	if (iovad) {
-		iova = __alloc_iova(domain, size, dma_get_mask(dev), dev);
+		iova = __alloc_iova(domain, size, __iommu_dma_get_mask(dev), dev);
 		if (!iova)
 			goto out_free_page;
 		msi_page->iova = iova_dma_addr(iovad, iova);
--
1.9.1
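The mask selection this RFC describes (use the host bridge mask when it is set and narrower than the device mask) can be sketched as a small userspace model. Everything here is hypothetical: struct model_dev stands in for the DMA fields of struct device, and model_get_mask() and model_effective_mask() are illustration names, not kernel functions. The model also makes two simplifying assumptions that differ from the posted code: an unset streaming mask falls back to 32 bits, and the coherent path compares the bridge mask against the device's coherent mask rather than against *dev->dma_mask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_DMA_BIT_MASK_32 0xffffffffULL

/* Hypothetical stand-in for the DMA-related fields of struct device. */
struct model_dev {
	const uint64_t *dma_mask;	/* NULL when no streaming mask is set */
	uint64_t coherent_dma_mask;	/* 0 when unset */
};

/* Streaming mask, with a 32-bit fallback (an assumption of this model). */
static uint64_t model_get_mask(const struct model_dev *dev)
{
	if (dev->dma_mask && *dev->dma_mask)
		return *dev->dma_mask;
	return MODEL_DMA_BIT_MASK_32;
}

/*
 * Pick the mask to use for IOVA allocation: the device's own mask,
 * narrowed to the host bridge mask when the bridge has one and it is
 * smaller. 'bridge' may be NULL for non-PCI devices.
 */
static uint64_t model_effective_mask(const struct model_dev *dev,
				     const struct model_dev *bridge,
				     bool coherent)
{
	uint64_t dev_mask = coherent ? dev->coherent_dma_mask
				     : model_get_mask(dev);
	uint64_t br_mask = 0;

	if (bridge) {
		if (coherent)
			br_mask = bridge->coherent_dma_mask;
		else if (bridge->dma_mask)
			br_mask = *bridge->dma_mask;
	}

	/* A zero bridge mask means "no restriction known". */
	if (br_mask && br_mask < dev_mask)
		return br_mask;
	return dev_mask;
}
```

In this model a 64-bit capable endpoint behind a bridge limited to 39 bits gets a 39-bit effective mask, while the same endpoint behind a bridge with no mask set keeps its own.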
RE: [KERNEL BUG] do_timer/tick_handover_do_timer 3.10.17
It seems odd to me to use BUG() for what you appear to be using it for.. not that I know exactly what that is, mind you, but when you said when some other gizmo in your box has a problem you crash the kernel, my head tilted to the side - surely there's a more controlled response possible than poking the big red self destruct button ;-)

Oza: We have to keep the red button as our last resort; if we don't press it, the moment passes and we miss the point where we can go back and debug. So that is by design.

Regards,
-Oza

-----Original Message-----
From: Mike Galbraith [mailto:umgwanakikb...@gmail.com]
Sent: Friday, May 08, 2015 10:42 AM
To: Oza (Pawandeep) Oza
Cc: pawandeep oza; linux-kernel@vger.kernel.org; malayasen rout
Subject: Re: [KERNEL BUG] do_timer/tick_handover_do_timer 3.10.17

On Fri, 2015-05-08 at 04:16 +0000, Oza (Pawandeep) Oza wrote:
> So Mike, is this reason strong enough for you ?

Nope. I think you did the right thing in removing your dependency on jiffies reliability in a dying box. You don't have to convince me of anything though, CC timer subsystem maintainer, see what he says.

> I understand your point: solve the BUG, and I do tend to agree with you.
>
> But by design and implementation, the BUG() is just a beginning of the end
> for dying kernel.
> And what happens in between this 'the beginning' and 'the end' is not less
> important.
> (because say, on our platform we want to get clean RAMDUMP to analyze what
> happened, and for that we want to get clean reboot)

I don't see anybody else having any trouble getting crash dumps. I spent yet another long day just yesterday, rummaging through one.

> Also,
> If somebody's design is to legally Crash the kernel (e.g. where kernel is
> actually not faulty).
> Then, I do expect that tick/timekeeping framework do its job as long as it
> can do, and it should do, because kernel is not faulty.
> But in this case it doesn’t handover jiffies incrementing job sanely.

It seems odd to me to use BUG() for what you appear to be using it for.. not that I know exactly what that is, mind you, but when you said when some other gizmo in your box has a problem you crash the kernel, my head tilted to the side - surely there's a more controlled response possible than poking the big red self destruct button ;-)

-Mike
RE: [KERNEL BUG] do_timer/tick_handover_do_timer 3.10.17
So Mike, is this reason strong enough for you?

I understand your point: solve the BUG, and I do tend to agree with you.

But by design and implementation, the BUG() is just the beginning of the end for a dying kernel. And what happens in between 'the beginning' and 'the end' is no less important (because, say, on our platform we want to get a clean RAMDUMP to analyze what happened, and for that we want to get a clean reboot).

Also, if somebody's design is to legally crash the kernel (e.g. where the kernel is actually not faulty), then I do expect the tick/timekeeping framework to do its job for as long as it can, and it should, because the kernel is not faulty. But in this case it doesn't hand over the jiffies incrementing job sanely.

In other words, "no one can rely on jiffies, or rather the code which is based on jiffies will never make forward progress in this path".

Regards,
-Oza

-----Original Message-----
From: Oza (Pawandeep) Oza
Sent: Thursday, May 07, 2015 2:17 PM
To: 'Mike Galbraith'
Cc: pawandeep oza; linux-kernel@vger.kernel.org; malayasen rout
Subject: RE: [KERNEL BUG] do_timer/tick_handover_do_timer 3.10.17

Oh ok. So the reason why I cared was:

There is code in our base which relies on jiffies, but since jiffies are not incrementing, the code waits there and loops forever, and forward progress is halted (on cpu0, since that is the only cpu which is alive).

We have changed the code to use mdelay and things move on.

But that means that in the path which I mentioned, any code which relies on jiffies will be stuck forever and will not allow the rest of the code to get executed, and hence there is no forward progress, especially if that code is running with preempt_disable();

Regards,
-Oza

-----Original Message-----
From: Mike Galbraith [mailto:umgwanakikb...@gmail.com]
Sent: Thursday, May 07, 2015 2:00 PM
To: Oza (Pawandeep) Oza
Cc: pawandeep oza; linux-kernel@vger.kernel.org; malayasen rout
Subject: Re: [KERNEL BUG] do_timer/tick_handover_do_timer 3.10.17

On Thu, 2015-05-07 at 07:05 +0000, Oza (Pawandeep) Oza wrote:
> : )
>
> Well, I am not sure, if problem was communicated clearly from my side.

I understood. I just don't understand why you'd care deeply whether CPU0 halts or eternally waits. Both render it harmless and useless.

-Mike
RE: [KERNEL BUG] do_timer/tick_handover_do_timer 3.10.17
Oh ok. So the reason why I cared was:

There is code in our base which relies on jiffies, but since jiffies are not incrementing, the code waits there and loops forever, and forward progress is halted (on cpu0, since that is the only cpu which is alive).

We have changed the code to use mdelay and things move on.

But that means that in the path which I mentioned, any code which relies on jiffies will be stuck forever and will not allow the rest of the code to get executed, and hence there is no forward progress, especially if that code is running with preempt_disable();

Regards,
-Oza

-----Original Message-----
From: Mike Galbraith [mailto:umgwanakikb...@gmail.com]
Sent: Thursday, May 07, 2015 2:00 PM
To: Oza (Pawandeep) Oza
Cc: pawandeep oza; linux-kernel@vger.kernel.org; malayasen rout
Subject: Re: [KERNEL BUG] do_timer/tick_handover_do_timer 3.10.17

On Thu, 2015-05-07 at 07:05 +0000, Oza (Pawandeep) Oza wrote:
> : )
>
> Well, I am not sure, if problem was communicated clearly from my side.

I understood. I just don't understand why you'd care deeply whether CPU0 halts or eternally waits. Both render it harmless and useless.

-Mike
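The hang described in this message comes from a wait loop whose only exit condition is a tick counter that has stopped advancing. A minimal userspace sketch of a wait that also carries its own poll budget is shown below. It is a model, not kernel code: wait_ticks_bounded() and its parameters are hypothetical names, the tick counter is an ordinary variable standing in for jiffies, and the place where a real driver would busy-wait (for example with a delay primitive such as the mdelay mentioned above) is only marked by a comment.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Wait until '*ticks' has advanced by at least 'timeout', but give up
 * after 'max_polls' iterations so that a stalled counter cannot hang
 * the caller. Returns true if the deadline was reached, false if the
 * poll budget ran out first.
 */
static bool wait_ticks_bounded(const volatile unsigned long *ticks,
			       unsigned long timeout,
			       unsigned long max_polls)
{
	unsigned long deadline = *ticks + timeout;
	unsigned long polls;

	for (polls = 0; polls < max_polls; polls++) {
		/* Wrap-safe "now >= deadline" comparison. */
		if ((long)(*ticks - deadline) >= 0)
			return true;
		/* A real driver would insert a fixed busy-wait delay here. */
	}
	return false;
}
```

With a counter that never moves, the function returns false once the budget is spent instead of spinning forever, which is the forward-progress property the message is asking for.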
RE: [KERNEL BUG] do_timer/tick_handover_do_timer 3.10.17
Mike,

Here is the code which will explain what I meant to address. This is just a WARN_ON for the case where all cpus other than this cpu are offline and, at the same time, tick_do_timer_cpu is not set correctly.

Note: this patch is just to put forward the problem (not an actual patch).

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 9142591..3aa4c8c 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -112,6 +112,7 @@ static ktime_t tick_init_jiffy_update(void)
 static void tick_sched_do_timer(ktime_t now)
 {
 	int cpu = smp_processor_id();
+	int other_cpu, is_cpu_online = 0;

 #ifdef CONFIG_NO_HZ_COMMON
 	/*
@@ -125,6 +126,11 @@ static void tick_sched_do_timer(ktime_t now)
 	    && !tick_nohz_full_cpu(cpu))
 		tick_do_timer_cpu = cpu;
 #endif
+	for (other_cpu = 0; other_cpu < nr_cpu_ids; other_cpu++) {
+		if (other_cpu != cpu)
+			is_cpu_online += cpu_online(other_cpu);
+	}
+	WARN_ON((tick_do_timer_cpu != cpu) && !is_cpu_online);

 	/* Check, if the jiffies need an update */
 	if (tick_do_timer_cpu == cpu)

Regards,
-Oza

-----Original Message-----
From: Oza (Pawandeep) Oza
Sent: Thursday, May 07, 2015 12:36 PM
To: 'Mike Galbraith'
Cc: pawandeep oza; linux-kernel@vger.kernel.org; malayasen rout
Subject: RE: [KERNEL BUG] do_timer/tick_handover_do_timer 3.10.17

: )

Well, I am not sure if the problem was communicated clearly from my side. Let me attempt it again.

If the variable tick_do_timer_cpu = 0, things are fine. If it is some other value, say 1, 2 or 3, then core0 does not increment jiffies (but if, say, tick_do_timer_cpu=1, then core1 will increment jiffies).

If cpu1, 2 and 3 are sent smp_send_stop, then as a result cpu1, 2 and 3 will be stopped. Now only cpu0 is alive, and cpu0 should increment jiffies upon each timer tick. For that, tick_do_timer_cpu should be set to 0, which is not happening.

Regards,
-Oza

-----Original Message-----
From: Mike Galbraith [mailto:umgwanakikb...@gmail.com]
Sent: Thursday, May 07, 2015 12:25 PM
To: Oza (Pawandeep) Oza
Cc: pawandeep oza; linux-kernel@vger.kernel.org; malayasen rout
Subject: Re: [KERNEL BUG] do_timer/tick_handover_do_timer 3.10.17

On Thu, 2015-05-07 at 05:58 +0000, Oza (Pawandeep) Oza wrote:
> Yes.
> But dying kernel doesn’t mean it CAN NOT INCREMENT jiffies.
> do_timer should do the job until kernel takes its last breathe and more
> precisely CPU0 take its last breathe by halting itself as its last
> instruction.

Feel free to add a redundant timer subsystem lest we BUG() in there, and whatever else you need to guarantee a perfect orderly death for your box. I prefer live boxen, would make that BUG() go away.

-Mike
RE: [KERNEL BUG] do_timer/tick_handover_do_timer 3.10.17
: )

Well, I am not sure if the problem was communicated clearly from my side. Let me attempt it again.

If the variable tick_do_timer_cpu = 0, things are fine. If it is some other value, say 1, 2 or 3, then core0 does not increment jiffies (but if, say, tick_do_timer_cpu=1, then core1 will increment jiffies).

If cpu1, 2 and 3 are sent smp_send_stop, then as a result cpu1, 2 and 3 will be stopped. Now only cpu0 is alive, and cpu0 should increment jiffies upon each timer tick. For that, tick_do_timer_cpu should be set to 0, which is not happening.

Regards,
-Oza

-----Original Message-----
From: Mike Galbraith [mailto:umgwanakikb...@gmail.com]
Sent: Thursday, May 07, 2015 12:25 PM
To: Oza (Pawandeep) Oza
Cc: pawandeep oza; linux-kernel@vger.kernel.org; malayasen rout
Subject: Re: [KERNEL BUG] do_timer/tick_handover_do_timer 3.10.17

On Thu, 2015-05-07 at 05:58 +0000, Oza (Pawandeep) Oza wrote:
> Yes.
> But dying kernel doesn’t mean it CAN NOT INCREMENT jiffies.
> do_timer should do the job until kernel takes its last breathe and more
> precisely CPU0 take its last breathe by halting itself as its last
> instruction.

Feel free to add a redundant timer subsystem lest we BUG() in there, and whatever else you need to guarantee a perfect orderly death for your box. I prefer live boxen, would make that BUG() go away.

-Mike
RE: [KERNEL BUG] do_timer/tick_handover_do_timer 3.10.17
Yes.

But a dying kernel doesn't mean it CAN NOT INCREMENT jiffies. do_timer should do the job until the kernel takes its last breath, or more precisely until CPU0 takes its last breath by halting itself as its last instruction.

Regards,
-Oza

-----Original Message-----
From: Mike Galbraith [mailto:umgwanakikb...@gmail.com]
Sent: Thursday, May 07, 2015 11:25 AM
To: Oza (Pawandeep) Oza
Cc: pawandeep oza; linux-kernel@vger.kernel.org; malayasen rout
Subject: Re: [KERNEL BUG] do_timer/tick_handover_do_timer 3.10.17

On Thu, 2015-05-07 at 05:12 +0000, Oza (Pawandeep) Oza wrote:
> But after Crash, jiffies do not increment.

Your kernel said "I'M DEAD", that's a good reason to believe it.

-Mike
RE: [KERNEL BUG] do_timer/tick_handover_do_timer 3.10.17
Solution Statement: Fix the UTTERLY DEADLY bug.

Oza: that BUG() is LEGAL. The kernel is not the problem there. Somebody else outside of the kernel/ARM (some other HW) raises the bug and sends an indication to the kernel that it is not alive, so the kernel chooses to CRASH ITSELF. So that is a legal crash and a wanted crash.

But after the crash, jiffies do not increment.

Regards,
-Oza

-----Original Message-----
From: Mike Galbraith [mailto:umgwanakikb...@gmail.com]
Sent: Thursday, May 07, 2015 10:39 AM
To: Oza (Pawandeep) Oza
Cc: pawandeep oza; linux-kernel@vger.kernel.org; malayasen rout
Subject: Re: [KERNEL BUG] do_timer/tick_handover_do_timer 3.10.17

On Thu, 2015-05-07 at 04:37 +0000, Oza (Pawandeep) Oza wrote:
> Problem Statement: the timkeeping is stopped, do_timer is no more a
> job of cpu0.
>
> The reason: the variable "tick_do_timer_cpu" is not set to correct CPU
> (cpu0)
> And when BUG() happens, the tick_do_timer_cpu variable stay set to 1,
> 2 or 3 (we have 4 cores)

Solution Statement: Fix the UTTERLY DEADLY bug.

-Mike
RE: [KERNEL BUG] do_timer/tick_handover_do_timer 3.10.17
Hi Mike,

Let me explain the problem again.

Problem Statement: timekeeping is stopped; do_timer is no longer a job of cpu0.

The reason: the variable "tick_do_timer_cpu" is not set to the correct CPU (cpu0). When BUG() happens, the tick_do_timer_cpu variable stays set to 1, 2 or 3 (we have 4 cores). And finally, any code running on core0 which relies on jiffies incrementing doesn't work, because there is nobody to increment jiffies.

There is tick_handover_do_timer, and if that is called then things are fine, but it is not getting called because it is tightly coupled with hotplug. Since cpu_down is not getting called, this handover is not happening, and the last state of the variable tick_do_timer_cpu always points to a DEAD cpu (1, 2 or 3). So core0 waits forever (if the code relies on the increment of jiffies).

Regards,
-Oza

-----Original Message-----
From: Mike Galbraith [mailto:umgwanakikb...@gmail.com]
Sent: Thursday, May 07, 2015 8:53 AM
To: pawandeep oza
Cc: linux-kernel@vger.kernel.org; malayasen rout; Oza (Pawandeep) Oza
Subject: Re: [KERNEL BUG] do_timer/tick_handover_do_timer 3.10.17

On Wed, 2015-05-06 at 22:57 +0530, pawandeep oza wrote:
> but when say core0 has raised BUG..

...

> what is the right way to approach this problem

Look at the spot BUG() printed? BUG() means "Way to go slick, the code you fed me (file:line) is toxic. Have a nice day, your ex-buddy core0".

-Mike
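The ownership problem described in this message can be reproduced with a toy userspace model. This is a sketch under stated assumptions, not the kernel's implementation: struct tick_model and the model_* functions are hypothetical names, a single integer plays the role of tick_do_timer_cpu, and stopping a CPU with handover=false is meant to mimic CPUs being stopped without going through the hotplug path that would hand the duty over.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 4
#define MODEL_TIMER_NONE (-1)

/* Toy model: one CPU "owns" the duty of incrementing jiffies. */
struct tick_model {
	int do_timer_cpu;		/* stands in for tick_do_timer_cpu */
	bool online[MODEL_NR_CPUS];
	unsigned long jiffies;
};

static void model_init(struct tick_model *m, int owner)
{
	int cpu;

	for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		m->online[cpu] = true;
	m->do_timer_cpu = owner;
	m->jiffies = 0;
}

/* A timer tick on 'cpu': only the owning, online CPU advances jiffies. */
static void model_tick(struct tick_model *m, int cpu)
{
	if (m->online[cpu] && m->do_timer_cpu == cpu)
		m->jiffies++;
}

/*
 * Take 'cpu' offline. With handover, the duty moves to the first CPU
 * still online; without it, the owner field keeps pointing at a dead CPU.
 */
static void model_stop_cpu(struct tick_model *m, int cpu, bool handover)
{
	int next;

	m->online[cpu] = false;
	if (!handover || m->do_timer_cpu != cpu)
		return;

	m->do_timer_cpu = MODEL_TIMER_NONE;
	for (next = 0; next < MODEL_NR_CPUS; next++) {
		if (m->online[next]) {
			m->do_timer_cpu = next;
			break;
		}
	}
}
```

Starting with CPU 1 as owner and stopping CPUs 1 to 3 without handover leaves CPU 0 ticking while jiffies stays at zero; repeating the run with handover moves the duty to CPU 0 and jiffies advances on every tick.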
RE: [KERNEL BUG] do_timer/tick_handover_do_timer 3.10.17
Yes. But dying kernel doesn’t mean it CAN NOT INCREMENT jiffies. do_timer should do the job until kernel takes its last breathe and more precisely CPU0 take its last breathe by halting itself as its last instruction. Regards, -Oza -Original Message- From: Mike Galbraith [mailto:umgwanakikb...@gmail.com] Sent: Thursday, May 07, 2015 11:25 AM To: Oza (Pawandeep) Oza Cc: pawandeep oza; linux-kernel@vger.kernel.org; malayasen rout Subject: Re: [KERNEL BUG] do_timer/tick_handover_do_timer 3.10.17 On Thu, 2015-05-07 at 05:12 +, Oza (Pawandeep) Oza wrote: But after Crash, jiffies do not increment. Your kernel said I'M DEAD, that's a good reason to believe it. -Mike
RE: [KERNEL BUG] do_timer/tick_handover_do_timer 3.10.17
> Solution Statement: Fix the UTTERLY DEADLY bug.

Oza: That BUG() is LEGAL. The kernel is not the problem there. Something outside of the kernel/ARM (some other HW) raises the bug and sends an indication to the kernel that it is not alive, so the kernel chooses to CRASH ITSELF. That is a legal and wanted crash. But after the crash, jiffies do not increment.

Regards,
-Oza

-----Original Message-----
From: Mike Galbraith [mailto:umgwanakikb...@gmail.com]
Sent: Thursday, May 07, 2015 10:39 AM
To: Oza (Pawandeep) Oza
Cc: pawandeep oza; linux-kernel@vger.kernel.org; malayasen rout
Subject: Re: [KERNEL BUG] do_timer/tick_handover_do_timer 3.10.17

On Thu, 2015-05-07 at 04:37 +, Oza (Pawandeep) Oza wrote:
> Problem Statement: the timekeeping is stopped, do_timer is no more a job of cpu0.
> The reason: the variable tick_do_timer_cpu is not set to correct CPU (cpu0)
> And when BUG() happens, the tick_do_timer_cpu variable stay set to 1, 2 or 3 (we have 4 cores)

Solution Statement: Fix the UTTERLY DEADLY bug.

-Mike