Re: [PATCH 1/5 V5] Add a function(ioremap_encrypted) for kdump when AMD sme enabled

2018-07-19 Thread Dave Young
On 07/20/18 at 01:06pm, lijiang wrote:
> On 2018-07-14 at 01:08, Borislav Petkov wrote:
> > On Mon, Jul 09, 2018 at 09:55:35PM +0800, lijiang wrote:
> >> About this issue, i want to use an example to describe it.
> >> /* drivers/iommu/amd_iommu_init.c */
> >> static u8 __iomem * __init iommu_map_mmio_space(u64 address, u64 end)
> > 
> > Those addresses come from the IVHD header which is an ACPI table. So the
> > dump kernel can find that out too.
> Sure. I think I understand what you mean: we would have to find all such
> addresses in order to cover every case in the kdump kernel; those addresses
> might include MMIO space, HPET, ACPI tables, ERST, and so on...
> 
> >> Obviously, the iommu mmio space is not encrypted, and the device
> >> mmio space is outside the kdump kernel. We know that the old memory is
> >> encrypted, and the old memory is also outside the kdump kernel. For the
> >> current case, e820__get_entry_type() and walk_iomem_res_desc() can't
> >> get the desired result, so we also can't decide whether to encrypt
> >> or not based on that result (rules). If we want to deduce whether an
> >> address is encrypted, we would need to read the content of memory and
> >> have a reference value for comparison; but then what is the reference
> >> value? Sometimes we don't know it.
> > 
> > Again, if we don't know that, how is the *caller* supposed to know
> > whether the memory is encrypted or not? Because
> > 
> > "we" == "caller"
> > 
> > in the kdump kernel.
> > 
> > And the more important question is, why are we dumping MMIO space of the
> > previous kernel *at* *all*? That doesn't make any sense to me.
> > 
> Sorry for my late reply.
> Here, there is no need to dump the MMIO space of the previous kernel. When
> the kdump kernel boots, the MMIO addresses will be remapped unencrypted, but
> the MMIO addresses don't fall within the crash reserved memory; from the
> kdump kernel's point of view, both the MMIO space and the IOMMU device table
> are outside addresses. However, the IOMMU device table is encrypted in the
> first kernel, and the kdump kernel needs to copy the content of the IOMMU
> device table from the first kernel when it boots, so the IOMMU device table
> must be remapped encrypted.
> So some of these addresses need to be remapped encrypted, and some need to
> be remapped unencrypted.
> 

There could be some misunderstanding here.  Looking at the code of
copy_device_table() in amd_iommu_init.c, the iommu device table entry is
retrieved by reading an mmio register, and then memremap() is used to map
the entry address for copying the device table.  So the thing related to
your patch is the device table entry address, not the mmio address.
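To make that concrete, here is a minimal sketch of the pattern (modeled
loosely on copy_device_table(); the exact fields and error handling in the
kernel differ, so treat the function below as illustrative):

static int copy_old_dev_table(struct amd_iommu *iommu, size_t dev_table_size)
{
	u64 entry;
	phys_addr_t old_devtb_phys;
	void *old_devtb;

	/* device table base register: read via the (unencrypted) MMIO window */
	entry = (u64)readl(iommu->mmio_base + MMIO_DEV_TABLE_OFFSET + 4) << 32;
	entry |= readl(iommu->mmio_base + MMIO_DEV_TABLE_OFFSET);
	old_devtb_phys = entry & PAGE_MASK;

	/* the table itself is old-kernel RAM, hence encrypted under SME */
	old_devtb = memremap(old_devtb_phys, dev_table_size, MEMREMAP_WB);
	if (!old_devtb)
		return -ENOMEM;

	/* ... copy the entries out, then ... */
	memunmap(old_devtb);
	return 0;
}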

As for why we need to copy the old dev table, I think it is for addressing
the in-flight DMA issue, just like the git log of the commit below; the
commit is for the Intel IOMMU, but I think the AMD IOMMU solution is similar:

commit 091d42e43d21b6ca7ec39bf5f9e17bc0bd8d4312
Author: Joerg Roedel 
Date:   Fri Jun 12 11:56:10 2015 +0200

iommu/vt-d: Copy translation tables from old kernel

If we are in a kdump kernel and find translation enabled in
the iommu, try to copy the translation tables from the old
kernel to preserve the mappings until the device driver
takes over.

Baoquan knows more about this; I think he can correct me if I'm wrong.

Thanks
Dave


Re: [PATCH 1/5 V5] Add a function(ioremap_encrypted) for kdump when AMD sme enabled

2018-07-19 Thread lijiang
On 2018-07-14 at 01:08, Borislav Petkov wrote:
> On Mon, Jul 09, 2018 at 09:55:35PM +0800, lijiang wrote:
>> About this issue, i want to use an example to describe it.
>> /* drivers/iommu/amd_iommu_init.c */
>> static u8 __iomem * __init iommu_map_mmio_space(u64 address, u64 end)
> 
> Those addresses come from the IVHD header which is an ACPI table. So the
> dump kernel can find that out too.
Sure. I think I understand what you mean: we would have to find all such
addresses in order to cover every case in the kdump kernel; those addresses
might include MMIO space, HPET, ACPI tables, ERST, and so on...

>> Obviously, the iommu mmio space is not encrypted, and the device
>> mmio space is outside the kdump kernel. We know that the old memory is
>> encrypted, and the old memory is also outside the kdump kernel. For the
>> current case, e820__get_entry_type() and walk_iomem_res_desc() can't
>> get the desired result, so we also can't decide whether to encrypt
>> or not based on that result (rules). If we want to deduce whether an
>> address is encrypted, we would need to read the content of memory and
>> have a reference value for comparison; but then what is the reference
>> value? Sometimes we don't know it.
> 
> Again, if we don't know that, how is the *caller* supposed to know
> whether the memory is encrypted or not? Because
> 
> "we" == "caller"
> 
> in the kdump kernel.
> 
> And the more important question is, why are we dumping MMIO space of the
> previous kernel *at* *all*? That doesn't make any sense to me.
> 
Sorry for my late reply.
Here, there is no need to dump the MMIO space of the previous kernel. When
the kdump kernel boots, the MMIO addresses will be remapped unencrypted, but
the MMIO addresses don't fall within the crash reserved memory; from the
kdump kernel's point of view, both the MMIO space and the IOMMU device table
are outside addresses. However, the IOMMU device table is encrypted in the
first kernel, and the kdump kernel needs to copy the content of the IOMMU
device table from the first kernel when it boots, so the IOMMU device table
must be remapped encrypted.
So some of these addresses need to be remapped encrypted, and some need to
be remapped unencrypted.
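To illustrate the two cases, a minimal sketch (it assumes the
ioremap_encrypted() helper this series adds; the wrapper and its call site
are illustrative, not part of the patch):

/* Illustrative only: choosing the mapping type for old-kernel ranges. */
static void __iomem *map_old_range(resource_size_t paddr, unsigned long size,
				   bool was_encrypted)
{
	if (was_encrypted)
		/* old RAM, e.g. the first kernel's IOMMU device table */
		return ioremap_encrypted(paddr, size);

	/* device MMIO is never encrypted */
	return ioremap(paddr, size);
}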


Re: DMA mappings and crossing boundaries

2018-07-19 Thread Benjamin Herrenschmidt
On Wed, 2018-07-11 at 14:51 +1000, Benjamin Herrenschmidt wrote:
> On Wed, 2018-07-04 at 13:57 +0100, Robin Murphy wrote:
> > 
> > > Could it ? I wouldn't think dma_map_page is allowed to cross page
> > > boundaries ... what about single() ? The main worry is people using
> > > these things on kmalloc'ed memory
> > 
> > Oh, absolutely - the underlying operation is just "prepare for DMA 
> > to/from this physically-contiguous region"; the only real difference 
> > between map_page and map_single is for the sake of the usual "might be 
> > highmem" vs. "definitely lowmem" dichotomy. Nobody's policing any limits 
> > on the size and offset parameters (in fact, if anyone asks I would say 
> > the outcome of the big "offset > PAGE_SIZE" debate for dma_map_sg a few 
> > months back is valid for dma_map_page too, however silly it may seem).
> 
> I think this is a very bad idea though. We should strive to avoid
> coalescing before the iommu layer and we should avoid creating sglists
> that cross natural alignment boundaries.

Ping ? Jens, Christoph ?

I really really don't like how sg_alloc_table_from_pages() coalesces
the sglist before it gets mapped. This will potentially create huge
constraints and fragmentation in the IOMMU allocator by passing large
chunks to it.
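For reference, checking whether a region crosses a power-of-two boundary is
cheap; a minimal sketch (the helper name is mine, not from any tree):

/* True if [addr, addr + len) crosses a power-of-two boundary such as 4G.
 * Two addresses lie on the same side iff their bits above the boundary
 * agree, so XOR the first and last byte addresses and mask. */
static inline bool dma_region_crosses(dma_addr_t addr, size_t len,
				      u64 boundary)
{
	return ((addr ^ (addr + len - 1)) & ~(boundary - 1)) != 0;
}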

> I had a look at sg_alloc_table_from_pages() and it basically freaks me
> out.
> 
> Back in the old days, we used to have the block layer aggressively
> coalesce requests iirc, but that was changed to let the iommu layer do
> it.
> 
> If you pass to dma_map_sg() something already heavily coalesced, such
> as what sg_alloc_table_from_pages() can do since it seems to have
> absolutely no limits there, you will create a significant fragmentation
> problem in the iommu allocator.
> 
> Instead, we should probably avoid such coalescing at that level and
> instead let the iommu opportunistically coalesce at the point of
> mapping which it does just fine.
> 
> We've been racking our brains here and we can't find a way to implement
> something we want that is both very performance efficient (no global
> locks etc...) and works with boundary crossing mappings.
> 
> We could always fallback to classic small page mappings but that is
> quite suboptimal.
> 
> I also notice that sg_alloc_table_from_pages() doesn't try to prevent
> crossing the 4G boundary. I remember pretty clearly that it was
> explicitly forbidden to create such a crossing.
> 
> > Of course, given that the allocators tend to give out size/order-aligned 
> > chunks, I think you'd have to be pretty tricksy to get two allocations 
> > to line up either side of a large power-of-two boundary *and* go out of 
> > your way to then make a single request spanning both, but it's certainly 
> > not illegal. Realistically, the kind of "scrape together a large buffer 
> > from smaller pieces" code which is liable to hit a boundary-crossing 
> > case by sheer chance is almost certainly going to be taking the 
> > sg_alloc_table_from_pages() + dma_map_sg() route for convenience, rather 
> > than implementing its own merging and piecemeal mapping.
> 
> Yes, and I think what sg_alloc_table_from_pages() does is quite wrong.
> 
> > > > Conceptually it looks pretty easy to extend the allocation constraints
> > > > to cope with that - even the pathological worst case would have an
> > > > absolute upper bound of 3 IOMMU entries for any one physical region -
> > > > but if in practice it's a case of mapping arbitrary CPU pages to 32-bit
> > > > DMA addresses having only 4 1GB slots to play with, I can't really see a
> > > > way to make that practical :(
> > > 
> > > No we are talking about 40-ish-bits of address space, so there's a bit
> > > of leeway. Of course no scheme will work if the user app tries to map
> > > more than the GPU can possibly access.
> > > 
> > > But with newer AMD adding a few more bits and nVidia being at 47-bits,
> > > I think we have some margin, it's just that they can't reach our
> > > discontiguous memory with a normal 'bypass' mapping and I'd rather not
> > > teach Linux about every single way our HW can scatter memory across
> > > nodes, so an "on demand" mechanism is by far the most flexible way to
> > > deal with all configurations.
> > > 
> > > > Maybe the best compromise would be some sort of hybrid scheme which
> > > > makes sure that one of the IOMMU entries always covers the SWIOTLB
> > > > buffer, and invokes software bouncing for the awkward cases.
> > > 
> > > Hrm... not too sure about that. I'm happy to limit that scheme to well
> > > known GPU vendor/device IDs, and SW bouncing is pointless in these
> > > cases. It would be nice if we could have some kind of guarantee that a
> > > single mapping or sglist entry never crossed a specific boundary
> > > though... We more/less have that for 4G already (well, we are supposed
> > > to at least). Who are the main potential problematic subsystems here ?
> > > I'm thinking network skb allocation pools ... and page cache if it
> > 

Re: [PATCH 1/3] dts: arm64/sdm845: Add node for arm,mmu-500

2018-07-19 Thread Rob Herring
On Thu, Jul 19, 2018 at 11:54 AM Vivek Gautam
 wrote:
>
> Add a device node for the arm,mmu-500 available on sdm845.
> This MMU-500, with a single-TCU, multiple-TBU architecture,
> is shared among all the peripherals except the GPU on sdm845.
>
> Signed-off-by: Vivek Gautam 
> ---
>  arch/arm64/boot/dts/qcom/sdm845-mtp.dts |  4 ++
>  arch/arm64/boot/dts/qcom/sdm845.dtsi| 73 
> +
>  2 files changed, 77 insertions(+)
>
> diff --git a/arch/arm64/boot/dts/qcom/sdm845-mtp.dts 
> b/arch/arm64/boot/dts/qcom/sdm845-mtp.dts
> index 6d651f314193..13b50dff440f 100644
> --- a/arch/arm64/boot/dts/qcom/sdm845-mtp.dts
> +++ b/arch/arm64/boot/dts/qcom/sdm845-mtp.dts
> @@ -58,3 +58,7 @@
> bias-pull-up;
> };
>  };
> +
> +&apps_smmu {
> +   status = "okay";
> +};
> diff --git a/arch/arm64/boot/dts/qcom/sdm845.dtsi 
> b/arch/arm64/boot/dts/qcom/sdm845.dtsi
> index 00722b533a92..70ca18ae6cb3 100644
> --- a/arch/arm64/boot/dts/qcom/sdm845.dtsi
> +++ b/arch/arm64/boot/dts/qcom/sdm845.dtsi
> @@ -980,6 +980,79 @@
> cell-index = <0>;
> };
>
> +   apps_smmu: arm,smmu@1500 {

iommu@...

> +   compatible = "arm,mmu-500";

Really unmodified by QCom? It would be better to have an SoC-specific compatible.

Rob


[PATCH 3/3] iommu/arm-smmu: Error out only if not enough context interrupts

2018-07-19 Thread Vivek Gautam
Currently we check that the number of context banks is equal to
num_context_interrupts. However, there are bootloaders, such as the
one on sdm845, that reserve a few context banks, and thus the kernel
sees fewer than the total available context banks.
So, although the hardware definition in the device tree mentions the
correct number of context interrupts, this number can be greater than
the number of context banks visible to the smmu in the kernel.
We should therefore error out only when the number of context banks
is greater than the available number of context interrupts.

Signed-off-by: Vivek Gautam 
Suggested-by: Tomasz Figa 
Cc: Robin Murphy 
Cc: Will Deacon 
---
 drivers/iommu/arm-smmu.c | 19 +--
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 7c69736a30f8..4cb53bf4f423 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -2229,12 +2229,19 @@ static int arm_smmu_device_probe(struct platform_device *pdev)
if (err)
return err;
 
-   if (smmu->version == ARM_SMMU_V2 &&
-   smmu->num_context_banks != smmu->num_context_irqs) {
-   dev_err(dev,
-   "found only %d context interrupt(s) but %d required\n",
-   smmu->num_context_irqs, smmu->num_context_banks);
-   return -ENODEV;
+   if (smmu->version == ARM_SMMU_V2) {
+   if (smmu->num_context_banks > smmu->num_context_irqs) {
+   dev_err(dev,
+ "found only %d context irq(s) but %d required\n",
+ smmu->num_context_irqs, smmu->num_context_banks);
+   return -ENODEV;
+   } else if (smmu->num_context_banks < smmu->num_context_irqs) {
+   /* lose extra context interrupts */
+   dev_notice(dev,
+ "found %d context irq(s) but only %d required\n",
+ smmu->num_context_irqs, smmu->num_context_banks);
+   smmu->num_context_irqs = smmu->num_context_banks;
+   }
}
 
for (i = 0; i < smmu->num_global_irqs; ++i) {
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation



[PATCH 2/3] dts: arm64/sdm845: Add node for qcom,smmu-v2

2018-07-19 Thread Vivek Gautam
Add a device node for the qcom,smmu-v2 available on sdm845.
This smmu is available only to the GPU device.

Signed-off-by: Vivek Gautam 
---

One power domain required for this GPU smmu, GPU_CX_GDSC, commented out
in the node, comes from the GPU clock controller [1].

[1] https://lore.kernel.org/patchwork/cover/962208/

 arch/arm64/boot/dts/qcom/sdm845-mtp.dts |  4 
 arch/arm64/boot/dts/qcom/sdm845.dtsi| 23 +++
 2 files changed, 27 insertions(+)

diff --git a/arch/arm64/boot/dts/qcom/sdm845-mtp.dts 
b/arch/arm64/boot/dts/qcom/sdm845-mtp.dts
index 13b50dff440f..969aa76bdceb 100644
--- a/arch/arm64/boot/dts/qcom/sdm845-mtp.dts
+++ b/arch/arm64/boot/dts/qcom/sdm845-mtp.dts
@@ -62,3 +62,7 @@
 &apps_smmu {
	status = "okay";
 };
+
+&gpu_smmu {
+   status = "okay";
+};
diff --git a/arch/arm64/boot/dts/qcom/sdm845.dtsi 
b/arch/arm64/boot/dts/qcom/sdm845.dtsi
index 70ca18ae6cb3..db9f9e84d162 100644
--- a/arch/arm64/boot/dts/qcom/sdm845.dtsi
+++ b/arch/arm64/boot/dts/qcom/sdm845.dtsi
@@ -980,6 +980,29 @@
cell-index = <0>;
};
 
+   gpu_smmu: arm,smmu@504 {
+   compatible = "qcom,sdm845-smmu-v2", "qcom,smmu-v2";
+   reg = <0x504 0x1>;
+   #iommu-cells = <1>;
+   #global-interrupts = <2>;
+   interrupts = ,
+,
+,
+,
+,
+,
+,
+,
+,
+;
+   clock-names = "bus", "iface";
+   clocks = < GCC_GPU_MEMNOC_GFX_CLK>,
+< GCC_GPU_CFG_AHB_CLK>;
+
+
+   /* power-domains = <&gpucc GPU_CX_GDSC>; */
+   status = "disabled";
+   };
+
apps_smmu: arm,smmu@1500 {
compatible = "arm,mmu-500";
reg = <0x1500 0x8>;
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation



[PATCH 1/3] dts: arm64/sdm845: Add node for arm,mmu-500

2018-07-19 Thread Vivek Gautam
Add a device node for the arm,mmu-500 available on sdm845.
This MMU-500, with a single-TCU, multiple-TBU architecture,
is shared among all the peripherals except the GPU on sdm845.

Signed-off-by: Vivek Gautam 
---
 arch/arm64/boot/dts/qcom/sdm845-mtp.dts |  4 ++
 arch/arm64/boot/dts/qcom/sdm845.dtsi| 73 +
 2 files changed, 77 insertions(+)

diff --git a/arch/arm64/boot/dts/qcom/sdm845-mtp.dts 
b/arch/arm64/boot/dts/qcom/sdm845-mtp.dts
index 6d651f314193..13b50dff440f 100644
--- a/arch/arm64/boot/dts/qcom/sdm845-mtp.dts
+++ b/arch/arm64/boot/dts/qcom/sdm845-mtp.dts
@@ -58,3 +58,7 @@
bias-pull-up;
};
 };
+
+&apps_smmu {
+   status = "okay";
+};
diff --git a/arch/arm64/boot/dts/qcom/sdm845.dtsi 
b/arch/arm64/boot/dts/qcom/sdm845.dtsi
index 00722b533a92..70ca18ae6cb3 100644
--- a/arch/arm64/boot/dts/qcom/sdm845.dtsi
+++ b/arch/arm64/boot/dts/qcom/sdm845.dtsi
@@ -980,6 +980,79 @@
cell-index = <0>;
};
 
+   apps_smmu: arm,smmu@1500 {
+   compatible = "arm,mmu-500";
+   reg = <0x1500 0x8>;
+   #iommu-cells = <2>;
+   #global-interrupts = <1>;
+   interrupts = ,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+;
+   status = "disabled";
+   };
+
apss_shared: mailbox@1799 {
compatible = "qcom,sdm845-apss-shared";
reg = <0x1799 0x1000>;
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation



[PATCH 0/3] Enable smmu support on sdm845

2018-07-19 Thread Vivek Gautam
This series enables the apps-smmu (arm,mmu-500) and the gpu-smmu
(qcom,smmu-v2) on sdm845. The gpu-smmu needs one power domain from the GPU
clock controller, whose driver was sent by Amit [1].

This series also contains a patch that splits the check for
num_context_irqs and num_context_banks in two. We now error out only if
num_context_banks is greater than num_context_irqs.
There are cases, such as on sdm845, in which a few of the context banks are
reserved by the bootloader (secure software), and thus the kernel sees
fewer than the total available context banks.
So we shouldn't error out simply because num_context_irqs is not equal
to num_context_banks.

[1] https://lore.kernel.org/patchwork/cover/962208/

Vivek Gautam (3):
  dts: arm64/sdm845: Add node for arm,mmu-500
  dts: arm64/sdm845: Add node for qcom,smmu-v2
  iommu/arm-smmu: Error out only if not enough context interrupts

 arch/arm64/boot/dts/qcom/sdm845-mtp.dts |  8 +++
 arch/arm64/boot/dts/qcom/sdm845.dtsi| 96 +
 drivers/iommu/arm-smmu.c| 19 ---
 3 files changed, 117 insertions(+), 6 deletions(-)

-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation



Re: usb HC busted?

2018-07-19 Thread Sudip Mukherjee
Hi Mathias,

On Thu, Jul 19, 2018 at 06:42:19PM +0300, Mathias Nyman wrote:
> > > As first aid I could try to implement checks that make sure the flushed
> > > URBs' trb pointers really are on the current endpoint ring, and also add
> > > some warning if we are dropping endpoints with URBs still queued.
> > 
> > Yes, please. I think your first-aid will be a much better option than
> > the hacky patch I am using atm.
> > 
> 
> Attached a patch that checks canceled URB td/trb pointers.
> I haven't tested it at all (well, it compiles and boots, but the new code is
> never exercised).
> 
> Does it work for you?

No, not exactly. :(

I can see your message getting printed.
[  249.518394] xhci_hcd :00:14.0: Canceled URB td not found on endpoint ring
[  249.518431] xhci_hcd :00:14.0: Canceled URB td not found on endpoint ring

But I can see the message from slub debug again:

[  348.279986] 
=
[  348.279993] BUG kmalloc-96 (Tainted: G U O   ): Poison overwritten
[  348.279995] 
-

[  348.279997] Disabling lock debugging due to kernel taint
[  348.28] INFO: 0xe5acda60-0xe5acda67. First byte 0x60 instead of 0x6b
[  348.280012] INFO: Allocated in xhci_ring_alloc.constprop.14+0x31/0x125 
[xhci_hcd] age=129264 cpu=0 pid=33
[  348.280019]  ___slab_alloc.constprop.24+0x1fc/0x292
[  348.280023]  __slab_alloc.isra.18.constprop.23+0x1c/0x25
[  348.280026]  kmem_cache_alloc_trace+0x78/0x141
[  348.280032]  xhci_ring_alloc.constprop.14+0x31/0x125 [xhci_hcd]
[  348.280038]  xhci_endpoint_init+0x25f/0x30a [xhci_hcd]
[  348.280044]  xhci_add_endpoint+0x126/0x149 [xhci_hcd]
[  348.280057]  usb_hcd_alloc_bandwidth+0x26a/0x2a0 [usbcore]
[  348.280067]  usb_set_interface+0xeb/0x25d [usbcore]
[  348.280071]  btusb_work+0xeb/0x324 [btusb]
[  348.280076]  process_one_work+0x163/0x2b2
[  348.280080]  worker_thread+0x1a9/0x25c
[  348.280083]  kthread+0xf8/0xfd
[  348.280087]  ret_from_fork+0x2e/0x38
[  348.280095] INFO: Freed in xhci_ring_free+0xa7/0xc6 [xhci_hcd] age=98722 
cpu=0 pid=33
[  348.280098]  __slab_free+0x4b/0x27a
[  348.280100]  kfree+0x12e/0x155
[  348.280106]  xhci_ring_free+0xa7/0xc6 [xhci_hcd]
[  348.280112]  xhci_free_endpoint_ring+0x16/0x20 [xhci_hcd]
[  348.280118]  xhci_check_bandwidth+0x1c2/0x211 [xhci_hcd]
[  348.280129]  usb_hcd_alloc_bandwidth+0x205/0x2a0 [usbcore]
[  348.280139]  usb_set_interface+0xeb/0x25d [usbcore]
[  348.280142]  btusb_work+0x228/0x324 [btusb]
[  348.280145]  process_one_work+0x163/0x2b2
[  348.280148]  worker_thread+0x1a9/0x25c
[  348.280151]  kthread+0xf8/0xfd
[  348.280154]  ret_from_fork+0x2e/0x38
[  348.280158] INFO: Slab 0xf46e0fe0 objects=29 used=29 fp=0x  (null) 
flags=0x40008100
[  348.280160] INFO: Object 0xe5acda48 @offset=6728 fp=0xe5acd700

[  348.280164] Redzone e5acda40: bb bb bb bb bb bb bb bb
  
[  348.280167] Object e5acda48: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 
 
[  348.280169] Object e5acda58: 6b 6b 6b 6b 6b 6b 6b 6b 60 da ac e5 60 da ac e5 
 `...`...
[  348.280171] Object e5acda68: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 
 
[  348.280174] Object e5acda78: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 
 
[  348.280176] Object e5acda88: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 
 
[  348.280179] Object e5acda98: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b a5 
 kkk.
[  348.280181] Redzone e5acdaa8: bb bb bb bb
  
[  348.280183] Padding e5acdb50: 5a 5a 5a 5a 5a 5a 5a 5a
  
[  348.280188] CPU: 0 PID: 133 Comm: weston Tainted: GBU O
4.14.55-20180712+ #2
[  348.280190] Hardware name: xxx, BIOS 2017.01-00087-g43e04de 08/30/2017
[  348.280192] Call Trace:
[  348.280199]  dump_stack+0x47/0x5b
[  348.280202]  print_trailer+0x12b/0x133
[  348.280206]  check_bytes_and_report+0x6c/0xae
[  348.280210]  check_object+0x10a/0x1db
[  348.280214]  alloc_debug_processing+0x79/0x123
[  348.280218]  ___slab_alloc.constprop.24+0x1fc/0x292
[  348.280224]  ? drm_mode_atomic_ioctl+0x374/0x75e
[  348.280227]  ? drm_mode_atomic_ioctl+0x374/0x75e
[  348.280231]  ? drm_mode_object_get+0x28/0x3a
[  348.280235]  ? __radix_tree_lookup+0x27/0x7e
[  348.280238]  ? drm_mode_object_get+0x28/0x3a
[  348.280242]  ? drm_mode_object_put+0x28/0x4c
[  348.280246]  __slab_alloc.isra.18.constprop.23+0x1c/0x25
[  348.280249]  ? __slab_alloc.isra.18.constprop.23+0x1c/0x25
[  348.280253]  kmem_cache_alloc_trace+0x78/0x141
[  348.280257]  ? drm_mode_atomic_ioctl+0x374/0x75e
[  348.280261]  drm_mode_atomic_ioctl+0x374/0x75e
[  348.280267]  ? drm_atomic_set_property+0x442/0x442
[  348.280272]  drm_ioctl_kernel+0x52/0x88
[  348.280275]  drm_ioctl+0x1fc/0x2c1
[  348.280279]  ? drm_atomic_set_property+0x442/0x442
[  

Re: usb HC busted?

2018-07-19 Thread Mathias Nyman

>> As first aid I could try to implement checks that make sure the flushed
>> URBs' trb pointers really are on the current endpoint ring, and also add
>> some warning if we are dropping endpoints with URBs still queued.
>
> Yes, please. I think your first-aid will be a much better option than
> the hacky patch I am using atm.

Attached a patch that checks canceled URB td/trb pointers.
I haven't tested it at all (well, it compiles and boots, but the new code is
never exercised).

Does it work for you?

>> But we need to fix this properly as well.
>> xhci needs to be more in sync with usb core in usb_set_interface();
>> currently xhci has the alt setting up and running when usb core hasn't
>> even started flushing endpoints.
>
> I am able to reproduce this on almost all cycles, so I can always test
> the fix for you after you are fully back from your holiday.


Nice, thanks

-Mathias
From a7d4af3129a91811c95ea642f6c916b1c1ca6d46 Mon Sep 17 00:00:00 2001
From: Mathias Nyman 
Date: Thu, 19 Jul 2018 18:06:18 +0300
Subject: [PATCH] xhci: when dequeing a URB make sure it exists on the current
 endpoint ring.

If the endpoint ring has been reallocated since the URB was enqueued,
then the URB may contain TD and TRB pointers to an already freed ring.
If this is the case then manually return the URB, and don't try to stop
the ring. It would be useless.

This can happen if an endpoint is not flushed before it is dropped and
re-added, which is the case in usb_set_interface() as xhci does
things in an odd order.

Signed-off-by: Mathias Nyman 
---
 drivers/usb/host/xhci.c | 43 ---
 1 file changed, 32 insertions(+), 11 deletions(-)

diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
index 711da33..5bedab7 100644
--- a/drivers/usb/host/xhci.c
+++ b/drivers/usb/host/xhci.c
@@ -37,6 +37,21 @@ static unsigned int quirks;
 module_param(quirks, uint, S_IRUGO);
 MODULE_PARM_DESC(quirks, "Bit flags for quirks to be enabled as default");
 
+static bool td_on_ring(struct xhci_td *td, struct xhci_ring *ring)
+{
+	struct xhci_segment *seg = ring->first_seg;
+
+	if (!td || !td->start_seg)
+		return false;
+	do {
+		if (seg == td->start_seg)
+			return true;
+		seg = seg->next;
+	} while (seg && seg != ring->first_seg);
+
+	return false;
+}
+
 /* TODO: copied from ehci-hcd.c - can this be refactored? */
 /*
  * xhci_handshake - spin reading hc until handshake completes or fails
@@ -1467,19 +1482,16 @@ static int xhci_urb_dequeue(struct usb_hcd *hcd, struct urb *urb, int status)
 		goto done;
 	}
 
+	/* check ring is not re-allocated since URB was enqueued */
+	if (!td_on_ring(&urb_priv->td[0], ep_ring)) {
+		xhci_err(xhci, "Canceled URB td not found on endpoint ring");
+		goto err_unlink_giveback;
+	}
+
 	if (xhci->xhc_state & XHCI_STATE_HALTED) {
 		xhci_dbg_trace(xhci, trace_xhci_dbg_cancel_urb,
-"HC halted, freeing TD manually.");
-		for (i = urb_priv->num_tds_done;
-		 i < urb_priv->num_tds;
-		 i++) {
-			td = &urb_priv->td[i];
-			if (!list_empty(&td->td_list))
-				list_del_init(&td->td_list);
-			if (!list_empty(&td->cancelled_td_list))
-				list_del_init(&td->cancelled_td_list);
-		}
-		goto err_giveback;
+			   "HC halted, freeing TD manually.");
+		goto err_unlink_giveback;
 	}
 
 	i = urb_priv->num_tds_done;
@@ -1519,6 +1531,15 @@ static int xhci_urb_dequeue(struct usb_hcd *hcd, struct urb *urb, int status)
 	spin_unlock_irqrestore(&xhci->lock, flags);
 	return ret;
 
+err_unlink_giveback:
+	for (i = urb_priv->num_tds_done; i < urb_priv->num_tds; i++) {
+		td = &urb_priv->td[i];
+		if (!list_empty(&td->td_list))
+			list_del_init(&td->td_list);
+		if (!list_empty(&td->cancelled_td_list))
+			list_del_init(&td->cancelled_td_list);
+	}
+
 err_giveback:
 	if (urb_priv)
 		xhci_urb_free_priv(urb_priv);
-- 
2.7.4


Re: usb HC busted?

2018-07-19 Thread Alan Stern
On Thu, 19 Jul 2018, Mathias Nyman wrote:

> xhci driver will set up all the endpoints for the new altsetting already in
> usb_hcd_alloc_bandwidth().
> 
> New endpoints will be ready and rings running after this. I don't know the 
> exact
> history behind this, but I assume it is because xhci does all of the steps to
> drop/add, disable/enable endpoints and check bandwidth in a single configure
> endpoint command, that will return errors if there is not enough bandwidth.

That's right; Sarah and I spent some time going over this while she was 
working on it.  But it looks like the approach isn't adequate.

> This command is issued in hcd->driver->check_bandwidth()
> This means that xhci doesn't really do much in hcd->driver->endpoint_disable
> or hcd->driver->endpoint_enable.
> 
> It also means that xhci driver assumes rings are empty when
> hcd->driver->check_bandwidth is called. It will bluntly free dropped rings.
> If there are URBs left on a endpoint ring that was dropped+added
> (freed+reallocated) then those URBs will contain pointers to freed ring,
> causing issues when usb_hcd_flush_endpoint() cancels those URBs.
> 
> usb_set_interface()
>usb_hcd_alloc_bandwidth()
>  hcd->driver->drop_endpoint()
>  hcd->driver->add_endpoint() // allocates new rings
>  hcd->driver->check_bandwidth() // issues configure endpoint command, 
> free rings.
>usb_disable_interface(iface, true)
>  usb_disable_endpoint()
>usb_hcd_flush_endpoint() // will access freed ring if URBs found!!
>usb_hcd_disable_endpoint()
>  hcd->driver->endpoint_disable()  // xhci does nothing
>usb_enable_interface(iface, true)
>  usb_enable_endpoint(ep_address, true) // not really doing much on the xhci 
> side.
> 
> As first aid I could try to implement checks that make sure the flushed
> URBs' trb pointers really are on the current endpoint ring, and also add
> some warning if we are dropping endpoints with URBs still queued.
> 
> But we need to fix this properly as well.
> xhci needs to be more in sync with usb core in usb_set_interface();
> currently xhci has the alt setting up and running when usb core hasn't
> even started flushing endpoints.

Absolutely.  The core tries to be compatible with host controller
drivers that either allocate bandwidth as it is requested or else
allocate bandwidth all at once when an altsetting is installed.  

xhci-hcd falls into the second category.  However, this approach
requires the bandwidth verification for the new altsetting to be
performed before the old altsetting has been disabled, and the xHCI
hardware can't do this.

We may need to change the core so that the old endpoints are disabled 
before the bandwidth check is done, instead of after.  Of course, this 
leads to an awkward situation if the check fails -- we'd probably have 
to go back and re-install the old altsetting.
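A sketch of the reordering being suggested (hypothetical, just to make the
proposal concrete; the failure path is the open question):

usb_set_interface()
   usb_disable_interface(iface, true)  // flush and disable old endpoints first
   usb_hcd_alloc_bandwidth()           // drop/add + configure endpoint command
  // on failure: re-install the old altsetting and re-enable its endpoints
   usb_enable_interface(iface, true)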

Alan Stern



Re: [PATCH] dma-mapping: Relax warnings for per-device areas

2018-07-19 Thread Christoph Hellwig
On Wed, Jul 18, 2018 at 02:55:59PM -0700, Geoff Levand wrote:
> Hi,
> 
> On 07/17/2018 09:33 AM, Fredrik Noring wrote:
> >>> Here are three other regressions related to the coherent mask 
> >>> WARN_ON_ONCE:
> >>
> >> They are a pretty strong indication that yes, you should really set
> >> the coherent mask if you ever do coherent allocations..
> > 
> > I'm unfortunately unfamiliar with the USB drivers for the PS3. Geoff, what
> > do you think about setting a coherent mask?
> 
> I guess we need to add a dma_set_coherent_mask() call in the driver
> probe routines.  I'll send out a patch.

Yes, exactly!
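For reference, such a call is typically a one-liner early in probe; a sketch
(the probe function and the 32-bit mask width are illustrative assumptions,
not the actual PS3 patch):

static int example_probe(struct device *dev)
{
	int err;

	/* declare which addresses the device can use for coherent allocations */
	err = dma_set_coherent_mask(dev, DMA_BIT_MASK(32));
	if (err)
		return err;

	/* ... rest of probe ... */
	return 0;
}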


use the generic dma-noncoherent code for sh

2018-07-19 Thread Christoph Hellwig
Hi all,

can you review these patches to switch sh to use the generic
dma-noncoherent code?  All the requirements are in mainline already, and
we've switched various other architectures over to it.
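For reference, with DMA_NONCOHERENT_OPS the generic code supplies the
dma_map_ops and the architecture only implements a few hooks; these are the
signatures the sh patches in this series provide (a summary of the patches,
not new API):

void *arch_dma_alloc(struct device *dev, size_t size, dma_addr_t *dma_handle,
		     gfp_t gfp, unsigned long attrs);
void arch_dma_free(struct device *dev, size_t size, void *vaddr,
		   dma_addr_t dma_handle, unsigned long attrs);
/* enabled by selecting ARCH_HAS_SYNC_DMA_FOR_DEVICE */
void arch_sync_dma_for_device(struct device *dev, phys_addr_t paddr,
			      size_t size, enum dma_data_direction dir);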


[PATCH 5/5] sh: use generic dma_noncoherent_ops

2018-07-19 Thread Christoph Hellwig
Switch to the generic noncoherent direct mapping implementation.

Signed-off-by: Christoph Hellwig 
---
 arch/sh/Kconfig   |  3 +-
 arch/sh/include/asm/Kbuild|  1 +
 arch/sh/include/asm/dma-mapping.h | 26 ---
 arch/sh/kernel/Makefile   |  2 +-
 arch/sh/kernel/dma-coherent.c | 23 +
 arch/sh/kernel/dma-nommu.c| 78 ---
 6 files changed, 15 insertions(+), 118 deletions(-)
 delete mode 100644 arch/sh/include/asm/dma-mapping.h
 delete mode 100644 arch/sh/kernel/dma-nommu.c

diff --git a/arch/sh/Kconfig b/arch/sh/Kconfig
index c9993a0cdc7e..da4db4b5359f 100644
--- a/arch/sh/Kconfig
+++ b/arch/sh/Kconfig
@@ -51,7 +51,6 @@ config SUPERH
select HAVE_ARCH_AUDITSYSCALL
select HAVE_FUTEX_CMPXCHG if FUTEX
select HAVE_NMI
-   select NEED_DMA_MAP_STATE
select NEED_SG_DMA_LENGTH
 
help
@@ -164,6 +163,8 @@ config DMA_COHERENT
 
 config DMA_NONCOHERENT
def_bool !DMA_COHERENT
+   select ARCH_HAS_SYNC_DMA_FOR_DEVICE
+   select DMA_NONCOHERENT_OPS
 
 config PGTABLE_LEVELS
default 3 if X2TLB
diff --git a/arch/sh/include/asm/Kbuild b/arch/sh/include/asm/Kbuild
index 46dd82ab2c29..6a5609a55965 100644
--- a/arch/sh/include/asm/Kbuild
+++ b/arch/sh/include/asm/Kbuild
@@ -2,6 +2,7 @@ generic-y += compat.h
 generic-y += current.h
 generic-y += delay.h
 generic-y += div64.h
+generic-y += dma-mapping.h
 generic-y += emergency-restart.h
 generic-y += exec.h
 generic-y += irq_regs.h
diff --git a/arch/sh/include/asm/dma-mapping.h 
b/arch/sh/include/asm/dma-mapping.h
deleted file mode 100644
index 1ebc6a4eb1c5..
--- a/arch/sh/include/asm/dma-mapping.h
+++ /dev/null
@@ -1,26 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef __ASM_SH_DMA_MAPPING_H
-#define __ASM_SH_DMA_MAPPING_H
-
-extern const struct dma_map_ops nommu_dma_ops;
-
-static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type *bus)
-{
-#ifdef CONFIG_DMA_NONCOHERENT
-   return &nommu_dma_ops;
-#else
-   return &dma_direct_ops;
-#endif
-}
-
-extern void *dma_generic_alloc_coherent(struct device *dev, size_t size,
-   dma_addr_t *dma_addr, gfp_t flag,
-   unsigned long attrs);
-extern void dma_generic_free_coherent(struct device *dev, size_t size,
- void *vaddr, dma_addr_t dma_handle,
- unsigned long attrs);
-
-void sh_sync_dma_for_device(void *vaddr, size_t size,
-   enum dma_data_direction dir);
-
-#endif /* __ASM_SH_DMA_MAPPING_H */
diff --git a/arch/sh/kernel/Makefile b/arch/sh/kernel/Makefile
index d5ddb64bfffe..59673f8a3379 100644
--- a/arch/sh/kernel/Makefile
+++ b/arch/sh/kernel/Makefile
@@ -45,7 +45,7 @@ obj-$(CONFIG_DUMP_CODE)   += disassemble.o
 obj-$(CONFIG_HIBERNATION)  += swsusp.o
 obj-$(CONFIG_DWARF_UNWINDER)   += dwarf.o
 obj-$(CONFIG_PERF_EVENTS)  += perf_event.o perf_callchain.o
-obj-$(CONFIG_DMA_NONCOHERENT)  += dma-nommu.o dma-coherent.o
+obj-$(CONFIG_DMA_NONCOHERENT)  += dma-coherent.o
 obj-$(CONFIG_HAVE_HW_BREAKPOINT)   += hw_breakpoint.o
 
 ccflags-y := -Werror
diff --git a/arch/sh/kernel/dma-coherent.c b/arch/sh/kernel/dma-coherent.c
index 0e7720b5d495..088ecd188b71 100644
--- a/arch/sh/kernel/dma-coherent.c
+++ b/arch/sh/kernel/dma-coherent.c
@@ -7,14 +7,13 @@
  */
 #include 
 #include 
-#include 
+#include 
 #include 
 #include 
 #include 
 
-void *dma_generic_alloc_coherent(struct device *dev, size_t size,
-dma_addr_t *dma_handle, gfp_t gfp,
-unsigned long attrs)
+void *arch_dma_alloc(struct device *dev, size_t size, dma_addr_t *dma_handle,
+   gfp_t gfp, unsigned long attrs)
 {
void *ret, *ret_nocache;
int order = get_order(size);
@@ -29,7 +28,8 @@ void *dma_generic_alloc_coherent(struct device *dev, size_t 
size,
 * Pages from the page allocator may have data present in
 * cache. So flush the cache before using uncached memory.
 */
-   sh_sync_dma_for_device(ret, size, DMA_BIDIRECTIONAL);
+   arch_sync_dma_for_device(dev, virt_to_phys(ret), size,
+   DMA_BIDIRECTIONAL);
 
ret_nocache = (void __force *)ioremap_nocache(virt_to_phys(ret), size);
if (!ret_nocache) {
@@ -46,9 +46,8 @@ void *dma_generic_alloc_coherent(struct device *dev, size_t 
size,
return ret_nocache;
 }
 
-void dma_generic_free_coherent(struct device *dev, size_t size,
-  void *vaddr, dma_addr_t dma_handle,
-  unsigned long attrs)
+void arch_dma_free(struct device *dev, size_t size, void *vaddr,
+   dma_addr_t dma_handle, unsigned long attrs)
 {
int order = get_order(size);
unsigned long pfn = (dma_handle >> PAGE_SHIFT);
@@ -63,12 +62,12 @@ void dma_generic_free_coherent(struct 

[PATCH 1/5] sh: simplify get_arch_dma_ops

2018-07-19 Thread Christoph Hellwig
Remove the indirection through the dma_ops variable, and just return
nommu_dma_ops directly from get_arch_dma_ops.

Signed-off-by: Christoph Hellwig 
---
 arch/sh/include/asm/dma-mapping.h |  5 ++---
 arch/sh/kernel/dma-nommu.c|  8 +---
 arch/sh/mm/consistent.c   |  3 ---
 arch/sh/mm/init.c | 10 --
 4 files changed, 3 insertions(+), 23 deletions(-)

diff --git a/arch/sh/include/asm/dma-mapping.h 
b/arch/sh/include/asm/dma-mapping.h
index 41167931e5d9..149e71f95be7 100644
--- a/arch/sh/include/asm/dma-mapping.h
+++ b/arch/sh/include/asm/dma-mapping.h
@@ -2,12 +2,11 @@
 #ifndef __ASM_SH_DMA_MAPPING_H
 #define __ASM_SH_DMA_MAPPING_H
 
-extern const struct dma_map_ops *dma_ops;
-extern void no_iommu_init(void);
+extern const struct dma_map_ops nommu_dma_ops;
 
 static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type *bus)
 {
-   return dma_ops;
+   return &nommu_dma_ops;
 }
 
 extern void *dma_generic_alloc_coherent(struct device *dev, size_t size,
diff --git a/arch/sh/kernel/dma-nommu.c b/arch/sh/kernel/dma-nommu.c
index 3e3a32fc676e..79a9edafa5b0 100644
--- a/arch/sh/kernel/dma-nommu.c
+++ b/arch/sh/kernel/dma-nommu.c
@@ -79,10 +79,4 @@ const struct dma_map_ops nommu_dma_ops = {
.sync_sg_for_device = nommu_sync_sg_for_device,
 #endif
 };
-
-void __init no_iommu_init(void)
-{
-   if (dma_ops)
-   return;
-   dma_ops = &nommu_dma_ops;
-}
+EXPORT_SYMBOL(nommu_dma_ops);
diff --git a/arch/sh/mm/consistent.c b/arch/sh/mm/consistent.c
index fceb2adfcac7..e9d422bd42a5 100644
--- a/arch/sh/mm/consistent.c
+++ b/arch/sh/mm/consistent.c
@@ -20,9 +20,6 @@
 #include 
 #include 
 
-const struct dma_map_ops *dma_ops;
-EXPORT_SYMBOL(dma_ops);
-
 void *dma_generic_alloc_coherent(struct device *dev, size_t size,
 dma_addr_t *dma_handle, gfp_t gfp,
 unsigned long attrs)
diff --git a/arch/sh/mm/init.c b/arch/sh/mm/init.c
index 4034035fbede..7713c084d040 100644
--- a/arch/sh/mm/init.c
+++ b/arch/sh/mm/init.c
@@ -339,22 +339,12 @@ void __init paging_init(void)
free_area_init_nodes(max_zone_pfns);
 }
 
-/*
- * Early initialization for any I/O MMUs we might have.
- */
-static void __init iommu_init(void)
-{
-   no_iommu_init();
-}
-
 unsigned int mem_init_done = 0;
 
 void __init mem_init(void)
 {
pg_data_t *pgdat;
 
-   iommu_init();
-
high_memory = NULL;
for_each_online_pgdat(pgdat)
high_memory = max_t(void *, high_memory,
-- 
2.18.0



[PATCH 3/5] sh: use dma_direct_ops for the CONFIG_DMA_COHERENT case

2018-07-19 Thread Christoph Hellwig
This is a slight change in behavior as we avoid the detour through the
virtual mapping for the coherent allocator, but if this CPU really is
coherent that should be the right thing to do.

Signed-off-by: Christoph Hellwig 
---
 arch/sh/Kconfig   | 1 +
 arch/sh/include/asm/dma-mapping.h | 4 
 arch/sh/kernel/Makefile   | 4 ++--
 arch/sh/kernel/dma-nommu.c| 4 
 4 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/arch/sh/Kconfig b/arch/sh/Kconfig
index dd4f3d3e644f..c9993a0cdc7e 100644
--- a/arch/sh/Kconfig
+++ b/arch/sh/Kconfig
@@ -159,6 +159,7 @@ config SWAP_IO_SPACE
bool
 
 config DMA_COHERENT
+   select DMA_DIRECT_OPS
bool
 
 config DMA_NONCOHERENT
diff --git a/arch/sh/include/asm/dma-mapping.h 
b/arch/sh/include/asm/dma-mapping.h
index 149e71f95be7..1ebc6a4eb1c5 100644
--- a/arch/sh/include/asm/dma-mapping.h
+++ b/arch/sh/include/asm/dma-mapping.h
@@ -6,7 +6,11 @@ extern const struct dma_map_ops nommu_dma_ops;
 
 static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type *bus)
 {
+#ifdef CONFIG_DMA_NONCOHERENT
	return &nommu_dma_ops;
+#else
+   return &dma_direct_ops;
+#endif
 }
 
 extern void *dma_generic_alloc_coherent(struct device *dev, size_t size,
diff --git a/arch/sh/kernel/Makefile b/arch/sh/kernel/Makefile
index dc80041f7363..cb5f1bfb52de 100644
--- a/arch/sh/kernel/Makefile
+++ b/arch/sh/kernel/Makefile
@@ -12,7 +12,7 @@ endif
 
 CFLAGS_REMOVE_return_address.o = -pg
 
-obj-y  := debugtraps.o dma-nommu.o dumpstack.o \
+obj-y  := debugtraps.o dumpstack.o \
   idle.o io.o irq.o irq_$(BITS).o kdebugfs.o   \
   machvec.o nmi_debug.o process.o  \
   process_$(BITS).o ptrace.o ptrace_$(BITS).o  \
@@ -45,7 +45,7 @@ obj-$(CONFIG_DUMP_CODE)   += disassemble.o
 obj-$(CONFIG_HIBERNATION)  += swsusp.o
 obj-$(CONFIG_DWARF_UNWINDER)   += dwarf.o
 obj-$(CONFIG_PERF_EVENTS)  += perf_event.o perf_callchain.o
-
+obj-$(CONFIG_DMA_NONCOHERENT)  += dma-nommu.o
 obj-$(CONFIG_HAVE_HW_BREAKPOINT)   += hw_breakpoint.o
 
 ccflags-y := -Werror
diff --git a/arch/sh/kernel/dma-nommu.c b/arch/sh/kernel/dma-nommu.c
index 79a9edafa5b0..d8689b1cb743 100644
--- a/arch/sh/kernel/dma-nommu.c
+++ b/arch/sh/kernel/dma-nommu.c
@@ -51,7 +51,6 @@ static int nommu_map_sg(struct device *dev, struct 
scatterlist *sg,
return nents;
 }
 
-#ifdef CONFIG_DMA_NONCOHERENT
 static void nommu_sync_single_for_device(struct device *dev, dma_addr_t addr,
  size_t size, enum dma_data_direction dir)
 {
@@ -67,16 +66,13 @@ static void nommu_sync_sg_for_device(struct device *dev, 
struct scatterlist *sg,
for_each_sg(sg, s, nelems, i)
sh_sync_dma_for_device(sg_virt(s), s->length, dir);
 }
-#endif
 
 const struct dma_map_ops nommu_dma_ops = {
.alloc  = dma_generic_alloc_coherent,
.free   = dma_generic_free_coherent,
.map_page   = nommu_map_page,
.map_sg = nommu_map_sg,
-#ifdef CONFIG_DMA_NONCOHERENT
.sync_single_for_device = nommu_sync_single_for_device,
.sync_sg_for_device = nommu_sync_sg_for_device,
-#endif
 };
 EXPORT_SYMBOL(nommu_dma_ops);
-- 
2.18.0



[PATCH 4/5] sh: split arch/sh/mm/consistent.c

2018-07-19 Thread Christoph Hellwig
Half of the file just contains platform device memory setup code which
is required for all builds, and half contains helpers for dma coherent
allocation, which is only needed if CONFIG_DMA_NONCOHERENT is enabled.

Signed-off-by: Christoph Hellwig 
---
 arch/sh/kernel/Makefile   |  2 +-
 arch/sh/kernel/dma-coherent.c | 85 +++
 arch/sh/mm/consistent.c   | 80 -
 3 files changed, 86 insertions(+), 81 deletions(-)
 create mode 100644 arch/sh/kernel/dma-coherent.c

diff --git a/arch/sh/kernel/Makefile b/arch/sh/kernel/Makefile
index cb5f1bfb52de..d5ddb64bfffe 100644
--- a/arch/sh/kernel/Makefile
+++ b/arch/sh/kernel/Makefile
@@ -45,7 +45,7 @@ obj-$(CONFIG_DUMP_CODE)   += disassemble.o
 obj-$(CONFIG_HIBERNATION)  += swsusp.o
 obj-$(CONFIG_DWARF_UNWINDER)   += dwarf.o
 obj-$(CONFIG_PERF_EVENTS)  += perf_event.o perf_callchain.o
-obj-$(CONFIG_DMA_NONCOHERENT)  += dma-nommu.o
+obj-$(CONFIG_DMA_NONCOHERENT)  += dma-nommu.o dma-coherent.o
 obj-$(CONFIG_HAVE_HW_BREAKPOINT)   += hw_breakpoint.o
 
 ccflags-y := -Werror
diff --git a/arch/sh/kernel/dma-coherent.c b/arch/sh/kernel/dma-coherent.c
new file mode 100644
index ..0e7720b5d495
--- /dev/null
+++ b/arch/sh/kernel/dma-coherent.c
@@ -0,0 +1,85 @@
+/*
+ * Copyright (C) 2004 - 2007  Paul Mundt
+ *
+ * This file is subject to the terms and conditions of the GNU General Public
+ * License.  See the file "COPYING" in the main directory of this archive
+ * for more details.
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+void *dma_generic_alloc_coherent(struct device *dev, size_t size,
+dma_addr_t *dma_handle, gfp_t gfp,
+unsigned long attrs)
+{
+   void *ret, *ret_nocache;
+   int order = get_order(size);
+
+   gfp |= __GFP_ZERO;
+
+   ret = (void *)__get_free_pages(gfp, order);
+   if (!ret)
+   return NULL;
+
+   /*
+* Pages from the page allocator may have data present in
+* cache. So flush the cache before using uncached memory.
+*/
+   sh_sync_dma_for_device(ret, size, DMA_BIDIRECTIONAL);
+
+   ret_nocache = (void __force *)ioremap_nocache(virt_to_phys(ret), size);
+   if (!ret_nocache) {
+   free_pages((unsigned long)ret, order);
+   return NULL;
+   }
+
+   split_page(pfn_to_page(virt_to_phys(ret) >> PAGE_SHIFT), order);
+
+   *dma_handle = virt_to_phys(ret);
+   if (!WARN_ON(!dev))
+   *dma_handle -= PFN_PHYS(dev->dma_pfn_offset);
+
+   return ret_nocache;
+}
+
+void dma_generic_free_coherent(struct device *dev, size_t size,
+  void *vaddr, dma_addr_t dma_handle,
+  unsigned long attrs)
+{
+   int order = get_order(size);
+   unsigned long pfn = (dma_handle >> PAGE_SHIFT);
+   int k;
+
+   if (!WARN_ON(!dev))
+   pfn += dev->dma_pfn_offset;
+
+   for (k = 0; k < (1 << order); k++)
+   __free_pages(pfn_to_page(pfn + k), 0);
+
+   iounmap(vaddr);
+}
+
+void sh_sync_dma_for_device(void *vaddr, size_t size,
+   enum dma_data_direction direction)
+{
+   void *addr = sh_cacheop_vaddr(vaddr);
+
+   switch (direction) {
+   case DMA_FROM_DEVICE:   /* invalidate only */
+   __flush_invalidate_region(addr, size);
+   break;
+   case DMA_TO_DEVICE: /* writeback only */
+   __flush_wback_region(addr, size);
+   break;
+   case DMA_BIDIRECTIONAL: /* writeback and invalidate */
+   __flush_purge_region(addr, size);
+   break;
+   default:
+   BUG();
+   }
+}
+EXPORT_SYMBOL(sh_sync_dma_for_device);
diff --git a/arch/sh/mm/consistent.c b/arch/sh/mm/consistent.c
index 1622ae6b9dbd..792f36129062 100644
--- a/arch/sh/mm/consistent.c
+++ b/arch/sh/mm/consistent.c
@@ -1,10 +1,6 @@
 /*
- * arch/sh/mm/consistent.c
- *
  * Copyright (C) 2004 - 2007  Paul Mundt
  *
- * Declared coherent memory functions based on arch/x86/kernel/pci-dma_32.c
- *
  * This file is subject to the terms and conditions of the GNU General Public
  * License.  See the file "COPYING" in the main directory of this archive
  * for more details.
@@ -13,83 +9,7 @@
 #include 
 #include 
 #include 
-#include 
 #include 
-#include 
-#include 
-#include 
-#include 
-
-void *dma_generic_alloc_coherent(struct device *dev, size_t size,
-dma_addr_t *dma_handle, gfp_t gfp,
-unsigned long attrs)
-{
-   void *ret, *ret_nocache;
-   int order = get_order(size);
-
-   gfp |= __GFP_ZERO;
-
-   ret = (void *)__get_free_pages(gfp, order);
-   if (!ret)
-   return NULL;
-
-   /*
-* Pages from the page allocator may have data present in
-* 

[PATCH 2/5] sh: introduce a sh_cacheop_vaddr helper

2018-07-19 Thread Christoph Hellwig
And use it in the maple bus code to avoid a dma API dependency.

Signed-off-by: Christoph Hellwig 
---
 arch/sh/include/asm/cacheflush.h | 7 +++
 arch/sh/mm/consistent.c  | 6 +-
 drivers/sh/maple/maple.c | 7 ---
 3 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/arch/sh/include/asm/cacheflush.h b/arch/sh/include/asm/cacheflush.h
index d103ab5a4e4b..b932e42ef028 100644
--- a/arch/sh/include/asm/cacheflush.h
+++ b/arch/sh/include/asm/cacheflush.h
@@ -101,5 +101,12 @@ void kunmap_coherent(void *kvaddr);
 
 void cpu_cache_init(void);
 
+static inline void *sh_cacheop_vaddr(void *vaddr)
+{
+   if (__in_29bit_mode())
+   vaddr = (void *)CAC_ADDR((unsigned long)vaddr);
+   return vaddr;
+}
+
 #endif /* __KERNEL__ */
 #endif /* __ASM_SH_CACHEFLUSH_H */
diff --git a/arch/sh/mm/consistent.c b/arch/sh/mm/consistent.c
index e9d422bd42a5..1622ae6b9dbd 100644
--- a/arch/sh/mm/consistent.c
+++ b/arch/sh/mm/consistent.c
@@ -74,10 +74,7 @@ void dma_generic_free_coherent(struct device *dev, size_t 
size,
 void sh_sync_dma_for_device(void *vaddr, size_t size,
enum dma_data_direction direction)
 {
-   void *addr;
-
-   addr = __in_29bit_mode() ?
-  (void *)CAC_ADDR((unsigned long)vaddr) : vaddr;
+   void *addr = sh_cacheop_vaddr(vaddr);
 
switch (direction) {
case DMA_FROM_DEVICE:   /* invalidate only */
@@ -93,7 +90,6 @@ void sh_sync_dma_for_device(void *vaddr, size_t size,
BUG();
}
 }
-EXPORT_SYMBOL(sh_sync_dma_for_device);
 
 static int __init memchunk_setup(char *str)
 {
diff --git a/drivers/sh/maple/maple.c b/drivers/sh/maple/maple.c
index 2e45988d1259..e5d7fb81ad66 100644
--- a/drivers/sh/maple/maple.c
+++ b/drivers/sh/maple/maple.c
@@ -300,8 +300,8 @@ static void maple_send(void)
	mutex_unlock(&maple_wlist_lock);
if (maple_packets > 0) {
for (i = 0; i < (1 << MAPLE_DMA_PAGES); i++)
-   sh_sync_dma_for_device(maple_sendbuf + i * PAGE_SIZE,
-  PAGE_SIZE, DMA_BIDIRECTIONAL);
+   __flush_purge_region(maple_sendbuf + i * PAGE_SIZE,
+   PAGE_SIZE);
}
 
 finish:
@@ -642,7 +642,8 @@ static void maple_dma_handler(struct work_struct *work)
	list_for_each_entry_safe(mq, nmq, &maple_sentq, list) {
mdev = mq->dev;
recvbuf = mq->recvbuf->buf;
-   sh_sync_dma_for_device(recvbuf, 0x400, DMA_FROM_DEVICE);
+   __flush_invalidate_region(sh_cacheop_vaddr(recvbuf),
+   0x400);
code = recvbuf[0];
kfree(mq->sendbuf);
	list_del_init(&mq->list);
-- 
2.18.0



[PATCH 2/4] openrisc: remove the no-op unmap_page and unmap_sg DMA operations

2018-07-19 Thread Christoph Hellwig
Signed-off-by: Christoph Hellwig 
---
 arch/openrisc/kernel/dma.c | 23 ---
 1 file changed, 23 deletions(-)

diff --git a/arch/openrisc/kernel/dma.c b/arch/openrisc/kernel/dma.c
index 47601274abf7..7cadff93d179 100644
--- a/arch/openrisc/kernel/dma.c
+++ b/arch/openrisc/kernel/dma.c
@@ -171,14 +171,6 @@ or1k_map_page(struct device *dev, struct page *page,
return addr;
 }
 
-static void
-or1k_unmap_page(struct device *dev, dma_addr_t dma_handle,
-   size_t size, enum dma_data_direction dir,
-   unsigned long attrs)
-{
-   /* Nothing special to do here... */
-}
-
 static int
 or1k_map_sg(struct device *dev, struct scatterlist *sg,
int nents, enum dma_data_direction dir,
@@ -195,19 +187,6 @@ or1k_map_sg(struct device *dev, struct scatterlist *sg,
return nents;
 }
 
-static void
-or1k_unmap_sg(struct device *dev, struct scatterlist *sg,
- int nents, enum dma_data_direction dir,
- unsigned long attrs)
-{
-   struct scatterlist *s;
-   int i;
-
-   for_each_sg(sg, s, nents, i) {
-   or1k_unmap_page(dev, sg_dma_address(s), sg_dma_len(s), dir, 0);
-   }
-}
-
 static void
 or1k_sync_single_for_device(struct device *dev,
dma_addr_t dma_handle, size_t size,
@@ -226,9 +205,7 @@ const struct dma_map_ops or1k_dma_map_ops = {
.alloc = or1k_dma_alloc,
.free = or1k_dma_free,
.map_page = or1k_map_page,
-   .unmap_page = or1k_unmap_page,
.map_sg = or1k_map_sg,
-   .unmap_sg = or1k_unmap_sg,
.sync_single_for_device = or1k_sync_single_for_device,
 };
 EXPORT_SYMBOL(or1k_dma_map_ops);
-- 
2.18.0



[PATCH 4/4] openrisc: use generic dma_noncoherent_ops

2018-07-19 Thread Christoph Hellwig
Switch to the generic noncoherent direct mapping implementation.

Signed-off-by: Christoph Hellwig 
---
 arch/openrisc/Kconfig   |  2 +
 arch/openrisc/include/asm/Kbuild|  1 +
 arch/openrisc/include/asm/dma-mapping.h | 35 -
 arch/openrisc/kernel/dma.c  | 65 -
 4 files changed, 12 insertions(+), 91 deletions(-)
 delete mode 100644 arch/openrisc/include/asm/dma-mapping.h

diff --git a/arch/openrisc/Kconfig b/arch/openrisc/Kconfig
index 9ecad05bfc73..65e3c574c9d3 100644
--- a/arch/openrisc/Kconfig
+++ b/arch/openrisc/Kconfig
@@ -6,6 +6,8 @@
 
 config OPENRISC
def_bool y
+   select ARCH_HAS_SYNC_DMA_FOR_DEVICE
+   select DMA_NONCOHERENT_OPS
select OF
select OF_EARLY_FLATTREE
select IRQ_DOMAIN
diff --git a/arch/openrisc/include/asm/Kbuild b/arch/openrisc/include/asm/Kbuild
index 65964d390b10..eb87cd8327c8 100644
--- a/arch/openrisc/include/asm/Kbuild
+++ b/arch/openrisc/include/asm/Kbuild
@@ -7,6 +7,7 @@ generic-y += current.h
 generic-y += device.h
 generic-y += div64.h
 generic-y += dma.h
+generic-y += dma-mapping.h
 generic-y += emergency-restart.h
 generic-y += exec.h
 generic-y += extable.h
diff --git a/arch/openrisc/include/asm/dma-mapping.h 
b/arch/openrisc/include/asm/dma-mapping.h
deleted file mode 100644
index e212a1f0b6d2..
--- a/arch/openrisc/include/asm/dma-mapping.h
+++ /dev/null
@@ -1,35 +0,0 @@
-/*
- * OpenRISC Linux
- *
- * Linux architectural port borrowing liberally from similar works of
- * others.  All original copyrights apply as per the original source
- * declaration.
- *
- * OpenRISC implementation:
- * Copyright (C) 2010-2011 Jonas Bonn 
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation; either version 2 of the License, or
- * (at your option) any later version.
- */
-
-#ifndef __ASM_OPENRISC_DMA_MAPPING_H
-#define __ASM_OPENRISC_DMA_MAPPING_H
-
-/*
- * See Documentation/DMA-API-HOWTO.txt and
- * Documentation/DMA-API.txt for documentation.
- */
-
-#include 
-#include 
-
-extern const struct dma_map_ops or1k_dma_map_ops;
-
-static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type *bus)
-{
-   return &or1k_dma_map_ops;
-}
-
-#endif /* __ASM_OPENRISC_DMA_MAPPING_H */
diff --git a/arch/openrisc/kernel/dma.c b/arch/openrisc/kernel/dma.c
index d6a0bf1fa713..159336adfa2f 100644
--- a/arch/openrisc/kernel/dma.c
+++ b/arch/openrisc/kernel/dma.c
@@ -19,9 +19,7 @@
  * the only thing implemented properly.  The rest need looking into...
  */
 
-#include 
-#include 
-#include 
+#include 
 
 #include 
 #include 
@@ -80,10 +78,9 @@ page_clear_nocache(pte_t *pte, unsigned long addr,
  * is being ignored for now; uncached but write-combined memory is a
  * missing feature of the OR1K.
  */
-static void *
-or1k_dma_alloc(struct device *dev, size_t size,
-  dma_addr_t *dma_handle, gfp_t gfp,
-  unsigned long attrs)
+void *
+arch_dma_alloc(struct device *dev, size_t size, dma_addr_t *dma_handle,
+   gfp_t gfp, unsigned long attrs)
 {
unsigned long va;
void *page;
@@ -115,9 +112,9 @@ or1k_dma_alloc(struct device *dev, size_t size,
return (void *)va;
 }
 
-static void
-or1k_dma_free(struct device *dev, size_t size, void *vaddr,
- dma_addr_t dma_handle, unsigned long attrs)
+void
+arch_dma_free(struct device *dev, size_t size, void *vaddr,
+   dma_addr_t dma_handle, unsigned long attrs)
 {
unsigned long va = (unsigned long)vaddr;
struct mm_walk walk = {
@@ -133,13 +130,10 @@ or1k_dma_free(struct device *dev, size_t size, void 
*vaddr,
free_pages_exact(vaddr, size);
 }
 
-static void
-or1k_sync_single_for_device(struct device *dev,
-   dma_addr_t dma_handle, size_t size,
-   enum dma_data_direction dir)
+void arch_sync_dma_for_device(struct device *dev, phys_addr_t addr, size_t 
size,
+   enum dma_data_direction dir)
 {
unsigned long cl;
-   dma_addr_t addr = dma_handle;
	struct cpuinfo_or1k *cpuinfo = &cpuinfo_or1k[smp_processor_id()];
 
switch (dir) {
@@ -163,45 +157,4 @@ or1k_sync_single_for_device(struct device *dev,
 */
break;
}
-
-}
-
-static dma_addr_t
-or1k_map_page(struct device *dev, struct page *page,
- unsigned long offset, size_t size,
- enum dma_data_direction dir,
- unsigned long attrs)
-{
-   unsigned long cl;
-   dma_addr_t addr = page_to_phys(page) + offset;
-   struct cpuinfo_or1k *cpuinfo = &cpuinfo_or1k[smp_processor_id()];
-
-   if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
-   or1k_sync_single_for_device(dev, addr, size, dir);
-   return addr;
 }
-
-static int
-or1k_map_sg(struct device *dev, struct scatterlist *sg,
-   int nents, enum 

[PATCH 1/4] openrisc: remove the sync_single_for_cpu DMA operation

2018-07-19 Thread Christoph Hellwig
openrisc does all the required cache maintenance at dma map time, and none
at unmap time.  It thus has to implement sync_single_for_device to match
the map case for buffer reuse, but there is no point in doing another
invalidation in the sync_single_for_cpu case, which in terms of cache
maintenance is equivalent to the unmap case.

Signed-off-by: Christoph Hellwig 
---
 arch/openrisc/kernel/dma.c | 15 ---
 1 file changed, 15 deletions(-)

diff --git a/arch/openrisc/kernel/dma.c b/arch/openrisc/kernel/dma.c
index ec7fd45704d2..47601274abf7 100644
--- a/arch/openrisc/kernel/dma.c
+++ b/arch/openrisc/kernel/dma.c
@@ -208,20 +208,6 @@ or1k_unmap_sg(struct device *dev, struct scatterlist *sg,
}
 }
 
-static void
-or1k_sync_single_for_cpu(struct device *dev,
-dma_addr_t dma_handle, size_t size,
-enum dma_data_direction dir)
-{
-   unsigned long cl;
-   dma_addr_t addr = dma_handle;
-   struct cpuinfo_or1k *cpuinfo = &cpuinfo_or1k[smp_processor_id()];
-
-   /* Invalidate the dcache for the requested range */
-   for (cl = addr; cl < addr + size; cl += cpuinfo->dcache_block_size)
-   mtspr(SPR_DCBIR, cl);
-}
-
 static void
 or1k_sync_single_for_device(struct device *dev,
dma_addr_t dma_handle, size_t size,
@@ -243,7 +229,6 @@ const struct dma_map_ops or1k_dma_map_ops = {
.unmap_page = or1k_unmap_page,
.map_sg = or1k_map_sg,
.unmap_sg = or1k_unmap_sg,
-   .sync_single_for_cpu = or1k_sync_single_for_cpu,
.sync_single_for_device = or1k_sync_single_for_device,
 };
 EXPORT_SYMBOL(or1k_dma_map_ops);
-- 
2.18.0



[PATCH 3/4] openrisc: fix cache maintenance in the sync_single_for_device DMA operation

2018-07-19 Thread Christoph Hellwig
The cache maintenance in the sync_single_for_device operation should be
equivalent to the map_page operation to facilitate reusing buffers.  Fix the
openrisc implementation by moving the cache maintenance performed in map_page
into the sync_single method, and calling that from map_page.

Signed-off-by: Christoph Hellwig 
---
 arch/openrisc/kernel/dma.c | 42 +-
 1 file changed, 19 insertions(+), 23 deletions(-)

diff --git a/arch/openrisc/kernel/dma.c b/arch/openrisc/kernel/dma.c
index 7cadff93d179..d6a0bf1fa713 100644
--- a/arch/openrisc/kernel/dma.c
+++ b/arch/openrisc/kernel/dma.c
@@ -133,19 +133,15 @@ or1k_dma_free(struct device *dev, size_t size, void *vaddr,
free_pages_exact(vaddr, size);
 }
 
-static dma_addr_t
-or1k_map_page(struct device *dev, struct page *page,
- unsigned long offset, size_t size,
- enum dma_data_direction dir,
- unsigned long attrs)
+static void
+or1k_sync_single_for_device(struct device *dev,
+   dma_addr_t dma_handle, size_t size,
+   enum dma_data_direction dir)
 {
unsigned long cl;
-   dma_addr_t addr = page_to_phys(page) + offset;
+   dma_addr_t addr = dma_handle;
	struct cpuinfo_or1k *cpuinfo = &cpuinfo_or1k[smp_processor_id()];
 
-   if (attrs & DMA_ATTR_SKIP_CPU_SYNC)
-   return addr;
-
switch (dir) {
case DMA_TO_DEVICE:
/* Flush the dcache for the requested range */
@@ -168,6 +164,20 @@ or1k_map_page(struct device *dev, struct page *page,
break;
}
 
+}
+
+static dma_addr_t
+or1k_map_page(struct device *dev, struct page *page,
+ unsigned long offset, size_t size,
+ enum dma_data_direction dir,
+ unsigned long attrs)
+{
+   unsigned long cl;
+   dma_addr_t addr = page_to_phys(page) + offset;
+   struct cpuinfo_or1k *cpuinfo = &cpuinfo_or1k[smp_processor_id()];
+
+   if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
+   or1k_sync_single_for_device(dev, addr, size, dir);
return addr;
 }
 
@@ -187,20 +197,6 @@ or1k_map_sg(struct device *dev, struct scatterlist *sg,
return nents;
 }
 
-static void
-or1k_sync_single_for_device(struct device *dev,
-   dma_addr_t dma_handle, size_t size,
-   enum dma_data_direction dir)
-{
-   unsigned long cl;
-   dma_addr_t addr = dma_handle;
-   struct cpuinfo_or1k *cpuinfo = &cpuinfo_or1k[smp_processor_id()];
-
-   /* Flush the dcache for the requested range */
-   for (cl = addr; cl < addr + size; cl += cpuinfo->dcache_block_size)
-   mtspr(SPR_DCBFR, cl);
-}
-
 const struct dma_map_ops or1k_dma_map_ops = {
.alloc = or1k_dma_alloc,
.free = or1k_dma_free,
-- 
2.18.0



use the generic dma-noncoherent code for openrisc

2018-07-19 Thread Christoph Hellwig
Hi Openrisc maintainers,

can you review these patches to switch openrisc to use the generic
dma-noncoherent code?  All the requirements are in mainline already
and we've switched various architectures over to it already.


[PATCH 3/3] hexagon: use generic dma_noncoherent_ops

2018-07-19 Thread Christoph Hellwig
Switch to the generic noncoherent direct mapping implementation.

Signed-off-by: Christoph Hellwig 
---
 arch/hexagon/Kconfig   |   2 +
 arch/hexagon/include/asm/Kbuild|   1 +
 arch/hexagon/include/asm/dma-mapping.h |  40 ---
 arch/hexagon/kernel/dma.c  | 148 ++---
 4 files changed, 11 insertions(+), 180 deletions(-)
 delete mode 100644 arch/hexagon/include/asm/dma-mapping.h

diff --git a/arch/hexagon/Kconfig b/arch/hexagon/Kconfig
index 37adb2003033..bcbdcb32935c 100644
--- a/arch/hexagon/Kconfig
+++ b/arch/hexagon/Kconfig
@@ -4,6 +4,7 @@ comment "Linux Kernel Configuration for Hexagon"
 
 config HEXAGON
def_bool y
+   select ARCH_HAS_SYNC_DMA_FOR_DEVICE
select HAVE_OPROFILE
# Other pending projects/to-do items.
# select HAVE_REGS_AND_STACK_ACCESS_API
@@ -28,6 +29,7 @@ config HEXAGON
select GENERIC_CLOCKEVENTS_BROADCAST
select MODULES_USE_ELF_RELA
select GENERIC_CPU_DEVICES
+   select DMA_NONCOHERENT_OPS
---help---
  Qualcomm Hexagon is a processor architecture designed for high
  performance and low power across a wide variety of applications.
diff --git a/arch/hexagon/include/asm/Kbuild b/arch/hexagon/include/asm/Kbuild
index dd2fd9c0d292..47c4da3d64a4 100644
--- a/arch/hexagon/include/asm/Kbuild
+++ b/arch/hexagon/include/asm/Kbuild
@@ -6,6 +6,7 @@ generic-y += compat.h
 generic-y += current.h
 generic-y += device.h
 generic-y += div64.h
+generic-y += dma-mapping.h
 generic-y += emergency-restart.h
 generic-y += extable.h
 generic-y += fb.h
diff --git a/arch/hexagon/include/asm/dma-mapping.h b/arch/hexagon/include/asm/dma-mapping.h
deleted file mode 100644
index 263f6acbfb0f..000000000000
--- a/arch/hexagon/include/asm/dma-mapping.h
+++ /dev/null
@@ -1,40 +0,0 @@
-/*
- * DMA operations for the Hexagon architecture
- *
- * Copyright (c) 2010-2011, The Linux Foundation. All rights reserved.
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License version 2 and
- * only version 2 as published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, write to the Free Software
- * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
- * 02110-1301, USA.
- */
-
-#ifndef _ASM_DMA_MAPPING_H
-#define _ASM_DMA_MAPPING_H
-
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-
-struct device;
-
-extern const struct dma_map_ops *dma_ops;
-
-static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type *bus)
-{
-   return dma_ops;
-}
-
-#endif
diff --git a/arch/hexagon/kernel/dma.c b/arch/hexagon/kernel/dma.c
index 9e46556a227d..ffc4ae8e126f 100644
--- a/arch/hexagon/kernel/dma.c
+++ b/arch/hexagon/kernel/dma.c
@@ -18,32 +18,19 @@
  * 02110-1301, USA.
  */
 
-#include 
-#include 
+#include 
 #include 
 #include 
-#include 
 #include 
 #include 
 
-#define HEXAGON_MAPPING_ERROR  0
-
-const struct dma_map_ops *dma_ops;
-EXPORT_SYMBOL(dma_ops);
-
-static inline void *dma_addr_to_virt(dma_addr_t dma_addr)
-{
-   return phys_to_virt((unsigned long) dma_addr);
-}
-
 static struct gen_pool *coherent_pool;
 
 
 /* Allocates from a pool of uncached memory that was reserved at boot time */
 
-static void *hexagon_dma_alloc_coherent(struct device *dev, size_t size,
-dma_addr_t *dma_addr, gfp_t flag,
-unsigned long attrs)
+void *arch_dma_alloc(struct device *dev, size_t size, dma_addr_t *dma_addr,
+   gfp_t flag, unsigned long attrs)
 {
void *ret;
 
@@ -75,58 +62,17 @@ static void *hexagon_dma_alloc_coherent(struct device *dev, size_t size,
return ret;
 }
 
-static void hexagon_free_coherent(struct device *dev, size_t size, void *vaddr,
- dma_addr_t dma_addr, unsigned long attrs)
+void arch_dma_free(struct device *dev, size_t size, void *vaddr,
+   dma_addr_t dma_addr, unsigned long attrs)
 {
gen_pool_free(coherent_pool, (unsigned long) vaddr, size);
 }
 
-static int check_addr(const char *name, struct device *hwdev,
- dma_addr_t bus, size_t size)
-{
-   if (hwdev && hwdev->dma_mask && !dma_capable(hwdev, bus, size)) {
-   if (*hwdev->dma_mask >= DMA_BIT_MASK(32))
-   printk(KERN_ERR
-   "%s: overflow %Lx+%zu of device mask %Lx\n",
-   name, (long long)bus, size,
-   (long long)*hwdev->dma_mask);
-   return 0;
-   }
-   return 1;

[PATCH 2/3] hexagon: implement the sync_sg_for_device DMA operation

2018-07-19 Thread Christoph Hellwig
This method needs to provide the equivalent of sync_single_for_device
for each S/G list element, but was missing.

Signed-off-by: Christoph Hellwig 
---
 arch/hexagon/kernel/dma.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/arch/hexagon/kernel/dma.c b/arch/hexagon/kernel/dma.c
index d2b717f352f4..9e46556a227d 100644
--- a/arch/hexagon/kernel/dma.c
+++ b/arch/hexagon/kernel/dma.c
@@ -188,6 +188,18 @@ static void hexagon_sync_single_for_device(struct device *dev,
dma_sync(dma_addr_to_virt(dma_handle), size, dir);
 }
 
+static void hexagon_sync_sg_for_device(struct device *dev,
+   struct scatterlist *sgl, int nents, enum dma_data_direction dir)
+{
+   struct scatterlist *sg;
+   int i;
+
+   for_each_sg(sgl, sg, nents, i)
+   hexagon_sync_single_for_device(dev, sg_dma_address(sg),
+   sg->length, dir);
+}
+
+
 static int hexagon_mapping_error(struct device *dev, dma_addr_t dma_addr)
 {
return dma_addr == HEXAGON_MAPPING_ERROR;
@@ -199,6 +211,7 @@ const struct dma_map_ops hexagon_dma_ops = {
.map_sg = hexagon_map_sg,
.map_page   = hexagon_map_page,
.sync_single_for_device = hexagon_sync_single_for_device,
+   .sync_sg_for_device = hexagon_sync_sg_for_device,
.mapping_error  = hexagon_mapping_error,
 };
 
-- 
2.18.0



[PATCH 1/3] hexagon: remove the sync_single_for_cpu DMA operation

2018-07-19 Thread Christoph Hellwig
hexagon does all the required cache maintenance at dma map time, and none
at unmap time.  It thus has to implement sync_single_for_device to match
the map case for buffer reuse, but there is no point in doing another
invalidation in the sync_single_for_cpu case, which in terms of cache
maintenance is equivalent to the unmap case.

Signed-off-by: Christoph Hellwig 
---
 arch/hexagon/kernel/dma.c | 8 
 1 file changed, 8 deletions(-)

diff --git a/arch/hexagon/kernel/dma.c b/arch/hexagon/kernel/dma.c
index 77459df34e2e..d2b717f352f4 100644
--- a/arch/hexagon/kernel/dma.c
+++ b/arch/hexagon/kernel/dma.c
@@ -181,13 +181,6 @@ static dma_addr_t hexagon_map_page(struct device *dev, struct page *page,
return bus;
 }
 
-static void hexagon_sync_single_for_cpu(struct device *dev,
-   dma_addr_t dma_handle, size_t size,
-   enum dma_data_direction dir)
-{
-   dma_sync(dma_addr_to_virt(dma_handle), size, dir);
-}
-
 static void hexagon_sync_single_for_device(struct device *dev,
dma_addr_t dma_handle, size_t size,
enum dma_data_direction dir)
@@ -205,7 +198,6 @@ const struct dma_map_ops hexagon_dma_ops = {
.free   = hexagon_free_coherent,
.map_sg = hexagon_map_sg,
.map_page   = hexagon_map_page,
-   .sync_single_for_cpu = hexagon_sync_single_for_cpu,
.sync_single_for_device = hexagon_sync_single_for_device,
.mapping_error  = hexagon_mapping_error,
 };
-- 
2.18.0



use the generic dma-noncoherent code for hexagon

2018-07-19 Thread Christoph Hellwig
Hi Richard,

can you review these patches to switch hexagon to use the generic
dma-noncoherent code?  All the requirements are in mainline already
and we've switched various architectures over to it already.


[PATCH 1/2] microblaze: use generic dma_noncoherent_ops

2018-07-19 Thread Christoph Hellwig
Switch to the generic noncoherent direct mapping implementation.

This removes the direction-based optimizations in
sync_{single,sg}_for_{cpu,device} which were marked untested and
do not match the usually very well tested {un,}map_{single,sg}
implementations.
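
For reference, the per-arch surface that the generic dma-noncoherent code
expects is small; a rough sketch of the hooks (names as used by the
conversions in this series; the bodies are placeholders, not microblaze code):

#include <linux/dma-noncoherent.h>

void *arch_dma_alloc(struct device *dev, size_t size, dma_addr_t *dma_handle,
		     gfp_t gfp, unsigned long attrs)
{
	return NULL;	/* arch-specific: allocate and remap uncached */
}

void arch_dma_free(struct device *dev, size_t size, void *vaddr,
		   dma_addr_t dma_handle, unsigned long attrs)
{
	/* arch-specific: undo arch_dma_alloc() */
}

/* gated by ARCH_HAS_SYNC_DMA_FOR_DEVICE */
void arch_sync_dma_for_device(struct device *dev, phys_addr_t paddr,
			      size_t size, enum dma_data_direction dir)
{
	/* arch-specific: flush/invalidate dcache for [paddr, paddr + size) */
}

/* gated by ARCH_HAS_SYNC_DMA_FOR_CPU */
void arch_sync_dma_for_cpu(struct device *dev, phys_addr_t paddr,
			   size_t size, enum dma_data_direction dir)
{
	/* arch-specific: invalidate dcache before the CPU reads the buffer */
}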

Signed-off-by: Christoph Hellwig 
---
 arch/microblaze/Kconfig   |   4 +
 arch/microblaze/include/asm/Kbuild|   1 +
 arch/microblaze/include/asm/dma-mapping.h |  28 -
 arch/microblaze/include/asm/pgtable.h |   2 -
 arch/microblaze/kernel/dma.c  | 144 ++
 arch/microblaze/mm/consistent.c   |   9 +-
 6 files changed, 22 insertions(+), 166 deletions(-)
 delete mode 100644 arch/microblaze/include/asm/dma-mapping.h

diff --git a/arch/microblaze/Kconfig b/arch/microblaze/Kconfig
index d14782100088..848e31a86ba5 100644
--- a/arch/microblaze/Kconfig
+++ b/arch/microblaze/Kconfig
@@ -1,6 +1,8 @@
 config MICROBLAZE
def_bool y
select ARCH_HAS_GCOV_PROFILE_ALL
+   select ARCH_HAS_SYNC_DMA_FOR_CPU
+   select ARCH_HAS_SYNC_DMA_FOR_DEVICE
select ARCH_MIGHT_HAVE_PC_PARPORT
select ARCH_NO_COHERENT_DMA_MMAP if !MMU
select ARCH_WANT_IPC_PARSE_VERSION
@@ -8,6 +10,8 @@ config MICROBLAZE
select TIMER_OF
select CLONE_BACKWARDS3
select COMMON_CLK
+   select DMA_NONCOHERENT_OPS
+   select DMA_NONCOHERENT_MMAP
select GENERIC_ATOMIC64
select GENERIC_CLOCKEVENTS
select GENERIC_CPU_DEVICES
diff --git a/arch/microblaze/include/asm/Kbuild b/arch/microblaze/include/asm/Kbuild
index fe6a6c6e5003..569ba9e670c1 100644
--- a/arch/microblaze/include/asm/Kbuild
+++ b/arch/microblaze/include/asm/Kbuild
@@ -5,6 +5,7 @@ generic-y += bugs.h
 generic-y += compat.h
 generic-y += device.h
 generic-y += div64.h
+generic-y += dma-mapping.h
 generic-y += emergency-restart.h
 generic-y += exec.h
 generic-y += extable.h
diff --git a/arch/microblaze/include/asm/dma-mapping.h b/arch/microblaze/include/asm/dma-mapping.h
deleted file mode 100644
index add50c1373bf..000000000000
--- a/arch/microblaze/include/asm/dma-mapping.h
+++ /dev/null
@@ -1,28 +0,0 @@
-/*
- * Implements the generic device dma API for microblaze and the pci
- *
- * Copyright (C) 2009-2010 Michal Simek 
- * Copyright (C) 2009-2010 PetaLogix
- *
- * This file is subject to the terms and conditions of the GNU General
- * Public License. See the file COPYING in the main directory of this
- * archive for more details.
- *
- * This file is base on powerpc and x86 dma-mapping.h versions
- * Copyright (C) 2004 IBM
- */
-
-#ifndef _ASM_MICROBLAZE_DMA_MAPPING_H
-#define _ASM_MICROBLAZE_DMA_MAPPING_H
-
-/*
- * Available generic sets of operations
- */
-extern const struct dma_map_ops dma_nommu_ops;
-
-static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type *bus)
-{
-   return &dma_nommu_ops;
-}
-
-#endif /* _ASM_MICROBLAZE_DMA_MAPPING_H */
diff --git a/arch/microblaze/include/asm/pgtable.h b/arch/microblaze/include/asm/pgtable.h
index db8b1fa83452..8a2e654b709f 100644
--- a/arch/microblaze/include/asm/pgtable.h
+++ b/arch/microblaze/include/asm/pgtable.h
@@ -553,8 +553,6 @@ void __init *early_get_page(void);
 
 extern unsigned long ioremap_bot, ioremap_base;
 
-void *consistent_alloc(gfp_t gfp, size_t size, dma_addr_t *dma_handle);
-void consistent_free(size_t size, void *vaddr);
 void consistent_sync(void *vaddr, size_t size, int direction);
 void consistent_sync_page(struct page *page, unsigned long offset,
size_t size, int direction);
diff --git a/arch/microblaze/kernel/dma.c b/arch/microblaze/kernel/dma.c
index 3145e7dc8ab1..71032cf64669 100644
--- a/arch/microblaze/kernel/dma.c
+++ b/arch/microblaze/kernel/dma.c
@@ -8,29 +8,15 @@
  */
 
 #include 
-#include 
+#include 
 #include 
 #include 
 #include 
 #include 
 #include 
 
-static void *dma_nommu_alloc_coherent(struct device *dev, size_t size,
-  dma_addr_t *dma_handle, gfp_t flag,
-  unsigned long attrs)
-{
-   return consistent_alloc(flag, size, dma_handle);
-}
-
-static void dma_nommu_free_coherent(struct device *dev, size_t size,
-void *vaddr, dma_addr_t dma_handle,
-unsigned long attrs)
-{
-   consistent_free(size, vaddr);
-}
-
-static inline void __dma_sync(unsigned long paddr,
- size_t size, enum dma_data_direction direction)
+static void __dma_sync(struct device *dev, phys_addr_t paddr, size_t size,
+   enum dma_data_direction direction)
 {
switch (direction) {
case DMA_TO_DEVICE:
@@ -45,113 +31,21 @@ static inline void __dma_sync(unsigned long paddr,
}
 }
 
-static int dma_nommu_map_sg(struct device *dev, struct scatterlist *sgl,
-int nents, enum dma_data_direction direction,
-unsigned long attrs)
+void 

Re: [PATCH] nios2: use generic dma_noncoherent_ops

2018-07-19 Thread Christoph Hellwig
On Thu, Jul 19, 2018 at 01:48:55PM +0800, Ley Foon Tan wrote:
> Do you have git repo for this series of patches?

This patch should apply without the need for any other patch
to Linus' latest tree, but I have a tree with all the conversions
here:


http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/generic-dma-noncoherent.3


Re: usb HC busted?

2018-07-19 Thread Sudip Mukherjee
Hi Mathias,

On Thu, Jul 19, 2018 at 01:59:01PM +0300, Mathias Nyman wrote:
> On 17.07.2018 18:10, Sudip Mukherjee wrote:
> > Hi Alan, Greg,
> > 
> > On Tue, Jul 17, 2018 at 03:49:18PM +0100, Sudip Mukherjee wrote:
> > > On Tue, Jul 17, 2018 at 03:40:22PM +0100, Sudip Mukherjee wrote:
> > > > Hi Alan,
> > > > 
> > > > On Tue, Jul 17, 2018 at 10:28:14AM -0400, Alan Stern wrote:
> > > > > On Tue, 17 Jul 2018, Sudip Mukherjee wrote:
> > > > > 
> > > > > > I did some more debugging. Tested with a KASAN enabled kernel and 
> > > > > > that
> > > > > > shows the problem. The report is attached.
> > > > > > 

> > 
> > And, my hacky patch worked as I prevented it from calling
> > usb_disable_interface() in this particular case.
> > 
> 
> Back for a few days, looking at this

I hope you had a good holiday. :)

> 
> xhci driver will set up all the endpoints for the new altsetting already in
> usb_hcd_alloc_bandwidth().
> 

> 
> As first aid I could try to implement checks that make sure the flushed URBs'
> TRB pointers really are on the current endpoint ring, and also add some
> warning if we are dropping endpoints with URBs still queued.

Yes, please. I think your first-aid will be a much better option than
the hacky patch I am using atm.

> 
> But we need to fix this properly as well.
> xhci needs to be more in sync with usb core in usb_set_interface(); currently
> xhci has the altsetting up and running when usb core hasn't even started
> flushing endpoints.

I am able to reproduce this on almost all cycles, so I can always test
the fix for you after you are fully back from your holiday.

--
Regards
Sudip


Re: usb HC busted?

2018-07-19 Thread Mathias Nyman

On 17.07.2018 18:10, Sudip Mukherjee wrote:

Hi Alan, Greg,

On Tue, Jul 17, 2018 at 03:49:18PM +0100, Sudip Mukherjee wrote:

On Tue, Jul 17, 2018 at 03:40:22PM +0100, Sudip Mukherjee wrote:

Hi Alan,

On Tue, Jul 17, 2018 at 10:28:14AM -0400, Alan Stern wrote:

On Tue, 17 Jul 2018, Sudip Mukherjee wrote:


I did some more debugging. Tested with a KASAN enabled kernel and that
shows the problem. The report is attached.

To my understanding:

btusb_work() is calling usb_set_interface() with alternate = 0, which
again calls usb_hcd_alloc_bandwidth() and that frees the rings by
xhci_free_endpoint_ring().


That doesn't sound like the right thing to do.  The rings shouldn't be
freed until xhci_endpoint_disable() is called.

On the other hand, there doesn't appear to be any
xhci_endpoint_disable() routine, although a comment refers to it.
Maybe this is the real problem?


one of your old mail might help :)

https://www.spinics.net/lists/linux-usb/msg98123.html


Wrote too soon.

Is it the one you are looking for -
usb_disable_endpoint() is in drivers/usb/core/message.c


I think now I understand what the problem is.
usb_set_interface() calls usb_disable_interface(), which in turn calls
usb_disable_endpoint(). usb_disable_endpoint() takes the pointer to 'ep',
clears the stored endpoint pointer (sets it to NULL) and passes 'ep' to
usb_hcd_flush_endpoint(). After flushing the endpoints,
usb_disable_endpoint() calls usb_hcd_disable_endpoint(), which tries to do:
if (hcd->driver->endpoint_disable)
hcd->driver->endpoint_disable(hcd, ep);
but there is no endpoint_disable() callback in xhci, so the endpoint is
never marked as disabled. So the next time usb_hcd_flush_endpoint() is
called I get this corruption.
And this is exactly where I used to see the problem happening.

And, my hacky patch worked as I prevented it from calling
usb_disable_interface() in this particular case.



Back for a few days, looking at this

xhci driver will set up all the endpoints for the new altsetting already in
usb_hcd_alloc_bandwidth().

New endpoints will be ready and rings running after this. I don't know the exact
history behind this, but I assume it is because xhci does all of the steps to
drop/add, disable/enable endpoints and check bandwidth in a single configure
endpoint command, which will return errors if there is not enough bandwidth.
This command is issued in hcd->driver->check_bandwidth().
This means that xhci doesn't really do much in hcd->driver->endpoint_disable or
hcd->driver->endpoint_enable.

It also means that xhci driver assumes rings are empty when
hcd->driver->check_bandwidth is called. It will bluntly free dropped rings.
If there are URBs left on an endpoint ring that was dropped+added
(freed+reallocated) then those URBs will contain pointers to the freed ring,
causing issues when usb_hcd_flush_endpoint() cancels those URBs.

usb_set_interface()
  usb_hcd_alloc_bandwidth()
hcd->driver->drop_endpoint()
hcd->driver->add_endpoint() // allocates new rings
hcd->driver->check_bandwidth() // issues configure endpoint command, frees rings
  usb_disable_interface(iface, true)
usb_disable_endpoint()
  usb_hcd_flush_endpoint() // will access freed ring if URBs found!!
  usb_hcd_disable_endpoint()
hcd->driver->endpoint_disable()  // xhci does nothing
  usb_enable_interface(iface, true)
usb_enable_endpoint(ep_addrss, true) // not really doing much on xhci side.

As first aid I could try to implement checks that make sure the flushed URBs'
TRB pointers really are on the current endpoint ring, and also add some warning
if we are dropping endpoints with URBs still queued.
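
A rough sketch of what such a check could look like -- illustrative only;
trb_on_ring() is a made-up name, while struct xhci_ring, struct xhci_segment
with its trbs array, and TRBS_PER_SEGMENT are real driver internals:

/* Does 'trb' live on one of 'ring's segments?  Segments form a circular
 * list, so stop once we come back around to first_seg. */
static bool trb_on_ring(struct xhci_ring *ring, union xhci_trb *trb)
{
	struct xhci_segment *seg = ring->first_seg;

	do {
		if (trb >= seg->trbs && trb < seg->trbs + TRBS_PER_SEGMENT)
			return true;
		seg = seg->next;
	} while (seg && seg != ring->first_seg);

	return false;
}

The flush path could then warn and skip cancelling any URB whose TRB pointers
fail this test, instead of dereferencing freed ring memory.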

But we need to fix this properly as well.
xhci needs to be more in sync with usb core in usb_set_interface(); currently
xhci has the altsetting up and running when usb core hasn't even started
flushing endpoints.

-Mathias


[PATCH v13 4/4] iommu/arm-smmu: Add support for qcom,smmu-v2 variant

2018-07-19 Thread Vivek Gautam
qcom,smmu-v2 is an arm,smmu-v2 implementation with specific
clock and power requirements. This smmu core is used with
multiple masters on msm8996, viz. mdss, video, etc.
Add bindings for the same.

Signed-off-by: Vivek Gautam 
Reviewed-by: Rob Herring 
Reviewed-by: Tomasz Figa 
---

Changes since v12:
 - Removed IOMMU_OF_DECLARE() definition after Rob H's patch that is merged
   in driver-core-next.

 .../devicetree/bindings/iommu/arm,smmu.txt | 42 ++
 drivers/iommu/arm-smmu.c   | 13 +++
 2 files changed, 55 insertions(+)

diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.txt b/Documentation/devicetree/bindings/iommu/arm,smmu.txt
index 8a6ffce12af5..7c71a6ed465a 100644
--- a/Documentation/devicetree/bindings/iommu/arm,smmu.txt
+++ b/Documentation/devicetree/bindings/iommu/arm,smmu.txt
@@ -17,10 +17,19 @@ conditions.
 "arm,mmu-401"
 "arm,mmu-500"
 "cavium,smmu-v2"
+"qcom,<soc>-smmu-v2", "qcom,smmu-v2"
 
   depending on the particular implementation and/or the
   version of the architecture implemented.
 
+  A number of Qcom SoCs use the qcom,smmu-v2 version of the IP.
+  "qcom,<soc>-smmu-v2" represents an SoC-specific compatible
+  string that should be present along with the "qcom,smmu-v2"
+  to facilitate SoC-specific clocks/power connections and to
+  address specific bug fixes.
+  An example string would be -
+  "qcom,msm8996-smmu-v2", "qcom,smmu-v2".
+
 - reg   : Base address and size of the SMMU.
 
 - #global-interrupts : The number of global interrupts exposed by the
@@ -71,6 +80,22 @@ conditions.
   or using stream matching with #iommu-cells = <2>, and
   may be ignored if present in such cases.
 
+- clock-names:  List of the names of clocks input to the device. The
+  required list depends on particular implementation and
+  is as follows:
+  - for "qcom,smmu-v2":
+- "bus": clock required for downstream bus access and
+ for the smmu ptw,
+- "iface": clock required to access smmu's registers
+   through the TCU's programming interface.
+  - unspecified for other implementations.
+
+- clocks: Specifiers for all clocks listed in the clock-names property,
+  as per generic clock bindings.
+
+- power-domains:  Specifiers for power domains required to be powered on for
+  the SMMU to operate, as per generic power domain bindings.
+
 ** Deprecated properties:
 
 - mmu-masters (deprecated in favour of the generic "iommus" binding) :
@@ -137,3 +162,20 @@ conditions.
 iommu-map = <0  0 0x400>;
 ...
 };
+
+   /* Qcom's arm,smmu-v2 implementation */
+   smmu4: iommu {
+   compatible = "qcom,msm8996-smmu-v2", "qcom,smmu-v2";
+   reg = <0xd0 0x1>;
+
+   #global-interrupts = <1>;
+   interrupts = ,
+,
+;
+   #iommu-cells = <1>;
+   power-domains = <&mmcc MDSS_GDSC>;
+
+   clocks = <&mmcc SMMU_MDP_AXI_CLK>,
+<&mmcc SMMU_MDP_AHB_CLK>;
+   clock-names = "bus", "iface";
+   };
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 25ff3bdbf7a3..7c69736a30f8 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -119,6 +119,7 @@ enum arm_smmu_implementation {
GENERIC_SMMU,
ARM_MMU500,
CAVIUM_SMMUV2,
+   QCOM_SMMUV2,
 };
 
 struct arm_smmu_s2cr {
@@ -1971,6 +1972,17 @@ ARM_SMMU_MATCH_DATA(arm_mmu401, ARM_SMMU_V1_64K, GENERIC_SMMU);
 ARM_SMMU_MATCH_DATA(arm_mmu500, ARM_SMMU_V2, ARM_MMU500);
 ARM_SMMU_MATCH_DATA(cavium_smmuv2, ARM_SMMU_V2, CAVIUM_SMMUV2);
 
+static const char * const qcom_smmuv2_clks[] = {
+   "bus", "iface",
+};
+
+static const struct arm_smmu_match_data qcom_smmuv2 = {
+   .version = ARM_SMMU_V2,
+   .model = QCOM_SMMUV2,
+   .clks = qcom_smmuv2_clks,
+   .num_clks = ARRAY_SIZE(qcom_smmuv2_clks),
+};
+
 static const struct of_device_id arm_smmu_of_match[] = {
	{ .compatible = "arm,smmu-v1", .data = &smmu_generic_v1 },
	{ .compatible = "arm,smmu-v2", .data = &smmu_generic_v2 },
@@ -1978,6 +1990,7 @@ static const struct of_device_id arm_smmu_of_match[] = {
	{ .compatible = "arm,mmu-401", .data = &arm_mmu401 },
	{ .compatible = "arm,mmu-500", .data = &arm_mmu500 },
	{ .compatible = "cavium,smmu-v2", .data = &cavium_smmuv2 },
+	{ .compatible = "qcom,smmu-v2", .data = &qcom_smmuv2 },
{ },
 };
 MODULE_DEVICE_TABLE(of, arm_smmu_of_match);
-- 
QUALCOMM INDIA, on behalf 

[PATCH v13 3/4] iommu/arm-smmu: Add the device_link between masters and smmu

2018-07-19 Thread Vivek Gautam
From: Sricharan R 

Finally add the device link between the master device and the
smmu, so that the smmu gets runtime enabled/disabled only when the
master needs it. This is done from the add_device callback, which gets
called once when the master is added to the smmu.

Signed-off-by: Sricharan R 
Signed-off-by: Vivek Gautam 
Reviewed-by: Tomasz Figa 
---

Changes since v12:
 - device_link_add() doesn't depend on pm_runtime_enabled() now.
 - Treat failure in device link addition as non-fatal.

 drivers/iommu/arm-smmu.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 15b20941441a..25ff3bdbf7a3 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -1461,6 +1461,9 @@ static int arm_smmu_add_device(struct device *dev)
 
	iommu_device_link(&smmu->iommu, dev);
 
+   device_link_add(dev, smmu->dev,
+   DL_FLAG_PM_RUNTIME | DL_FLAG_AUTOREMOVE_SUPPLIER);
+
return 0;
 
 out_cfg_free:
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation



[PATCH v13 2/4] iommu/arm-smmu: Invoke pm_runtime during probe, add/remove device

2018-07-19 Thread Vivek Gautam
From: Sricharan R 

The smmu device probe/remove and add/remove master device callbacks
get called when the smmu is not linked to its master, that is, without
the context of the master device. So call the runtime PM APIs explicitly
in those places.

Signed-off-by: Sricharan R 
[vivek: Cleanup pm runtime calls]
Signed-off-by: Vivek Gautam 
Reviewed-by: Tomasz Figa 
---

Changes since v12:
 - Explicitly enabling and disabling clocks in probe and remove routines
   if runtime PM is disabled.

 drivers/iommu/arm-smmu.c | 101 +++
 1 file changed, 93 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 9138a6fffe04..15b20941441a 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -268,6 +268,20 @@ static struct arm_smmu_option_prop arm_smmu_options[] = {
{ 0, NULL},
 };
 
+static inline int arm_smmu_rpm_get(struct arm_smmu_device *smmu)
+{
+   if (pm_runtime_enabled(smmu->dev))
+   return pm_runtime_get_sync(smmu->dev);
+
+   return 0;
+}
+
+static inline void arm_smmu_rpm_put(struct arm_smmu_device *smmu)
+{
+   if (pm_runtime_enabled(smmu->dev))
+   pm_runtime_put(smmu->dev);
+}
+
 static struct arm_smmu_domain *to_smmu_domain(struct iommu_domain *dom)
 {
return container_of(dom, struct arm_smmu_domain, domain);
@@ -913,11 +927,15 @@ static void arm_smmu_destroy_domain_context(struct iommu_domain *domain)
struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
struct arm_smmu_device *smmu = smmu_domain->smmu;
	struct arm_smmu_cfg *cfg = &smmu_domain->cfg;
-   int irq;
+   int ret, irq;
 
if (!smmu || domain->type == IOMMU_DOMAIN_IDENTITY)
return;
 
+   ret = arm_smmu_rpm_get(smmu);
+   if (ret < 0)
+   return;
+
/*
 * Disable the context bank and free the page tables before freeing
 * it.
@@ -932,6 +950,8 @@ static void arm_smmu_destroy_domain_context(struct iommu_domain *domain)
 
free_io_pgtable_ops(smmu_domain->pgtbl_ops);
__arm_smmu_free_bitmap(smmu->context_map, cfg->cbndx);
+
+   arm_smmu_rpm_put(smmu);
 }
 
 static struct iommu_domain *arm_smmu_domain_alloc(unsigned type)
@@ -1213,10 +1233,15 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
return -ENODEV;
 
smmu = fwspec_smmu(fwspec);
+
+   ret = arm_smmu_rpm_get(smmu);
+   if (ret < 0)
+   return ret;
+
/* Ensure that the domain is finalised */
ret = arm_smmu_init_domain_context(domain, smmu);
if (ret < 0)
-   return ret;
+   goto rpm_put;
 
/*
 * Sanity check the domain. We don't support domains across
@@ -1226,33 +1251,50 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
		dev_err(dev,
			"cannot attach to SMMU %s whilst already attached to domain on SMMU %s\n",
dev_name(smmu_domain->smmu->dev), dev_name(smmu->dev));
-   return -EINVAL;
+   ret = -EINVAL;
+   goto rpm_put;
}
 
/* Looks ok, so add the device to the domain */
-   return arm_smmu_domain_add_master(smmu_domain, fwspec);
+   ret = arm_smmu_domain_add_master(smmu_domain, fwspec);
+
+rpm_put:
+   arm_smmu_rpm_put(smmu);
+   return ret;
 }
 
 static int arm_smmu_map(struct iommu_domain *domain, unsigned long iova,
phys_addr_t paddr, size_t size, int prot)
 {
struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops;
+   struct arm_smmu_device *smmu = to_smmu_domain(domain)->smmu;
+   int ret;
 
if (!ops)
return -ENODEV;
 
-   return ops->map(ops, iova, paddr, size, prot);
+   arm_smmu_rpm_get(smmu);
+   ret = ops->map(ops, iova, paddr, size, prot);
+   arm_smmu_rpm_put(smmu);
+
+   return ret;
 }
 
 static size_t arm_smmu_unmap(struct iommu_domain *domain, unsigned long iova,
 size_t size)
 {
struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops;
+   struct arm_smmu_device *smmu = to_smmu_domain(domain)->smmu;
+   size_t ret;
 
if (!ops)
return 0;
 
-   return ops->unmap(ops, iova, size);
+   arm_smmu_rpm_get(smmu);
+   ret = ops->unmap(ops, iova, size);
+   arm_smmu_rpm_put(smmu);
+
+   return ret;
 }
 
 static void arm_smmu_iotlb_sync(struct iommu_domain *domain)
@@ -1407,7 +1449,13 @@ static int arm_smmu_add_device(struct device *dev)
while (i--)
cfg->smendx[i] = INVALID_SMENDX;
 
+   ret = arm_smmu_rpm_get(smmu);
+   if (ret < 0)
+   goto out_cfg_free;
+
ret = arm_smmu_master_alloc_smes(dev);
+   arm_smmu_rpm_put(smmu);
+
if (ret)
goto 

[PATCH v13 0/4] iommu/arm-smmu: Add runtime pm/sleep support

2018-07-19 Thread Vivek Gautam
This series provides the support for turning on the arm-smmu's
clocks/power domains using runtime pm. This is done using
device links between smmu and client devices. The device link
framework keeps the two devices in correct order for power-cycling
across runtime PM or across system-wide PM.

With addition of a new device link flag DL_FLAG_AUTOREMOVE_SUPPLIER [8]
(available in linux-next of Rafael's linux-pm tree [9]), the device links
created between arm-smmu and its clients will be automatically purged
when arm-smmu driver unbinds from its device.

As not all implementations support clock/power gating, we check for
a valid pm_domain on 'smmu->dev' to conditionally enable the runtime
power management for such smmu implementations that can support it.
Otherwise, the clocks are kept always on from .probe until .remove.
With conditional runtime pm now, we avoid touching dev->power.lock
in fastpaths for smmu implementations that don't need to do anything
useful with pm_runtime.
This lets us use the much-argued pm_runtime_get_sync/put_sync()
calls in map/unmap callbacks so that the clients do not have to
worry about handling any of the arm-smmu's power.
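
A minimal sketch of that conditional enablement (illustrative only; per the
description above, the decision is keyed off a valid pm_domain on the smmu
device, and smmu_setup_power() is a made-up helper name):

#include <linux/clk.h>
#include <linux/pm_runtime.h>

static int smmu_setup_power(struct device *dev, struct clk_bulk_data *clks,
			    int num_clks)
{
	int ret = clk_bulk_prepare(num_clks, clks);

	if (ret)
		return ret;

	if (dev->pm_domain) {
		/* Power can be gated: clocks are handled in runtime PM ops. */
		pm_runtime_enable(dev);
		return 0;
	}

	/* No power domain: keep the clocks always on until remove. */
	return clk_bulk_enable(num_clks, clks);
}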

This series also adds support for Qcom's arm-smmu-v2 variant that
has different clocks and power requirements.

Previous version of this patch series is @ [2].

Tested this series on msm8996, and sdm845 after pulling in Rafael's linux-pm
linux-next[9] and Joerg's iommu next[10] branches.

[v13]
   Addressing Rafael's comments:
   * Added .suspend pm callback to disable the clocks in system wide suspend.
   * Added corresponding clock enable in .resume pm callback.
   * Explicitly enabling/disabling the clocks now when runtime PM is disabled.
   * device_link_add() doesn't depend on pm_runtime_enabled() as we can
 use device links across system suspend/resume too.

   Addressing Robin's comments:
   * Making device_link_add failures as non-fatal.

   * Removed IOMMU_OF_DECLARE() declaration as we don't need this after Rob's
 patch that removed all of these declarations.

[v12]
   * Use new device link's flag introduced in [8] -
 DL_FLAG_AUTOREMOVE_SUPPLIER. With this devices links are automatically
 purged when arm-smmu driver unbinds.
   * Using pm_runtime_force_suspend() instead of pm_runtime_disable() to
 avoid the following warning from arm_smmu_device_remove():

 [295711.537507] ------------[ cut here ]------------
 [295711.544226] Unpreparing enabled smmu_mdp_ahb_clk
 [295711.549099] WARNING: CPU: 0 PID: 1 at ../drivers/clk/clk.c:697
 clk_core_unprepare+0xd8/0xe0
 ...
 [295711.674073] Call trace:
 [295711.679454]  clk_core_unprepare+0xd8/0xe0
 [295711.682059]  clk_unprepare+0x28/0x40
 [295711.685964]  clk_bulk_unprepare+0x28/0x40
 [295711.689701]  arm_smmu_device_remove+0x88/0xd8
 [295711.693692]  arm_smmu_device_shutdown+0xc/0x18
 [295711.698120]  platform_drv_shutdown+0x20/0x30

[v11]
   * Some more cleanups for device link. We don't need an explicit
 delete for device link from the driver, but just set the flag
 DL_FLAG_AUTOREMOVE.
 device_link_add() API description says -
 "If the DL_FLAG_AUTOREMOVE is set, the link will be removed
 automatically when the consumer device driver unbinds."
   * Addressed the comments for 'smmu' in arm_smmu_map/unmap().
   * Dropped the patch [7] that introduced device_link_del_dev() API. 

[v10]
   * Introduce device_link_del_dev() API to delete the link between
 given consumer and supplier devices. The users of device link
 do not need to store link pointer to delete the link later.
 They can straightaway use this API by passing consumer and
 supplier devices.
   * Made corresponding changes to arm-smmu driver patch handling the
 device links.
   * Dropped the patch [6] that was adding device_link_find() API to
 device core layer. device_link_del_dev() serves the purpose to
 directly delete the link between two given devices.

[v9]
   * Removed 'rpm_supported' flag, instead checking on pm_domain
 to enable runtime pm.
   * Creating device link only when the runtime pm is enabled, as we
 don't need a device link besides managing the power dependency
 between supplier and consumer devices.
   * Introducing a patch to add device_link_find() API that finds
 and existing link between supplier and consumer devices.
 Also, made necessary change to device_link_add() to use this API.
   * arm_smmu_remove_device() now uses this device_link_find() to find
 the device link between smmu device and the master device, and then
 delete this link.
   * Dropped the destroy_domain_context() fix [5] as it was rather,
 introducing catastrophically bad problem by destroying
 'good dev's domain context.
   * Added 'Reviwed-by' tag for Tomasz's review.

[v8]
   * Major change -
 - Added a flag 'rpm_supported' which each platform that supports
   runtime pm, 

[PATCH v13 1/4] iommu/arm-smmu: Add pm_runtime/sleep ops

2018-07-19 Thread Vivek Gautam
From: Sricharan R 

The smmu needs to be functional only when the respective
masters using it are active. The device_link feature
helps to track such functional dependencies, so that the
iommu gets powered when the master device enables itself
using pm_runtime. So by adapting the smmu driver for
runtime pm, the above dependency can be addressed.

This patch adds the pm runtime/sleep callbacks to the
driver and also the functions to parse the smmu clocks
from DT and enable them in resume/suspend.

Also, while enabling runtime pm, add a pm sleep suspend
callback that pushes the device to a low power state by turning
the clocks off in system sleep, and add the corresponding
clock enable path in the resume callback.

Signed-off-by: Sricharan R 
Signed-off-by: Archit Taneja 
[vivek: rework for clock and pm ops]
Signed-off-by: Vivek Gautam 
Reviewed-by: Tomasz Figa 
---

Changes since v12:
 - Added pm sleep .suspend callback. This disables the clocks.
 - Added corresponding change to enable clocks in .resume
  pm sleep callback.

 drivers/iommu/arm-smmu.c | 75 ++--
 1 file changed, 73 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index c73cfce1ccc0..9138a6fffe04 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -48,6 +48,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -205,6 +206,8 @@ struct arm_smmu_device {
u32 num_global_irqs;
u32 num_context_irqs;
unsigned int*irqs;
+   struct clk_bulk_data*clks;
+   int num_clks;
 
u32 cavium_id_base; /* Specific to Cavium */
 
@@ -1897,10 +1900,12 @@ static int arm_smmu_device_cfg_probe(struct arm_smmu_device *smmu)
 struct arm_smmu_match_data {
enum arm_smmu_arch_version version;
enum arm_smmu_implementation model;
+   const char * const *clks;
+   int num_clks;
 };
 
 #define ARM_SMMU_MATCH_DATA(name, ver, imp)\
-static struct arm_smmu_match_data name = { .version = ver, .model = imp }
+static const struct arm_smmu_match_data name = { .version = ver, .model = imp }
 
 ARM_SMMU_MATCH_DATA(smmu_generic_v1, ARM_SMMU_V1, GENERIC_SMMU);
 ARM_SMMU_MATCH_DATA(smmu_generic_v2, ARM_SMMU_V2, GENERIC_SMMU);
@@ -1919,6 +1924,23 @@ static const struct of_device_id arm_smmu_of_match[] = {
 };
 MODULE_DEVICE_TABLE(of, arm_smmu_of_match);
 
+static void arm_smmu_fill_clk_data(struct arm_smmu_device *smmu,
+  const char * const *clks)
+{
+   int i;
+
+   if (smmu->num_clks < 1)
+   return;
+
+   smmu->clks = devm_kcalloc(smmu->dev, smmu->num_clks,
+ sizeof(*smmu->clks), GFP_KERNEL);
+   if (!smmu->clks)
+   return;
+
+   for (i = 0; i < smmu->num_clks; i++)
+   smmu->clks[i].id = clks[i];
+}
+
 #ifdef CONFIG_ACPI
 static int acpi_smmu_get_data(u32 model, struct arm_smmu_device *smmu)
 {
@@ -2001,6 +2023,9 @@ static int arm_smmu_device_dt_probe(struct platform_device *pdev,
data = of_device_get_match_data(dev);
smmu->version = data->version;
smmu->model = data->model;
+   smmu->num_clks = data->num_clks;
+
+   arm_smmu_fill_clk_data(smmu, data->clks);
 
parse_driver_options(smmu);
 
@@ -2099,6 +2124,14 @@ static int arm_smmu_device_probe(struct platform_device *pdev)
smmu->irqs[i] = irq;
}
 
+   err = devm_clk_bulk_get(smmu->dev, smmu->num_clks, smmu->clks);
+   if (err)
+   return err;
+
+   err = clk_bulk_prepare(smmu->num_clks, smmu->clks);
+   if (err)
+   return err;
+
err = arm_smmu_device_cfg_probe(smmu);
if (err)
return err;
@@ -2181,6 +2214,9 @@ static int arm_smmu_device_remove(struct platform_device *pdev)
 
/* Turn the thing off */
writel(sCR0_CLIENTPD, ARM_SMMU_GR0_NS(smmu) + ARM_SMMU_GR0_sCR0);
+
+   clk_bulk_unprepare(smmu->num_clks, smmu->clks);
+
return 0;
 }
 
@@ -2189,15 +2225,50 @@ static void arm_smmu_device_shutdown(struct platform_device *pdev)
arm_smmu_device_remove(pdev);
 }
 
+static int __maybe_unused arm_smmu_runtime_resume(struct device *dev)
+{
+   struct arm_smmu_device *smmu = dev_get_drvdata(dev);
+
+   return clk_bulk_enable(smmu->num_clks, smmu->clks);
+}
+
+static int __maybe_unused arm_smmu_runtime_suspend(struct device *dev)
+{
+   struct arm_smmu_device *smmu = dev_get_drvdata(dev);
+
+   clk_bulk_disable(smmu->num_clks, smmu->clks);
+
+   return 0;
+}
+
 static int __maybe_unused arm_smmu_pm_resume(struct device *dev)
 {
struct arm_smmu_device *smmu = dev_get_drvdata(dev);
+   int ret;
+
+   if (!pm_runtime_suspended(dev)) {
+   ret = arm_smmu_runtime_resume(dev);

Re: [PATCH] nios2: use generic dma_noncoherent_ops

2018-07-19 Thread Ley Foon Tan
On Wed, 2018-07-11 at 17:32 +0200, Christoph Hellwig wrote:
> ping?  Any comments?
> 
> On Tue, Jun 19, 2018 at 09:01:37AM +0200, Christoph Hellwig wrote:
> > 
> > Switch to the generic noncoherent direct mapping implementation.
> > 
> > Signed-off-by: Christoph Hellwig 
> > ---
> >  arch/nios2/Kconfig   |   3 +
> >  arch/nios2/include/asm/Kbuild|   1 +
> >  arch/nios2/include/asm/dma-mapping.h |  20 
> >  arch/nios2/mm/dma-mapping.c  | 139 +++--
> > --
> >  4 files changed, 17 insertions(+), 146 deletions(-)
> >  delete mode 100644 arch/nios2/include/asm/dma-mapping.h
> > 
Hi

Do you have git repo for this series of patches?

Regards
Ley Foon

Re: [PATCH v3 2/3] iommu/arm-smmu-v3: Poll for CMD_SYNC outside cmdq lock

2018-07-19 Thread Leizhen (ThunderTown)



On 2017/10/18 22:04, Robin Murphy wrote:
> Even without the MSI trick, we can still do a lot better than hogging
> the entire queue while it drains. All we actually need to do for the
> necessary guarantee of completion is wait for our particular command to
> have been consumed, and as long as we keep track of where it is there is
> no need to block other CPUs from adding further commands in the
> meantime. There is one theoretical (but incredibly unlikely) edge case
> to avoid, where cons has wrapped twice to still appear 'behind' the sync
> position - this is easily disambiguated by adding a generation count to
> the queue to indicate when prod wraps, since cons cannot wrap twice
> without prod having wrapped at least once.
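
For illustration, the completion decision described above could look roughly
like the sketch below -- reconstructed from the queue_behind()/
queue_ahead_not_wrapped() helpers in the diff, since the tail of the patch is
truncated here; not a verbatim copy of the patch:

static bool sync_consumed(struct arm_smmu_cmdq *cmdq, u32 sync_idx,
			  int sync_gen)
{
	struct arm_smmu_queue *q = &cmdq->q;

	queue_sync_cons(q);

	/* Common case: cons has passed sync_idx on the same lap. */
	if (queue_ahead_not_wrapped(q, sync_idx))
		return true;

	/*
	 * cons appearing behind sync_idx means it must already have wrapped
	 * past it; prod is then at least one generation ahead, which is what
	 * disambiguates the double-wrap edge case described above.
	 */
	return queue_behind(q, sync_idx) &&
	       READ_ONCE(cmdq->generation) != sync_gen;
}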

Hi Robin,

I applied your patch and got the below improvement for an NVMe device.

Random read  IOPS: 146K --> 214K
Random write IOPS: 143K --> 212K

Tested-by: Zhen Lei 

> 
> Signed-off-by: Robin Murphy 
> ---
> 
> v3:
>  - Move queue checks into helpers
>  - Avoid race by updating generation before prod (after some
>deliberation I've concluded that it doesn't actually need any
>special handling for the timeout failure case either)
> 
>  drivers/iommu/arm-smmu-v3.c | 91 
> +
>  1 file changed, 68 insertions(+), 23 deletions(-)
> 
> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
> index 83b76404e882..3130e7182dde 100644
> --- a/drivers/iommu/arm-smmu-v3.c
> +++ b/drivers/iommu/arm-smmu-v3.c
> @@ -417,7 +417,6 @@
>  
>  /* High-level queue structures */
>  #define ARM_SMMU_POLL_TIMEOUT_US 100
> -#define ARM_SMMU_CMDQ_DRAIN_TIMEOUT_US   100 /* 1s! */
>  #define ARM_SMMU_SYNC_TIMEOUT_US 100 /* 1s! */
>  
>  #define MSI_IOVA_BASE0x800
> @@ -540,6 +539,7 @@ struct arm_smmu_queue {
>  struct arm_smmu_cmdq {
>   struct arm_smmu_queue   q;
>   spinlock_t  lock;
> + int generation;
>  };
>  
>  struct arm_smmu_evtq {
> @@ -737,6 +737,17 @@ static bool queue_empty(struct arm_smmu_queue *q)
>  Q_WRP(q, q->prod) == Q_WRP(q, q->cons);
>  }
>  
> +static bool queue_behind(struct arm_smmu_queue *q, u32 idx)
> +{
> + return Q_IDX(q, q->cons) < Q_IDX(q, idx);
> +}
> +
> +static bool queue_ahead_not_wrapped(struct arm_smmu_queue *q, u32 idx)
> +{
> + return Q_IDX(q, q->cons) >= Q_IDX(q, idx) &&
> +Q_WRP(q, q->cons) == Q_WRP(q, idx);
> +}
> +
>  static void queue_sync_cons(struct arm_smmu_queue *q)
>  {
>   q->cons = readl_relaxed(q->cons_reg);
> @@ -770,21 +781,12 @@ static void queue_inc_prod(struct arm_smmu_queue *q)
>   writel(q->prod, q->prod_reg);
>  }
>  
> -/*
> - * Wait for the SMMU to consume items. If drain is true, wait until the queue
> - * is empty. Otherwise, wait until there is at least one free slot.
> - */
> -static int queue_poll_cons(struct arm_smmu_queue *q, bool drain, bool wfe)
> +/* Wait for the SMMU to consume items, until there is at least one free slot */
> +static int queue_poll_cons(struct arm_smmu_queue *q, bool wfe)
>  {
> - ktime_t timeout;
> - unsigned int delay = 1;
> + ktime_t timeout = ktime_add_us(ktime_get(), ARM_SMMU_POLL_TIMEOUT_US);
>  
> - /* Wait longer if it's queue drain */
> - timeout = ktime_add_us(ktime_get(), drain ?
> - ARM_SMMU_CMDQ_DRAIN_TIMEOUT_US :
> - ARM_SMMU_POLL_TIMEOUT_US);
> -
> - while (queue_sync_cons(q), (drain ? !queue_empty(q) : queue_full(q))) {
> + while (queue_sync_cons(q), queue_full(q)) {
>   if (ktime_compare(ktime_get(), timeout) > 0)
>   return -ETIMEDOUT;
>  
> @@ -792,8 +794,7 @@ static int queue_poll_cons(struct arm_smmu_queue *q, bool drain, bool wfe)
>   wfe();
>   } else {
>   cpu_relax();
> - udelay(delay);
> - delay *= 2;
> + udelay(1);
>   }
>   }
>  
> @@ -959,15 +960,20 @@ static void arm_smmu_cmdq_skip_err(struct arm_smmu_device *smmu)
>   queue_write(Q_ENT(q, cons), cmd, q->ent_dwords);
>  }
>  
> -static void arm_smmu_cmdq_insert_cmd(struct arm_smmu_device *smmu, u64 *cmd)
> +static u32 arm_smmu_cmdq_insert_cmd(struct arm_smmu_device *smmu, u64 *cmd)
>  {
>   struct arm_smmu_queue *q = &smmu->cmdq.q;
>   bool wfe = !!(smmu->features & ARM_SMMU_FEAT_SEV);
>  
> + if (Q_IDX(q, q->prod + 1) == 0)
> + smmu->cmdq.generation++;
> +
>   while (queue_insert_raw(q, cmd) == -ENOSPC) {
> - if (queue_poll_cons(q, false, wfe))
> + if (queue_poll_cons(q, wfe))
>   dev_err_ratelimited(smmu->dev, "CMDQ timeout\n");
>   }
> +
> + return q->prod;
>  }
>  
>  static void arm_smmu_cmdq_issue_cmd(struct arm_smmu_device *smmu,
> @@ -1001,15 +1007,53 @@ static