Re: [RFC] virtio-iommu version 0.5

2017-10-25 Thread Linu Cherian
Hi Jean,

On Wed Oct 25, 2017 at 10:07:53AM +0100, Jean-Philippe Brucker wrote:
> On 25/10/17 08:07, Linu Cherian wrote:
> > Hi Jean,
> > 
> > On Tue Oct 24, 2017 at 10:28:59PM +0530, Linu Cherian wrote:
> >> Hi Jean,
> >> Thanks for your reply.
> >>
> >> On Tue Oct 24, 2017 at 09:37:12AM +0100, Jean-Philippe Brucker wrote:
> >>> Hi Linu,
> >>>
> >>> On 24/10/17 07:27, Linu Cherian wrote:
> >>>> Hi Jean,
> >>>>
> >>>> On Mon Oct 23, 2017 at 10:32:41AM +0100, Jean-Philippe Brucker wrote:
> >>>>> This is version 0.5 of the virtio-iommu specification, the 
> >>>>> paravirtualized
> >>>>> IOMMU. This version addresses feedback from v0.4 and adds an event 
> >>>>> virtqueue.
> >>>>> Please find the specification, LaTeX sources and pdf, at:
> >>>>> git://linux-arm.org/virtio-iommu.git viommu/v0.5
> >>>>> http://linux-arm.org/git?p=virtio-iommu.git;a=blob;f=dist/v0.5/virtio-iommu-v0.5.pdf
> >>>>>
> >>>>> A detailed changelog since v0.4 follows. You can find the pdf diff at:
> >>>>> http://linux-arm.org/git?p=virtio-iommu.git;a=blob;f=dist/diffs/virtio-iommu-pdf-diff-v0.4-v0.5.pdf
> >>>>>
> >>>>> * Add an event virtqueue for the device to report translation faults to
> >>>>>   the driver. For the moment only unrecoverable faults are available but
> >>>>>   future versions will extend it.
> >>>>> * Simplify PROBE request by removing the ack part, and flattening RESV
> >>>>>   properties.
> >>>>> * Rename "address space" to "domain". The change might seem futile but
> >>>>>   allows to introduce PASIDs and other features cleanly in the next
> >>>>>   versions. In the same vein, the few remaining "device" occurrences 
> >>>>> were
> >>>>>   replaced by "endpoint", to avoid any confusion with "the device"
> >>>>>   referring to the virtio device across the document.
> >>>>> * Add implementation notes for RESV_MEM properties.
> >>>>> * Update ACPI table definition.
> >>>>> * Fix typos and clarify a few things.
> >>>>>
> >>>>> I will publish the Linux driver for v0.5 shortly. Then for next versions
> >>>>> I'll focus on optimizations and adding support for hardware 
> >>>>> acceleration.
> >>>>>
> >>>>> Existing implementations are simple and can certainly be optimized, even
> >>>>> without architectural changes. But the architecture itself can also be
> >>>>> improved in a number of ways. Currently it is designed to work well with
> >>>>> VFIO. However, having explicit MAP requests is less efficient* than page
> >>>>> tables for emulated and PV endpoints, and the current architecture 
> >>>>> doesn't
> >>>>> address this. Binding page tables is an obvious way to improve 
> >>>>> throughput
> >>>>> in that case, but we can explore cleverer (and possibly simpler) ways to
> >>>>> do it.
> >>>>>
> >>>>> So first we'll work on getting the base device and driver merged, then
> >>>>> we'll analyze and compare several ideas for improving performance.
> >>>>>
> >>>>> Thanks,
> >>>>> Jean
> >>>>>
> >>>>> * I have yet to study this behaviour, and would be interested in any
> >>>>> prior art on the subject of analyzing devices DMA patterns (virtio and
> >>>>> others)
> >>>>
> >>>>
> >>>> From the spec,
> >>>> Under future extensions.
> >>>>
> >>>> "Page Table Handover, to allow guests to manage their own page tables 
> >>>> and share them with the MMU"
> >>>>
> >>>> Had few questions on this.
> >>>>
> >>>> 1. Did you mean SVM support for vfio-pci devices attached to guest 
> >>>> processes here.
> >>>
> >>> Yes, using the VFIO BIND and INVALIDATE ioctls that Intel is working on,
> >>> and adding requests in pretty much the same format to virtio-iommu.
> >>>
> >>>> 2. Can you give some hints on how this is goi

Re: [RFC] virtio-iommu version 0.5

2017-10-25 Thread Linu Cherian
Hi Jean,

On Wed Oct 25, 2017 at 10:07:53AM +0100, Jean-Philippe Brucker wrote:
> On 25/10/17 08:07, Linu Cherian wrote:
> > Hi Jean,
> > 
> > On Tue Oct 24, 2017 at 10:28:59PM +0530, Linu Cherian wrote:
> >> Hi Jean,
> >> Thanks for your reply.
> >>
> >> On Tue Oct 24, 2017 at 09:37:12AM +0100, Jean-Philippe Brucker wrote:
> >>> Hi Linu,
> >>>
> >>> On 24/10/17 07:27, Linu Cherian wrote:
> >>>> Hi Jean,
> >>>>
> >>>> On Mon Oct 23, 2017 at 10:32:41AM +0100, Jean-Philippe Brucker wrote:
> >>>>> This is version 0.5 of the virtio-iommu specification, the 
> >>>>> paravirtualized
> >>>>> IOMMU. This version addresses feedback from v0.4 and adds an event 
> >>>>> virtqueue.
> >>>>> Please find the specification, LaTeX sources and pdf, at:
> >>>>> git://linux-arm.org/virtio-iommu.git viommu/v0.5
> >>>>> http://linux-arm.org/git?p=virtio-iommu.git;a=blob;f=dist/v0.5/virtio-iommu-v0.5.pdf
> >>>>>
> >>>>> A detailed changelog since v0.4 follows. You can find the pdf diff at:
> >>>>> http://linux-arm.org/git?p=virtio-iommu.git;a=blob;f=dist/diffs/virtio-iommu-pdf-diff-v0.4-v0.5.pdf
> >>>>>
> >>>>> * Add an event virtqueue for the device to report translation faults to
> >>>>>   the driver. For the moment only unrecoverable faults are available but
> >>>>>   future versions will extend it.
> >>>>> * Simplify PROBE request by removing the ack part, and flattening RESV
> >>>>>   properties.
> >>>>> * Rename "address space" to "domain". The change might seem futile but
> >>>>>   allows to introduce PASIDs and other features cleanly in the next
> >>>>>   versions. In the same vein, the few remaining "device" occurrences 
> >>>>> were
> >>>>>   replaced by "endpoint", to avoid any confusion with "the device"
> >>>>>   referring to the virtio device across the document.
> >>>>> * Add implementation notes for RESV_MEM properties.
> >>>>> * Update ACPI table definition.
> >>>>> * Fix typos and clarify a few things.
> >>>>>
> >>>>> I will publish the Linux driver for v0.5 shortly. Then for next versions
> >>>>> I'll focus on optimizations and adding support for hardware 
> >>>>> acceleration.
> >>>>>
> >>>>> Existing implementations are simple and can certainly be optimized, even
> >>>>> without architectural changes. But the architecture itself can also be
> >>>>> improved in a number of ways. Currently it is designed to work well with
> >>>>> VFIO. However, having explicit MAP requests is less efficient* than page
> >>>>> tables for emulated and PV endpoints, and the current architecture 
> >>>>> doesn't
> >>>>> address this. Binding page tables is an obvious way to improve 
> >>>>> throughput
> >>>>> in that case, but we can explore cleverer (and possibly simpler) ways to
> >>>>> do it.
> >>>>>
> >>>>> So first we'll work on getting the base device and driver merged, then
> >>>>> we'll analyze and compare several ideas for improving performance.
> >>>>>
> >>>>> Thanks,
> >>>>> Jean
> >>>>>
> >>>>> * I have yet to study this behaviour, and would be interested in any
> >>>>> prior art on the subject of analyzing devices DMA patterns (virtio and
> >>>>> others)
> >>>>
> >>>>
> >>>> From the spec,
> >>>> Under future extensions.
> >>>>
> >>>> "Page Table Handover, to allow guests to manage their own page tables 
> >>>> and share them with the MMU"
> >>>>
> >>>> Had few questions on this.
> >>>>
> >>>> 1. Did you mean SVM support for vfio-pci devices attached to guest 
> >>>> processes here.
> >>>
> >>> Yes, using the VFIO BIND and INVALIDATE ioctls that Intel is working on,
> >>> and adding requests in pretty much the same format to virtio-iommu.
> >>>
> >>>> 2. Can you give some hints on how this is goi

Re: [RFC] virtio-iommu version 0.5

2017-10-25 Thread Linu Cherian
Hi Jean,

On Tue Oct 24, 2017 at 10:28:59PM +0530, Linu Cherian wrote:
> Hi Jean,
> Thanks for your reply.
> 
> On Tue Oct 24, 2017 at 09:37:12AM +0100, Jean-Philippe Brucker wrote:
> > Hi Linu,
> > 
> > On 24/10/17 07:27, Linu Cherian wrote:
> > > Hi Jean,
> > > 
> > > On Mon Oct 23, 2017 at 10:32:41AM +0100, Jean-Philippe Brucker wrote:
> > >> This is version 0.5 of the virtio-iommu specification, the 
> > >> paravirtualized
> > >> IOMMU. This version addresses feedback from v0.4 and adds an event 
> > >> virtqueue.
> > >> Please find the specification, LaTeX sources and pdf, at:
> > >> git://linux-arm.org/virtio-iommu.git viommu/v0.5
> > >> http://linux-arm.org/git?p=virtio-iommu.git;a=blob;f=dist/v0.5/virtio-iommu-v0.5.pdf
> > >>
> > >> A detailed changelog since v0.4 follows. You can find the pdf diff at:
> > >> http://linux-arm.org/git?p=virtio-iommu.git;a=blob;f=dist/diffs/virtio-iommu-pdf-diff-v0.4-v0.5.pdf
> > >>
> > >> * Add an event virtqueue for the device to report translation faults to
> > >>   the driver. For the moment only unrecoverable faults are available but
> > >>   future versions will extend it.
> > >> * Simplify PROBE request by removing the ack part, and flattening RESV
> > >>   properties.
> > >> * Rename "address space" to "domain". The change might seem futile but
> > >>   allows to introduce PASIDs and other features cleanly in the next
> > >>   versions. In the same vein, the few remaining "device" occurrences were
> > >>   replaced by "endpoint", to avoid any confusion with "the device"
> > >>   referring to the virtio device across the document.
> > >> * Add implementation notes for RESV_MEM properties.
> > >> * Update ACPI table definition.
> > >> * Fix typos and clarify a few things.
> > >>
> > >> I will publish the Linux driver for v0.5 shortly. Then for next versions
> > >> I'll focus on optimizations and adding support for hardware acceleration.
> > >>
> > >> Existing implementations are simple and can certainly be optimized, even
> > >> without architectural changes. But the architecture itself can also be
> > >> improved in a number of ways. Currently it is designed to work well with
> > >> VFIO. However, having explicit MAP requests is less efficient* than page
> > >> tables for emulated and PV endpoints, and the current architecture 
> > >> doesn't
> > >> address this. Binding page tables is an obvious way to improve throughput
> > >> in that case, but we can explore cleverer (and possibly simpler) ways to
> > >> do it.
> > >>
> > >> So first we'll work on getting the base device and driver merged, then
> > >> we'll analyze and compare several ideas for improving performance.
> > >>
> > >> Thanks,
> > >> Jean
> > >>
> > >> * I have yet to study this behaviour, and would be interested in any
> > >> prior art on the subject of analyzing devices DMA patterns (virtio and
> > >> others)
> > > 
> > > 
> > > From the spec,
> > > Under future extensions.
> > > 
> > > "Page Table Handover, to allow guests to manage their own page tables and 
> > > share them with the MMU"
> > > 
> > > Had few questions on this.
> > > 
> > > 1. Did you mean SVM support for vfio-pci devices attached to guest 
> > > processes here.
> > 
> > Yes, using the VFIO BIND and INVALIDATE ioctls that Intel is working on,
> > and adding requests in pretty much the same format to virtio-iommu.
> > 
> > > 2. Can you give some hints on how this is going to work , since 
> > > virtio-iommu guest kernel 
> > >driver need to create stage 1 page table as required by hardware which 
> > > is not the case now. 
> > >CMIIW. 
> > 
> > The virtio-iommu device advertises which PASID/page table format is
> > supported by the host (obtained via sysfs and communicated in the PROBE
> > request), then the guest binds page tables or PASID tables to a domain and
> > populates it. Binding page tables alone is easy because we already have
> > the required drivers in the guest (io-pgtable or arch/* for SVM) and code
> > in the host to manage PASID tables. But since the PASID table pointer is
> > translated by stage-2, it woul

Re: [RFC] virtio-iommu version 0.5

2017-10-24 Thread Linu Cherian
Hi Jean,
Thanks for your reply.

On Tue Oct 24, 2017 at 09:37:12AM +0100, Jean-Philippe Brucker wrote:
> Hi Linu,
> 
> On 24/10/17 07:27, Linu Cherian wrote:
> > Hi Jean,
> > 
> > On Mon Oct 23, 2017 at 10:32:41AM +0100, Jean-Philippe Brucker wrote:
> >> This is version 0.5 of the virtio-iommu specification, the paravirtualized
> >> IOMMU. This version addresses feedback from v0.4 and adds an event 
> >> virtqueue.
> >> Please find the specification, LaTeX sources and pdf, at:
> >> git://linux-arm.org/virtio-iommu.git viommu/v0.5
> >> http://linux-arm.org/git?p=virtio-iommu.git;a=blob;f=dist/v0.5/virtio-iommu-v0.5.pdf
> >>
> >> A detailed changelog since v0.4 follows. You can find the pdf diff at:
> >> http://linux-arm.org/git?p=virtio-iommu.git;a=blob;f=dist/diffs/virtio-iommu-pdf-diff-v0.4-v0.5.pdf
> >>
> >> * Add an event virtqueue for the device to report translation faults to
> >>   the driver. For the moment only unrecoverable faults are available but
> >>   future versions will extend it.
> >> * Simplify PROBE request by removing the ack part, and flattening RESV
> >>   properties.
> >> * Rename "address space" to "domain". The change might seem futile but
> >>   allows to introduce PASIDs and other features cleanly in the next
> >>   versions. In the same vein, the few remaining "device" occurrences were
> >>   replaced by "endpoint", to avoid any confusion with "the device"
> >>   referring to the virtio device across the document.
> >> * Add implementation notes for RESV_MEM properties.
> >> * Update ACPI table definition.
> >> * Fix typos and clarify a few things.
> >>
> >> I will publish the Linux driver for v0.5 shortly. Then for next versions
> >> I'll focus on optimizations and adding support for hardware acceleration.
> >>
> >> Existing implementations are simple and can certainly be optimized, even
> >> without architectural changes. But the architecture itself can also be
> >> improved in a number of ways. Currently it is designed to work well with
> >> VFIO. However, having explicit MAP requests is less efficient* than page
> >> tables for emulated and PV endpoints, and the current architecture doesn't
> >> address this. Binding page tables is an obvious way to improve throughput
> >> in that case, but we can explore cleverer (and possibly simpler) ways to
> >> do it.
> >>
> >> So first we'll work on getting the base device and driver merged, then
> >> we'll analyze and compare several ideas for improving performance.
> >>
> >> Thanks,
> >> Jean
> >>
> >> * I have yet to study this behaviour, and would be interested in any
> >> prior art on the subject of analyzing devices DMA patterns (virtio and
> >> others)
> > 
> > 
> > From the spec,
> > Under future extensions.
> > 
> > "Page Table Handover, to allow guests to manage their own page tables and 
> > share them with the MMU"
> > 
> > Had few questions on this.
> > 
> > 1. Did you mean SVM support for vfio-pci devices attached to guest 
> > processes here.
> 
> Yes, using the VFIO BIND and INVALIDATE ioctls that Intel is working on,
> and adding requests in pretty much the same format to virtio-iommu.
> 
> > 2. Can you give some hints on how this is going to work , since 
> > virtio-iommu guest kernel 
> >driver need to create stage 1 page table as required by hardware which 
> > is not the case now. 
> >CMIIW. 
> 
> The virtio-iommu device advertises which PASID/page table format is
> supported by the host (obtained via sysfs and communicated in the PROBE
> request), then the guest binds page tables or PASID tables to a domain and
> populates it. Binding page tables alone is easy because we already have
> the required drivers in the guest (io-pgtable or arch/* for SVM) and code
> in the host to manage PASID tables. But since the PASID table pointer is
> translated by stage-2, it would requires a little more work in the host
> for obtaining GPA buffers from the guest on demand.
  Is this for resolving PCI PRI requests ?. 
  IIUC, PCI PRI requests for devices owned by guest need to be resolved
  by guest itself.


 In addition the BIND
> ioctl is different from the one used by VT-d, so this solution didn't get
> much appreciation.

Could you please share the links on this ?

> 
> The alternative is to bind PASID tables. 

Sorry, i didnt get the difference here.

It requires to factor the guest
> PASID handling code into a library, which is difficult for SMMU. Luckily
> I'm still working on adding PASID code for SMMUv3, so extracting it out of
> the driver isn't a big overhead. The good thing about this solution is
> that it reuses any specification work done for VFIO (and vice versa) and
> any host driver change made for vSMMU/VT-d emulations.
> 
> Thanks,
> Jean

-- 
Linu cherian
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC] virtio-iommu version 0.5

2017-10-24 Thread Linu Cherian
Hi Jean,

On Mon Oct 23, 2017 at 10:32:41AM +0100, Jean-Philippe Brucker wrote:
> This is version 0.5 of the virtio-iommu specification, the paravirtualized
> IOMMU. This version addresses feedback from v0.4 and adds an event virtqueue.
> Please find the specification, LaTeX sources and pdf, at:
> git://linux-arm.org/virtio-iommu.git viommu/v0.5
> http://linux-arm.org/git?p=virtio-iommu.git;a=blob;f=dist/v0.5/virtio-iommu-v0.5.pdf
> 
> A detailed changelog since v0.4 follows. You can find the pdf diff at:
> http://linux-arm.org/git?p=virtio-iommu.git;a=blob;f=dist/diffs/virtio-iommu-pdf-diff-v0.4-v0.5.pdf
> 
> * Add an event virtqueue for the device to report translation faults to
>   the driver. For the moment only unrecoverable faults are available but
>   future versions will extend it.
> * Simplify PROBE request by removing the ack part, and flattening RESV
>   properties.
> * Rename "address space" to "domain". The change might seem futile but
>   allows to introduce PASIDs and other features cleanly in the next
>   versions. In the same vein, the few remaining "device" occurrences were
>   replaced by "endpoint", to avoid any confusion with "the device"
>   referring to the virtio device across the document.
> * Add implementation notes for RESV_MEM properties.
> * Update ACPI table definition.
> * Fix typos and clarify a few things.
> 
> I will publish the Linux driver for v0.5 shortly. Then for next versions
> I'll focus on optimizations and adding support for hardware acceleration.
> 
> Existing implementations are simple and can certainly be optimized, even
> without architectural changes. But the architecture itself can also be
> improved in a number of ways. Currently it is designed to work well with
> VFIO. However, having explicit MAP requests is less efficient* than page
> tables for emulated and PV endpoints, and the current architecture doesn't
> address this. Binding page tables is an obvious way to improve throughput
> in that case, but we can explore cleverer (and possibly simpler) ways to
> do it.
> 
> So first we'll work on getting the base device and driver merged, then
> we'll analyze and compare several ideas for improving performance.
> 
> Thanks,
> Jean
> 
> * I have yet to study this behaviour, and would be interested in any
> prior art on the subject of analyzing devices DMA patterns (virtio and
> others)


>From the spec,
Under future extensions.

"Page Table Handover, to allow guests to manage their own page tables and share 
them with the MMU"

Had few questions on this.

1. Did you mean SVM support for vfio-pci devices attached to guest processes 
here.

2. Can you give some hints on how this is going to work , since virtio-iommu 
guest kernel 
   driver need to create stage 1 page table as required by hardware which is 
not the case now. 
   CMIIW. 


-- 
Linu cherian
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 5/8] iommu/io-pgtable-arm: Support lockless operation

2017-06-27 Thread Linu Cherian
On Tue Jun 27, 2017 at 09:39:13AM +0100, Will Deacon wrote:
> On Tue, Jun 27, 2017 at 10:41:55AM +0530, Linu Cherian wrote:
> > On Fri Jun 23, 2017 at 05:04:05PM +0530, Linu Cherian wrote:
> > > On Fri Jun 23, 2017 at 11:35:25AM +0100, Robin Murphy wrote:
> > > > Note that on a coherent platform like ThunderX that's as good as just
> > > > deleting it, because you'll never execute the case below. However, on
> > > > reflection I think it can at least safely be downgraded to dma_wmb()
> > > > (i.e. DMB) rather than a full DSB - would you be able to test what
> > > > difference that makes?
> > > 
> > > The testing was done on Thunderx 1, which has a non coherent page table 
> > > walk.
> > > Yes, downgrading to dma_wmb() helps. With this change, performance is 
> > > back to v1.
> > > 
> > 
> > Should i send a patch for this ?
> 
> I already did it:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git/commit/?h=for-joerg/arm-smmu/updates=77f3445866c39d8866b31d8d9fa47c7c20938e05
> 
> Will
> 
> _______
> linux-arm-kernel mailing list
> linux-arm-ker...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

Thanks Will.

-- 
Linu cherian
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 5/8] iommu/io-pgtable-arm: Support lockless operation

2017-06-26 Thread Linu Cherian
On Fri Jun 23, 2017 at 05:04:05PM +0530, Linu Cherian wrote:
> On Fri Jun 23, 2017 at 11:35:25AM +0100, Robin Murphy wrote:
> > On 23/06/17 09:56, Linu Cherian wrote:
> > > On Fri Jun 23, 2017 at 11:23:26AM +0530, Linu Cherian wrote:
> > >>
> > >> Robin,
> > >> Was trying to understand the new changes. Had few questions on 
> > >> arm_lpae_install_table. 
> > >>
> > >> On Thu Jun 22, 2017 at 04:53:54PM +0100, Robin Murphy wrote:
> > >>> For parallel I/O with multiple concurrent threads servicing the same
> > >>> device (or devices, if several share a domain), serialising page table
> > >>> updates becomes a massive bottleneck. On reflection, though, we don't
> > >>> strictly need to do that - for valid IOMMU API usage, there are in fact
> > >>> only two races that we need to guard against: multiple map requests for
> > >>> different blocks within the same region, when the intermediate-level
> > >>> table for that region does not yet exist; and multiple unmaps of
> > >>> different parts of the same block entry. Both of those are fairly easily
> > >>> solved by using a cmpxchg to install the new table, such that if we then
> > >>> find that someone else's table got there first, we can simply free ours
> > >>> and continue.
> > >>>
> > >>> Make the requisite changes such that we can withstand being called
> > >>> without the caller maintaining a lock. In theory, this opens up a few
> > >>> corners in which wildly misbehaving callers making nonsensical
> > >>> overlapping requests might lead to crashes instead of just unpredictable
> > >>> results, but correct code really does not deserve to pay a significant
> > >>> performance cost for the sake of masking bugs in theoretical broken 
> > >>> code.
> > >>>
> > >>> Signed-off-by: Robin Murphy <robin.mur...@arm.com>
> > >>> ---
> > >>>
> > >>> v2:
> > >>>  - Fix barriers in install_table
> > >>>  - Make a few more PTE accesses with {READ,WRITE}_ONCE() just in case
> > >>>  - Minor cosmetics
> > >>>
> > >>>  drivers/iommu/io-pgtable-arm.c | 72 
> > >>> +-
> > >>>  1 file changed, 57 insertions(+), 15 deletions(-)
> > >>>
> > >>> diff --git a/drivers/iommu/io-pgtable-arm.c 
> > >>> b/drivers/iommu/io-pgtable-arm.c
> > >>> index 6334f51912ea..52700fa958c2 100644
> > >>> --- a/drivers/iommu/io-pgtable-arm.c
> > >>> +++ b/drivers/iommu/io-pgtable-arm.c
> > >>> @@ -20,6 +20,7 @@
> > >>>  
> > >>>  #define pr_fmt(fmt)"arm-lpae io-pgtable: " fmt
> > >>>  
> > >>> +#include 
> > >>>  #include 
> > >>>  #include 
> > >>>  #include 
> > >>> @@ -99,6 +100,8 @@
> > >>>  #define ARM_LPAE_PTE_ATTR_HI_MASK  (((arm_lpae_iopte)6) << 52)
> > >>>  #define ARM_LPAE_PTE_ATTR_MASK (ARM_LPAE_PTE_ATTR_LO_MASK |
> > >>> \
> > >>>  ARM_LPAE_PTE_ATTR_HI_MASK)
> > >>> +/* Software bit for solving coherency races */
> > >>> +#define ARM_LPAE_PTE_SW_SYNC   (((arm_lpae_iopte)1) << 55)
> > >>>  
> > >>>  /* Stage-1 PTE */
> > >>>  #define ARM_LPAE_PTE_AP_UNPRIV (((arm_lpae_iopte)1) << 6)
> > >>> @@ -249,15 +252,20 @@ static void __arm_lpae_free_pages(void *pages, 
> > >>> size_t size,
> > >>> free_pages_exact(pages, size);
> > >>>  }
> > >>>  
> > >>> +static void __arm_lpae_sync_pte(arm_lpae_iopte *ptep,
> > >>> +   struct io_pgtable_cfg *cfg)
> > >>> +{
> > >>> +   dma_sync_single_for_device(cfg->iommu_dev, 
> > >>> __arm_lpae_dma_addr(ptep),
> > >>> +  sizeof(*ptep), DMA_TO_DEVICE);
> > >>> +}
> > >>> +
> > >>>  static void __arm_lpae_set_pte(arm_lpae_iopte *ptep, arm_lpae_iopte 
> > >>> pte,
> > >>>struct io_pgtable_cfg *cfg)
> > >>>  {
> > >>> *ptep = pte;
&

Re: [PATCH v2 5/8] iommu/io-pgtable-arm: Support lockless operation

2017-06-23 Thread Linu Cherian
On Fri Jun 23, 2017 at 11:35:25AM +0100, Robin Murphy wrote:
> On 23/06/17 09:56, Linu Cherian wrote:
> > On Fri Jun 23, 2017 at 11:23:26AM +0530, Linu Cherian wrote:
> >>
> >> Robin,
> >> Was trying to understand the new changes. Had few questions on 
> >> arm_lpae_install_table. 
> >>
> >> On Thu Jun 22, 2017 at 04:53:54PM +0100, Robin Murphy wrote:
> >>> For parallel I/O with multiple concurrent threads servicing the same
> >>> device (or devices, if several share a domain), serialising page table
> >>> updates becomes a massive bottleneck. On reflection, though, we don't
> >>> strictly need to do that - for valid IOMMU API usage, there are in fact
> >>> only two races that we need to guard against: multiple map requests for
> >>> different blocks within the same region, when the intermediate-level
> >>> table for that region does not yet exist; and multiple unmaps of
> >>> different parts of the same block entry. Both of those are fairly easily
> >>> solved by using a cmpxchg to install the new table, such that if we then
> >>> find that someone else's table got there first, we can simply free ours
> >>> and continue.
> >>>
> >>> Make the requisite changes such that we can withstand being called
> >>> without the caller maintaining a lock. In theory, this opens up a few
> >>> corners in which wildly misbehaving callers making nonsensical
> >>> overlapping requests might lead to crashes instead of just unpredictable
> >>> results, but correct code really does not deserve to pay a significant
> >>> performance cost for the sake of masking bugs in theoretical broken code.
> >>>
> >>> Signed-off-by: Robin Murphy <robin.mur...@arm.com>
> >>> ---
> >>>
> >>> v2:
> >>>  - Fix barriers in install_table
> >>>  - Make a few more PTE accesses with {READ,WRITE}_ONCE() just in case
> >>>  - Minor cosmetics
> >>>
> >>>  drivers/iommu/io-pgtable-arm.c | 72 
> >>> +-
> >>>  1 file changed, 57 insertions(+), 15 deletions(-)
> >>>
> >>> diff --git a/drivers/iommu/io-pgtable-arm.c 
> >>> b/drivers/iommu/io-pgtable-arm.c
> >>> index 6334f51912ea..52700fa958c2 100644
> >>> --- a/drivers/iommu/io-pgtable-arm.c
> >>> +++ b/drivers/iommu/io-pgtable-arm.c
> >>> @@ -20,6 +20,7 @@
> >>>  
> >>>  #define pr_fmt(fmt)  "arm-lpae io-pgtable: " fmt
> >>>  
> >>> +#include 
> >>>  #include 
> >>>  #include 
> >>>  #include 
> >>> @@ -99,6 +100,8 @@
> >>>  #define ARM_LPAE_PTE_ATTR_HI_MASK(((arm_lpae_iopte)6) << 52)
> >>>  #define ARM_LPAE_PTE_ATTR_MASK   (ARM_LPAE_PTE_ATTR_LO_MASK |
> >>> \
> >>>ARM_LPAE_PTE_ATTR_HI_MASK)
> >>> +/* Software bit for solving coherency races */
> >>> +#define ARM_LPAE_PTE_SW_SYNC (((arm_lpae_iopte)1) << 55)
> >>>  
> >>>  /* Stage-1 PTE */
> >>>  #define ARM_LPAE_PTE_AP_UNPRIV   (((arm_lpae_iopte)1) << 6)
> >>> @@ -249,15 +252,20 @@ static void __arm_lpae_free_pages(void *pages, 
> >>> size_t size,
> >>>   free_pages_exact(pages, size);
> >>>  }
> >>>  
> >>> +static void __arm_lpae_sync_pte(arm_lpae_iopte *ptep,
> >>> + struct io_pgtable_cfg *cfg)
> >>> +{
> >>> + dma_sync_single_for_device(cfg->iommu_dev, __arm_lpae_dma_addr(ptep),
> >>> +sizeof(*ptep), DMA_TO_DEVICE);
> >>> +}
> >>> +
> >>>  static void __arm_lpae_set_pte(arm_lpae_iopte *ptep, arm_lpae_iopte pte,
> >>>  struct io_pgtable_cfg *cfg)
> >>>  {
> >>>   *ptep = pte;
> >>>  
> >>>   if (!(cfg->quirks & IO_PGTABLE_QUIRK_NO_DMA))
> >>> - dma_sync_single_for_device(cfg->iommu_dev,
> >>> -__arm_lpae_dma_addr(ptep),
> >>> -sizeof(pte), DMA_TO_DEVICE);
> >>> + __arm_lpae_sync_pte(ptep, cfg);
> >>>  }
> >>>  
> >>>  static int __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
> >>> @@ -314,16 +322,30 @@ static int

Re: [PATCH v2 5/8] iommu/io-pgtable-arm: Support lockless operation

2017-06-23 Thread Linu Cherian
On Fri Jun 23, 2017 at 11:23:26AM +0530, Linu Cherian wrote:
> 
> Robin,
> Was trying to understand the new changes. Had few questions on 
> arm_lpae_install_table. 
> 
> On Thu Jun 22, 2017 at 04:53:54PM +0100, Robin Murphy wrote:
> > For parallel I/O with multiple concurrent threads servicing the same
> > device (or devices, if several share a domain), serialising page table
> > updates becomes a massive bottleneck. On reflection, though, we don't
> > strictly need to do that - for valid IOMMU API usage, there are in fact
> > only two races that we need to guard against: multiple map requests for
> > different blocks within the same region, when the intermediate-level
> > table for that region does not yet exist; and multiple unmaps of
> > different parts of the same block entry. Both of those are fairly easily
> > solved by using a cmpxchg to install the new table, such that if we then
> > find that someone else's table got there first, we can simply free ours
> > and continue.
> > 
> > Make the requisite changes such that we can withstand being called
> > without the caller maintaining a lock. In theory, this opens up a few
> > corners in which wildly misbehaving callers making nonsensical
> > overlapping requests might lead to crashes instead of just unpredictable
> > results, but correct code really does not deserve to pay a significant
> > performance cost for the sake of masking bugs in theoretical broken code.
> > 
> > Signed-off-by: Robin Murphy <robin.mur...@arm.com>
> > ---
> > 
> > v2:
> >  - Fix barriers in install_table
> >  - Make a few more PTE accesses with {READ,WRITE}_ONCE() just in case
> >  - Minor cosmetics
> > 
> >  drivers/iommu/io-pgtable-arm.c | 72 
> > +-
> >  1 file changed, 57 insertions(+), 15 deletions(-)
> > 
> > diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
> > index 6334f51912ea..52700fa958c2 100644
> > --- a/drivers/iommu/io-pgtable-arm.c
> > +++ b/drivers/iommu/io-pgtable-arm.c
> > @@ -20,6 +20,7 @@
> >  
> >  #define pr_fmt(fmt)"arm-lpae io-pgtable: " fmt
> >  
> > +#include 
> >  #include 
> >  #include 
> >  #include 
> > @@ -99,6 +100,8 @@
> >  #define ARM_LPAE_PTE_ATTR_HI_MASK  (((arm_lpae_iopte)6) << 52)
> >  #define ARM_LPAE_PTE_ATTR_MASK (ARM_LPAE_PTE_ATTR_LO_MASK |
> > \
> >  ARM_LPAE_PTE_ATTR_HI_MASK)
> > +/* Software bit for solving coherency races */
> > +#define ARM_LPAE_PTE_SW_SYNC   (((arm_lpae_iopte)1) << 55)
> >  
> >  /* Stage-1 PTE */
> >  #define ARM_LPAE_PTE_AP_UNPRIV (((arm_lpae_iopte)1) << 6)
> > @@ -249,15 +252,20 @@ static void __arm_lpae_free_pages(void *pages, size_t 
> > size,
> > free_pages_exact(pages, size);
> >  }
> >  
> > +static void __arm_lpae_sync_pte(arm_lpae_iopte *ptep,
> > +   struct io_pgtable_cfg *cfg)
> > +{
> > +   dma_sync_single_for_device(cfg->iommu_dev, __arm_lpae_dma_addr(ptep),
> > +  sizeof(*ptep), DMA_TO_DEVICE);
> > +}
> > +
> >  static void __arm_lpae_set_pte(arm_lpae_iopte *ptep, arm_lpae_iopte pte,
> >struct io_pgtable_cfg *cfg)
> >  {
> > *ptep = pte;
> >  
> > if (!(cfg->quirks & IO_PGTABLE_QUIRK_NO_DMA))
> > -   dma_sync_single_for_device(cfg->iommu_dev,
> > -  __arm_lpae_dma_addr(ptep),
> > -  sizeof(pte), DMA_TO_DEVICE);
> > +   __arm_lpae_sync_pte(ptep, cfg);
> >  }
> >  
> >  static int __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
> > @@ -314,16 +322,30 @@ static int arm_lpae_init_pte(struct 
> > arm_lpae_io_pgtable *data,
> >  
> >  static arm_lpae_iopte arm_lpae_install_table(arm_lpae_iopte *table,
> >  arm_lpae_iopte *ptep,
> > +arm_lpae_iopte curr,
> >  struct io_pgtable_cfg *cfg)
> >  {
> > -   arm_lpae_iopte new;
> > +   arm_lpae_iopte old, new;
> >  
> > new = __pa(table) | ARM_LPAE_PTE_TYPE_TABLE;
> > if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_NS)
> > new |= ARM_LPAE_PTE_NSTABLE;
> >  
> > -   __arm_lpae_set_pte(ptep, new, cfg);
> > -   return new;
> > +   /* Ensure the table itself is visible

Re: [PATCH v2 5/8] iommu/io-pgtable-arm: Support lockless operation

2017-06-22 Thread Linu Cherian

Robin,
Was trying to understand the new changes. Had few questions on 
arm_lpae_install_table. 

On Thu Jun 22, 2017 at 04:53:54PM +0100, Robin Murphy wrote:
> For parallel I/O with multiple concurrent threads servicing the same
> device (or devices, if several share a domain), serialising page table
> updates becomes a massive bottleneck. On reflection, though, we don't
> strictly need to do that - for valid IOMMU API usage, there are in fact
> only two races that we need to guard against: multiple map requests for
> different blocks within the same region, when the intermediate-level
> table for that region does not yet exist; and multiple unmaps of
> different parts of the same block entry. Both of those are fairly easily
> solved by using a cmpxchg to install the new table, such that if we then
> find that someone else's table got there first, we can simply free ours
> and continue.
> 
> Make the requisite changes such that we can withstand being called
> without the caller maintaining a lock. In theory, this opens up a few
> corners in which wildly misbehaving callers making nonsensical
> overlapping requests might lead to crashes instead of just unpredictable
> results, but correct code really does not deserve to pay a significant
> performance cost for the sake of masking bugs in theoretical broken code.
> 
> Signed-off-by: Robin Murphy <robin.mur...@arm.com>
> ---
> 
> v2:
>  - Fix barriers in install_table
>  - Make a few more PTE accesses with {READ,WRITE}_ONCE() just in case
>  - Minor cosmetics
> 
>  drivers/iommu/io-pgtable-arm.c | 72 
> +-
>  1 file changed, 57 insertions(+), 15 deletions(-)
> 
> diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
> index 6334f51912ea..52700fa958c2 100644
> --- a/drivers/iommu/io-pgtable-arm.c
> +++ b/drivers/iommu/io-pgtable-arm.c
> @@ -20,6 +20,7 @@
>  
>  #define pr_fmt(fmt)  "arm-lpae io-pgtable: " fmt
>  
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -99,6 +100,8 @@
>  #define ARM_LPAE_PTE_ATTR_HI_MASK(((arm_lpae_iopte)6) << 52)
>  #define ARM_LPAE_PTE_ATTR_MASK   (ARM_LPAE_PTE_ATTR_LO_MASK |
> \
>ARM_LPAE_PTE_ATTR_HI_MASK)
> +/* Software bit for solving coherency races */
> +#define ARM_LPAE_PTE_SW_SYNC (((arm_lpae_iopte)1) << 55)
>  
>  /* Stage-1 PTE */
>  #define ARM_LPAE_PTE_AP_UNPRIV   (((arm_lpae_iopte)1) << 6)
> @@ -249,15 +252,20 @@ static void __arm_lpae_free_pages(void *pages, size_t 
> size,
>   free_pages_exact(pages, size);
>  }
>  
> +static void __arm_lpae_sync_pte(arm_lpae_iopte *ptep,
> + struct io_pgtable_cfg *cfg)
> +{
> + dma_sync_single_for_device(cfg->iommu_dev, __arm_lpae_dma_addr(ptep),
> +sizeof(*ptep), DMA_TO_DEVICE);
> +}
> +
>  static void __arm_lpae_set_pte(arm_lpae_iopte *ptep, arm_lpae_iopte pte,
>  struct io_pgtable_cfg *cfg)
>  {
>   *ptep = pte;
>  
>   if (!(cfg->quirks & IO_PGTABLE_QUIRK_NO_DMA))
> - dma_sync_single_for_device(cfg->iommu_dev,
> -__arm_lpae_dma_addr(ptep),
> -sizeof(pte), DMA_TO_DEVICE);
> + __arm_lpae_sync_pte(ptep, cfg);
>  }
>  
>  static int __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
> @@ -314,16 +322,30 @@ static int arm_lpae_init_pte(struct arm_lpae_io_pgtable 
> *data,
>  
>  static arm_lpae_iopte arm_lpae_install_table(arm_lpae_iopte *table,
>arm_lpae_iopte *ptep,
> +  arm_lpae_iopte curr,
>struct io_pgtable_cfg *cfg)
>  {
> - arm_lpae_iopte new;
> + arm_lpae_iopte old, new;
>  
>   new = __pa(table) | ARM_LPAE_PTE_TYPE_TABLE;
>   if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_NS)
>   new |= ARM_LPAE_PTE_NSTABLE;
>  
> - __arm_lpae_set_pte(ptep, new, cfg);
> - return new;
> + /* Ensure the table itself is visible before its PTE can be */
> + wmb();

Could you please give more hints on why this is required.


> +
> + old = cmpxchg64_relaxed(ptep, curr, new);
> +
> + if ((cfg->quirks & IO_PGTABLE_QUIRK_NO_DMA) ||
> + (old & ARM_LPAE_PTE_SW_SYNC))
> + return old;
> +
> + /* Even if it's not ours, there's no point waiting; just kick it */
> + __arm_lpae_sync_pte(ptep, cfg);
> + if (old == curr)
> + WRITE_ONCE(*ptep, new | ARM_LPAE_PTE_

Re: [v4 3/4] iommu/arm-smmu-v3: Add workaround for Cavium ThunderX2 erratum #74

2017-05-09 Thread Linu Cherian
> On Tue May 09, 2017 at 02:02:58PM +0100, Robin Murphy wrote:
> > On 09/05/17 12:45, Geetha sowjanya wrote:
> > > From: Linu Cherian <linu.cher...@cavium.com>
> > > 
> > > Cavium ThunderX2 SMMU implementation doesn't support page 1 register space
> > > and PAGE0_REGS_ONLY option is enabled as an errata workaround.
> > > This option when turned on, replaces all page 1 offsets used for
> > > EVTQ_PROD/CONS, PRIQ_PROD/CONS register access with page 0 offsets.
> > > 
> > > SMMU resource size checks are now based on SMMU option PAGE0_REGS_ONLY,
> > > since resource size can be either 64k/128k.
> > > For this, arm_smmu_device_dt_probe/acpi_probe has been moved before
> > > platform_get_resource call, so that SMMU options are set beforehand.
> > > 
> > > Signed-off-by: Linu Cherian <linu.cher...@cavium.com>
> > > Signed-off-by: Geetha Sowjanya <geethasowjanya.ak...@cavium.com>
> > > ---
> > >  Documentation/arm64/silicon-errata.txt |  1 +
> > >  .../devicetree/bindings/iommu/arm,smmu-v3.txt  |  6 ++
> > >  drivers/iommu/arm-smmu-v3.c| 80 
> > > --
> > >  3 files changed, 66 insertions(+), 21 deletions(-)
> > > 
> > > diff --git a/Documentation/arm64/silicon-errata.txt 
> > > b/Documentation/arm64/silicon-errata.txt
> > > index 10f2ddd..4693a32 100644
> > > --- a/Documentation/arm64/silicon-errata.txt
> > > +++ b/Documentation/arm64/silicon-errata.txt
> > > @@ -62,6 +62,7 @@ stable kernels.
> > >  | Cavium | ThunderX GICv3  | #23154  | 
> > > CAVIUM_ERRATUM_23154|
> > >  | Cavium | ThunderX Core   | #27456  | 
> > > CAVIUM_ERRATUM_27456|
> > >  | Cavium | ThunderX SMMUv2 | #27704  | N/A   
> > >   |
> > > +| Cavium | ThunderX2 SMMUv3| #74 | N/A   
> > >   |
> > >  || | |   
> > >   |
> > >  | Freescale/NXP  | LS2080A/LS1043A | A-008585| 
> > > FSL_ERRATUM_A008585 |
> > >  || | |   
> > >   |
> > > diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu-v3.txt 
> > > b/Documentation/devicetree/bindings/iommu/arm,smmu-v3.txt
> > > index be57550..e6da62b 100644
> > > --- a/Documentation/devicetree/bindings/iommu/arm,smmu-v3.txt
> > > +++ b/Documentation/devicetree/bindings/iommu/arm,smmu-v3.txt
> > > @@ -49,6 +49,12 @@ the PCIe specification.
> > >  - hisilicon,broken-prefetch-cmd
> > >  : Avoid sending CMD_PREFETCH_* commands to the SMMU.
> > >  
> > > +- cavium-cn99xx,broken-page1-regspace
> > > +: Replaces all page 1 offsets used for 
> > > EVTQ_PROD/CONS,
> > > + PRIQ_PROD/CONS register access 
> > > with page 0 offsets.
> > > + Set for Caviun ThunderX2 
> > > silicon that doesn't support
> > > + SMMU page1 register space.
> > > +
> > >  ** Example
> > >  
> > >  smmu@2b40 {
> > > diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
> > > index 380969a..1e986a0 100644
> > > --- a/drivers/iommu/arm-smmu-v3.c
> > > +++ b/drivers/iommu/arm-smmu-v3.c
> > > @@ -176,15 +176,15 @@
> > >  #define ARM_SMMU_CMDQ_CONS   0x9c
> > >  
> > >  #define ARM_SMMU_EVTQ_BASE   0xa0
> > > -#define ARM_SMMU_EVTQ_PROD   0x100a8
> > > -#define ARM_SMMU_EVTQ_CONS   0x100ac
> > > +#define ARM_SMMU_EVTQ_PROD(smmu) (page1_offset_adjust(0x100a8, smmu))
> > > +#define ARM_SMMU_EVTQ_CONS(smmu) (page1_offset_adjust(0x100ac, smmu))
> > 
> > Sorry, perhaps I should have communicated the rest of the idea more
> > explicitly - you now don't need to change these definitions...
> > 
> 
> Fine. 
> 
> 
> > >  #define ARM_SMMU_EVTQ_IRQ_CFG0   0xb0
> > >  #define ARM_SMMU_EVTQ_IRQ_CFG1   0xb8
> > >  #define ARM_SMMU_EVTQ_IRQ_CFG2   0xbc
> > >  
> > >  #define ARM_SMMU_PRIQ_BASE   0xc0
> > > -#define ARM_SMMU_PRIQ_PROD   0x100c8
> > > -#define ARM_S

Re: [v4 3/4] iommu/arm-smmu-v3: Add workaround for Cavium ThunderX2 erratum #74

2017-05-09 Thread Linu Cherian
On Tue May 09, 2017 at 02:02:58PM +0100, Robin Murphy wrote:
> On 09/05/17 12:45, Geetha sowjanya wrote:
> > From: Linu Cherian <linu.cher...@cavium.com>
> > 
> > Cavium ThunderX2 SMMU implementation doesn't support page 1 register space
> > and PAGE0_REGS_ONLY option is enabled as an errata workaround.
> > This option when turned on, replaces all page 1 offsets used for
> > EVTQ_PROD/CONS, PRIQ_PROD/CONS register access with page 0 offsets.
> > 
> > SMMU resource size checks are now based on SMMU option PAGE0_REGS_ONLY,
> > since resource size can be either 64k/128k.
> > For this, arm_smmu_device_dt_probe/acpi_probe has been moved before
> > platform_get_resource call, so that SMMU options are set beforehand.
> > 
> > Signed-off-by: Linu Cherian <linu.cher...@cavium.com>
> > Signed-off-by: Geetha Sowjanya <geethasowjanya.ak...@cavium.com>
> > ---
> >  Documentation/arm64/silicon-errata.txt |  1 +
> >  .../devicetree/bindings/iommu/arm,smmu-v3.txt  |  6 ++
> >  drivers/iommu/arm-smmu-v3.c| 80 
> > --
> >  3 files changed, 66 insertions(+), 21 deletions(-)
> > 
> > diff --git a/Documentation/arm64/silicon-errata.txt 
> > b/Documentation/arm64/silicon-errata.txt
> > index 10f2ddd..4693a32 100644
> > --- a/Documentation/arm64/silicon-errata.txt
> > +++ b/Documentation/arm64/silicon-errata.txt
> > @@ -62,6 +62,7 @@ stable kernels.
> >  | Cavium | ThunderX GICv3  | #23154  | 
> > CAVIUM_ERRATUM_23154|
> >  | Cavium | ThunderX Core   | #27456  | 
> > CAVIUM_ERRATUM_27456|
> >  | Cavium | ThunderX SMMUv2 | #27704  | N/A 
> > |
> > +| Cavium | ThunderX2 SMMUv3| #74 | N/A 
> > |
> >  || | | 
> > |
> >  | Freescale/NXP  | LS2080A/LS1043A | A-008585| FSL_ERRATUM_A008585 
> > |
> >  || | | 
> > |
> > diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu-v3.txt 
> > b/Documentation/devicetree/bindings/iommu/arm,smmu-v3.txt
> > index be57550..e6da62b 100644
> > --- a/Documentation/devicetree/bindings/iommu/arm,smmu-v3.txt
> > +++ b/Documentation/devicetree/bindings/iommu/arm,smmu-v3.txt
> > @@ -49,6 +49,12 @@ the PCIe specification.
> >  - hisilicon,broken-prefetch-cmd
> >  : Avoid sending CMD_PREFETCH_* commands to the SMMU.
> >  
> > +- cavium-cn99xx,broken-page1-regspace
> > +: Replaces all page 1 offsets used for EVTQ_PROD/CONS,
> > +   PRIQ_PROD/CONS register access 
> > with page 0 offsets.
> > +   Set for Caviun ThunderX2 
> > silicon that doesn't support
> > +   SMMU page1 register space.
> > +
> >  ** Example
> >  
> >  smmu@2b40 {
> > diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
> > index 380969a..1e986a0 100644
> > --- a/drivers/iommu/arm-smmu-v3.c
> > +++ b/drivers/iommu/arm-smmu-v3.c
> > @@ -176,15 +176,15 @@
> >  #define ARM_SMMU_CMDQ_CONS 0x9c
> >  
> >  #define ARM_SMMU_EVTQ_BASE 0xa0
> > -#define ARM_SMMU_EVTQ_PROD 0x100a8
> > -#define ARM_SMMU_EVTQ_CONS 0x100ac
> > +#define ARM_SMMU_EVTQ_PROD(smmu)   (page1_offset_adjust(0x100a8, smmu))
> > +#define ARM_SMMU_EVTQ_CONS(smmu)   (page1_offset_adjust(0x100ac, smmu))
> 
> Sorry, perhaps I should have communicated the rest of the idea more
> explicitly - you now don't need to change these definitions...
> 

Fine. 


> >  #define ARM_SMMU_EVTQ_IRQ_CFG0 0xb0
> >  #define ARM_SMMU_EVTQ_IRQ_CFG1 0xb8
> >  #define ARM_SMMU_EVTQ_IRQ_CFG2 0xbc
> >  
> >  #define ARM_SMMU_PRIQ_BASE 0xc0
> > -#define ARM_SMMU_PRIQ_PROD 0x100c8
> > -#define ARM_SMMU_PRIQ_CONS 0x100cc
> > +#define ARM_SMMU_PRIQ_PROD(smmu)   (page1_offset_adjust(0x100c8, smmu))
> > +#define ARM_SMMU_PRIQ_CONS(smmu)   (page1_offset_adjust(0x100cc, smmu))
> >  #define ARM_SMMU_PRIQ_IRQ_CFG0 0xd0
> >  #define ARM_SMMU_PRIQ_IRQ_CFG1 0xd8
> >  #define ARM_SMMU_PRIQ_IRQ_CFG2 0xdc
> > @@ -412,6 +412,9 @@
> >  #define MSI_IOVA_BASE  0

Re: [PATCH v3 0/7] Cavium ThunderX2 SMMUv3 errata workarounds

2017-05-08 Thread Linu Cherian
On Sat May 06, 2017 at 12:22:50AM +0200, Robert Richter wrote:
> On 05.05.17 17:38:04, Geetha sowjanya wrote:
> > From: Linu Cherian <linu.cher...@cavium.com>
> > 
> > Cavium ThunderX2 SMMUv3 implementation has two Silicon Erratas.
> > 1. Errata ID #74
> >SMMU register alias Page 1 is not implemented
> > 2. Errata ID #126
> >SMMU doesnt support unique IRQ lines and also MSI for gerror, 
> >eventq and cmdq-sync
> > 
> > The following patchset does software workaround for these two erratas.
> > 
> > This series is based on patchset. 
> > https://www.spinics.net/lists/arm-kernel/msg578443.html
> > 
> > Changes from v1:
> >  Since the use of MIDR register is rejected and SMMU_IIDR is broken on this 
> >  silicon, as suggested by Will Deacon modified the patches to use ThunderX2 
> >  SMMUv3 IORT model number to enable errata workaround.
> > 
> > Changes from v2:
> >  Updated "Documentation/devicetree/bindings/iommu/arm,smmu-v3.txt" document 
> > with 
> >  new SMMU option used to enable errata workaround.
> >  
> > Geetha Sowjanya (1):
> >   iommu/arm-smmu-v3: Add workaround for Cavium ThunderX2 erratum #126
> > 
> > Linu Cherian (6):
> >   iommu/arm-smmu-v3: Introduce smmu option PAGE0_REGS_ONLY for ThunderX2
> > errata#74.
> >   iommu/arm-smmu-v3: Do resource size checks based on SMMU option
> > PAGE0_REGS_ONLY
> >   ACPICA: IORT: Add Cavium ThunderX2 SMMUv3 model definition.
> >   iommu/arm-smmu-v3: For ACPI based device probing, set PAGE0_REGS_ONLY
> > option for ThunderX2 SMMUv3 implementations.
> >   ACPI/IORT: Fixup SMMUv3 resource size for Cavium ThunderX2 SMMUv3
> > model
> >   arm64: Documentation: Add Cavium ThunderX2 SMMUv3 erratas
> 
> This split into patches does not look reasonable to me. 1 patch only
> for each workaround should be sufficient.
> 

* Should we not atleast keep the changes in drivers/acpi/iort.c and
 include/acpi/actbl2.h seperate, since they are outside smmuv3 driver ? 

* Probably i can merge the below patches, 
  1. iommu/arm-smmu-v3: Introduce smmu option PAGE0_REGS_ONLY for ThunderX2 
errata#74.
  2. iommu/arm-smmu-v3: Do resource size checks based on SMMU option
 PAGE0_REGS_O

Is that fine ?

Thanks.
-- 
Linu cherian
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v3 2/7] iommu/arm-smmu-v3: Do resource size checks based on SMMU

2017-05-08 Thread Linu Cherian

On Mon May 08, 2017 at 12:09:32PM +0200, Robert Richter wrote:
> On 08.05.17 15:14:37, Linu Cherian wrote:
> > On Sat May 06, 2017 at 12:18:44AM +0200, Robert Richter wrote:
> > > On 05.05.17 17:38:06, Geetha sowjanya wrote:
> > > > From: Linu Cherian <linu.cher...@cavium.com>
> > > > 
> > > > With implementations supporting only page 0 register space,
> > > > resource size can be 64k as well and hence perform size checks
> > > > based on SMMU option PAGE0_REGS_ONLY.
> > > > 
> > > > For this, arm_smmu_device_dt_probe/acpi_probe has been moved before
> > > > platform_get_resource call, so that SMMU options are set beforehand.
> > > > 
> > > > Signed-off-by:  Linu Cherian <linu.cher...@cavium.com>
> > > > Signed-off-by:  Geetha Sowjanya <geethasowjanya.ak...@cavium.com>
> > > > ---
> > > >  drivers/iommu/arm-smmu-v3.c | 26 +-
> > > >  1 file changed, 17 insertions(+), 9 deletions(-)
> > > > 
> > > > diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
> > > > index 107b4a6..f027676 100644
> > > > --- a/drivers/iommu/arm-smmu-v3.c
> > > > +++ b/drivers/iommu/arm-smmu-v3.c
> > > > @@ -2672,6 +2672,14 @@ static int arm_smmu_device_dt_probe(struct 
> > > > platform_device *pdev,
> > > > return ret;
> > > >  }
> > > >  
> > > > +static unsigned long arm_smmu_resource_size(struct arm_smmu_device 
> > > > *smmu)
> > > > +{
> > > > +   if (ARM_SMMU_PAGE0_REGS_ONLY(smmu))
> > > > +   return SZ_64K;
> > > > +   else
> > > > +   return SZ_128K;
> > > > +}
> > > > +
> > > 
> > > I think this can be dropped. See below.
> > > 
> > > >  static int arm_smmu_device_probe(struct platform_device *pdev)
> > > >  {
> > > > int irq, ret;
> > > > @@ -2688,9 +2696,17 @@ static int arm_smmu_device_probe(struct 
> > > > platform_device *pdev)
> > > > }
> > > > smmu->dev = dev;
> > > >  
> > > > +   if (dev->of_node) {
> > > > +   ret = arm_smmu_device_dt_probe(pdev, smmu);
> > > > +   } else {
> > > > +   ret = arm_smmu_device_acpi_probe(pdev, smmu);
> > > > +   if (ret == -ENODEV)
> > > > +   return ret;
> > > > +   }
> > > > +
> > > > /* Base address */
> > > > res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
> > > > -   if (resource_size(res) + 1 < SZ_128K) {
> > > > +   if (resource_size(res) + 1 < arm_smmu_resource_size(smmu)) {
> > > > dev_err(dev, "MMIO region too small (%pr)\n", res);
> > > > return -EINVAL;
> > > > }
> > > 
> > > Why not just do the follwoing here:
> > > 
> > >   /* Base address */
> > >   res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
> > >   if (resource_size(res) + 1 < arm_smmu_resource_size(smmu)) {
> > >   dev_err(dev, "MMIO region too small (%pr)\n", res);
> > >   return -EINVAL;
> > >   }
> > >   ioaddr = res->start;
> > > 
> > > + /*
> > > +  * Override the size, for Cavium ThunderX2 implementation
> > > +  * which doesn't support the page 1 SMMU register space.
> > > +  */
> > > + if (smmu->options & ARM_SMMU_OPT_PAGE0_REGS_ONLY)
> > > + res->end = res->size + SZ_64K -1;
> > > +
> > >   smmu->base = devm_ioremap_resource(dev, res);
> > >   if (IS_ERR(smmu->base))
> > >   return PTR_ERR(smmu->base);
> > 
> > 
> > This might not work, since platform_device_add is being called from
> > iort.c before the res->end gets fixed up here. 
> 
> It should. You added it with 128k and you get it back with
> platform_get_resource(), but before ioremap you shrink the size to
> 64k.
> 

The smmu devices are located at 64k offsets and not at 128k
offsets and hence this would be result in resource conflict during
platform_add_device ?

Code snippet from platform_add_device:

for (i = 0; i < pdev->num_resources; i++) {
struct resource *p, *r = >resource[i];

if (r->name == NULL)
r->name = dev_name(>dev);

p = r->parent;
if (!p) {
if (resource_type(r) == IORESOURCE_MEM)
p = _resource;
else if (resource_type(r) == IORESOURCE_IO)
p = _resource;
}

if (p && insert_resource(p, r)) {
dev_err(>dev, "failed to claim resource %d: 
%pR\n", i, r);
ret = -EBUSY;
goto failed;
}
}







-- 
Linu cherian
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v3 2/7] iommu/arm-smmu-v3: Do resource size checks based on SMMU

2017-05-08 Thread Linu Cherian
On Sat May 06, 2017 at 12:18:44AM +0200, Robert Richter wrote:
> On 05.05.17 17:38:06, Geetha sowjanya wrote:
> > From: Linu Cherian <linu.cher...@cavium.com>
> > 
> > With implementations supporting only page 0 register space,
> > resource size can be 64k as well and hence perform size checks
> > based on SMMU option PAGE0_REGS_ONLY.
> > 
> > For this, arm_smmu_device_dt_probe/acpi_probe has been moved before
> > platform_get_resource call, so that SMMU options are set beforehand.
> > 
> > Signed-off-by:  Linu Cherian <linu.cher...@cavium.com>
> > Signed-off-by:  Geetha Sowjanya <geethasowjanya.ak...@cavium.com>
> > ---
> >  drivers/iommu/arm-smmu-v3.c | 26 +-
> >  1 file changed, 17 insertions(+), 9 deletions(-)
> > 
> > diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
> > index 107b4a6..f027676 100644
> > --- a/drivers/iommu/arm-smmu-v3.c
> > +++ b/drivers/iommu/arm-smmu-v3.c
> > @@ -2672,6 +2672,14 @@ static int arm_smmu_device_dt_probe(struct 
> > platform_device *pdev,
> > return ret;
> >  }
> >  
> > +static unsigned long arm_smmu_resource_size(struct arm_smmu_device *smmu)
> > +{
> > +   if (ARM_SMMU_PAGE0_REGS_ONLY(smmu))
> > +   return SZ_64K;
> > +   else
> > +   return SZ_128K;
> > +}
> > +
> 
> I think this can be dropped. See below.
> 
> >  static int arm_smmu_device_probe(struct platform_device *pdev)
> >  {
> > int irq, ret;
> > @@ -2688,9 +2696,17 @@ static int arm_smmu_device_probe(struct 
> > platform_device *pdev)
> > }
> > smmu->dev = dev;
> >  
> > +   if (dev->of_node) {
> > +   ret = arm_smmu_device_dt_probe(pdev, smmu);
> > +   } else {
> > +   ret = arm_smmu_device_acpi_probe(pdev, smmu);
> > +   if (ret == -ENODEV)
> > +   return ret;
> > +   }
> > +
> > /* Base address */
> > res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
> > -   if (resource_size(res) + 1 < SZ_128K) {
> > +   if (resource_size(res) + 1 < arm_smmu_resource_size(smmu)) {
> > dev_err(dev, "MMIO region too small (%pr)\n", res);
> > return -EINVAL;
> > }
> 
> Why not just do the follwoing here:
> 
>   /* Base address */
>   res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
>   if (resource_size(res) + 1 < arm_smmu_resource_size(smmu)) {
>   dev_err(dev, "MMIO region too small (%pr)\n", res);
>   return -EINVAL;
>   }
>   ioaddr = res->start;
> 
> + /*
> +  * Override the size, for Cavium ThunderX2 implementation
> +  * which doesn't support the page 1 SMMU register space.
> +  */
> + if (smmu->options & ARM_SMMU_OPT_PAGE0_REGS_ONLY)
> + res->end = res->size + SZ_64K -1;
> +
>   smmu->base = devm_ioremap_resource(dev, res);
>   if (IS_ERR(smmu->base))
>   return PTR_ERR(smmu->base);


This might not work, since platform_device_add is being called from
iort.c before the res->end gets fixed up here. 


-- 
Linu cherian
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v3 1/7] iommu/arm-smmu-v3: Introduce SMMU option PAGE0_REGS_ONLY for ThunderX2 errata #74

2017-05-08 Thread Linu Cherian
On Sat May 06, 2017 at 01:03:28AM +0200, Robert Richter wrote:
> On 05.05.17 17:38:05, Geetha sowjanya wrote:
> > From: Linu Cherian <linu.cher...@cavium.com>
> > 
> > Cavium ThunderX2 SMMU implementation doesn't support page 1 register space
> > and PAGE0_REGS_ONLY option will be enabled as an errata workaround.
> > 
> > This option when turned on, replaces all page 1 offsets used for
> > EVTQ_PROD/CONS, PRIQ_PROD/CONS register access with page 0 offsets.
> > 
> > Signed-off-by: Linu Cherian <linu.cher...@cavium.com>
> > Signed-off-by: Geetha Sowjanya <geethasowjanya.ak...@cavium.com>
> > ---
> >  .../devicetree/bindings/iommu/arm,smmu-v3.txt  |  6 +++
> >  drivers/iommu/arm-smmu-v3.c| 44 
> > --
> >  2 files changed, 38 insertions(+), 12 deletions(-)
> 
> > @@ -1995,8 +2011,10 @@ static int arm_smmu_init_queues(struct 
> > arm_smmu_device *smmu)
> > if (!(smmu->features & ARM_SMMU_FEAT_PRI))
> > return 0;
> >  
> > -   return arm_smmu_init_one_queue(smmu, >priq.q, ARM_SMMU_PRIQ_PROD,
> > -  ARM_SMMU_PRIQ_CONS, PRIQ_ENT_DWORDS);
> > +   return arm_smmu_init_one_queue(smmu, >priq.q,
> > +  ARM_SMMU_PRIQ_PROD(smmu),
> > +  ARM_SMMU_PRIQ_CONS(smmu),
> > +  PRIQ_ENT_DWORDS);
> 
> I would also suggest Robin's idea from the v1 review here. This works
> if we rework arm_smmu_init_one_queue() to pass addresses instead of
> offsets.
> 
> This would make these widespread offset calculations obsolete.
>

Have pasted here the relevant changes for doing fixups on smmu base instead
of offset to get feedback. 

This actually results in more lines of changes. If you think the below
approach is still better, will post a V4 of this series with this change.


+static inline unsigned long arm_smmu_page1_base(
+   struct arm_smmu_device *smmu)
+{
+   if (ARM_SMMU_PAGE0_REGS_ONLY(smmu))
+   return smmu->base;
+   else
+   return smmu->base + SZ_64K;
+}
+

@@ -1948,8 +1962,8 @@ static void arm_smmu_put_resv_regions(struct device *dev,
 /* Probing and initialisation functions */
 static int arm_smmu_init_one_queue(struct arm_smmu_device *smmu,
   struct arm_smmu_queue *q,
-  unsigned long prod_off,
-  unsigned long cons_off,
+  unsigned long prod_addr,
+  unsigned long cons_addr,
   size_t dwords)
 {
size_t qsz = ((1 << q->max_n_shift) * dwords) << 3;
@@ -1961,8 +1975,8 @@ static int arm_smmu_init_one_queue(struct arm_smmu_device 
*smmu,
return -ENOMEM;
}
 
-   q->prod_reg = smmu->base + prod_off;
-   q->cons_reg = smmu->base + cons_off;
+   q->prod_reg = prod_addr;
+   q->cons_reg = cons_addr;
q->ent_dwords   = dwords;
 
q->q_base  = Q_BASE_RWA;
@@ -1977,17 +1991,25 @@ static int arm_smmu_init_one_queue(struct 
arm_smmu_device *smmu,
 static int arm_smmu_init_queues(struct arm_smmu_device *smmu)
 {
int ret;
+   unsigned long page1_base, page0_base;
+
+   page0_base = smmu->base;
+   page1_base = arm_smmu_page1_base(smmu);
 
/* cmdq */
spin_lock_init(>cmdq.lock);
-   ret = arm_smmu_init_one_queue(smmu, >cmdq.q, ARM_SMMU_CMDQ_PROD,
- ARM_SMMU_CMDQ_CONS, CMDQ_ENT_DWORDS);
+   ret = arm_smmu_init_one_queue(smmu, >cmdq.q, 
+ page0_base + ARM_SMMU_CMDQ_PROD,
+ page0_base + ARM_SMMU_CMDQ_CONS, 
+ CMDQ_ENT_DWORDS);
if (ret)
return ret;
 
/* evtq */
-   ret = arm_smmu_init_one_queue(smmu, >evtq.q, ARM_SMMU_EVTQ_PROD,
- ARM_SMMU_EVTQ_CONS, EVTQ_ENT_DWORDS);
+   ret = arm_smmu_init_one_queue(smmu, >evtq.q,
+ page1_base + ARM_SMMU_EVTQ_PROD,
+ page1_base + ARM_SMMU_EVTQ_CONS,
+ EVTQ_ENT_DWORDS);
if (ret)
return ret;
 
@@ -1995,8 +2017,10 @@ static int arm_smmu_init_queues(struct arm_smmu_device 
*smmu)
if (!(smmu->features & ARM_SMMU_FEAT_PRI))
return 0;
 
-   return arm_smmu_init_one_queue(smmu, >priq.q, ARM_SMMU_PRIQ_PROD,
-  ARM_SMMU_PRIQ_CONS, PRIQ_ENT_DWORDS);
+   return arm_smmu_init_one_que

Re: [RFC PATCH 5/7] iommu/arm-smmu-v3: For ACPI based device probing, set relevant options for different SMMUv3 implementations.

2017-04-12 Thread Linu Cherian
>> + switch (model) {
>> + case ACPI_IORT_SMMU_V3:
>> + case ACPI_IORT_SMMU_CORELINK_MMU600:
>> + break;
>> + case ACPI_IORT_SMMU_V3_HISILICON:
>> + smmu->options |= ARM_SMMU_OPT_SKIP_PREFETCH;
>> + break;
>> + case ACPI_IORT_SMMU_V3_CAVIUM_CN99XX:
>> + smmu->options |= (ARM_SMMU_OPT_PAGE0_REGS_ONLY |
>> +   ARM_SMMU_OPT_USE_SHARED_IRQS);
>> + break;
>> + default:
>> + ret = -ENODEV;
>> + }
>> +
>> + return ret;
>> +}
>> +
>>  static int arm_smmu_device_acpi_probe(struct platform_device *pdev,
>> struct arm_smmu_device *smmu)
>>  {
>>   struct acpi_iort_smmu_v3 *iort_smmu;
>>   struct device *dev = smmu->dev;
>>   struct acpi_iort_node *node;
>> + int ret;
>>
>>   node = *(struct acpi_iort_node **)dev_get_platdata(dev);
>>
>>   /* Retrieve SMMUv3 specific data */
>>   iort_smmu = (struct acpi_iort_smmu_v3 *)node->node_data;
>>
>> + ret = acpi_smmu_get_options(iort_smmu->model, smmu);
>> + if (ret < 0)
>> + return ret;
>> +
>
> could we add at least:
>
> dev_notice(smmu->dev, "option mask 0x%x\n", smmu->options);
>
> We need a note to see which options have been enabled.
>

Yeah. Will add that.

Thanks.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC PATCH 1/7] iommu/arm-smmu-v3: Introduce smmu option PAGE0_REGS_ONLY for Silicon errata.

2017-04-11 Thread Linu Cherian
On Tue, Apr 11, 2017 at 9:12 PM, Robin Murphy <robin.mur...@arm.com> wrote:
> On 11/04/17 15:42, linucher...@gmail.com wrote:
>> From: Linu Cherian <linu.cher...@cavium.com>
>>
>> Cavium 99xx SMMU implementation doesn't support page 1 register space
>> and PAGE0_REGS_ONLY option will be enabled as an errata workaround.
>
> Ugh :(
>
>> This option when turned on, replaces all page 1 offsets used for
>> EVTQ_PROD/CONS, PRIQ_PROD/CONS register access with page 0 offsets.
>
> I think it might be neater to have something like:
>
> arm_smmu_page1(smmu) {
> if (smmu->quirk)
> return smmu->base;
> return smmu->base + 64k;
> }
>
> and use it as the base in the appropriate places, rather than override
> the individual registers. Much like ARM_SMMU_GR0_NS in the SMMUv2 driver.
>

IIUC, we need to change the offsets as well for this,
Like,
 #define ARM_SMMU_EVTQ_BASE 0xa0
-#define ARM_SMMU_EVTQ_PROD 0x100a8
-#define ARM_SMMU_EVTQ_CONS 0x100ac
+#define ARM_SMMU_EVTQ_PROD 0xa8
+#define ARM_SMMU_EVTQ_CONS 0xac
 #define ARM_SMMU_EVTQ_IRQ_CFG0 0xb0
 #define ARM_SMMU_EVTQ_IRQ_CFG1 0xb8
 #define ARM_SMMU_EVTQ_IRQ_CFG2 0xbc

 #define ARM_SMMU_PRIQ_BASE 0xc0
-#define ARM_SMMU_PRIQ_PROD 0x100c8
-#define ARM_SMMU_PRIQ_CONS 0x100cc
+#define ARM_SMMU_PRIQ_PROD 0xc8
+#define ARM_SMMU_PRIQ_CONS 0xcc


But, it appears difficult to take this approach, at least with
arm_smmu_init_one_queue function.
This function takes both page0 register offset and and page1 register
offset as an argument.
So, we might need to do additional checks in this function to decide,
whether to use
a pag0 base or page1 base.


Thanks.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu