Re: [RFC 2/2] drm/msm/gem: Make use of the system cache

2021-11-16 Thread Arnd Bergmann
On Wed, Nov 17, 2021 at 12:16 AM Georgi Djakov wrote:
>
> Instead of writing to WC cmdstream buffers that go all the way to the main
> memory, let's use the system cache to improve the performance.
>
> Signed-off-by: Georgi Djakov 
> ---
>  drivers/gpu/drm/msm/msm_gem.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/msm/msm_gem.c b/drivers/gpu/drm/msm/msm_gem.c
> index 104fdfc14027..921a1c24721e 100644
> --- a/drivers/gpu/drm/msm/msm_gem.c
> +++ b/drivers/gpu/drm/msm/msm_gem.c
> @@ -214,7 +214,7 @@ void msm_gem_put_pages(struct drm_gem_object *obj)
>  static pgprot_t msm_gem_pgprot(struct msm_gem_object *msm_obj, pgprot_t prot)
>  {
> if (msm_obj->flags & (MSM_BO_WC|MSM_BO_UNCACHED))
> -   return pgprot_writecombine(prot);
> +   return pgprot_syscached(prot);
> return prot;
>  }

Based on the definition in patch 1, doesn't this mean that 32-bit kernels
degrade from writecombined to uncached, making them a lot slower?

My feeling about this series is that there should be a clearer definition
of what exactly happens on systems with and without system cache.

 Arnd


Re: dma-direct fixes and cleanups v3

2021-11-16 Thread Christoph Hellwig
On Tue, Nov 16, 2021 at 11:31:49AM +, Robin Murphy wrote:
> On 2021-11-11 06:50, Christoph Hellwig wrote:
>> Hi all,
>>
>> Linus complained about the complex flow in dma_direct_alloc, so this
>> tries to simplify it a bit, and while I was at it I also made sure that
>> unencrypted pages never leak back into the page allocator.
>
> Before I forget, I've had a quick skim of the remaining patches and nothing 
> more stands out. Let me know if you'd like me to find time to check 
> everything over in detail again for a proper review, but otherwise I reckon 
> we may as well get this baking in -next sooner rather than later.

I'd rather wait for a proper review, I don't think we are in a rush
with any of these patches.


Re: [PATCH 01/11] iommu: Add device dma ownership set/release interfaces

2021-11-16 Thread Lu Baolu

Hi Jason,

On 11/16/21 9:46 PM, Jason Gunthorpe wrote:

On Tue, Nov 16, 2021 at 09:57:30AM +0800, Lu Baolu wrote:

Hi Christoph,

On 11/15/21 9:14 PM, Christoph Hellwig wrote:

On Mon, Nov 15, 2021 at 10:05:42AM +0800, Lu Baolu wrote:

+enum iommu_dma_owner {
+   DMA_OWNER_NONE,
+   DMA_OWNER_KERNEL,
+   DMA_OWNER_USER,
+};
+



+   enum iommu_dma_owner dma_owner;
+   refcount_t owner_cnt;
+   struct file *owner_user_file;


I'd just overload the ownership into owner_user_file,

   NULL -> no owner
   (struct file *)1UL -> kernel
   real pointer -> user

Which could simplify a lot of the code dealing with the owner.
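For illustration, a minimal sketch of that single-pointer encoding; the
struct layout and helper names here are assumptions, not code from the
series:

#define DMA_OWNER_KERNEL_MARKER	((struct file *)1UL)

static inline bool group_dma_owned_by_kernel(struct iommu_group *group)
{
	return group->owner_user_file == DMA_OWNER_KERNEL_MARKER;
}

static inline bool group_dma_owned_by_user(struct iommu_group *group)
{
	return group->owner_user_file &&
	       group->owner_user_file != DMA_OWNER_KERNEL_MARKER;
}

static inline bool group_dma_unowned(struct iommu_group *group)
{
	return !group->owner_user_file;
}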



Yeah! Sounds reasonable. I will make this in the next version.


It would be good to figure out how to make iommu_attach_device()
enforce no other driver binding as a kernel user without a file *, as
Robin pointed to, before optimizing this.

This fixes an existing bug where iommu_attach_device() only checks the
group size and is vulnerable to a hot plug increasing the group size
after it returns. That check should be replaced by this series's logic
instead.


As per my understanding, the essence of this problem is that only the
user owner of the iommu_group can attach an UNMANAGED domain to it.
If I understand it right, how about introducing a new interface that
allocates a user-managed domain and stores the user file pointer in it?

--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -94,6 +94,7 @@ struct iommu_domain {
void *handler_token;
struct iommu_domain_geometry geometry;
struct iommu_dma_cookie *iova_cookie;
+   struct file *owner_user_file;
 };
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1902,6 +1902,18 @@ struct iommu_domain *iommu_domain_alloc(struct bus_type *bus)

 }
 EXPORT_SYMBOL_GPL(iommu_domain_alloc);

+struct iommu_domain *iommu_domain_alloc_user(struct bus_type *bus,
+struct file *filep)
+{
+   struct iommu_domain *domain;
+
+   domain = __iommu_domain_alloc(bus, IOMMU_DOMAIN_UNMANAGED);
+   if (domain)
+   domain->owner_user_file = filep;
+
+   return domain;
+}

When attaching a domain to a user-owned iommu_group, both the group and
the domain should have matching user fds.

Does the above help here?

Best regards,
baolu


Re: [RFC PATCH 0/3] Use pageblock_order for cma and alloc_contig_range alignment.

2021-11-16 Thread Zi Yan via iommu
On 16 Nov 2021, at 3:58, David Hildenbrand wrote:

> On 15.11.21 20:37, Zi Yan wrote:
>> From: Zi Yan 
>>
>> Hi David,
>
> Hi,
>
> thanks for looking into this.
>
>>
>> You suggested to make alloc_contig_range() deal with pageblock_order
>> instead of MAX_ORDER - 1 and get rid of the MAX_ORDER - 1 dependency in
>> virtio_mem [1]. This patchset is my attempt to achieve that. Please take
>> a look and let me know if I am doing it correctly or not.
>>
>> From my understanding, cma required an alignment of
>> max(MAX_ORDER - 1, pageblock_order) because, when MIGRATE_CMA was
>> introduced, __free_one_page() did not prevent merging two different
>> pageblocks when MAX_ORDER - 1 > pageblock_order. But the current
>> __free_one_page() implementation does prevent that. It should be OK to
>> just align cma to pageblock_order. alloc_contig_range() relies on
>> MIGRATE_CMA to get free pages, so it can use pageblock_order as its
>> alignment too.
>
> I wonder if that's sufficient. Especially the outer_start logic in
> alloc_contig_range() might be problematic. There are some ugly corner
> cases with free pages/allocations spanning multiple pageblocks and we
> only isolated a single pageblock.

Thank you a lot for writing the list of these corner cases. They are
very helpful!

>
>
> Regarding CMA, we have to keep the following cases working:
>
> a) Different pageblock types (MIGRATE_CMA and !MIGRATE_CMA) in MAX_ORDER
>- 1 page:
>[   MAX_ORDER - 1 ]
>[ pageblock 0 | pageblock 1]
>
> Assume either pageblock 0 is MIGRATE_CMA or pageblock 1 is MIGRATE_CMA,
> but not both. We have to make sure alloc_contig_range() keeps working
> correctly. This should be the case even with your change, as we won't
> merge pages across differing migratetypes.

Yes.

>
> b) Migrating/freeing a MAX_ORDER - 1 page while partially isolated:
>[   MAX_ORDER - 1 ]
>[ pageblock 0 | pageblock 1]
>
> Assume both are MIGRATE_CMA. Assume we want to either allocate from
> pageblock 0 or pageblock 1. Especially, assume we want to allocate from
> pageblock 1. While we would isolate pageblock 1, we wouldn't isolate
> pageblock 0.
>
> What happens if we either have a free page spanning the MAX_ORDER - 1
> range already OR if we have to migrate a MAX_ORDER - 1 page, resulting
> in a free MAX_ORDER - 1 page of which only the second pageblock is
> isolated? We would end up essentially freeing a page that has mixed
> pageblocks, essentially placing it in !MIGRATE_ISOLATE free lists ... I
> might be wrong but I have the feeling that this would be problematic.
>

This could happen when start_isolate_page_range() stumbles upon a compound
page with order >= pageblock_order or a free page with order >=
pageblock_order, but it should not. start_isolate_page_range() should check
the actual page size, either the compound page size or the free page size,
and set the migratetype across pageblocks if the page is bigger than the
pageblock size. More precisely, set_migratetype_isolate() should do that.
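A rough sketch of that check (hypothetical helper, not actual mm code;
assumes the caller holds the zone lock so the order reads are stable):

static void isolate_whole_page(struct page *page)
{
	unsigned int order = 0;
	unsigned long pfn, end_pfn;

	if (PageBuddy(page))
		order = buddy_order(page);	/* free page in the buddy allocator */
	else if (PageCompound(page))
		order = compound_order(page);	/* e.g. THP or hugetlb */

	if (order <= pageblock_order)
		return;	/* fits in one pageblock, nothing extra to do */

	/* The page spans several pageblocks: isolate all of them, not one */
	pfn = page_to_pfn(page) & ~((1UL << order) - 1);
	end_pfn = pfn + (1UL << order);
	for (; pfn < end_pfn; pfn += pageblock_nr_pages)
		set_pageblock_migratetype(pfn_to_page(pfn), MIGRATE_ISOLATE);
}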


> c) Concurrent allocations:
> [   MAX_ORDER - 1 ]
> [ pageblock 0 | pageblock 1]
>
> Assume b) but we have two concurrent CMA allocations to pageblock 0 and
> pageblock 1, which would now be possible as start_isolate_page_range()
> isolate would succeed on both.

Two isolations will be serialized by the zone lock taken by
set_migratetype_isolate(), so the concurrent allocation would not be a
problem. If it is a MAX_ORDER-1 free page, the first comer should split it
and only isolate one of the pageblocks, then the second one can isolate the
other pageblock. If it is a MAX_ORDER-1 compound page, the first comer
should isolate both pageblocks, and then the second one would fail. WDYT?


In sum, it seems to me that the issue is that the page isolation code only
sees pageblocks without checking the actual page. When multiple pageblocks
belong to one page, the problem appears. This should be fixed.

>
>
> Regarding virtio-mem, we care about the following cases:
>
> a) Allocating parts from completely movable MAX_ORDER - 1 page:
>[   MAX_ORDER - 1 ]
>[ pageblock 0 | pageblock 1]
>
> Assume pageblock 0 and pageblock 1 are either free or contain only
> movable pages. Assume we allocated pageblock 0. We have to make sure we
> can allocate pageblock 1. The other way around, assume we allocated
> pageblock 1, we have to make sure we can allocate pageblock 0.
>
> Free pages spanning both pageblocks might be problematic.

Can you elaborate a bit? If either of pageblock 0 and 1 is used by
virtio-mem, why do we care about the other? If pageblock 0 and 1 belong to
the same page (either free or compound), they should have the same
migratetype. If we want to just allocate one of them, we can split
the free page, or migrate the compound page and then split the remaining
free page.

>
> b) Allocate parts of partially movable MAX_ORDER - 1 page:
>[   MAX_ORDER - 1 ]
>[ pageblock 0 | pageblock 1]
>
> Assume pageblock 0 contains unmovable data but 

[RFC 1/2] arm64: Add support for system cache memory type

2021-11-16 Thread Georgi Djakov
From: "Isaac J. Manjarres" 

Non-coherent devices on systems that support a system or
last level cache may want to request that allocations be
cached in the system cache. For memory that is allocated
by the kernel, and used for DMA with devices, the memory
attributes used for CPU access should match the memory
attributes that will be used for device access.

The memory attributes that need to be programmed into
the MAIR for system cache usage are:

0xf4 - Normal memory, outer write back read/write allocate,
inner non-cacheable.

There is currently no support for this memory attribute for
CPU mappings, so add it.
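For readers unfamiliar with the MAIR encoding, the 0xf4 byte decomposes
as follows (comment-only sketch based on the Armv8 MAIR_ELx attribute
format):

/*
 * MAIR attribute byte 0xf4:
 *
 *   Attr[7:4] = 0b1111 : Normal memory, Outer Write-Back Non-transient,
 *                        Outer Read-Allocate, Outer Write-Allocate
 *   Attr[3:0] = 0b0100 : Normal memory, Inner Non-cacheable
 *
 * i.e. lines are allocated in the outer (system/last level) cache while
 * bypassing the CPU's inner caches.
 */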

Signed-off-by: Isaac J. Manjarres 
Signed-off-by: Georgi Djakov 
---
 arch/arm64/include/asm/memory.h  | 1 +
 arch/arm64/include/asm/pgtable.h | 9 +
 arch/arm64/include/asm/sysreg.h  | 1 +
 arch/arm64/mm/proc.S | 3 ++-
 include/linux/dma-map-ops.h  | 8 
 5 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
index 0af70d9abede..22553aab67a4 100644
--- a/arch/arm64/include/asm/memory.h
+++ b/arch/arm64/include/asm/memory.h
@@ -134,6 +134,7 @@
 #define MT_NORMAL_NC   2
 #define MT_DEVICE_nGnRnE   3
 #define MT_DEVICE_nGnRE4
+#define MT_NORMAL_iNC_oWB  5
 
 /*
  * Memory types for Stage-2 translation
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index c4ba047a82d2..681c294c364e 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -524,6 +524,15 @@ static inline pmd_t pmd_mkdevmap(pmd_t pmd)
__pgprot_modify(prot, PTE_ATTRINDX_MASK, \
PTE_ATTRINDX(MT_NORMAL_NC) | PTE_PXN | PTE_UXN)
 
+/*
+ * Mark the prot value as outer cacheable and inner non-cacheable. Non-coherent
+ * devices on a system with support for a system or last level cache use these
+ * attributes to cache allocations in the system cache.
+ */
+#define pgprot_syscached(prot) \
+   __pgprot_modify(prot, PTE_ATTRINDX_MASK, \
+   PTE_ATTRINDX(MT_NORMAL_iNC_oWB) | PTE_PXN | PTE_UXN)
+
 #define __HAVE_PHYS_MEM_ACCESS_PROT
 struct file;
 extern pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index 16b3f1a1d468..7c50b1840532 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -715,6 +715,7 @@
 #define MAIR_ATTR_NORMAL_TAGGED   UL(0xf0)
 #define MAIR_ATTR_NORMAL   UL(0xff)
 #define MAIR_ATTR_MASK UL(0xff)
+#define MAIR_ATTR_NORMAL_iNC_oWB   UL(0xf4)
 
 /* Position the attr at the correct index */
 #define MAIR_ATTRIDX(attr, idx)((attr) << ((idx) * 8))
diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
index d35c90d2e47a..8a75973e5148 100644
--- a/arch/arm64/mm/proc.S
+++ b/arch/arm64/mm/proc.S
@@ -64,7 +64,8 @@
 MAIR_ATTRIDX(MAIR_ATTR_DEVICE_nGnRE, MT_DEVICE_nGnRE) |\
 MAIR_ATTRIDX(MAIR_ATTR_NORMAL_NC, MT_NORMAL_NC) |  \
 MAIR_ATTRIDX(MAIR_ATTR_NORMAL, MT_NORMAL) |\
-MAIR_ATTRIDX(MAIR_ATTR_NORMAL, MT_NORMAL_TAGGED))
+MAIR_ATTRIDX(MAIR_ATTR_NORMAL, MT_NORMAL_TAGGED) | \
+MAIR_ATTRIDX(MAIR_ATTR_NORMAL_iNC_oWB, MT_NORMAL_iNC_oWB))
 
 #ifdef CONFIG_CPU_PM
 /**
diff --git a/include/linux/dma-map-ops.h b/include/linux/dma-map-ops.h
index 0d5b06b3a4a6..1f7d75201577 100644
--- a/include/linux/dma-map-ops.h
+++ b/include/linux/dma-map-ops.h
@@ -277,6 +277,14 @@ void arch_dma_free(struct device *dev, size_t size, void *cpu_addr,
 #define pgprot_dmacoherent(prot)   pgprot_noncached(prot)
 #endif
 
+/*
+ * If there is no system cache pgprot, then fallback to dmacoherent
+ * pgprot, as the expectation is that the device is not coherent.
+ */
+#ifndef pgprot_syscached
+#define pgprot_syscached(prot) pgprot_dmacoherent(prot)
+#endif
+
 pgprot_t dma_pgprot(struct device *dev, pgprot_t prot, unsigned long attrs);
 #else
 static inline pgprot_t dma_pgprot(struct device *dev, pgprot_t prot,


[RFC 2/2] drm/msm/gem: Make use of the system cache

2021-11-16 Thread Georgi Djakov
Instead of writing to WC cmdstream buffers that go all the way to the main
memory, let's use the system cache to improve the performance.

Signed-off-by: Georgi Djakov 
---
 drivers/gpu/drm/msm/msm_gem.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/msm/msm_gem.c b/drivers/gpu/drm/msm/msm_gem.c
index 104fdfc14027..921a1c24721e 100644
--- a/drivers/gpu/drm/msm/msm_gem.c
+++ b/drivers/gpu/drm/msm/msm_gem.c
@@ -214,7 +214,7 @@ void msm_gem_put_pages(struct drm_gem_object *obj)
 static pgprot_t msm_gem_pgprot(struct msm_gem_object *msm_obj, pgprot_t prot)
 {
if (msm_obj->flags & (MSM_BO_WC|MSM_BO_UNCACHED))
-   return pgprot_writecombine(prot);
+   return pgprot_syscached(prot);
return prot;
 }
 


Re: [PATCH 04/11] PCI: portdrv: Suppress kernel DMA ownership auto-claiming

2021-11-16 Thread Jason Gunthorpe via iommu
On Tue, Nov 16, 2021 at 02:22:01PM -0600, Bjorn Helgaas wrote:
> On Tue, Nov 16, 2021 at 03:24:29PM +0800, Lu Baolu wrote:
> > On 2021/11/16 4:44, Bjorn Helgaas wrote:
> > > On Mon, Nov 15, 2021 at 10:05:45AM +0800, Lu Baolu wrote:
> > > > IOMMU grouping on PCI necessitates that if we lack isolation on a bridge
> > > > then all of the downstream devices will be part of the same IOMMU group
> > > > as the bridge.
> > > 
> > > I think this means something like: "If a PCIe Switch Downstream Port
> > > lacks , all downstream devices
> > > will be part of the same IOMMU group as the switch," right?
> > 
> > For this patch, yes.
> > 
> > > If so, can you fill in the details to make it specific and concrete?
> > 
> > The existing vfio implementation allows a kernel driver to bind to a
> > PCI bridge while its downstream devices are assigned to user space,
> > even though the bridge lacks ACS-like isolation.
> > 
> > drivers/vfio/vfio.c:
> >  540 static bool vfio_dev_driver_allowed(struct device *dev,
> >  541 struct device_driver *drv)
> >  542 {
> >  543 if (dev_is_pci(dev)) {
> >  544 struct pci_dev *pdev = to_pci_dev(dev);
> >  545
> >  546 if (pdev->hdr_type != PCI_HEADER_TYPE_NORMAL)
> >  547 return true;
> >  548 }
> > 
> > We are moving the group viability check to IOMMU core, and trying to
> > make it compatible with the current vfio policy. We saw three types of
> > bridges:
> > 
> > #1) PCIe/PCI-to-PCI bridges
> > These bridges are configured in the PCI framework, there's no
> > dedicated driver for such devices.
> > 
> > #2) Generic PCIe switch downstream port
> > The port driver doesn't map or access any MMIO in the PCI BARs.
> > The iommu group remains viable for the user even when this driver is bound.
> > 
> > #3) Hot Plug Controller
> > The controller driver maps and accesses the device MMIO. The iommu
> > group is not viable for the user with this driver bound to its device.
> 
> I *guess* the question here is whether the bridge can or will do DMA?
> I think that's orthogonal to the question of whether it implements
> BARs, so I'm not sure why the MMIO BARs are part of this discussion.
> I assume it's theoretically possible for a driver to use registers in
> config space to program a device to do DMA, even if the device has no
> BARs.

There are two questions Lu is trying to get at:

 1) Does the bridge driver use DMA? Calling pci_set_master() or
a dma_map_* API is a sure indication that the driver is doing DMA.

Kernel DMA doesn't work if the PCI device is attached to a
non-default iommu domain.

 2) If the bridge driver uses MMIO, is it tolerant to hostile
userspace also touching the same MMIO registers via P2P DMA
attacks?

Conservatively, if the driver maps an MMIO region at all
we can say it fails this test.

Unless someone wants to do the audit work, identifying MMIO usage alone
is sufficient to disqualify a driver.
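As a sketch of the policy being described (the enum and helper are
illustrative names, not the series' actual API):

enum bridge_driver_class {
	BRIDGE_DRV_NO_DMA,	/* no DMA, no MMIO: group stays user-viable */
	BRIDGE_DRV_KERNEL_DMA,	/* does kernel DMA: claims kernel ownership */
	BRIDGE_DRV_UNSAFE,	/* maps MMIO: group not viable for userspace */
};

static enum bridge_driver_class classify_bridge_driver(bool does_dma,
						       bool maps_mmio)
{
	/* Any MMIO mapping conservatively fails test (2) above */
	if (maps_mmio)
		return BRIDGE_DRV_UNSAFE;
	/* pci_set_master()/dma_map_* usage answers test (1) */
	if (does_dma)
		return BRIDGE_DRV_KERNEL_DMA;
	return BRIDGE_DRV_NO_DMA;
}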

Jason


Re: [PATCH 04/11] PCI: portdrv: Suppress kernel DMA ownership auto-claiming

2021-11-16 Thread Bjorn Helgaas
On Tue, Nov 16, 2021 at 03:24:29PM +0800, Lu Baolu wrote:
> On 2021/11/16 4:44, Bjorn Helgaas wrote:
> > On Mon, Nov 15, 2021 at 10:05:45AM +0800, Lu Baolu wrote:
> > > IOMMU grouping on PCI necessitates that if we lack isolation on a bridge
> > > then all of the downstream devices will be part of the same IOMMU group
> > > as the bridge.
> > 
> > I think this means something like: "If a PCIe Switch Downstream Port
> > lacks , all downstream devices
> > will be part of the same IOMMU group as the switch," right?
> 
> For this patch, yes.
> 
> > If so, can you fill in the details to make it specific and concrete?
> 
> The existing vfio implementation allows a kernel driver to bind to a
> PCI bridge while its downstream devices are assigned to user space,
> even though the bridge lacks ACS-like isolation.
> 
> drivers/vfio/vfio.c:
>  540 static bool vfio_dev_driver_allowed(struct device *dev,
>  541 struct device_driver *drv)
>  542 {
>  543 if (dev_is_pci(dev)) {
>  544 struct pci_dev *pdev = to_pci_dev(dev);
>  545
>  546 if (pdev->hdr_type != PCI_HEADER_TYPE_NORMAL)
>  547 return true;
>  548 }
> 
> We are moving the group viability check to IOMMU core, and trying to
> make it compatible with the current vfio policy. We saw three types of
> bridges:
> 
> #1) PCIe/PCI-to-PCI bridges
> These bridges are configured in the PCI framework, there's no
> dedicated driver for such devices.
> 
> #2) Generic PCIe switch downstream port
> The port driver doesn't map or access any MMIO in the PCI BARs.
> The iommu group remains viable for the user even when this driver is bound.
> 
> #3) Hot Plug Controller
> The controller driver maps and accesses the device MMIO. The iommu
> group is not viable for the user with this driver bound to its device.

I *guess* the question here is whether the bridge can or will do DMA?

I think that's orthogonal to the question of whether it implements
BARs, so I'm not sure why the MMIO BARs are part of this discussion.
I assume it's theoretically possible for a driver to use registers in
config space to program a device to do DMA, even if the device has no
BARs.

Bjorn


Re: [PATCH 0/2] perf/smmuv3: Support devicetree

2021-11-16 Thread Jean-Philippe Brucker
On Tue, Nov 16, 2021 at 05:00:14PM +, Robin Murphy wrote:
> On 2021-11-16 15:42, Jean-Philippe Brucker wrote:
> > On Tue, Nov 16, 2021 at 12:02:47PM +, Robin Murphy wrote:
> > > On 2021-11-16 11:35, Jean-Philippe Brucker wrote:
> > > > Add devicetree binding for the SMMUv3 PMU, called Performance Monitoring
> > > > Counter Group (PMCG) in the spec. Each SMMUv3 implementation can have
> > > > multiple independent PMCGs, for example one for the Translation Control
> > > > Unit (TCU) and one per Translation Buffer Unit (TBU).
> > > > 
> > > > I previously sent the binding as reply to Jay Chen's thread implementing
> > > > device tree support [1]. This posting addresses the comments from that
> > > > thread.
> > > 
> > > Ha, I'd also resurrected this and was planning to post it at some
> > > point this week [0] - you should have said :)
> > 
> > Ah sorry about that, I just resent because there was some demand for it at
> > Linaro
> 
> Heh, no worries - it's not like you were even CC'ed on the thread where I
> only mentioned I *might* do it.
> 
> Can I get away with being cheeky and just saying that my review comments are
> the diff between my branch and yours, I wonder...

Sure, that works for me, I'll send a v2 this week or so

Thanks,
Jean


Re: [PATCH 0/2] perf/smmuv3: Support devicetree

2021-11-16 Thread Robin Murphy

On 2021-11-16 15:42, Jean-Philippe Brucker wrote:

On Tue, Nov 16, 2021 at 12:02:47PM +, Robin Murphy wrote:

On 2021-11-16 11:35, Jean-Philippe Brucker wrote:

Add devicetree binding for the SMMUv3 PMU, called Performance Monitoring
Counter Group (PMCG) in the spec. Each SMMUv3 implementation can have
multiple independent PMCGs, for example one for the Translation Control
Unit (TCU) and one per Translation Buffer Unit (TBU).

I previously sent the binding as reply to Jay Chen's thread implementing
device tree support [1]. This posting addresses the comments from that
thread.


Ha, I'd also resurrected this and was planning to post it at some point this
week[0] - you should have said :)


Ah sorry about that, I just resent because there was some demand for it at
Linaro


Heh, no worries - it's not like you were even CC'ed on the thread where 
I only mentioned I *might* do it.


Can I get away with being cheeky and just saying that my review comments 
are the diff between my branch and yours, I wonder...



Patch 1 adds two compatible strings. "arm,smmu-v3-pmcg" is common to all
PMCGs. "hisilicon,smmu-v3-pmcg-hip08" allows to support the same quirk
as IORT for that implementation (see patch 2). We'll probably want to
also introduce compatible strings for each implementation that has
additional perf events. For example the MMU-600 implementation has
different events for TCU and TBU PMCGs [2], but both components have the
same device IDs. So the driver could differentiate them if they had two
distinct compatible strings such as "arm,mmu-600-pmcg-tbu" and
"arm,mmu-600-pmcg-tcu".


Actually it only needs a general MMU-600 compatible, since once you know
it's an Arm Ltd. implementation, you can assume the pattern for the IMP_DEF
ID registers to figure out the rest.


It might be an error in the MMU-600 spec specifically, both TBU and TCU
PMU registers have a 0x83 PIDR0, where I think the TBU should be 0x84 (the
revC model uses that value). It's possible that the implementation
actually has 0x84 instead.


Yup, it's a mistake in the TRM. I just checked a real MMU-600 and the 
PMU PIDRs match the main TCU/TBU PIDRs as expected. At least the MMU-700 
docs haven't repeated the same error.
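A hedged sketch of what probing the IMP_DEF ID registers could look
like; the 0x83/0x84 part values come from this thread, while the
register offset constant and helper name are assumptions:

#define SMMU_PMCG_PIDR0		0xfe0	/* assumed CoreSight-style offset */

static const char *mmu600_pmcg_flavour(void __iomem *base)
{
	u32 part = readl_relaxed(base + SMMU_PMCG_PIDR0) & 0xff;

	switch (part) {
	case 0x83:
		return "tcu";	/* per the thread, TCU PIDR0 is 0x83 */
	case 0x84:
		return "tbu";	/* and TBU should be 0x84 */
	default:
		return "unknown";
	}
}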


Cheers,
Robin.


Re: [PATCH 1/2] dt-bindings: Add Arm SMMUv3 PMCG binding

2021-11-16 Thread Jean-Philippe Brucker
On Tue, Nov 16, 2021 at 08:02:53AM -0600, Rob Herring wrote:
> My bot found errors running 'make DT_CHECKER_FLAGS=-m dt_binding_check'
> on your patch (DT_CHECKER_FLAGS is new in v5.13):
> 
> yamllint warnings/errors:
> ./Documentation/devicetree/bindings/iommu/arm,smmu-v3-pmcg.yaml:24:9: 
> [warning] wrong indentation: expected 10 but found 8 (indentation)
> ./Documentation/devicetree/bindings/iommu/arm,smmu-v3-pmcg.yaml:25:11: 
> [warning] wrong indentation: expected 12 but found 10 (indentation)
> 
> dtschema/dtc warnings/errors:
> /builds/robherring/linux-dt-review/Documentation/devicetree/bindings/iommu/arm,smmu-v3-pmcg.example.dt.yaml:
>  example-0: pmu@2b42:reg:0: [0, 725745664, 0, 4096] is too long
>   From schema: 
> /usr/local/lib/python3.8/dist-packages/dtschema/schemas/reg.yaml
> /builds/robherring/linux-dt-review/Documentation/devicetree/bindings/iommu/arm,smmu-v3-pmcg.example.dt.yaml:
>  example-0: pmu@2b42:reg:1: [0, 725811200, 0, 4096] is too long
>   From schema: 
> /usr/local/lib/python3.8/dist-packages/dtschema/schemas/reg.yaml
> /builds/robherring/linux-dt-review/Documentation/devicetree/bindings/iommu/arm,smmu-v3-pmcg.example.dt.yaml:
>  example-0: pmu@2b44:reg:0: [0, 725876736, 0, 4096] is too long
>   From schema: 
> /usr/local/lib/python3.8/dist-packages/dtschema/schemas/reg.yaml
> /builds/robherring/linux-dt-review/Documentation/devicetree/bindings/iommu/arm,smmu-v3-pmcg.example.dt.yaml:
>  example-0: pmu@2b44:reg:1: [0, 725942272, 0, 4096] is too long
>   From schema: 
> /usr/local/lib/python3.8/dist-packages/dtschema/schemas/reg.yaml
> 
> doc reference errors (make refcheckdocs):
> 
> See https://patchwork.ozlabs.org/patch/1555758
> 
> This check can fail if there are any dependencies. The base for a patch
> series is generally the most recent rc1.
> 
> If you already ran 'make dt_binding_check' and didn't see the above
> error(s), then make sure 'yamllint' is installed and dt-schema is up to
> date:
> 
> pip3 install dtschema --upgrade
> 
> Please check and re-submit.

Right, I'll fix those; I had only run dtbs_check

Thanks,
Jean



Re: [PATCH 2/2] perf/smmuv3: Add devicetree support

2021-11-16 Thread Jean-Philippe Brucker
On Tue, Nov 16, 2021 at 12:06:36PM +, John Garry wrote:
> On 16/11/2021 11:35, Jean-Philippe Brucker wrote:
> > Add device-tree support to the SMMUv3 PMCG.  One small cosmetic change
> > while factoring the option mask printout: don't display it when zero, it
> > only contains one erratum at the moment.
> > 
> > Signed-off-by: Jay Chen 
> > Signed-off-by: Jean-Philippe Brucker 
> > ---
> >   drivers/perf/arm_smmuv3_pmu.c | 25 +++--
> >   1 file changed, 23 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/perf/arm_smmuv3_pmu.c b/drivers/perf/arm_smmuv3_pmu.c
> > index 226348822ab3..958325ac103a 100644
> > --- a/drivers/perf/arm_smmuv3_pmu.c
> > +++ b/drivers/perf/arm_smmuv3_pmu.c
> > @@ -47,6 +47,7 @@
> >   #include 
> >   #include 
> >   #include 
> > +#include 
> >   #include 
> >   #include 
> >   #include 
> > @@ -750,8 +751,15 @@ static void smmu_pmu_get_acpi_options(struct smmu_pmu *smmu_pmu)
> > smmu_pmu->options |= SMMU_PMCG_EVCNTR_RDONLY;
> > break;
> > }
> > +}
> > +
> > +static void smmu_pmu_get_of_options(struct smmu_pmu *smmu_pmu)
> > +{
> > +   struct device_node *node = smmu_pmu->dev->of_node;
> > -   dev_notice(smmu_pmu->dev, "option mask 0x%x\n", smmu_pmu->options);
> > +   if (of_device_is_compatible(node, "hisilicon,smmu-v3-pmcg-hip08"))
> 
> I don't think that this is necessary. We don't support DT for hip08, nor
> have any plans to. Incidentally, was this binding missing in your series?

Ok I'll drop this (and the compatible value from patch 1)

Thanks,
Jean



Re: [PATCH 0/2] perf/smmuv3: Support devicetree

2021-11-16 Thread Jean-Philippe Brucker
On Tue, Nov 16, 2021 at 12:02:47PM +, Robin Murphy wrote:
> On 2021-11-16 11:35, Jean-Philippe Brucker wrote:
> > Add devicetree binding for the SMMUv3 PMU, called Performance Monitoring
> > Counter Group (PMCG) in the spec. Each SMMUv3 implementation can have
> > multiple independent PMCGs, for example one for the Translation Control
> > Unit (TCU) and one per Translation Buffer Unit (TBU).
> > 
> > I previously sent the binding as reply to Jay Chen's thread implementing
> > device tree support [1]. This posting addresses the comments from that
> > thread.
> 
> Ha, I'd also resurrected this and was planning to post it at some point this
> week[0] - you should have said :)

Ah sorry about that, I just resent because there was some demand for it at
Linaro

> > Patch 1 adds two compatible strings. "arm,smmu-v3-pmcg" is common to all
> > PMCGs. "hisilicon,smmu-v3-pmcg-hip08" allows to support the same quirk
> > as IORT for that implementation (see patch 2). We'll probably want to
> > also introduce compatible strings for each implementation that has
> > additional perf events. For example the MMU-600 implementation has
> > different events for TCU and TBU PMCGs [2], but both components have the
> > same device IDs. So the driver could differentiate them if they had two
> > distinct compatible strings such as "arm,mmu-600-pmcg-tbu" and
> > "arm,mmu-600-pmcg-tcu".
> 
> Actually it only needs a general MMU-600 compatible, since once you know
> it's an Arm Ltd. implementation, you can assume the pattern for the IMP_DEF
> ID registers to figure out the rest.

It might be an error in the MMU-600 spec specifically, both TBU and TCU
PMU registers have a 0x83 PIDR0, where I think the TBU should be 0x84 (the
revC model uses that value). It's possible that the implementation
actually has 0x84 instead.

Thanks,
Jean



[PATCH 5/5] scsi: storvsc: Add Isolation VM support for storvsc driver

2021-11-16 Thread Tianyu Lan
From: Tianyu Lan 

In Isolation VM, all shared memory with the host needs to be marked
visible to the host via hvcall. vmbus_establish_gpadl() has already done
it for the storvsc rx/tx ring buffer. The page buffer used by
vmbus_sendpacket_mpb_desc() still needs to be handled. Use the DMA API
(scsi_dma_map/unmap) to map this memory when sending/receiving packets
and return the swiotlb bounce buffer dma address. In Isolation VM, the
swiotlb bounce buffer is marked visible to the host and swiotlb force
mode is enabled.

Set the device's dma min align mask to HV_HYP_PAGE_SIZE - 1 in order to
keep the original data offset in the bounce buffer.
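The alignment setup described above would look roughly like this in the
probe path (a sketch; the exact placement is assumed):

static int storvsc_set_dma_alignment(struct hv_device *device)
{
	/*
	 * Tell swiotlb to preserve the buffer offset modulo
	 * HV_HYP_PAGE_SIZE when it picks a bounce slot, so the
	 * intra-page offset used by the driver stays valid after
	 * bouncing.
	 */
	return dma_set_min_align_mask(&device->device, HV_HYP_PAGE_SIZE - 1);
}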

Signed-off-by: Tianyu Lan 
---
 drivers/scsi/storvsc_drv.c | 37 +
 include/linux/hyperv.h |  1 +
 2 files changed, 22 insertions(+), 16 deletions(-)

diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
index 20595c0ba0ae..ae293600d799 100644
--- a/drivers/scsi/storvsc_drv.c
+++ b/drivers/scsi/storvsc_drv.c
@@ -21,6 +21,8 @@
 #include 
 #include 
 #include 
+#include 
+
 #include 
 #include 
 #include 
@@ -1336,6 +1338,7 @@ static void storvsc_on_channel_callback(void *context)
continue;
}
request = (struct storvsc_cmd_request *)scsi_cmd_priv(scmnd);
+   scsi_dma_unmap(scmnd);
}
 
storvsc_on_receive(stor_device, packet, request);
@@ -1749,7 +1752,6 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd)
struct hv_host_device *host_dev = shost_priv(host);
struct hv_device *dev = host_dev->dev;
struct storvsc_cmd_request *cmd_request = scsi_cmd_priv(scmnd);
-   int i;
struct scatterlist *sgl;
unsigned int sg_count;
struct vmscsi_request *vm_srb;
@@ -1831,10 +1833,11 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd)
payload_sz = sizeof(cmd_request->mpb);
 
if (sg_count) {
-   unsigned int hvpgoff, hvpfns_to_add;
unsigned long offset_in_hvpg = offset_in_hvpage(sgl->offset);
unsigned int hvpg_count = HVPFN_UP(offset_in_hvpg + length);
-   u64 hvpfn;
+   struct scatterlist *sg;
+   unsigned long hvpfn, hvpfns_to_add;
+   int j, i = 0;
 
if (hvpg_count > MAX_PAGE_BUFFER_COUNT) {
 
@@ -1848,21 +1851,22 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd)
payload->range.len = length;
payload->range.offset = offset_in_hvpg;
 
+   sg_count = scsi_dma_map(scmnd);
+   if (sg_count < 0)
+   return SCSI_MLQUEUE_DEVICE_BUSY;
 
-   for (i = 0; sgl != NULL; sgl = sg_next(sgl)) {
+   for_each_sg(sgl, sg, sg_count, j) {
/*
-* Init values for the current sgl entry. hvpgoff
-* and hvpfns_to_add are in units of Hyper-V size
-* pages. Handling the PAGE_SIZE != HV_HYP_PAGE_SIZE
-* case also handles values of sgl->offset that are
-* larger than PAGE_SIZE. Such offsets are handled
-* even on other than the first sgl entry, provided
-* they are a multiple of PAGE_SIZE.
+* Init values for the current sgl entry. hvpfns_to_add
+* is in units of Hyper-V size pages. Handling the
+* PAGE_SIZE != HV_HYP_PAGE_SIZE case also handles
+* values of sgl->offset that are larger than PAGE_SIZE.
+* Such offsets are handled even on other than the first
+* sgl entry, provided they are a multiple of PAGE_SIZE.
 */
-   hvpgoff = HVPFN_DOWN(sgl->offset);
-   hvpfn = page_to_hvpfn(sg_page(sgl)) + hvpgoff;
-   hvpfns_to_add = HVPFN_UP(sgl->offset + sgl->length) -
-   hvpgoff;
+   hvpfn = HVPFN_DOWN(sg_dma_address(sg));
+   hvpfns_to_add = HVPFN_UP(sg_dma_address(sg) +
+sg_dma_len(sg)) - hvpfn;
 
/*
 * Fill the next portion of the PFN array with
@@ -1872,7 +1876,7 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd)
 * the PFN array is filled.
 */
while (hvpfns_to_add--)
-   payload->range.pfn_array[i++] = hvpfn++;
+   payload->range.pfn_array[i++] = hvpfn++;
}
}
 
@@ -2016,6 

[PATCH 4/5] net: netvsc: Add Isolation VM support for netvsc driver

2021-11-16 Thread Tianyu Lan
From: Tianyu Lan 

In Isolation VM, all shared memory with the host needs to be marked
visible to the host via hvcall. vmbus_establish_gpadl() has already done
it for the netvsc rx/tx ring buffer. The page buffer used by
vmbus_sendpacket_pagebuffer() still needs to be handled. Use the DMA API
to map/unmap this memory when sending/receiving packets, and the Hyper-V
swiotlb bounce buffer dma address will be returned. The swiotlb bounce
buffer has been marked visible to the host during boot up.

Allocate the rx/tx ring buffer via dma_alloc_noncontiguous() in
Isolation VM. After calling vmbus_establish_gpadl(), which marks these
pages visible to the host, map these pages into unencrypted address
space via dma_vmap_noncontiguous().
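The allocation/mapping pairing described above, as a generic sketch
(error paths trimmed to the essentials; the helper name is
hypothetical):

static int netvsc_alloc_isolated_buf(struct device *dev, size_t buf_size,
				     struct sg_table **sgt, void **va)
{
	*sgt = dma_alloc_noncontiguous(dev, buf_size, DMA_FROM_DEVICE,
				       GFP_KERNEL, 0);
	if (!*sgt)
		return -ENOMEM;

	/* vmbus_establish_gpadl() would mark the pages host-visible here */

	*va = dma_vmap_noncontiguous(dev, buf_size, *sgt); /* unencrypted alias */
	if (!*va) {
		dma_free_noncontiguous(dev, buf_size, *sgt, DMA_FROM_DEVICE);
		return -ENOMEM;
	}
	return 0;
}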

Signed-off-by: Tianyu Lan 
---
 drivers/net/hyperv/hyperv_net.h   |   5 +
 drivers/net/hyperv/netvsc.c   | 192 +++---
 drivers/net/hyperv/rndis_filter.c |   2 +
 include/linux/hyperv.h|   6 +
 4 files changed, 190 insertions(+), 15 deletions(-)

diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
index 315278a7cf88..31c77a00d01e 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -164,6 +164,7 @@ struct hv_netvsc_packet {
u32 total_bytes;
u32 send_buf_index;
u32 total_data_buflen;
+   struct hv_dma_range *dma_range;
 };
 
 #define NETVSC_HASH_KEYLEN 40
@@ -1074,6 +1075,7 @@ struct netvsc_device {
 
 /* Receive buffer allocated by us but managed by NetVSP */
void *recv_buf;
+   struct sg_table *recv_sgt;
u32 recv_buf_size; /* allocated bytes */
struct vmbus_gpadl recv_buf_gpadl_handle;
u32 recv_section_cnt;
@@ -1082,6 +1084,7 @@ struct netvsc_device {
 
/* Send buffer allocated by us */
void *send_buf;
+   struct sg_table *send_sgt;
u32 send_buf_size;
struct vmbus_gpadl send_buf_gpadl_handle;
u32 send_section_cnt;
@@ -1731,4 +1734,6 @@ struct rndis_message {
 #define RETRY_US_HI1
 #define RETRY_MAX  2000/* >10 sec */
 
+void netvsc_dma_unmap(struct hv_device *hv_dev,
+ struct hv_netvsc_packet *packet);
 #endif /* _HYPERV_NET_H */
diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index 396bc1c204e6..9cdc71930830 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -146,15 +147,39 @@ static struct netvsc_device *alloc_net_device(void)
return net_device;
 }
 
+static struct hv_device *netvsc_channel_to_device(struct vmbus_channel *channel)
+{
+   struct vmbus_channel *primary = channel->primary_channel;
+
+   return primary ? primary->device_obj : channel->device_obj;
+}
+
 static void free_netvsc_device(struct rcu_head *head)
 {
struct netvsc_device *nvdev
= container_of(head, struct netvsc_device, rcu);
+   struct hv_device *dev =
+   netvsc_channel_to_device(nvdev->chan_table[0].channel);
int i;
 
kfree(nvdev->extension);
-   vfree(nvdev->recv_buf);
-   vfree(nvdev->send_buf);
+
+   if (nvdev->recv_sgt) {
+   dma_vunmap_noncontiguous(&dev->device, nvdev->recv_buf);
+   dma_free_noncontiguous(&dev->device, nvdev->recv_buf_size,
+  nvdev->recv_sgt, DMA_FROM_DEVICE);
+   } else {
+   vfree(nvdev->recv_buf);
+   }
+
+   if (nvdev->send_sgt) {
+   dma_vunmap_noncontiguous(&dev->device, nvdev->send_buf);
+   dma_free_noncontiguous(&dev->device, nvdev->send_buf_size,
+  nvdev->send_sgt, DMA_TO_DEVICE);
+   } else {
+   vfree(nvdev->send_buf);
+   }
+
kfree(nvdev->send_section_map);
 
for (i = 0; i < VRSS_CHANNEL_MAX; i++) {
@@ -348,7 +373,21 @@ static int netvsc_init_buf(struct hv_device *device,
buf_size = min_t(unsigned int, buf_size,
 NETVSC_RECEIVE_BUFFER_SIZE_LEGACY);
 
-   net_device->recv_buf = vzalloc(buf_size);
+   if (hv_isolation_type_snp()) {
+   net_device->recv_sgt =
+   dma_alloc_noncontiguous(&device->device, buf_size,
+   DMA_FROM_DEVICE, GFP_KERNEL, 0);
+   if (!net_device->recv_sgt) {
+   pr_err("Fail to allocate recv buffer buf_size %d.\n.", buf_size);
+   ret = -ENOMEM;
+   goto cleanup;
+   }
+
+   net_device->recv_buf = (void *)net_device->recv_sgt->sgl->dma_address;
+   } else {
+   net_device->recv_buf = vzalloc(buf_size);
+   }
+
if (!net_device->recv_buf) {
netdev_err(ndev,
   "unable to allocate receive buffer of size %u\n",
@@ -357,8 +396,6 @@ static int netvsc_init_buf(struct hv_device *device,
  

[PATCH 2/5] dma-mapping: Add vmap/vunmap_noncontiguous() callback in dma ops

2021-11-16 Thread Tianyu Lan
From: Tianyu Lan 

The Hyper-V netvsc driver needs to allocate noncontiguous DMA memory and
remap it into unencrypted address space before sharing it with the host.
Add vmap/vunmap_noncontiguous() callbacks and handle the remap in the
Hyper-V dma ops callback.

Signed-off-by: Tianyu Lan 
---
 include/linux/dma-map-ops.h |  3 +++
 kernel/dma/mapping.c| 18 ++
 2 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/include/linux/dma-map-ops.h b/include/linux/dma-map-ops.h
index 0d5b06b3a4a6..f7b9958ca20a 100644
--- a/include/linux/dma-map-ops.h
+++ b/include/linux/dma-map-ops.h
@@ -27,6 +27,9 @@ struct dma_map_ops {
unsigned long attrs);
void (*free_noncontiguous)(struct device *dev, size_t size,
struct sg_table *sgt, enum dma_data_direction dir);
+   void *(*vmap_noncontiguous)(struct device *dev, size_t size,
+   struct sg_table *sgt);
+   void (*vunmap_noncontiguous)(struct device *dev, void *addr);
int (*mmap)(struct device *, struct vm_area_struct *,
void *, dma_addr_t, size_t, unsigned long attrs);
 
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index 9478eccd1c8e..7fd751d866cc 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -674,8 +674,14 @@ void *dma_vmap_noncontiguous(struct device *dev, size_t size,
const struct dma_map_ops *ops = get_dma_ops(dev);
unsigned long count = PAGE_ALIGN(size) >> PAGE_SHIFT;
 
-   if (ops && ops->alloc_noncontiguous)
-   return vmap(sgt_handle(sgt)->pages, count, VM_MAP, PAGE_KERNEL);
+   if (ops) {
+   if (ops->vmap_noncontiguous)
+   return ops->vmap_noncontiguous(dev, size, sgt);
+   else if (ops->alloc_noncontiguous)
+   return vmap(sgt_handle(sgt)->pages, count, VM_MAP,
+   PAGE_KERNEL);
+   }
+
return page_address(sg_page(sgt->sgl));
 }
 EXPORT_SYMBOL_GPL(dma_vmap_noncontiguous);
@@ -684,8 +690,12 @@ void dma_vunmap_noncontiguous(struct device *dev, void *vaddr)
 {
const struct dma_map_ops *ops = get_dma_ops(dev);
 
-   if (ops && ops->alloc_noncontiguous)
-   vunmap(vaddr);
+   if (ops) {
+   if (ops->vunmap_noncontiguous)
+   ops->vunmap_noncontiguous(dev, vaddr);
+   else if (ops->alloc_noncontiguous)
+   vunmap(vaddr);
+   }
 }
 EXPORT_SYMBOL_GPL(dma_vunmap_noncontiguous);
 
-- 
2.25.1



[PATCH 1/5] x86/Swiotlb: Add Swiotlb bounce buffer remap function for HV IVM

2021-11-16 Thread Tianyu Lan
From: Tianyu Lan 

In Isolation VM with AMD SEV, the bounce buffer needs to be accessed via
an extra address space which is above shared_gpa_boundary (e.g. a 39-bit
address line) reported by the Hyper-V CPUID ISOLATION_CONFIG leaf. The
access physical address will be the original physical address +
shared_gpa_boundary. The shared_gpa_boundary in the AMD SEV-SNP spec is
called the virtual top of memory (vTOM). Memory addresses below vTOM are
automatically treated as private while memory above vTOM is treated as
shared.

Expose swiotlb_unencrypted_base for platforms to set the unencrypted
memory base offset, and have the platform call
swiotlb_update_mem_attributes() to remap the swiotlb memory to
unencrypted address space. memremap() cannot be called at the early
stage, so put the remapping code into swiotlb_update_mem_attributes().
Store the remap address and use it to copy data from/to the swiotlb
bounce buffer.
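The copy-path change this describes amounts to indexing the stored remap
address instead of phys_to_virt(), roughly (a sketch with field names
following this patch's io_tlb_mem; the surrounding function is
simplified):

static void swiotlb_bounce_sketch(struct io_tlb_mem *mem, phys_addr_t tlb_addr,
				  void *buf, size_t size, bool to_device)
{
	/* mem->vaddr already points at the (possibly remapped) pool */
	unsigned char *vaddr = mem->vaddr + (tlb_addr - mem->start);

	if (to_device)
		memcpy(vaddr, buf, size);
	else
		memcpy(buf, vaddr, size);
}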

Signed-off-by: Tianyu Lan 
---
 include/linux/swiotlb.h |  6 
 kernel/dma/swiotlb.c| 75 -
 2 files changed, 73 insertions(+), 8 deletions(-)

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 569272871375..09a140d617fa 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -73,6 +73,9 @@ extern enum swiotlb_force swiotlb_force;
  * @end:   The end address of the swiotlb memory pool. Used to do a quick
  * range check to see if the memory was in fact allocated by this
  * API.
+ * @vaddr: The vaddr of the swiotlb memory pool. The swiotlb
+ * memory pool may be remapped in the memory encrypted case and
+ * stores the virtual address for bounce buffer operations.
  * @nslabs:The number of IO TLB blocks (in groups of 64) between @start and
  * @end. For default swiotlb, this is command line adjustable via
  * setup_io_tlb_npages.
@@ -92,6 +95,7 @@ extern enum swiotlb_force swiotlb_force;
 struct io_tlb_mem {
phys_addr_t start;
phys_addr_t end;
+   void *vaddr;
unsigned long nslabs;
unsigned long used;
unsigned int index;
@@ -186,4 +190,6 @@ static inline bool is_swiotlb_for_alloc(struct device *dev)
 }
 #endif /* CONFIG_DMA_RESTRICTED_POOL */
 
+extern phys_addr_t swiotlb_unencrypted_base;
+
 #endif /* __LINUX_SWIOTLB_H */
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 8e840fbbed7c..4735c5e0f44d 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -50,6 +50,7 @@
 #include 
 #include 
 
+#include 
 #include 
 #include 
 #include 
@@ -72,6 +73,8 @@ enum swiotlb_force swiotlb_force;
 
 struct io_tlb_mem io_tlb_default_mem;
 
+phys_addr_t swiotlb_unencrypted_base;
+
 /*
 * Max segment that we can provide which (if pages are contiguous) will
  * not be bounced (unless SWIOTLB_FORCE is set).
@@ -155,6 +158,31 @@ static inline unsigned long nr_slots(u64 val)
return DIV_ROUND_UP(val, IO_TLB_SIZE);
 }
 
+/*
+ * Remap swiotlb memory in the unencrypted physical address space
+ * when swiotlb_unencrypted_base is set. (e.g. for Hyper-V AMD SEV-SNP
+ * Isolation VMs).
+ */
+void *swiotlb_mem_remap(struct io_tlb_mem *mem, unsigned long bytes)
+{
+   void *vaddr;
+
+   if (swiotlb_unencrypted_base) {
+   phys_addr_t paddr = mem->start + swiotlb_unencrypted_base;
+
+   vaddr = memremap(paddr, bytes, MEMREMAP_WB);
+   if (!vaddr) {
+   pr_err("Failed to map the unencrypted memory %llx size %lx.\n",
+  paddr, bytes);
+   return NULL;
+   }
+
+   return vaddr;
+   }
+
+   return phys_to_virt(mem->start);
+}
+
 /*
  * Early SWIOTLB allocation may be too early to allow an architecture to
  * perform the desired operations.  This function allows the architecture to
@@ -172,10 +200,17 @@ void __init swiotlb_update_mem_attributes(void)
vaddr = phys_to_virt(mem->start);
bytes = PAGE_ALIGN(mem->nslabs << IO_TLB_SHIFT);
set_memory_decrypted((unsigned long)vaddr, bytes >> PAGE_SHIFT);
-   memset(vaddr, 0, bytes);
+
+   mem->vaddr = swiotlb_mem_remap(mem, bytes);
+   if (!mem->vaddr) {
+   pr_err("Fail to remap swiotlb mem.\n");
+   return;
+   }
+
+   memset(mem->vaddr, 0, bytes);
 }
 
-static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start,
+static int swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start,
unsigned long nslabs, bool late_alloc)
 {
void *vaddr = phys_to_virt(start);
@@ -196,13 +231,28 @@ static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start,
mem->slots[i].orig_addr = INVALID_PHYS_ADDR;
mem->slots[i].alloc_size = 0;
}
+
+   /*
+* With swiotlb_unencrypted_base setting, swiotlb bounce buffer will
+* be remapped in the swiotlb_update_mem_attributes() and return 

[PATCH 3/5] hyperv/IOMMU: Enable swiotlb bounce buffer for Isolation VM

2021-11-16 Thread Tianyu Lan
From: Tianyu Lan 

Hyper-V Isolation VM requires bounce buffer support to copy
data from/to encrypted memory, so enable swiotlb force
mode to use the swiotlb bounce buffer for DMA transactions.

In Isolation VM with AMD SEV, the bounce buffer needs to be
accessed via an extra address space which is above shared_gpa_boundary
(e.g. a 39-bit address line) reported by the Hyper-V CPUID
ISOLATION_CONFIG leaf. The access physical address will be the original
physical address + shared_gpa_boundary. The shared_gpa_boundary in the
AMD SEV-SNP spec is called the virtual top of memory (vTOM). Memory
addresses below vTOM are automatically treated as private while memory
above vTOM is treated as shared.

Hyper-V initializes its own swiotlb bounce buffer, and the default
swiotlb needs to be disabled. pci_swiotlb_detect_override() and
pci_swiotlb_detect_4gb() enable the default one. To override the
setting, hyperv_swiotlb_detect() needs to run before these detect
functions, which depend on pci_xen_swiotlb_init(). Make
pci_xen_swiotlb_init() depend on hyperv_swiotlb_detect() to keep the
order.

The swiotlb bounce buffer code calls set_memory_decrypted()
to mark the bounce buffer visible to the host and maps it in the extra
address space via memremap(). Populate the shared_gpa_boundary
(vTOM) via the swiotlb_unencrypted_base variable.

The map function memremap() can't work in the early
hyperv_iommu_swiotlb_init(), so call swiotlb_update_mem_attributes()
in hyperv_iommu_swiotlb_later_init().

Add Hyper-V dma ops and provide alloc/free and vmap/vunmap noncontiguous
callbacks to handle requests for allocating and mapping noncontiguous
dma memory in vmbus device drivers. The netvsc driver will use this. Set
the dma_ops_bypass flag for hv devices to use the dma direct functions
when mapping/unmapping dma pages.
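A minimal sketch of what the vmap_noncontiguous callback could look like
under this scheme, assuming page-backed allocations and that
ms_hyperv.shared_gpa_boundary holds vTOM (illustrative only, not the
patch's actual code; vmap_pfn() and PAGE_KERNEL_NOENC are existing
kernel primitives):

static void *hv_vmap_noncontiguous(struct device *dev, size_t size,
				   struct sg_table *sgt)
{
	unsigned long count = PAGE_ALIGN(size) >> PAGE_SHIFT;
	unsigned long vtom_pfn = ms_hyperv.shared_gpa_boundary >> PAGE_SHIFT;
	unsigned long *pfns;
	struct scatterlist *sg;
	void *vaddr;
	int i = 0, j;

	pfns = kcalloc(count, sizeof(*pfns), GFP_KERNEL);
	if (!pfns)
		return NULL;

	for_each_sg(sgt->sgl, sg, sgt->orig_nents, j) {
		unsigned long pfn = page_to_pfn(sg_page(sg));
		unsigned long n = sg->length >> PAGE_SHIFT;

		/* Alias each page above vTOM so the host may access it */
		while (n--)
			pfns[i++] = pfn++ + vtom_pfn;
	}

	vaddr = vmap_pfn(pfns, count, PAGE_KERNEL_NOENC);
	kfree(pfns);
	return vaddr;
}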

Signed-off-by: Tianyu Lan 
---
 arch/x86/mm/mem_encrypt.c  |   4 +-
 arch/x86/xen/pci-swiotlb-xen.c |   3 +-
 drivers/hv/Kconfig |   1 +
 drivers/hv/vmbus_drv.c |   6 ++
 drivers/iommu/hyperv-iommu.c   | 164 +
 include/linux/hyperv.h |  10 ++
 6 files changed, 186 insertions(+), 2 deletions(-)

diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index 35487305d8af..65bc385ae07a 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -31,6 +31,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "mm_internal.h"
 
@@ -203,7 +204,8 @@ void __init sev_setup_arch(void)
phys_addr_t total_mem = memblock_phys_mem_size();
unsigned long size;
 
-   if (!cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT))
+   if (!cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT)
+   && !hv_is_isolation_supported())
return;
 
/*
diff --git a/arch/x86/xen/pci-swiotlb-xen.c b/arch/x86/xen/pci-swiotlb-xen.c
index 46df59aeaa06..30fd0600b008 100644
--- a/arch/x86/xen/pci-swiotlb-xen.c
+++ b/arch/x86/xen/pci-swiotlb-xen.c
@@ -4,6 +4,7 @@
 
 #include 
 #include 
+#include 
 #include 
 
 #include 
@@ -91,6 +92,6 @@ int pci_xen_swiotlb_init_late(void)
 EXPORT_SYMBOL_GPL(pci_xen_swiotlb_init_late);
 
 IOMMU_INIT_FINISH(pci_xen_swiotlb_detect,
- NULL,
+ hyperv_swiotlb_detect,
  pci_xen_swiotlb_init,
  NULL);
diff --git a/drivers/hv/Kconfig b/drivers/hv/Kconfig
index dd12af20e467..d43b4cd88f57 100644
--- a/drivers/hv/Kconfig
+++ b/drivers/hv/Kconfig
@@ -9,6 +9,7 @@ config HYPERV
select PARAVIRT
select X86_HV_CALLBACK_VECTOR if X86
select VMAP_PFN
+   select DMA_OPS_BYPASS
help
  Select this option to run Linux as a Hyper-V client operating
  system.
diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index 392c1ac4f819..32dc193e31cd 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -33,6 +33,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include "hyperv_vmbus.h"
 
@@ -2078,6 +2079,7 @@ struct hv_device *vmbus_device_create(const guid_t *type,
return child_device_obj;
 }
 
+static u64 vmbus_dma_mask = DMA_BIT_MASK(64);
 /*
  * vmbus_device_register - Register the child device
  */
@@ -2118,6 +2120,10 @@ int vmbus_device_register(struct hv_device 
*child_device_obj)
}
hv_debug_add_dev_dir(child_device_obj);
 
+   child_device_obj->device.dma_ops_bypass = true;
+   child_device_obj->device.dma_ops = &hyperv_iommu_dma_ops;
+   child_device_obj->device.dma_mask = &vmbus_dma_mask;
+   child_device_obj->device.dma_parms = &child_device_obj->dma_parms;
return 0;
 
 err_kset_unregister:
diff --git a/drivers/iommu/hyperv-iommu.c b/drivers/iommu/hyperv-iommu.c
index e285a220c913..ebcb628e7e8f 100644
--- a/drivers/iommu/hyperv-iommu.c
+++ b/drivers/iommu/hyperv-iommu.c
@@ -13,14 +13,21 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 #include 
 #include 
 #include 
 #include 
+#include 
+#include 
 #include 
 #include 
 #include 
+#include 
+#include 
+#include 
 
 #include 

[PATCH 0/5] x86/Hyper-V: Add Hyper-V Isolation VM support(Second part)

2021-11-16 Thread Tianyu Lan
From: Tianyu Lan 

Hyper-V provides two kinds of Isolation VMs: VBS (Virtualization-Based
Security) and AMD SEV-SNP unenlightened Isolation VMs. This patchset
adds support for these Isolation VMs in Linux.

The memory of these VMs is encrypted and the host can't access guest
memory directly. Hyper-V provides a new host-visibility hvcall, and the
guest needs to call this hvcall to mark memory visible to the host
before sharing that memory with the host. For security, network/storage
stack memory should not be shared with the host, hence the need for
bounce buffers.

The vmbus channel ring buffer already plays a bounce buffer role because
all data from/to the host is copied between the ring buffer and IO stack
memory. So just mark the vmbus channel ring buffer visible.

For an SNP Isolation VM, the guest needs to access the shared memory via
an extra address space which is specified by the Hyper-V
HYPERV_CPUID_ISOLATION_CONFIG CPUID leaf. The access physical address of
the shared memory is the bounce buffer memory GPA plus the
shared_gpa_boundary reported by CPUID.
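As a worked example of that address math (the 39-bit boundary value is
illustrative):

static u64 isolation_access_pa(u64 bounce_gpa)
{
	u64 shared_gpa_boundary = 1ULL << 39;	/* as reported by CPUID */

	/* e.g. bounce_gpa 0x1000 is accessed at PA 0x8000001000 */
	return bounce_gpa + shared_gpa_boundary;
}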

This patchset enables the swiotlb bounce buffer for netvsc/storvsc in
Isolation VM. Add Hyper-V dma ops and provide dma_alloc/free_noncontiguous
and vmap/vunmap_noncontiguous callbacks. Allocate the rx/tx rings via
dma_alloc_noncontiguous() and map them into the extra address space via
dma_vmap_noncontiguous().

Tianyu Lan (5):
  x86/Swiotlb: Add Swiotlb bounce buffer remap function for HV IVM
  dma-mapping: Add vmap/vunmap_noncontiguous() callback in dma ops
  hyperv/IOMMU: Enable swiotlb bounce buffer for Isolation VM
  net: netvsc: Add Isolation VM support for netvsc driver
  scsi: storvsc: Add Isolation VM support for storvsc driver

 arch/x86/mm/mem_encrypt.c |   4 +-
 arch/x86/xen/pci-swiotlb-xen.c|   3 +-
 drivers/hv/Kconfig|   1 +
 drivers/hv/vmbus_drv.c|   6 +
 drivers/iommu/hyperv-iommu.c  | 164 +
 drivers/net/hyperv/hyperv_net.h   |   5 +
 drivers/net/hyperv/netvsc.c   | 192 +++---
 drivers/net/hyperv/rndis_filter.c |   2 +
 drivers/scsi/storvsc_drv.c|  37 +++---
 include/linux/dma-map-ops.h   |   3 +
 include/linux/hyperv.h|  17 +++
 include/linux/swiotlb.h   |   6 +
 kernel/dma/mapping.c  |  18 ++-
 kernel/dma/swiotlb.c  |  75 ++--
 14 files changed, 488 insertions(+), 45 deletions(-)

-- 
2.25.1



Re: [PATCH 0/5] iommu: Some IOVA code reorganisation

2021-11-16 Thread Robin Murphy

On 2021-11-16 14:21, John Garry wrote:

On 04/10/2021 12:44, Will Deacon wrote:

On Fri, Sep 24, 2021 at 06:01:52PM +0800, John Garry wrote:

The IOVA domain structure is a bit overloaded, holding:
- IOVA tree management
- FQ control
- IOVA rcache memories

Indeed only a couple of IOVA users use the rcache, and only dma-iommu.c
uses the FQ feature.

This series separates out that structure. In addition, it moves the FQ
code into dma-iommu.c. This is not strictly necessary, but it does make
it easier for the FQ domain to look up the rcache domain.

The rcache code stays where it is, as it may be reworked in future, so
there is not much point in relocating and then discarding.

This topic was initially discussed and suggested (I think) by Robin here:
https://lore.kernel.org/linux-iommu/1d06eda1-9961-d023-f5e7-fe87e768f...@arm.com/ 


It would be useful to have Robin's Ack on patches 2-4. The implementation
looks straightforward to me, but the thread above isn't very clear about
what is being suggested.


Hi Robin,

Just wondering if you had made any progress on your FQ code rework or 
your own re-org?


Hey John - as it happens I started hacking on that in earnest about half 
an hour ago, aiming to get something out later this week.


Cheers,
Robin.

I wasn't planning on progressing 
https://lore.kernel.org/linux-iommu/1626259003-201303-1-git-send-email-john.ga...@huawei.com/ 
until this is done first (and that is still a big issue), even though 
not strictly necessary.


Thanks,
John



Re: [PATCH 0/5] iommu: Some IOVA code reorganisation

2021-11-16 Thread John Garry

On 04/10/2021 12:44, Will Deacon wrote:

On Fri, Sep 24, 2021 at 06:01:52PM +0800, John Garry wrote:

The IOVA domain structure is a bit overloaded, holding:
- IOVA tree management
- FQ control
- IOVA rcache memories

Indeed only a couple of IOVA users use the rcache, and only dma-iommu.c
uses the FQ feature.

This series separates out that structure. In addition, it moves the FQ
code into dma-iommu.c. This is not strictly necessary, but it does make
it easier for the FQ domain to look up the rcache domain.

The rcache code stays where it is, as it may be reworked in future, so
there is not much point in relocating and then discarding.

This topic was initially discussed and suggested (I think) by Robin here:
https://lore.kernel.org/linux-iommu/1d06eda1-9961-d023-f5e7-fe87e768f...@arm.com/

It would be useful to have Robin's Ack on patches 2-4. The implementation
looks straightforward to me, but the thread above isn't very clear about
what is being suggested.


Hi Robin,

Just wondering if you had made any progress on your FQ code rework or 
your own re-org?


I wasn't planning on progressing 
https://lore.kernel.org/linux-iommu/1626259003-201303-1-git-send-email-john.ga...@huawei.com/ 
until this is done first (and that is still a big issue), even though 
not strictly necessary.


Thanks,
John


Re: [PATCH 1/2] dt-bindings: Add Arm SMMUv3 PMCG binding

2021-11-16 Thread Rob Herring
On Tue, 16 Nov 2021 11:35:36 +, Jean-Philippe Brucker wrote:
> Add binding for the Arm SMMUv3 PMU. Each node represents a PMCG, and is
> placed as a sibling node of the SMMU. Although the PMCGs registers may
> be within the SMMU MMIO region, they are separate devices, and there can
> be multiple PMCG devices for each SMMU (for example one for the TCU and
> one for each TBU).
> 
> Signed-off-by: Jean-Philippe Brucker 
> ---
>  .../bindings/iommu/arm,smmu-v3-pmcg.yaml  | 67 +++
>  1 file changed, 67 insertions(+)
>  create mode 100644 
> Documentation/devicetree/bindings/iommu/arm,smmu-v3-pmcg.yaml
> 

My bot found errors running 'make DT_CHECKER_FLAGS=-m dt_binding_check'
on your patch (DT_CHECKER_FLAGS is new in v5.13):

yamllint warnings/errors:
./Documentation/devicetree/bindings/iommu/arm,smmu-v3-pmcg.yaml:24:9: [warning] 
wrong indentation: expected 10 but found 8 (indentation)
./Documentation/devicetree/bindings/iommu/arm,smmu-v3-pmcg.yaml:25:11: 
[warning] wrong indentation: expected 12 but found 10 (indentation)

dtschema/dtc warnings/errors:
/builds/robherring/linux-dt-review/Documentation/devicetree/bindings/iommu/arm,smmu-v3-pmcg.example.dt.yaml:
 example-0: pmu@2b420000:reg:0: [0, 725745664, 0, 4096] is too long
From schema: 
/usr/local/lib/python3.8/dist-packages/dtschema/schemas/reg.yaml
/builds/robherring/linux-dt-review/Documentation/devicetree/bindings/iommu/arm,smmu-v3-pmcg.example.dt.yaml:
 example-0: pmu@2b420000:reg:1: [0, 725811200, 0, 4096] is too long
From schema: 
/usr/local/lib/python3.8/dist-packages/dtschema/schemas/reg.yaml
/builds/robherring/linux-dt-review/Documentation/devicetree/bindings/iommu/arm,smmu-v3-pmcg.example.dt.yaml:
 example-0: pmu@2b440000:reg:0: [0, 725876736, 0, 4096] is too long
From schema: 
/usr/local/lib/python3.8/dist-packages/dtschema/schemas/reg.yaml
/builds/robherring/linux-dt-review/Documentation/devicetree/bindings/iommu/arm,smmu-v3-pmcg.example.dt.yaml:
 example-0: pmu@2b440000:reg:1: [0, 725942272, 0, 4096] is too long
From schema: 
/usr/local/lib/python3.8/dist-packages/dtschema/schemas/reg.yaml

doc reference errors (make refcheckdocs):

See https://patchwork.ozlabs.org/patch/1555758

This check can fail if there are any dependencies. The base for a patch
series is generally the most recent rc1.

If you already ran 'make dt_binding_check' and didn't see the above
error(s), then make sure 'yamllint' is installed and dt-schema is up to
date:

pip3 install dtschema --upgrade

Please check and re-submit.



Re: [PATCH 01/11] iommu: Add device dma ownership set/release interfaces

2021-11-16 Thread Jason Gunthorpe via iommu
On Tue, Nov 16, 2021 at 09:57:30AM +0800, Lu Baolu wrote:
> Hi Christoph,
> 
> On 11/15/21 9:14 PM, Christoph Hellwig wrote:
> > On Mon, Nov 15, 2021 at 10:05:42AM +0800, Lu Baolu wrote:
> > > +enum iommu_dma_owner {
> > > + DMA_OWNER_NONE,
> > > + DMA_OWNER_KERNEL,
> > > + DMA_OWNER_USER,
> > > +};
> > > +
> > 
> > > + enum iommu_dma_owner dma_owner;
> > > + refcount_t owner_cnt;
> > > + struct file *owner_user_file;
> > 
> > I'd just overload the ownership into owner_user_file,
> > 
> >   NULL  -> no owner
> >   (struct file *)1UL)   -> kernel
> >   real pointer  -> user
> > 
> > Which could simplify a lot of the code dealing with the owner.
> > 
> 
> Yeah! Sounds reasonable. I will make this in the next version.

It would be good to figure out how to make iommu_attach_device()
enforce no other driver binding as a kernel user without a file *, as
Robin pointed to, before optimizing this.

This fixes an existing bug where iommu_attach_device() only checks the
group size and is vulnerable to a hot plug increasing the group size
after it returns. That check should be replaced by this series's logic
instead.
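
A minimal userspace sketch of the pointer-overloading encoding Christoph
suggested could look like this (names are illustrative, not from the
series):

#include <stdio.h>

struct file;    /* opaque stand-in for the kernel's struct file */

/* Sentinel meaning "owned by a kernel driver, no user file involved". */
#define DMA_OWNER_KERNEL_FILE ((struct file *)1UL)

static const char *owner_kind(const struct file *owner_user_file)
{
	if (!owner_user_file)
		return "no owner";
	if (owner_user_file == DMA_OWNER_KERNEL_FILE)
		return "kernel";
	return "user";
}

int main(void)
{
	struct file *f = (struct file *)0x1000; /* pretend real pointer */

	printf("%s\n", owner_kind(NULL));                  /* no owner */
	printf("%s\n", owner_kind(DMA_OWNER_KERNEL_FILE)); /* kernel   */
	printf("%s\n", owner_kind(f));                     /* user     */
	return 0;
}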

Jason


Re: [PATCH 2/2] perf/smmuv3: Add devicetree support

2021-11-16 Thread John Garry

On 16/11/2021 11:35, Jean-Philippe Brucker wrote:

Add device-tree support to the SMMUv3 PMCG. One small cosmetic change
while refactoring the option mask printout: don't display it when zero;
it only contains one erratum at the moment.

Signed-off-by: Jay Chen 
Signed-off-by: Jean-Philippe Brucker 
---
  drivers/perf/arm_smmuv3_pmu.c | 25 +++--
  1 file changed, 23 insertions(+), 2 deletions(-)

diff --git a/drivers/perf/arm_smmuv3_pmu.c b/drivers/perf/arm_smmuv3_pmu.c
index 226348822ab3..958325ac103a 100644
--- a/drivers/perf/arm_smmuv3_pmu.c
+++ b/drivers/perf/arm_smmuv3_pmu.c
@@ -47,6 +47,7 @@
  #include 
  #include 
  #include 
+#include 
  #include 
  #include 
  #include 
@@ -750,8 +751,15 @@ static void smmu_pmu_get_acpi_options(struct smmu_pmu 
*smmu_pmu)
smmu_pmu->options |= SMMU_PMCG_EVCNTR_RDONLY;
break;
}
+}
+
+static void smmu_pmu_get_of_options(struct smmu_pmu *smmu_pmu)
+{
+   struct device_node *node = smmu_pmu->dev->of_node;
  
-	dev_notice(smmu_pmu->dev, "option mask 0x%x\n", smmu_pmu->options);

+   if (of_device_is_compatible(node, "hisilicon,smmu-v3-pmcg-hip08"))


I don't think that this is necessary. We don't support DT for hip08, nor 
have any plans to. Incidentally, was this binding missing in your series?


Thanks,
John


+   /* HiSilicon Erratum 162001800 */
+   smmu_pmu->options |= SMMU_PMCG_EVCNTR_RDONLY;
  }
  
  static int smmu_pmu_probe(struct platform_device *pdev)

@@ -834,7 +842,13 @@ static int smmu_pmu_probe(struct platform_device *pdev)
return -EINVAL;
}
  
-	smmu_pmu_get_acpi_options(smmu_pmu);

+   if (dev->of_node)
+   smmu_pmu_get_of_options(smmu_pmu);
+   else
+   smmu_pmu_get_acpi_options(smmu_pmu);
+
+   if (smmu_pmu->options)
+   dev_notice(dev, "option mask 0x%x\n", smmu_pmu->options);
  
  	/* Pick one CPU to be the preferred one to use */

smmu_pmu->on_cpu = raw_smp_processor_id();
@@ -884,10 +898,17 @@ static void smmu_pmu_shutdown(struct platform_device 
*pdev)
smmu_pmu_disable(&smmu_pmu->pmu);
  }
  
+static const struct of_device_id arm_smmu_pmu_match[] = {

+   { .compatible = "arm,smmu-v3-pmcg" },
+   {},
+};
+MODULE_DEVICE_TABLE(of, arm_smmu_pmu_match);
+
  static struct platform_driver smmu_pmu_driver = {
.driver = {
.name = "arm-smmu-v3-pmcg",
.suppress_bind_attrs = true,
+   .of_match_table = of_match_ptr(arm_smmu_pmu_match),
},
.probe = smmu_pmu_probe,
.remove = smmu_pmu_remove,





Re: [PATCH 0/2] perf/smmuv3: Support devicetree

2021-11-16 Thread Robin Murphy

On 2021-11-16 11:35, Jean-Philippe Brucker wrote:

Add devicetree binding for the SMMUv3 PMU, called Performance Monitoring
Counter Group (PMCG) in the spec. Each SMMUv3 implementation can have
multiple independent PMCGs, for example one for the Translation Control
Unit (TCU) and one per Translation Buffer Unit (TBU).

I previously sent the binding as reply to Jay Chen's thread implementing
device tree support [1]. This posting addresses the comments from that
thread.


Ha, I'd also resurrected this and was planning to post it at some point 
this week[0] - you should have said :)



Patch 1 adds two compatible strings. "arm,smmu-v3-pmcg" is common to all
PMCGs. "hisilicon,smmu-v3-pmcg-hip08" allows to support the same quirk
as IORT for that implementation (see patch 2). We'll probably want to
also introduce compatible strings for each implementation that has
additional perf events. For example the MMU-600 implementation has
different events for TCU and TBU PMCGs [2], but both components have the
same device IDs. So the driver could differentiate them if they had two
distinct compatible strings such as "arm,mmu-600-pmcg-tbu" and
"arm,mmu-600-pmcg-tcu".


Actually it only needs a general MMU-600 compatible, since once you know 
it's an Arm Ltd. implementation, you can assume the pattern for the 
IMP_DEF ID registers to figure out the rest.


Robin.

[0] https://gitlab.arm.com/linux-arm/linux-rm/-/commits/smmu-pmcg


The series doesn't deal with this because for testing I use a software
model which only implements architected events. I do not include DTS
change for that platform because enabling PMCGs requires an additional
model option. See my branch smmu/pmu-dt [3] for details.

[1] https://lore.kernel.org/all/20200707150114.GC159413@myrica/
[2] 
https://developer.arm.com/documentation/100310/0202/Functional-description/Operation/Performance-Monitoring-Unit
[3] https://jpbrucker.net/git/linux/log/?h=smmu/pmu-dt

Jean-Philippe Brucker (2):
   dt-bindings: Add Arm SMMUv3 PMCG binding
   perf/smmuv3: Add devicetree support

  .../bindings/iommu/arm,smmu-v3-pmcg.yaml  | 67 +++
  drivers/perf/arm_smmuv3_pmu.c | 25 ++-
  2 files changed, 90 insertions(+), 2 deletions(-)
  create mode 100644 
Documentation/devicetree/bindings/iommu/arm,smmu-v3-pmcg.yaml




[PATCH 1/2] dt-bindings: Add Arm SMMUv3 PMCG binding

2021-11-16 Thread Jean-Philippe Brucker
Add binding for the Arm SMMUv3 PMU. Each node represents a PMCG, and is
placed as a sibling node of the SMMU. Although the PMCGs registers may
be within the SMMU MMIO region, they are separate devices, and there can
be multiple PMCG devices for each SMMU (for example one for the TCU and
one for each TBU).

Signed-off-by: Jean-Philippe Brucker 
---
 .../bindings/iommu/arm,smmu-v3-pmcg.yaml  | 67 +++
 1 file changed, 67 insertions(+)
 create mode 100644 
Documentation/devicetree/bindings/iommu/arm,smmu-v3-pmcg.yaml

diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu-v3-pmcg.yaml 
b/Documentation/devicetree/bindings/iommu/arm,smmu-v3-pmcg.yaml
new file mode 100644
index ..a893e071fdb4
--- /dev/null
+++ b/Documentation/devicetree/bindings/iommu/arm,smmu-v3-pmcg.yaml
@@ -0,0 +1,67 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/iommu/arm,smmu-v3-pmcg.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Arm SMMUv3 Performance Monitor Counter Group
+
+maintainers:
+  - Will Deacon 
+  - Robin Murphy 
+
+description: |+
+  An SMMUv3 may have several Performance Monitor Counter Group (PMCG).
+  They are standalone performance monitoring units that support both
+  architected and IMPLEMENTATION DEFINED event counters.
+
+properties:
+  $nodename:
+pattern: "^pmu@[0-9a-f]*"
+  compatible:
+oneOf:
+  - items:
+- enum:
+  - hisilicon,smmu-v3-pmcg-hip08
+- const: arm,smmu-v3-pmcg
+  - const: arm,smmu-v3-pmcg
+
+  reg:
+description: |
+  Base addresses of the PMCG registers. Either a single address for Page 0
+  or an additional address for Page 1, where some registers can be
+  relocated with SMMU_PMCG_CFGR.RELOC_CTRS.
+minItems: 1
+maxItems: 2
+
+  interrupts:
+maxItems: 1
+
+  msi-parent: true
+
+required:
+  - compatible
+  - reg
+
+additionalProperties: false
+
+examples:
+  - |+
+#include 
+#include 
+
+pmu@2b420000 {
+compatible = "arm,smmu-v3-pmcg";
+reg = <0 0x2b420000 0 0x1000>,
+  <0 0x2b430000 0 0x1000>;
+interrupts = ;
+msi-parent = < 0xff>;
+};
+
+pmu@2b440000 {
+compatible = "arm,smmu-v3-pmcg";
+reg = <0 0x2b440000 0 0x1000>,
+  <0 0x2b450000 0 0x1000>;
+interrupts = ;
+msi-parent = < 0xff>;
+};
-- 
2.33.1



[PATCH 0/2] perf/smmuv3: Support devicetree

2021-11-16 Thread Jean-Philippe Brucker
Add devicetree binding for the SMMUv3 PMU, called Performance Monitoring
Counter Group (PMCG) in the spec. Each SMMUv3 implementation can have
multiple independent PMCGs, for example one for the Translation Control
Unit (TCU) and one per Translation Buffer Unit (TBU).

I previously sent the binding as reply to Jay Chen's thread implementing
device tree support [1]. This posting addresses the comments from that
thread.

Patch 1 adds two compatible strings. "arm,smmu-v3-pmcg" is common to all
PMCGs. "hisilicon,smmu-v3-pmcg-hip08" allows to support the same quirk
as IORT for that implementation (see patch 2). We'll probably want to
also introduce compatible strings for each implementation that has
additional perf events. For example the MMU-600 implementation has
different events for TCU and TBU PMCGs [2], but both components have the
same device IDs. So the driver could differentiate them if they had two
distinct compatible strings such as "arm,mmu-600-pmcg-tbu" and
"arm,mmu-600-pmcg-tcu".

The series doesn't deal with this because for testing I use a software
model which only implements architected events. I do not include DTS
change for that platform because enabling PMCGs requires an additional
model option. See my branch smmu/pmu-dt [3] for details.

[1] https://lore.kernel.org/all/20200707150114.GC159413@myrica/
[2] 
https://developer.arm.com/documentation/100310/0202/Functional-description/Operation/Performance-Monitoring-Unit
[3] https://jpbrucker.net/git/linux/log/?h=smmu/pmu-dt

Jean-Philippe Brucker (2):
  dt-bindings: Add Arm SMMUv3 PMCG binding
  perf/smmuv3: Add devicetree support

 .../bindings/iommu/arm,smmu-v3-pmcg.yaml  | 67 +++
 drivers/perf/arm_smmuv3_pmu.c | 25 ++-
 2 files changed, 90 insertions(+), 2 deletions(-)
 create mode 100644 
Documentation/devicetree/bindings/iommu/arm,smmu-v3-pmcg.yaml

-- 
2.33.1



[PATCH 2/2] perf/smmuv3: Add devicetree support

2021-11-16 Thread Jean-Philippe Brucker
Add device-tree support to the SMMUv3 PMCG. One small cosmetic change
while refactoring the option mask printout: don't display it when zero;
it only contains one erratum at the moment.

Signed-off-by: Jay Chen 
Signed-off-by: Jean-Philippe Brucker 
---
 drivers/perf/arm_smmuv3_pmu.c | 25 +++--
 1 file changed, 23 insertions(+), 2 deletions(-)

diff --git a/drivers/perf/arm_smmuv3_pmu.c b/drivers/perf/arm_smmuv3_pmu.c
index 226348822ab3..958325ac103a 100644
--- a/drivers/perf/arm_smmuv3_pmu.c
+++ b/drivers/perf/arm_smmuv3_pmu.c
@@ -47,6 +47,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -750,8 +751,15 @@ static void smmu_pmu_get_acpi_options(struct smmu_pmu 
*smmu_pmu)
smmu_pmu->options |= SMMU_PMCG_EVCNTR_RDONLY;
break;
}
+}
+
+static void smmu_pmu_get_of_options(struct smmu_pmu *smmu_pmu)
+{
+   struct device_node *node = smmu_pmu->dev->of_node;
 
-   dev_notice(smmu_pmu->dev, "option mask 0x%x\n", smmu_pmu->options);
+   if (of_device_is_compatible(node, "hisilicon,smmu-v3-pmcg-hip08"))
+   /* HiSilicon Erratum 162001800 */
+   smmu_pmu->options |= SMMU_PMCG_EVCNTR_RDONLY;
 }
 
 static int smmu_pmu_probe(struct platform_device *pdev)
@@ -834,7 +842,13 @@ static int smmu_pmu_probe(struct platform_device *pdev)
return -EINVAL;
}
 
-   smmu_pmu_get_acpi_options(smmu_pmu);
+   if (dev->of_node)
+   smmu_pmu_get_of_options(smmu_pmu);
+   else
+   smmu_pmu_get_acpi_options(smmu_pmu);
+
+   if (smmu_pmu->options)
+   dev_notice(dev, "option mask 0x%x\n", smmu_pmu->options);
 
/* Pick one CPU to be the preferred one to use */
smmu_pmu->on_cpu = raw_smp_processor_id();
@@ -884,10 +898,17 @@ static void smmu_pmu_shutdown(struct platform_device 
*pdev)
smmu_pmu_disable(&smmu_pmu->pmu);
 }
 
+static const struct of_device_id arm_smmu_pmu_match[] = {
+   { .compatible = "arm,smmu-v3-pmcg" },
+   {},
+};
+MODULE_DEVICE_TABLE(of, arm_smmu_pmu_match);
+
 static struct platform_driver smmu_pmu_driver = {
.driver = {
.name = "arm-smmu-v3-pmcg",
.suppress_bind_attrs = true,
+   .of_match_table = of_match_ptr(arm_smmu_pmu_match),
},
.probe = smmu_pmu_probe,
.remove = smmu_pmu_remove,
-- 
2.33.1



Re: [PATCH v2 2/6] hwtracing: Add trace function support for HiSilicon PCIe Tune and Trace device

2021-11-16 Thread Yicong Yang via iommu
On 2021/11/16 18:56, Robin Murphy wrote:
> On 2021-11-16 09:06, Yicong Yang via iommu wrote:
> [...]
>> +/*
>> + * Get RMR address if provided by the firmware.
>> + * Return 0 if the IOMMU isn't present, the policy of the
>> + * IOMMU domain is passthrough, or we get a usable RMR region.
>> + * Otherwise a negative value is returned.
>> + */
>> +static int hisi_ptt_get_rmr(struct hisi_ptt *hisi_ptt)
>> +{
>> +    struct pci_dev *pdev = hisi_ptt->pdev;
>> +    struct iommu_domain *iommu_domain;
>> +    struct iommu_resv_region *region;
>> +    LIST_HEAD(list);
>> +
>> +    /*
>> + * Use direct DMA if the IOMMU is not present or the policy of the
>> + * IOMMU domain is passthrough.
>> + */
>> +    iommu_domain = iommu_get_domain_for_dev(&pdev->dev);
>> +    if (!iommu_domain || iommu_domain->type == IOMMU_DOMAIN_IDENTITY)
>> +    return 0;
>> +
>> +    iommu_get_resv_regions(&pdev->dev, &list);
>> +    list_for_each_entry(region, &list, list)
>> +    if (region->type == IOMMU_RESV_DIRECT &&
>> +    region->length >= HISI_PTT_TRACE_BUFFER_SIZE) {
>> +    hisi_ptt->trace_ctrl.has_rmr = true;
>> +    hisi_ptt->trace_ctrl.rmr_addr = region->start;
>> +    hisi_ptt->trace_ctrl.rmr_length = region->length;
>> +    break;
>> +    }
>> +
>> +    iommu_put_resv_regions(&pdev->dev, &list);
>> +    return hisi_ptt->trace_ctrl.has_rmr ? 0 : -ENOMEM;
>> +}
> 
> No.
> 
> The whole point of RMRs is for devices that are already configured to access 
> the given address range in a manner beyond the kernel's control. If you can 
> do this, it proves that you should not have an RMR in the first place.
> 
> The notion of a kernel driver explicitly configuring its device to DMA into 
> any random RMR that looks big enough is so egregiously wrong that I'm almost 
> lost for words...
> 

Our BIOS will reserve such a region and report it through IORT. The device
will write to the region, and in the driver we need to access the region
to get the traced data. The region is reserved exclusively and will not be
accessed by the kernel or other devices.

Is it OK to let the BIOS configure the address in the device, so that from
the CPU side we just read it?

Thanks,
Yicong



Re: dma-direct fixes and cleanups v3

2021-11-16 Thread Robin Murphy

On 2021-11-11 06:50, Christoph Hellwig wrote:

Hi all,

Linus complained about the complex flow in dma_direct_alloc, so this
tries to simplify it a bit, and while I was at it I also made sure that
unencrypted pages never leak back into the page allocator.


Before I forget, I've had a quick skim of the remaining patches and 
nothing more stands out. Let me know if you'd like me to find time to 
check everything over in detail again for a proper review, but otherwise 
I reckon we may as well get this baking in -next sooner rather than later.


Cheers,
Robin.


Changes since v2:
  - don't call dma_set_decrypted on remapped memory
  - move the leak printk into dma_set_encrypted
  - add another local variable to clean up dma_direct_alloc
  - return NULL when the is no way to make the memory coherent

Changes since v1:
  - fix a missing return
  - add a new patch to fix a pre-existing missing unmap
  - various additional cleanups
  
Diffstat:

  direct.c |  234 
+--
  1 file changed, 138 insertions(+), 96 deletions(-)




Re: [PATCH v2 2/6] hwtracing: Add trace function support for HiSilicon PCIe Tune and Trace device

2021-11-16 Thread Robin Murphy

On 2021-11-16 09:06, Yicong Yang via iommu wrote:
[...]

+/*
+ * Get RMR address if provided by the firmware.
+ * Return 0 if the IOMMU isn't present, the policy of the
+ * IOMMU domain is passthrough, or we get a usable RMR region.
+ * Otherwise a negative value is returned.
+ */
+static int hisi_ptt_get_rmr(struct hisi_ptt *hisi_ptt)
+{
+   struct pci_dev *pdev = hisi_ptt->pdev;
+   struct iommu_domain *iommu_domain;
+   struct iommu_resv_region *region;
+   LIST_HEAD(list);
+
+   /*
+* Use direct DMA if the IOMMU is not present or the policy of the
+* IOMMU domain is passthrough.
+*/
+   iommu_domain = iommu_get_domain_for_dev(&pdev->dev);
+   if (!iommu_domain || iommu_domain->type == IOMMU_DOMAIN_IDENTITY)
+   return 0;
+
+   iommu_get_resv_regions(&pdev->dev, &list);
+   list_for_each_entry(region, &list, list)
+   if (region->type == IOMMU_RESV_DIRECT &&
+   region->length >= HISI_PTT_TRACE_BUFFER_SIZE) {
+   hisi_ptt->trace_ctrl.has_rmr = true;
+   hisi_ptt->trace_ctrl.rmr_addr = region->start;
+   hisi_ptt->trace_ctrl.rmr_length = region->length;
+   break;
+   }
+
+   iommu_put_resv_regions(&pdev->dev, &list);
+   return hisi_ptt->trace_ctrl.has_rmr ? 0 : -ENOMEM;
+}


No.

The whole point of RMRs is for devices that are already configured to 
access the given address range in a manner beyond the kernel's control. 
If you can do this, it proves that you should not have an RMR in the 
first place.


The notion of a kernel driver explicitly configuring its device to DMA 
into any random RMR that looks big enough is so egregiously wrong that 
I'm almost lost for words...


Robin.


Re: [PATCH 06/11] iommu: Expose group variants of dma ownership interfaces

2021-11-16 Thread Lu Baolu

Hi Christoph,

On 2021/11/15 21:27, Christoph Hellwig wrote:

On Mon, Nov 15, 2021 at 10:05:47AM +0800, Lu Baolu wrote:

The vfio needs to set DMA_OWNER_USER for the entire group when attaching


The vfio subsystem?  driver?


"vfio subsystem"




it to a vfio container. So expose group variants of setting/releasing dma
ownership for this purpose.

This also exposes the helper iommu_group_dma_owner_unclaimed() for vfio
report to userspace if the group is viable to user assignment, for


.. for vfio to report .. ?


Yes.




  void iommu_device_release_dma_owner(struct device *dev, enum iommu_dma_owner 
owner);
+int iommu_group_set_dma_owner(struct iommu_group *group, enum iommu_dma_owner 
owner,
+ struct file *user_file);
+void iommu_group_release_dma_owner(struct iommu_group *group, enum 
iommu_dma_owner owner);


Pleae avoid all these overly long lines.


Sure. Thanks!




+static inline int iommu_group_set_dma_owner(struct iommu_group *group,
+   enum iommu_dma_owner owner,
+   struct file *user_file)
+{
+   return -EINVAL;
+}
+
+static inline void iommu_group_release_dma_owner(struct iommu_group *group,
+enum iommu_dma_owner owner)
+{
+}
+
+static inline bool iommu_group_dma_owner_unclaimed(struct iommu_group *group)
+{
+   return false;
+}


Why do we need these stubs?  All potential callers should already
require CONFIG_IOMMU_API?  Same for the helpers added in patch 1, btw.


You are right. This helper is only for vfio which requires IOMMU_API. I
will remove this.

The helpers in patch 1 seem to be a different case. The driver core (or
the bus dma_configure() callback, as suggested) will also call them.




+   mutex_lock(&group->mutex);
+   ret = __iommu_group_set_dma_owner(group, owner, user_file);
+   mutex_unlock(&group->mutex);



+   mutex_lock(&group->mutex);
+   __iommu_group_release_dma_owner(group, owner);
+   mutex_unlock(&group->mutex);


Unless I'm missing something (just skipping over the patches),
the existing callers also take the lock just around these calls,
so we don't really need the __-prefixed lowlevel helpers.



Moving mutex_lock/unlock into the wrappers makes the helper implementation
easier. :-) It seems to be a common code style in the iommu core. For
example, __iommu_attach_group(), __iommu_group_for_each_dev(), etc.


+   mutex_lock(&group->mutex);
+   owner = group->dma_owner;
+   mutex_unlock(&group->mutex);


No need for a lock to read a single scalar.


Adding the lock will make KCSAN happy. Jason G also told me that

[citing from his review comment]
"
It is always incorrect to read concurrent data without an annotation
of some kind.

For instance it can cause mis-execution of logic where the compiler is
unaware that a value it loads is allowed to change - ie no 
READ_ONCE/WRITE_ONCE semantic.

"




+
+   return owner == DMA_OWNER_NONE;
+}
+EXPORT_SYMBOL_GPL(iommu_group_dma_owner_unclaimed);


Best regards,
baolu


[PATCH v2 4/6] perf tool: Add support for HiSilicon PCIe Tune and Trace device driver

2021-11-16 Thread Yicong Yang via iommu
From: Qi Liu 

'perf record' and 'perf report --dump-raw-trace' are supported in this
patch.

Example usage:

Output will contain raw PTT data and its textual representation, such
as:

0 0 0x5810 [0x30]: PERF_RECORD_AUXTRACE size: 0x40  offset: 0
ref: 0xa5d50c725  idx: 0  tid: -1  cpu: 0
.
. ... HISI PTT data: size 4194304 bytes
.  : 00 00 00 00 Prefix
.  0004: 08 20 00 60 Header DW0
.  0008: ff 02 00 01 Header DW1
.  000c: 20 08 00 00 Header DW2
.  0010: 10 e7 44 ab Header DW3
.  0014: 2a a8 1e 01 Time
.  0020: 00 00 00 00 Prefix
.  0024: 01 00 00 60 Header DW0
.  0028: 0f 1e 00 01 Header DW1
.  002c: 04 00 00 00 Header DW2
.  0030: 40 00 81 02 Header DW3
.  0034: ee 02 00 00 Time


Signed-off-by: Qi Liu 
Signed-off-by: Yicong Yang 
---
 tools/perf/arch/arm/util/auxtrace.c   |  56 -
 tools/perf/arch/arm/util/pmu.c|   3 +
 tools/perf/arch/arm64/util/Build  |   2 +-
 tools/perf/arch/arm64/util/hisi_ptt.c | 195 +++
 tools/perf/util/Build |   2 +
 tools/perf/util/auxtrace.c|   4 +
 tools/perf/util/auxtrace.h|   1 +
 tools/perf/util/hisi-ptt-decoder/Build|   1 +
 .../hisi-ptt-decoder/hisi-ptt-pkt-decoder.c   | 170 +
 .../hisi-ptt-decoder/hisi-ptt-pkt-decoder.h   |  28 +++
 tools/perf/util/hisi_ptt.c| 228 ++
 tools/perf/util/hisi_ptt.h|  28 +++
 12 files changed, 714 insertions(+), 4 deletions(-)
 create mode 100644 tools/perf/arch/arm64/util/hisi_ptt.c
 create mode 100644 tools/perf/util/hisi-ptt-decoder/Build
 create mode 100644 tools/perf/util/hisi-ptt-decoder/hisi-ptt-pkt-decoder.c
 create mode 100644 tools/perf/util/hisi-ptt-decoder/hisi-ptt-pkt-decoder.h
 create mode 100644 tools/perf/util/hisi_ptt.c
 create mode 100644 tools/perf/util/hisi_ptt.h

diff --git a/tools/perf/arch/arm/util/auxtrace.c 
b/tools/perf/arch/arm/util/auxtrace.c
index 5fc6a2a3dbc5..dc9d2172464e 100644
--- a/tools/perf/arch/arm/util/auxtrace.c
+++ b/tools/perf/arch/arm/util/auxtrace.c
@@ -4,6 +4,7 @@
  * Author: Mathieu Poirier 
  */
 
+#include 
 #include 
 #include 
 #include 
@@ -14,6 +15,7 @@
 #include "../../../util/pmu.h"
 #include "cs-etm.h"
 #include "arm-spe.h"
+#include "hisi_ptt.h"
 
 static struct perf_pmu **find_all_arm_spe_pmus(int *nr_spes, int *err)
 {
@@ -50,6 +52,39 @@ static struct perf_pmu **find_all_arm_spe_pmus(int *nr_spes, 
int *err)
return arm_spe_pmus;
 }
 
+static struct perf_pmu **find_all_hisi_ptt_pmus(int *nr_ptts, int *err)
+{
+   struct perf_pmu **hisi_ptt_pmus = NULL;
+   int nr_cpus = sysconf(_SC_NPROCESSORS_CONF);
+   DIR *dir = NULL;
+   struct dirent *dent;
+
+   hisi_ptt_pmus = zalloc(sizeof(struct perf_pmu *) * nr_cpus);
+   if (!hisi_ptt_pmus) {
+   pr_err("hisi_ptt alloc failed\n");
+   *err = -ENOMEM;
+   return NULL;
+   }
+
+   dir = opendir("/sys/devices");
+   if (!dir)    /* avoid readdir(NULL) if sysfs is unavailable */
+   return hisi_ptt_pmus;
+
+   dent = readdir(dir);
+   while (dent) {
+   if (strstr(dent->d_name, HISI_PTT_PMU_NAME)) {
+   hisi_ptt_pmus[*nr_ptts] = perf_pmu__find(dent->d_name);
+   if (hisi_ptt_pmus[*nr_ptts]) {
+   pr_debug2("%s %d: hisi_ptt pmu %d type %d name %s\n",
+ __func__, __LINE__, *nr_ptts,
+ hisi_ptt_pmus[*nr_ptts]->type,
+ hisi_ptt_pmus[*nr_ptts]->name);
+   (*nr_ptts)++;
+   }
+   }
+   dent = readdir(dir);
+   }
+   closedir(dir);
+
+   return hisi_ptt_pmus;
+}
+
 struct auxtrace_record
 *auxtrace_record__init(struct evlist *evlist, int *err)
 {
@@ -57,8 +92,12 @@ struct auxtrace_record
struct evsel *evsel;
bool found_etm = false;
struct perf_pmu *found_spe = NULL;
+   struct perf_pmu *found_ptt = NULL;
struct perf_pmu **arm_spe_pmus = NULL;
+   struct perf_pmu **hisi_ptt_pmus = NULL;
+
int nr_spes = 0;
+   int nr_ptts = 0;
int i = 0;
 
if (!evlist)
@@ -66,13 +105,14 @@ struct auxtrace_record
 
cs_etm_pmu = perf_pmu__find(CORESIGHT_ETM_PMU_NAME);
arm_spe_pmus = find_all_arm_spe_pmus(&nr_spes, err);
+   hisi_ptt_pmus = find_all_hisi_ptt_pmus(&nr_ptts, err);
 
evlist__for_each_entry(evlist, evsel) {
if (cs_etm_pmu &&
evsel->core.attr.type == 

[PATCH v2 2/6] hwtracing: Add trace function support for HiSilicon PCIe Tune and Trace device

2021-11-16 Thread Yicong Yang via iommu
HiSilicon PCIe tune and trace device (PTT) is a PCIe Root Complex
integrated Endpoint (RCiEP) device, providing the capability
to dynamically monitor and tune the PCIe traffic (tune),
and to trace the TLP headers (trace).

Add the driver for the device to enable the trace function. The driver
will create a PMU device for each PTT device, and users can start tracing
through the perf command.

Signed-off-by: Yicong Yang 
---
 drivers/Makefile   |1 +
 drivers/hwtracing/Kconfig  |2 +
 drivers/hwtracing/hisilicon/Kconfig|8 +
 drivers/hwtracing/hisilicon/Makefile   |2 +
 drivers/hwtracing/hisilicon/hisi_ptt.c | 1146 
 5 files changed, 1159 insertions(+)
 create mode 100644 drivers/hwtracing/hisilicon/Kconfig
 create mode 100644 drivers/hwtracing/hisilicon/Makefile
 create mode 100644 drivers/hwtracing/hisilicon/hisi_ptt.c

diff --git a/drivers/Makefile b/drivers/Makefile
index be5d40ae1488..bb0dc9b55ea2 100644
--- a/drivers/Makefile
+++ b/drivers/Makefile
@@ -176,6 +176,7 @@ obj-$(CONFIG_USB4)  += thunderbolt/
 obj-$(CONFIG_CORESIGHT)+= hwtracing/coresight/
 obj-y  += hwtracing/intel_th/
 obj-$(CONFIG_STM)  += hwtracing/stm/
+obj-$(CONFIG_HISI_PTT) += hwtracing/hisilicon/
 obj-$(CONFIG_ANDROID)  += android/
 obj-$(CONFIG_NVMEM)+= nvmem/
 obj-$(CONFIG_FPGA) += fpga/
diff --git a/drivers/hwtracing/Kconfig b/drivers/hwtracing/Kconfig
index 13085835a636..e3796b17541a 100644
--- a/drivers/hwtracing/Kconfig
+++ b/drivers/hwtracing/Kconfig
@@ -5,4 +5,6 @@ source "drivers/hwtracing/stm/Kconfig"
 
 source "drivers/hwtracing/intel_th/Kconfig"
 
+source "drivers/hwtracing/hisilicon/Kconfig"
+
 endmenu
diff --git a/drivers/hwtracing/hisilicon/Kconfig 
b/drivers/hwtracing/hisilicon/Kconfig
new file mode 100644
index ..9c3ab80d99fe
--- /dev/null
+++ b/drivers/hwtracing/hisilicon/Kconfig
@@ -0,0 +1,8 @@
+# SPDX-License-Identifier: GPL-2.0-only
+config HISI_PTT
+   tristate "HiSilicon PCIe Tune and Trace Device"
+   depends on PCI && HAS_DMA && HAS_IOMEM
+   help
+ HiSilicon PCIe Tune and Trace Device exists as a PCIe RCiEP
+ device, providing support for PCIe traffic tuning and for
+ tracing TLP headers to memory.
diff --git a/drivers/hwtracing/hisilicon/Makefile 
b/drivers/hwtracing/hisilicon/Makefile
new file mode 100644
index ..908c09a98161
--- /dev/null
+++ b/drivers/hwtracing/hisilicon/Makefile
@@ -0,0 +1,2 @@
+# SPDX-License-Identifier: GPL-2.0
+obj-$(CONFIG_HISI_PTT) += hisi_ptt.o
diff --git a/drivers/hwtracing/hisilicon/hisi_ptt.c 
b/drivers/hwtracing/hisilicon/hisi_ptt.c
new file mode 100644
index ..e11e9b6cc2a8
--- /dev/null
+++ b/drivers/hwtracing/hisilicon/hisi_ptt.c
@@ -0,0 +1,1146 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Driver for HiSilicon PCIe tune and trace device
+ *
+ * Copyright (c) 2021 HiSilicon Limited.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define HISI_PTT_TRACE_ADDR_SIZE   0x0800
+#define HISI_PTT_TRACE_ADDR_BASE_LO_0  0x0810
+#define HISI_PTT_TRACE_ADDR_BASE_HI_0  0x0814
+#define HISI_PTT_TRACE_ADDR_STRIDE 0x8
+#define HISI_PTT_TRACE_CTRL0x0850
+#define   HISI_PTT_TRACE_CTRL_EN   BIT(0)
+#define   HISI_PTT_TRACE_CTRL_RST  BIT(1)
+#define   HISI_PTT_TRACE_CTRL_RXTX_SEL GENMASK(3, 2)
+#define   HISI_PTT_TRACE_CTRL_TYPE_SEL GENMASK(7, 4)
+#define   HISI_PTT_TRACE_CTRL_DATA_FORMAT  BIT(14)
+#define   HISI_PTT_TRACE_CTRL_FILTER_MODE  BIT(15)
+#define   HISI_PTT_TRACE_CTRL_TARGET_SEL   GENMASK(31, 16)
+#define HISI_PTT_TRACE_INT_STAT0x0890
+#define   HISI_PTT_TRACE_INT_STAT_MASK GENMASK(3, 0)
+#define HISI_PTT_TRACE_INT_MASK0x0894
+#define HISI_PTT_TRACE_WR_STS  0x08a0
+#define   HISI_PTT_TRACE_WR_STS_WRITE  GENMASK(27, 0)
+#define   HISI_PTT_TRACE_WR_STS_BUFFER GENMASK(29, 28)
+#define HISI_PTT_TRACE_STS 0x08b0
+#define   HISI_PTT_TRACE_IDLE  BIT(0)
+#define HISI_PTT_DEVICE_RANGE  0x0fe0
+#define HISI_PTT_LOCATION  0x0fe8
+#define   HISI_PTT_CORE_ID GENMASK(15, 0)
+#define   HISI_PTT_SICL_ID GENMASK(31, 16)
+
+#define HISI_PTT_TRACE_DMA_IRQ 0
+#define HISI_PTT_TRACE_BUFLETS_CNT 4
+#define HISI_PTT_TRACE_BUFLET_SIZE SZ_4M
+#define HISI_PTT_TRACE_BUFFER_SIZE (HISI_PTT_TRACE_BUFLET_SIZE * \
+HISI_PTT_TRACE_BUFLETS_CNT)
+#define HISI_PTT_FILTER_UPDATE_FIFO_SIZE   16
+
+/* Delay time for filter updating work */
+#define HISI_PTT_WORK_DELAY_MS 100UL
+/* Wait time for DMA hardware to reset */
+#define HISI_PTT_RESET_WAIT_MS 1000UL
+/* Poll timeout and 

[PATCH v2 5/6] docs: Add HiSilicon PTT device driver documentation

2021-11-16 Thread Yicong Yang via iommu
Document the introduction and usage of HiSilicon PTT device driver.

Signed-off-by: Yicong Yang 
---
 Documentation/trace/hisi-ptt.rst | 305 +++
 1 file changed, 305 insertions(+)
 create mode 100644 Documentation/trace/hisi-ptt.rst

diff --git a/Documentation/trace/hisi-ptt.rst b/Documentation/trace/hisi-ptt.rst
new file mode 100644
index ..a7ef6438e4aa
--- /dev/null
+++ b/Documentation/trace/hisi-ptt.rst
@@ -0,0 +1,305 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+==
+HiSilicon PCIe Tune and Trace device
+==
+
+Introduction
+
+
+HiSilicon PCIe tune and trace device (PTT) is a PCIe Root Complex
+integrated Endpoint (RCiEP) device, providing the capability
+to dynamically monitor and tune the PCIe link's events (tune),
+and trace the TLP headers (trace). The two functions are independent,
+but it is recommended to use them together to analyze and enhance the
+PCIe link's performance.
+
+On Kunpeng 930 SoC, the PCIe Root Complex is composed of several
+PCIe cores. Each PCIe core includes several Root Ports and a PTT
+RCiEP, like below. The PTT device is capable of tuning and
+tracing the link of the PCIe core.
+::
+  +--Core 0---+
+  |   |   [   PTT   ] |
+  |   |   [Root Port]---[Endpoint]
+  |   |   [Root Port]---[Endpoint]
+  |   |   [Root Port]---[Endpoint]
+Root Complex  |--Core 1---+
+  |   |   [   PTT   ] |
+  |   |   [Root Port]---[ Switch ]---[Endpoint]
+  |   |   [Root Port]---[Endpoint] `-[Endpoint]
+  |   |   [Root Port]---[Endpoint]
+  +---+
+
+The PTT device driver registers a PMU device for each PTT device.
+The name of each PTT device is composed of the 'hisi_ptt' prefix with
+the IDs of the SICL and the Core where it is located. The Kunpeng 930
+SoC encapsulates multiple CPU dies (SCCL, Super CPU Cluster) and
+IO dies (SICL, Super I/O Cluster), where there's one PCIe Root
+Complex for each SICL.
+::
+/sys/devices/hisi_ptt<sicl_id>_<core_id>
+
+Tune
+
+
+PTT tune is designed for monitoring and adjusting PCIe link parameters 
(events).
+Currently we support events in 4 classes. The scope of the events
+covers the PCIe core to which the PTT device belongs.
+
+Each event is presented as a file under $(PTT PMU dir)/tune, and
+mostly a simple open/read/write/close cycle will be used to tune
+the event.
+::
+$ cd /sys/devices/hisi_ptt<sicl_id>_<core_id>/tune
+$ ls
+qos_tx_cpl    qos_tx_np    qos_tx_p
+tx_path_rx_req_alloc_buf_level
+tx_path_tx_req_alloc_buf_level
+$ cat qos_tx_p
+1
+$ echo 2 > qos_tx_p
+$ cat qos_tx_p
+2
+
+The current (numerical) value of the event can simply be read from
+the file, and the desired value written to the file to tune it.
+
+1. Tx path QoS control
+
+
+The following files are provided to tune the QoS of the tx path of
+the PCIe core.
+
+- qos_tx_cpl: weight of Tx completion TLPs
+- qos_tx_np: weight of Tx non-posted TLPs
+- qos_tx_p: weight of Tx posted TLPs
+
+The weight influences the proportion of certain packets on the PCIe link.
+For example, for the storage scenario, increase the proportion
+of the completion packets on the link to enhance the performance as
+more completions are consumed.
+
+The available tune data of these events is [0, 1, 2].
+Writing a negative value will return an error, and out of range
+values will be converted to 2. Note that the event value just
+indicates a probable level, but is not precise.
+
+2. Tx path buffer control
+-
+
+The following files are provided to tune the buffering of the Tx path of
+the PCIe core.
+
+- tx_path_rx_req_alloc_buf_level: watermark of the buffer allocated for Rx requests
+- tx_path_tx_req_alloc_buf_level: watermark of the buffer allocated for Tx requests
+
+These events influence the watermark of the buffer allocated for each
+type. Rx means inbound while Tx means outbound. Packets are stored in
+the buffer first and then posted, either when the watermark is reached
+or when a timeout occurs. For a busy direction, you should increase the
+related buffer watermark to avoid frequent posting and thus enhance
+performance. In most cases just keep the default value.
+
+The available tune data of above events is [0, 1, 2].
+Writing a negative value will return an error, and out of range
+values will be converted to 2. Note that the event value just
+indicates a probable level, but is not precise.
+
+Trace
+=
+
+PTT trace is designed for dumping TLP headers to memory, which can be
+used to analyze the transactions and usage of the PCIe link. You can
+choose to filter the traced headers by requester ID, or trace those
+downstream of a set of Root Ports on the same core as the PTT device.
+Tracing only headers of a certain type or direction is also supported.
+
+You can 

[PATCH v2 3/6] hwtracing: Add tune function support for HiSilicon PCIe Tune and Trace device

2021-11-16 Thread Yicong Yang via iommu
Add tune function for the HiSilicon Tune and Trace device. The interface
of tune is exposed through sysfs attributes of PTT PMU device.

Signed-off-by: Yicong Yang 
---
 drivers/hwtracing/hisilicon/hisi_ptt.c | 167 +
 1 file changed, 167 insertions(+)

diff --git a/drivers/hwtracing/hisilicon/hisi_ptt.c 
b/drivers/hwtracing/hisilicon/hisi_ptt.c
index e11e9b6cc2a8..a1e1fb766376 100644
--- a/drivers/hwtracing/hisilicon/hisi_ptt.c
+++ b/drivers/hwtracing/hisilicon/hisi_ptt.c
@@ -24,6 +24,11 @@
 #include 
 #include 
 
+#define HISI_PTT_TUNING_CTRL   0x
+#define   HISI_PTT_TUNING_CTRL_CODEGENMASK(15, 0)
+#define   HISI_PTT_TUNING_CTRL_SUB GENMASK(23, 16)
+#define HISI_PTT_TUNING_DATA   0x0004
+#define   HISI_PTT_TUNING_DATA_VAL_MASKGENMASK(15, 0)
 #define HISI_PTT_TRACE_ADDR_SIZE   0x0800
 #define HISI_PTT_TRACE_ADDR_BASE_LO_0  0x0810
 #define HISI_PTT_TRACE_ADDR_BASE_HI_0  0x0814
@@ -39,6 +44,8 @@
 #define HISI_PTT_TRACE_INT_STAT0x0890
 #define   HISI_PTT_TRACE_INT_STAT_MASK GENMASK(3, 0)
 #define HISI_PTT_TRACE_INT_MASK0x0894
+#define HISI_PTT_TUNING_INT_STAT   0x0898
+#define   HISI_PTT_TUNING_INT_STAT_MASKBIT(0)
 #define HISI_PTT_TRACE_WR_STS  0x08a0
 #define   HISI_PTT_TRACE_WR_STS_WRITE  GENMASK(27, 0)
 #define   HISI_PTT_TRACE_WR_STS_BUFFER GENMASK(29, 28)
@@ -71,6 +78,12 @@ enum hisi_ptt_trace_status {
HISI_PTT_TRACE_STATUS_ON,
 };
 
+struct hisi_ptt_tune_desc {
+   struct hisi_ptt *hisi_ptt;
+   const char *name;
+   u32 event_code;
+};
+
 struct hisi_ptt_dma_buflet {
struct list_head list;
unsigned int size;
@@ -177,6 +190,159 @@ static inline struct hisi_ptt *to_hisi_ptt(struct pmu 
*pmu)
return container_of(pmu, struct hisi_ptt, hisi_ptt_pmu);
 }
 
+static int hisi_ptt_wait_tuning_finish(struct hisi_ptt *hisi_ptt)
+{
+   u32 val;
+
+   return readl_poll_timeout(hisi_ptt->iobase + HISI_PTT_TUNING_INT_STAT,
+ val, !(val & HISI_PTT_TUNING_INT_STAT_MASK),
+ HISI_PTT_WAIT_POLL_INTERVAL_US,
+ HISI_PTT_WAIT_TIMEOUT_US);
+}
+
+static int hisi_ptt_tune_data_get(struct hisi_ptt *hisi_ptt,
+ u32 event, u16 *data)
+{
+   u32 reg;
+
+   reg = readl(hisi_ptt->iobase + HISI_PTT_TUNING_CTRL);
+   reg &= ~(HISI_PTT_TUNING_CTRL_CODE | HISI_PTT_TUNING_CTRL_SUB);
+   reg |= FIELD_PREP(HISI_PTT_TUNING_CTRL_CODE | HISI_PTT_TUNING_CTRL_SUB,
+ event);
+   writel(reg, hisi_ptt->iobase + HISI_PTT_TUNING_CTRL);
+
+   /* Write all 1s to indicate a read operation */
+   writel(~0UL, hisi_ptt->iobase + HISI_PTT_TUNING_DATA);
+
+   if (hisi_ptt_wait_tuning_finish(hisi_ptt))
+   return -ETIMEDOUT;
+
+   reg = readl(hisi_ptt->iobase + HISI_PTT_TUNING_DATA);
+   reg &= HISI_PTT_TUNING_DATA_VAL_MASK;
+   *data = (u16)reg;
+
+   return 0;
+}
+
+static int hisi_ptt_tune_data_set(struct hisi_ptt *hisi_ptt,
+ u32 event, u16 data)
+{
+   u32 reg;
+
+   reg = readl(hisi_ptt->iobase + HISI_PTT_TUNING_CTRL);
+   reg &= ~(HISI_PTT_TUNING_CTRL_CODE | HISI_PTT_TUNING_CTRL_SUB);
+   reg |= FIELD_PREP(HISI_PTT_TUNING_CTRL_CODE | HISI_PTT_TUNING_CTRL_SUB,
+ event);
+   writel(reg, hisi_ptt->iobase + HISI_PTT_TUNING_CTRL);
+
+   reg = data;
+   writel(reg, hisi_ptt->iobase + HISI_PTT_TUNING_DATA);
+
+   if (hisi_ptt_wait_tuning_finish(hisi_ptt))
+   return -ETIMEDOUT;
+
+   return 0;
+}
+
+static ssize_t hisi_ptt_tune_attr_show(struct device *dev,
+  struct device_attribute *attr,
+  char *buf)
+{
+   struct hisi_ptt *hisi_ptt = to_hisi_ptt(dev_get_drvdata(dev));
+   struct dev_ext_attribute *ext_attr;
+   struct hisi_ptt_tune_desc *desc;
+   int ret;
+   u16 val;
+
+   ext_attr = container_of(attr, struct dev_ext_attribute, attr);
+   desc = ext_attr->var;
+
+   if (!mutex_trylock(&hisi_ptt->mutex))
+   return -EBUSY;
+
+   ret = hisi_ptt_tune_data_get(hisi_ptt, desc->event_code, &val);
+
+   mutex_unlock(&hisi_ptt->mutex);
+   return ret ? ret : sysfs_emit(buf, "%u\n", val);
+}
+
+static ssize_t hisi_ptt_tune_attr_store(struct device *dev,
+   struct device_attribute *attr,
+   const char *buf, size_t count)
+{
+   struct hisi_ptt *hisi_ptt = to_hisi_ptt(dev_get_drvdata(dev));
+   struct dev_ext_attribute *ext_attr;
+   struct hisi_ptt_tune_desc *desc;
+   int ret;
+   u16 val;
+
+   ext_attr = container_of(attr, struct dev_ext_attribute, attr);
+   desc = ext_attr->var;
+
+   if (kstrtou16(buf, 10, ))
+   return -EINVAL;
+
+   if 

[PATCH v2 6/6] MAINTAINERS: Add maintainer for HiSilicon PTT driver

2021-11-16 Thread Yicong Yang via iommu
Add maintainer for driver and documentation of HiSilicon PTT device.

Signed-off-by: Yicong Yang 
---
 MAINTAINERS | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 7a2345ce8521..823d495ca0d5 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -8601,6 +8601,13 @@ W:   http://www.hisilicon.com
 F: Documentation/admin-guide/perf/hisi-pmu.rst
 F: drivers/perf/hisilicon
 
+HISILICON PTT DRIVER
+M: Yicong Yang 
+L: linux-ker...@vger.kernel.org
+S: Maintained
+F: Documentation/trace/hisi-ptt.rst
+F: drivers/hwtracing/hisilicon/
+
 HISILICON QM AND ZIP Controller DRIVER
 M: Zhou Wang 
 L: linux-cry...@vger.kernel.org
-- 
2.33.0



[PATCH v2 0/6] Add support for HiSilicon PCIe Tune and Trace device

2021-11-16 Thread Yicong Yang via iommu
HiSilicon PCIe tune and trace device (PTT) is a PCIe Root Complex
integrated Endpoint (RCiEP) device, providing the capability
to dynamically monitor and tune the PCIe traffic (tune),
and trace the TLP headers (trace).

PTT tune is designed for monitoring and adjusting PCIe link parameters.
We provide several parameters of the PCIe link. Through the driver,
users can adjust the value of a certain parameter to affect the PCIe
link, for the purpose of enhancing performance in certain situations.

PTT trace is designed for dumping TLP headers to memory, which can be
used to analyze the transactions and usage of the PCIe link. Users can
choose filters to trace headers, by either requester ID, or by a set of
Root Ports on the same core as the PTT device. Tracing only headers of
a certain type or direction is also supported.

The driver registers a PMU device for each PTT device. The trace can
be used through `perf record` and the traced headers can be decoded
by `perf report`. The perf command support for the device is also
added in this patchset. The tune can be used through the sysfs
attributes of the related PMU device. See the documentation for the
detailed usage.
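
For example, once the PMU devices are registered (the instance name is
illustrative; filter/config options are described in the documentation
patch and omitted here):

  $ ls /sys/devices/ | grep hisi_ptt
  hisi_ptt<sicl_id>_<core_id>
  $ perf record -e 'hisi_ptt<sicl_id>_<core_id>//' -- sleep 5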

Changes since v1:
- switch the user interface of trace to perf from debugfs
- switch the user interface of tune to sysfs from debugfs
- add perf tool support to start trace and decode the trace data
- address the comments of documentation from Bjorn
- add RMR[1] support of the device as trace works in RMR mode or
  direct DMA mode. RMR support is achieved by common APIs rather
  than the APIs implemented in [1].
Link: 
https://lore.kernel.org/lkml/1618654631-42454-1-git-send-email-yangyic...@hisilicon.com/

[1] 
https://lore.kernel.org/linux-acpi/20210805080724.480-1-shameerali.kolothum.th...@huawei.com/

Qi Liu (1):
  perf tool: Add support for HiSilicon PCIe Tune and Trace device driver

Yicong Yang (5):
  iommu: Export iommu_{get,put}_resv_regions()
  hwtracing: Add trace function support for HiSilicon PCIe Tune and
Trace device
  hwtracing: Add tune function support for HiSilicon PCIe Tune and Trace
device
  docs: Add HiSilicon PTT device driver documentation
  MAINTAINERS: Add maintainer for HiSilicon PTT driver

 Documentation/trace/hisi-ptt.rst  |  305 
 MAINTAINERS   |7 +
 drivers/Makefile  |1 +
 drivers/hwtracing/Kconfig |2 +
 drivers/hwtracing/hisilicon/Kconfig   |8 +
 drivers/hwtracing/hisilicon/Makefile  |2 +
 drivers/hwtracing/hisilicon/hisi_ptt.c| 1313 +
 drivers/iommu/iommu.c |2 +
 include/linux/iommu.h |4 +-
 tools/perf/arch/arm/util/auxtrace.c   |   56 +-
 tools/perf/arch/arm/util/pmu.c|3 +
 tools/perf/arch/arm64/util/Build  |2 +-
 tools/perf/arch/arm64/util/hisi_ptt.c |  195 +++
 tools/perf/util/Build |2 +
 tools/perf/util/auxtrace.c|4 +
 tools/perf/util/auxtrace.h|1 +
 tools/perf/util/hisi-ptt-decoder/Build|1 +
 .../hisi-ptt-decoder/hisi-ptt-pkt-decoder.c   |  170 +++
 .../hisi-ptt-decoder/hisi-ptt-pkt-decoder.h   |   28 +
 tools/perf/util/hisi_ptt.c|  228 +++
 tools/perf/util/hisi_ptt.h|   28 +
 21 files changed, 2356 insertions(+), 6 deletions(-)
 create mode 100644 Documentation/trace/hisi-ptt.rst
 create mode 100644 drivers/hwtracing/hisilicon/Kconfig
 create mode 100644 drivers/hwtracing/hisilicon/Makefile
 create mode 100644 drivers/hwtracing/hisilicon/hisi_ptt.c
 create mode 100644 tools/perf/arch/arm64/util/hisi_ptt.c
 create mode 100644 tools/perf/util/hisi-ptt-decoder/Build
 create mode 100644 tools/perf/util/hisi-ptt-decoder/hisi-ptt-pkt-decoder.c
 create mode 100644 tools/perf/util/hisi-ptt-decoder/hisi-ptt-pkt-decoder.h
 create mode 100644 tools/perf/util/hisi_ptt.c
 create mode 100644 tools/perf/util/hisi_ptt.h

-- 
2.33.0



[PATCH v2 1/6] iommu: Export iommu_{get,put}_resv_regions()

2021-11-16 Thread Yicong Yang via iommu
Export iommu_{get,put}_resv_regions() to modules so that drivers can
retrieve and use the reserved regions of their device.
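
A minimal caller-side sketch (the device pointer and printout are
illustrative):

	struct iommu_resv_region *region;
	LIST_HEAD(list);

	iommu_get_resv_regions(dev, &list);
	list_for_each_entry(region, &list, list)
		pr_info("resv region: %pa + %zu\n",
			&region->start, region->length);
	iommu_put_resv_regions(dev, &list);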

Signed-off-by: Yicong Yang 
---
 drivers/iommu/iommu.c | 2 ++
 include/linux/iommu.h | 4 ++--
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index dd7863e453a5..e96711eee965 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2792,6 +2792,7 @@ void iommu_get_resv_regions(struct device *dev, struct 
list_head *list)
if (ops && ops->get_resv_regions)
ops->get_resv_regions(dev, list);
 }
+EXPORT_SYMBOL_GPL(iommu_get_resv_regions);
 
 void iommu_put_resv_regions(struct device *dev, struct list_head *list)
 {
@@ -2800,6 +2801,7 @@ void iommu_put_resv_regions(struct device *dev, struct 
list_head *list)
if (ops && ops->put_resv_regions)
ops->put_resv_regions(dev, list);
 }
+EXPORT_SYMBOL_GPL(iommu_put_resv_regions);
 
 /**
  * generic_iommu_put_resv_regions - Reserved region driver helper
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index d2f3435e7d17..1b7b0f370e28 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -450,8 +450,8 @@ extern phys_addr_t iommu_iova_to_phys(struct iommu_domain 
*domain, dma_addr_t io
 extern void iommu_set_fault_handler(struct iommu_domain *domain,
iommu_fault_handler_t handler, void *token);
 
-extern void iommu_get_resv_regions(struct device *dev, struct list_head *list);
-extern void iommu_put_resv_regions(struct device *dev, struct list_head *list);
+void iommu_get_resv_regions(struct device *dev, struct list_head *list);
+void iommu_put_resv_regions(struct device *dev, struct list_head *list);
 extern void generic_iommu_put_resv_regions(struct device *dev,
   struct list_head *list);
 extern void iommu_set_default_passthrough(bool cmd_line);
-- 
2.33.0



Re: [RFC PATCH 0/3] Use pageblock_order for cma and alloc_contig_range alignment.

2021-11-16 Thread David Hildenbrand
On 15.11.21 20:37, Zi Yan wrote:
> From: Zi Yan 
> 
> Hi David,

Hi,

thanks for looking into this.

> 
> You suggested to make alloc_contig_range() deal with pageblock_order instead 
> of
> MAX_ORDER - 1 and get rid of MAX_ORDER - 1 dependency in virtio_mem[1]. This
> patchset is my attempt to achieve that. Please take a look and let me know if
> I am doing it correctly or not.
> 
> From what my understanding, cma required alignment of
> max(MAX_ORDER - 1, pageblock_order), because when MIGRATE_CMA was introduced,
> __free_one_page() did not prevent merging two different pageblocks when
> MAX_ORDER - 1 > pageblock_order. But the current __free_one_page() implementation
> does prevent that. It should be OK to just align cma to pageblock_order.
> alloc_contig_range() relies on MIGRATE_CMA to get free pages, so it can use
> pageblock_order as alignment too.

I wonder if that's sufficient. Especially the outer_start logic in
alloc_contig_range() might be problematic. There are some ugly corner
cases with free pages/allocations spanning multiple pageblocks and we
only isolated a single pageblock.


Regarding CMA, we have to keep the following cases working:

a) Different pageblock types (MIGRATE_CMA and !MIGRATE_CMA) in MAX_ORDER
   - 1 page:
   [   MAX_ORDER - 1 ]
   [ pageblock 0 | pageblock 1]

Assume either pageblock 0 is MIGRATE_CMA or pageblock 1 is MIGRATE_CMA,
but not both. We have to make sure alloc_contig_range() keeps working
correctly. This should be the case even with your change, as we won't
merge pages across differing migratetypes.

b) Migrating/freeing a MAX_ORDER - 1 page while partially isolated:
   [   MAX_ORDER - 1 ]
   [ pageblock 0 | pageblock 1]

Assume both are MIGRATE_CMA. Assume we want to either allocate from
pageblock 0 or pageblock 1. Especially, assume we want to allocate from
pageblock 1. While we would isolate pageblock 1, we wouldn't isolate
pageblock 0.

What happens if we either have a free page spanning the MAX_ORDER - 1
range already OR if we have to migrate a MAX_ORDER - 1 page, resulting
in a free MAX_ORDER - 1 page of which only the second pageblock is
isolated? We would end up freeing a page that has mixed
pageblocks, essentially placing it in !MIGRATE_ISOLATE free lists ... I
might be wrong but I have the feeling that this would be problematic.

c) Concurrent allocations:
[   MAX_ORDER - 1 ]
[ pageblock 0 | pageblock 1]

Assume b) but we have two concurrent CMA allocations to pageblock 0 and
pageblock 1, which would now be possible as start_isolate_page_range()
isolate would succeed on both.


Regarding virtio-mem, we care about the following cases:

a) Allocating parts from a completely movable MAX_ORDER - 1 page:
   [   MAX_ORDER - 1 ]
   [ pageblock 0 | pageblock 1]

Assume pageblock 0 and pageblock 1 are either free or contain only
movable pages. Assume we allocated pageblock 0. We have to make sure we
can allocate pageblock 1. The other way around, assume we allocated
pageblock 1, we have to make sure we can allocate pageblock 0.

Free pages spanning both pageblocks might be problematic.

b) Allocating parts of a partially movable MAX_ORDER - 1 page:
   [   MAX_ORDER - 1 ]
   [ pageblock 0 | pageblock 1]

Assume pageblock 0 contains unmovable data but pageblock 1 does not: we
have to make sure we can allocate pageblock 1. Similarly, assume pageblock 1
contains unmovable data but pageblock 0 does not: we have to make sure we
can allocate pageblock 0. has_unmovable_pages() might allow for that.

But, we want to fail early in case we want to allocate a single
pageblock but it contains unmovable data. This could be either directly
or indirectly.

If we have an unmovable (compound) MAX_ORDER - 1 page and we'd try isolating
pageblock 1, has_unmovable_pages() would always return "false" because
we'd simply be skipping over any tail pages, and not detect the
un-movability.

c) Migrating/freeing a MAX_ORDER - 1 page while partially isolated:

Same concern as for CMA b)


So the biggest concern I have is dealing with migrating/freeing >
pageblock_order pages while only having isolated a single pageblock.

> 
> In terms of virtio_mem, if I understand correctly, it relies on
> alloc_contig_range() to obtain contiguous free pages and offlines them to 
> reduce
> guest memory size. As the result of alloc_contig_range() alignment change,
> virtio_mem should be able to just align PFNs to pageblock_order.

For virtio-mem it will most probably be desirable to first try
allocating the MAX_ORDER - 1 range if possible and then fall back to
pageblock_order. But that's an additional change on top in virtio-mem code.



My take to teach alloc_contig_range() to properly handle would be the
following:

a) Convert MIGRATE_ISOLATE into a separate pageblock flag

We would want to convert MIGRATE_ISOLATE into a separate pageblock
flag, such that when we isolate a pageblock we preserve the original
migratetype. While start_isolate_page_range()