Re: [RFC 2/2] drm/msm/gem: Make use of the system cache
On Wed, Nov 17, 2021 at 12:16 AM Georgi Djakov wrote:
>
> Instead of writing to WC cmdstream buffers that go all the way to the main
> memory, let's use the system cache to improve the performance.
>
> Signed-off-by: Georgi Djakov
> ---
>  drivers/gpu/drm/msm/msm_gem.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/msm/msm_gem.c b/drivers/gpu/drm/msm/msm_gem.c
> index 104fdfc14027..921a1c24721e 100644
> --- a/drivers/gpu/drm/msm/msm_gem.c
> +++ b/drivers/gpu/drm/msm/msm_gem.c
> @@ -214,7 +214,7 @@ void msm_gem_put_pages(struct drm_gem_object *obj)
>  static pgprot_t msm_gem_pgprot(struct msm_gem_object *msm_obj, pgprot_t prot)
>  {
>         if (msm_obj->flags & (MSM_BO_WC|MSM_BO_UNCACHED))
> -               return pgprot_writecombine(prot);
> +               return pgprot_syscached(prot);
>         return prot;
>  }

Based on the definition in patch 1, doesn't this mean that 32-bit kernels
degrade from writecombined to uncached, making them a lot slower?

My feeling about this series is that there should be a clearer definition
of what exactly happens on systems with and without a system cache.

Arnd
___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: dma-direct fixes and cleanups v3
On Tue, Nov 16, 2021 at 11:31:49AM +, Robin Murphy wrote:
> On 2021-11-11 06:50, Christoph Hellwig wrote:
>> Hi all,
>>
>> Linus complained about the complex flow in dma_direct_alloc, so this
>> tries to simplify it a bit, and while I was at it I also made sure that
>> unencrypted pages never leak back into the page allocator.
>
> Before I forget, I've had a quick skim of the remaining patches and
> nothing more stands out. Let me know if you'd like me to find time to
> check everything over in detail again for a proper review, but otherwise
> I reckon we may as well get this baking in -next sooner rather than
> later.

I'd rather wait for a proper review; I don't think we are in a rush with
any of these patches.
Re: [PATCH 01/11] iommu: Add device dma ownership set/release interfaces
Hi Jason,

On 11/16/21 9:46 PM, Jason Gunthorpe wrote:
> On Tue, Nov 16, 2021 at 09:57:30AM +0800, Lu Baolu wrote:
>> Hi Christoph,
>>
>> On 11/15/21 9:14 PM, Christoph Hellwig wrote:
>>> On Mon, Nov 15, 2021 at 10:05:42AM +0800, Lu Baolu wrote:
>>>> +enum iommu_dma_owner {
>>>> +	DMA_OWNER_NONE,
>>>> +	DMA_OWNER_KERNEL,
>>>> +	DMA_OWNER_USER,
>>>> +};
>>>> +
>>>> +	enum iommu_dma_owner dma_owner;
>>>> +	refcount_t owner_cnt;
>>>> +	struct file *owner_user_file;
>>>
>>> I'd just overload the ownership into owner_user_file:
>>>
>>>   NULL -> no owner
>>>   (struct file *)1UL -> kernel
>>>   real pointer -> user
>>>
>>> Which could simplify a lot of the code dealing with the owner.
>>
>> Yeah! Sounds reasonable. I will make this change in the next version.
>
> It would be good to figure out how to make iommu_attach_device() enforce
> no other driver binding as a kernel user without a file *, as Robin
> pointed out, before optimizing this.
>
> This fixes an existing bug where iommu_attach_device() only checks the
> group size and is vulnerable to a hot plug increasing the group size
> after it returns. That check should be replaced by this series's logic
> instead.

To my understanding, the essence of this problem is that only the user
owner of the iommu_group may attach an UNMANAGED domain to it. If I
understand it right, how about introducing a new interface to allocate a
user-managed domain and storing the user file pointer in it?
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -94,6 +94,7 @@ struct iommu_domain {
 	void *handler_token;
 	struct iommu_domain_geometry geometry;
 	struct iommu_dma_cookie *iova_cookie;
+	struct file *owner_user_file;
 };

--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1902,6 +1902,18 @@ struct iommu_domain *iommu_domain_alloc(struct bus_type *bus)
 }
 EXPORT_SYMBOL_GPL(iommu_domain_alloc);

+struct iommu_domain *iommu_domain_alloc_user(struct bus_type *bus,
+					     struct file *filep)
+{
+	struct iommu_domain *domain;
+
+	domain = __iommu_domain_alloc(bus, IOMMU_DOMAIN_UNMANAGED);
+	if (domain)
+		domain->owner_user_file = filep;
+
+	return domain;
+}

When attaching a domain to a user-owned iommu_group, both the group and
the domain should have a matching user fd. Does the above help here?

Best regards,
baolu
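For illustration, the matching rule described above (attach succeeds only when the domain's and the group's user file owners agree) can be modeled in plain C. The struct layouts and the helper name below are hypothetical stand-ins for the kernel structures under discussion, not the proposed API:

```c
#include <assert.h>

/* Hypothetical stand-ins for the kernel structures in the thread. */
struct file { int dummy; };

struct iommu_group_model {
	struct file *owner_user_file;	/* set when userspace claims the group */
};

struct iommu_domain_model {
	struct file *owner_user_file;	/* set by iommu_domain_alloc_user() */
};

/*
 * Sketch of the proposed check: a user-allocated UNMANAGED domain may
 * only be attached to a group whose user owner holds the same file.
 * Returns 0 on success, -1 (standing in for -EPERM) otherwise.
 */
static int attach_user_domain(struct iommu_group_model *group,
			      struct iommu_domain_model *domain)
{
	if (!domain->owner_user_file ||
	    domain->owner_user_file != group->owner_user_file)
		return -1;
	return 0;
}
```

A kernel-allocated domain (no owner file) or a domain allocated under a different fd would be rejected by this check, which is the property the thread is after.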
Re: [RFC PATCH 0/3] Use pageblock_order for cma and alloc_contig_range alignment.
On 16 Nov 2021, at 3:58, David Hildenbrand wrote:

> On 15.11.21 20:37, Zi Yan wrote:
>> From: Zi Yan
>>
>> Hi David,
>
> Hi,
>
> thanks for looking into this.
>
>> You suggested to make alloc_contig_range() deal with pageblock_order
>> instead of MAX_ORDER - 1 and get rid of the MAX_ORDER - 1 dependency
>> in virtio_mem[1]. This patchset is my attempt to achieve that. Please
>> take a look and let me know if I am doing it correctly or not.
>>
>> From my understanding, cma required an alignment of
>> max(MAX_ORDER - 1, pageblock_order) because, when MIGRATE_CMA was
>> introduced, __free_one_page() did not prevent merging two different
>> pageblocks when MAX_ORDER - 1 > pageblock_order. But the current
>> __free_one_page() implementation does prevent that. It should be OK to
>> just align cma to pageblock_order. alloc_contig_range() relies on
>> MIGRATE_CMA to get free pages, so it can use pageblock_order as its
>> alignment too.
>
> I wonder if that's sufficient. Especially the outer_start logic in
> alloc_contig_range() might be problematic. There are some ugly corner
> cases with free pages/allocations spanning multiple pageblocks and we
> only isolated a single pageblock.

Thank you a lot for writing the list of these corner cases. They are
very helpful!

> Regarding CMA, we have to keep the following cases working:
>
> a) Different pageblock types (MIGRATE_CMA and !MIGRATE_CMA) in a
>    MAX_ORDER - 1 page:
>    [       MAX_ORDER - 1       ]
>    [ pageblock 0 | pageblock 1 ]
>
> Assume either pageblock 0 is MIGRATE_CMA or pageblock 1 is MIGRATE_CMA,
> but not both. We have to make sure alloc_contig_range() keeps working
> correctly. This should be the case even with your change, as we won't
> merge pages across differing migratetypes.

Yes.

> b) Migrating/freeing a MAX_ORDER - 1 page while partially isolated:
>    [       MAX_ORDER - 1       ]
>    [ pageblock 0 | pageblock 1 ]
>
> Assume both are MIGRATE_CMA. Assume we want to either allocate from
> pageblock 0 or pageblock 1.
> Especially, assume we want to allocate from pageblock 1. While we would
> isolate pageblock 1, we wouldn't isolate pageblock 0.
>
> What happens if we either have a free page spanning the MAX_ORDER - 1
> range already OR if we have to migrate a MAX_ORDER - 1 page, resulting
> in a free MAX_ORDER - 1 page of which only the second pageblock is
> isolated? We would end up essentially freeing a page that has mixed
> pageblocks, essentially placing it in !MIGRATE_ISOLATE free lists ... I
> might be wrong but I have the feeling that this would be problematic.

This could happen when start_isolate_page_range() stumbles upon a
compound page with order >= pageblock_order or a free page with order >=
pageblock_order, but it should not. start_isolate_page_range() should
check the actual page size, either compound page size or free page size,
and set the migratetype across pageblocks if the page is bigger than the
pageblock size. More precisely, set_migratetype_isolate() should do
that.

> c) Concurrent allocations:
>    [       MAX_ORDER - 1       ]
>    [ pageblock 0 | pageblock 1 ]
>
> Assume b) but we have two concurrent CMA allocations to pageblock 0 and
> pageblock 1, which would now be possible as start_isolate_page_range()
> isolate would succeed on both.

Two isolations will be serialized by the zone lock taken by
set_migratetype_isolate(), so the concurrent allocation would not be a
problem. If it is a MAX_ORDER-1 free page, the first comer should split
it and isolate only one of the pageblocks, then the second one can
isolate the other pageblock. If it is a MAX_ORDER-1 compound page, the
first comer should isolate both pageblocks, then the second one would
fail. WDYT?

In sum, it seems to me that the issue is that the page isolation code
only sees pageblocks without checking the actual page. When there are
multiple pageblocks belonging to one page, the problem appears. This
should be fixed.
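The fix Zi Yan proposes above (have the isolation path cover every pageblock a large free or compound page spans, rather than only the one it was called on) can be modeled in a few lines of plain C. pageblock_order, the migratetype array, and the helper name are simplified stand-ins for the real page allocator state, not kernel code:

```c
#include <assert.h>

#define PAGEBLOCK_ORDER	9	/* assumed typical value for illustration */
#define MIGRATE_ISOLATE	1

/* Toy model: one migratetype slot per pageblock. */
static int pageblock_migratetype[16];

/*
 * Sketch: when the page at pfn has order > pageblock_order (a free or
 * compound page spanning several pageblocks), mark MIGRATE_ISOLATE on
 * every pageblock it covers, not only the first one.  Returns the
 * number of pageblocks isolated.
 */
static unsigned long isolate_spanning_page(unsigned long pfn,
					   unsigned int page_order)
{
	unsigned long first_block = pfn >> PAGEBLOCK_ORDER;
	unsigned long nr_blocks = 1;
	unsigned long i;

	if (page_order > PAGEBLOCK_ORDER)
		nr_blocks = 1UL << (page_order - PAGEBLOCK_ORDER);

	for (i = 0; i < nr_blocks; i++)
		pageblock_migratetype[first_block + i] = MIGRATE_ISOLATE;
	return nr_blocks;
}
```

With this rule, the MAX_ORDER-1 case from b) above isolates both pageblocks of the spanning page, so a concurrent isolation of the sibling pageblock cannot race in and see a half-isolated page.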
> > > Regarding virtio-mem, we care about the following cases: > > a) Allocating parts from completely movable MAX_ ORDER - 1 page: >[ MAX_ ORDER - 1 ] >[ pageblock 0 | pageblock 1] > > Assume pageblock 0 and pageblock 1 are either free or contain only > movable pages. Assume we allocated pageblock 0. We have to make sure we > can allocate pageblock 1. The other way around, assume we allocated > pageblock 1, we have to make sure we can allocate pageblock 0. > > Free pages spanning both pageblocks might be problematic. Can you elaborate a bit? If either of pageblock 0 and 1 is used by virtio-mem, why do we care the other? If pageblock 0 and 1 belong to the same page (either free or compound), they should have the same migratetype. If we want to just allocate one of them, we can split the free page or migrate the compound page then split the remaining free page. > > b) Allocate parts of partially movable MAX_ ORDER - 1 page: >[ MAX_ ORDER - 1 ] >[ pageblock 0 | pageblock 1] > > Assume pageblock 0 contains unmovable data but
[RFC 1/2] arm64: Add support for system cache memory type
From: "Isaac J. Manjarres"

Non-coherent devices on systems that support a system or last level
cache may want to request that allocations be cached in the system
cache. For memory that is allocated by the kernel and used for DMA with
devices, the memory attributes used for CPU access should match the
memory attributes that will be used for device access.

The memory attribute that needs to be programmed into the MAIR for
system cache usage is 0xf4 - Normal memory, outer write-back
read/write-allocate, inner non-cacheable. There is currently no support
for this memory attribute for CPU mappings, so add it.

Signed-off-by: Isaac J. Manjarres
Signed-off-by: Georgi Djakov
---
 arch/arm64/include/asm/memory.h  | 1 +
 arch/arm64/include/asm/pgtable.h | 9 +
 arch/arm64/include/asm/sysreg.h  | 1 +
 arch/arm64/mm/proc.S             | 3 ++-
 include/linux/dma-map-ops.h      | 8
 5 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
index 0af70d9abede..22553aab67a4 100644
--- a/arch/arm64/include/asm/memory.h
+++ b/arch/arm64/include/asm/memory.h
@@ -134,6 +134,7 @@
 #define MT_NORMAL_NC		2
 #define MT_DEVICE_nGnRnE	3
 #define MT_DEVICE_nGnRE		4
+#define MT_NORMAL_iNC_oWB	5

 /*
  * Memory types for Stage-2 translation
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index c4ba047a82d2..681c294c364e 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -524,6 +524,15 @@ static inline pmd_t pmd_mkdevmap(pmd_t pmd)
 	__pgprot_modify(prot, PTE_ATTRINDX_MASK, \
 			PTE_ATTRINDX(MT_NORMAL_NC) | PTE_PXN | PTE_UXN)

+/*
+ * Mark the prot value as outer cacheable and inner non-cacheable. Non-coherent
+ * devices on a system with support for a system or last level cache use these
+ * attributes to cache allocations in the system cache.
+ */
+#define pgprot_syscached(prot) \
+	__pgprot_modify(prot, PTE_ATTRINDX_MASK, \
+			PTE_ATTRINDX(MT_NORMAL_iNC_oWB) | PTE_PXN | PTE_UXN)
+
 #define __HAVE_PHYS_MEM_ACCESS_PROT
 struct file;
 extern pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index 16b3f1a1d468..7c50b1840532 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -715,6 +715,7 @@
 #define MAIR_ATTR_NORMAL_TAGGED	UL(0xf0)
 #define MAIR_ATTR_NORMAL	UL(0xff)
 #define MAIR_ATTR_MASK		UL(0xff)
+#define MAIR_ATTR_NORMAL_iNC_oWB	UL(0xf4)

 /* Position the attr at the correct index */
 #define MAIR_ATTRIDX(attr, idx)	((attr) << ((idx) * 8))
diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
index d35c90d2e47a..8a75973e5148 100644
--- a/arch/arm64/mm/proc.S
+++ b/arch/arm64/mm/proc.S
@@ -64,7 +64,8 @@
 	MAIR_ATTRIDX(MAIR_ATTR_DEVICE_nGnRE, MT_DEVICE_nGnRE) | \
 	MAIR_ATTRIDX(MAIR_ATTR_NORMAL_NC, MT_NORMAL_NC) | \
 	MAIR_ATTRIDX(MAIR_ATTR_NORMAL, MT_NORMAL) | \
-	MAIR_ATTRIDX(MAIR_ATTR_NORMAL, MT_NORMAL_TAGGED))
+	MAIR_ATTRIDX(MAIR_ATTR_NORMAL, MT_NORMAL_TAGGED) | \
+	MAIR_ATTRIDX(MAIR_ATTR_NORMAL_iNC_oWB, MT_NORMAL_iNC_oWB))

 #ifdef CONFIG_CPU_PM
 /**
diff --git a/include/linux/dma-map-ops.h b/include/linux/dma-map-ops.h
index 0d5b06b3a4a6..1f7d75201577 100644
--- a/include/linux/dma-map-ops.h
+++ b/include/linux/dma-map-ops.h
@@ -277,6 +277,14 @@ void arch_dma_free(struct device *dev, size_t size, void *cpu_addr,
 #define pgprot_dmacoherent(prot)	pgprot_noncached(prot)
 #endif

+/*
+ * If there is no system cache pgprot, then fallback to dmacoherent
+ * pgprot, as the expectation is that the device is not coherent.
+ */
+#ifndef pgprot_syscached
+#define pgprot_syscached(prot)	pgprot_dmacoherent(prot)
+#endif
+
 pgprot_t dma_pgprot(struct device *dev, pgprot_t prot, unsigned long attrs);
 #else
 static inline pgprot_t dma_pgprot(struct device *dev, pgprot_t prot,
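The fallback above is what drives the question raised in the reply to patch 2: on an architecture that does not define pgprot_syscached, the macro resolves to pgprot_dmacoherent, which in turn typically defaults to pgprot_noncached rather than pgprot_writecombine. The toy preprocessor model below mimics that chain with strings standing in for the real pgprot_t attribute bits; it is an illustration of the macro fallback only, not kernel code:

```c
#include <string.h>
#include <assert.h>

/* Strings stand in for real pgprot_t attribute encodings. */
#define pgprot_noncached(p)	"uncached"
#define pgprot_writecombine(p)	"writecombine"

/* Generic fallback when the arch provides no pgprot_dmacoherent. */
#ifndef pgprot_dmacoherent
#define pgprot_dmacoherent(p)	pgprot_noncached(p)
#endif

/*
 * Arch without MT_NORMAL_iNC_oWB support: pgprot_syscached is not
 * defined, so the new fallback resolves to pgprot_dmacoherent, i.e.
 * uncached -- a downgrade from the writecombine such mappings use
 * today.
 */
#ifndef pgprot_syscached
#define pgprot_syscached(p)	pgprot_dmacoherent(p)
#endif

static const char *mapping_before_patch(void) { return pgprot_writecombine(0); }
static const char *mapping_after_patch(void)  { return pgprot_syscached(0); }
```

Running this model shows the before/after attributes diverging exactly as Arnd describes for kernels whose architecture lacks the new memory type.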
[RFC 2/2] drm/msm/gem: Make use of the system cache
Instead of writing to WC cmdstream buffers that go all the way to the
main memory, let's use the system cache to improve the performance.

Signed-off-by: Georgi Djakov
---
 drivers/gpu/drm/msm/msm_gem.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/msm/msm_gem.c b/drivers/gpu/drm/msm/msm_gem.c
index 104fdfc14027..921a1c24721e 100644
--- a/drivers/gpu/drm/msm/msm_gem.c
+++ b/drivers/gpu/drm/msm/msm_gem.c
@@ -214,7 +214,7 @@ void msm_gem_put_pages(struct drm_gem_object *obj)
 static pgprot_t msm_gem_pgprot(struct msm_gem_object *msm_obj, pgprot_t prot)
 {
 	if (msm_obj->flags & (MSM_BO_WC|MSM_BO_UNCACHED))
-		return pgprot_writecombine(prot);
+		return pgprot_syscached(prot);
 	return prot;
 }
Re: [PATCH 04/11] PCI: portdrv: Suppress kernel DMA ownership auto-claiming
On Tue, Nov 16, 2021 at 02:22:01PM -0600, Bjorn Helgaas wrote: > On Tue, Nov 16, 2021 at 03:24:29PM +0800, Lu Baolu wrote: > > On 2021/11/16 4:44, Bjorn Helgaas wrote: > > > On Mon, Nov 15, 2021 at 10:05:45AM +0800, Lu Baolu wrote: > > > > IOMMU grouping on PCI necessitates that if we lack isolation on a bridge > > > > then all of the downstream devices will be part of the same IOMMU group > > > > as the bridge. > > > > > > I think this means something like: "If a PCIe Switch Downstream Port > > > lacks , all downstream devices > > > will be part of the same IOMMU group as the switch," right? > > > > For this patch, yes. > > > > > If so, can you fill in the details to make it specific and concrete? > > > > The existing vfio implementation allows a kernel driver to bind with a > > PCI bridge while its downstream devices are assigned to the user space > > though there lacks ACS-like isolation in bridge. > > > > drivers/vfio/vfio.c: > > 540 static bool vfio_dev_driver_allowed(struct device *dev, > > 541 struct device_driver *drv) > > 542 { > > 543 if (dev_is_pci(dev)) { > > 544 struct pci_dev *pdev = to_pci_dev(dev); > > 545 > > 546 if (pdev->hdr_type != PCI_HEADER_TYPE_NORMAL) > > 547 return true; > > 548 } > > > > We are moving the group viability check to IOMMU core, and trying to > > make it compatible with the current vfio policy. We saw three types of > > bridges: > > > > #1) PCIe/PCI-to-PCI bridges > > These bridges are configured in the PCI framework, there's no > > dedicated driver for such devices. > > > > #2) Generic PCIe switch downstream port > > The port driver doesn't map and access any MMIO in the PCI BAR. > > The iommu group is viable to user even this driver is bound. > > > > #3) Hot Plug Controller > > The controller driver maps and access the device MMIO. The iommu > > group is not viable to user with this driver bound to its device. > > I *guess* the question here is whether the bridge can or will do DMA? 
> I think that's orthogonal to the question of whether it implements
> BARs, so I'm not sure why the MMIO BARs are part of this discussion.
> I assume it's theoretically possible for a driver to use registers in
> config space to program a device to do DMA, even if the device has no
> BARs.

There are two questions Lu is trying to get at:

1) Does the bridge driver use DMA?

   Calling pci_set_master() or a dma_map_* API is a sure indication
   that the driver is doing DMA. Kernel DMA doesn't work if the PCI
   device is attached to a non-default iommu domain.

2) If the bridge driver uses MMIO, is it tolerant to hostile userspace
   also touching the same MMIO registers via P2P DMA attacks?

   Conservatively, if the driver maps an MMIO region at all, we can say
   it fails this test. Unless someone wants to do the audit work,
   identifying MMIO usage alone is sufficient to disqualify a driver.

Jason
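The two criteria above amount to a simple disqualification rule. A minimal sketch, with an illustrative struct and helper that are not kernel API, just a summary of the thread's logic:

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative summary of a bound bridge driver's behaviour. */
struct bridge_driver_profile {
	bool uses_dma;	/* calls pci_set_master() or any dma_map_* API */
	bool maps_mmio;	/* ioremaps any MMIO region, BAR or otherwise */
};

/*
 * Sketch of the rule from the thread: absent a per-driver audit, the
 * iommu group stays viable for userspace only if the bound driver
 * neither does kernel DMA nor touches MMIO that userspace could also
 * reach via P2P DMA.
 */
static bool group_viable_with_driver(const struct bridge_driver_profile *p)
{
	return !p->uses_dma && !p->maps_mmio;
}
```

Mapped onto Lu's three bridge classes, a generic PCIe switch downstream port (no DMA, no MMIO) passes, while a hotplug controller (maps MMIO) fails.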
Re: [PATCH 04/11] PCI: portdrv: Suppress kernel DMA ownership auto-claiming
On Tue, Nov 16, 2021 at 03:24:29PM +0800, Lu Baolu wrote: > On 2021/11/16 4:44, Bjorn Helgaas wrote: > > On Mon, Nov 15, 2021 at 10:05:45AM +0800, Lu Baolu wrote: > > > IOMMU grouping on PCI necessitates that if we lack isolation on a bridge > > > then all of the downstream devices will be part of the same IOMMU group > > > as the bridge. > > > > I think this means something like: "If a PCIe Switch Downstream Port > > lacks , all downstream devices > > will be part of the same IOMMU group as the switch," right? > > For this patch, yes. > > > If so, can you fill in the details to make it specific and concrete? > > The existing vfio implementation allows a kernel driver to bind with a > PCI bridge while its downstream devices are assigned to the user space > though there lacks ACS-like isolation in bridge. > > drivers/vfio/vfio.c: > 540 static bool vfio_dev_driver_allowed(struct device *dev, > 541 struct device_driver *drv) > 542 { > 543 if (dev_is_pci(dev)) { > 544 struct pci_dev *pdev = to_pci_dev(dev); > 545 > 546 if (pdev->hdr_type != PCI_HEADER_TYPE_NORMAL) > 547 return true; > 548 } > > We are moving the group viability check to IOMMU core, and trying to > make it compatible with the current vfio policy. We saw three types of > bridges: > > #1) PCIe/PCI-to-PCI bridges > These bridges are configured in the PCI framework, there's no > dedicated driver for such devices. > > #2) Generic PCIe switch downstream port > The port driver doesn't map and access any MMIO in the PCI BAR. > The iommu group is viable to user even this driver is bound. > > #3) Hot Plug Controller > The controller driver maps and access the device MMIO. The iommu > group is not viable to user with this driver bound to its device. I *guess* the question here is whether the bridge can or will do DMA? I think that's orthogonal to the question of whether it implements BARs, so I'm not sure why the MMIO BARs are part of this discussion. 
I assume it's theoretically possible for a driver to use registers in
config space to program a device to do DMA, even if the device has no
BARs.

Bjorn
Re: [PATCH 0/2] perf/smmuv3: Support devicetree
On Tue, Nov 16, 2021 at 05:00:14PM +, Robin Murphy wrote: > On 2021-11-16 15:42, Jean-Philippe Brucker wrote: > > On Tue, Nov 16, 2021 at 12:02:47PM +, Robin Murphy wrote: > > > On 2021-11-16 11:35, Jean-Philippe Brucker wrote: > > > > Add devicetree binding for the SMMUv3 PMU, called Performance Monitoring > > > > Counter Group (PMCG) in the spec. Each SMMUv3 implementation can have > > > > multiple independent PMCGs, for example one for the Translation Control > > > > Unit (TCU) and one per Translation Buffer Unit (TBU). > > > > > > > > I previously sent the binding as reply to Jay Chen's thread implementing > > > > device tree support [1]. This posting addresses the comments from that > > > > thread. > > > > > > Ha, I'd also resurrected this and was planning to post it at some point > > > this > > > week[0] - you should have said :) > > > > Ah sorry about that, I just resent because there was some demand for it at > > Linaro > > Heh, no worries - it's not like you were even CC'ed on the thread where I > only mentioned I *might* do it. > > Can I get away with being cheeky and just saying that my review comments are > the diff between my branch and yours, I wonder... Sure, that works for me, I'll send a v2 this week or so Thanks, Jean ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 0/2] perf/smmuv3: Support devicetree
On 2021-11-16 15:42, Jean-Philippe Brucker wrote: On Tue, Nov 16, 2021 at 12:02:47PM +, Robin Murphy wrote: On 2021-11-16 11:35, Jean-Philippe Brucker wrote: Add devicetree binding for the SMMUv3 PMU, called Performance Monitoring Counter Group (PMCG) in the spec. Each SMMUv3 implementation can have multiple independent PMCGs, for example one for the Translation Control Unit (TCU) and one per Translation Buffer Unit (TBU). I previously sent the binding as reply to Jay Chen's thread implementing device tree support [1]. This posting addresses the comments from that thread. Ha, I'd also resurrected this and was planning to post it at some point this week[0] - you should have said :) Ah sorry about that, I just resent because there was some demand for it at Linaro Heh, no worries - it's not like you were even CC'ed on the thread where I only mentioned I *might* do it. Can I get away with being cheeky and just saying that my review comments are the diff between my branch and yours, I wonder... Patch 1 adds two compatible strings. "arm,smmu-v3-pmcg" is common to all PMCGs. "hisilicon,smmu-v3-pmcg-hip08" allows to support the same quirk as IORT for that implementation (see patch 2). We'll probably want to also introduce compatible strings for each implementation that has additional perf events. For example the MMU-600 implementation has different events for TCU and TBU PMCGs [2], but both components have the same device IDs. So the driver could differentiate them if they had two distinct compatible strings such as "arm,mmu-600-pmcg-tbu" and "arm,mmu-600-pmcg-tcu". Actually it only needs a general MMU-600 compatible, since once you know it's an Arm Ltd. implementation, you can assume the pattern for the IMP_DEF ID registers to figure out the rest. It might be an error in the MMU-600 spec specifically, both TBU and TCU PMU registers have a 0x83 PIDR0, where I think the TBU should be 0x84 (the revC model uses that value). 
It's possible that the implementation actually has 0x84 instead.

Yup, it's a mistake in the TRM. I just checked a real MMU-600 and the
PMU PIDRs match the main TCU/TBU PIDRs as expected. At least the MMU-700
docs haven't repeated the same error.

Cheers,
Robin.
Re: [PATCH 1/2] dt-bindings: Add Arm SMMUv3 PMCG binding
On Tue, Nov 16, 2021 at 08:02:53AM -0600, Rob Herring wrote:
> My bot found errors running 'make DT_CHECKER_FLAGS=-m dt_binding_check'
> on your patch (DT_CHECKER_FLAGS is new in v5.13):
>
> yamllint warnings/errors:
> ./Documentation/devicetree/bindings/iommu/arm,smmu-v3-pmcg.yaml:24:9: [warning] wrong indentation: expected 10 but found 8 (indentation)
> ./Documentation/devicetree/bindings/iommu/arm,smmu-v3-pmcg.yaml:25:11: [warning] wrong indentation: expected 12 but found 10 (indentation)
>
> dtschema/dtc warnings/errors:
> /builds/robherring/linux-dt-review/Documentation/devicetree/bindings/iommu/arm,smmu-v3-pmcg.example.dt.yaml: example-0: pmu@2b42:reg:0: [0, 725745664, 0, 4096] is too long
> 	From schema: /usr/local/lib/python3.8/dist-packages/dtschema/schemas/reg.yaml
> /builds/robherring/linux-dt-review/Documentation/devicetree/bindings/iommu/arm,smmu-v3-pmcg.example.dt.yaml: example-0: pmu@2b42:reg:1: [0, 725811200, 0, 4096] is too long
> 	From schema: /usr/local/lib/python3.8/dist-packages/dtschema/schemas/reg.yaml
> /builds/robherring/linux-dt-review/Documentation/devicetree/bindings/iommu/arm,smmu-v3-pmcg.example.dt.yaml: example-0: pmu@2b44:reg:0: [0, 725876736, 0, 4096] is too long
> 	From schema: /usr/local/lib/python3.8/dist-packages/dtschema/schemas/reg.yaml
> /builds/robherring/linux-dt-review/Documentation/devicetree/bindings/iommu/arm,smmu-v3-pmcg.example.dt.yaml: example-0: pmu@2b44:reg:1: [0, 725942272, 0, 4096] is too long
> 	From schema: /usr/local/lib/python3.8/dist-packages/dtschema/schemas/reg.yaml
>
> doc reference errors (make refcheckdocs):
>
> See https://patchwork.ozlabs.org/patch/1555758
>
> This check can fail if there are any dependencies. The base for a patch
> series is generally the most recent rc1.
>
> If you already ran 'make dt_binding_check' and didn't see the above
> error(s), then make sure 'yamllint' is installed and dt-schema is up to
> date:
>
>   pip3 install dtschema --upgrade
>
> Please check and re-submit.

Right, I'll fix those; I had only run dtbs_check.

Thanks,
Jean
Re: [PATCH 2/2] perf/smmuv3: Add devicetree support
On Tue, Nov 16, 2021 at 12:06:36PM +, John Garry wrote: > On 16/11/2021 11:35, Jean-Philippe Brucker wrote: > > Add device-tree support to the SMMUv3 PMCG. One small cosmetic change > > while factoring the option mask printout: don't display it when zero, it > > only contains one erratum at the moment. > > > > Signed-off-by: Jay Chen > > Signed-off-by: Jean-Philippe Brucker > > --- > > drivers/perf/arm_smmuv3_pmu.c | 25 +++-- > > 1 file changed, 23 insertions(+), 2 deletions(-) > > > > diff --git a/drivers/perf/arm_smmuv3_pmu.c b/drivers/perf/arm_smmuv3_pmu.c > > index 226348822ab3..958325ac103a 100644 > > --- a/drivers/perf/arm_smmuv3_pmu.c > > +++ b/drivers/perf/arm_smmuv3_pmu.c > > @@ -47,6 +47,7 @@ > > #include > > #include > > #include > > +#include > > #include > > #include > > #include > > @@ -750,8 +751,15 @@ static void smmu_pmu_get_acpi_options(struct smmu_pmu > > *smmu_pmu) > > smmu_pmu->options |= SMMU_PMCG_EVCNTR_RDONLY; > > break; > > } > > +} > > + > > +static void smmu_pmu_get_of_options(struct smmu_pmu *smmu_pmu) > > +{ > > + struct device_node *node = smmu_pmu->dev->of_node; > > - dev_notice(smmu_pmu->dev, "option mask 0x%x\n", smmu_pmu->options); > > + if (of_device_is_compatible(node, "hisilicon,smmu-v3-pmcg-hip08")) > > I don't think that this is necessary. We don't support DT for hip08, nor > have any plans to. Incidentally, was this binding missing in your series? Ok I'll drop this (and the compatible value from patch 1) Thanks, Jean ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 0/2] perf/smmuv3: Support devicetree
On Tue, Nov 16, 2021 at 12:02:47PM +, Robin Murphy wrote: > On 2021-11-16 11:35, Jean-Philippe Brucker wrote: > > Add devicetree binding for the SMMUv3 PMU, called Performance Monitoring > > Counter Group (PMCG) in the spec. Each SMMUv3 implementation can have > > multiple independent PMCGs, for example one for the Translation Control > > Unit (TCU) and one per Translation Buffer Unit (TBU). > > > > I previously sent the binding as reply to Jay Chen's thread implementing > > device tree support [1]. This posting addresses the comments from that > > thread. > > Ha, I'd also resurrected this and was planning to post it at some point this > week[0] - you should have said :) Ah sorry about that, I just resent because there was some demand for it at Linaro > > Patch 1 adds two compatible strings. "arm,smmu-v3-pmcg" is common to all > > PMCGs. "hisilicon,smmu-v3-pmcg-hip08" allows to support the same quirk > > as IORT for that implementation (see patch 2). We'll probably want to > > also introduce compatible strings for each implementation that has > > additional perf events. For example the MMU-600 implementation has > > different events for TCU and TBU PMCGs [2], but both components have the > > same device IDs. So the driver could differentiate them if they had two > > distinct compatible strings such as "arm,mmu-600-pmcg-tbu" and > > "arm,mmu-600-pmcg-tcu". > > Actually it only needs a general MMU-600 compatible, since once you know > it's an Arm Ltd. implementation, you can assume the pattern for the IMP_DEF > ID registers to figure out the rest. It might be an error in the MMU-600 spec specifically, both TBU and TCU PMU registers have a 0x83 PIDR0, where I think the TBU should be 0x84 (the revC model uses that value). It's possible that the implementation actually has 0x84 instead. Thanks, Jean ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 5/5] scsi: storvsc: Add Isolation VM support for storvsc driver
From: Tianyu Lan In Isolation VM, all shared memory with host needs to mark visible to host via hvcall. vmbus_establish_gpadl() has already done it for storvsc rx/tx ring buffer. The page buffer used by vmbus_sendpacket_ mpb_desc() still needs to be handled. Use DMA API(scsi_dma_map/unmap) to map these memory during sending/receiving packet and return swiotlb bounce buffer dma address. In Isolation VM, swiotlb bounce buffer is marked to be visible to host and the swiotlb force mode is enabled. Set device's dma min align mask to HV_HYP_PAGE_SIZE - 1 in order to keep the original data offset in the bounce buffer. Signed-off-by: Tianyu Lan --- drivers/scsi/storvsc_drv.c | 37 + include/linux/hyperv.h | 1 + 2 files changed, 22 insertions(+), 16 deletions(-) diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c index 20595c0ba0ae..ae293600d799 100644 --- a/drivers/scsi/storvsc_drv.c +++ b/drivers/scsi/storvsc_drv.c @@ -21,6 +21,8 @@ #include #include #include +#include + #include #include #include @@ -1336,6 +1338,7 @@ static void storvsc_on_channel_callback(void *context) continue; } request = (struct storvsc_cmd_request *)scsi_cmd_priv(scmnd); + scsi_dma_unmap(scmnd); } storvsc_on_receive(stor_device, packet, request); @@ -1749,7 +1752,6 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd) struct hv_host_device *host_dev = shost_priv(host); struct hv_device *dev = host_dev->dev; struct storvsc_cmd_request *cmd_request = scsi_cmd_priv(scmnd); - int i; struct scatterlist *sgl; unsigned int sg_count; struct vmscsi_request *vm_srb; @@ -1831,10 +1833,11 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd) payload_sz = sizeof(cmd_request->mpb); if (sg_count) { - unsigned int hvpgoff, hvpfns_to_add; unsigned long offset_in_hvpg = offset_in_hvpage(sgl->offset); unsigned int hvpg_count = HVPFN_UP(offset_in_hvpg + length); - u64 hvpfn; + struct scatterlist *sg; + unsigned long hvpfn, hvpfns_to_add; + 
int j, i = 0; if (hvpg_count > MAX_PAGE_BUFFER_COUNT) { @@ -1848,21 +1851,22 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd) payload->range.len = length; payload->range.offset = offset_in_hvpg; + sg_count = scsi_dma_map(scmnd); + if (sg_count < 0) + return SCSI_MLQUEUE_DEVICE_BUSY; - for (i = 0; sgl != NULL; sgl = sg_next(sgl)) { + for_each_sg(sgl, sg, sg_count, j) { /* -* Init values for the current sgl entry. hvpgoff -* and hvpfns_to_add are in units of Hyper-V size -* pages. Handling the PAGE_SIZE != HV_HYP_PAGE_SIZE -* case also handles values of sgl->offset that are -* larger than PAGE_SIZE. Such offsets are handled -* even on other than the first sgl entry, provided -* they are a multiple of PAGE_SIZE. +* Init values for the current sgl entry. hvpfns_to_add +* is in units of Hyper-V size pages. Handling the +* PAGE_SIZE != HV_HYP_PAGE_SIZE case also handles +* values of sgl->offset that are larger than PAGE_SIZE. +* Such offsets are handled even on other than the first +* sgl entry, provided they are a multiple of PAGE_SIZE. */ - hvpgoff = HVPFN_DOWN(sgl->offset); - hvpfn = page_to_hvpfn(sg_page(sgl)) + hvpgoff; - hvpfns_to_add = HVPFN_UP(sgl->offset + sgl->length) - - hvpgoff; + hvpfn = HVPFN_DOWN(sg_dma_address(sg)); + hvpfns_to_add = HVPFN_UP(sg_dma_address(sg) + +sg_dma_len(sg)) - hvpfn; /* * Fill the next portion of the PFN array with @@ -1872,7 +1876,7 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd) * the PFN array is filled. */ while (hvpfns_to_add--) - payload->range.pfn_array[i++] = hvpfn++; + payload->range.pfn_array[i++] = hvpfn++; } } @@ -2016,6
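The PFN arithmetic in the rewritten loop above can be checked in isolation: for each DMA-mapped scatterlist entry, the patch derives the first Hyper-V PFN and the PFN count from sg_dma_address()/sg_dma_len(). A self-contained model, with plain integers standing in for the scatterlist entry:

```c
#include <assert.h>

#define HV_HYP_PAGE_SHIFT	12
#define HV_HYP_PAGE_SIZE	(1UL << HV_HYP_PAGE_SHIFT)
#define HVPFN_DOWN(x)	((x) >> HV_HYP_PAGE_SHIFT)
#define HVPFN_UP(x)	(((x) + HV_HYP_PAGE_SIZE - 1) >> HV_HYP_PAGE_SHIFT)

/*
 * Model of the patch's inner loop for one sg entry: compute the first
 * Hyper-V PFN and how many PFNs the [dma_addr, dma_addr + len) range
 * spans, then append them to pfn_array.  Returns the number appended.
 */
static unsigned long fill_pfn_array(unsigned long dma_addr, unsigned long len,
				    unsigned long *pfn_array)
{
	unsigned long hvpfn = HVPFN_DOWN(dma_addr);
	unsigned long hvpfns_to_add = HVPFN_UP(dma_addr + len) - hvpfn;
	unsigned long i;

	for (i = 0; i < hvpfns_to_add; i++)
		pfn_array[i] = hvpfn + i;
	return hvpfns_to_add;
}
```

Note how an entry that starts partway into a Hyper-V page spans one more PFN than len / HV_HYP_PAGE_SIZE would suggest, which is why the driver sets dma_min_align_mask to HV_HYP_PAGE_SIZE - 1 to preserve the original offset through the swiotlb bounce buffer.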
[PATCH 4/5] net: netvsc: Add Isolation VM support for netvsc driver
From: Tianyu Lan In an Isolation VM, all memory shared with the host needs to be marked visible to the host via hvcall. vmbus_establish_gpadl() has already done this for the netvsc rx/tx ring buffers. The page buffer used by vmbus_sendpacket_pagebuffer() still needs to be handled. Use the DMA API to map/unmap this memory while sending/receiving packets; the Hyper-V swiotlb bounce buffer dma address will be returned. The swiotlb bounce buffer has been marked visible to the host during boot-up. Allocate the rx/tx ring buffers via dma_alloc_noncontiguous() in an Isolation VM. After calling vmbus_establish_gpadl(), which marks these pages visible to the host, map these pages into unencrypted address space via dma_vmap_noncontiguous(). Signed-off-by: Tianyu Lan --- drivers/net/hyperv/hyperv_net.h | 5 + drivers/net/hyperv/netvsc.c | 192 +++--- drivers/net/hyperv/rndis_filter.c | 2 + include/linux/hyperv.h| 6 + 4 files changed, 190 insertions(+), 15 deletions(-) diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h index 315278a7cf88..31c77a00d01e 100644 --- a/drivers/net/hyperv/hyperv_net.h +++ b/drivers/net/hyperv/hyperv_net.h @@ -164,6 +164,7 @@ struct hv_netvsc_packet { u32 total_bytes; u32 send_buf_index; u32 total_data_buflen; + struct hv_dma_range *dma_range; }; #define NETVSC_HASH_KEYLEN 40 @@ -1074,6 +1075,7 @@ struct netvsc_device { /* Receive buffer allocated by us but manages by NetVSP */ void *recv_buf; + struct sg_table *recv_sgt; u32 recv_buf_size; /* allocated bytes */ struct vmbus_gpadl recv_buf_gpadl_handle; u32 recv_section_cnt; @@ -1082,6 +1084,7 @@ struct netvsc_device { /* Send buffer allocated by us */ void *send_buf; + struct sg_table *send_sgt; u32 send_buf_size; struct vmbus_gpadl send_buf_gpadl_handle; u32 send_section_cnt; @@ -1731,4 +1734,6 @@ struct rndis_message { #define RETRY_US_HI1 #define RETRY_MAX 2000/* >10 sec */ +void netvsc_dma_unmap(struct hv_device *hv_dev, + struct hv_netvsc_packet *packet); #endif /* _HYPERV_NET_H */ diff --git 
a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c index 396bc1c204e6..9cdc71930830 100644 --- a/drivers/net/hyperv/netvsc.c +++ b/drivers/net/hyperv/netvsc.c @@ -20,6 +20,7 @@ #include #include #include +#include #include #include @@ -146,15 +147,39 @@ static struct netvsc_device *alloc_net_device(void) return net_device; } +static struct hv_device *netvsc_channel_to_device(struct vmbus_channel *channel) +{ + struct vmbus_channel *primary = channel->primary_channel; + + return primary ? primary->device_obj : channel->device_obj; +} + static void free_netvsc_device(struct rcu_head *head) { struct netvsc_device *nvdev = container_of(head, struct netvsc_device, rcu); + struct hv_device *dev = + netvsc_channel_to_device(nvdev->chan_table[0].channel); int i; kfree(nvdev->extension); - vfree(nvdev->recv_buf); - vfree(nvdev->send_buf); + + if (nvdev->recv_sgt) { + dma_vunmap_noncontiguous(>device, nvdev->recv_buf); + dma_free_noncontiguous(>device, nvdev->recv_buf_size, + nvdev->recv_sgt, DMA_FROM_DEVICE); + } else { + vfree(nvdev->recv_buf); + } + + if (nvdev->send_sgt) { + dma_vunmap_noncontiguous(>device, nvdev->send_buf); + dma_free_noncontiguous(>device, nvdev->send_buf_size, + nvdev->send_sgt, DMA_TO_DEVICE); + } else { + vfree(nvdev->send_buf); + } + kfree(nvdev->send_section_map); for (i = 0; i < VRSS_CHANNEL_MAX; i++) { @@ -348,7 +373,21 @@ static int netvsc_init_buf(struct hv_device *device, buf_size = min_t(unsigned int, buf_size, NETVSC_RECEIVE_BUFFER_SIZE_LEGACY); - net_device->recv_buf = vzalloc(buf_size); + if (hv_isolation_type_snp()) { + net_device->recv_sgt = + dma_alloc_noncontiguous(>device, buf_size, + DMA_FROM_DEVICE, GFP_KERNEL, 0); + if (!net_device->recv_sgt) { + pr_err("Fail to allocate recv buffer buf_size %d.\n.", buf_size); + ret = -ENOMEM; + goto cleanup; + } + + net_device->recv_buf = (void *)net_device->recv_sgt->sgl->dma_address; + } else { + net_device->recv_buf = vzalloc(buf_size); + } + if (!net_device->recv_buf) { 
netdev_err(ndev, "unable to allocate receive buffer of size %u\n", @@ -357,8 +396,6 @@ static int netvsc_init_buf(struct hv_device *device,
[PATCH 2/5] dma-mapping: Add vmap/vunmap_noncontiguous() callback in dma ops
From: Tianyu Lan Hyper-V netvsc driver needs to allocate noncontiguous DMA memory and remap it into unencrypted address space before sharing with host. Add vmap/vunmap_noncontiguous() callback and handle the remap in the Hyper-V dma ops callback. Signed-off-by: Tianyu Lan --- include/linux/dma-map-ops.h | 3 +++ kernel/dma/mapping.c| 18 ++ 2 files changed, 17 insertions(+), 4 deletions(-) diff --git a/include/linux/dma-map-ops.h b/include/linux/dma-map-ops.h index 0d5b06b3a4a6..f7b9958ca20a 100644 --- a/include/linux/dma-map-ops.h +++ b/include/linux/dma-map-ops.h @@ -27,6 +27,9 @@ struct dma_map_ops { unsigned long attrs); void (*free_noncontiguous)(struct device *dev, size_t size, struct sg_table *sgt, enum dma_data_direction dir); + void *(*vmap_noncontiguous)(struct device *dev, size_t size, + struct sg_table *sgt); + void (*vunmap_noncontiguous)(struct device *dev, void *addr); int (*mmap)(struct device *, struct vm_area_struct *, void *, dma_addr_t, size_t, unsigned long attrs); diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c index 9478eccd1c8e..7fd751d866cc 100644 --- a/kernel/dma/mapping.c +++ b/kernel/dma/mapping.c @@ -674,8 +674,14 @@ void *dma_vmap_noncontiguous(struct device *dev, size_t size, const struct dma_map_ops *ops = get_dma_ops(dev); unsigned long count = PAGE_ALIGN(size) >> PAGE_SHIFT; - if (ops && ops->alloc_noncontiguous) - return vmap(sgt_handle(sgt)->pages, count, VM_MAP, PAGE_KERNEL); + if (ops) { + if (ops->vmap_noncontiguous) + return ops->vmap_noncontiguous(dev, size, sgt); + else if (ops->alloc_noncontiguous) + return vmap(sgt_handle(sgt)->pages, count, VM_MAP, + PAGE_KERNEL); + } + return page_address(sg_page(sgt->sgl)); } EXPORT_SYMBOL_GPL(dma_vmap_noncontiguous); @@ -684,8 +690,12 @@ void dma_vunmap_noncontiguous(struct device *dev, void *vaddr) { const struct dma_map_ops *ops = get_dma_ops(dev); - if (ops && ops->alloc_noncontiguous) - vunmap(vaddr); + if (ops) { + if (ops->vunmap_noncontiguous) + 
ops->vunmap_noncontiguous(dev, vaddr); + else if (ops->alloc_noncontiguous) + vunmap(vaddr); + } } EXPORT_SYMBOL_GPL(dma_vunmap_noncontiguous); -- 2.25.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 1/5] x86/Swiotlb: Add Swiotlb bounce buffer remap function for HV IVM
From: Tianyu Lan In an Isolation VM with AMD SEV, the bounce buffer needs to be accessed via an extra address space above the shared_gpa_boundary (e.g. the 39-bit address line) reported by the Hyper-V ISOLATION_CONFIG CPUID leaf. The access physical address will be the original physical address + shared_gpa_boundary. In the AMD SEV-SNP spec, the shared_gpa_boundary is called the virtual top of memory (vTOM). Memory addresses below vTOM are automatically treated as private, while memory above vTOM is treated as shared. Expose swiotlb_unencrypted_base for platforms to set the unencrypted memory base offset; the platform calls swiotlb_update_mem_attributes() to remap the swiotlb memory into the unencrypted address space. memremap() cannot be called at the early stage, so the remapping code goes into swiotlb_update_mem_attributes(). Store the remap address and use it to copy data from/to the swiotlb bounce buffer. Signed-off-by: Tianyu Lan --- include/linux/swiotlb.h | 6 kernel/dma/swiotlb.c| 75 - 2 files changed, 73 insertions(+), 8 deletions(-) diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h index 569272871375..09a140d617fa 100644 --- a/include/linux/swiotlb.h +++ b/include/linux/swiotlb.h @@ -73,6 +73,9 @@ extern enum swiotlb_force swiotlb_force; * @end: The end address of the swiotlb memory pool. Used to do a quick * range check to see if the memory was in fact allocated by this * API. + * @vaddr: The vaddr of the swiotlb memory pool. The swiotlb + * memory pool may be remapped in the memory encrypted case and store + * virtual address for bounce buffer operation. * @nslabs:The number of IO TLB blocks (in groups of 64) between @start and * @end. For default swiotlb, this is command line adjustable via * setup_io_tlb_npages. 
@@ -92,6 +95,7 @@ extern enum swiotlb_force swiotlb_force; struct io_tlb_mem { phys_addr_t start; phys_addr_t end; + void *vaddr; unsigned long nslabs; unsigned long used; unsigned int index; @@ -186,4 +190,6 @@ static inline bool is_swiotlb_for_alloc(struct device *dev) } #endif /* CONFIG_DMA_RESTRICTED_POOL */ +extern phys_addr_t swiotlb_unencrypted_base; + #endif /* __LINUX_SWIOTLB_H */ diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c index 8e840fbbed7c..4735c5e0f44d 100644 --- a/kernel/dma/swiotlb.c +++ b/kernel/dma/swiotlb.c @@ -50,6 +50,7 @@ #include #include +#include #include #include #include @@ -72,6 +73,8 @@ enum swiotlb_force swiotlb_force; struct io_tlb_mem io_tlb_default_mem; +phys_addr_t swiotlb_unencrypted_base; + /* * Max segment that we can provide which (if pages are contingous) will * not be bounced (unless SWIOTLB_FORCE is set). @@ -155,6 +158,31 @@ static inline unsigned long nr_slots(u64 val) return DIV_ROUND_UP(val, IO_TLB_SIZE); } +/* + * Remap swioltb memory in the unencrypted physical address space + * when swiotlb_unencrypted_base is set. (e.g. for Hyper-V AMD SEV-SNP + * Isolation VMs). + */ +void *swiotlb_mem_remap(struct io_tlb_mem *mem, unsigned long bytes) +{ + void *vaddr; + + if (swiotlb_unencrypted_base) { + phys_addr_t paddr = mem->start + swiotlb_unencrypted_base; + + vaddr = memremap(paddr, bytes, MEMREMAP_WB); + if (!vaddr) { + pr_err("Failed to map the unencrypted memory %llx size %lx.\n", + paddr, bytes); + return NULL; + } + + return vaddr; + } + + return phys_to_virt(mem->start); +} + /* * Early SWIOTLB allocation may be too early to allow an architecture to * perform the desired operations. 
This function allows the architecture to @@ -172,10 +200,17 @@ void __init swiotlb_update_mem_attributes(void) vaddr = phys_to_virt(mem->start); bytes = PAGE_ALIGN(mem->nslabs << IO_TLB_SHIFT); set_memory_decrypted((unsigned long)vaddr, bytes >> PAGE_SHIFT); - memset(vaddr, 0, bytes); + + mem->vaddr = swiotlb_mem_remap(mem, bytes); + if (!mem->vaddr) { + pr_err("Fail to remap swiotlb mem.\n"); + return; + } + + memset(mem->vaddr, 0, bytes); } -static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start, +static int swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start, unsigned long nslabs, bool late_alloc) { void *vaddr = phys_to_virt(start); @@ -196,13 +231,28 @@ static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start, mem->slots[i].orig_addr = INVALID_PHYS_ADDR; mem->slots[i].alloc_size = 0; } + + /* +* With swiotlb_unencrypted_base setting, swiotlb bounce buffer will +* be remapped in the swiotlb_update_mem_attributes() and return
[PATCH 3/5] hyperv/IOMMU: Enable swiotlb bounce buffer for Isolation VM
From: Tianyu Lan A Hyper-V Isolation VM requires bounce buffer support to copy data from/to encrypted memory, so enable swiotlb force mode to use the swiotlb bounce buffer for DMA transactions. In an Isolation VM with AMD SEV, the bounce buffer needs to be accessed via an extra address space above the shared_gpa_boundary (e.g. the 39-bit address line) reported by the Hyper-V ISOLATION_CONFIG CPUID leaf. The access physical address will be the original physical address + shared_gpa_boundary. In the AMD SEV-SNP spec, the shared_gpa_boundary is called the virtual top of memory (vTOM). Memory addresses below vTOM are automatically treated as private, while memory above vTOM is treated as shared. Hyper-V initializes the swiotlb bounce buffer, and the default swiotlb needs to be disabled; pci_swiotlb_detect_override() and pci_swiotlb_detect_4gb() enable the default one. To override that setting, hyperv_swiotlb_detect() needs to run before these detect functions, which depend on pci_xen_swiotlb_init(). Make pci_xen_swiotlb_init() depend on hyperv_swiotlb_detect() to keep the order. The swiotlb bounce buffer code calls set_memory_decrypted() to mark the bounce buffer visible to the host and maps it in the extra address space via memremap(). Populate the shared_gpa_boundary (vTOM) via the swiotlb_unencrypted_base variable. memremap() can't work as early as hyperv_iommu_swiotlb_init(), so call swiotlb_update_mem_attributes() in hyperv_iommu_swiotlb_later_init(). Add Hyper-V dma ops and provide alloc/free and vmap/vunmap noncontiguous callbacks to handle requests to allocate and map noncontiguous dma memory in vmbus device drivers; the netvsc driver will use this. Set the dma_ops_bypass flag for hv devices so that dma direct functions are used when mapping/unmapping dma pages. 
Signed-off-by: Tianyu Lan --- arch/x86/mm/mem_encrypt.c | 4 +- arch/x86/xen/pci-swiotlb-xen.c | 3 +- drivers/hv/Kconfig | 1 + drivers/hv/vmbus_drv.c | 6 ++ drivers/iommu/hyperv-iommu.c | 164 + include/linux/hyperv.h | 10 ++ 6 files changed, 186 insertions(+), 2 deletions(-) diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c index 35487305d8af..65bc385ae07a 100644 --- a/arch/x86/mm/mem_encrypt.c +++ b/arch/x86/mm/mem_encrypt.c @@ -31,6 +31,7 @@ #include #include #include +#include #include "mm_internal.h" @@ -203,7 +204,8 @@ void __init sev_setup_arch(void) phys_addr_t total_mem = memblock_phys_mem_size(); unsigned long size; - if (!cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT)) + if (!cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT) + && !hv_is_isolation_supported()) return; /* diff --git a/arch/x86/xen/pci-swiotlb-xen.c b/arch/x86/xen/pci-swiotlb-xen.c index 46df59aeaa06..30fd0600b008 100644 --- a/arch/x86/xen/pci-swiotlb-xen.c +++ b/arch/x86/xen/pci-swiotlb-xen.c @@ -4,6 +4,7 @@ #include #include +#include #include #include @@ -91,6 +92,6 @@ int pci_xen_swiotlb_init_late(void) EXPORT_SYMBOL_GPL(pci_xen_swiotlb_init_late); IOMMU_INIT_FINISH(pci_xen_swiotlb_detect, - NULL, + hyperv_swiotlb_detect, pci_xen_swiotlb_init, NULL); diff --git a/drivers/hv/Kconfig b/drivers/hv/Kconfig index dd12af20e467..d43b4cd88f57 100644 --- a/drivers/hv/Kconfig +++ b/drivers/hv/Kconfig @@ -9,6 +9,7 @@ config HYPERV select PARAVIRT select X86_HV_CALLBACK_VECTOR if X86 select VMAP_PFN + select DMA_OPS_BYPASS help Select this option to run Linux as a Hyper-V client operating system. 
diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c index 392c1ac4f819..32dc193e31cd 100644 --- a/drivers/hv/vmbus_drv.c +++ b/drivers/hv/vmbus_drv.c @@ -33,6 +33,7 @@ #include #include #include +#include #include #include "hyperv_vmbus.h" @@ -2078,6 +2079,7 @@ struct hv_device *vmbus_device_create(const guid_t *type, return child_device_obj; } +static u64 vmbus_dma_mask = DMA_BIT_MASK(64); /* * vmbus_device_register - Register the child device */ @@ -2118,6 +2120,10 @@ int vmbus_device_register(struct hv_device *child_device_obj) } hv_debug_add_dev_dir(child_device_obj); + child_device_obj->device.dma_ops_bypass = true; + child_device_obj->device.dma_ops = _iommu_dma_ops; + child_device_obj->device.dma_mask = _dma_mask; + child_device_obj->device.dma_parms = _device_obj->dma_parms; return 0; err_kset_unregister: diff --git a/drivers/iommu/hyperv-iommu.c b/drivers/iommu/hyperv-iommu.c index e285a220c913..ebcb628e7e8f 100644 --- a/drivers/iommu/hyperv-iommu.c +++ b/drivers/iommu/hyperv-iommu.c @@ -13,14 +13,21 @@ #include #include #include +#include +#include #include #include #include #include +#include +#include #include #include #include +#include +#include +#include #include
[PATCH 0/5] x86/Hyper-V: Add Hyper-V Isolation VM support(Second part)
From: Tianyu Lan Hyper-V provides two kinds of Isolation VMs: VBS (Virtualization-Based Security) and AMD SEV-SNP unenlightened Isolation VMs. This patchset adds support for these Isolation VMs in Linux. The memory of these VMs is encrypted, and the host can't access guest memory directly. Hyper-V provides a new host-visibility hvcall, and the guest needs to call it to mark memory visible to the host before sharing that memory. For security, no network/storage stack memory should be shared with the host, hence the need for bounce buffers. The vmbus channel ring buffer already plays the bounce buffer role, because all data from/to the host is copied between the ring buffer and IO stack memory; so mark the vmbus channel ring buffer visible. In an SNP Isolation VM, the guest needs to access the shared memory via an extra address space, which is specified by the Hyper-V HYPERV_CPUID_ISOLATION_CONFIG CPUID leaf. The access physical address of the shared memory is the bounce buffer memory GPA plus the shared_gpa_boundary reported by CPUID. This patchset enables the swiotlb bounce buffer for netvsc/storvsc in Isolation VMs: add Hyper-V dma ops providing dma_alloc/free_noncontiguous and vmap/vunmap_noncontiguous callbacks, allocate the rx/tx rings via dma_alloc_noncontiguous(), and map them into the extra address space via dma_vmap_noncontiguous(). 
Tianyu Lan (5): x86/Swiotlb: Add Swiotlb bounce buffer remap function for HV IVM dma-mapping: Add vmap/vunmap_noncontiguous() callback in dma ops hyperv/IOMMU: Enable swiotlb bounce buffer for Isolation VM net: netvsc: Add Isolation VM support for netvsc driver scsi: storvsc: Add Isolation VM support for storvsc driver arch/x86/mm/mem_encrypt.c | 4 +- arch/x86/xen/pci-swiotlb-xen.c| 3 +- drivers/hv/Kconfig| 1 + drivers/hv/vmbus_drv.c| 6 + drivers/iommu/hyperv-iommu.c | 164 + drivers/net/hyperv/hyperv_net.h | 5 + drivers/net/hyperv/netvsc.c | 192 +++--- drivers/net/hyperv/rndis_filter.c | 2 + drivers/scsi/storvsc_drv.c| 37 +++--- include/linux/dma-map-ops.h | 3 + include/linux/hyperv.h| 17 +++ include/linux/swiotlb.h | 6 + kernel/dma/mapping.c | 18 ++- kernel/dma/swiotlb.c | 75 ++-- 14 files changed, 488 insertions(+), 45 deletions(-) -- 2.25.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 0/5] iommu: Some IOVA code reorganisation
On 2021-11-16 14:21, John Garry wrote: On 04/10/2021 12:44, Will Deacon wrote: On Fri, Sep 24, 2021 at 06:01:52PM +0800, John Garry wrote: The IOVA domain structure is a bit overloaded, holding: - IOVA tree management - FQ control - IOVA rcache memories Indeed only a couple of IOVA users use the rcache, and only dma-iommu.c uses the FQ feature. This series separates out that structure. In addition, it moves the FQ code into dma-iommu.c . This is not strictly necessary, but it does make it easier for the FQ domain lookup the rcache domain. The rcache code stays where it is, as it may be reworked in future, so there is not much point in relocating and then discarding. This topic was initially discussed and suggested (I think) by Robin here: https://lore.kernel.org/linux-iommu/1d06eda1-9961-d023-f5e7-fe87e768f...@arm.com/ It would be useful to have Robin's Ack on patches 2-4. The implementation looks straightforward to me, but the thread above isn't very clear about what is being suggested. Hi Robin, Just wondering if you had made any progress on your FQ code rework or your own re-org? Hey John - as it happens I started hacking on that in earnest about half an hour ago, aiming to get something out later this week. Cheers, Robin. I wasn't planning on progressing https://lore.kernel.org/linux-iommu/1626259003-201303-1-git-send-email-john.ga...@huawei.com/ until this is done first (and that is still a big issue), even though not strictly necessary. Thanks, John ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 0/5] iommu: Some IOVA code reorganisation
On 04/10/2021 12:44, Will Deacon wrote: On Fri, Sep 24, 2021 at 06:01:52PM +0800, John Garry wrote: The IOVA domain structure is a bit overloaded, holding: - IOVA tree management - FQ control - IOVA rcache memories Indeed only a couple of IOVA users use the rcache, and only dma-iommu.c uses the FQ feature. This series separates out that structure. In addition, it moves the FQ code into dma-iommu.c . This is not strictly necessary, but it does make it easier for the FQ domain lookup the rcache domain. The rcache code stays where it is, as it may be reworked in future, so there is not much point in relocating and then discarding. This topic was initially discussed and suggested (I think) by Robin here: https://lore.kernel.org/linux-iommu/1d06eda1-9961-d023-f5e7-fe87e768f...@arm.com/ It would be useful to have Robin's Ack on patches 2-4. The implementation looks straightforward to me, but the thread above isn't very clear about what is being suggested. Hi Robin, Just wondering if you had made any progress on your FQ code rework or your own re-org? I wasn't planning on progressing https://lore.kernel.org/linux-iommu/1626259003-201303-1-git-send-email-john.ga...@huawei.com/ until this is done first (and that is still a big issue), even though not strictly necessary. Thanks, John ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 1/2] dt-bindings: Add Arm SMMUv3 PMCG binding
On Tue, 16 Nov 2021 11:35:36 +, Jean-Philippe Brucker wrote: > Add binding for the Arm SMMUv3 PMU. Each node represents a PMCG, and is > placed as a sibling node of the SMMU. Although the PMCGs registers may > be within the SMMU MMIO region, they are separate devices, and there can > be multiple PMCG devices for each SMMU (for example one for the TCU and > one for each TBU). > > Signed-off-by: Jean-Philippe Brucker > --- > .../bindings/iommu/arm,smmu-v3-pmcg.yaml | 67 +++ > 1 file changed, 67 insertions(+) > create mode 100644 > Documentation/devicetree/bindings/iommu/arm,smmu-v3-pmcg.yaml > My bot found errors running 'make DT_CHECKER_FLAGS=-m dt_binding_check' on your patch (DT_CHECKER_FLAGS is new in v5.13): yamllint warnings/errors: ./Documentation/devicetree/bindings/iommu/arm,smmu-v3-pmcg.yaml:24:9: [warning] wrong indentation: expected 10 but found 8 (indentation) ./Documentation/devicetree/bindings/iommu/arm,smmu-v3-pmcg.yaml:25:11: [warning] wrong indentation: expected 12 but found 10 (indentation) dtschema/dtc warnings/errors: /builds/robherring/linux-dt-review/Documentation/devicetree/bindings/iommu/arm,smmu-v3-pmcg.example.dt.yaml: example-0: pmu@2b42:reg:0: [0, 725745664, 0, 4096] is too long From schema: /usr/local/lib/python3.8/dist-packages/dtschema/schemas/reg.yaml /builds/robherring/linux-dt-review/Documentation/devicetree/bindings/iommu/arm,smmu-v3-pmcg.example.dt.yaml: example-0: pmu@2b42:reg:1: [0, 725811200, 0, 4096] is too long From schema: /usr/local/lib/python3.8/dist-packages/dtschema/schemas/reg.yaml /builds/robherring/linux-dt-review/Documentation/devicetree/bindings/iommu/arm,smmu-v3-pmcg.example.dt.yaml: example-0: pmu@2b44:reg:0: [0, 725876736, 0, 4096] is too long From schema: /usr/local/lib/python3.8/dist-packages/dtschema/schemas/reg.yaml /builds/robherring/linux-dt-review/Documentation/devicetree/bindings/iommu/arm,smmu-v3-pmcg.example.dt.yaml: example-0: pmu@2b44:reg:1: [0, 725942272, 0, 4096] is too long From schema: 
/usr/local/lib/python3.8/dist-packages/dtschema/schemas/reg.yaml doc reference errors (make refcheckdocs): See https://patchwork.ozlabs.org/patch/1555758 This check can fail if there are any dependencies. The base for a patch series is generally the most recent rc1. If you already ran 'make dt_binding_check' and didn't see the above error(s), then make sure 'yamllint' is installed and dt-schema is up to date: pip3 install dtschema --upgrade Please check and re-submit. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 01/11] iommu: Add device dma ownership set/release interfaces
On Tue, Nov 16, 2021 at 09:57:30AM +0800, Lu Baolu wrote: > Hi Christoph, > > On 11/15/21 9:14 PM, Christoph Hellwig wrote: > > On Mon, Nov 15, 2021 at 10:05:42AM +0800, Lu Baolu wrote: > > > +enum iommu_dma_owner { > > > + DMA_OWNER_NONE, > > > + DMA_OWNER_KERNEL, > > > + DMA_OWNER_USER, > > > +}; > > > + > > > > > + enum iommu_dma_owner dma_owner; > > > + refcount_t owner_cnt; > > > + struct file *owner_user_file; > > > > I'd just overload the ownership into owner_user_file, > > > > NULL -> no owner > > (struct file *)1UL) -> kernel > > real pointer -> user > > > > Which could simplify a lot of the code dealing with the owner. > > > > Yeah! Sounds reasonable. I will make this in the next version. It would be good to figure out how to make iommu_attach_device() enforce no other driver binding as a kernel user without a file *, as Robin pointed to, before optimizing this. This fixes an existing bug where iommu_attach_device() only checks the group size and is vunerable to a hot plug increasing the group size after it returns. That check should be replaced by this series's logic instead. Jason ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 2/2] perf/smmuv3: Add devicetree support
On 16/11/2021 11:35, Jean-Philippe Brucker wrote: Add device-tree support to the SMMUv3 PMCG. One small cosmetic change while factoring the option mask printout: don't display it when zero, it only contains one erratum at the moment. Signed-off-by: Jay Chen Signed-off-by: Jean-Philippe Brucker --- drivers/perf/arm_smmuv3_pmu.c | 25 +++-- 1 file changed, 23 insertions(+), 2 deletions(-) diff --git a/drivers/perf/arm_smmuv3_pmu.c b/drivers/perf/arm_smmuv3_pmu.c index 226348822ab3..958325ac103a 100644 --- a/drivers/perf/arm_smmuv3_pmu.c +++ b/drivers/perf/arm_smmuv3_pmu.c @@ -47,6 +47,7 @@ #include #include #include +#include #include #include #include @@ -750,8 +751,15 @@ static void smmu_pmu_get_acpi_options(struct smmu_pmu *smmu_pmu) smmu_pmu->options |= SMMU_PMCG_EVCNTR_RDONLY; break; } +} + +static void smmu_pmu_get_of_options(struct smmu_pmu *smmu_pmu) +{ + struct device_node *node = smmu_pmu->dev->of_node; - dev_notice(smmu_pmu->dev, "option mask 0x%x\n", smmu_pmu->options); + if (of_device_is_compatible(node, "hisilicon,smmu-v3-pmcg-hip08")) I don't think that this is necessary. We don't support DT for hip08, nor have any plans to. Incidentally, was this binding missing in your series? 
Thanks, John + /* HiSilicon Erratum 162001800 */ + smmu_pmu->options |= SMMU_PMCG_EVCNTR_RDONLY; } static int smmu_pmu_probe(struct platform_device *pdev) @@ -834,7 +842,13 @@ static int smmu_pmu_probe(struct platform_device *pdev) return -EINVAL; } - smmu_pmu_get_acpi_options(smmu_pmu); + if (dev->of_node) + smmu_pmu_get_of_options(smmu_pmu); + else + smmu_pmu_get_acpi_options(smmu_pmu); + + if (smmu_pmu->options) + dev_notice(dev, "option mask 0x%x\n", smmu_pmu->options); /* Pick one CPU to be the preferred one to use */ smmu_pmu->on_cpu = raw_smp_processor_id(); @@ -884,10 +898,17 @@ static void smmu_pmu_shutdown(struct platform_device *pdev) smmu_pmu_disable(_pmu->pmu); } +static const struct of_device_id arm_smmu_pmu_match[] = { + { .compatible = "arm,smmu-v3-pmcg" }, + {}, +}; +MODULE_DEVICE_TABLE(of, arm_smmu_pmu_match); + static struct platform_driver smmu_pmu_driver = { .driver = { .name = "arm-smmu-v3-pmcg", .suppress_bind_attrs = true, + .of_match_table = of_match_ptr(arm_smmu_pmu_match), }, .probe = smmu_pmu_probe, .remove = smmu_pmu_remove, ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 0/2] perf/smmuv3: Support devicetree
On 2021-11-16 11:35, Jean-Philippe Brucker wrote: Add devicetree binding for the SMMUv3 PMU, called Performance Monitoring Counter Group (PMCG) in the spec. Each SMMUv3 implementation can have multiple independent PMCGs, for example one for the Translation Control Unit (TCU) and one per Translation Buffer Unit (TBU). I previously sent the binding as reply to Jay Chen's thread implementing device tree support [1]. This posting addresses the comments from that thread. Ha, I'd also resurrected this and was planning to post it at some point this week[0] - you should have said :) Patch 1 adds two compatible strings. "arm,smmu-v3-pmcg" is common to all PMCGs. "hisilicon,smmu-v3-pmcg-hip08" allows to support the same quirk as IORT for that implementation (see patch 2). We'll probably want to also introduce compatible strings for each implementation that has additional perf events. For example the MMU-600 implementation has different events for TCU and TBU PMCGs [2], but both components have the same device IDs. So the driver could differentiate them if they had two distinct compatible strings such as "arm,mmu-600-pmcg-tbu" and "arm,mmu-600-pmcg-tcu". Actually it only needs a general MMU-600 compatible, since once you know it's an Arm Ltd. implementation, you can assume the pattern for the IMP_DEF ID registers to figure out the rest. Robin. [0] https://gitlab.arm.com/linux-arm/linux-rm/-/commits/smmu-pmcg The series doesn't deal with this because for testing I use a software model which only implements architected events. I do not include DTS change for that platform because enabling PMCGs requires an additional model option. See my branch smmu/pmu-dt [3] for details. 
[1] https://lore.kernel.org/all/20200707150114.GC159413@myrica/ [2] https://developer.arm.com/documentation/100310/0202/Functional-description/Operation/Performance-Monitoring-Unit [3] https://jpbrucker.net/git/linux/log/?h=smmu/pmu-dt Jean-Philippe Brucker (2): dt-bindings: Add Arm SMMUv3 PMCG binding perf/smmuv3: Add devicetree support .../bindings/iommu/arm,smmu-v3-pmcg.yaml | 67 +++ drivers/perf/arm_smmuv3_pmu.c | 25 ++- 2 files changed, 90 insertions(+), 2 deletions(-) create mode 100644 Documentation/devicetree/bindings/iommu/arm,smmu-v3-pmcg.yaml ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 1/2] dt-bindings: Add Arm SMMUv3 PMCG binding
Add binding for the Arm SMMUv3 PMU. Each node represents a PMCG, and is placed as a sibling node of the SMMU. Although the PMCGs registers may be within the SMMU MMIO region, they are separate devices, and there can be multiple PMCG devices for each SMMU (for example one for the TCU and one for each TBU). Signed-off-by: Jean-Philippe Brucker --- .../bindings/iommu/arm,smmu-v3-pmcg.yaml | 67 +++ 1 file changed, 67 insertions(+) create mode 100644 Documentation/devicetree/bindings/iommu/arm,smmu-v3-pmcg.yaml diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu-v3-pmcg.yaml b/Documentation/devicetree/bindings/iommu/arm,smmu-v3-pmcg.yaml new file mode 100644 index ..a893e071fdb4 --- /dev/null +++ b/Documentation/devicetree/bindings/iommu/arm,smmu-v3-pmcg.yaml @@ -0,0 +1,67 @@ +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/iommu/arm,smmu-v3-pmcg.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: Arm SMMUv3 Performance Monitor Counter Group + +maintainers: + - Will Deacon + - Robin Murphy + +description: |+ + An SMMUv3 may have several Performance Monitor Counter Group (PMCG). + They are standalone performance monitoring units that support both + architected and IMPLEMENTATION DEFINED event counters. + +properties: + $nodename: +pattern: "^pmu@[0-9a-f]*" + compatible: +oneOf: + - items: +- enum: + - hisilicon,smmu-v3-pmcg-hip08 +- const: arm,smmu-v3-pmcg + - const: arm,smmu-v3-pmcg + + reg: +description: | + Base addresses of the PMCG registers. Either a single address for Page 0 + or an additional address for Page 1, where some registers can be + relocated with SMMU_PMCG_CFGR.RELOC_CTRS. 
+minItems: 1 +maxItems: 2 + + interrupts: +maxItems: 1 + + msi-parent: true + +required: + - compatible + - reg + +additionalProperties: false + +examples: + - |+ +#include +#include + +pmu@2b42 { +compatible = "arm,smmu-v3-pmcg"; +reg = <0 0x2b42 0 0x1000>, + <0 0x2b43 0 0x1000>; +interrupts = ; +msi-parent = < 0xff>; +}; + +pmu@2b44 { +compatible = "arm,smmu-v3-pmcg"; +reg = <0 0x2b44 0 0x1000>, + <0 0x2b45 0 0x1000>; +interrupts = ; +msi-parent = < 0xff>; +}; -- 2.33.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 0/2] perf/smmuv3: Support devicetree
Add devicetree binding for the SMMUv3 PMU, called Performance Monitoring Counter Group (PMCG) in the spec. Each SMMUv3 implementation can have multiple independent PMCGs, for example one for the Translation Control Unit (TCU) and one per Translation Buffer Unit (TBU). I previously sent the binding as reply to Jay Chen's thread implementing device tree support [1]. This posting addresses the comments from that thread. Patch 1 adds two compatible strings. "arm,smmu-v3-pmcg" is common to all PMCGs. "hisilicon,smmu-v3-pmcg-hip08" allows to support the same quirk as IORT for that implementation (see patch 2). We'll probably want to also introduce compatible strings for each implementation that has additional perf events. For example the MMU-600 implementation has different events for TCU and TBU PMCGs [2], but both components have the same device IDs. So the driver could differentiate them if they had two distinct compatible strings such as "arm,mmu-600-pmcg-tbu" and "arm,mmu-600-pmcg-tcu". The series doesn't deal with this because for testing I use a software model which only implements architected events. I do not include DTS change for that platform because enabling PMCGs requires an additional model option. See my branch smmu/pmu-dt [3] for details. [1] https://lore.kernel.org/all/20200707150114.GC159413@myrica/ [2] https://developer.arm.com/documentation/100310/0202/Functional-description/Operation/Performance-Monitoring-Unit [3] https://jpbrucker.net/git/linux/log/?h=smmu/pmu-dt Jean-Philippe Brucker (2): dt-bindings: Add Arm SMMUv3 PMCG binding perf/smmuv3: Add devicetree support .../bindings/iommu/arm,smmu-v3-pmcg.yaml | 67 +++ drivers/perf/arm_smmuv3_pmu.c | 25 ++- 2 files changed, 90 insertions(+), 2 deletions(-) create mode 100644 Documentation/devicetree/bindings/iommu/arm,smmu-v3-pmcg.yaml -- 2.33.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 2/2] perf/smmuv3: Add devicetree support
Add device-tree support to the SMMUv3 PMCG. One small cosmetic change while factoring the option mask printout: don't display it when zero, it only contains one erratum at the moment. Signed-off-by: Jay Chen Signed-off-by: Jean-Philippe Brucker --- drivers/perf/arm_smmuv3_pmu.c | 25 +++-- 1 file changed, 23 insertions(+), 2 deletions(-) diff --git a/drivers/perf/arm_smmuv3_pmu.c b/drivers/perf/arm_smmuv3_pmu.c index 226348822ab3..958325ac103a 100644 --- a/drivers/perf/arm_smmuv3_pmu.c +++ b/drivers/perf/arm_smmuv3_pmu.c @@ -47,6 +47,7 @@ #include #include #include +#include #include #include #include @@ -750,8 +751,15 @@ static void smmu_pmu_get_acpi_options(struct smmu_pmu *smmu_pmu) smmu_pmu->options |= SMMU_PMCG_EVCNTR_RDONLY; break; } +} + +static void smmu_pmu_get_of_options(struct smmu_pmu *smmu_pmu) +{ + struct device_node *node = smmu_pmu->dev->of_node; - dev_notice(smmu_pmu->dev, "option mask 0x%x\n", smmu_pmu->options); + if (of_device_is_compatible(node, "hisilicon,smmu-v3-pmcg-hip08")) + /* HiSilicon Erratum 162001800 */ + smmu_pmu->options |= SMMU_PMCG_EVCNTR_RDONLY; } static int smmu_pmu_probe(struct platform_device *pdev) @@ -834,7 +842,13 @@ static int smmu_pmu_probe(struct platform_device *pdev) return -EINVAL; } - smmu_pmu_get_acpi_options(smmu_pmu); + if (dev->of_node) + smmu_pmu_get_of_options(smmu_pmu); + else + smmu_pmu_get_acpi_options(smmu_pmu); + + if (smmu_pmu->options) + dev_notice(dev, "option mask 0x%x\n", smmu_pmu->options); /* Pick one CPU to be the preferred one to use */ smmu_pmu->on_cpu = raw_smp_processor_id(); @@ -884,10 +898,17 @@ static void smmu_pmu_shutdown(struct platform_device *pdev) smmu_pmu_disable(_pmu->pmu); } +static const struct of_device_id arm_smmu_pmu_match[] = { + { .compatible = "arm,smmu-v3-pmcg" }, + {}, +}; +MODULE_DEVICE_TABLE(of, arm_smmu_pmu_match); + static struct platform_driver smmu_pmu_driver = { .driver = { .name = "arm-smmu-v3-pmcg", .suppress_bind_attrs = true, + .of_match_table = 
of_match_ptr(arm_smmu_pmu_match), }, .probe = smmu_pmu_probe, .remove = smmu_pmu_remove, -- 2.33.1
Re: [PATCH v2 2/6] hwtracing: Add trace function support for HiSilicon PCIe Tune and Trace device
On 2021/11/16 18:56, Robin Murphy wrote: > On 2021-11-16 09:06, Yicong Yang via iommu wrote: > [...] >> +/* >> + * Get RMR address if provided by the firmware. >> + * Return 0 if the IOMMU doesn't present or the policy of the >> + * IOMMU domain is passthrough or we get a usable RMR region. >> + * Otherwise a negative value is returned. >> + */ >> +static int hisi_ptt_get_rmr(struct hisi_ptt *hisi_ptt) >> +{ >> + struct pci_dev *pdev = hisi_ptt->pdev; >> + struct iommu_domain *iommu_domain; >> + struct iommu_resv_region *region; >> + LIST_HEAD(list); >> + >> + /* >> + * Use direct DMA if IOMMU does not present or the policy of the >> + * IOMMU domain is passthrough. >> + */ >> + iommu_domain = iommu_get_domain_for_dev(>dev); >> + if (!iommu_domain || iommu_domain->type == IOMMU_DOMAIN_IDENTITY) >> + return 0; >> + >> + iommu_get_resv_regions(>dev, ); >> + list_for_each_entry(region, , list) >> + if (region->type == IOMMU_RESV_DIRECT && >> + region->length >= HISI_PTT_TRACE_BUFFER_SIZE) { >> + hisi_ptt->trace_ctrl.has_rmr = true; >> + hisi_ptt->trace_ctrl.rmr_addr = region->start; >> + hisi_ptt->trace_ctrl.rmr_length = region->length; >> + break; >> + } >> + >> + iommu_put_resv_regions(>dev, ); >> + return hisi_ptt->trace_ctrl.has_rmr ? 0 : -ENOMEM; >> +} > > No. > > The whole point of RMRs is for devices that are already configured to access > the given address range in a manner beyond the kernel's control. If you can > do this, it proves that you should not have an RMR in the first place. > > The notion of a kernel driver explicitly configuring its device to DMA into > any random RMR that looks big enough is so egregiously wrong that I'm almost > lost for words... > our bios will reserve such a region and reported it through iort. the device will write to the region and in the driver we need to access the region to get the traced data. the region is reserved exclusively and will not be accessed by kernel or other devices. 
Is it OK to let the BIOS configure the address into the device, so that from the CPU side we just read it? Thanks, Yicong
Re: dma-direct fixes and cleanups v3
On 2021-11-11 06:50, Christoph Hellwig wrote: Hi all, Linus complained about the complex flow in dma_direct_alloc, so this tries to simplify it a bit, and while I was at it I also made sure that unencrypted pages never leak back into the page allocator. Before I forget, I've had a quick skim of the remaining patches and nothing more stands out. Let me know if you'd like me to find time to check everything over in detail again for a proper review, but otherwise I reckon we may as well get this baking in -next sooner rather than later. Cheers, Robin. Changes since v2: - don't call dma_set_decrypted on remapped memory - move the leak printk into dma_set_encrypted - add another local variable to clean up dma_direct_alloc - return NULL when the is no way to make the memory coherent Changes since v1: - fix a missing return - add a new patch to fix a pre-existing missing unmap - various additional cleanups Diffstat: direct.c | 234 +-- 1 file changed, 138 insertions(+), 96 deletions(-) ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v2 2/6] hwtracing: Add trace function support for HiSilicon PCIe Tune and Trace device
On 2021-11-16 09:06, Yicong Yang via iommu wrote: [...] +/* + * Get RMR address if provided by the firmware. + * Return 0 if the IOMMU doesn't present or the policy of the + * IOMMU domain is passthrough or we get a usable RMR region. + * Otherwise a negative value is returned. + */ +static int hisi_ptt_get_rmr(struct hisi_ptt *hisi_ptt) +{ + struct pci_dev *pdev = hisi_ptt->pdev; + struct iommu_domain *iommu_domain; + struct iommu_resv_region *region; + LIST_HEAD(list); + + /* +* Use direct DMA if IOMMU does not present or the policy of the +* IOMMU domain is passthrough. +*/ + iommu_domain = iommu_get_domain_for_dev(&pdev->dev); + if (!iommu_domain || iommu_domain->type == IOMMU_DOMAIN_IDENTITY) + return 0; + + iommu_get_resv_regions(&pdev->dev, &list); + list_for_each_entry(region, &list, list) + if (region->type == IOMMU_RESV_DIRECT && + region->length >= HISI_PTT_TRACE_BUFFER_SIZE) { + hisi_ptt->trace_ctrl.has_rmr = true; + hisi_ptt->trace_ctrl.rmr_addr = region->start; + hisi_ptt->trace_ctrl.rmr_length = region->length; + break; + } + + iommu_put_resv_regions(&pdev->dev, &list); + return hisi_ptt->trace_ctrl.has_rmr ? 0 : -ENOMEM; +} No. The whole point of RMRs is for devices that are already configured to access the given address range in a manner beyond the kernel's control. If you can do this, it proves that you should not have an RMR in the first place. The notion of a kernel driver explicitly configuring its device to DMA into any random RMR that looks big enough is so egregiously wrong that I'm almost lost for words... Robin.
Re: [PATCH 06/11] iommu: Expose group variants of dma ownership interfaces
Hi Christoph, On 2021/11/15 21:27, Christoph Hellwig wrote: On Mon, Nov 15, 2021 at 10:05:47AM +0800, Lu Baolu wrote: The vfio needs to set DMA_OWNER_USER for the entire group when attaching The vfio subsystem? driver? "vfio subsystem" it to a vfio container. So expose group variants of setting/releasing dma ownership for this purpose. This also exposes the helper iommu_group_dma_owner_unclaimed() for vfio report to userspace if the group is viable to user assignment, for .. for vfio to report .. ? Yes. void iommu_device_release_dma_owner(struct device *dev, enum iommu_dma_owner owner); +int iommu_group_set_dma_owner(struct iommu_group *group, enum iommu_dma_owner owner, + struct file *user_file); +void iommu_group_release_dma_owner(struct iommu_group *group, enum iommu_dma_owner owner); Pleae avoid all these overly long lines. Sure. Thanks! +static inline int iommu_group_set_dma_owner(struct iommu_group *group, + enum iommu_dma_owner owner, + struct file *user_file) +{ + return -EINVAL; +} + +static inline void iommu_group_release_dma_owner(struct iommu_group *group, +enum iommu_dma_owner owner) +{ +} + +static inline bool iommu_group_dma_owner_unclaimed(struct iommu_group *group) +{ + return false; +} Why do we need these stubs? All potential callers should already require CONFIG_IOMMU_API? Same for the helpers added in patch 1, btw. You are right. This helper is only for vfio which requires IOMMU_API. I will remove this. The helpers in patch 1 seem not the same. The driver core (or bus dma_configure() callback as suggested) will also call them. + mutex_lock(>mutex); + ret = __iommu_group_set_dma_owner(group, owner, user_file); + mutex_unlock(>mutex); + mutex_lock(>mutex); + __iommu_group_release_dma_owner(group, owner); + mutex_unlock(>mutex); Unless I'm missing something (just skipping over the patches), the existing callers also take the lock just around these calls, so we don't really need the __-prefixed lowlevel helpers. 
Moving mutex_lock/unlock will make the helper implementation easier. :-) It seems to be the common code style in the iommu core. For example, __iommu_attach_group(), __iommu_group_for_each_dev(), etc. + mutex_lock(&group->mutex); + owner = group->dma_owner; + mutex_unlock(&group->mutex); No need for a lock to read a single scalar. Adding the lock will make KCSAN happy. Jason G also told me that [citing from his review comment] "It is always incorrect to read concurrent data without an annotation of some kind. For instance it can cause mis-execution of logic where the compiler is unaware that a value it loads is allowed to change - i.e. no READ_ONCE/WRITE_ONCE semantic." + + return owner == DMA_OWNER_NONE; +} +EXPORT_SYMBOL_GPL(iommu_group_dma_owner_unclaimed); Best regards, baolu
[PATCH v2 4/6] perf tool: Add support for HiSilicon PCIe Tune and Trace device driver
From: Qi Liu 'perf record' and 'perf report --dump-raw-trace' supported in this patch. Example usage: Output will contain raw PTT data and its textual representation, such as: 0 0 0x5810 [0x30]: PERF_RECORD_AUXTRACE size: 0x40 offset: 0 ref: 0xa5d50c725 idx: 0 tid: -1 cpu: 0 . . ... HISI PTT data: size 4194304 bytes . : 00 00 00 00 Prefix . 0004: 08 20 00 60 Header DW0 . 0008: ff 02 00 01 Header DW1 . 000c: 20 08 00 00 Header DW2 . 0010: 10 e7 44 ab Header DW3 . 0014: 2a a8 1e 01 Time . 0020: 00 00 00 00 Prefix . 0024: 01 00 00 60 Header DW0 . 0028: 0f 1e 00 01 Header DW1 . 002c: 04 00 00 00 Header DW2 . 0030: 40 00 81 02 Header DW3 . 0034: ee 02 00 00 Time Signed-off-by: Qi Liu Signed-off-by: Yicong Yang --- tools/perf/arch/arm/util/auxtrace.c | 56 - tools/perf/arch/arm/util/pmu.c| 3 + tools/perf/arch/arm64/util/Build | 2 +- tools/perf/arch/arm64/util/hisi_ptt.c | 195 +++ tools/perf/util/Build | 2 + tools/perf/util/auxtrace.c| 4 + tools/perf/util/auxtrace.h| 1 + tools/perf/util/hisi-ptt-decoder/Build| 1 + .../hisi-ptt-decoder/hisi-ptt-pkt-decoder.c | 170 + .../hisi-ptt-decoder/hisi-ptt-pkt-decoder.h | 28 +++ tools/perf/util/hisi_ptt.c| 228 ++ tools/perf/util/hisi_ptt.h| 28 +++ 12 files changed, 714 insertions(+), 4 deletions(-) create mode 100644 tools/perf/arch/arm64/util/hisi_ptt.c create mode 100644 tools/perf/util/hisi-ptt-decoder/Build create mode 100644 tools/perf/util/hisi-ptt-decoder/hisi-ptt-pkt-decoder.c create mode 100644 tools/perf/util/hisi-ptt-decoder/hisi-ptt-pkt-decoder.h create mode 100644 tools/perf/util/hisi_ptt.c create mode 100644 tools/perf/util/hisi_ptt.h diff --git a/tools/perf/arch/arm/util/auxtrace.c b/tools/perf/arch/arm/util/auxtrace.c index 5fc6a2a3dbc5..dc9d2172464e 100644 --- a/tools/perf/arch/arm/util/auxtrace.c +++ b/tools/perf/arch/arm/util/auxtrace.c @@ -4,6 +4,7 @@ * Author: Mathieu Poirier */ +#include #include #include #include @@ -14,6 +15,7 @@ #include "../../../util/pmu.h" #include "cs-etm.h" #include "arm-spe.h" +#include 
"hisi_ptt.h" static struct perf_pmu **find_all_arm_spe_pmus(int *nr_spes, int *err) { @@ -50,6 +52,39 @@ static struct perf_pmu **find_all_arm_spe_pmus(int *nr_spes, int *err) return arm_spe_pmus; } +static struct perf_pmu **find_all_hisi_ptt_pmus(int *nr_ptts, int *err) +{ + struct perf_pmu **hisi_ptt_pmus = NULL; + int nr_cpus = sysconf(_SC_NPROCESSORS_CONF); + DIR *dir = NULL; + struct dirent *dent; + + hisi_ptt_pmus = zalloc(sizeof(struct perf_pmu *) * nr_cpus); + if (!hisi_ptt_pmus) { + pr_err("hisi_ptt alloc failed\n"); + *err = -ENOMEM; + return NULL; + } + + dir = opendir("/sys/devices"); + dent = readdir(dir); + while (dent) { + if (strstr(dent->d_name, HISI_PTT_PMU_NAME)) { + hisi_ptt_pmus[*nr_ptts] = perf_pmu__find(dent->d_name); + if (hisi_ptt_pmus[*nr_ptts]) { + pr_debug2("%s %d: arm_spe_pmu %d type %d name %s\n", + __func__, __LINE__, *nr_ptts, + hisi_ptt_pmus[*nr_ptts]->type, + hisi_ptt_pmus[*nr_ptts]->name); + (*nr_ptts)++; + } + } + dent = readdir(dir); + } + + return hisi_ptt_pmus; +} + struct auxtrace_record *auxtrace_record__init(struct evlist *evlist, int *err) { @@ -57,8 +92,12 @@ struct auxtrace_record struct evsel *evsel; bool found_etm = false; struct perf_pmu *found_spe = NULL; + struct perf_pmu *found_ptt = NULL; struct perf_pmu **arm_spe_pmus = NULL; + struct perf_pmu **hisi_ptt_pmus = NULL; + int nr_spes = 0; + int nr_ptts = 0; int i = 0; if (!evlist) @@ -66,13 +105,14 @@ struct auxtrace_record cs_etm_pmu = perf_pmu__find(CORESIGHT_ETM_PMU_NAME); arm_spe_pmus = find_all_arm_spe_pmus(_spes, err); + hisi_ptt_pmus = find_all_hisi_ptt_pmus(_ptts, err); evlist__for_each_entry(evlist, evsel) { if (cs_etm_pmu && evsel->core.attr.type ==
[PATCH v2 2/6] hwtracing: Add trace function support for HiSilicon PCIe Tune and Trace device
HiSilicon PCIe tune and trace device(PTT) is a PCIe Root Complex integrated Endpoint(RCiEP) device, providing the capability to dynamically monitor and tune the PCIe traffic(tune), and trace the TLP headers(trace). Add the driver for the device to enable the trace function. The driver will create PMU device for each PTT device, and users can start trace through perf command. Signed-off-by: Yicong Yang --- drivers/Makefile |1 + drivers/hwtracing/Kconfig |2 + drivers/hwtracing/hisilicon/Kconfig|8 + drivers/hwtracing/hisilicon/Makefile |2 + drivers/hwtracing/hisilicon/hisi_ptt.c | 1146 5 files changed, 1159 insertions(+) create mode 100644 drivers/hwtracing/hisilicon/Kconfig create mode 100644 drivers/hwtracing/hisilicon/Makefile create mode 100644 drivers/hwtracing/hisilicon/hisi_ptt.c diff --git a/drivers/Makefile b/drivers/Makefile index be5d40ae1488..bb0dc9b55ea2 100644 --- a/drivers/Makefile +++ b/drivers/Makefile @@ -176,6 +176,7 @@ obj-$(CONFIG_USB4) += thunderbolt/ obj-$(CONFIG_CORESIGHT)+= hwtracing/coresight/ obj-y += hwtracing/intel_th/ obj-$(CONFIG_STM) += hwtracing/stm/ +obj-$(CONFIG_HISI_PTT) += hwtracing/hisilicon/ obj-$(CONFIG_ANDROID) += android/ obj-$(CONFIG_NVMEM)+= nvmem/ obj-$(CONFIG_FPGA) += fpga/ diff --git a/drivers/hwtracing/Kconfig b/drivers/hwtracing/Kconfig index 13085835a636..e3796b17541a 100644 --- a/drivers/hwtracing/Kconfig +++ b/drivers/hwtracing/Kconfig @@ -5,4 +5,6 @@ source "drivers/hwtracing/stm/Kconfig" source "drivers/hwtracing/intel_th/Kconfig" +source "drivers/hwtracing/hisilicon/Kconfig" + endmenu diff --git a/drivers/hwtracing/hisilicon/Kconfig b/drivers/hwtracing/hisilicon/Kconfig new file mode 100644 index ..9c3ab80d99fe --- /dev/null +++ b/drivers/hwtracing/hisilicon/Kconfig @@ -0,0 +1,8 @@ +# SPDX-License-Identifier: GPL-2.0-only +config HISI_PTT + tristate "HiSilicon PCIe Tune and Trace Device" + depends on PCI && HAS_DMA && HAS_IOMEM + help + HiSilicon PCIe Tune and Trace Device exist as a PCIe RCiEP + device, provides 
support for PCIe traffic tuning and + tracing TLP headers to the memory. diff --git a/drivers/hwtracing/hisilicon/Makefile b/drivers/hwtracing/hisilicon/Makefile new file mode 100644 index ..908c09a98161 --- /dev/null +++ b/drivers/hwtracing/hisilicon/Makefile @@ -0,0 +1,2 @@ +# SPDX-License-Identifier: GPL-2.0 +obj-$(CONFIG_HISI_PTT) += hisi_ptt.o diff --git a/drivers/hwtracing/hisilicon/hisi_ptt.c b/drivers/hwtracing/hisilicon/hisi_ptt.c new file mode 100644 index ..e11e9b6cc2a8 --- /dev/null +++ b/drivers/hwtracing/hisilicon/hisi_ptt.c @@ -0,0 +1,1146 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Driver for HiSilicon PCIe tune and trace device + * + * Copyright (c) 2021 HiSilicon Limited. + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#define HISI_PTT_TRACE_ADDR_SIZE 0x0800 +#define HISI_PTT_TRACE_ADDR_BASE_LO_0 0x0810 +#define HISI_PTT_TRACE_ADDR_BASE_HI_0 0x0814 +#define HISI_PTT_TRACE_ADDR_STRIDE 0x8 +#define HISI_PTT_TRACE_CTRL0x0850 +#define HISI_PTT_TRACE_CTRL_EN BIT(0) +#define HISI_PTT_TRACE_CTRL_RST BIT(1) +#define HISI_PTT_TRACE_CTRL_RXTX_SEL GENMASK(3, 2) +#define HISI_PTT_TRACE_CTRL_TYPE_SEL GENMASK(7, 4) +#define HISI_PTT_TRACE_CTRL_DATA_FORMAT BIT(14) +#define HISI_PTT_TRACE_CTRL_FILTER_MODE BIT(15) +#define HISI_PTT_TRACE_CTRL_TARGET_SEL GENMASK(31, 16) +#define HISI_PTT_TRACE_INT_STAT0x0890 +#define HISI_PTT_TRACE_INT_STAT_MASK GENMASK(3, 0) +#define HISI_PTT_TRACE_INT_MASK0x0894 +#define HISI_PTT_TRACE_WR_STS 0x08a0 +#define HISI_PTT_TRACE_WR_STS_WRITE GENMASK(27, 0) +#define HISI_PTT_TRACE_WR_STS_BUFFER GENMASK(29, 28) +#define HISI_PTT_TRACE_STS 0x08b0 +#define HISI_PTT_TRACE_IDLE BIT(0) +#define HISI_PTT_DEVICE_RANGE 0x0fe0 +#define HISI_PTT_LOCATION 0x0fe8 +#define HISI_PTT_CORE_ID GENMASK(15, 0) +#define HISI_PTT_SICL_ID GENMASK(31, 16) + +#define HISI_PTT_TRACE_DMA_IRQ 0 +#define 
HISI_PTT_TRACE_BUFLETS_CNT 4 +#define HISI_PTT_TRACE_BUFLET_SIZE SZ_4M +#define HISI_PTT_TRACE_BUFFER_SIZE (HISI_PTT_TRACE_BUFLET_SIZE * \ +HISI_PTT_TRACE_BUFLETS_CNT) +#define HISI_PTT_FILTER_UPDATE_FIFO_SIZE 16 + +/* Delay time for filter updating work */ +#define HISI_PTT_WORK_DELAY_MS 100UL +/* Wait time for DMA hardware to reset */ +#define HISI_PTT_RESET_WAIT_MS 1000UL +/* Poll timeout and
[PATCH v2 5/6] docs: Add HiSilicon PTT device driver documentation
Document the introduction and usage of the HiSilicon PTT device driver. Signed-off-by: Yicong Yang --- Documentation/trace/hisi-ptt.rst | 305 +++ 1 file changed, 305 insertions(+) create mode 100644 Documentation/trace/hisi-ptt.rst diff --git a/Documentation/trace/hisi-ptt.rst b/Documentation/trace/hisi-ptt.rst new file mode 100644 index ..a7ef6438e4aa --- /dev/null +++ b/Documentation/trace/hisi-ptt.rst @@ -0,0 +1,305 @@ +.. SPDX-License-Identifier: GPL-2.0 + +== +HiSilicon PCIe Tune and Trace device +== + +Introduction + + +HiSilicon PCIe tune and trace device (PTT) is a PCIe Root Complex +integrated Endpoint (RCiEP) device, providing the capability +to dynamically monitor and tune the PCIe link's events (tune), +and trace the TLP headers (trace). The two functions are independent, +but it is recommended to use them together to analyze and enhance the +PCIe link's performance. + +On Kunpeng 930 SoC, the PCIe Root Complex is composed of several +PCIe cores. Each PCIe core includes several Root Ports and a PTT +RCiEP, like below. The PTT device is capable of tuning and +tracing the link of the PCIe core. +:: + +--Core 0---+ + | | [ PTT ] | + | | [Root Port]---[Endpoint] + | | [Root Port]---[Endpoint] + | | [Root Port]---[Endpoint] +Root Complex |--Core 1---+ + | | [ PTT ] | + | | [Root Port]---[ Switch ]---[Endpoint] + | | [Root Port]---[Endpoint] `-[Endpoint] + | | [Root Port]---[Endpoint] + +---+ + +The PTT device driver registers a PMU device for each PTT device. +The name of each PTT device is composed of the 'hisi_ptt' prefix with +the id of the SICL and the Core where it is located. The Kunpeng 930 +SoC encapsulates multiple CPU dies (SCCL, Super CPU Cluster) and +IO dies (SICL, Super I/O Cluster), where there's one PCIe Root +Complex for each SICL. +:: +/sys/devices/hisi_ptt_ + +Tune + + +PTT tune is designed for monitoring and adjusting PCIe link parameters (events). +Currently we support events in 4 classes. The scope of the events +covers the PCIe core to which the PTT device belongs. + +Each event is presented as a file under $(PTT PMU dir)/tune, and +mostly a simple open/read/write/close cycle will be used to tune +the event. +:: +$ cd /sys/devices/hisi_ptt_/tune +$ ls +qos_tx_cpl qos_tx_np qos_tx_p +tx_path_rx_req_alloc_buf_level +tx_path_tx_req_alloc_buf_level +$ cat qos_tx_p +1 +$ echo 2 > qos_tx_p +$ cat qos_tx_p +2 + +The current value (numerical value) of the event can simply be read +from the file, and the desired value written to the file to tune it. + +1. Tx path QoS control + + +The following files are provided to tune the QoS of the tx path of +the PCIe core. + +- qos_tx_cpl: weight of Tx completion TLPs +- qos_tx_np: weight of Tx non-posted TLPs +- qos_tx_p: weight of Tx posted TLPs + +The weight influences the proportion of certain packets on the PCIe link. +For example, for the storage scenario, increase the proportion +of the completion packets on the link to enhance the performance as +more completions are consumed. + +The available tune data of these events is [0, 1, 2]. +Writing a negative value will return an error, and out of range +values will be converted to 2. Note that the event value just +indicates a probable level, but is not precise. + +2. Tx path buffer control +- + +The following files are provided to tune the buffer of the tx path of the PCIe core. + +- tx_path_rx_req_alloc_buf_level: watermark of Rx requested +- tx_path_tx_req_alloc_buf_level: watermark of Tx requested + +These events influence the watermark of the buffer allocated for each +type. Rx means the inbound while Tx means outbound. The packets will +be stored in the buffer first and then posted either when the watermark +is reached or when timed out. For a busy direction, you should increase +the related buffer watermark to avoid frequent posting and thus +enhance the performance. In most cases just keep the default value. + +The available tune data of the above events is [0, 1, 2].
+Writing a negative value will return an error, and out of range +values will be converted to 2. Note that the event value just +indicates a probable level, but is not precise. + +Trace += + +PTT trace is designed for dumping the TLP headers to the memory, which +can be used to analyze the transactions and usage condition of the PCIe +Link. You can choose to filter the traced headers by either requester ID, +or those downstream of a set of Root Ports on the same core of the PTT +device. It's also supported to trace the headers of certain type and of +certain direction. + +You can
[PATCH v2 3/6] hwtracing: Add tune function support for HiSilicon PCIe Tune and Trace device
Add tune function for the HiSilicon Tune and Trace device. The interface of tune is exposed through sysfs attributes of PTT PMU device. Signed-off-by: Yicong Yang --- drivers/hwtracing/hisilicon/hisi_ptt.c | 167 + 1 file changed, 167 insertions(+) diff --git a/drivers/hwtracing/hisilicon/hisi_ptt.c b/drivers/hwtracing/hisilicon/hisi_ptt.c index e11e9b6cc2a8..a1e1fb766376 100644 --- a/drivers/hwtracing/hisilicon/hisi_ptt.c +++ b/drivers/hwtracing/hisilicon/hisi_ptt.c @@ -24,6 +24,11 @@ #include #include +#define HISI_PTT_TUNING_CTRL 0x +#define HISI_PTT_TUNING_CTRL_CODEGENMASK(15, 0) +#define HISI_PTT_TUNING_CTRL_SUB GENMASK(23, 16) +#define HISI_PTT_TUNING_DATA 0x0004 +#define HISI_PTT_TUNING_DATA_VAL_MASKGENMASK(15, 0) #define HISI_PTT_TRACE_ADDR_SIZE 0x0800 #define HISI_PTT_TRACE_ADDR_BASE_LO_0 0x0810 #define HISI_PTT_TRACE_ADDR_BASE_HI_0 0x0814 @@ -39,6 +44,8 @@ #define HISI_PTT_TRACE_INT_STAT0x0890 #define HISI_PTT_TRACE_INT_STAT_MASK GENMASK(3, 0) #define HISI_PTT_TRACE_INT_MASK0x0894 +#define HISI_PTT_TUNING_INT_STAT 0x0898 +#define HISI_PTT_TUNING_INT_STAT_MASKBIT(0) #define HISI_PTT_TRACE_WR_STS 0x08a0 #define HISI_PTT_TRACE_WR_STS_WRITE GENMASK(27, 0) #define HISI_PTT_TRACE_WR_STS_BUFFER GENMASK(29, 28) @@ -71,6 +78,12 @@ enum hisi_ptt_trace_status { HISI_PTT_TRACE_STATUS_ON, }; +struct hisi_ptt_tune_desc { + struct hisi_ptt *hisi_ptt; + const char *name; + u32 event_code; +}; + struct hisi_ptt_dma_buflet { struct list_head list; unsigned int size; @@ -177,6 +190,159 @@ static inline struct hisi_ptt *to_hisi_ptt(struct pmu *pmu) return container_of(pmu, struct hisi_ptt, hisi_ptt_pmu); } +static int hisi_ptt_wait_tuning_finish(struct hisi_ptt *hisi_ptt) +{ + u32 val; + + return readl_poll_timeout(hisi_ptt->iobase + HISI_PTT_TUNING_INT_STAT, + val, !(val & HISI_PTT_TUNING_INT_STAT_MASK), + HISI_PTT_WAIT_POLL_INTERVAL_US, + HISI_PTT_WAIT_TIMEOUT_US); +} + +static int hisi_ptt_tune_data_get(struct hisi_ptt *hisi_ptt, + u32 event, u16 *data) +{ + u32 reg; + + 
reg = readl(hisi_ptt->iobase + HISI_PTT_TUNING_CTRL); + reg &= ~(HISI_PTT_TUNING_CTRL_CODE | HISI_PTT_TUNING_CTRL_SUB); + reg |= FIELD_PREP(HISI_PTT_TUNING_CTRL_CODE | HISI_PTT_TUNING_CTRL_SUB, + event); + writel(reg, hisi_ptt->iobase + HISI_PTT_TUNING_CTRL); + + /* Write all 1 to indicates it's the read process */ + writel(~0UL, hisi_ptt->iobase + HISI_PTT_TUNING_DATA); + + if (hisi_ptt_wait_tuning_finish(hisi_ptt)) + return -ETIMEDOUT; + + reg = readl(hisi_ptt->iobase + HISI_PTT_TUNING_DATA); + reg &= HISI_PTT_TUNING_DATA_VAL_MASK; + *data = (u16)reg; + + return 0; +} + +static int hisi_ptt_tune_data_set(struct hisi_ptt *hisi_ptt, + u32 event, u16 data) +{ + u32 reg; + + reg = readl(hisi_ptt->iobase + HISI_PTT_TUNING_CTRL); + reg &= ~(HISI_PTT_TUNING_CTRL_CODE | HISI_PTT_TUNING_CTRL_SUB); + reg |= FIELD_PREP(HISI_PTT_TUNING_CTRL_CODE | HISI_PTT_TUNING_CTRL_SUB, + event); + writel(reg, hisi_ptt->iobase + HISI_PTT_TUNING_CTRL); + + reg = data; + writel(reg, hisi_ptt->iobase + HISI_PTT_TUNING_DATA); + + if (hisi_ptt_wait_tuning_finish(hisi_ptt)) + return -ETIMEDOUT; + + return 0; +} + +static ssize_t hisi_ptt_tune_attr_show(struct device *dev, + struct device_attribute *attr, + char *buf) +{ + struct hisi_ptt *hisi_ptt = to_hisi_ptt(dev_get_drvdata(dev)); + struct dev_ext_attribute *ext_attr; + struct hisi_ptt_tune_desc *desc; + int ret; + u16 val; + + ext_attr = container_of(attr, struct dev_ext_attribute, attr); + desc = ext_attr->var; + + if (!mutex_trylock(_ptt->mutex)) + return -EBUSY; + + ret = hisi_ptt_tune_data_get(hisi_ptt, desc->event_code, ); + + mutex_unlock(_ptt->mutex); + return ret ? 
ret : sysfs_emit(buf, "%u\n", val); +} + +static ssize_t hisi_ptt_tune_attr_store(struct device *dev, + struct device_attribute *attr, + const char *buf, size_t count) +{ + struct hisi_ptt *hisi_ptt = to_hisi_ptt(dev_get_drvdata(dev)); + struct dev_ext_attribute *ext_attr; + struct hisi_ptt_tune_desc *desc; + int ret; + u16 val; + + ext_attr = container_of(attr, struct dev_ext_attribute, attr); + desc = ext_attr->var; + + if (kstrtou16(buf, 10, )) + return -EINVAL; + + if
[PATCH v2 6/6] MAINTAINERS: Add maintainer for HiSilicon PTT driver
Add maintainer for driver and documentation of HiSilicon PTT device. Signed-off-by: Yicong Yang --- MAINTAINERS | 7 +++ 1 file changed, 7 insertions(+) diff --git a/MAINTAINERS b/MAINTAINERS index 7a2345ce8521..823d495ca0d5 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -8601,6 +8601,13 @@ W: http://www.hisilicon.com F: Documentation/admin-guide/perf/hisi-pmu.rst F: drivers/perf/hisilicon +HISILICON PTT DRIVER +M: Yicong Yang +L: linux-ker...@vger.kernel.org +S: Maintained +F: Documentation/trace/hisi-ptt.rst +F: drivers/hwtracing/hisilicon/ + HISILICON QM AND ZIP Controller DRIVER M: Zhou Wang L: linux-cry...@vger.kernel.org -- 2.33.0
[PATCH v2 0/6] Add support for HiSilicon PCIe Tune and Trace device
HiSilicon PCIe tune and trace device (PTT) is a PCIe Root Complex integrated Endpoint (RCiEP) device, providing the capability to dynamically monitor and tune the PCIe traffic (tune), and trace the TLP headers (trace). PTT tune is designed for monitoring and adjusting PCIe link parameters. We provide several parameters of the PCIe link. Through the driver, users can adjust the value of a certain parameter to affect the PCIe link for the purpose of enhancing the performance in certain situations. PTT trace is designed for dumping the TLP headers to the memory, which can be used to analyze the transactions and usage condition of the PCIe Link. Users can choose filters to trace headers, by either requester ID, or those downstream of a set of Root Ports on the same core of the PTT device. It's also supported to trace the headers of a certain type and of a certain direction. The driver registers a PMU device for each PTT device. The trace can be used through `perf record` and the traced headers can be decoded by `perf report`. The perf command support for the device is also added in this patchset. The tune can be used through the sysfs attributes of the related PMU device. See the documentation for the detailed usage. Changes since v1: - switch the user interface of trace to perf from debugfs - switch the user interface of tune to sysfs from debugfs - add perf tool support to start trace and decode the trace data - address the comments of documentation from Bjorn - add RMR[1] support of the device as trace works in RMR mode or direct DMA mode. RMR support is achieved by common APIs rather than the APIs implemented in [1].
Link: https://lore.kernel.org/lkml/1618654631-42454-1-git-send-email-yangyic...@hisilicon.com/

[1] https://lore.kernel.org/linux-acpi/20210805080724.480-1-shameerali.kolothum.th...@huawei.com/

Qi Liu (1):
  perf tool: Add support for HiSilicon PCIe Tune and Trace device driver

Yicong Yang (5):
  iommu: Export iommu_{get,put}_resv_regions()
  hwtracing: Add trace function support for HiSilicon PCIe Tune and Trace device
  hwtracing: Add tune function support for HiSilicon PCIe Tune and Trace device
  docs: Add HiSilicon PTT device driver documentation
  MAINTAINERS: Add maintainer for HiSilicon PTT driver

 Documentation/trace/hisi-ptt.rst                    |  305
 MAINTAINERS                                         |    7 +
 drivers/Makefile                                    |    1 +
 drivers/hwtracing/Kconfig                           |    2 +
 drivers/hwtracing/hisilicon/Kconfig                 |    8 +
 drivers/hwtracing/hisilicon/Makefile                |    2 +
 drivers/hwtracing/hisilicon/hisi_ptt.c              | 1313 +
 drivers/iommu/iommu.c                               |    2 +
 include/linux/iommu.h                               |    4 +-
 tools/perf/arch/arm/util/auxtrace.c                 |   56 +-
 tools/perf/arch/arm/util/pmu.c                      |    3 +
 tools/perf/arch/arm64/util/Build                    |    2 +-
 tools/perf/arch/arm64/util/hisi_ptt.c               |  195 +++
 tools/perf/util/Build                               |    2 +
 tools/perf/util/auxtrace.c                          |    4 +
 tools/perf/util/auxtrace.h                          |    1 +
 tools/perf/util/hisi-ptt-decoder/Build              |    1 +
 .../hisi-ptt-decoder/hisi-ptt-pkt-decoder.c         |  170 +++
 .../hisi-ptt-decoder/hisi-ptt-pkt-decoder.h         |   28 +
 tools/perf/util/hisi_ptt.c                          |  228 +++
 tools/perf/util/hisi_ptt.h                          |   28 +
 21 files changed, 2356 insertions(+), 6 deletions(-)
 create mode 100644 Documentation/trace/hisi-ptt.rst
 create mode 100644 drivers/hwtracing/hisilicon/Kconfig
 create mode 100644 drivers/hwtracing/hisilicon/Makefile
 create mode 100644 drivers/hwtracing/hisilicon/hisi_ptt.c
 create mode 100644 tools/perf/arch/arm64/util/hisi_ptt.c
 create mode 100644 tools/perf/util/hisi-ptt-decoder/Build
 create mode 100644 tools/perf/util/hisi-ptt-decoder/hisi-ptt-pkt-decoder.c
 create mode 100644 tools/perf/util/hisi-ptt-decoder/hisi-ptt-pkt-decoder.h
 create mode 100644 tools/perf/util/hisi_ptt.c
 create mode 100644 tools/perf/util/hisi_ptt.h

--
2.33.0
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v2 1/6] iommu: Export iommu_{get,put}_resv_regions()
Export iommu_{get,put}_resv_regions() to the modules so that the driver
can retrieve and use the reserved regions of the device.

Signed-off-by: Yicong Yang
---
 drivers/iommu/iommu.c | 2 ++
 include/linux/iommu.h | 4 ++--
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index dd7863e453a5..e96711eee965 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2792,6 +2792,7 @@ void iommu_get_resv_regions(struct device *dev, struct list_head *list)
 	if (ops && ops->get_resv_regions)
 		ops->get_resv_regions(dev, list);
 }
+EXPORT_SYMBOL_GPL(iommu_get_resv_regions);
 
 void iommu_put_resv_regions(struct device *dev, struct list_head *list)
 {
@@ -2800,6 +2801,7 @@ void iommu_put_resv_regions(struct device *dev, struct list_head *list)
 	if (ops && ops->put_resv_regions)
 		ops->put_resv_regions(dev, list);
 }
+EXPORT_SYMBOL_GPL(iommu_put_resv_regions);
 
 /**
  * generic_iommu_put_resv_regions - Reserved region driver helper
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index d2f3435e7d17..1b7b0f370e28 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -450,8 +450,8 @@ extern phys_addr_t iommu_iova_to_phys(struct iommu_domain *domain, dma_addr_t io
 extern void iommu_set_fault_handler(struct iommu_domain *domain,
 			iommu_fault_handler_t handler, void *token);
 
-extern void iommu_get_resv_regions(struct device *dev, struct list_head *list);
-extern void iommu_put_resv_regions(struct device *dev, struct list_head *list);
+void iommu_get_resv_regions(struct device *dev, struct list_head *list);
+void iommu_put_resv_regions(struct device *dev, struct list_head *list);
 extern void generic_iommu_put_resv_regions(struct device *dev, struct list_head *list);
 extern void iommu_set_default_passthrough(bool cmd_line);
--
2.33.0
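As a hypothetical illustration of why the export is needed, a driver built as a module could walk its device's reserved regions roughly like this. The function name and log format below are made up for the sketch; only the two exported calls and the `struct iommu_resv_region` fields (`start`, `length`, `type`, `list`) are from the kernel API. Without the `EXPORT_SYMBOL_GPL()` additions above, this would fail to link as a module:

```c
#include <linux/device.h>
#include <linux/iommu.h>
#include <linux/list.h>

/*
 * Hypothetical helper: enumerate the reserved regions of a device so
 * the driver can avoid them when carving up its own IOVA space.
 */
static void example_walk_resv_regions(struct device *dev)
{
	struct iommu_resv_region *region;
	LIST_HEAD(resv_regions);

	iommu_get_resv_regions(dev, &resv_regions);
	list_for_each_entry(region, &resv_regions, list)
		dev_info(dev, "resv region: start %pa, length %zu, type %d\n",
			 &region->start, region->length, region->type);
	iommu_put_resv_regions(dev, &resv_regions);
}
```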
Re: [RFC PATCH 0/3] Use pageblock_order for cma and alloc_contig_range alignment.
On 15.11.21 20:37, Zi Yan wrote:
> From: Zi Yan
>
> Hi David,

Hi,

thanks for looking into this.

> You suggested to make alloc_contig_range() deal with pageblock_order
> instead of MAX_ORDER - 1 and get rid of the MAX_ORDER - 1 dependency
> in virtio_mem [1]. This patchset is my attempt to achieve that. Please
> take a look and let me know if I am doing it correctly or not.
>
> From my understanding, CMA required an alignment of
> max(MAX_ORDER - 1, pageblock_order) because, when MIGRATE_CMA was
> introduced, __free_one_page() did not prevent merging two different
> pageblocks when MAX_ORDER - 1 > pageblock_order. But the current
> __free_one_page() implementation does prevent that. It should be OK to
> just align CMA to pageblock_order. alloc_contig_range() relies on
> MIGRATE_CMA to get free pages, so it can use pageblock_order as its
> alignment too.

I wonder if that's sufficient. Especially the outer_start logic in
alloc_contig_range() might be problematic.

There are some ugly corner cases with free pages/allocations spanning
multiple pageblocks when we only isolated a single pageblock.

Regarding CMA, we have to keep the following cases working:

a) Different pageblock types (MIGRATE_CMA and !MIGRATE_CMA) in a
   MAX_ORDER - 1 page:

   [       MAX_ORDER - 1        ]
   [ pageblock 0 | pageblock 1 ]

   Assume either pageblock 0 is MIGRATE_CMA or pageblock 1 is
   MIGRATE_CMA, but not both. We have to make sure alloc_contig_range()
   keeps working correctly. This should be the case even with your
   change, as we won't merge pages across differing migratetypes.

b) Migrating/freeing a MAX_ORDER - 1 page while partially isolated:

   [       MAX_ORDER - 1        ]
   [ pageblock 0 | pageblock 1 ]

   Assume both are MIGRATE_CMA. Assume we want to allocate from either
   pageblock 0 or pageblock 1. Especially, assume we want to allocate
   from pageblock 1. While we would isolate pageblock 1, we wouldn't
   isolate pageblock 0.
   What happens if we either have a free page spanning the
   MAX_ORDER - 1 range already, OR if we have to migrate a
   MAX_ORDER - 1 page, resulting in a free MAX_ORDER - 1 page of which
   only the second pageblock is isolated? We would end up essentially
   freeing a page that has mixed pageblocks, essentially placing it on
   !MIGRATE_ISOLATE free lists ... I might be wrong, but I have the
   feeling that this would be problematic.

c) Concurrent allocations:

   [       MAX_ORDER - 1        ]
   [ pageblock 0 | pageblock 1 ]

   Assume b), but we have two concurrent CMA allocations to pageblock 0
   and pageblock 1, which would now be possible, as
   start_isolate_page_range() would succeed on both.

Regarding virtio-mem, we care about the following cases:

a) Allocating parts from a completely movable MAX_ORDER - 1 page:

   [       MAX_ORDER - 1        ]
   [ pageblock 0 | pageblock 1 ]

   Assume pageblock 0 and pageblock 1 are either free or contain only
   movable pages. Assume we allocated pageblock 0; we have to make sure
   we can allocate pageblock 1. The other way around, assume we
   allocated pageblock 1; we have to make sure we can allocate
   pageblock 0. Free pages spanning both pageblocks might be
   problematic.

b) Allocating parts of a partially movable MAX_ORDER - 1 page:

   [       MAX_ORDER - 1        ]
   [ pageblock 0 | pageblock 1 ]

   Assume pageblock 0 contains unmovable data but pageblock 1 does not:
   we have to make sure we can allocate pageblock 1. Similarly, assume
   pageblock 1 contains unmovable data but pageblock 0 does not: we
   have to make sure we can allocate pageblock 0.
   has_unmovable_pages() might allow for that.

   But we want to fail early in case we want to allocate a single
   pageblock and it contains unmovable data, either directly or
   indirectly. If we have an unmovable (compound) MAX_ORDER - 1 page
   and we'd try isolating pageblock 1, has_unmovable_pages() would
   always return "false", because we'd simply be skipping over any tail
   pages and would not detect the unmovability.
c) Migrating/freeing a MAX_ORDER - 1 page while partially isolated:

   Same concern as for CMA b).

So the biggest concern I have is dealing with migrating/freeing pages
of order > pageblock_order while only having isolated a single
pageblock.

> In terms of virtio_mem, if I understand correctly, it relies on
> alloc_contig_range() to obtain contiguous free pages and offlines them
> to reduce the guest memory size. As a result of the
> alloc_contig_range() alignment change, virtio_mem should be able to
> just align PFNs to pageblock_order.

For virtio-mem it will most probably be desirable to first try
allocating the MAX_ORDER - 1 range if possible, and then fall back to
pageblock_order. But that's an additional change on top in virtio-mem
code.

My take on teaching alloc_contig_range() to handle this properly would
be the following:

a) Convert MIGRATE_ISOLATE into a separate pageblock flag

We would want to convert MIGRATE_ISOLATE into a separate pageblock
flag, such that when we isolate a pageblock we preserve the original
migratetype. While start_isolate_page_range()