Re: [PATCH 1/5] iommu/arm-smmu-v3: put off the execution of TLBI* to reduce lock confliction

2017-07-17 Thread John Garry
+ On 29/06/2017 03:08, Leizhen (ThunderTown) wrote: On 2017/6/28 17:32, Will Deacon wrote: Hi Zhen Lei, Nate (CC'd), Robin and I have been working on something very similar to this series, but this patch is different to what we had planned. More below. On Mon, Jun 26, 2017 at 09:38:46PM

Re: [PATCH v6 3/3] iommu/arm-smmu-v3:Enable ACPI based HiSilicon erratum 161010801

2017-08-23 Thread John Garry
Signed-off-by: Shameer Kolothum --- drivers/iommu/arm-smmu-v3.c | 27 ++- 1 file changed, 22 insertions(+), 5 deletions(-) Please can you also add a devicetree binding with corresponding documentation to enable this workaround on

Re: [RFC v1 7/7] iommu/arm-smmu-v3: Enable ACPI based HiSilicon erratum 161010801

2017-05-17 Thread John Garry
On 16/05/2017 15:03, Shameerali Kolothum Thodi wrote: > Lorenzo made a point that it might be relatively straightforward to just > follow the IORT mapping for the SMMU through to the ITS MADT entry and > pull the ITS geometry out of that. It would certainly be nicer to have > such a helper

Re: [PATCH v3 0/2] acpi/iort, numa: Add numa node mapping for smmuv3 devices

2017-06-08 Thread John Garry
On 08/06/2017 05:44, Ganapatrao Kulkarni wrote: ARM IORT specification (rev. C) has added provision to define proximity domain in SMMUv3 IORT table. Adding required code to parse Proximity domain and set numa_node of smmuv3 platform devices. v3: - Addressed Lorenzo Pieralisi comment. v2: -

Re: [PATCH v2 0/8] io-pgtable lock removal

2017-06-26 Thread John Garry
I saw Will has already sent the pull request. But, FWIW, we are seeing roughly the same performance as v1 patchset. For PCI NIC, Zhou again found performance drop goes from ~15->8% with SMMU enabled, and for integrated storage controller [platform device], we still see a drop of about 50%,

Re: [PATCH v2 0/8] io-pgtable lock removal

2017-06-26 Thread John Garry
On 23/06/2017 10:58, Robin Murphy wrote: On 23/06/17 09:47, John Garry wrote: On 22/06/2017 16:53, Robin Murphy wrote: The feedback has been promising, so v2 is just a final update to cover a handful of memory ordering and cosmetic tweaks that came up when Will and I went through this offline

Re: [PATCH v2 0/8] io-pgtable lock removal

2017-06-23 Thread John Garry
On 22/06/2017 16:53, Robin Murphy wrote: The feedback has been promising, so v2 is just a final update to cover a handful of memory ordering and cosmetic tweaks that came up when Will and I went through this offline. Thanks, Robin. Hi Robin, Is it worth us retesting this patchset? If yes,

Re: [PATCH v7 3/3] iommu/arm-smmu-v3: Add workaround for Cavium ThunderX2 erratum #126

2017-06-06 Thread John Garry
On 30/05/2017 13:03, Geetha sowjanya wrote: From: Geetha Sowjanya Cavium ThunderX2 SMMU doesn't support MSI and also doesn't have unique, separate irq lines for gerror, eventq and cmdq-sync. This patch addresses the issue by checking if any

Re: [PATCH 0/8] io-pgtable lock removal

2017-06-15 Thread John Garry
On 15/06/2017 01:40, Ray Jui via iommu wrote: Hi Robin, wangzhou tested this patchset on our SMMUv3-based development board with a 10G PCI NIC card. Currently we see a ~17% performance (throughput) drop when enabling the SMMU, but only a ~8% drop with your patchset. FYI, for our

Re: [PATCH v8 3/5] iommu/of: Add msi address regions reservation helper

2017-10-05 Thread John Garry
On 05/10/2017 12:07, Robin Murphy wrote: On 04/10/17 14:50, Lorenzo Pieralisi wrote: On Wed, Oct 04, 2017 at 12:22:04PM +0100, Marc Zyngier wrote: On 27/09/17 14:32, Shameer Kolothum wrote: From: John Garry <john.ga...@huawei.com> On some platforms msi-controller address region

Re: [PATCH v8 3/5] iommu/of: Add msi address regions reservation helper

2017-10-05 Thread John Garry
On 05/10/2017 13:44, Robin Murphy wrote: On 05/10/17 13:37, John Garry wrote: On 05/10/2017 12:07, Robin Murphy wrote: On 04/10/17 14:50, Lorenzo Pieralisi wrote: On Wed, Oct 04, 2017 at 12:22:04PM +0100, Marc Zyngier wrote: On 27/09/17 14:32, Shameer Kolothum wrote: From: John Garry

Re: [PATCH v6 3/3] iommu/arm-smmu-v3:Enable ACPI based HiSilicon erratum 161010801

2017-08-23 Thread John Garry
On 23/08/2017 17:43, Will Deacon wrote: On Wed, Aug 23, 2017 at 03:29:46PM +0100, John Garry wrote: On 23/08/2017 14:24, Will Deacon wrote: On Wed, Aug 23, 2017 at 02:17:24PM +0100, John Garry wrote: Signed-off-by: Shameer Kolothum <shameerali.kolothum.th...@huawei.com> --- drivers

Re: [PATCH v6 3/3] iommu/arm-smmu-v3:Enable ACPI based HiSilicon erratum 161010801

2017-08-24 Thread John Garry
On 24/08/2017 15:35, Will Deacon wrote: > >>OK, seems reasonable. > >> > >>We would consider blacklisting the device, where/how to do is the question. > >> > >>So the errata is in the GICv3 ITS/PCI host controller, and we just use the > >>in-between SMMU (driver) to provide the workaround. So my

Re: [PATCH v6 3/3] iommu/arm-smmu-v3:Enable ACPI based HiSilicon erratum 161010801

2017-08-23 Thread John Garry
On 23/08/2017 14:24, Will Deacon wrote: On Wed, Aug 23, 2017 at 02:17:24PM +0100, John Garry wrote: Signed-off-by: Shameer Kolothum <shameerali.kolothum.th...@huawei.com> --- drivers/iommu/arm-smmu-v3.c | 27 ++- 1 file changed, 22 insertions(+), 5 deletions(-)

Re: [PATCH v6 3/3] iommu/arm-smmu-v3:Enable ACPI based HiSilicon erratum 161010801

2017-09-01 Thread John Garry
On 10/08/2017 18:27, Will Deacon wrote: On Wed, Aug 09, 2017 at 11:07:15AM +0100, Shameer Kolothum wrote: The HiSilicon erratum 161010801 describes the limitation of HiSilicon platforms Hip06/Hip07 to support the SMMU mappings for MSI transactions. On these platforms GICv3 ITS translator is

Re: [PATCH v6 3/3] iommu/arm-smmu-v3:Enable ACPI based HiSilicon erratum 161010801

2017-09-05 Thread John Garry
Hi Will, Lorenzo, Robin, I have created the patch to add DT support for this erratum. However, currently I have only added support for pci-based devices. I'm a bit stumped on how to add platform device support, or if we should also add support at all. And I would rather ask before sending the

Re: [PATCH 04/20] arm-nommu: use generic dma_noncoherent_ops

2018-05-11 Thread John Garry
On 11/05/2018 08:59, Christoph Hellwig wrote: Switch to the generic noncoherent direct mapping implementation for the nommu dma map implementation. Signed-off-by: Christoph Hellwig --- arch/arc/Kconfig| 1 + arch/arm/Kconfig| 4 +

Re: [PATCH v3 1/2] iommu/arm-smmu-v3: fix unexpected CMD_SYNC timeout

2018-08-15 Thread John Garry
On 15/08/2018 14:00, Will Deacon wrote: On Wed, Aug 15, 2018 at 01:26:31PM +0100, Robin Murphy wrote: On 15/08/18 11:23, Zhen Lei wrote: The condition "(int)(VAL - sync_idx) >= 0" to break loop in function __arm_smmu_sync_poll_msi requires that sync_idx must be increased monotonously according
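The loop-exit condition quoted in the preview above, "(int)(VAL - sync_idx) >= 0", is a wraparound-tolerant sequence comparison. A minimal standalone sketch of that idiom follows; the helper name `seq_reached` is hypothetical and is not taken from the driver, and the code assumes the usual two's-complement conversion from `uint32_t` to `int32_t`.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper for illustration only (not from arm-smmu-v3.c).
 *
 * Returns true once `val` has reached or passed `sync_idx`, even if the
 * 32-bit counter wrapped in between. The unsigned subtraction wraps
 * modulo 2^32; reinterpreting the difference as signed gives a
 * non-negative result as long as `val` is fewer than 2^31 steps ahead.
 * This assumes two's-complement conversion, which is what mainstream
 * compilers provide.
 */
static bool seq_reached(uint32_t val, uint32_t sync_idx)
{
	return (int32_t)(val - sync_idx) >= 0;
}
```

The comparison only gives the intended answer while the two values stay within 2^31 of each other, which is why the index being waited on has to advance in order.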

Re: Question on iommu_get_domain_for_dev() for DMA map/unmap

2018-08-14 Thread John Garry
On 14/08/2018 11:45, Robin Murphy wrote: Hi John, Hi Robin, On 14/08/18 11:09, John Garry wrote: Hi All, I have a question on function iommu_get_domain_for_dev() in DMA mapping path, and why we need to get+put a reference to the iommu group. The background is that we have been testing

Question on iommu_get_domain_for_dev() for DMA map/unmap

2018-08-14 Thread John Garry
Hi All, I have a question on function iommu_get_domain_for_dev() in DMA mapping path, and why we need to get+put a reference to the iommu group. The background is that we have been testing iperf throughput performance for a PCIe NIC card behind an SMMUv3, with small packets and many threads

Re: [PATCH 1/3] iommu: Add fast hook for getting DMA domains

2018-08-17 Thread John Garry
On 14/08/2018 14:04, Robin Murphy wrote: While iommu_get_domain_for_dev() is the robust way for arbitrary IOMMU API callers to retrieve the domain pointer, for DMA ops domains it doesn't scale well for large systems and multi-queue devices, since the momentary refcount adjustment will lead to

Re: [PATCH 0/3] iommu: Avoid DMA ops domain refcount contention

2018-08-14 Thread John Garry
On 14/08/2018 14:04, Robin Murphy wrote: John raised the issue[1] that we have some unnecessary refcount contention in the DMA ops path which shows scalability problems now that we have more real high-performance hardware using iommu-dma. The x86 IOMMU drivers are sidestepping this by stashing

Re: [PATCH 0/3] iommu: Avoid DMA ops domain refcount contention

2018-08-17 Thread John Garry
On 14/08/2018 14:04, Robin Murphy wrote: John raised the issue[1] that we have some unnecessary refcount contention in the DMA ops path which shows scalability problems now that we have more real high-performance hardware using iommu-dma. The x86 IOMMU drivers are sidestepping this by stashing

[PATCH] iommu/arm-smmu-v3: Fix a couple of minor comment typos

2018-08-17 Thread John Garry
Fix some comment typos spotted. Signed-off-by: John Garry diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c index 5059d09..feef122 100644 --- a/drivers/iommu/arm-smmu-v3.c +++ b/drivers/iommu/arm-smmu-v3.c @@ -708,7 +708,7 @@ static void queue_inc_prod(struct arm_smmu_queue

Re: [PATCH 0/4] numa, iommu/smmu: IOMMU/SMMU driver optimization for NUMA systems

2018-08-22 Thread John Garry
On 21/09/2017 09:59, Ganapatrao Kulkarni wrote: Adding numa aware memory allocations used for iommu dma allocation and memory allocated for SMMU stream tables, page walk tables and command queues. With this patch, iperf testing on ThunderX2, with 40G NIC card on NODE 1 PCI shown same

Re: [PATCH 0/4] numa, iommu/smmu: IOMMU/SMMU driver optimization for NUMA systems

2018-08-22 Thread John Garry
On 22/08/2018 15:56, Robin Murphy wrote: Hi John, On 22/08/18 14:44, John Garry wrote: On 21/09/2017 09:59, Ganapatrao Kulkarni wrote: Adding numa aware memory allocations used for iommu dma allocation and memory allocated for SMMU stream tables, page walk tables and command queues

Re: [PATCH v4 2/2] iommu/arm-smmu-v3: avoid redundant CMD_SYNCs if possible

2018-08-30 Thread John Garry
On 19/08/2018 08:51, Zhen Lei wrote: More than two CMD_SYNCs maybe adjacent in the command queue, and the first one has done what others want to do. Drop the redundant CMD_SYNCs can improve IO performance especially under the pressure scene. I did the statistics in my test environment, the

Re: [PATCH v2 17/21] scsi: hisi_sas: Remove depends on HAS_DMA in case of platform dependency

2018-03-16 Thread John Garry
d-by: Mark Brown <broo...@kernel.org> Acked-by: Robin Murphy <robin.mur...@arm.com> Acked-by: John Garry <john.ga...@huawei.com>

Re: [PATCH 1/1] iommu/arm-smmu-v3: eliminate a potential memory corruption on Hi16xx soc

2018-10-15 Thread John Garry
On 15/10/2018 09:36, Zhen Lei wrote: ITS translation register map: 0x-0x003C Reserved 0x0040 GITS_TRANSLATER 0x0044-0xFFFC Reserved Can you add a better opening than the ITS translation register map? The standard GITS_TRANSLATER register in ITS is only 4 bytes, but

Re: [PATCH v5 2/2] iommu/arm-smmu-v3: Reunify arm_smmu_cmdq_issue_sync()

2018-10-22 Thread John Garry
-by: Robin Murphy Tested-by: John Garry I seem to be getting some boost in the scenarios I tested: Storage controller: 746K IOPS (with) vs 730K (without) NVMe disk: 471K IOPS (with) vs 420K IOPS (without) Note that this is with strict mode set and without the CMD_SYNC optimisation I punted

Re: [PATCH v5 1/2] iommu/arm-smmu-v3: Poll for CMD_SYNC outside cmdq lock

2018-10-19 Thread John Garry
On 18/10/2018 12:48, John Garry wrote: On 18/10/2018 12:19, Robin Murphy wrote: On 18/10/18 11:55, John Garry wrote: [...] @@ -976,18 +1019,19 @@ static int __arm_smmu_cmdq_issue_sync(struct arm_smmu_device *smmu) { u64 cmd[CMDQ_ENT_DWORDS]; unsigned long flags; -bool wfe

Re: [PATCH v5 1/2] iommu/arm-smmu-v3: Poll for CMD_SYNC outside cmdq lock

2018-10-18 Thread John Garry
On 17/10/2018 14:56, Robin Murphy wrote: Even without the MSI trick, we can still do a lot better than hogging the entire queue while it drains. All we actually need to do for the necessary guarantee of completion is wait for our particular command to have been consumed - as long as we keep

Re: [PATCH v5 2/2] iommu/arm-smmu-v3: Reunify arm_smmu_cmdq_issue_sync()

2018-10-18 Thread John Garry
On 18/10/2018 12:18, Robin Murphy wrote: Hi John, On 17/10/18 15:38, John Garry wrote: On 17/10/2018 14:56, Robin Murphy wrote: Now that both sync methods are more or less the same shape, we can save some code and levels of indirection by rolling them up together again, with just a couple

Re: [PATCH v5 1/2] iommu/arm-smmu-v3: Poll for CMD_SYNC outside cmdq lock

2018-10-18 Thread John Garry
On 18/10/2018 12:19, Robin Murphy wrote: On 18/10/18 11:55, John Garry wrote: [...] @@ -976,18 +1019,19 @@ static int __arm_smmu_cmdq_issue_sync(struct arm_smmu_device *smmu) { u64 cmd[CMDQ_ENT_DWORDS]; unsigned long flags; -bool wfe = !!(smmu->features & ARM_SMMU_F

Re: [PATCH v5 2/2] iommu/arm-smmu-v3: Reunify arm_smmu_cmdq_issue_sync()

2018-10-17 Thread John Garry
unnecessary re-building. Signed-off-by: John Garry diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c index 6947ccf..9d86c29 100644 --- a/drivers/iommu/arm-smmu-v3.c +++ b/drivers/iommu/arm-smmu-v3.c @@ -963,14 +963,16 @@ static int __arm_smmu_cmdq_issue_sync_msi(struct

Re: [PATCH v2 1/1] iommu/arm-smmu-v3: eliminate a potential memory corruption on Hi16xx soc

2018-10-30 Thread John Garry
On 30/10/2018 01:52, Leizhen (ThunderTown) wrote: On 2018/10/30 1:59, Will Deacon wrote: On Sat, Oct 20, 2018 at 03:36:54PM +0800, Zhen Lei wrote: The standard GITS_TRANSLATER register in ITS is only 4 bytes, but Hisilicon expands the next 4 bytes to carry some IMPDEF information. That

[PATCH] iommu/dma: Use NUMA aware memory allocations in __iommu_dma_alloc_pages()

2018-11-08 Thread John Garry
Change function __iommu_dma_alloc_pages() to allocate memory/pages for DMA from respective device NUMA node. Originally-from: Ganapatrao Kulkarni Signed-off-by: John Garry --- This patch was originally posted by Ganapatrao in [1] *. However, after initial review, it was never reposted (due

Re: [PATCH v4 2/9] dmapool: remove checks for dev == NULL

2018-11-12 Thread John Garry
On 12/11/2018 15:42, Tony Battersby wrote: dmapool originally tried to support pools without a device because dma_alloc_coherent() supports allocations without a device. But nobody ended up using dma pools without a device, so the current checks in dmapool.c for pool->dev == NULL are both

Re: [PATCH v8 0/7] Add non-strict mode support for iommu-dma

2018-09-24 Thread John Garry
On 21/09/2018 12:03, Robin Murphy wrote: On 21/09/18 10:29, Robin Murphy wrote: Hi John, On 2018-09-21 10:20 AM, John Garry wrote: On 20/09/2018 17:10, Robin Murphy wrote: Hi all, Hopefully this is the last spin of the series - I've now dropped my light touch and fixed up all the various

Re: [PATCH v2 0/3] iommu: Avoid DMA ops domain refcount contention

2018-09-20 Thread John Garry
On 17/09/2018 12:20, John Garry wrote: On 14/09/2018 13:48, Will Deacon wrote: Hi Robin, Hi Robin, I just spoke with Dongdong and we will test this version also so that we may provide a "Tested-by" tag. I tested this, so for series: Tested-by: John Garry Thanks, John Th

Re: [PATCH 3/4] dma-debug: Dynamically expand the dma_debug_entry pool

2018-12-04 Thread John Garry
In fact, having got this far in, what I'd quite like to do is to get rid of dma_debug_resize_entries() such that we never need to free things at all, since then we could allocate whole pages as blocks of entries to save on masses of individual slab allocations. On a related topic, is it

Re: [PATCH 3/4] dma-debug: Dynamically expand the dma_debug_entry pool

2018-12-04 Thread John Garry
On 04/12/2018 13:11, Robin Murphy wrote: Hi John, On 03/12/2018 18:23, John Garry wrote: On 03/12/2018 17:28, Robin Murphy wrote: Certain drivers such as large multi-queue network adapters can use pools of mapped DMA buffers larger than the default dma_debug_entry pool of 65536 entries

Re: [PATCH v4] iommu/dma: Use NUMA aware memory allocations in __iommu_dma_alloc_pages()

2018-12-07 Thread John Garry
On 30/11/2018 11:14, John Garry wrote: From: Ganapatrao Kulkarni Hi Joerg, A friendly reminder. Can you please let me know your position on this patch? Cheers, John Change function __iommu_dma_alloc_pages() to allocate pages for DMA from respective device NUMA node. The ternary operator

Re: [PATCH] dma-debug: Kconfig for PREALLOC_DMA_DEBUG_ENTRIES

2018-12-03 Thread John Garry
On 01/12/2018 16:36, Christoph Hellwig wrote: On Fri, Nov 30, 2018 at 07:39:50PM +, Robin Murphy wrote: I was assuming the point was to also add something like default 131072 if HNS_ENET so that DMA debug doesn't require too much thought from the user. If they still have to notice

Re: [PATCH 3/4] dma-debug: Dynamically expand the dma_debug_entry pool

2018-12-03 Thread John Garry
On 03/12/2018 17:28, Robin Murphy wrote: Certain drivers such as large multi-queue network adapters can use pools of mapped DMA buffers larger than the default dma_debug_entry pool of 65536 entries, with the result that merely probing such a device can cause DMA debug to disable itself during

Re: [PATCH] dma-debug: hns_enet_drv could use more DMA entries

2018-11-30 Thread John Garry
+ Pasting original message at bottom. On 30/11/2018 08:42, Christoph Hellwig wrote: On Thu, Nov 29, 2018 at 10:54:56PM -0500, Qian Cai wrote: /* allow architectures to override this if absolutely required */ #ifndef PREALLOC_DMA_DEBUG_ENTRIES +/* amount of DMA mappings on this driver is

[PATCH v4] iommu/dma: Use NUMA aware memory allocations in __iommu_dma_alloc_pages()

2018-11-30 Thread John Garry
, update message] Signed-off-by: John Garry --- Difference: v1->v2: - Add Ganapatrao's tag and change author v2->v3: - removed ternary operator - stopped making pages ** allocation local to device v3->v4: - Update commit message to include motivation for patch, including headline pe

Re: [PATCH v3] iommu/dma: Use NUMA aware memory allocations in __iommu_dma_alloc_pages()

2018-11-23 Thread John Garry
On 21/11/2018 16:57, Will Deacon wrote: On Wed, Nov 21, 2018 at 04:47:48PM +, John Garry wrote: On 21/11/2018 16:07, Will Deacon wrote: On Wed, Nov 21, 2018 at 10:54:10PM +0800, John Garry wrote: From: Ganapatrao Kulkarni Change function __iommu_dma_alloc_pages() to allocate pages

[PATCH v3] iommu/dma: Use NUMA aware memory allocations in __iommu_dma_alloc_pages()

2018-11-21 Thread John Garry
. Signed-off-by: Ganapatrao Kulkarni [JPG: Added kvzalloc(), drop pages ** being device local, tidied ternary operator] Signed-off-by: John Garry diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c index d1b0475..4afb1a8 100644 --- a/drivers/iommu/dma-iommu.c +++ b/drivers/iommu/dma

Re: [PATCH v3] iommu/dma: Use NUMA aware memory allocations in __iommu_dma_alloc_pages()

2018-11-21 Thread John Garry
On 21/11/2018 16:07, Will Deacon wrote: On Wed, Nov 21, 2018 at 10:54:10PM +0800, John Garry wrote: From: Ganapatrao Kulkarni Change function __iommu_dma_alloc_pages() to allocate pages for DMA from respective device NUMA node. The ternary operator which would be for alloc_pages_node

[PATCH v2] iommu/dma: Use NUMA aware memory allocations in __iommu_dma_alloc_pages()

2018-11-20 Thread John Garry
From: Ganapatrao Kulkarni Change function __iommu_dma_alloc_pages() to allocate memory/pages for DMA from respective device NUMA node. Signed-off-by: Ganapatrao Kulkarni [JPG: Modified to use kvzalloc() and fixed indentation] Signed-off-by: John Garry --- Difference v1->v2: - Add Ganapatra

Re: [PATCH] iommu/dma: Use NUMA aware memory allocations in __iommu_dma_alloc_pages()

2018-11-20 Thread John Garry
On 20/11/2018 10:09, Ganapatrao Kulkarni wrote: Hi John, On Tue, Nov 20, 2018 at 3:35 PM John Garry wrote: On 08/11/2018 17:55, John Garry wrote: Change function __iommu_dma_alloc_pages() to allocate memory/pages for DMA from respective device NUMA node. Ping a friendly reminder

Re: [PATCH] iommu/dma: Use NUMA aware memory allocations in __iommu_dma_alloc_pages()

2018-11-20 Thread John Garry
On 08/11/2018 17:55, John Garry wrote: Change function __iommu_dma_alloc_pages() to allocate memory/pages for DMA from respective device NUMA node. Ping a friendly reminder on this patch. Thanks Originally-from: Ganapatrao Kulkarni Signed-off-by: John Garry --- This patch

Re: [PATCH v2] iommu/dma: Use NUMA aware memory allocations in __iommu_dma_alloc_pages()

2018-11-20 Thread John Garry
On 20/11/2018 14:20, Robin Murphy wrote: On 20/11/2018 13:42, John Garry wrote: From: Ganapatrao Kulkarni Change function __iommu_dma_alloc_pages() to allocate memory/pages for DMA from respective device NUMA node. Signed-off-by: Ganapatrao Kulkarni [JPG: Modified to use kvzalloc

Re: [PATCH v2 0/3] iommu: Avoid DMA ops domain refcount contention

2018-09-17 Thread John Garry
On 14/09/2018 13:48, Will Deacon wrote: Hi Robin, Hi Robin, I just spoke with Dongdong and we will test this version also so that we may provide a "Tested-by" tag. Thanks, John On Wed, Sep 12, 2018 at 04:24:11PM +0100, Robin Murphy wrote: John raised the issue[1] that we have some

Re: [PATCH] driver core: Postpone DMA tear-down until after devres release for probe failure

2019-04-03 Thread John Garry
On 28/03/2019 10:08, John Garry wrote: In commit 376991db4b64 ("driver core: Postpone DMA tear-down until after devres release"), we changed the ordering of tearing down the device DMA ops and releasing all the device's resources; this was because the DMA ops should be maintaine

Re: [PATCH] driver core: Postpone DMA tear-down until after devres release for probe failure

2019-04-03 Thread John Garry
On 03/04/2019 09:14, Greg KH wrote: On Wed, Apr 03, 2019 at 09:02:36AM +0100, John Garry wrote: On 28/03/2019 10:08, John Garry wrote: In commit 376991db4b64 ("driver core: Postpone DMA tear-down until after devres release"), we changed the ordering of tearing down the devi

Re: [PATCH/RFC] driver core: Postpone DMA tear-down until after devres release

2019-03-26 Thread John Garry
Memory is incorrectly freed using the direct ops, as dma_map_ops = NULL. Oops... After reversing the order of the calls to arch_teardown_dma_ops() and devres_release_all(), dma_map_ops is still valid, and the DMA memory is now released using __iommu_free_attrs(): +sata_rcar ee30.sata:

Re: [PATCH/RFC] driver core: Postpone DMA tear-down until after devres release

2019-03-26 Thread John Garry
On 26/03/2019 12:31, Geert Uytterhoeven wrote: Hi John, CC robh On Tue, Mar 26, 2019 at 12:42 PM John Garry wrote: Memory is incorrectly freed using the direct ops, as dma_map_ops = NULL. Oops... After reversing the order of the calls to arch_teardown_dma_ops() and devres_release_all

Re: [PATCH] driver core: Postpone DMA tear-down until after devres release for probe failure

2019-04-04 Thread John Garry
On 03/04/2019 10:20, John Garry wrote: On 03/04/2019 09:14, Greg KH wrote: On Wed, Apr 03, 2019 at 09:02:36AM +0100, John Garry wrote: On 28/03/2019 10:08, John Garry wrote: In commit 376991db4b64 ("driver core: Postpone DMA tear-down until after devres release"), we changed th

[PATCH] driver core: Postpone DMA tear-down until after devres release for probe failure

2019-03-28 Thread John Garry
releasing the device's managed memories. This patch fixes this issue by reordering the DMA ops teardown and the call to devres_release_all() on the failure path. Reported-by: Xiang Chen Tested-by: Xiang Chen Signed-off-by: John Garry --- For convenience, here is the 2nd half of really_p

Re: [PATCH RFC 1/1] iommu: set the default iommu-dma mode as non-strict

2019-03-06 Thread John Garry
(VFIO) is untrusted, ok. But a malicious driver loaded into the kernel address space would have much easier ways to corrupt the system than to exploit lazy mode... Yes, so that we have no need to consider untrusted drivers. For (3), I agree that we should at least disallow lazy mode if

Re: [PATCH/RFC] driver core: Postpone DMA tear-down until after devres release

2019-03-07 Thread John Garry
On 07/03/2019 14:52, Robin Murphy wrote: Hi John, Hi Robin, ok, fine. It's a pain bisecting another rmmod issue with it... Cheers, On 07/03/2019 14:45, John Garry wrote: [...] Hi guys, Any idea what happened to this fix? It's been in -next for a while (commit 376991db4b64) - I assume

Re: [PATCH/RFC] driver core: Postpone DMA tear-down until after devres release

2019-03-07 Thread John Garry
On 11/02/2019 10:22, Robin Murphy wrote: On 08/02/2019 18:55, Geert Uytterhoeven wrote: Hi Robin, On Fri, Feb 8, 2019 at 6:55 PM Robin Murphy wrote: On 08/02/2019 16:40, Joerg Roedel wrote: On Thu, Feb 07, 2019 at 08:36:53PM +0100, Geert Uytterhoeven wrote: diff --git a/drivers/base/dd.c

Re: [PATCH 1/1] iommu: Add config option to set lazy mode as default

2019-03-22 Thread John Garry
On 22/03/2019 14:11, Zhen Lei wrote: This allows the default behaviour to be controlled by a kernel config option instead of changing the command line for the kernel to include "iommu.strict=0" on ARM64 where this is desired. This is similar to CONFIG_IOMMU_DEFAULT_PASSTHROUGH Note: At

Re: [PATCH] driver core: Postpone DMA tear-down until after devres release for probe failure

2019-04-11 Thread John Garry
On 04/04/2019 12:17, John Garry wrote: On 03/04/2019 10:20, John Garry wrote: On 03/04/2019 09:14, Greg KH wrote: On Wed, Apr 03, 2019 at 09:02:36AM +0100, John Garry wrote: On 28/03/2019 10:08, John Garry wrote: In commit 376991db4b64 ("driver core: Postpone DMA tear-down until after d

Re: [PATCH] driver core: Postpone DMA tear-down until after devres release for probe failure

2019-04-11 Thread John Garry
to devres_release_all() on the failure path. Reported-by: Xiang Chen Tested-by: Xiang Chen Signed-off-by: John Garry So does this "fix" 376991db4b64? If so, should this be added to the patch and also backported to the stable trees? Hi Greg, No, I don't think so. I'd say it supplement

Re: [PATCH v5 1/6] iommu: add generic boot option iommu.dma_mode

2019-04-12 Thread John Garry
On 09/04/2019 13:53, Zhen Lei wrote: Currently the IOMMU dma contains 3 modes: passthrough, lazy, strict. The passthrough mode bypass the IOMMU, the lazy mode defer the invalidation of hardware TLBs, and the strict mode invalidate IOMMU hardware TLBs synchronously. The three modes are mutually

Re: [PATCH v8 1/7] iommu: enhance IOMMU default DMA mode build options

2019-05-31 Thread John Garry
-config IOMMU_DEFAULT_PASSTHROUGH -bool "IOMMU passthrough by default" +choice +prompt "IOMMU default DMA mode" depends on IOMMU_API -help - Enable passthrough by default, removing the need to pass in - iommu.passthrough=on or iommu=pt through command line. If

Re: [PATCH v8 1/7] iommu: enhance IOMMU default DMA mode build options

2019-05-30 Thread John Garry
this was not picked up, but modulo (sometimes same) comments below: Reviewed-by: John Garry Signed-off-by: Zhen Lei --- drivers/iommu/Kconfig | 42 +++--- drivers/iommu/iommu.c | 3 ++- 2 files changed, 37 insertions(+), 8 deletions(-) diff --git a/drivers/iommu

Re: [RFC CFT 0/6] Try to reduce lock contention on the SMMUv3 command queue

2019-06-17 Thread John Garry
On 11/06/2019 14:45, Will Deacon wrote: Hi all, This patch series is an attempt to reduce lock contention when inserting commands into the Arm SMMUv3 command queue. Unfortunately, our initial benchmarking has shown mixed results across the board and the changes in the last patch don't appear to

Re: [PATCH v7 1/1] iommu: enhance IOMMU dma mode build options

2019-05-21 Thread John Garry
dma modes on each ARCHs have no change. Signed-off-by: Zhen Lei Apart from more minor comments, FWIW: Reviewed-by: John Garry --- arch/ia64/kernel/pci-dma.c| 2 +- arch/powerpc/platforms/powernv/pci-ioda.c | 3 ++- arch/s390/pci/pci_dma.c | 2 +- arch

Re: [PATCH v6 1/1] iommu: enhance IOMMU dma mode build options

2019-05-08 Thread John Garry
On 18/04/2019 14:57, Zhen Lei wrote: First, add build option IOMMU_DEFAULT_{LAZY|STRICT}, so that we have the opportunity to set {lazy|strict} mode as default at build time. Then put the three config options in an choice, make people can only choose one of the three at a time. The default IOMMU

Re: [PATCH] iommu/arm-smmu-v3: add nr_ats_masters to avoid unnecessary operations

2019-08-12 Thread John Garry
On 01/08/2019 13:20, Zhen Lei wrote: When (smmu_domain->smmu->features & ARM_SMMU_FEAT_ATS) is true, even if a smmu domain does not contain any ats master, the operations of arm_smmu_atc_inv_to_cmd() and lock protection in arm_smmu_atc_inv_domain() are always executed. This will impact

Re: [PATCH 00/13] Rework IOMMU API to allow for batching of invalidation

2019-08-15 Thread John Garry
Cc: Jonathan Cameron Cc: Vijay Kilary Cc: Joerg Roedel Cc: John Garry Cc: Alex Williamson Cc: Marek Szyprowski Cc: David Woodhouse Will Deacon (13): iommu: Remove empty iommu_tlb_range_add() callback from iommu_ops iommu/io-pgtable-arm: Remove redundant call to io_pgtable_tlb_sync() iomm

Re: [PATCH 00/13] Rework IOMMU API to allow for batching of invalidation

2019-08-16 Thread John Garry
On 15/08/2019 14:55, Will Deacon wrote: On Thu, Aug 15, 2019 at 12:19:58PM +0100, John Garry wrote: On 14/08/2019 18:56, Will Deacon wrote: If you'd like to play with the patches, then I've also pushed them here: https://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git/log/?h=iommu

Re: [PATCH 4/4] iommu: Document usage of "/sys/kernel/iommu_groups//type" file

2019-08-21 Thread John Garry
On 21/08/2019 03:42, Sai Praneeth Prakhya wrote: The default domain type of an iommu group can be changed using file "/sys/kernel/iommu_groups//type". Hence, document it's usage and more importantly spell out it's limitations and an example. Cc: Christoph Hellwig Cc: Joerg Roedel Cc: Ashok

arm64 iommu groups issue

2019-09-19 Thread John Garry
Hi all, We have noticed a special behaviour on our arm64 D05 board when the SMMU is enabled with regards PCI device iommu groups. This platform does not support ACS, yet we find that all functions for a PCI device are not grouped together: root@ubuntu:/sys# dmesg | grep "Adding to iommu

Re: [RFC PATCH v2 18/19] iommu/arm-smmu-v3: Reduce contention during command-queue insertion

2019-07-19 Thread John Garry
+static int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu, + u64 *cmds, int n, bool sync) +{ + u64 cmd_sync[CMDQ_ENT_DWORDS]; + u32 prod; unsigned long flags; - bool wfe = !!(smmu->features & ARM_SMMU_FEAT_SEV); -

Re: [RFC PATCH v2 18/19] iommu/arm-smmu-v3: Reduce contention during command-queue insertion

2019-07-24 Thread John Garry
On 11/07/2019 18:19, Will Deacon wrote: +static int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu, + u64 *cmds, int n, bool sync) +{ + u64 cmd_sync[CMDQ_ENT_DWORDS]; + u32 prod; unsigned long flags; - bool wfe =

Re: [RFC PATCH v2 00/19] Try to reduce lock contention on the SMMUv3 command queue

2019-07-24 Thread John Garry
del Cc: John Garry Cc: Alex Williamson Will Deacon (19): iommu: Remove empty iommu_tlb_range_add() callback from iommu_ops iommu/io-pgtable-arm: Remove redundant call to io_pgtable_tlb_sync() iommu/io-pgtable: Rename iommu_gather_ops to iommu_flush_ops iommu: Introduce str

Re: [RFC PATCH v2 00/19] Try to reduce lock contention on the SMMUv3 command queue

2019-07-24 Thread John Garry
On 24/07/2019 13:20, Will Deacon wrote: On Wed, Jul 24, 2019 at 10:58:26AM +0100, John Garry wrote: On 11/07/2019 18:19, Will Deacon wrote: This is a significant rework of the RFC I previously posted here: https://lkml.kernel.org/r/20190611134603.4253-1-will.dea...@arm.com But this time

Re: [RFC PATCH v2 18/19] iommu/arm-smmu-v3: Reduce contention during command-queue insertion

2019-07-24 Thread John Garry
On 24/07/2019 13:15, Will Deacon wrote: Hi John, Thanks for reading the code! On Fri, Jul 19, 2019 at 12:04:15PM +0100, John Garry wrote: +static int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu, + u64 *cmds, int n, bool sync) +{ + u64

Re: [RFC PATCH v2 00/19] Try to reduce lock contention on the SMMUv3 command queue

2019-07-25 Thread John Garry
On 24/07/2019 15:48, Will Deacon wrote: On Wed, Jul 24, 2019 at 03:25:07PM +0100, John Garry wrote: On 24/07/2019 13:20, Will Deacon wrote: On Wed, Jul 24, 2019 at 10:58:26AM +0100, John Garry wrote: On 11/07/2019 18:19, Will Deacon wrote: This is a significant rework of the RFC I previously

Re: [RFC PATCH v2 18/19] iommu/arm-smmu-v3: Reduce contention during command-queue insertion

2019-07-25 Thread John Garry
Hi Will, + old = cmpxchg_relaxed(&cmdq->q.llq.val, llq.val, head.val); I added some basic debug to the stress test on your branch, and this cmpxchg was failing ~10 times on average on my D06. So we're not using the spinlock now, but this cmpxchg may lack fairness. It definitely
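
The failure counting described in this message can be illustrated with a minimal userspace sketch. It uses C11 atomics rather than the kernel's cmpxchg_relaxed(), and every name in it (demo_queue, demo_reserve) is hypothetical; it is not the debug code that was actually added to the stress test.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the queue's packed prod/cons word. */
struct demo_queue {
	_Atomic uint64_t val;
};

/*
 * Reserve n slots by advancing val with a relaxed compare-and-swap,
 * retrying until the swap succeeds. Returns the value observed just
 * before the successful update and adds the number of failed attempts
 * to *failures.
 */
static uint64_t demo_reserve(struct demo_queue *q, uint64_t n,
			     unsigned long *failures)
{
	uint64_t old = atomic_load_explicit(&q->val, memory_order_relaxed);

	assert(failures != NULL);

	/* On failure, 'old' is refreshed with the current value of val. */
	while (!atomic_compare_exchange_strong_explicit(&q->val, &old, old + n,
							memory_order_relaxed,
							memory_order_relaxed))
		(*failures)++;

	return old;
}
```

Nothing in such a loop orders the competing CPUs: a caller that loses the race simply retries against whichever value won, so one CPU can in principle lose many times in a row. That is the fairness concern raised here.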

Re: arm64 iommu groups issue

2019-09-19 Thread John Garry
On 19/09/2019 14:25, Robin Murphy wrote: When the port eventually probes it gets a new, separate group. This all seems to be as the built-in module init ordering is as follows: pcieport drv, smmu drv, mlx5 drv I notice that if I build the mlx5 drv as a ko and insert after boot, all functions +

[RFC PATCH 1/6] ACPI/IORT: Set PMCG device parent

2019-09-30 Thread John Garry
In the IORT, a PMCG node includes a node reference to its associated device. Set the PMCG platform device parent device for future referencing. For now, we only consider setting for when the associated component is an SMMUv3. Signed-off-by: John Garry --- drivers/acpi/arm64/iort.c | 34

[RFC PATCH 2/6] iommu/arm-smmu-v3: Record IIDR in arm_smmu_device structure

2019-09-30 Thread John Garry
To allow other devices to know the SMMU HW IIDR, record the IIDR contents as the first member of the arm_smmu_device structure. In storing as the first member, it saves exposing SMMU APIs, which are nicely self-contained today. Signed-off-by: John Garry --- drivers/iommu/arm-smmu-v3.c | 5
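
A rough sketch of why placing the IIDR first avoids a new API: C guarantees that a pointer to a structure, suitably converted, points to its first member, so a consumer holding only an opaque pointer can read the value without seeing the structure definition. The structure and function names below are hypothetical and do not reproduce the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the SMMU driver's private structure. */
struct demo_smmu_device {
	uint32_t iidr;		/* must remain the first member */
	void *regs;		/* other private state (illustrative) */
	uint32_t features;
};

/* Guard against a later reordering silently breaking consumers. */
static_assert(offsetof(struct demo_smmu_device, iidr) == 0,
	      "iidr must be the first member");

/*
 * What a consumer might do with only an opaque pointer to the parent's
 * driver data: read the first member through a converted pointer.
 */
static uint32_t demo_read_parent_iidr(const void *parent_drvdata)
{
	assert(parent_drvdata != NULL);
	return *(const uint32_t *)parent_drvdata;
}
```

The trade-off is fragility: the consumer depends on a layout convention rather than on a declared interface, which is why the sketch pins the offset with a static assertion.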

[RFC PATCH 6/6] ACPI/IORT: Drop code to set the PMCG software-defined model

2019-09-30 Thread John Garry
Now that we can identify a PMCG implementation from the parent SMMUv3 IIDR, drop all the code to match based on the ACPI OEM ID. Signed-off-by: John Garry --- drivers/acpi/arm64/iort.c | 35 +-- include/linux/acpi_iort.h | 8 2 files changed, 1

[RFC PATCH 4/6] perf/smmuv3: Support HiSilicon hip08 (hi1620) IMP DEF events

2019-09-30 Thread John Garry
may be optimised in future to reduce structures required. For now, only the l1_tlb event is added for HiSilicon hip08 platform. This platform supports many more IMP DEF events, but I need something better than the electronically translated description of the event to support. Signed-off-by: John

[RFC PATCH 0/6] SMMUv3 PMCG IMP DEF event support

2019-09-30 Thread John Garry
OEM ID to the same parent SMMUv3 IIDR matching. For now, we only consider SMMUv3 nodes being the associated node for PMCG. John Garry (6): ACPI/IORT: Set PMCG device parent iommu/arm-smmu-v3: Record IIDR in arm_smmu_device structure perf/smmuv3: Retrieve parent SMMUv3 IIDR perf/smmuv3

[RFC PATCH 3/6] perf/smmuv3: Retrieve parent SMMUv3 IIDR

2019-09-30 Thread John Garry
will have the same IMP DEF events - otherwise, some other secondary matching would need to be done. Signed-off-by: John Garry --- drivers/perf/arm_smmuv3_pmu.c | 18 +- 1 file changed, 17 insertions(+), 1 deletion(-) diff --git a/drivers/perf/arm_smmuv3_pmu.c b/drivers/perf

[RFC PATCH 5/6] perf/smmuv3: Match implementation options based on parent SMMU IIDR

2019-09-30 Thread John Garry
Currently we match the implementation options based on the ACPI PLATFORM OEM ID. Since we can now match based on the parent SMMUv3 IIDR, switch to this method. Signed-off-by: John Garry --- drivers/perf/arm_smmuv3_pmu.c | 12 1 file changed, 4 insertions(+), 8 deletions(-) diff
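
Matching on the parent IIDR could look roughly like the table lookup below. This is only a sketch: the field positions are an assumption to be checked against the SMMUv3 specification, and the implementer and product values, the option bit, and all demo_* names are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: Implementer in bits [11:0], ProductID in bits [31:20]. */
#define DEMO_IIDR_IMPLEMENTER(x)	((x) & 0xfffu)
#define DEMO_IIDR_PRODUCTID(x)		(((x) >> 20) & 0xfffu)

#define DEMO_OPT_QUIRK_A	(1u << 0)	/* hypothetical option bit */

struct demo_iidr_match {
	uint32_t implementer;
	uint32_t productid;
	uint32_t options;
};

/* Made-up entry; a real table would list known implementations. */
static const struct demo_iidr_match demo_table[] = {
	{ 0x123, 0x001, DEMO_OPT_QUIRK_A },
};

/* Return the option bits for a matching IIDR, or 0 if none match. */
static uint32_t demo_options_for_iidr(uint32_t iidr)
{
	size_t i;

	for (i = 0; i < sizeof(demo_table) / sizeof(demo_table[0]); i++) {
		if (DEMO_IIDR_IMPLEMENTER(iidr) == demo_table[i].implementer &&
		    DEMO_IIDR_PRODUCTID(iidr) == demo_table[i].productid)
			return demo_table[i].options;
	}
	return 0;
}
```

An unmatched IIDR falls through to no options, so an unknown implementation would be treated as generic.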

Re: [RFC PATCH 0/6] SMMUv3 PMCG IMP DEF event support

2019-10-16 Thread John Garry
On 15/10/2019 19:00, Robin Murphy wrote: Hi John, On 30/09/2019 15:33, John Garry wrote: This patchset adds IMP DEF event support for the SMMUv3 PMCG. It is marked as an RFC as the method to identify the PMCG implementation may be quite disliked. And, in general, the series is somewhat

Re: [RFC PATCH 0/6] SMMUv3 PMCG IMP DEF event support

2019-10-16 Thread John Garry
Hi Robin, Two significant concerns right off the bat: - It seems more common than not for silicon designers to fail to implement IIDR correctly, so it's only a matter of time before inevitably needing to bring back some firmware-level identifier abstraction (if not already - does Hi161x have

Re: [RFC PATCH 1/6] ACPI/IORT: Set PMCG device parent

2019-10-15 Thread John Garry
Hi Hanjun, Thanks for checking this. */ static int __init iort_add_platform_device(struct acpi_iort_node *node, - const struct iort_dev_config *ops) + const struct iort_dev_config *ops, struct device *parent)

Re: [RFC PATCH 6/6] ACPI/IORT: Drop code to set the PMCG software-defined model

2019-10-15 Thread John Garry
On 15/10/2019 04:06, Hanjun Guo wrote: -/* > - * PMCG model identifiers for use in smmu pmu driver. Please note > - * that this is purely for the use of software and has nothing to > - * do with hardware or with IORT specification. > - */ > -#define IORT_SMMU_V3_PMCG_GENERIC0x /*