Re: [bug report] iommu_dma_unmap_sg() is very slow then running IO from remote numa node

2021-08-10 Thread Ming Lei
On Tue, Aug 10, 2021 at 10:36:47AM +0100, John Garry wrote: > On 28/07/2021 16:17, Ming Lei wrote: > > > > > Have you tried turning off the IOMMU to ensure that this is really > > > > > just > > > > > an IOMMU problem? > > > > > > > > > > You can try setting CONFIG_ARM_SMMU_V3=n in the defconfig

Re: [bug report] iommu_dma_unmap_sg() is very slow then running IO from remote numa node

2021-08-10 Thread John Garry
On 28/07/2021 16:17, Ming Lei wrote: Have you tried turning off the IOMMU to ensure that this is really just an IOMMU problem? You can try setting CONFIG_ARM_SMMU_V3=n in the defconfig or passing cmdline param iommu.passthrough=1 to bypass the SMMU (equivalent to disabling for kernel

Re: [bug report] iommu_dma_unmap_sg() is very slow then running IO from remote numa node

2021-07-28 Thread Robin Murphy
On 2021-07-28 16:17, Ming Lei wrote: On Wed, Jul 28, 2021 at 11:38:18AM +0100, John Garry wrote: On 28/07/2021 02:32, Ming Lei wrote: On Mon, Jul 26, 2021 at 3:51 PM John Garry wrote: On 23/07/2021 11:21, Ming Lei wrote: Thanks, I was also going to suggest the latter, since it's what

Re: [bug report] iommu_dma_unmap_sg() is very slow then running IO from remote numa node

2021-07-28 Thread Ming Lei
On Wed, Jul 28, 2021 at 11:38:18AM +0100, John Garry wrote: > On 28/07/2021 02:32, Ming Lei wrote: > > On Mon, Jul 26, 2021 at 3:51 PM John Garry wrote: > > > On 23/07/2021 11:21, Ming Lei wrote: > > > > > Thanks, I was also going to suggest the latter, since it's what > > > > >

Re: [bug report] iommu_dma_unmap_sg() is very slow then running IO from remote numa node

2021-07-28 Thread John Garry
On 28/07/2021 02:32, Ming Lei wrote: On Mon, Jul 26, 2021 at 3:51 PM John Garry wrote: On 23/07/2021 11:21, Ming Lei wrote: Thanks, I was also going to suggest the latter, since it's what arm_smmu_cmdq_issue_cmdlist() does with IRQs masked that should be most indicative of where the slowness

Re: [bug report] iommu_dma_unmap_sg() is very slow then running IO from remote numa node

2021-07-27 Thread Ming Lei
On Mon, Jul 26, 2021 at 3:51 PM John Garry wrote: > > On 23/07/2021 11:21, Ming Lei wrote: > >> Thanks, I was also going to suggest the latter, since it's what > >> arm_smmu_cmdq_issue_cmdlist() does with IRQs masked that should be most > >> indicative of where the slowness most likely stems

Re: [bug report] iommu_dma_unmap_sg() is very slow then running IO from remote numa node

2021-07-27 Thread Robin Murphy
On 2021-07-23 11:21, Ming Lei wrote: On Thu, Jul 22, 2021 at 06:40:18PM +0100, Robin Murphy wrote: On 2021-07-22 16:54, Ming Lei wrote: [...] If you are still keen to investigate more, then can try either of these: - add iommu.strict=0 to the cmdline - use perf record+annotate to find the

Re: [bug report] iommu_dma_unmap_sg() is very slow then running IO from remote numa node

2021-07-26 Thread John Garry
On 23/07/2021 11:21, Ming Lei wrote: Thanks, I was also going to suggest the latter, since it's what arm_smmu_cmdq_issue_cmdlist() does with IRQs masked that should be most indicative of where the slowness most likely stems from. The improvement from 'iommu.strict=0' is very small: Have you

Re: [bug report] iommu_dma_unmap_sg() is very slow then running IO from remote numa node

2021-07-23 Thread Ming Lei
On Thu, Jul 22, 2021 at 06:40:18PM +0100, Robin Murphy wrote: > On 2021-07-22 16:54, Ming Lei wrote: > [...] > > > If you are still keen to investigate more, then can try either of these: > > > > > > - add iommu.strict=0 to the cmdline > > > > > > - use perf record+annotate to find the hotspot >

Re: [bug report] iommu_dma_unmap_sg() is very slow then running IO from remote numa node

2021-07-22 Thread Robin Murphy
On 2021-07-22 16:54, Ming Lei wrote: [...] If you are still keen to investigate more, then can try either of these: - add iommu.strict=0 to the cmdline - use perf record+annotate to find the hotspot - For this you need to enable pseudo-NMI with 2x steps: CONFIG_ARM64_PSEUDO_NMI=y in

Re: [bug report] iommu_dma_unmap_sg() is very slow then running IO from remote numa node

2021-07-22 Thread Ming Lei
On Thu, Jul 22, 2021 at 12:12:05PM +0100, John Garry wrote: > On 22/07/2021 11:19, Ming Lei wrote: > > > If you check below, you can see that cpu4 services an NVMe irq. From > > > checking htop, during the test that cpu is at 100% load, which I put the > > > performance drop (vs cpu0) down to. > >

Re: [bug report] iommu_dma_unmap_sg() is very slow then running IO from remote numa node

2021-07-22 Thread John Garry
On 22/07/2021 13:53, Marc Zyngier wrote: Hi John, [...] Your kernel log should show: [    0.00] GICv3: Pseudo-NMIs enabled using forced ICC_PMR_EL1 synchronisation Unrelated, but you seem to be running with ICC_CTLR_EL3.PMHE set, which makes the overhead of pseudo-NMIs much

Re: [bug report] iommu_dma_unmap_sg() is very slow then running IO from remote numa node

2021-07-22 Thread Marc Zyngier
On 2021-07-22 12:12, John Garry wrote: Hi John, [...] Your kernel log should show: [0.00] GICv3: Pseudo-NMIs enabled using forced ICC_PMR_EL1 synchronisation Unrelated, but you seem to be running with ICC_CTLR_EL3.PMHE set, which makes the overhead of pseudo-NMIs much higher

Re: [bug report] iommu_dma_unmap_sg() is very slow then running IO from remote numa node

2021-07-22 Thread John Garry
On 22/07/2021 11:19, Ming Lei wrote: If you check below, you can see that cpu4 services an NVMe irq. From checking htop, during the test that cpu is at 100% load, which I put the performance drop (vs cpu0) down to. nvme.poll_queues is 2 in my test, and no irq is involved. But the irq mode fio

Re: [bug report] iommu_dma_unmap_sg() is very slow then running IO from remote numa node

2021-07-22 Thread Ming Lei
On Thu, Jul 22, 2021 at 11:05:00AM +0100, John Garry wrote: > On 22/07/2021 08:58, Ming Lei wrote: > > On Wed, Jul 21, 2021 at 12:07:22PM +0100, John Garry wrote: > > > On 21/07/2021 10:59, Ming Lei wrote: > > > > > I have now removed that from the tree, so please re-pull. > > > > Now the kernel

Re: [bug report] iommu_dma_unmap_sg() is very slow then running IO from remote numa node

2021-07-22 Thread John Garry
On 22/07/2021 08:58, Ming Lei wrote: On Wed, Jul 21, 2021 at 12:07:22PM +0100, John Garry wrote: On 21/07/2021 10:59, Ming Lei wrote: I have now removed that from the tree, so please re-pull. Now the kernel can be built successfully, but not see obvious improvement on the reported issue:

Re: [bug report] iommu_dma_unmap_sg() is very slow then running IO from remote numa node

2021-07-22 Thread Ming Lei
On Wed, Jul 21, 2021 at 12:07:22PM +0100, John Garry wrote: > On 21/07/2021 10:59, Ming Lei wrote: > > > I have now removed that from the tree, so please re-pull. > > Now the kernel can be built successfully, but not see obvious improvement > > on the reported issue: > > > >

Re: [bug report] iommu_dma_unmap_sg() is very slow then running IO from remote numa node

2021-07-21 Thread Ming Lei
On Wed, Jul 21, 2021 at 12:07:22PM +0100, John Garry wrote: > On 21/07/2021 10:59, Ming Lei wrote: > > > I have now removed that from the tree, so please re-pull. > > Now the kernel can be built successfully, but not see obvious improvement > > on the reported issue: > > > >

Re: [bug report] iommu_dma_unmap_sg() is very slow then running IO from remote numa node

2021-07-21 Thread John Garry
On 21/07/2021 10:59, Ming Lei wrote: I have now removed that from the tree, so please re-pull. Now the kernel can be built successfully, but not see obvious improvement on the reported issue: [root@ampere-mtjade-04 ~]# uname -a Linux ampere-mtjade-04.khw4.lab.eng.bos.redhat.com

Re: [bug report] iommu_dma_unmap_sg() is very slow then running IO from remote numa node

2021-07-21 Thread Ming Lei
On Wed, Jul 21, 2021 at 10:23:38AM +0100, John Garry wrote: > On 21/07/2021 02:40, Ming Lei wrote: > > > I think that you should see a significant performance boost. > > There is build issue, please check your tree: > > > >MODPOST vmlinux.symvers > >MODINFO modules.builtin.modinfo > >

Re: [bug report] iommu_dma_unmap_sg() is very slow then running IO from remote numa node

2021-07-21 Thread John Garry
On 21/07/2021 02:40, Ming Lei wrote: I think that you should see a significant performance boost. There is build issue, please check your tree: MODPOST vmlinux.symvers MODINFO modules.builtin.modinfo GEN modules.builtin LD .tmp_vmlinux.btf ld: Unexpected GOT/PLT entries

Re: [bug report] iommu_dma_unmap_sg() is very slow then running IO from remote numa node

2021-07-20 Thread Ming Lei
On Mon, Jul 19, 2021 at 05:14:28PM +0100, John Garry wrote: > On 09/07/2021 15:24, Ming Lei wrote: > > > associated compromises. > > Follows the log of 'perf report' > > > > 1) good(run fio from cpus in the nvme's numa node) > > Hi Ming, > > If you're still interested in this issue, as an

Re: [bug report] iommu_dma_unmap_sg() is very slow then running IO from remote numa node

2021-07-19 Thread John Garry
On 09/07/2021 15:24, Ming Lei wrote: associated compromises. Follows the log of 'perf report' 1) good(run fio from cpus in the nvme's numa node) Hi Ming, If you're still interested in this issue, as an experiment only you can try my rebased patches here:

Re: [bug report] iommu_dma_unmap_sg() is very slow then running IO from remote numa node

2021-07-09 Thread Ming Lei
On Fri, Jul 09, 2021 at 11:26:53AM +0100, Robin Murphy wrote: > On 2021-07-09 09:38, Ming Lei wrote: > > Hello, > > > > I observed that NVMe performance is very bad when running fio on one > > CPU(aarch64) in remote numa node compared with the nvme pci numa node. > > > > Please see the test

Re: [bug report] iommu_dma_unmap_sg() is very slow then running IO from remote numa node

2021-07-09 Thread Ming Lei
On Fri, Jul 09, 2021 at 11:16:14AM +0100, Russell King (Oracle) wrote: > On Fri, Jul 09, 2021 at 04:38:09PM +0800, Ming Lei wrote: > > I observed that NVMe performance is very bad when running fio on one > > CPU(aarch64) in remote numa node compared with the nvme pci numa node. > > Have you

Re: [bug report] iommu_dma_unmap_sg() is very slow then running IO from remote numa node

2021-07-09 Thread Robin Murphy
On 2021-07-09 12:04, John Garry wrote: On 09/07/2021 11:26, Robin Murphy wrote: On 2021-07-09 09:38, Ming Lei wrote: Hello, I observed that NVMe performance is very bad when running fio on one CPU(aarch64) in remote numa node compared with the nvme pci numa node. Please see the test result[1]

Re: [bug report] iommu_dma_unmap_sg() is very slow then running IO from remote numa node

2021-07-09 Thread John Garry
On 09/07/2021 11:26, Robin Murphy wrote: On 2021-07-09 09:38, Ming Lei wrote: Hello, I observed that NVMe performance is very bad when running fio on one CPU(aarch64) in remote numa node compared with the nvme pci numa node. Please see the test result[1] 327K vs. 34.9K. Latency trace shows

Re: [bug report] iommu_dma_unmap_sg() is very slow then running IO from remote numa node

2021-07-09 Thread Robin Murphy
On 2021-07-09 09:38, Ming Lei wrote: Hello, I observed that NVMe performance is very bad when running fio on one CPU(aarch64) in remote numa node compared with the nvme pci numa node. Please see the test result[1] 327K vs. 34.9K. Latency trace shows that one big difference is in

Re: [bug report] iommu_dma_unmap_sg() is very slow then running IO from remote numa node

2021-07-09 Thread Russell King (Oracle)
On Fri, Jul 09, 2021 at 04:38:09PM +0800, Ming Lei wrote: > I observed that NVMe performance is very bad when running fio on one > CPU(aarch64) in remote numa node compared with the nvme pci numa node. Have you checked the effect of running a memory-heavy process using memory from node 1 while