On Tue, Aug 10, 2021 at 10:36:47AM +0100, John Garry wrote:
> On 28/07/2021 16:17, Ming Lei wrote:
> [...]
>
> Have you tried turning off the IOMMU to ensure that this is really just
> an IOMMU problem?
>
> You can try setting CONFIG_ARM_SMMU_V3=n in the defconfig or passing
> the cmdline param iommu.passthrough=1 to bypass the SMMU (equivalent to
> disabling it for kernel DMA).
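
For reference, the passthrough route can be tried without rebuilding the
kernel. A minimal sketch, assuming a GRUB-based distro; the sysfs check
needs a reasonably recent kernel:

  # append to the kernel command line (e.g. GRUB_CMDLINE_LINUX in
  # /etc/default/grub), regenerate the grub config, and reboot
  iommu.passthrough=1

  # after reboot, each IOMMU group should report the identity (bypass) domain
  cat /sys/kernel/iommu_groups/*/type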
On 2021-07-23 11:21, Ming Lei wrote:
> On Thu, Jul 22, 2021 at 06:40:18PM +0100, Robin Murphy wrote:
> > Thanks, I was also going to suggest the latter, since it's what
> > arm_smmu_cmdq_issue_cmdlist() does with IRQs masked that should be most
> > indicative of where the slowness most likely stems from.
>
> The improvement from 'iommu.strict=0' is very small:
> [...]
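
For reference, the perf half of that suggestion could be run roughly like
this (a sketch: cpu 4 and the fio job parameters are only examples, and
kernel symbols must be visible to perf):

  # profile cpu 4 (kernel included) while fio hammers the device from it
  perf record -C 4 -g -- sleep 30 &
  taskset -c 4 fio --name=test --filename=/dev/nvme0n1 --direct=1 \
      --rw=randread --bs=4k --iodepth=64 --ioengine=libaio \
      --runtime=30 --time_based
  wait

  perf report                                  # locate the hot symbols
  perf annotate arm_smmu_cmdq_issue_cmdlist    # inspect the SMMU hotspot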
On Thu, Jul 22, 2021 at 06:40:18PM +0100, Robin Murphy wrote:
> On 2021-07-22 16:54, Ming Lei wrote:
> [...]
> > > If you are still keen to investigate more, then you can try either
> > > of these:
> > >
> > > - add iommu.strict=0 to the cmdline
> > >
> > > - use perf record+annotate to find the hotspot
> > >
> > > - For this you need to enable pseudo-NMI with two steps:
> > >   CONFIG_ARM64_PSEUDO_NMI=y in the kernel config, and
> > >   irqchip.gicv3_pseudo_nmi=1 on the cmdline
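
A sketch of those two pseudo-NMI steps; the grub detail is assumed, and
scripts/config ships with the kernel source:

  # 1) enable pseudo-NMI support in the kernel config and rebuild
  ./scripts/config --enable ARM64_PSEUDO_NMI
  make olddefconfig && make -j"$(nproc)"

  # 2) turn it on at boot by appending to the kernel command line
  irqchip.gicv3_pseudo_nmi=1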
On 22/07/2021 13:53, Marc Zyngier wrote:
> Hi John,
> [...]
> Your kernel log should show:
>
>   [    0.00] GICv3: Pseudo-NMIs enabled using forced ICC_PMR_EL1 synchronisation
>
> Unrelated, but you seem to be running with ICC_CTLR_EL3.PMHE set,
> which makes the overhead of pseudo-NMIs much higher [...]
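
Whether pseudo-NMIs actually came up can be checked from the boot log,
e.g.:

  dmesg | grep -i 'pseudo-nmi'
  # expected: GICv3: Pseudo-NMIs enabled using forced ICC_PMR_EL1 synchronisation

Note that ICC_CTLR_EL3.PMHE is an EL3 register bit, so clearing it is a
firmware change rather than a kernel one.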
On 22/07/2021 11:19, Ming Lei wrote:
> > If you check below, you can see that cpu4 services an NVMe irq. From
> > checking htop, during the test that cpu is at 100% load, which I put
> > the performance drop (vs cpu0) down to.
>
> nvme.poll_queues is 2 in my test, and no irq is involved. But in irq
> mode, fio [...]
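
For reference, a polled run along those lines might look like this (a
sketch; the device name and job sizes are placeholders):

  # give the nvme driver two poll queues (or boot with nvme.poll_queues=2)
  modprobe -r nvme && modprobe nvme poll_queues=2

  # io_uring with hipri=1 uses polled completions, so no irq is involved
  taskset -c 0 fio --name=poll --filename=/dev/nvme0n1 --direct=1 \
      --rw=randread --bs=4k --iodepth=64 --ioengine=io_uring --hipri=1 \
      --runtime=30 --time_based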
On 21/07/2021 10:59, Ming Lei wrote:
> > I have now removed that from the tree, so please re-pull.
>
> Now the kernel can be built successfully, but I don't see an obvious
> improvement on the reported issue:
>
>   [root@ampere-mtjade-04 ~]# uname -a
>   Linux ampere-mtjade-04.khw4.lab.eng.bos.redhat.com [...]
On 21/07/2021 02:40, Ming Lei wrote:
> > I think that you should see a significant performance boost.
>
> There is a build issue, please check your tree:
>
>   MODPOST vmlinux.symvers
>   MODINFO modules.builtin.modinfo
>   GEN     modules.builtin
>   LD      .tmp_vmlinux.btf
>   ld: Unexpected GOT/PLT entries [...]
On Mon, Jul 19, 2021 at 05:14:28PM +0100, John Garry wrote:
> On 09/07/2021 15:24, Ming Lei wrote:
> > [...]
> > Follows the log of 'perf report':
> >
> > 1) good (run fio from cpus in the nvme's numa node)
> > [...]
>
> Hi Ming,
>
> If you're still interested in this issue, as an experiment only you can
> try my rebased patches here: [...]
On 2021-07-09 09:38, Ming Lei wrote:
> Hello,
>
> I observed that NVMe performance is very bad when running fio on one
> CPU (aarch64) in a remote numa node compared with the nvme pci numa
> node. Please see the test result[1]: 327K vs. 34.9K.
>
> Latency trace shows that one big difference is in [...]
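
The local-vs-remote comparison itself can be reproduced roughly as below
(a sketch; the CPU numbers are placeholders, assuming the NVMe device
sits on node 0 and cpu 80 on node 1):

  # local: fio pinned to a cpu in the nvme's numa node
  taskset -c 0 fio --name=local --filename=/dev/nvme0n1 --direct=1 \
      --rw=randread --bs=4k --iodepth=64 --ioengine=libaio \
      --runtime=30 --time_based

  # remote: the same job pinned to a cpu in the other node
  taskset -c 80 fio --name=remote --filename=/dev/nvme0n1 --direct=1 \
      --rw=randread --bs=4k --iodepth=64 --ioengine=libaio \
      --runtime=30 --time_based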
On Fri, Jul 09, 2021 at 04:38:09PM +0800, Ming Lei wrote:
> I observed that NVMe performance is very bad when running fio on one
> CPU (aarch64) in a remote numa node compared with the nvme pci numa node.

Have you checked the effect of running a memory-heavy process using
memory from node 1 while [...]
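
One way to run that experiment (a sketch, assuming numactl and stress-ng
are available; the size and duration are placeholders):

  # keep the stressor's cpus on node 0 but force its memory onto node 1,
  # then rerun the fio job while it is running
  numactl --cpunodebind=0 --membind=1 stress-ng --vm 1 --vm-bytes 4G --timeout 60s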