Re: nvme: IO_PAGE_FAULT logged with Intel SSDPEKKF512G8

2022-01-31 Thread Jörg Rödel
On Tue, Jan 18, 2022 at 06:01:06PM +0100, Paul Menzel wrote:
> > >  $ dmesg --level=err
> > >  [4.194306] nvme :01:00.0: AMD-Vi: Event logged 
> > > [IO_PAGE_FAULT domain=0x000c address=0xc080 flags=0x0050]
> > >  [4.206970] nvme :01:00.0: AMD-Vi: Event logged 
> > > [IO_PAGE_FAULT domain=0x000c address=0xc000 flags=0x0050]

This was caused by a DMA read to a write-only page. Looks like a bug in
the driver or the device's firmware.
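
For readers who want to see how that conclusion follows from flags=0x0050, here is a minimal Python sketch that parses the logged event and names the set bits. The bit layout in FLAG_BITS is an assumption based on a reading of the IO_PAGE_FAULT event entry in the AMD IOMMU specification and should be checked against the spec; the function names are made up for this illustration and are not kernel APIs.

```python
import re

# ASSUMPTION: bit positions of the 12-bit "flags" value printed by the
# kernel, as read from the AMD IOMMU specification's IO_PAGE_FAULT entry.
# Verify against the specification before relying on this table.
FLAG_BITS = {
    0: "GN",  # guest/nested address
    1: "NX",  # no-execute
    2: "US",  # user/supervisor
    3: "I",   # interrupt request
    4: "PR",  # page present
    5: "RW",  # 1 = write access, 0 = read access
    6: "PE",  # permission error
    7: "RZ",  # reserved bit set
    8: "TR",  # translation request
}

EVENT_RE = re.compile(
    r"IO_PAGE_FAULT\s+domain=(0x[0-9a-fA-F]+)\s+"
    r"address=(0x[0-9a-fA-F]+)\s+flags=(0x[0-9a-fA-F]+)"
)


def parse_io_page_fault(line):
    """Extract domain, address and flags from one AMD-Vi log line."""
    match = EVENT_RE.search(line)
    if match is None:
        return None
    domain, address, flags = (int(group, 16) for group in match.groups())
    return {"domain": domain, "address": address, "flags": flags}


def decode_flags(flags):
    """Return the names of all set flag bits, lowest bit first."""
    return [name for bit, name in sorted(FLAG_BITS.items()) if flags & (1 << bit)]


def describe(flags):
    """Summarise the access type and fault reason in plain words."""
    names = decode_flags(flags)
    access = "write" if "RW" in names else "read"
    present = "present" if "PR" in names else "not present"
    reason = "permission error" if "PE" in names else "no permission error"
    return f"{access} access, page {present}, {reason}"


if __name__ == "__main__":
    line = ("[4.194306] nvme :01:00.0: AMD-Vi: Event logged "
            "[IO_PAGE_FAULT domain=0x000c address=0xc080 flags=0x0050]")
    event = parse_io_page_fault(line)
    print(decode_flags(event["flags"]), describe(event["flags"]))
```

Under that assumed layout, 0x0050 sets only PR and PE and leaves RW clear: a read of a page that is mapped but not readable by the device.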

Regards,

Joerg
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: nvme: IO_PAGE_FAULT logged with Intel SSDPEKKF512G8

2022-01-18 Thread Paul Menzel

Dear Keith,


Thank you for your quick response.


On 18.01.22 at 17:53, Keith Busch wrote:
> On Tue, Jan 18, 2022 at 03:32:45PM +0100, Paul Menzel wrote:
>> On a Dell OptiPlex 5055 with an Intel SSDPEKKF512G8, Linux 5.10.82 reported
>> an IO_PAGE_FAULT error. This is the first and only time this has happened.
>>
>>  $ dmesg --level=err
>>  [4.194306] nvme :01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000c address=0xc080 flags=0x0050]
>>  [4.206970] nvme :01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000c address=0xc000 flags=0x0050]
>>  [7.327820] kfd kfd: VERDE  not supported in kfd
>>  $ lspci -nn -s 01:00.0
>>  01:00.0 Non-Volatile memory controller [0108]: Intel Corporation SSD Pro 7600p/760p/E 6100p Series [8086:f1a6] (rev 03)
>
> I think it's a bug with the iommu implementation.

That would surprise me, but I am adding Jörg and Suravee to the
recipient list. The last time I saw an IO_PAGE_FAULT, it was a bug in the
amdgpu driver.

> If it causes problems, you can typically work around it with kernel
> parameter "iommu=soft".


I have not noticed any problems yet.


Kind regards,

Paul


PS: I am not sure whether it is useful, but here is the content of `/proc/iomem`:

$ sudo more /proc/iomem
-0fff : Reserved
1000-00087fff : System RAM
00088000-00088fff : Reserved
00089000-0009efff : System RAM
0009f000-000b : Reserved
  000a-000b : PCI Bus :00
000c-000c3fff : PCI Bus :00
000c4000-000c7fff : PCI Bus :00
000c8000-000cbfff : PCI Bus :00
000cc000-000c : PCI Bus :00
000d-000d3fff : PCI Bus :00
000d4000-000d7fff : PCI Bus :00
000d8000-000dbfff : PCI Bus :00
000dc000-000d : PCI Bus :00
000e-000e3fff : PCI Bus :00
000e4000-000e7fff : PCI Bus :00
000e8000-000ebfff : PCI Bus :00
000ec000-000e : PCI Bus :00
000f-000f : System ROM
0010-09cf : System RAM
  0500-05e03316 : Kernel code
  0600-063a8fff : Kernel rodata
  0640-06762eff : Kernel data
  06d31000-06ff : Kernel bss
09d0-09e6 : Reserved
09e7-7afb5fff : System RAM
7afb6000-7afb6fff : Reserved
7afb7000-7afbbfff : System RAM
7afbc000-7afbcfff : Reserved
7afbd000-dadbefff : System RAM
dadbf000-dafbefff : Unknown E820 type
dafbf000-dcfbefff : Reserved
dcfbf000-defbefff : ACPI Non-volatile Storage
defbf000-deffefff : ACPI Tables
defff000-deff : System RAM
df00-dfff : Reserved
e000-f7ff : PCI Bus :00
  e000-efff : PCI Bus :06
e000-efff : :06:00.0
  f000-f00f : PCI Bus :02
f000-f00f : PCI Bus :03
  f000-f00f : PCI Bus :04
f000-f000 : :04:00.0
  f000-f000 : tg3
f001-f001 : :04:00.0
  f001-f001 : tg3
f002-f002 : :04:00.0
  f002-f002 : tg3
  f010-f01f : PCI Bus :08
f010-f0107fff : :08:00.3
  f010-f0107fff : ICH HD audio
f0108000-f0108fff : :08:00.2
  f0108000-f0108fff : ahci
  f020-f04f : PCI Bus :07
f020-f02f : :07:00.3
  f020-f02f : xhci-hcd
f030-f03f : :07:00.2
f040-f0401fff : :07:00.2
  f050-f05f : PCI Bus :06
f050-f053 : :06:00.0
f054-f0543fff : :06:00.1
  f054-f0543fff : ICH HD audio
f056-f057 : :06:00.0
  f060-f06f : PCI Bus :02
f060-f061 : :02:00.1
  f060-f061 : ahci
f062-f0627fff : :02:00.0
  f062-f0627fff : xhci-hcd
f068-f06f : :02:00.1
  f070-f07f : PCI Bus :01
f070-f0703fff : :01:00.0
  f070-f0703fff : nvme
f800-fbff : PCI MMCONFIG  [bus 00-3f]
  f800-fbff : Reserved
fc00-feaf : PCI Bus :00
  fc00-fc07 : amd_iommu
  fdf0-fdff : pnp 00:00
fec0-fec00fff : Reserved
  fec0-fec003ff : IOAPIC 0
fec01000-fec013ff : IOAPIC 1
fec1-fec10fff : Reserved
fec3-fec30fff : AMDIF030:00
fed0-fed003ff : HPET 0
  fed0-fed003ff : PNP0103:00
fed4-fed44fff : MSFT0101:00
fed8-fed80fff : Reserved
fed81500-fed818ff : AMDI0030:00
fee0-fee00fff : Local APIC
  fee0-fee00fff : pnp 00:00
ff00- : Reserved
  ff00- : pnp 00:03
1-81eff : System RAM
81f00-81fff : RAM buffer

Re: nvme: IO_PAGE_FAULT logged with Intel SSDPEKKF512G8

2022-01-18 Thread Keith Busch
On Tue, Jan 18, 2022 at 03:32:45PM +0100, Paul Menzel wrote:
> On a Dell OptiPlex 5055 with an Intel SSDPEKKF512G8, Linux 5.10.82 reported
> an IO_PAGE_FAULT error. This is the first and only time this has happened.
> 
> $ dmesg --level=err
> [4.194306] nvme :01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT 
> domain=0x000c address=0xc080 flags=0x0050]
> [4.206970] nvme :01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT 
> domain=0x000c address=0xc000 flags=0x0050]
> [7.327820] kfd kfd: VERDE  not supported in kfd
> $ lspci -nn -s 01:00.0
> 01:00.0 Non-Volatile memory controller [0108]: Intel Corporation SSD Pro 
> 7600p/760p/E 6100p Series [8086:f1a6] (rev 03)

I think it's a bug with the iommu implementation. If it causes problems,
you can typically work around it with kernel parameter "iommu=soft".
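
To check whether a machine is already booted with such a workaround, the kernel command line can be inspected. The following is a small Python sketch; the helper names are hypothetical, and it only looks for the generic iommu= option (not amd_iommu= or other vendor-specific parameters). How to add the parameter permanently depends on the boot loader and is not covered here.

```python
import shlex


def iommu_options(cmdline):
    """Return every value passed via iommu= on a kernel command line."""
    values = []
    for token in shlex.split(cmdline):
        key, sep, value = token.partition("=")
        # Match the exact key so that e.g. amd_iommu= is not picked up.
        if key == "iommu" and sep:
            values.extend(value.split(","))
    return values


def read_running_cmdline(path="/proc/cmdline"):
    """Read the command line of the currently running kernel (Linux only)."""
    with open(path, encoding="utf-8") as handle:
        return handle.read().strip()


if __name__ == "__main__":
    options = iommu_options(read_running_cmdline())
    if "soft" in options:
        print("iommu=soft is set")
    else:
        print("iommu=soft is not set; iommu options found:", options)
```

Applied to the command line from the dmesg output in this thread, the helper returns an empty list, which matches the report that no IOMMU-related parameter was in use.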


nvme: IO_PAGE_FAULT logged with Intel SSDPEKKF512G8

2022-01-18 Thread Paul Menzel

Dear Linux folks,


On a Dell OptiPlex 5055 with an Intel SSDPEKKF512G8, Linux 5.10.82 
reported an IO_PAGE_FAULT error. This is the first and only time this 
has happened.


$ dmesg --level=err
[4.194306] nvme :01:00.0: AMD-Vi: Event logged 
[IO_PAGE_FAULT domain=0x000c address=0xc080 flags=0x0050]
[4.206970] nvme :01:00.0: AMD-Vi: Event logged 
[IO_PAGE_FAULT domain=0x000c address=0xc000 flags=0x0050]

[7.327820] kfd kfd: VERDE  not supported in kfd
$ lspci -nn -s 01:00.0
01:00.0 Non-Volatile memory controller [0108]: Intel Corporation 
SSD Pro 7600p/760p/E 6100p Series [8086:f1a6] (rev 03)

$ sudo ./nvme list
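Node     SN                Model                           Namespace  Usage                  Format       FW Rev
-------  ----------------  ------------------------------  ---------  ---------------------  -----------  ------
nvme0n1  BTHH82250YQK512D  SSDPEKKF512G8 NVMe INTEL 512GB  1          512.11 GB / 512.11 GB  512 B + 0 B  D03N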


Please find the output of `dmesg` attached.


Kind regards,

Paul


PS: Some more info:

$ lspci -tvn
-[:00]-+-00.0  1022:1450
   +-00.2  1022:1451
   +-01.0  1022:1452
   +-01.1-[01]00.0  8086:f1a6
   +-01.3-[02-05]--+-00.0  1022:43bb
   |   +-00.1  1022:43b7
   |   \-00.2-[03-05]--+-00.0-[04]00.0  14e4:1687
   |   \-01.0-[05]--
   +-02.0  1022:1452
   +-03.0  1022:1452
   +-03.1-[06]--+-00.0  1002:682b
   |\-00.1  1002:aab0
   +-04.0  1022:1452
   +-07.0  1022:1452
   +-07.1-[07]--+-00.0  1022:145a
   |+-00.2  1022:1456
   |\-00.3  1022:145c
   +-08.0  1022:1452
   +-08.1-[08]--+-00.0  1022:1455
   |+-00.2  1022:7901
   |\-00.3  1022:1457
   +-14.0  1022:790b
   +-14.3  1022:790e
   +-18.0  1022:1460
   +-18.1  1022:1461
   +-18.2  1022:1462
   +-18.3  1022:1463
   +-18.4  1022:1464
   +-18.5  1022:1465
   +-18.6  1022:1466
   \-18.7  1022:1467

[0.00] Linux version 5.10.82.mx64.414 (r...@invidia.molgen.mpg.de) (gcc 
(GCC) 7.5.0, GNU ld (GNU Binutils) 2.37) #1 SMP Mon Nov 29 14:15:19 CET 2021
[0.00] Command line: BOOT_IMAGE=/boot/bzImage.x86_64 root=LABEL=root ro 
crashkernel=64G-:256M console=ttyS0,115200n8 console=tty0 init=/bin/systemd 
audit=0 random.trust_cpu=on
[0.00] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point 
registers'
[0.00] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
[0.00] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
[0.00] x86/fpu: xstate_offset[2]:  576, xstate_sizes[2]:  256
[0.00] x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, 
using 'compacted' format.
[0.00] BIOS-provided physical RAM map:
[0.00] BIOS-e820: [mem 0x-0x00087fff] usable
[0.00] BIOS-e820: [mem 0x00088000-0x00088fff] reserved
[0.00] BIOS-e820: [mem 0x00089000-0x0009efff] usable
[0.00] BIOS-e820: [mem 0x0009f000-0x000b] reserved
[0.00] BIOS-e820: [mem 0x0010-0x09cf] usable
[0.00] BIOS-e820: [mem 0x09d0-0x09e6] reserved
[0.00] BIOS-e820: [mem 0x09e7-0x7afb5fff] usable
[0.00] BIOS-e820: [mem 0x7afb6000-0x7afb6fff] reserved
[0.00] BIOS-e820: [mem 0x7afb7000-0x7afbbfff] usable
[0.00] BIOS-e820: [mem 0x7afbc000-0x7afbcfff] reserved
[0.00] BIOS-e820: [mem 0x7afbd000-0xdadbefff] usable
[0.00] BIOS-e820: [mem 0xdadbf000-0xdafbefff] type 20
[0.00] BIOS-e820: [mem 0xdafbf000-0xdcfbefff] reserved
[0.00] BIOS-e820: [mem 0xdcfbf000-0xdefbefff] ACPI NVS
[0.00] BIOS-e820: [mem 0xdefbf000-0xdeffefff] ACPI data
[0.00] BIOS-e820: [mem 0xdefff000-0xdeff] usable
[0.00] BIOS-e820: [mem 0xdf00-0xdfff] reserved
[0.00] BIOS-e820: [mem 0xf800-0xfbff] reserved
[0.00] BIOS-e820: [mem 0xfec0-0xfec00fff] reserved
[0.00] BIOS-e820: [mem 0xfec1-0xfec10fff] reserved
[0.00] BIOS-e820: [mem 0xfed8-0xfed80fff] reserved
[0.00] BIOS-e820: [mem 0xff00-0x] reserved
[0.00]