Re: Error when running fio against nvme-of rdma target (mlx5 driver)

2022-05-17 Thread Mark Ruijter
Hi Robin,

I ran into the exact same problem while testing with 4 ConnectX-6 cards, kernel 5.18-rc6.

[ 4878.273016] nvme nvme0: Successfully reconnected (3 attempts)
[ 4879.122015] nvme nvme0: starting error recovery
[ 4879.122028] infiniband mlx5_4: mlx5_handle_error_cqe:332:(pid 0): WC error: 4, Message: local protection error
[ 4879.122035] infiniband mlx5_4: dump_cqe:272:(pid 0): dump error cqe
[ 4879.122037] 0000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 4879.122039] 0010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 4879.122040] 0020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 4879.122040] 0030: 00 00 00 00 a9 00 56 04 00 00 00 ed 0d da ff e2
[ 4881.085547] nvme nvme3: Reconnecting in 10 seconds...

I assume this means that the problem has still not been resolved?
If so, I'll try to diagnose the problem.

Thanks,

--Mark

On 11/02/2022, 12:35, "Linux-nvme on behalf of Robin Murphy" wrote:

On 2022-02-10 23:58, Martin Oliveira wrote:
> On 2/9/22 1:41 AM, Chaitanya Kulkarni wrote:
>> On 2/8/22 6:50 PM, Martin Oliveira wrote:
>>> Hello,
>>>
>>> We have been hitting an error when running IO over our nvme-of setup, 
using the mlx5 driver and we are wondering if anyone has seen anything 
similar/has any suggestions.
>>>
>>> Both initiator and target are AMD EPYC 7502 machines connected over 
RDMA using a Mellanox MT28908. Target has 12 NVMe SSDs which are exposed as a 
single NVMe fabrics device, one physical SSD per namespace.
>>>
>>
>> Thanks for reporting this, if you can bisect the problem on your setup
>> it will help others to help you better.
>>
>> -ck
> 
> Hi Chaitanya,
> 
> I went back to a kernel as old as 4.15 and the problem was still there, 
so I don't know of a good commit to start from.
> 
> I also learned that I can reproduce this with as little as 3 cards and I 
updated the firmware on the Mellanox cards to the latest version.
> 
> I'd be happy to try any tests if someone has any suggestions.

The IOMMU is probably your friend here - one thing that might be worth 
trying is capturing the iommu:map and iommu:unmap tracepoints to see if 
the address reported in subsequent IOMMU faults was previously mapped as 
a valid DMA address (be warned that there will likely be a *lot* of 
trace generated). With 5.13 or newer, booting with "iommu.forcedac=1" 
should also make it easier to tell real DMA IOVAs from rogue physical 
addresses or other nonsense, as real DMA addresses should then look more 
like 0xffff24d08000. 

That could at least help narrow down whether it's some kind of 
use-after-free race or a completely bogus address creeping in somehow.

Robin.



Re: Error when running fio against nvme-of rdma target (mlx5 driver)

2022-05-17 Thread Max Gurtovoy via iommu

Hi,

Can you please send the original scenario, setup details and dumps?

I can't find it in my mailbox.

You can send it directly to me to avoid spam.

-Max.

On 5/17/2022 11:26 AM, Mark Ruijter wrote:

Hi Robin,

I ran into the exact same problem while testing with 4 ConnectX-6 cards, kernel 5.18-rc6.

[ 4878.273016] nvme nvme0: Successfully reconnected (3 attempts)
[ 4879.122015] nvme nvme0: starting error recovery
[ 4879.122028] infiniband mlx5_4: mlx5_handle_error_cqe:332:(pid 0): WC error: 4, Message: local protection error
[ 4879.122035] infiniband mlx5_4: dump_cqe:272:(pid 0): dump error cqe
[ 4879.122037] 0000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 4879.122039] 0010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 4879.122040] 0020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 4879.122040] 0030: 00 00 00 00 a9 00 56 04 00 00 00 ed 0d da ff e2
[ 4881.085547] nvme nvme3: Reconnecting in 10 seconds...

I assume this means that the problem has still not been resolved?
If so, I'll try to diagnose the problem.

Thanks,

--Mark

On 11/02/2022, 12:35, "Linux-nvme on behalf of Robin Murphy" wrote:

 On 2022-02-10 23:58, Martin Oliveira wrote:
 > On 2/9/22 1:41 AM, Chaitanya Kulkarni wrote:
 >> On 2/8/22 6:50 PM, Martin Oliveira wrote:
 >>> Hello,
 >>>
 >>> We have been hitting an error when running IO over our nvme-of setup, 
using the mlx5 driver and we are wondering if anyone has seen anything similar/has any 
suggestions.
 >>>
 >>> Both initiator and target are AMD EPYC 7502 machines connected over 
RDMA using a Mellanox MT28908. Target has 12 NVMe SSDs which are exposed as a single 
NVMe fabrics device, one physical SSD per namespace.
 >>>
 >>
 >> Thanks for reporting this, if you can bisect the problem on your setup
 >> it will help others to help you better.
 >>
 >> -ck
 >
 > Hi Chaitanya,
 >
 > I went back to a kernel as old as 4.15 and the problem was still there, 
so I don't know of a good commit to start from.
 >
 > I also learned that I can reproduce this with as little as 3 cards and I 
updated the firmware on the Mellanox cards to the latest version.
 >
 > I'd be happy to try any tests if someone has any suggestions.

 The IOMMU is probably your friend here - one thing that might be worth
 trying is capturing the iommu:map and iommu:unmap tracepoints to see if
 the address reported in subsequent IOMMU faults was previously mapped as
 a valid DMA address (be warned that there will likely be a *lot* of
 trace generated). With 5.13 or newer, booting with "iommu.forcedac=1"
 should also make it easier to tell real DMA IOVAs from rogue physical
 addresses or other nonsense, as real DMA addresses should then look more
 like 0xffff24d08000.

 That could at least help narrow down whether it's some kind of
 use-after-free race or a completely bogus address creeping in somehow.

 Robin.




Re: Error when running fio against nvme-of rdma target (mlx5 driver)

2022-02-11 Thread Robin Murphy

On 2022-02-10 23:58, Martin Oliveira wrote:

On 2/9/22 1:41 AM, Chaitanya Kulkarni wrote:

On 2/8/22 6:50 PM, Martin Oliveira wrote:

Hello,

We have been hitting an error when running IO over our nvme-of setup, using the 
mlx5 driver and we are wondering if anyone has seen anything similar/has any 
suggestions.

Both initiator and target are AMD EPYC 7502 machines connected over RDMA using 
a Mellanox MT28908. Target has 12 NVMe SSDs which are exposed as a single NVMe 
fabrics device, one physical SSD per namespace.



Thanks for reporting this, if you can bisect the problem on your setup
it will help others to help you better.

-ck


Hi Chaitanya,

I went back to a kernel as old as 4.15 and the problem was still there, so I 
don't know of a good commit to start from.

I also learned that I can reproduce this with as little as 3 cards and I 
updated the firmware on the Mellanox cards to the latest version.

I'd be happy to try any tests if someone has any suggestions.


The IOMMU is probably your friend here - one thing that might be worth 
trying is capturing the iommu:map and iommu:unmap tracepoints to see if 
the address reported in subsequent IOMMU faults was previously mapped as 
a valid DMA address (be warned that there will likely be a *lot* of 
trace generated). With 5.13 or newer, booting with "iommu.forcedac=1" 
should also make it easier to tell real DMA IOVAs from rogue physical 
addresses or other nonsense, as real DMA addresses should then look more 
like 0xffff24d08000.
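
As a rough sketch of what that capture could look like (the paths below
are the stock tracefs layout, assuming tracefs is mounted at
/sys/kernel/tracing; a trace-cmd invocation works equally well):

    # enable the iommu:map and iommu:unmap tracepoints
    echo 1 > /sys/kernel/tracing/events/iommu/map/enable
    echo 1 > /sys/kernel/tracing/events/iommu/unmap/enable
    # enlarge the per-CPU ring buffer - these events are very high volume
    echo 65536 > /sys/kernel/tracing/buffer_size_kb
    # stream events to a file while the fio workload runs
    cat /sys/kernel/tracing/trace_pipe > /tmp/iommu-trace.log &
    # equivalent one-liner: trace-cmd record -e iommu:map -e iommu:unmap
    # and, on 5.13+, add "iommu.forcedac=1" to the kernel command line
    # so real IOVAs stand out as described above

Once a fault is logged, grepping the capture for the faulting address
(e.g. "grep 24d08000 /tmp/iommu-trace.log") shows whether that IOVA was
ever mapped as a DMA address and, if so, whether it had already been
unmapped by the time of the fault.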


That could at least help narrow down whether it's some kind of 
use-after-free race or a completely bogus address creeping in somehow.


Robin.


Re: Error when running fio against nvme-of rdma target (mlx5 driver)

2022-02-10 Thread Martin Oliveira
On 2/9/22 1:41 AM, Chaitanya Kulkarni wrote:
> On 2/8/22 6:50 PM, Martin Oliveira wrote:
> > Hello,
> >
> > We have been hitting an error when running IO over our nvme-of setup, using 
> > the mlx5 driver and we are wondering if anyone has seen anything 
> > similar/has any suggestions.
> >
> > Both initiator and target are AMD EPYC 7502 machines connected over RDMA 
> > using a Mellanox MT28908. Target has 12 NVMe SSDs which are exposed as a 
> > single NVMe fabrics device, one physical SSD per namespace.
> >
> 
> Thanks for reporting this, if you can bisect the problem on your setup
> it will help others to help you better.
> 
> -ck

Hi Chaitanya,

I went back to a kernel as old as 4.15 and the problem was still there, so I 
don't know of a good commit to start from.

I also learned that I can reproduce this with as little as 3 cards and I 
updated the firmware on the Mellanox cards to the latest version.

I'd be happy to try any tests if someone has any suggestions.

Thanks,
Martin


Re: Error when running fio against nvme-of rdma target (mlx5 driver)

2022-02-09 Thread Robin Murphy

On 2022-02-09 02:50, Martin Oliveira wrote:

Hello,

We have been hitting an error when running IO over our nvme-of setup, using the 
mlx5 driver and we are wondering if anyone has seen anything similar/has any 
suggestions.

Both initiator and target are AMD EPYC 7502 machines connected over RDMA using 
a Mellanox MT28908. Target has 12 NVMe SSDs which are exposed as a single NVMe 
fabrics device, one physical SSD per namespace.
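
(As a generic illustration of that kind of setup - not the actual
configuration used here, and with a placeholder address and subsystem
NQN - the initiator side of such a fabric is typically attached with
nvme-cli along these lines:

    modprobe nvme-rdma
    nvme discover -t rdma -a 192.168.1.10 -s 4420
    nvme connect -t rdma -a 192.168.1.10 -s 4420 -n nqn.2022-02.com.example:target
    # the 12 namespaces then appear as /dev/nvmeXn1 .. /dev/nvmeXn12

with fio then pointed directly at those block devices.)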

When running an fio job targeting directly the fabrics devices (no filesystem, 
see script at the end), within a minute or so we start seeing errors like this:

[  408.368677] mlx5_core 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x002f address=0x24d08000 flags=0x0000]
[  408.372201] infiniband mlx5_0: mlx5_handle_error_cqe:332:(pid 0): WC error: 4, Message: local protection error
[  408.380181] infiniband mlx5_0: dump_cqe:272:(pid 0): dump error cqe
[  408.380187] 0000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[  408.380189] 0010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[  408.380191] 0020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[  408.380192] 0030: 00 00 00 00 a9 00 56 04 00 00 01 e9 00 54 e8 e2
[  408.380230] nvme nvme15: RECV for CQE 0xce392ed9 failed with status local protection error (4)
[  408.380235] nvme nvme15: starting error recovery
[  408.380238] nvme_ns_head_submit_bio: 726 callbacks suppressed
[  408.380246] block nvme15n2: no usable path - requeuing I/O
[  408.380284] block nvme15n5: no usable path - requeuing I/O
[  408.380298] block nvme15n1: no usable path - requeuing I/O
[  408.380304] block nvme15n11: no usable path - requeuing I/O
[  408.380304] block nvme15n11: no usable path - requeuing I/O
[  408.380330] block nvme15n1: no usable path - requeuing I/O
[  408.380350] block nvme15n2: no usable path - requeuing I/O
[  408.380371] block nvme15n6: no usable path - requeuing I/O
[  408.380377] block nvme15n6: no usable path - requeuing I/O
[  408.380382] block nvme15n4: no usable path - requeuing I/O
[  408.380472] mlx5_core 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x002f address=0x24d09000 flags=0x0000]
[  408.391265] mlx5_core 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x002f address=0x24d0a000 flags=0x0000]
[  415.125967] nvmet: ctrl 1 keep-alive timer (5 seconds) expired!
[  415.131898] nvmet: ctrl 1 fatal error occurred!

Occasionally, we've seen the following stack trace:


FWIW this is indicative the scatterlist passed to dma_unmap_sg_attrs() 
was wrong - specifically it looks like an attempt to unmap a region 
that's already unmapped (or was never mapped in the first place). 
Whatever race or data corruption issue is causing that is almost 
certainly happening much earlier, since the IO_PAGE_FAULT logs further 
imply that either some pages have been spuriously unmapped while the 
device was still accessing them, or some DMA address in the scatterlist 
was already bogus by the time it was handed off to the device.


Robin.


[ 1158.152464] kernel BUG at drivers/iommu/amd/io_pgtable.c:485!
[ 1158.427696] invalid opcode: 0000 [#1] SMP NOPTI
[ 1158.432228] CPU: 51 PID: 796 Comm: kworker/51:1H Tainted: P   OE 5.13.0-eid-athena-g6fb4e704d11c-dirty #14
[ 1158.443867] Hardware name: GIGABYTE R272-Z32-00/MZ32-AR0-00, BIOS R21 10/08/2020
[ 1158.451252] Workqueue: ib-comp-wq ib_cq_poll_work [ib_core]
[ 1158.456884] RIP: 0010:iommu_v1_unmap_page+0xed/0x100
[ 1158.461849] Code: 48 8b 45 d0 65 48 33 04 25 28 00 00 00 75 1d 48 83 c4 10 4c 89 f0 5b 41 5c 41 5d 41 5e 41 5f 5d c3 49 8d 46 ff 4c 85 f0 74 d6 <0f> 0b e8 1c 38 46 00 66 66 2e 0f 1f 84 00 00 00 00 00 90 0f 1f 44
[ 1158.480589] RSP: 0018:abb520587bd0 EFLAGS: 00010206
[ 1158.485812] RAX: 000100061fff RBX: 0010 RCX: 0027
[ 1158.492938] RDX: 30562000 RSI:  RDI: 
[ 1158.500071] RBP: abb520587c08 R08: abb520587bd0 R09: 
[ 1158.507202] R10: 0001 R11: 000ff000 R12: 9984abd9e318
[ 1158.514326] R13: 9984abd9e310 R14: 000100062000 R15: 0001
[ 1158.521452] FS:  () GS:99a36c8c() knlGS:
[ 1158.529540] CS:  0010 DS:  ES:  CR0: 80050033
[ 1158.535286] CR2: 7f75b04f1000 CR3: 0001eddd8000 CR4: 00350ee0
[ 1158.542419] Call Trace:
[ 1158.544877]  amd_iommu_unmap+0x2c/0x40
[ 1158.548653]  __iommu_unmap+0xc4/0x170
[ 1158.552344]  iommu_unmap_fast+0xe/0x10
[ 1158.556100]  __iommu_dma_unmap+0x85/0x120
[ 1158.560115]  iommu_dma_unmap_sg+0x95/0x110
[ 1158.564213]  dma_unmap_sg_attrs+0x42/0x50
[ 1158.568225]  rdma_rw_ctx_destroy+0x6e/0xc0 [ib_core]
[ 1158.573201]  nvmet_rdma_rw_ctx_destroy+0xa7/0xc0 [nvmet_rdma]
[ 1158.578944]  nvmet_rdma_read_data_done+0x5c/0xf0 [nvmet_rdma]
[ 1158.584683]  __ib_process_cq+0x8e/0x150 [ib_core]
[ 1158.589398]  ib_cq_poll_work+0x2b/0x80 [ib_core]
[ 1158.594027]  

Re: Error when running fio against nvme-of rdma target (mlx5 driver)

2022-02-09 Thread Chaitanya Kulkarni via iommu
On 2/8/22 6:50 PM, Martin Oliveira wrote:
> Hello,
> 
> We have been hitting an error when running IO over our nvme-of setup, using 
> the mlx5 driver and we are wondering if anyone has seen anything similar/has 
> any suggestions.
> 
> Both initiator and target are AMD EPYC 7502 machines connected over RDMA 
> using a Mellanox MT28908. Target has 12 NVMe SSDs which are exposed as a 
> single NVMe fabrics device, one physical SSD per namespace.
> 

Thanks for reporting this, if you can bisect the problem on your setup
it will help others to help you better.
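
(For reference, a bisect over the mainline tree would look roughly like
the sketch below; the good/bad tags are placeholders, since a known-good
kernel first has to be identified:

    git bisect start
    git bisect bad v5.13     # placeholder: a kernel where the fault reproduces
    git bisect good v5.4     # placeholder: a kernel believed to be unaffected
    # build and boot the suggested revision, run the fio job, then record:
    git bisect good          # or: git bisect bad
    # repeat until git names the first bad commit, then clean up:
    git bisect reset

As the follow-up in this thread notes, the problem reproduces at least
as far back as 4.15, so finding a usable "good" point is the hard part.)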

-ck