Performing verification for Noble.
I started an n2-standard-2 instance on Google Cloud, running Noble.
I installed 6.8.0-39-generic from -updates, rebooted, and followed the
instructions in the testcase.
$ sudo ./check md/001
md/001 (Raid with bitmap on tcp nvmet with opt-io-size over bitmap size)
Having a look at dmesg:
unknown: run blktests md/001 at 2024-08-08 04:26:39
root[1982]: run blktests md/001
kernel: brd: module loaded
(udev-worker)[1987]: dm-0: Process '/usr/bin/unshare -m /usr/bin/snap auto-import --mount=/dev/dm-0' failed with exit code 1.
kernel: Key type psk registered
kernel: nvmet: adding nsid 1 to subsystem blktests-subsystem-1
kernel: nvmet_tcp: enabling port 0 (127.0.0.1:4420)
kernel: nvmet: creating nvm controller 1 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349.
kernel: nvme nvme1: creating 2 I/O queues.
kernel: nvme nvme1: mapped 2/0/0 default/read/poll queues.
kernel: nvme nvme1: new ctrl: NQN "blktests-subsystem-1", addr 127.0.0.1:4420, hostnqn: nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349
(udev-worker)[2018]: nvme1n1: Process '/usr/bin/unshare -m /usr/bin/snap auto-import --mount=/dev/nvme1n1' failed with exit code 1.
(udev-worker)[2018]: md127: Process '/usr/bin/unshare -m /usr/bin/snap auto-import --mount=/dev/md127' failed with exit code 1.
kernel: md/raid1:md127: active with 1 out of 2 mirrors
kernel: ------------[ cut here ]------------
kernel: WARNING: CPU: 0 PID: 50 at net/core/skbuff.c:6995 skb_splice_from_iter+0x139/0x370
kernel: Modules linked in: nvme_tcp nvmet_tcp nvmet nvme_keyring brd raid1 cfg80211 8021q garp mrp stp llc binfmt_misc nls_iso8859_1 intel_rapl_msr intel_rapl_common intel_uncore_frequency_common isst_if_common nfit crct10dif_pclmul crc32_pclmul polyval_clmulni polyval_generic ghash_clmulni_intel sha256_ssse3 sha1_ssse3 aesni_intel crypto_simd cryptd rapl pvpanic_mmio pvpanic nvme psmouse i2c_piix4 input_leds mac_hid serio_raw dm_multipath nvme_fabrics nvme_core nvme_auth efi_pstore nfnetlink dmi_sysfs virtio_rng ip_tables x_tables autofs4
kernel: CPU: 0 PID: 50 Comm: kworker/0:1H Not tainted 6.8.0-39-generic #39-Ubuntu
kernel: Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 06/27/2024
kernel: Workqueue: nvme_tcp_wq nvme_tcp_io_work [nvme_tcp]
kernel: RIP: 0010:skb_splice_from_iter+0x139/0x370
kernel: Code: 39 e1 48 8b 53 08 49 0f 47 cc 49 89 cd f6 c2 01 0f 85 c0 01 00 00 66 90 48 89 da 48 8b 12 80 e6 08 0f 84 8e 00 00 00 4d 89 fe <0f> 0b 49 c7 c0 fb ff ff ff 48 8b 85 68 ff ff ff 41 01 46 70 41 01
kernel: RSP: 0018:ffffbd92001b3a30 EFLAGS: 00010246
kernel: RAX: 0000000000000000 RBX: fffff5f1c48d9b40 RCX: 0000000000001000
kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
kernel: RBP: ffffbd92001b3ad8 R08: 0000000000000000 R09: 0000000000000000
kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 00000000000020e8
kernel: R13: 0000000000001000 R14: ffff96834b496400 R15: ffff96834b496400
kernel: FS: 0000000000000000(0000) GS:ffff968477c00000(0000) knlGS:0000000000000000
kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: 00007507bcfe5f84 CR3: 000000010b49c002 CR4: 00000000003706f0
kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
kernel: Call Trace:
kernel: <TASK>
kernel: ? show_regs+0x6d/0x80
kernel: ? __warn+0x89/0x160
kernel: ? skb_splice_from_iter+0x139/0x370
kernel: ? report_bug+0x17e/0x1b0
kernel: ? handle_bug+0x51/0xa0
kernel: ? exc_invalid_op+0x18/0x80
kernel: ? asm_exc_invalid_op+0x1b/0x20
kernel: ? skb_splice_from_iter+0x139/0x370
kernel: tcp_sendmsg_locked+0x352/0xd70
kernel: ? tcp_push+0x159/0x190
kernel: ? tcp_sendmsg_locked+0x9c4/0xd70
kernel: tcp_sendmsg+0x2c/0x50
kernel: inet_sendmsg+0x42/0x80
kernel: sock_sendmsg+0x118/0x150
kernel: nvme_tcp_try_send_data+0x18b/0x4c0 [nvme_tcp]
kernel: nvme_tcp_try_send+0x23c/0x300 [nvme_tcp]
kernel: nvme_tcp_io_work+0x40/0xe0 [nvme_tcp]
kernel: process_one_work+0x16c/0x350
kernel: worker_thread+0x306/0x440
kernel: ? _raw_spin_unlock_irqrestore+0x11/0x60
kernel: ? __pfx_worker_thread+0x10/0x10
kernel: kthread+0xef/0x120
kernel: ? __pfx_kthread+0x10/0x10
kernel: ret_from_fork+0x44/0x70
kernel: ? __pfx_kthread+0x10/0x10
kernel: ret_from_fork_asm+0x1b/0x30
kernel: </TASK>
kernel: ---[ end trace 0000000000000000 ]---
kernel: nvme nvme1: failed to send request -5
kernel: nvme nvme1: I/O tag 111 (106f) type 4 opcode 0x0 (I/O Cmd) QID 1 timeout
kernel: nvme nvme1: starting error recovery
kernel: block nvme1n1: no usable path - requeuing I/O
kernel: nvme nvme1: Reconnecting in 10 seconds...
In this scenario, blktests md/001 hangs the system.
I then restarted the instance, enabled -proposed2, and installed
6.8.0-41-generic:
6.8.0-41-generic #41-Ubuntu SMP PREEMPT_DYNAMIC Fri Aug 2 20:41:06 UTC 2024
I can now run md/001 multiple times, and it passes within a second each time.
The hang no longer occurs.
$ sudo ./check md/001
md/001 (Raid with bitmap on tcp nvmet with opt-io-size over bitmap size)
[passed]
runtime ... 0.441s
$ sudo ./check md/001
md/001 (Raid with bitmap on tcp nvmet with opt-io-size over bitmap size)
[passed]
runtime 0.441s ... 0.405s
$ sudo ./check md/001
md/001 (Raid with bitmap on tcp nvmet with opt-io-size over bitmap size)
[passed]
runtime 0.405s ... 0.410s
$ sudo ./check md/001
md/001 (Raid with bitmap on tcp nvmet with opt-io-size over bitmap size)
[passed]
runtime 0.410s ... 0.429s
$ sudo ./check md/001
md/001 (Raid with bitmap on tcp nvmet with opt-io-size over bitmap size)
[passed]
runtime 0.429s ... 0.408s
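The repeated runs above were done by hand. A minimal sketch of how they
could be scripted follows. The run_repeated helper is hypothetical and not
part of blktests, and the sketch assumes that ./check exits non-zero when a
test fails. On the affected kernel the test hung rather than failing, so
this loop would not return there either.

```shell
#!/bin/sh
# Hypothetical helper (not part of blktests): run a command a fixed
# number of times and stop at the first non-zero exit status.
run_repeated() {
    count=$1
    shift
    i=1
    while [ "$i" -le "$count" ]; do
        echo "run $i of $count: $*"
        # Abort the loop as soon as one run fails.
        "$@" || return 1
        i=$((i + 1))
    done
}

# Example, as root from the blktests checkout (assumption: ./check
# returns non-zero when md/001 fails):
#   run_repeated 5 ./check md/001
```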
dmesg has:
unknown: run blktests md/001 at 2024-08-08 05:02:40
root[2377]: run blktests md/001
kernel: brd: module loaded
kernel: nvmet: adding nsid 1 to subsystem blktests-subsystem-1
kernel: nvmet_tcp: enabling port 0 (127.0.0.1:4420)
kernel: nvmet: creating nvm controller 1 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349.
kernel: nvme nvme1: creating 2 I/O queues.
kernel: nvme nvme1: mapped 2/0/0 default/read/poll queues.
kernel: nvme nvme1: new ctrl: NQN "blktests-subsystem-1", addr 127.0.0.1:4420, hostnqn: nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349
kernel: md/raid1:md127: active with 1 out of 2 mirrors
kernel: md127: detected capacity change from 0 to 2093056
kernel: md127: detected capacity change from 2093056 to 0
kernel: md: md127 stopped.
kernel: nvme nvme1: Removing ctrl: NQN "blktests-subsystem-1"
multipathd[190]: nvme1n1: path already removed
kernel: brd: module unloaded
sudo[2342]: pam_unix(sudo:session): session closed for user root
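The difference between the two dmesg excerpts is the skb_splice_from_iter
warning, which is present on 6.8.0-39-generic and absent on
6.8.0-41-generic. As a rough sketch, the kernel log could be checked for
that signature as below. The has_splice_warning helper is hypothetical, and
it matches only the RIP line of the trace shown earlier, so it is a
heuristic rather than a complete check.

```shell
#!/bin/sh
# Hypothetical helper: read kernel log text on stdin and succeed if it
# contains the RIP line of the skb_splice_from_iter warning.
has_splice_warning() {
    grep -q 'RIP: .*skb_splice_from_iter'
}

# Example (assumption: dmesg is readable via sudo):
#   if sudo dmesg | has_splice_warning; then
#       echo "warning present"
#   else
#       echo "no warning found"
#   fi
```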
The 6.8.0-41-generic kernel in -proposed2 fixes the issue. Happy to mark
this as verified for Noble.
** Tags removed: verification-needed-noble-linux
** Tags added: verification-done-noble-linux
https://bugs.launchpad.net/bugs/2075110
Title:
md: nvme over tcp with a striped underlying md raid device leads to
data corruption