
[Kernel-packages] [Bug 1896350] Re: nbd: requests can become stuck when disconnecting from server with qemu-nbd

2022-08-24 Thread Launchpad Bug Tracker

This bug was fixed in the package linux - 5.4.0-125.141

---
linux (5.4.0-125.141) focal; urgency=medium

  * focal/linux: 5.4.0-125.141 -proposed tracker (LP: #1983947)

  * nbd: requests can become stuck when disconnecting from server with qemu-nbd
    (LP: #1896350)
    - blk-mq: blk-mq: provide forced completion method
    - blk-mq: move failure injection out of blk_mq_complete_request
    - nbd: don't handle response without a corresponding request message
    - nbd: make sure request completion won't concurrent
    - nbd: don't clear 'NBD_CMD_INFLIGHT' flag if request is not completed
    - nbd: fix io hung while disconnecting device

  * CVE-2021-33656
    - vt: drop old FONT ioctls

  * CVE-2021-33061
    - ixgbe: add the ability for the PF to disable VF link state
    - ixgbe: add improvement for MDD response functionality
    - ixgbevf: add disable link state

 -- Stefan Bader  Wed, 10 Aug 2022 10:17:28 +0200

** Changed in: linux (Ubuntu Focal)
   Status: Fix Committed => Fix Released

** CVE added: https://cve.mitre.org/cgi-bin/cvename.cgi?name=2021-33061

** CVE added: https://cve.mitre.org/cgi-bin/cvename.cgi?name=2021-33656

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1896350

Title:
  nbd: requests can become stuck when disconnecting from server with
  qemu-nbd

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Bionic:
  Won't Fix
Status in linux source package in Focal:
  Fix Released
Status in linux source package in Impish:
  Won't Fix
Status in linux source package in Jammy:
  Fix Released
Status in linux source package in Kinetic:
  Fix Released

Bug description:
  BugLink: https://bugs.launchpad.net/bugs/1896350

  [Impact]

  After 2516ab1 ("nbd: only clear the queue on device teardown"), present
  in 4.12-rc1 onward, the ioctl NBD_CLEAR_SOCK can no longer clear
  requests currently being processed. This change was made to fix a race
  between using the NBD_CLEAR_SOCK ioctl to clear requests, and teardown
  of the device clearing requests. This worked for the most part, as
  several years ago systemd was not set up to watch nbd devices for
  changes in their state.

  But after:

  commit f82abfcda58168d9f667e2094d438763531d3fa6
  From: Tony Asleson 
  Date: Fri, 8 Feb 2019 15:47:10 -0600
  Subject: rules: watch metadata changes on nbd devices
  Link: https://github.com/systemd/systemd/commit/f82abfcda58168d9f667e2094d438763531d3fa6

  in systemd v242-rc1, nbd* devices were added to a udev rule to watch
  those devices for changes with the inotify subsystem. From man udev:

  > watch
  >   Watch the device node with inotify; when the node is closed after being 
  >   opened for writing, a change uevent is synthesized.
  >
  > nowatch
  >   Disable the watching of a device node with inotify.

  This changed the behaviour of device teardown. Since systemd now keeps
  tabs on the device with inotify, outstanding requests cannot be
  cleared: nbd_xmit_timeout() will always return 'BLK_EH_RESET_TIMER',
  so requests get stuck. They never complete, because a disconnect has
  occurred, and they never time out, because their timers keep being
  reset.

  A symptom of this issue is that the nbd subsystem gets stuck with
  messages like:

  block nbd15: NBD_DISCONNECT
  block nbd15: Send disconnect failed -32
  ...
  block nbd15: Possible stuck request 7fcf62ba: control (read@523915264,24576B). Runtime 30 seconds
  ...
  block nbd15: Possible stuck request 7fcf62ba: control (read@523915264,24576B). Runtime 150 seconds
  ...
  INFO: task qemu-nbd:1267 blocked for more than 120 seconds.
        Not tainted 5.15.0-23-generic #23-Ubuntu
  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  task:qemu-nbd state:D stack:0 pid: 1267 ppid: 1 flags:0x0002
  Call Trace:
   <TASK>
   __schedule+0x23d/0x590
   ? call_rcu+0xe/0x10
   schedule+0x4e/0xb0
   blk_mq_freeze_queue_wait+0x69/0xa0
   ? wait_woken+0x70/0x70
   blk_mq_freeze_queue+0x1b/0x30
   nbd_add_socket+0x76/0x1f0 [nbd]
   __nbd_ioctl+0x18b/0x340 [nbd]
   ? security_capable+0x3d/0x60
   nbd_ioctl+0x81/0xb0 [nbd]
   blkdev_ioctl+0x12e/0x270
   ? __fget_files+0x86/0xc0
   block_ioctl+0x46/0x50
   __x64_sys_ioctl+0x91/0xc0
   do_syscall_64+0x5c/0xc0
   entry_SYSCALL_64_after_hwframe+0x44/0xae
   </TASK>

  Additionally, in syslog you will also see systemd-udevd get stuck:

  systemd-udevd[419]: nbd15: Worker [2004] processing SEQNUM=5661 is
  taking a long time

  $ ps aux
  ...
  4191194 root D 0.1 systemd-udevd   -
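
The D state shown for systemd-udevd above can be checked for on a live system. The following is a minimal sketch, not part of the original report: the helper name `d_state_procs` is made up for illustration, and it assumes the procps `ps` output format `pid=,stat=,comm=` (header-less columns).

```shell
#!/bin/sh
# Print "PID COMMAND" for every process in uninterruptible sleep (state D).
# Input: lines of "PID STAT COMMAND", as printed by `ps -eo pid=,stat=,comm=`.
# Only the first word of the command name is printed.
d_state_procs() {
    awk '$2 ~ /^D/ { print $1, $3 }'
}

# Typical use on a live system:
#   ps -eo pid=,stat=,comm= | d_state_procs
```

A udev worker or qemu-nbd process that stays in this list across several runs is consistent with the hang described here, though other I/O stalls look the same.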

  We can work around the issue by adding a higher priority udev rule to
  not watch nbd* devices.

  $ sudo tee -a /etc/udev/rules.d/97-nbd-device.rules << EOF > /dev/null
  # Disable inotify watching of change events for NBD devices
  ACTION=="add|change", KERNEL=="nbd*", OPTIONS:="nowatch"
  EOF

  $ sudo udevadm control --reload-rules
  $ sudo udevadm trigger
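
For scripted deployments, the workaround rule can be written idempotently. This is a sketch under assumptions rather than a tested procedure: the function name `install_nbd_nowatch_rule` and the optional directory argument are illustrative, and only the file name and rule text come from the description above.

```shell
#!/bin/sh
# Write the nowatch rule into a rules directory unless it is already present.
# $1: rules directory (defaults to /etc/udev/rules.d; writing there needs root).
install_nbd_nowatch_rule() {
    rules_dir=${1:-/etc/udev/rules.d}
    rule_file=$rules_dir/97-nbd-device.rules
    rule='ACTION=="add|change", KERNEL=="nbd*", OPTIONS:="nowatch"'

    # -F matches the rule as a fixed string, -x requires a whole-line match.
    if [ -f "$rule_file" ] && grep -qxF -- "$rule" "$rule_file"; then
        return 0
    fi
    printf '%s\n%s\n' \
        '# Disable inotify watching of change events for NBD devices' \
        "$rule" >> "$rule_file"
}
```

Calling it twice leaves a single copy of the rule in the file. The rules still have to be reloaded afterwards with `udevadm control --reload-rules`.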

[Kernel-packages] [Bug 1896350] Re: nbd: requests can become stuck when disconnecting from server with qemu-nbd

2022-08-22 Thread Matthew Ruffell

Performing verification for Focal.

I started a fresh Focal VM, and installed qemu-utils. I then ran
reproducer.sh from the testcase section.
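
The reproducer.sh script itself is not quoted in this thread. Purely as an illustration of the kind of connect/disconnect cycling involved, here is a hypothetical sketch: the function names, the `DRY_RUN` switch and the loop structure are assumptions, and only the qemu-nbd command line is taken from the sudo log entries below. It is not the script from the testcase and may not trigger the hang on its own.

```shell
#!/bin/sh
# Hypothetical connect/disconnect loop; NOT the reproducer.sh from the bug.
# Set DRY_RUN=1 to print the commands instead of running them.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# $1: nbd device, $2: qcow2 image, $3: number of cycles
nbd_cycle() {
    dev=$1
    img=$2
    count=$3
    i=0
    while [ "$i" -lt "$count" ]; do
        run qemu-nbd --connect="$dev" --cache=writeback --format=qcow2 "$img"
        run qemu-nbd --disconnect "$dev"
        i=$((i + 1))
    done
}

# Example (as root, with the nbd module loaded and an existing image):
#   nbd_cycle /dev/nbd15 foo.img 100
```

Running with `DRY_RUN=1` first shows exactly which commands would be issued before touching a real device.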

The kernel is 5.4.0-124-generic from -updates.

Within 30 seconds of starting the reproducer, the testcase script hung,
and the following was in dmesg:

Aug 23 04:49:26 focal-nbd kernel: block nbd15: NBD_DISCONNECT
Aug 23 04:49:26 focal-nbd kernel: block nbd15: Send disconnect failed -32
Aug 23 04:49:26 focal-nbd sudo[1804]: pam_unix(sudo:session): session closed for user root
Aug 23 04:49:26 focal-nbd sudo[1807]:   ubuntu : TTY=pts/0 ; PWD=/home/ubuntu ; USER=root ; COMMAND=/usr/bin/qemu-nbd --connect=/dev/nbd15 --cache=writeback --format=qcow2 foo.img
Aug 23 04:49:26 focal-nbd sudo[1807]: pam_unix(sudo:session): session opened for user root by ubuntu(uid=0)
Aug 23 04:49:26 focal-nbd kernel: ldm_validate_partition_table(): Disk read failed.
Aug 23 04:49:26 focal-nbd kernel: Dev nbd15: unable to read RDB block 0
Aug 23 04:49:26 focal-nbd kernel:  nbd15: unable to read partition table
Aug 23 04:49:56 focal-nbd kernel: block nbd15: Possible stuck request 4d5cc344: control (read@523988992,36864B). Runtime 30 seconds
Aug 23 04:50:26 focal-nbd systemd-udevd[419]: nbd15: Worker [1198] processing SEQNUM=5582 is taking a long time
Aug 23 04:50:27 focal-nbd kernel: block nbd15: Possible stuck request 4d5cc344: control (read@523988992,36864B). Runtime 60 seconds
Aug 23 04:50:58 focal-nbd kernel: block nbd15: Possible stuck request 4d5cc344: control (read@523988992,36864B). Runtime 90 seconds
Aug 23 04:51:29 focal-nbd kernel: block nbd15: Possible stuck request 4d5cc344: control (read@523988992,36864B). Runtime 120 seconds
Aug 23 04:51:59 focal-nbd kernel: block nbd15: Possible stuck request 4d5cc344: control (read@523988992,36864B). Runtime 150 seconds
Aug 23 04:52:26 focal-nbd systemd-udevd[419]: nbd15: Worker [1198] processing SEQNUM=5582 killed
Aug 23 04:52:30 focal-nbd kernel: block nbd15: Possible stuck request 4d5cc344: control (read@523988992,36864B). Runtime 180 seconds
Aug 23 04:53:27 focal-nbd kernel: INFO: task qemu-nbd:1815 blocked for more than 120 seconds.
Aug 23 04:53:27 focal-nbd kernel:   Not tainted 5.4.0-124-generic #140-Ubuntu
Aug 23 04:53:27 focal-nbd kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 23 04:53:27 focal-nbd kernel: qemu-nbd D 0  1815  1 0x
Aug 23 04:53:27 focal-nbd kernel: Call Trace:
Aug 23 04:53:27 focal-nbd kernel:  __schedule+0x2e3/0x740
Aug 23 04:53:27 focal-nbd kernel:  ? __kfifo_to_user_r+0xa0/0xa0
Aug 23 04:53:27 focal-nbd kernel:  schedule+0x42/0xb0
Aug 23 04:53:27 focal-nbd kernel:  blk_mq_freeze_queue_wait+0x4b/0xb0
Aug 23 04:53:27 focal-nbd kernel:  ? __wake_up_pollfree+0x40/0x40
Aug 23 04:53:27 focal-nbd kernel:  blk_mq_freeze_queue+0x1b/0x20
Aug 23 04:53:27 focal-nbd kernel:  nbd_add_socket+0x5e/0x1d0 [nbd]
Aug 23 04:53:27 focal-nbd kernel:  nbd_ioctl+0x2f7/0x410 [nbd]
Aug 23 04:53:27 focal-nbd kernel:  blkdev_ioctl+0x383/0xa30
Aug 23 04:53:27 focal-nbd kernel:  block_ioctl+0x3d/0x50
Aug 23 04:53:27 focal-nbd kernel:  do_vfs_ioctl+0x407/0x670
Aug 23 04:53:27 focal-nbd kernel:  ? putname+0x4a/0x50
Aug 23 04:53:27 focal-nbd kernel:  ksys_ioctl+0x67/0x90
Aug 23 04:53:27 focal-nbd kernel:  __x64_sys_ioctl+0x1a/0x20
Aug 23 04:53:27 focal-nbd kernel:  do_syscall_64+0x57/0x190
Aug 23 04:53:27 focal-nbd kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Aug 23 04:53:27 focal-nbd kernel: RIP: 0033:0x7fd12b7573ab
Aug 23 04:53:27 focal-nbd kernel: Code: Bad RIP value.
Aug 23 04:53:27 focal-nbd kernel: RSP: 002b:7fd129fa2a18 EFLAGS: 0246 ORIG_RAX: 0010
Aug 23 04:53:27 focal-nbd kernel: RAX: ffda RBX: 0001 RCX: 7fd12b7573ab
Aug 23 04:53:27 focal-nbd kernel: RDX: 000b RSI: ab00 RDI: 000d
Aug 23 04:53:27 focal-nbd kernel: RBP: 7fd129fa2aa8 R08:  R09: 0001
Aug 23 04:53:27 focal-nbd kernel: R10:  R11: 0246 R12: 7fd129fa2ab0
Aug 23 04:53:27 focal-nbd kernel: R13: 000d R14: 1f40 R15: 7fd12b80
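
To pick the "Possible stuck request" lines out of a longer journal, something like the following can be used. It is a rough sketch: the function name `stuck_requests` is invented here, and the pattern is derived only from the message format visible in the log above, with each kernel message on a single line.

```shell
#!/bin/sh
# Read kernel log lines on stdin and print "DEVICE REQUEST MAX_RUNTIME" for
# each stuck request, keeping the largest runtime (in seconds) seen for it.
stuck_requests() {
    sed -n 's/.*block \(nbd[0-9]*\): Possible stuck request \([0-9a-f]*\):.* Runtime \([0-9]*\) seconds.*/\1 \2 \3/p' |
    awk '{ key = $1 " " $2
           if ($3 + 0 > max[key]) max[key] = $3 + 0 }
         END { for (k in max) print k, max[k] }' |
    sort
}

# Typical use:
#   journalctl -k | stuck_requests
```

A runtime that keeps growing for the same request across runs matches the timer-reset behaviour described in the bug.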

I then rebooted, enabled -proposed and installed kernel
5.4.0-125-generic.

I left the reproducer.sh script running for a bit over an hour, and it
was still running perfectly fine when I got back to it. Requests are
still moving smoothly, and no longer getting stuck.

The 5.4.0-125-generic kernel in -proposed fixes the issue. Happy to mark
it verified.
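
Whether a given machine already runs a fixed -generic kernel can be checked against the versions named in the changelog entries of this thread (5.4.0-125 for Focal, 5.15.0-43 for Jammy). The sketch below assumes the usual `uname -r` layout of `<base>-<abi>-<flavour>`. The function name is illustrative, and other flavours such as linux-azure use different ABI numbers and are deliberately reported as unknown.

```shell
#!/bin/sh
# Return 0 if the given `uname -r` string is a -generic kernel at or past the
# first fixed ABI named in this thread, 1 if it is older, 2 if unknown.
nbd_fix_present() {
    release=$1
    case "$release" in
        *-generic) ;;
        *) return 2 ;;
    esac
    base=${release%%-*}      # "5.4.0-124-generic" -> "5.4.0"
    rest=${release#*-}       # "5.4.0-124-generic" -> "124-generic"
    abi=${rest%%-*}          # "124-generic"       -> "124"
    case "$base" in
        5.4.0)  min=125 ;;   # focal: fixed in 5.4.0-125.141
        5.15.0) min=43 ;;    # jammy: fixed in 5.15.0-43.46
        *) return 2 ;;
    esac
    [ "$abi" -ge "$min" ]
}

# Typical use:
#   nbd_fix_present "$(uname -r)" && echo "kernel includes the nbd fix"
```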

** Tags added: verification-done-focal


[Kernel-packages] [Bug 1896350] Re: nbd: requests can become stuck when disconnecting from server with qemu-nbd

2022-08-08 Thread Stefan Bader

** Changed in: linux (Ubuntu Focal)
   Status: In Progress => Fix Committed

Bug description:
  BugLink: https://bugs.launchpad.net/bugs/1896350

  [Fix]

  The fix relies on infrastructure provided by the flag
  NBD_CMD_INFLIGHT, which was introduced in 5.16, and added to in 5.19.
  We need to backport all commits related to NBD_CMD_INFLIGHT to our
  kernels for the fix to be effective.

  For Focal, Impish and Jammy:

  commit 4e6eef5dc25b528e08ac5b5f64f6ca9d9987241d
  Author: Yu Kuai 
  Date:   Thu Sep 16 17:33:44 2021 +0800
  Subject: nbd: don't handle response without a corresponding request message
  Link: https://github.com/torvalds/linux/commit/4e6eef5dc25b528e08ac5b5f64f6ca9d9987241d

  commit 07175cb1baf4c51051b1fbd391097e349f9a02a9
  Author: Yu Kuai 
  Date:   Thu Sep 16 17:33:45 2021 +0800
  Subject: nbd: make sure request completion won't concurrent
  Link: https://github.com/torvalds/linux/commit/07175cb1baf4c51051b1fbd391097e349f9a02a9

  commit 2895f1831e911ca87d4efdf43e35eb72a0c7e66e
  Author: Yu Kuai 
  Date:   Sat May 21 15:37:46 2022 +0800
  Subject: nbd: don't clear 'NBD_CMD_INFLIGHT' flag if request is not completed
  Link:

[Kernel-packages] [Bug 1896350] Re: nbd: requests can become stuck when disconnecting from server with qemu-nbd

2022-07-28 Thread Launchpad Bug Tracker

This bug was fixed in the package linux - 5.15.0-43.46

---
linux (5.15.0-43.46) jammy; urgency=medium

  * jammy/linux: 5.15.0-43.46 -proposed tracker (LP: #1981243)

  * Packaging resync (LP: #1786013)
    - debian/dkms-versions -- update from kernel-versions (main/2022.07.11)

  * nbd: requests can become stuck when disconnecting from server with qemu-nbd
    (LP: #1896350)
    - nbd: don't handle response without a corresponding request message
    - nbd: make sure request completion won't concurrent
    - nbd: don't clear 'NBD_CMD_INFLIGHT' flag if request is not completed
    - nbd: fix io hung while disconnecting device

  * Ubuntu 22.04 and 20.04 DPC Fixes for Failure Cases of DownPort Containment
    events (LP: #1965241)
    - PCI/portdrv: Rename pm_iter() to pcie_port_device_iter()
    - PCI: pciehp: Ignore Link Down/Up caused by error-induced Hot Reset
    - [Config] Enable config option CONFIG_PCIE_EDR

  * [SRU] Ubuntu 22.04 Feature Request-Add support for a NVMe-oF-TCP CDC Client
    - TP 8010 (LP: #1948626)
    - nvme: add CNTRLTYPE definitions for 'identify controller'
    - nvme: send uevent on connection up
    - nvme: expose cntrltype and dctype through sysfs

  * [UBUNTU 22.04] Kernel oops while removing device from cio_ignore list
    (LP: #1980951)
    - s390/cio: derive cdev information only for IO-subchannels

  * Jammy Charmed OpenStack deployment fails over connectivity issues when using
    converged OVS bridge for control and data planes (LP: #1978820)
    - net/mlx5e: TC NIC mode, fix tc chains miss table

  * Hairpin traffic does not work with centralized NAT gw (LP: #1967856)
    - net: openvswitch: fix misuse of the cached connection on tuple changes

  * alsa: asoc: amd: the internal mic can't be dedected on yellow carp machines
    (LP: #1980700)
    - ASoC: amd: Add driver data to acp6x machine driver
    - ASoC: amd: Add support for enabling DMIC on acp6x via _DSD

  * AMD ACP 6.x DMIC Supports (LP: #1949245)
    - ASoC: amd: add Yellow Carp ACP6x IP register header
    - ASoC: amd: add Yellow Carp ACP PCI driver
    - ASoC: amd: add acp6x init/de-init functions
    - ASoC: amd: add platform devices for acp6x pdm driver and dmic driver
    - ASoC: amd: add acp6x pdm platform driver
    - ASoC: amd: add acp6x irq handler
    - ASoC: amd: add acp6x pdm driver dma ops
    - ASoC: amd: add acp6x pci driver pm ops
    - ASoC: amd: add acp6x pdm driver pm ops
    - ASoC: amd: enable Yellow carp acp6x drivers build
    - ASoC: amd: create platform device for acp6x machine driver
    - ASoC: amd: add YC machine driver using dmic
    - ASoC: amd: enable Yellow Carp platform machine driver build
    - ASoC: amd: fix uninitialized variable in snd_acp6x_probe()
    - [Config] Enable AMD ACP 6 DMIC Support

  * [UBUNTU 20.04] Include patches to avoid self-detected stall with Secure
    Execution (LP: #1979296)
    - KVM: s390: pv: add macros for UVC CC values
    - KVM: s390: pv: avoid stalls when making pages secure

  * [22.04 FEAT] KVM: Attestation support for Secure Execution (crypto)
    (LP: #1959973)
    - drivers/s390/char: Add Ultravisor io device
    - s390/uv_uapi: depend on CONFIG_S390
    - [Config] CONFIG_S390_UV_UAPI=y for s390x

  * CVE-2022-1679
    - SAUCE: ath9k: fix use-after-free in ath9k_hif_usb_rx_cb

  * CVE-2022-28893
    - SUNRPC: Ensure we flush any closed sockets before xs_xprt_free()
    - SUNRPC: Don't leak sockets in xs_local_connect()

  * CVE-2022-34918
    - netfilter: nf_tables: stricter validation of element data

  * CVE-2022-1652
    - floppy: use a statically allocated error counter

 -- Stefan Bader  Tue, 12 Jul 2022 10:51:03 +0200

** Changed in: linux (Ubuntu Jammy)
   Status: Fix Committed => Fix Released

** CVE added: https://cve.mitre.org/cgi-bin/cvename.cgi?name=2022-1652

** CVE added: https://cve.mitre.org/cgi-bin/cvename.cgi?name=2022-1679

** CVE added: https://cve.mitre.org/cgi-bin/cvename.cgi?name=2022-28893

** CVE added: https://cve.mitre.org/cgi-bin/cvename.cgi?name=2022-34918


[Kernel-packages] [Bug 1896350] Re: nbd: requests can become stuck when disconnecting from server with qemu-nbd

2022-07-19 Thread Matthew Ruffell

Performing verification for Jammy.

I created a new Jammy VM, and installed qemu-utils.

The kernel is 5.15.0-41-generic from -updates.

I ran my reproducer.sh script from the testcase, and within a minute,
the nbd request got stuck, and we started seeing hung task timeout oops
messages in dmesg:

Jul 20 04:56:20 jammy-nbd kernel: block nbd15: NBD_DISCONNECT
Jul 20 04:56:20 jammy-nbd kernel: block nbd15: Send disconnect failed -32
Jul 20 04:56:20 jammy-nbd sudo[5267]: pam_unix(sudo:session): session closed for user root
Jul 20 04:56:20 jammy-nbd sudo[5271]:   ubuntu : TTY=pts/0 ; PWD=/home/ubuntu ; USER=root ; COMMAND=/usr/bin/qemu-nbd --connect=/dev/nbd15 --cache=writeback --format=qcow2 foo.img
Jul 20 04:56:20 jammy-nbd sudo[5271]: pam_unix(sudo:session): session opened for user root(uid=0) by ubuntu(uid=1000)
Jul 20 04:56:20 jammy-nbd kernel: ldm_validate_partition_table(): Disk read failed.
Jul 20 04:56:20 jammy-nbd kernel: Dev nbd15: unable to read RDB block 0
Jul 20 04:56:20 jammy-nbd kernel:  nbd15: unable to read partition table
Jul 20 04:56:51 jammy-nbd kernel: block nbd15: Possible stuck request 64946bb4: control (read@524087296,65536B). Runtime 30 seconds
Jul 20 04:57:19 jammy-nbd systemd-udevd[440]: nbd15: Worker [2561] processing SEQNUM=3062 is taking a long time
Jul 20 04:57:21 jammy-nbd kernel: block nbd15: Possible stuck request 64946bb4: control (read@524087296,65536B). Runtime 60 seconds
Jul 20 04:57:52 jammy-nbd kernel: block nbd15: Possible stuck request 64946bb4: control (read@524087296,65536B). Runtime 90 seconds
Jul 20 04:58:23 jammy-nbd kernel: block nbd15: Possible stuck request 64946bb4: control (read@524087296,65536B). Runtime 120 seconds
Jul 20 04:58:23 jammy-nbd kernel: INFO: task qemu-nbd:5280 blocked for more than 120 seconds.
Jul 20 04:58:23 jammy-nbd kernel:   Not tainted 5.15.0-41-generic #44-Ubuntu
Jul 20 04:58:23 jammy-nbd kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 20 04:58:23 jammy-nbd kernel: task:qemu-nbd state:D stack:0 pid: 5280 ppid: 1 flags:0x0002
Jul 20 04:58:23 jammy-nbd kernel: Call Trace:
Jul 20 04:58:23 jammy-nbd kernel:  <TASK>
Jul 20 04:58:23 jammy-nbd kernel:  __schedule+0x23d/0x590
Jul 20 04:58:23 jammy-nbd kernel:  ? call_rcu+0xe/0x10
Jul 20 04:58:23 jammy-nbd kernel:  schedule+0x4e/0xb0
Jul 20 04:58:23 jammy-nbd kernel:  blk_mq_freeze_queue_wait+0x69/0xa0
Jul 20 04:58:23 jammy-nbd kernel:  ? wait_woken+0x70/0x70
Jul 20 04:58:23 jammy-nbd kernel:  blk_mq_freeze_queue+0x1b/0x30
Jul 20 04:58:23 jammy-nbd kernel:  nbd_add_socket+0x76/0x1f0 [nbd]
Jul 20 04:58:23 jammy-nbd kernel:  __nbd_ioctl+0x18b/0x340 [nbd]
Jul 20 04:58:23 jammy-nbd kernel:  ? security_capable+0x3d/0x60
Jul 20 04:58:23 jammy-nbd kernel:  nbd_ioctl+0x81/0xb0 [nbd]
Jul 20 04:58:23 jammy-nbd kernel:  blkdev_ioctl+0x12e/0x270
Jul 20 04:58:23 jammy-nbd kernel:  ? __fget_files+0x86/0xc0
Jul 20 04:58:23 jammy-nbd kernel:  block_ioctl+0x46/0x50
Jul 20 04:58:23 jammy-nbd kernel:  __x64_sys_ioctl+0x91/0xc0
Jul 20 04:58:23 jammy-nbd kernel:  do_syscall_64+0x5c/0xc0
Jul 20 04:58:23 jammy-nbd kernel:  ? exit_to_user_mode_prepare+0x37/0xb0
Jul 20 04:58:23 jammy-nbd kernel:  ? syscall_exit_to_user_mode+0x27/0x50
Jul 20 04:58:23 jammy-nbd kernel:  ? __x64_sys_recvmsg+0x1d/0x20
Jul 20 04:58:23 jammy-nbd kernel:  ? do_syscall_64+0x69/0xc0
Jul 20 04:58:23 jammy-nbd kernel:  ? syscall_exit_to_user_mode+0x27/0x50
Jul 20 04:58:23 jammy-nbd kernel:  ? __x64_sys_recvmsg+0x1d/0x20
Jul 20 04:58:23 jammy-nbd kernel:  ? do_syscall_64+0x69/0xc0
Jul 20 04:58:23 jammy-nbd kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xae
Jul 20 04:58:23 jammy-nbd kernel: RIP: 0033:0x7f6c47e47aff
Jul 20 04:58:23 jammy-nbd kernel: RSP: 002b:7f6c464d1820 EFLAGS: 0246 ORIG_RAX: 0010
Jul 20 04:58:23 jammy-nbd kernel: RAX: ffda RBX: 0001 RCX: 7f6c47e47aff
Jul 20 04:58:23 jammy-nbd kernel: RDX: 0009 RSI: ab00 RDI: 000b
Jul 20 04:58:23 jammy-nbd kernel: RBP: 7f6c464d1910 R08:  R09: 0001
Jul 20 04:58:23 jammy-nbd kernel: R10:  R11: 0246 R12: 000b
Jul 20 04:58:23 jammy-nbd kernel: R13: 7f6c464d1900 R14: 1f40 R15: 7f6c3c000b90
Jul 20 04:58:23 jammy-nbd kernel:  </TASK>

I then rebooted, and enabled -proposed and installed the
5.15.0-43-generic kernel.

I started the reproducer.sh script and left it to run for an hour.

At the end of the hour, the script was still running strong. Requests no
longer get stuck when we issue NBD_DISCONNECT, and the issue is solved.

The kernel in -proposed fixes the issue. Happy to mark it verified.

** Tags removed: verification-needed-focal verification-needed-jammy
** Tags added: verification-done-jammy


[Kernel-packages] [Bug 1896350] Re: nbd: requests can become stuck when disconnecting from server with qemu-nbd

2022-07-19 Thread Matthew Ruffell

Fix released for linux-azure:

linux-azure (5.4.0-1086.91)
linux-azure (5.15.0.1014.17)

Marking back to Fix Committed for Jammy and In progress for Focal to
track progress in -generic variants.

** Changed in: linux (Ubuntu Jammy)
   Status: Fix Released => Fix Committed

** Changed in: linux (Ubuntu Focal)
   Status: Fix Released => In Progress


[Kernel-packages] [Bug 1896350] Re: nbd: requests can become stuck when disconnecting from server with qemu-nbd

2022-07-19 Thread Tim Gardner

** Changed in: linux (Ubuntu Jammy)
   Status: Fix Committed => Fix Released

** Changed in: linux (Ubuntu Kinetic)
   Status: Fix Committed => Fix Released

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1896350

Title:
  nbd: requests can become stuck when disconnecting from server with
  qemu-nbd

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Bionic:
  Won't Fix
Status in linux source package in Focal:
  Fix Released
Status in linux source package in Impish:
  Won't Fix
Status in linux source package in Jammy:
  Fix Released
Status in linux source package in Kinetic:
  Fix Released

Bug description:
  BugLink: https://bugs.launchpad.net/bugs/1896350

  [Impact]

  After 2516ab1("nbd: only clear the queue on device teardown"), present
  in 4.12-rc1 onward, the ioctl NBD_CLEAR_SOCK can no longer clear
  requests currently being processed. This change was made to fix a race
  between using the NBD_CLEAR_SOCK ioctl to clear requests, and teardown
  of the device clearing requests. This worked for the most part, as
  several years ago systemd was not set up to watch nbd devices for
  changes in their state.

  But after:

  commit f82abfcda58168d9f667e2094d438763531d3fa6
  From: Tony Asleson 
  Date: Fri, 8 Feb 2019 15:47:10 -0600
  Subject: rules: watch metadata changes on nbd devices
  Link: 
https://github.com/systemd/systemd/commit/f82abfcda58168d9f667e2094d438763531d3fa6

  in systemd v242-rc1, nbd* devices were added to a udev rule to watch
  those devices for changes with the inotify subsystem. From man udev:

  > watch
  >   Watch the device node with inotify; when the node is closed after being 
  >   opened for writing, a change uevent is synthesized.
  >
  > nowatch
  >   Disable the watching of a device node with inotify.

  This changed the behaviour of device teardown. Because systemd now keeps
  tabs on the device with inotify, outstanding requests cannot be cleared:
  nbd_xmit_timeout() always returns 'BLK_EH_RESET_TIMER', so the requests
  get stuck. They never complete, because a disconnect has occurred, and
  they never time out, because their timers keep being reset.
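
  Since the watch rule arrived with systemd v242, the installed systemd
  version gives a first hint of whether a host is likely exposed. The
  following is a minimal sketch only: the function name
  systemd_has_nbd_watch is hypothetical, and it assumes the first line of
  `systemctl --version` has the usual "systemd <number> ..." form. A
  distribution may patch its udev rules, so treat the result as a hint.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given `systemctl --version` output
# reports systemd 242 or newer (the release that started watching nbd*).
systemd_has_nbd_watch() {
    # The version number is the second field of the first line.
    ver=$(printf '%s\n' "$1" | awk 'NR == 1 { print $2 }')
    # Drop any non-numeric suffix such as "~rc1" before comparing.
    ver=${ver%%[!0-9]*}
    [ -n "$ver" ] && [ "$ver" -ge 242 ]
}

# Example use:
#   systemd_has_nbd_watch "$(systemctl --version)" && echo "nbd* is likely watched"
```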

  The symptom of this issue is that the nbd subsystem gets stuck with
  messages like:

  block nbd15: NBD_DISCONNECT
  block nbd15: Send disconnect failed -32
  ...
  block nbd15: Possible stuck request 7fcf62ba: control (read@523915264,24576B). Runtime 30 seconds
  ...
  block nbd15: Possible stuck request 7fcf62ba: control (read@523915264,24576B). Runtime 150 seconds
  ...
  INFO: task qemu-nbd:1267 blocked for more than 120 seconds.
        Not tainted 5.15.0-23-generic #23-Ubuntu
  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  task:qemu-nbd state:D stack:0 pid: 1267 ppid: 1 flags:0x0002
  Call Trace:
   
   __schedule+0x23d/0x590
   ? call_rcu+0xe/0x10
   schedule+0x4e/0xb0
   blk_mq_freeze_queue_wait+0x69/0xa0
   ? wait_woken+0x70/0x70
   blk_mq_freeze_queue+0x1b/0x30
   nbd_add_socket+0x76/0x1f0 [nbd]
   __nbd_ioctl+0x18b/0x340 [nbd]
   ? security_capable+0x3d/0x60
   nbd_ioctl+0x81/0xb0 [nbd]
   blkdev_ioctl+0x12e/0x270
   ? __fget_files+0x86/0xc0
   block_ioctl+0x46/0x50
   __x64_sys_ioctl+0x91/0xc0
   do_syscall_64+0x5c/0xc0
   entry_SYSCALL_64_after_hwframe+0x44/0xae
   

  Additionally, syslog shows systemd-udevd getting stuck:

  systemd-udevd[419]: nbd15: Worker [2004] processing SEQNUM=5661 is
  taking a long time

  $ ps aux
  ...
  4191194 root D 0.1 systemd-udevd   -
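
  To find out which devices are affected on a given host, the kernel log
  can be filtered for the "Possible stuck request" message shown above.
  This is a rough sketch under stated assumptions: nbd_stuck_devices is a
  hypothetical helper name, and the pattern assumes the message format
  quoted in this report.

```shell
#!/bin/sh
# Hypothetical helper: read kernel log lines on stdin and print each nbd
# device that logged a "Possible stuck request" message, once per device.
nbd_stuck_devices() {
    sed -n 's/.*block \(nbd[0-9][0-9]*\): Possible stuck request.*/\1/p' | sort -u
}

# Example use (reading the kernel ring buffer may require root):
#   dmesg | nbd_stuck_devices
```

  Tasks in uninterruptible sleep, such as the qemu-nbd and systemd-udevd
  processes above, show state D in the STAT column of `ps -eo pid,stat,comm`.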

  We can work around the issue by adding a higher-priority udev rule that
  stops udev from watching nbd* devices:

  $ cat << EOF | sudo tee -a /etc/udev/rules.d/97-nbd-device.rules
  # Disable inotify watching of change events for NBD devices
  ACTION=="add|change", KERNEL=="nbd*", OPTIONS:="nowatch"
  EOF

  $ sudo udevadm control --reload-rules
  $ sudo udevadm trigger
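
  Before relying on the workaround, it may be worth confirming that the
  rules file really contains an active nowatch rule for nbd devices. The
  sketch below only inspects the file text; it does not prove that udev
  has applied the rule. The helper name has_nbd_nowatch_rule is
  hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given rules file has a non-comment
# line that matches nbd* devices and sets the nowatch option.
has_nbd_nowatch_rule() {
    grep -v '^[[:space:]]*#' "$1" | grep 'KERNEL=="nbd\*"' | grep -q 'nowatch'
}

# Example use:
#   has_nbd_nowatch_rule /etc/udev/rules.d/97-nbd-device.rules && echo "rule present"
```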

  [Fix]

  The fix relies on infrastructure provided by the NBD_CMD_INFLIGHT flag,
  which was introduced in 5.16 and extended in 5.19. All commits related
  to NBD_CMD_INFLIGHT need to be backported to our kernels for the fix to
  be effective.

  For Focal, Impish and Jammy:

  commit 4e6eef5dc25b528e08ac5b5f64f6ca9d9987241d
  Author: Yu Kuai 
  Date:   Thu Sep 16 17:33:44 2021 +0800
  Subject: nbd: don't handle response without a corresponding request message
  Link: https://github.com/torvalds/linux/commit/4e6eef5dc25b528e08ac5b5f64f6ca9d9987241d

  commit 07175cb1baf4c51051b1fbd391097e349f9a02a9
  Author: Yu Kuai 
  Date:   Thu Sep 16 17:33:45 2021 +0800
  Subject: nbd: make sure request completion won't concurrent
  Link: https://github.com/torvalds/linux/commit/07175cb1baf4c51051b1fbd391097e349f9a02a9

  commit 2895f1831e911ca87d4efdf43e35eb72a0c7e66e
  Author: Yu Kuai 
  Date:   Sat May 21 15:37:46 2022 +0800
  Subject: nbd: don't clear 'NBD_CMD_INFLIGHT' flag if request is not completed

[Kernel-packages] [Bug 1896350] Re: nbd: requests can become stuck when disconnecting from server with qemu-nbd

2022-07-19 Thread Tim Gardner

** Changed in: linux (Ubuntu Focal)
   Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1896350

Title:
  nbd: requests can become stuck when disconnecting from server with
  qemu-nbd

Status in linux package in Ubuntu:
  Fix Committed
Status in linux source package in Bionic:
  Won't Fix
Status in linux source package in Focal:
  Fix Released
Status in linux source package in Impish:
  Won't Fix
Status in linux source package in Jammy:
  Fix Committed
Status in linux source package in Kinetic:
  Fix Committed

[Kernel-packages] [Bug 1896350] Re: nbd: requests can become stuck when disconnecting from server with qemu-nbd

2022-07-08 Thread Stefan Bader

** Changed in: linux (Ubuntu Impish)
   Status: In Progress => Won't Fix

** Changed in: linux (Ubuntu Jammy)
   Status: In Progress => Fix Committed

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1896350

Title:
  nbd: requests can become stuck when disconnecting from server with
  qemu-nbd

Status in linux package in Ubuntu:
  Fix Committed
Status in linux source package in Bionic:
  Won't Fix
Status in linux source package in Focal:
  In Progress
Status in linux source package in Impish:
  Won't Fix
Status in linux source package in Jammy:
  Fix Committed
Status in linux source package in Kinetic:
  Fix Committed
