[Kernel-packages] [Bug 1764982] Re: [bionic] machine stuck and bonding not working well when nvmet_rdma module is loaded
This bug was fixed in the package linux - 4.15.0-23.25

---
linux (4.15.0-23.25) bionic; urgency=medium

  * linux: 4.15.0-23.25 -proposed tracker (LP: #1772927)

  * arm64 SDEI support needs trampoline code for KPTI (LP: #1768630)
    - arm64: mmu: add the entry trampolines start/end section markers into
      sections.h
    - arm64: sdei: Add trampoline code for remapping the kernel

  * Some PCIe errors not surfaced through rasdaemon (LP: #1769730)
    - ACPI: APEI: handle PCIe AER errors in separate function
    - ACPI: APEI: call into AER handling regardless of severity

  * qla2xxx: Fix page fault at kmem_cache_alloc_node() (LP: #1770003)
    - scsi: qla2xxx: Fix session cleanup for N2N
    - scsi: qla2xxx: Remove unused argument from
      qlt_schedule_sess_for_deletion()
    - scsi: qla2xxx: Serialize session deletion by using work_lock
    - scsi: qla2xxx: Serialize session free in qlt_free_session_done
    - scsi: qla2xxx: Don't call dma_free_coherent with IRQ disabled.
    - scsi: qla2xxx: Fix warning in qla2x00_async_iocb_timeout()
    - scsi: qla2xxx: Prevent relogin trigger from sending too many commands
    - scsi: qla2xxx: Fix double free bug after firmware timeout
    - scsi: qla2xxx: Fixup locking for session deletion

  * Several hisi_sas bug fixes (LP: #1768974)
    - scsi: hisi_sas: dt-bindings: add an property of signal attenuation
    - scsi: hisi_sas: support the property of signal attenuation for v2 hw
    - scsi: hisi_sas: fix the issue of link rate inconsistency
    - scsi: hisi_sas: fix the issue of setting linkrate register
    - scsi: hisi_sas: increase timer expire of internal abort task
    - scsi: hisi_sas: remove unused variable hisi_sas_devices.running_req
    - scsi: hisi_sas: fix return value of hisi_sas_task_prep()
    - scsi: hisi_sas: Code cleanup and minor bug fixes

  * [bionic] machine stuck and bonding not working well when nvmet_rdma
    module is loaded (LP: #1764982)
    - nvmet-rdma: Don't flush system_wq by default during remove_one
    - nvme-rdma: Don't flush delete_wq by default during remove_one

  * Warnings/hang during error handling of SATA disks on SAS controller
    (LP: #1768971)
    - scsi: libsas: defer ata device eh commands to libata

  * Hotplugging a SATA disk into a SAS controller may cause crash
    (LP: #1768948)
    - ata: do not schedule hot plug if it is a sas host

  * ISST-LTE:pKVM:Ubuntu1804: rcu_sched self-detected stall on CPU follow by
    CPU ATTEMPT TO RE-ENTER FIRMWARE! (LP: #1767927)
    - powerpc/powernv: Handle unknown OPAL errors in opal_nvram_write()
    - powerpc/64s: return more carefully from sreset NMI
    - powerpc/64s: sreset panic if there is no debugger or crash dump handlers

  * fsnotify: Fix fsnotify_mark_connector race (LP: #1765564)
    - fsnotify: Fix fsnotify_mark_connector race

  * Hang on network interface removal in Xen virtual machine (LP: #1771620)
    - xen-netfront: Fix hang on device removal

  * HiSilicon HNS NIC names are truncated in /proc/interrupts (LP: #1765977)
    - net: hns: Avoid action name truncation

  * Ubuntu 18.04 kernel crashed while in degraded mode (LP: #1770849)
    - SAUCE: powerpc/perf: Fix memory allocation for core-imc based on
      num_possible_cpus()

  * Switch Build-Depends: transfig to fig2dev (LP: #1770770)
    - [Config] update Build-Depends: transfig to fig2dev

  * smp_call_function_single/many core hangs with stop4 alone (LP: #1768898)
    - cpufreq: powernv: Fix hardlockup due to synchronous smp_call in timer
      interrupt

  * Add d-i support for Huawei NICs (LP: #1767490)
    - d-i: add hinic to nic-modules udeb

  * unregister_netdevice: waiting for eth0 to become free. Usage count = 5
    (LP: #1746474)
    - xfrm: reuse uncached_list to track xdsts

  * Include nfp driver in linux-modules (LP: #1768526)
    - [Config] Add nfp.ko to generic inclusion list

  * Kernel panic on boot (m1.small in cn-north-1) (LP: #1771679)
    - x86/xen: Reset VCPU0 info pointer after shared_info remap

  * CVE-2018-3639 (x86)
    - x86/bugs: Fix the parameters alignment and missing void
    - KVM: SVM: Move spec control call after restore of GS
    - x86/speculation: Use synthetic bits for IBRS/IBPB/STIBP
    - x86/cpufeatures: Disentangle MSR_SPEC_CTRL enumeration from IBRS
    - x86/cpufeatures: Disentangle SSBD enumeration
    - x86/cpufeatures: Add FEATURE_ZEN
    - x86/speculation: Handle HT correctly on AMD
    - x86/bugs, KVM: Extend speculation control for VIRT_SPEC_CTRL
    - x86/speculation: Add virtualized speculative store bypass disable support
    - x86/speculation: Rework speculative_store_bypass_update()
    - x86/bugs: Unify x86_spec_ctrl_{set_guest,restore_host}
    - x86/bugs: Expose x86_spec_ctrl_base directly
    - x86/bugs: Remove x86_spec_ctrl_set()
    - x86/bugs: Rework spec_ctrl base and mask logic
    - x86/speculation, KVM: Implement support for VIRT_SPEC_CTRL/LS_CFG
    - KVM: SVM: Implement VIRT_SPEC_CTRL support for SSBD
    - x86/bugs:
[Kernel-packages] [Bug 1764982] Re: [bionic] machine stuck and bonding not working well when nvmet_rdma module is loaded
** Tags removed: verification-needed-bionic
** Tags added: verification-done-bionic

--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1764982

Title:
  [bionic] machine stuck and bonding not working well when nvmet_rdma
  module is loaded

Status in linux package in Ubuntu:
  In Progress
Status in linux source package in Bionic:
  Fix Committed

Bug description:
  == SRU Justification ==
  This bug causes the machine to get stuck and bonding to not work when
  the nvmet_rdma module is loaded.

  Both of these commits are in mainline as of v4.17-rc1.

  == Fixes ==
  a3dd7d0022c3 ("nvmet-rdma: Don't flush system_wq by default during remove_one")
  9bad0404ecd7 ("nvme-rdma: Don't flush delete_wq by default during remove_one")

  == Regression Potential ==
  Low. Limited to the nvme driver and tested by Mellanox.

  == Test Case ==
  A test kernel was built with these patches and tested by the original
  bug reporter. The bug reporter states the test kernel resolved the bug.

  == Original Bug Description ==

  Hi,

  The machine gets stuck after unregistering the bonding interface when
  the nvmet_rdma module is loaded.

  Scenario:
  # modprobe nvmet_rdma
  # modprobe -r bonding
  # modprobe bonding -v mode=1 miimon=100 fail_over_mac=0
  # ifdown eth4
  # ifdown eth5
  # ip addr add 15.209.12.173/8 dev bond0
  # ip link set bond0 up
  # echo +eth5 > /sys/class/net/bond0/bonding/slaves
  # echo +eth4 > /sys/class/net/bond0/bonding/slaves
  # echo -eth4 > /sys/class/net/bond0/bonding/slaves
  # echo -eth5 > /sys/class/net/bond0/bonding/slaves
  # echo -bond0 > /sys/class/net/bonding_masters

  dmesg:
  kernel: [78348.225556] bond0 (unregistering): Released all slaves
  kernel: [78358.339631] unregister_netdevice: waiting for bond0 to become free. Usage count = 2
  kernel: [78368.419621] unregister_netdevice: waiting for bond0 to become free. Usage count = 2
  kernel: [78378.499615] unregister_netdevice: waiting for bond0 to become free. Usage count = 2
  kernel: [78388.579625] unregister_netdevice: waiting for bond0 to become free. Usage count = 2
  kernel: [78398.659613] unregister_netdevice: waiting for bond0 to become free. Usage count = 2
  kernel: [78408.739655] unregister_netdevice: waiting for bond0 to become free. Usage count = 2
  kernel: [78418.819634] unregister_netdevice: waiting for bond0 to become free. Usage count = 2
  kernel: [78428.899642] unregister_netdevice: waiting for bond0 to become free. Usage count = 2
  kernel: [78438.979614] unregister_netdevice: waiting for bond0 to become free. Usage count = 2
  kernel: [78449.059619] unregister_netdevice: waiting for bond0 to become free. Usage count = 2
  kernel: [78459.139626] unregister_netdevice: waiting for bond0 to become free. Usage count = 2
  kernel: [78469.219623] unregister_netdevice: waiting for bond0 to become free. Usage count = 2
  kernel: [78479.299619] unregister_netdevice: waiting for bond0 to become free. Usage count = 2
  kernel: [78489.379620] unregister_netdevice: waiting for bond0 to become free. Usage count = 2
  kernel: [78499.459623] unregister_netdevice: waiting for bond0 to become free. Usage count = 2
  kernel: [78509.539631] unregister_netdevice: waiting for bond0 to become free. Usage count = 2
  kernel: [78519.619629] unregister_netdevice: waiting for bond0 to become free. Usage count = 2

  The following upstream commits fix this issue:

  commit a3dd7d0022c347207ae931c753a6dc3e6e8fcbc1
  Author: Max Gurtovoy
  Date:   Wed Feb 28 13:12:38 2018 +0200

      nvmet-rdma: Don't flush system_wq by default during remove_one

      The .remove_one function is called for any ib_device removal.
      In case the removed device has no reference in our driver, there
      is no need to flush the system work queue.
      Reviewed-by: Israel Rukshin
      Signed-off-by: Max Gurtovoy
      Reviewed-by: Sagi Grimberg
      Signed-off-by: Keith Busch
      Signed-off-by: Jens Axboe

  diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c
  index aa8068f..a59263d 100644
  --- a/drivers/nvme/target/rdma.c
  +++ b/drivers/nvme/target/rdma.c
  @@ -1469,8 +1469,25 @@ static struct nvmet_fabrics_ops nvmet_rdma_ops = {
   static void nvmet_rdma_remove_one(struct ib_device *ib_device, void *client_data)
   {
   	struct nvmet_rdma_queue *queue, *tmp;
  +	struct nvmet_rdma_device *ndev;
  +	bool found = false;
  +
  +	mutex_lock(&device_list_mutex);
  +	list_for_each_entry(ndev, &device_list, entry) {
  +		if (ndev->device == ib_device) {
  +			found = true;
  +			break;
  +		}
  +	}
  +	mutex_unlock(&device_list_mutex);
  +
  +	if (!found)
  +		return;
  -	/* Device is being removed, delete all queues using this device */
  +	/*
  +	 * IB Device that is used by nvmet controllers is being removed,
  +	 * delete all queues using this device.
  +	 */
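The hang shows up in dmesg as an endlessly repeating unregister_netdevice line. When triaging similar reports, a small helper can count those repeats per device; this is a hypothetical sketch (the function name and approach are not part of the bug report or the fix):

```shell
# Count repeating "unregister_netdevice: waiting for <dev> to become free"
# complaints per device, e.g. `dmesg | count_stuck`. A device that keeps
# reappearing with a constant usage count is the one stuck in unregister.
count_stuck() {
    grep -o 'unregister_netdevice: waiting for [^ ]* to become free' \
        | awk '{ n[$4]++ } END { for (d in n) print d, n[d] }'
}
```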
[Kernel-packages] [Bug 1764982] Re: [bionic] machine stuck and bonding not working well when nvmet_rdma module is loaded
Hi,

Sorry I missed that, will do it today.

yours,
Talat
[Kernel-packages] [Bug 1764982] Re: [bionic] machine stuck and bonding not working well when nvmet_rdma module is loaded
Any progress on the verification for this?
[Kernel-packages] [Bug 1764982] Re: [bionic] machine stuck and bonding not working well when nvmet_rdma module is loaded
This bug is awaiting verification that the kernel in -proposed solves
the problem. Please test the kernel and update this bug with the
results. If the problem is solved, change the tag
'verification-needed-bionic' to 'verification-done-bionic'. If the
problem still exists, change the tag 'verification-needed-bionic' to
'verification-failed-bionic'.

If verification is not done by 5 working days from today, this fix will
be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on
how to enable and use -proposed. Thank you!

** Tags added: verification-needed-bionic
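Enabling -proposed comes down to adding an extra pocket to APT's sources. A minimal sketch, assuming a stock bionic system and the main archive mirror (adjust both for your setup); write the resulting line to a file under /etc/apt/sources.list.d/, run `sudo apt update`, and then install the 4.15.0-23.25 kernel packages from -proposed:

```shell
# Construct the -proposed sources.list entry (release name and mirror URL
# are assumptions for a stock bionic install). Shown as a dry run that
# only builds and prints the line; writing it requires root.
RELEASE=bionic
MIRROR=http://archive.ubuntu.com/ubuntu/
PROPOSED_LINE="deb ${MIRROR} ${RELEASE}-proposed restricted main multiverse universe"
echo "${PROPOSED_LINE}"
```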
[Kernel-packages] [Bug 1764982] Re: [bionic] machine stuck and bonding not working well when nvmet_rdma module is loaded
** Changed in: linux (Ubuntu Bionic)
       Status: In Progress => Fix Committed

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1764982

Title:
  [bionic] machine stuck and bonding not working well when nvmet_rdma
  module is loaded

Status in linux package in Ubuntu:
  In Progress
Status in linux source package in Bionic:
  Fix Committed

Bug description:
  == SRU Justification ==
  This bug causes the machine to get stuck and bonding to not work when
  the nvmet_rdma module is loaded.

  Both of these commits are in mainline as of v4.17-rc1.

  == Fixes ==
  a3dd7d0022c3 ("nvmet-rdma: Don't flush system_wq by default during remove_one")
  9bad0404ecd7 ("nvme-rdma: Don't flush delete_wq by default during remove_one")

  == Regression Potential ==
  Low. Limited to the nvme driver and tested by Mellanox.

  == Test Case ==
  A test kernel was built with these patches and tested by the original
  bug reporter. The bug reporter states the test kernel resolved the bug.

  == Original Bug Description ==

  Hi,

  The machine gets stuck after unregistering the bonding interface while
  the nvmet_rdma module is loaded.

  Scenario:

  # modprobe nvmet_rdma
  # modprobe -r bonding
  # modprobe bonding -v mode=1 miimon=100 fail_over_mac=0
  # ifdown eth4
  # ifdown eth5
  # ip addr add 15.209.12.173/8 dev bond0
  # ip link set bond0 up
  # echo +eth5 > /sys/class/net/bond0/bonding/slaves
  # echo +eth4 > /sys/class/net/bond0/bonding/slaves
  # echo -eth4 > /sys/class/net/bond0/bonding/slaves
  # echo -eth5 > /sys/class/net/bond0/bonding/slaves
  # echo -bond0 > /sys/class/net/bonding_masters

  dmesg:

  kernel: [78348.225556] bond0 (unregistering): Released all slaves
  kernel: [78358.339631] unregister_netdevice: waiting for bond0 to become free. Usage count = 2
  kernel: [78368.419621] unregister_netdevice: waiting for bond0 to become free. Usage count = 2
  kernel: [78378.499615] unregister_netdevice: waiting for bond0 to become free. Usage count = 2
  kernel: [78388.579625] unregister_netdevice: waiting for bond0 to become free. Usage count = 2
  kernel: [78398.659613] unregister_netdevice: waiting for bond0 to become free. Usage count = 2
  kernel: [78408.739655] unregister_netdevice: waiting for bond0 to become free. Usage count = 2
  kernel: [78418.819634] unregister_netdevice: waiting for bond0 to become free. Usage count = 2
  kernel: [78428.899642] unregister_netdevice: waiting for bond0 to become free. Usage count = 2
  kernel: [78438.979614] unregister_netdevice: waiting for bond0 to become free. Usage count = 2
  kernel: [78449.059619] unregister_netdevice: waiting for bond0 to become free. Usage count = 2
  kernel: [78459.139626] unregister_netdevice: waiting for bond0 to become free. Usage count = 2
  kernel: [78469.219623] unregister_netdevice: waiting for bond0 to become free. Usage count = 2
  kernel: [78479.299619] unregister_netdevice: waiting for bond0 to become free. Usage count = 2
  kernel: [78489.379620] unregister_netdevice: waiting for bond0 to become free. Usage count = 2
  kernel: [78499.459623] unregister_netdevice: waiting for bond0 to become free. Usage count = 2
  kernel: [78509.539631] unregister_netdevice: waiting for bond0 to become free. Usage count = 2
  kernel: [78519.619629] unregister_netdevice: waiting for bond0 to become free. Usage count = 2

  The following upstream commits fix this issue:

  commit a3dd7d0022c347207ae931c753a6dc3e6e8fcbc1
  Author: Max Gurtovoy
  Date:   Wed Feb 28 13:12:38 2018 +0200

      nvmet-rdma: Don't flush system_wq by default during remove_one

      The .remove_one function is called for any ib_device removal.
      In case the removed device has no reference in our driver, there
      is no need to flush the system work queue.

      Reviewed-by: Israel Rukshin
      Signed-off-by: Max Gurtovoy
      Reviewed-by: Sagi Grimberg
      Signed-off-by: Keith Busch
      Signed-off-by: Jens Axboe

  diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c
  index aa8068f..a59263d 100644
  --- a/drivers/nvme/target/rdma.c
  +++ b/drivers/nvme/target/rdma.c
  @@ -1469,8 +1469,25 @@ static struct nvmet_fabrics_ops nvmet_rdma_ops = {
   static void nvmet_rdma_remove_one(struct ib_device *ib_device, void *client_data)
   {
   	struct nvmet_rdma_queue *queue, *tmp;
  +	struct nvmet_rdma_device *ndev;
  +	bool found = false;
  +
  +	mutex_lock(&device_list_mutex);
  +	list_for_each_entry(ndev, &device_list, entry) {
  +		if (ndev->device == ib_device) {
  +			found = true;
  +			break;
  +		}
  +	}
  +	mutex_unlock(&device_list_mutex);
  +
  +	if (!found)
  +		return;
  
  -	/* Device is being removed, delete all queues using this device */
  +	/*
  +	 * IB Device that is used by nvmet controllers is being removed,
  +	 * delete all queues using this device.
  +	 */
   	mutex_lock(&nvmet_rdma_queue_mutex);
   	list_for_each_entry_safe(queue, tmp, &nvmet_rdma_queue_list, queue_list) {

  commit 9bad0404ecd7594265cef04e176adeaa4ffbca4a
  Author: Max Gurtovoy
  Date:   Wed Feb 28 13:12:39 2018 +0200

      nvme-rdma: Don't flush delete_wq by default during remove_one

      The .remove_one function is called for any ib_device removal. In
      case the removed device has no reference in our driver, there is
      no need to flush the delete_wq work queue.
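The repeating unregister_netdevice lines above are the signature of the leaked netdevice reference. As a small illustrative aid (not part of the original report; the helper name and sample log path are assumptions), a saved kernel log can be scanned for that signature with standard tools:

```shell
# Hypothetical helper: count the "waiting for <dev> to become free"
# messages for a given device in a saved kernel log, to confirm the
# refcount leak described above is occurring.
count_stuck_msgs() {
    dev="$1"
    log="$2"
    grep -c "unregister_netdevice: waiting for ${dev} to become free" "$log"
}

# Sample lines in the same format as the dmesg output in the report:
cat > /tmp/kern.log.sample <<'EOF'
kernel: [78348.225556] bond0 (unregistering): Released all slaves
kernel: [78358.339631] unregister_netdevice: waiting for bond0 to become free. Usage count = 2
kernel: [78368.419621] unregister_netdevice: waiting for bond0 to become free. Usage count = 2
EOF

count_stuck_msgs bond0 /tmp/kern.log.sample   # prints 2
```

A count that keeps growing across successive checks indicates the unregister is stuck, matching the behaviour reported here.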
[Kernel-packages] [Bug 1764982] Re: [bionic] machine stuck and bonding not working well when nvmet_rdma module is loaded
SRU request submitted for Bionic:
https://lists.ubuntu.com/archives/kernel-team/2018-May/092158.html

** Description changed:

- Hi
+ == SRU Justification ==
+ This bug causes the machine to get stuck and bonding to not work when
+ the nvmet_rdma module is loaded.
+
+ Both of these commits are in mainline as of v4.17-rc1.
+
+ == Fixes ==
+ a3dd7d0022c3 ("nvmet-rdma: Don't flush system_wq by default during remove_one")
+ 9bad0404ecd7 ("nvme-rdma: Don't flush delete_wq by default during remove_one")
+
+ == Regression Potential ==
+ Low. Limited to nvme driver and tested by Mellanox.
+
+ == Test Case ==
+ A test kernel was built with these patches and tested by the original bug reporter.
+ The bug reporter states the test kernel resolved the bug.
+
+ == Original Bug Description ==
+
+ Hi
[Kernel-packages] [Bug 1764982] Re: [bionic] machine stuck and bonding not working well when nvmet_rdma module is loaded
Thank you for the build. I tested with those patches and it works.

Thanks,
Talat
[Kernel-packages] [Bug 1764982] Re: [bionic] machine stuck and bonding not working well when nvmet_rdma module is loaded
I built a test kernel with commits 9bad0404 and a3dd7d002. The test
kernel can be downloaded from:

http://kernel.ubuntu.com/~jsalisbury/lp1764982

Can you test this kernel and see if it resolves this bug?

Note, to test this kernel, you need to install both the linux-image and
linux-image-extra .deb packages.

Thanks in advance!
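As the comment above notes, both the linux-image and linux-image-extra .deb packages must be installed together. A minimal sketch of that step, assuming hypothetical package file names based on the 4.15.0-23.25 version in this SRU (the actual artifacts on the test-kernel page may be named differently):

```shell
# Install both packages together (run as root); the file names below
# are assumptions, not the actual artifacts from the test-kernel page:
#   dpkg -i linux-image-4.15.0-23-generic_4.15.0-23.25_amd64.deb \
#           linux-image-extra-4.15.0-23-generic_4.15.0-23.25_amd64.deb
#
# Helper to extract the kernel release from a .deb file name, so it can
# be compared with `uname -r` after rebooting into the test kernel:
deb_release() {
    base="${1##*/}"                 # strip any leading directory
    base="${base#linux-image-}"     # strip the package-name prefix
    base="${base#extra-}"           # tolerate linux-image-extra- too
    printf '%s\n' "${base%%_*}"     # keep text before the first "_"
}

deb_release linux-image-extra-4.15.0-23-generic_4.15.0-23.25_amd64.deb
# prints: 4.15.0-23-generic
```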
[Kernel-packages] [Bug 1764982] Re: [bionic] machine stuck and bonding not working well when nvmet_rdma module is loaded
** Changed in: linux (Ubuntu Bionic)
     Assignee: (unassigned) => Joseph Salisbury (jsalisbury)

** Changed in: linux (Ubuntu Bionic)
       Status: Triaged => In Progress
[Kernel-packages] [Bug 1764982] Re: [bionic] machine stuck and bonding not working well when nvmet_rdma module is loaded
** Changed in: linux (Ubuntu)
   Importance: Undecided => High

** Also affects: linux (Ubuntu Bionic)
   Importance: High
       Status: Incomplete

** Changed in: linux (Ubuntu Bionic)
       Status: Incomplete => Triaged