SRU request submitted for Bionic:
https://lists.ubuntu.com/archives/kernel-team/2018-May/092158.html
** Description changed:
- Hi
+
+ == SRU Justification ==
+ This bug causes the machine to get stuck and bonding to not work when
+ the nvmet_rdma module is loaded.
+
+ Both of these commits are in mainline as of v4.17-rc1.
+
+ == Fixes ==
+ a3dd7d0022c3 ("nvmet-rdma: Don't flush system_wq by default during remove_one")
+ 9bad0404ecd7 ("nvme-rdma: Don't flush delete_wq by default during remove_one")
+
+ == Regression Potential ==
+ Low. Limited to nvme driver and tested by Mellanox.
+
+ == Test Case ==
+ A test kernel was built with these patches and tested by the original bug reporter.
+ The bug reporter states the test kernel resolved the bug.
+
+
+
+ == Original Bug Description ==
+
+ Hi
The machine gets stuck after unregistering the bonding interface when the
nvmet_rdma module is loaded.
scenario:
-
- # modprobe nvmet_rdma
- # modprobe -r bonding
- # modprobe bonding -v mode=1 miimon=100 fail_over_mac=0
- # ifdown eth4
- # ifdown eth5
- # ip addr add 15.209.12.173/8 dev bond0
- # ip link set bond0 up
- # echo +eth5 > /sys/class/net/bond0/bonding/slaves
- # echo +eth4 > /sys/class/net/bond0/bonding/slaves
- # echo -eth4 > /sys/class/net/bond0/bonding/slaves
- # echo -eth5 > /sys/class/net/bond0/bonding/slaves
- # echo -bond0 > /sys/class/net/bonding_masters
-
+ # modprobe nvmet_rdma
+ # modprobe -r bonding
+ # modprobe bonding -v mode=1 miimon=100 fail_over_mac=0
+ # ifdown eth4
+ # ifdown eth5
+ # ip addr add 15.209.12.173/8 dev bond0
+ # ip link set bond0 up
+ # echo +eth5 > /sys/class/net/bond0/bonding/slaves
+ # echo +eth4 > /sys/class/net/bond0/bonding/slaves
+ # echo -eth4 > /sys/class/net/bond0/bonding/slaves
+ # echo -eth5 > /sys/class/net/bond0/bonding/slaves
+ # echo -bond0 > /sys/class/net/bonding_masters
dmesg:
kernel: [78348.225556] bond0 (unregistering): Released all slaves
kernel: [78358.339631] unregister_netdevice: waiting for bond0 to become free. Usage count = 2
kernel: [78368.419621] unregister_netdevice: waiting for bond0 to become free. Usage count = 2
kernel: [78378.499615] unregister_netdevice: waiting for bond0 to become free. Usage count = 2
kernel: [78388.579625] unregister_netdevice: waiting for bond0 to become free. Usage count = 2
kernel: [78398.659613] unregister_netdevice: waiting for bond0 to become free. Usage count = 2
kernel: [78408.739655] unregister_netdevice: waiting for bond0 to become free. Usage count = 2
kernel: [78418.819634] unregister_netdevice: waiting for bond0 to become free. Usage count = 2
kernel: [78428.899642] unregister_netdevice: waiting for bond0 to become free. Usage count = 2
kernel: [78438.979614] unregister_netdevice: waiting for bond0 to become free. Usage count = 2
kernel: [78449.059619] unregister_netdevice: waiting for bond0 to become free. Usage count = 2
kernel: [78459.139626] unregister_netdevice: waiting for bond0 to become free. Usage count = 2
kernel: [78469.219623] unregister_netdevice: waiting for bond0 to become free. Usage count = 2
kernel: [78479.299619] unregister_netdevice: waiting for bond0 to become free. Usage count = 2
kernel: [78489.379620] unregister_netdevice: waiting for bond0 to become free. Usage count = 2
kernel: [78499.459623] unregister_netdevice: waiting for bond0 to become free. Usage count = 2
kernel: [78509.539631] unregister_netdevice: waiting for bond0 to become free. Usage count = 2
kernel: [78519.619629] unregister_netdevice: waiting for bond0 to become free. Usage count = 2
-
The following upstream commits fix this issue:
-
commit a3dd7d0022c347207ae931c753a6dc3e6e8fcbc1
Author: Max Gurtovoy <[email protected]>
Date: Wed Feb 28 13:12:38 2018 +0200
- nvmet-rdma: Don't flush system_wq by default during remove_one
+ nvmet-rdma: Don't flush system_wq by default during remove_one
- The .remove_one function is called for any ib_device removal.
- In case the removed device has no reference in our driver, there
- is no need to flush the system work queue.
+ The .remove_one function is called for any ib_device removal.
+ In case the removed device has no reference in our driver, there
+ is no need to flush the system work queue.
- Reviewed-by: Israel Rukshin <[email protected]>
- Signed-off-by: Max Gurtovoy <[email protected]>
- Reviewed-by: Sagi Grimberg <[email protected]>
- Signed-off-by: Keith Busch <[email protected]>
- Signed-off-by: Jens Axboe <[email protected]>
+ Reviewed-by: Israel Rukshin <[email protected]>
+ Signed-off-by: Max Gurtovoy <[email protected]>
+ Reviewed-by: Sagi Grimberg <[email protected]>
+ Signed-off-by: Keith Busch <[email protected]>
+ Signed-off-by: Jens Axboe <[email protected]>
diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c
index aa8068f..a59263d 100644
--- a/drivers/nvme/target/rdma.c
+++ b/drivers/nvme/target/rdma.c
@@ -1469,8 +1469,25 @@ static struct nvmet_fabrics_ops nvmet_rdma_ops = {
- static void nvmet_rdma_remove_one(struct ib_device *ib_device, void *client_data)
- {
- struct nvmet_rdma_queue *queue, *tmp;
+ static void nvmet_rdma_remove_one(struct ib_device *ib_device, void *client_data)
+ {
+ struct nvmet_rdma_queue *queue, *tmp;
+ struct nvmet_rdma_device *ndev;
+ bool found = false;
+
+ mutex_lock(&device_list_mutex);
+ list_for_each_entry(ndev, &device_list, entry) {
+ if (ndev->device == ib_device) {
+ found = true;
+ break;
+ }
+ }
+ mutex_unlock(&device_list_mutex);
+
+ if (!found)
+ return;
- /* Device is being removed, delete all queues using this device */
+ /*
+ * IB Device that is used by nvmet controllers is being removed,
+ * delete all queues using this device.
+ */
- mutex_lock(&nvmet_rdma_queue_mutex);
- list_for_each_entry_safe(queue, tmp, &nvmet_rdma_queue_list,
- queue_list) {
-
-
+ mutex_lock(&nvmet_rdma_queue_mutex);
+ list_for_each_entry_safe(queue, tmp, &nvmet_rdma_queue_list,
+ queue_list) {
commit 9bad0404ecd7594265cef04e176adeaa4ffbca4a
Author: Max Gurtovoy <[email protected]>
Date: Wed Feb 28 13:12:39 2018 +0200
- nvme-rdma: Don't flush delete_wq by default during remove_one
+ nvme-rdma: Don't flush delete_wq by default during remove_one
- The .remove_one function is called for any ib_device removal.
- In case the removed device has no reference in our driver, there
- is no need to flush the work queue.
+ The .remove_one function is called for any ib_device removal.
+ In case the removed device has no reference in our driver, there
+ is no need to flush the work queue.
- Reviewed-by: Israel Rukshin <[email protected]>
- Signed-off-by: Max Gurtovoy <[email protected]>
- Reviewed-by: Sagi Grimberg <[email protected]>
- Signed-off-by: Keith Busch <[email protected]>
- Signed-off-by: Jens Axboe <[email protected]>
+ Reviewed-by: Israel Rukshin <[email protected]>
+ Signed-off-by: Max Gurtovoy <[email protected]>
+ Reviewed-by: Sagi Grimberg <[email protected]>
+ Signed-off-by: Keith Busch <[email protected]>
+ Signed-off-by: Jens Axboe <[email protected]>
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index f5f460b..250b277 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -2024,6 +2024,20 @@ static struct nvmf_transport_ops nvme_rdma_transport = {
- static void nvme_rdma_remove_one(struct ib_device *ib_device, void *client_data)
- {
- struct nvme_rdma_ctrl *ctrl;
+ static void nvme_rdma_remove_one(struct ib_device *ib_device, void *client_data)
+ {
+ struct nvme_rdma_ctrl *ctrl;
+ struct nvme_rdma_device *ndev;
+ bool found = false;
+
+ mutex_lock(&device_list_mutex);
+ list_for_each_entry(ndev, &device_list, entry) {
+ if (ndev->dev == ib_device) {
+ found = true;
+ break;
+ }
+ }
+ mutex_unlock(&device_list_mutex);
+
+ if (!found)
+ return;
- /* Delete all controllers using this device */
- mutex_lock(&nvme_rdma_ctrl_mutex);
+ /* Delete all controllers using this device */
+ mutex_lock(&nvme_rdma_ctrl_mutex);
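Both commits above apply the same guard: look the removed device up in the driver's own device list first, and return early when it was never used, so the costly flush only runs for devices the driver actually references. The following is a minimal userspace sketch of that idea, not the kernel code. The names struct device, track_device, expensive_flush, flush_count and the fixed-size array are hypothetical stand-ins, and the locking that the real patches do with device_list_mutex is omitted because the sketch is single-threaded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct ib_device. */
struct device {
	int id;
};

/* Hypothetical stand-in for the driver's device_list. */
#define MAX_TRACKED 8
static const struct device *tracked[MAX_TRACKED];
static size_t tracked_count;

/* Counts how often the expensive flush ran, so the guard can be observed. */
static int flush_calls;

/* Record that the driver now holds a reference to this device. */
bool track_device(const struct device *dev)
{
	assert(dev != NULL);
	if (tracked_count == MAX_TRACKED)
		return false;
	tracked[tracked_count++] = dev;
	return true;
}

/* Stand-in for flushing a work queue; here it only counts calls. */
void expensive_flush(void)
{
	flush_calls++;
}

int flush_count(void)
{
	return flush_calls;
}

/*
 * Called for every device removal, whether or not this driver uses the
 * device. Returns true only if the device was tracked and cleanup ran.
 */
bool remove_one(const struct device *dev)
{
	size_t i;
	bool found = false;

	for (i = 0; i < tracked_count; i++) {
		if (tracked[i] == dev) {
			found = true;
			break;
		}
	}

	/* Not our device: nothing to delete, so skip the flush entirely. */
	if (!found)
		return false;

	/* Drop the entry by moving the last one into its slot. */
	tracked[i] = tracked[--tracked_count];
	expensive_flush();
	return true;
}
```

In this sketch, calling remove_one() on a device that was never passed to track_device() returns false and leaves flush_count() unchanged, which mirrors the early "if (!found) return;" added by the patches.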
--
https://bugs.launchpad.net/bugs/1764982
Title:
[bionic] machine stuck and bonding not working well when nvmet_rdma
module is loaded