[Kernel-packages] [Bug 1764982] Re: [bionic] machine stuck and bonding not working well when nvmet_rdma module is loaded

2018-06-14 Thread Launchpad Bug Tracker
This bug was fixed in the package linux - 4.15.0-23.25

---
linux (4.15.0-23.25) bionic; urgency=medium

  * linux: 4.15.0-23.25 -proposed tracker (LP: #1772927)

  * arm64 SDEI support needs trampoline code for KPTI (LP: #1768630)
- arm64: mmu: add the entry trampolines start/end section markers into
  sections.h
- arm64: sdei: Add trampoline code for remapping the kernel

  * Some PCIe errors not surfaced through rasdaemon (LP: #1769730)
- ACPI: APEI: handle PCIe AER errors in separate function
- ACPI: APEI: call into AER handling regardless of severity

  * qla2xxx: Fix page fault at kmem_cache_alloc_node() (LP: #1770003)
- scsi: qla2xxx: Fix session cleanup for N2N
- scsi: qla2xxx: Remove unused argument from 
qlt_schedule_sess_for_deletion()
- scsi: qla2xxx: Serialize session deletion by using work_lock
- scsi: qla2xxx: Serialize session free in qlt_free_session_done
- scsi: qla2xxx: Don't call dma_free_coherent with IRQ disabled.
- scsi: qla2xxx: Fix warning in qla2x00_async_iocb_timeout()
- scsi: qla2xxx: Prevent relogin trigger from sending too many commands
- scsi: qla2xxx: Fix double free bug after firmware timeout
- scsi: qla2xxx: Fixup locking for session deletion

  * Several hisi_sas bug fixes (LP: #1768974)
- scsi: hisi_sas: dt-bindings: add an property of signal attenuation
- scsi: hisi_sas: support the property of signal attenuation for v2 hw
- scsi: hisi_sas: fix the issue of link rate inconsistency
- scsi: hisi_sas: fix the issue of setting linkrate register
- scsi: hisi_sas: increase timer expire of internal abort task
- scsi: hisi_sas: remove unused variable hisi_sas_devices.running_req
- scsi: hisi_sas: fix return value of hisi_sas_task_prep()
- scsi: hisi_sas: Code cleanup and minor bug fixes

  * [bionic] machine stuck and bonding not working well when nvmet_rdma module
is loaded (LP: #1764982)
- nvmet-rdma: Don't flush system_wq by default during remove_one
- nvme-rdma: Don't flush delete_wq by default during remove_one

  * Warnings/hang during error handling of SATA disks on SAS controller
(LP: #1768971)
- scsi: libsas: defer ata device eh commands to libata

  * Hotplugging a SATA disk into a SAS controller may cause crash (LP: #1768948)
- ata: do not schedule hot plug if it is a sas host

  * ISST-LTE:pKVM:Ubuntu1804: rcu_sched self-detected stall on CPU follow by CPU
ATTEMPT TO RE-ENTER FIRMWARE! (LP: #1767927)
- powerpc/powernv: Handle unknown OPAL errors in opal_nvram_write()
- powerpc/64s: return more carefully from sreset NMI
- powerpc/64s: sreset panic if there is no debugger or crash dump handlers

  * fsnotify: Fix fsnotify_mark_connector race (LP: #1765564)
- fsnotify: Fix fsnotify_mark_connector race

  * Hang on network interface removal in Xen virtual machine (LP: #1771620)
- xen-netfront: Fix hang on device removal

  * HiSilicon HNS NIC names are truncated in /proc/interrupts (LP: #1765977)
- net: hns: Avoid action name truncation

  * Ubuntu 18.04 kernel crashed while in degraded mode (LP: #1770849)
- SAUCE: powerpc/perf: Fix memory allocation for core-imc based on
  num_possible_cpus()

  * Switch Build-Depends: transfig to fig2dev (LP: #1770770)
- [Config] update Build-Depends: transfig to fig2dev

  * smp_call_function_single/many core hangs with stop4 alone (LP: #1768898)
- cpufreq: powernv: Fix hardlockup due to synchronous smp_call in timer
  interrupt

  * Add d-i support for Huawei NICs (LP: #1767490)
- d-i: add hinic to nic-modules udeb

  * unregister_netdevice: waiting for eth0 to become free. Usage count = 5
(LP: #1746474)
- xfrm: reuse uncached_list to track xdsts

  * Include nfp driver in linux-modules (LP: #1768526)
- [Config] Add nfp.ko to generic inclusion list

  * Kernel panic on boot (m1.small in cn-north-1) (LP: #1771679)
- x86/xen: Reset VCPU0 info pointer after shared_info remap

  * CVE-2018-3639 (x86)
- x86/bugs: Fix the parameters alignment and missing void
- KVM: SVM: Move spec control call after restore of GS
- x86/speculation: Use synthetic bits for IBRS/IBPB/STIBP
- x86/cpufeatures: Disentangle MSR_SPEC_CTRL enumeration from IBRS
- x86/cpufeatures: Disentangle SSBD enumeration
- x86/cpufeatures: Add FEATURE_ZEN
- x86/speculation: Handle HT correctly on AMD
- x86/bugs, KVM: Extend speculation control for VIRT_SPEC_CTRL
- x86/speculation: Add virtualized speculative store bypass disable support
- x86/speculation: Rework speculative_store_bypass_update()
- x86/bugs: Unify x86_spec_ctrl_{set_guest,restore_host}
- x86/bugs: Expose x86_spec_ctrl_base directly
- x86/bugs: Remove x86_spec_ctrl_set()
- x86/bugs: Rework spec_ctrl base and mask logic
- x86/speculation, KVM: Implement support for VIRT_SPEC_CTRL/LS_CFG
- KVM: SVM: Implement VIRT_SPEC_CTRL support for SSBD
- x86/bugs: 

[Kernel-packages] [Bug 1764982] Re: [bionic] machine stuck and bonding not working well when nvmet_rdma module is loaded

2018-06-07 Thread Talat Batheesh
** Tags removed: verification-needed-bionic
** Tags added: verification-done-bionic

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1764982

Title:
  [bionic] machine stuck and bonding not working well when nvmet_rdma
  module is loaded

Status in linux package in Ubuntu:
  In Progress
Status in linux source package in Bionic:
  Fix Committed

Bug description:

  == SRU Justification ==
  This bug causes the machine to get stuck and bonding to not work when
  the nvmet_rdma module is loaded.

  Both of these commits are in mainline as of v4.17-rc1.

  == Fixes ==
  a3dd7d0022c3 ("nvmet-rdma: Don't flush system_wq by default during remove_one")
  9bad0404ecd7 ("nvme-rdma: Don't flush delete_wq by default during remove_one")

  == Regression Potential ==
  Low.  Limited to nvme driver and tested by Mellanox.

  == Test Case ==
  A test kernel was built with these patches and tested by the original bug 
reporter.
  The bug reporter states the test kernel resolved the bug.


  
  == Original Bug Description ==

  Hi,
  The machine gets stuck after unregistering the bonding interface while the
  nvmet_rdma module is loaded.

  scenario:

   # modprobe nvmet_rdma
   # modprobe -r bonding
   # modprobe bonding  -v mode=1 miimon=100 fail_over_mac=0
   # ifdown eth4
   # ifdown eth5
   # ip addr add 15.209.12.173/8 dev bond0
   # ip link set bond0 up
   # echo +eth5 > /sys/class/net/bond0/bonding/slaves
   # echo +eth4 > /sys/class/net/bond0/bonding/slaves
   # echo -eth4 > /sys/class/net/bond0/bonding/slaves
   # echo -eth5 > /sys/class/net/bond0/bonding/slaves
   # echo -bond0 > /sys/class/net/bonding_masters

  dmesg:

  kernel: [78348.225556] bond0 (unregistering): Released all slaves
  kernel: [78358.339631] unregister_netdevice: waiting for bond0 to become 
free. Usage count = 2
  kernel: [78368.419621] unregister_netdevice: waiting for bond0 to become 
free. Usage count = 2
  kernel: [78378.499615] unregister_netdevice: waiting for bond0 to become 
free. Usage count = 2
  kernel: [78388.579625] unregister_netdevice: waiting for bond0 to become 
free. Usage count = 2
  kernel: [78398.659613] unregister_netdevice: waiting for bond0 to become 
free. Usage count = 2
  kernel: [78408.739655] unregister_netdevice: waiting for bond0 to become 
free. Usage count = 2
  kernel: [78418.819634] unregister_netdevice: waiting for bond0 to become 
free. Usage count = 2
  kernel: [78428.899642] unregister_netdevice: waiting for bond0 to become 
free. Usage count = 2
  kernel: [78438.979614] unregister_netdevice: waiting for bond0 to become 
free. Usage count = 2
  kernel: [78449.059619] unregister_netdevice: waiting for bond0 to become 
free. Usage count = 2
  kernel: [78459.139626] unregister_netdevice: waiting for bond0 to become 
free. Usage count = 2
  kernel: [78469.219623] unregister_netdevice: waiting for bond0 to become 
free. Usage count = 2
  kernel: [78479.299619] unregister_netdevice: waiting for bond0 to become 
free. Usage count = 2
  kernel: [78489.379620] unregister_netdevice: waiting for bond0 to become 
free. Usage count = 2
  kernel: [78499.459623] unregister_netdevice: waiting for bond0 to become 
free. Usage count = 2
  kernel: [78509.539631] unregister_netdevice: waiting for bond0 to become 
free. Usage count = 2
  kernel: [78519.619629] unregister_netdevice: waiting for bond0 to become 
free. Usage count = 2

  The following upstream commits fix this issue:

  commit a3dd7d0022c347207ae931c753a6dc3e6e8fcbc1
  Author: Max Gurtovoy 
  Date:   Wed Feb 28 13:12:38 2018 +0200

  nvmet-rdma: Don't flush system_wq by default during remove_one

  The .remove_one function is called for any ib_device removal.
  In case the removed device has no reference in our driver, there
  is no need to flush the system work queue.

  Reviewed-by: Israel Rukshin 
  Signed-off-by: Max Gurtovoy 
  Reviewed-by: Sagi Grimberg 
  Signed-off-by: Keith Busch 
  Signed-off-by: Jens Axboe 

  diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c
  index aa8068f..a59263d 100644
  --- a/drivers/nvme/target/rdma.c
  +++ b/drivers/nvme/target/rdma.c
  @@ -1469,8 +1469,25 @@ static struct nvmet_fabrics_ops nvmet_rdma_ops = {
   static void nvmet_rdma_remove_one(struct ib_device *ib_device, void *client_data)
   {
  	struct nvmet_rdma_queue *queue, *tmp;
  +	struct nvmet_rdma_device *ndev;
  +	bool found = false;
  +
  +	mutex_lock(&device_list_mutex);
  +	list_for_each_entry(ndev, &device_list, entry) {
  +		if (ndev->device == ib_device) {
  +			found = true;
  +			break;
  +		}
  +	}
  +	mutex_unlock(&device_list_mutex);
  +
  +	if (!found)
  +		return;
  
  -	/* Device is being removed, delete all queues using this device */
  +	/*
  +	 * IB Device that is used by nvmet controllers is being removed,
  +	 * delete all queues using this device.
  +	 */
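
  For anyone re-verifying, the scenario above can be replayed as one script.
  This is a sketch only, using the same illustrative interface names and
  address as the original report:

     # Run as root on the kernel under test; eth4/eth5 stand in for two
     # RDMA-capable ports, as in the scenario above.
     modprobe nvmet_rdma
     modprobe -r bonding
     modprobe bonding mode=1 miimon=100 fail_over_mac=0
     ifdown eth4
     ifdown eth5
     ip addr add 15.209.12.173/8 dev bond0
     ip link set bond0 up
     echo +eth5 > /sys/class/net/bond0/bonding/slaves
     echo +eth4 > /sys/class/net/bond0/bonding/slaves
     echo -eth4 > /sys/class/net/bond0/bonding/slaves
     echo -eth5 > /sys/class/net/bond0/bonding/slaves
     echo -bond0 > /sys/class/net/bonding_masters
     # On a fixed kernel bond0 is released promptly; on an affected one
     # dmesg repeats "unregister_netdevice: waiting for bond0 ...":
     dmesg | grep unregister_netdevice || echo "bond0 released cleanly"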
 

[Kernel-packages] [Bug 1764982] Re: [bionic] machine stuck and bonding not working well when nvmet_rdma module is loaded

2018-06-07 Thread Talat Batheesh
Hi,

Sorry I missed that, will do it today.

yours,
Talat

[Kernel-packages] [Bug 1764982] Re: [bionic] machine stuck and bonding not working well when nvmet_rdma module is loaded

2018-06-06 Thread Stefan Bader
Any progress on the verification for this?

[Kernel-packages] [Bug 1764982] Re: [bionic] machine stuck and bonding not working well when nvmet_rdma module is loaded

2018-05-24 Thread Brad Figg
This bug is awaiting verification that the kernel in -proposed solves the
problem. Please test the kernel and update this bug with the results. If
the problem is solved, change the tag 'verification-needed-bionic' to
'verification-done-bionic'. If the problem still exists, change the tag
'verification-needed-bionic' to 'verification-failed-bionic'.

If verification is not done by 5 working days from today, this fix will
be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on
how to enable and use -proposed. Thank you!
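
For convenience, a minimal sketch of that procedure on bionic; the wiki
page above is authoritative, and the pocket line below simply follows the
standard archive layout:

  # Enable the bionic-proposed pocket (see the wiki link above).
  echo 'deb http://archive.ubuntu.com/ubuntu bionic-proposed main restricted universe multiverse' | \
      sudo tee /etc/apt/sources.list.d/ubuntu-proposed.list
  sudo apt-get update
  # Pull just the kernel from -proposed instead of upgrading everything:
  sudo apt-get install linux-generic/bionic-proposed
  sudo reboot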


** Tags added: verification-needed-bionic

[Kernel-packages] [Bug 1764982] Re: [bionic] machine stuck and bonding not working well when nvmet_rdma module is loaded

2018-05-23 Thread Stefan Bader
** Changed in: linux (Ubuntu Bionic)
   Status: In Progress => Fix Committed

[Kernel-packages] [Bug 1764982] Re: [bionic] machine stuck and bonding not working well when nvmet_rdma module is loaded

2018-05-07 Thread Joseph Salisbury
SRU request submitted for Bionic:
https://lists.ubuntu.com/archives/kernel-team/2018-May/092158.html

** Description changed:

- Hi 
+ 
+ == SRU Justification ==
+ This bug causes the machine to get stuck and bonding to not work when
+ the nvmet_rdma module is loaded.
+ 
+ Both of these commits are in mainline as of v4.17-rc1.
+ 
+ == Fixes ==
+ a3dd7d0022c3 ("nvmet-rdma: Don't flush system_wq by default during 
remove_one")
+ 9bad0404ecd7 ("nvme-rdma: Don't flush delete_wq by default during remove_one")
+ 
+ == Regression Potential ==
+ Low.  Limited to nvme driver and tested by Mellanox.
+ 
+ == Test Case ==
+ A test kernel was built with these patches and tested by the original bug 
reporter.
+ The bug reporter states the test kernel resolved the bug.
+ 
+ 
+ 
+ == Original Bug Description ==
+ 
+ Hi
  Machine stuck after unregistering bonding interface when the nvmet_rdma 
module is loading.
  
  scenario:
  
-   
-  # modprobe nvmet_rdma
-  # modprobe -r bonding
-  # modprobe bonding  -v mode=1 miimon=100 fail_over_mac=0
-  # ifdown eth4
-  # ifdown eth5
-  # ip addr add 15.209.12.173/8 dev bond0
-  # ip link set bond0 up
-  # echo +eth5 > /sys/class/net/bond0/bonding/slaves
-  # echo +eth4 > /sys/class/net/bond0/bonding/slaves
-  # echo -eth4 > /sys/class/net/bond0/bonding/slaves
-  # echo -eth5 > /sys/class/net/bond0/bonding/slaves
-  # echo -bond0 > /sys/class/net/bonding_masters
- 
+  # modprobe nvmet_rdma
+  # modprobe -r bonding
+  # modprobe bonding  -v mode=1 miimon=100 fail_over_mac=0
+  # ifdown eth4
+  # ifdown eth5
+  # ip addr add 15.209.12.173/8 dev bond0
+  # ip link set bond0 up
+  # echo +eth5 > /sys/class/net/bond0/bonding/slaves
+  # echo +eth4 > /sys/class/net/bond0/bonding/slaves
+  # echo -eth4 > /sys/class/net/bond0/bonding/slaves
+  # echo -eth5 > /sys/class/net/bond0/bonding/slaves
+  # echo -bond0 > /sys/class/net/bonding_masters
  
  dmesg:
  
  kernel: [78348.225556] bond0 (unregistering): Released all slaves
  kernel: [78358.339631] unregister_netdevice: waiting for bond0 to become 
free. Usage count = 2
  kernel: [78368.419621] unregister_netdevice: waiting for bond0 to become 
free. Usage count = 2
  kernel: [78378.499615] unregister_netdevice: waiting for bond0 to become 
free. Usage count = 2
  kernel: [78388.579625] unregister_netdevice: waiting for bond0 to become 
free. Usage count = 2
  kernel: [78398.659613] unregister_netdevice: waiting for bond0 to become 
free. Usage count = 2
  kernel: [78408.739655] unregister_netdevice: waiting for bond0 to become 
free. Usage count = 2
  kernel: [78418.819634] unregister_netdevice: waiting for bond0 to become 
free. Usage count = 2
  kernel: [78428.899642] unregister_netdevice: waiting for bond0 to become 
free. Usage count = 2
  kernel: [78438.979614] unregister_netdevice: waiting for bond0 to become 
free. Usage count = 2
  kernel: [78449.059619] unregister_netdevice: waiting for bond0 to become 
free. Usage count = 2
  kernel: [78459.139626] unregister_netdevice: waiting for bond0 to become 
free. Usage count = 2
  kernel: [78469.219623] unregister_netdevice: waiting for bond0 to become 
free. Usage count = 2
  kernel: [78479.299619] unregister_netdevice: waiting for bond0 to become 
free. Usage count = 2
  kernel: [78489.379620] unregister_netdevice: waiting for bond0 to become 
free. Usage count = 2
  kernel: [78499.459623] unregister_netdevice: waiting for bond0 to become 
free. Usage count = 2
  kernel: [78509.539631] unregister_netdevice: waiting for bond0 to become 
free. Usage count = 2
  kernel: [78519.619629] unregister_netdevice: waiting for bond0 to become 
free. Usage count = 2
  
- 
  The following upstream commits that fix this issue
- 
  
  commit a3dd7d0022c347207ae931c753a6dc3e6e8fcbc1
  Author: Max Gurtovoy 
  Date:   Wed Feb 28 13:12:38 2018 +0200
  
- nvmet-rdma: Don't flush system_wq by default during remove_one
+ nvmet-rdma: Don't flush system_wq by default during remove_one
  
- The .remove_one function is called for any ib_device removal.
- In case the removed device has no reference in our driver, there
- is no need to flush the system work queue.
+ The .remove_one function is called for any ib_device removal.
+ In case the removed device has no reference in our driver, there
+ is no need to flush the system work queue.
  
- Reviewed-by: Israel Rukshin 
- Signed-off-by: Max Gurtovoy 
- Reviewed-by: Sagi Grimberg 
- Signed-off-by: Keith Busch 
- Signed-off-by: Jens Axboe 
+ Reviewed-by: Israel Rukshin 
+ Signed-off-by: Max Gurtovoy 
+ Reviewed-by: Sagi Grimberg 
+ Signed-off-by: Keith Busch 
+ Signed-off-by: Jens Axboe 
  
  diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c
  index 

[Kernel-packages] [Bug 1764982] Re: [bionic] machine stuck and bonding not working well when nvmet_rdma module is loaded

2018-05-01 Thread Talat Batheesh
Thank you for the build.
I tested with those patches and it works.

Thanks,
Talat

[Kernel-packages] [Bug 1764982] Re: [bionic] machine stuck and bonding not working well when nvmet_rdma module is loaded

2018-04-23 Thread Joseph Salisbury
I built a test kernel with commits 9bad0404 and a3dd7d002.  The test kernel can 
be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1764982

Can you test this kernel and see if it resolves this bug?

Note, to test this kernel, you need to install both the linux-image and
linux-image-extra .deb packages.

Thanks in advance!
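
For reference, a sketch of the install step. The filenames below are
placeholders; substitute the actual .deb files published at the URL above:

  # Placeholders: use the real filenames listed at the URL above.
  wget http://kernel.ubuntu.com/~jsalisbury/lp1764982/linux-image-VERSION-generic_amd64.deb
  wget http://kernel.ubuntu.com/~jsalisbury/lp1764982/linux-image-extra-VERSION-generic_amd64.deb
  # Install both together so the full module set (including nvmet_rdma,
  # which ships in the -extra package) is available after reboot:
  sudo dpkg -i linux-image*.deb
  sudo reboot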

[Kernel-packages] [Bug 1764982] Re: [bionic] machine stuck and bonding not working well when nvmet_rdma module is loaded

2018-04-23 Thread Joseph Salisbury
** Changed in: linux (Ubuntu Bionic)
 Assignee: (unassigned) => Joseph Salisbury (jsalisbury)

** Changed in: linux (Ubuntu Bionic)
   Status: Triaged => In Progress

[Kernel-packages] [Bug 1764982] Re: [bionic] machine stuck and bonding not working well when nvmet_rdma module is loaded

2018-04-20 Thread Joseph Salisbury
** Changed in: linux (Ubuntu)
   Importance: Undecided => High

** Also affects: linux (Ubuntu Bionic)
   Importance: High
   Status: Incomplete

** Changed in: linux (Ubuntu Bionic)
   Status: Incomplete => Triaged
