[Kernel-packages] [Bug 1620317] xmon debug session

2016-09-28 Thread bugproxy
--- Comment on attachment From bjki...@us.ibm.com 2016-09-23 14:47 EDT---


Looking at the blocked tasks, this looks like it could be an issue fixed 
recently upstream:

http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/kernel/sched/core.c?id=135e8c9250dd5c8c9aae5984fde6f230d0cbfeaf

Gabriel is building a kernel that has this fix added and we'll kick off
a weekend run.

** Attachment added: "xmon debug session"
   https://bugs.launchpad.net/bugs/1620317/+attachment/4750567/+files/xmon.log

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1620317

Title:
  ISST-LTE:pNV: system ben is hung during ST (nvme)

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Xenial:
  Fix Committed
Status in linux source package in Yakkety:
  Fix Released

Bug description:
  When running I/O-intensive tasks together with CPU addition/removal,
  the block layer may hang, stalling the entire machine.

  The backtrace below is one of the symptoms:

  [12747.49] ---[ end trace b4d8d720952460b5 ]---
  [12747.126885] Trying to free IRQ 357 from IRQ context!
  [12747.146930] [ cut here ]
  [12747.166674] WARNING: at /build/linux-iLHNl3/linux-4.4.0/kernel/irq/manage.c:1438
  [12747.184069] Modules linked in: minix nls_iso8859_1 rpcsec_gss_krb5 
auth_rpcgss nfsv4 nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) 
iw_cm(OE) configfs ib_ipoib(OE) ib_cm(OE) ib_uverbs(OE) ib_umad(OE) mlx5_ib(OE) 
mlx4_ib(OE) ib_sa(OE) ib_mad(OE) ib_core(OE) ib_addr(OE) mlx4_en(OE) 
mlx4_core(OE) binfmt_misc xfs joydev input_leds mac_hid ofpart cmdlinepart 
powernv_flash ipmi_powernv mtd ipmi_msghandler at24 opal_prd powernv_rng 
ibmpowernv uio_pdrv_genirq uio sunrpc knem(OE) autofs4 btrfs xor raid6_pq 
hid_generic usbhid hid uas usb_storage nouveau ast bnx2x i2c_algo_bit ttm 
drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops mlx5_core(OE) ahci 
drm mdio libcrc32c mlx_compat(OE) libahci vxlan nvme ip6_udp_tunnel udp_tunnel
  [12747.349013] CPU: 80 PID: 0 Comm: swapper/80 Tainted: GW  OEL  4.4.0-21-generic #37-Ubuntu
  [12747.369046] task: c00f1fab89b0 ti: c00f1fb6c000 task.ti: c00f1fb6c000
  [12747.404848] NIP: c0131888 LR: c0131884 CTR: 300303f0
  [12747.808333] REGS: c00f1fb6e550 TRAP: 0700   Tainted: GW  OEL   (4.4.0-21-generic)
  [12747.867658] MSR: 900100029033   CR: 2802  XER: 2000
  [12747.884783] CFAR: c0aea8f4 SOFTE: 1 
  GPR00: c0131884 c00f1fb6e7d0 c15b4200 0028 
  GPR04: c00f2a409c50 c00f2a41b4e0 000f2948 33da 
  GPR08: 0007 c0f8b27c 000f2948 90011003 
  GPR12: 2200 c7b6f800 c00f2a40a938 0100 
  GPR16: c00f1148 3a98   
  GPR20:  d9521008 d95146a0 f000 
  GPR24: c4a19ef0  0003 007d 
  GPR28: 0165 c00eefeb1800 c00eef830600 0165 
  [12748.243270] NIP [c0131888] __free_irq+0x238/0x370
  [12748.254089] LR [c0131884] __free_irq+0x234/0x370
  [12748.269738] Call Trace:
  [12748.286740] [c00f1fb6e7d0] [c0131884] __free_irq+0x234/0x370 (unreliable)
  [12748.289687] [c00f1fb6e860] [c0131af8] free_irq+0x88/0xb0
  [12748.304594] [c00f1fb6e890] [d9514528] nvme_suspend_queue+0xc8/0x150 [nvme]
  [12748.333825] [c00f1fb6e8c0] [d951681c] nvme_dev_disable+0x3fc/0x400 [nvme]
  [12748.340913] [c00f1fb6e9a0] [d9516ae4] nvme_timeout+0xe4/0x260 [nvme]
  [12748.357136] [c00f1fb6ea60] [c0548a34] blk_mq_rq_timed_out+0x64/0x110
  [12748.383939] [c00f1fb6ead0] [c054c540] bt_for_each+0x160/0x170
  [12748.399292] [c00f1fb6eb40] [c054d4e8] blk_mq_queue_tag_busy_iter+0x78/0x110
  [12748.402665] [c00f1fb6eb90] [c0547358] blk_mq_rq_timer+0x48/0x140
  [12748.438649] [c00f1fb6ebd0] [c014a13c] call_timer_fn+0x5c/0x1c0
  [12748.468126] [c00f1fb6ec60] [c014a5fc] run_timer_softirq+0x31c/0x3f0
  [12748.483367] [c00f1fb6ed30] [c00beb78] __do_softirq+0x188/0x3e0
  [12748.498378] [c00f1fb6ee20] [c00bf048] irq_exit+0xc8/0x100
  [12748.501048] [c00f1fb6ee40] [c001f954] timer_interrupt+0xa4/0xe0
  [12748.516377] [c00f1fb6ee70] [c0002714] decrementer_common+0x114/0x180
  [12748.547282] --- interrupt: 901 at arch_local_irq_restore+0x74/0x90
  [12748.547282] LR = arch_local_irq_restore+0x74/0x90
  [12748.574141] [c00f1fb6f160] [0001] 0x1 (unreliable)
  [12748.592405] [c00f1fb6f180] [c0aedc3c] dump_stack+0xd0/0xf0
  [12748.596461] 
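
  The call chain in the trace above can be pulled out mechanically when
  triaging similar logs. The following is only a rough sketch in Python,
  not part of the original report: the names FRAME_RE, parse_call_trace,
  freed_in_softirq and SAMPLE are hypothetical, and it assumes every frame
  carries a dmesg timestamp followed by two bracketed hex values, as in
  this log. The sample frames are copied from the trace, including the
  mail line wrapping.

```python
import re

# One frame: "[timestamp] [stack ptr] [address] symbol+0xoff/0xsize",
# optionally followed by a module name in brackets such as "[nvme]".
# Assumption: every frame is prefixed by a dmesg timestamp, as in this log.
FRAME_RE = re.compile(
    r"\[\s*\d+\.\d+\]\s+\[[0-9a-f]+\]\s+\[[0-9a-f]+\]\s+"
    r"(?P<symbol>[A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def parse_call_trace(text):
    """Return (symbol, module) pairs, innermost frame first."""
    # Mail clients may wrap a frame across two lines, so collapse all
    # whitespace into single spaces before matching.
    flat = " ".join(text.split())
    return [(m.group("symbol"), m.group("module"))
            for m in FRAME_RE.finditer(flat)]


def freed_in_softirq(frames):
    """True if free_irq appears as a callee of __do_softirq."""
    symbols = [symbol for symbol, _ in frames]
    if "free_irq" not in symbols or "__do_softirq" not in symbols:
        return False
    # Frames are listed innermost first, so a callee has the lower index.
    return symbols.index("free_irq") < symbols.index("__do_softirq")


# A few frames copied from the trace above, with the mail wrapping kept.
SAMPLE = """
[12748.286740] [c00f1fb6e7d0] [c0131884] __free_irq+0x234/0x370
(unreliable)
[12748.289687] [c00f1fb6e860] [c0131af8] free_irq+0x88/0xb0
[12748.304594] [c00f1fb6e890] [d9514528]
nvme_suspend_queue+0xc8/0x150 [nvme]
[12748.483367] [c00f1fb6ed30] [c00beb78] __do_softirq+0x188/0x3e0
"""

if __name__ == "__main__":
    frames = parse_call_trace(SAMPLE)
    for symbol, module in frames:
        print(symbol, f"[{module}]" if module else "")
    print("free_irq reached from softirq:", freed_in_softirq(frames))
```

  On the frames shown, the check agrees with the "Trying to free IRQ 357
  from IRQ context!" warning: free_irq is reached through nvme_timeout
  from the timer softirq.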

[Kernel-packages] [Bug 1620317] xmon debug session

2016-09-23 Thread bugproxy
--- Comment on attachment From bjki...@us.ibm.com 2016-09-23 14:47 EDT---


Looking at the blocked tasks, this looks like it could be an issue fixed 
recently upstream:

http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/kernel/sched/core.c?id=135e8c9250dd5c8c9aae5984fde6f230d0cbfeaf

Gabriel is building a kernel that has this fix added and we'll kick off
a weekend run.

** Attachment added: "xmon debug session"
   https://bugs.launchpad.net/bugs/1620317/+attachment/4747315/+files/xmon.log
