------- Comment From [email protected] 2017-02-06 03:07 EDT-------
(In reply to comment #71)
> Hi,
>
> Ran runltp on 17.04, and ended up with continuous call traces on console.
>
> ---uname output---
> Linux ltc-garri1 4.9.0-15-generic #16-Ubuntu SMP Fri Jan 20 15:28:49 UTC
> 2017 ppc64le ppc64le ppc64le GNU/Linux
>
> ---Call Trace---
> [332850.158228] Call Trace:
> [332850.158253] [c000001fcec4fb50] [c00000000001c3a0] __switch_to+0x2e0/0x4c0
> [332850.159375] [c000001fcec4fbb0] [c000000000b193a0] __schedule+0x300/0x990
> [332850.159872] [c000001fcec4fc90] [c000000000b19e88] _cond_resched+0x78/0xa0
> [332850.160475] [c000001fcec4fcc0] [d00000001c810278] torture_rwlock_write_lock_irq+0x30/0x60 [locktorture]
This one is weird, because torture_rwlock_write_lock_irq() acquires a lock with irqs disabled, so it should not get any chance to reschedule. And the asm code for /lib/modules/4.9.0-15-generic/kernel/kernel/locking/locktorture.ko:

0000000000000248 <torture_rwlock_write_lock_irq>:
 248:	f8 ff 4c e8 	ld      r2,-8(r12)
			248: R_PPC64_ENTRY	*ABS*
 24c:	14 62 42 7c 	add     r2,r2,r12
 250:	a6 02 08 7c 	mflr    r0
 254:	01 00 00 48 	bl      254 <torture_rwlock_write_lock_irq+0xc>
			254: R_PPC64_REL24	_mcount
 258:	a6 02 08 7c 	mflr    r0
 25c:	f8 ff e1 fb 	std     r31,-8(r1)
 260:	10 00 01 f8 	std     r0,16(r1)
 264:	d1 ff 21 f8 	stdu    r1,-48(r1)
 268:	00 00 22 3d 	addis   r9,r2,0
			268: R_PPC64_TOC16_HA	.toc
 26c:	00 00 e9 eb 	ld      r31,0(r9)
			26c: R_PPC64_TOC16_LO_DS	.toc
 270:	30 00 7f 38 	addi    r3,r31,48
 274:	01 00 00 48 	bl      274 <torture_rwlock_write_lock_irq+0x2c>
			274: R_PPC64_REL24	_raw_write_lock_irqsave
 278:	00 00 00 60 	nop
 27c:	18 00 3f e9 	ld      r9,24(r31)
 280:	40 00 69 f8 	std     r3,64(r9)
 284:	00 00 60 38 	li      r3,0
 288:	30 00 21 38 	addi    r1,r1,48

so torture_rwlock_write_lock_irq+0x30/0x60 is

 278:	00 00 00 60 	nop

right after the return of the call to _raw_write_lock_irqsave, which should be fixed up by the instruction that recovers the TOC, IIUC. How that ended up being a call to _cond_resched() is still unknown.

Besides, Pooja, I cannot reproduce this in a PowerKVM guest with 16 VCPUs. What's your setup for the machine that hit the problem? And how many times do you need to run the testcase in order to hit it?
Regards,
Boqun

> [332850.161157] [c000001fcec4fcf0] [d00000001c810988] lock_torture_writer+0xc0/0x270 [locktorture]
> [332850.170046] [c000001fcec4fd80] [c0000000000f53c0] kthread+0x110/0x130
> [332850.173108] [c000001fcec4fe30] [c00000000000c0e8] ret_from_kernel_thread+0x5c/0x74
> [332850.174169] Task dump for CPU 10:
> [332850.174202] lock_torture_wr R  running task        0 37019      2 0x00000804
> [332850.174734] Call Trace:
> [332850.175218] [c000001fcf3ab980] [c0000000000d0008] irq_exit+0xe8/0x120 (unreliable)
> [332850.175813] [c000001fcf3ab9a0] [c000000000024e10] timer_interrupt+0xa0/0xe0
> [332850.176438] --- interrupt: d00000001c6008c0 at finish_task_switch+0xb0/0x340
> [332850.176438]     LR = stutter_wait+0x8c/0x130 [torture]
> [332850.177950] [c000001fcf3ab9d0] [c000000000009514] decrementer_common+0x114/0x180 (unreliable)
> [332850.178517] --- interrupt: 901 at torture_shutdown_init+0x2a4/0x770 [torture]
> [332850.178517]     LR = stutter_wait+0x8c/0x130 [torture]
> [332850.179557] [c000001fcf3abcc0] [d00000001c810278] torture_rwlock_write_lock_irq+0x30/0x60 [locktorture] (unreliable)
> [332850.180675] [c000001fcf3abcf0] [d00000001c810988] lock_torture_writer+0xc0/0x270 [locktorture]
> [332850.181646] [c000001fcf3abd80] [c0000000000f53c0] kthread+0x110/0x130
> [332850.194048] [c000001fcf3abe30] [c00000000000c0e8] ret_from_kernel_thread+0x5c/0x74
> [332850.196658] Task dump for CPU 12:
> [332850.198840] lock_torture_wr R  running task        0 36921      2 0x00000804
> [332850.201377] Call Trace:
> [332850.201402] [c000001fc3747890] [c00000000016127c] hrtimer_interrupt+0x11c/0x2a0 (unreliable)
> [332850.206296] [c000001fc3747a60] [c00000000001c3a0] __switch_to+0x2e0/0x4c0
> [332850.208647] [c000001fc3747ac0] [c000000000b193a0] __schedule+0x300/0x990
> [332850.211007] [c000001fc3747b00] [c00000000015bc78] lock_timer_base+0xa8/0x100
> [332850.214641] [c000001fc3747b60] [c00000000015bde8] try_to_del_timer_sync+0x68/0xa0
> [332850.215236] [c000001fc3747ba0] [c00000000015be90] del_timer_sync+0x70/0x90
> [332850.215817] [c000001fc3747bd0] [c000000000b1de14] schedule_timeout+0x234/0x470
> [332850.216369] [c000001fc3747cc0] [d00000001c810278] torture_rwlock_write_lock_irq+0x30/0x60 [locktorture]
> [332850.217397] [c000001fc3747cf0] [d00000001c810988] lock_torture_writer+0xc0/0x270 [locktorture]
> [332850.218433] [c000001fc3747d80] [c0000000000f53c0] kthread+0x110/0x130
> [332850.219013] [c000001fc3747e30] [c00000000000c0e8] ret_from_kernel_thread+0x5c/0x74
>
> Thanks,
> Pooja

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1483343

Title:
  NMI watchdog: BUG: soft lockup errors when we execute lock_torture_wr
  tests

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1483343/+subscriptions

-- 
ubuntu-bugs mailing list
[email protected]
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
