------- Comment From [email protected] 2017-02-06 03:07 EDT-------
(In reply to comment #71)
> Hi,
>
> Ran runltp on 17.04, and ended up with continuous call traces on console.
>
> ---uname output---
> Linux ltc-garri1 4.9.0-15-generic #16-Ubuntu SMP Fri Jan 20 15:28:49 UTC
> 2017 ppc64le ppc64le ppc64le GNU/Linux
>
> ---Call Trace---
> [332850.158228] Call Trace:
> [332850.158253] [c000001fcec4fb50] [c00000000001c3a0] __switch_to+0x2e0/0x4c0
> [332850.159375] [c000001fcec4fbb0] [c000000000b193a0] __schedule+0x300/0x990
> [332850.159872] [c000001fcec4fc90] [c000000000b19e88] _cond_resched+0x78/0xa0
> [332850.160475] [c000001fcec4fcc0] [d00000001c810278] torture_rwlock_write_lock_irq+0x30/0x60 [locktorture]
This one is weird, because torture_rwlock_write_lock_irq() acquires a lock with irqs disabled, so it should not get any chance to reschedule. And the asm code for /lib/modules/4.9.0-15-generic/kernel/kernel/locking/locktorture.ko:

0000000000000248 <torture_rwlock_write_lock_irq>:
 248:	f8 ff 4c e8 	ld      r2,-8(r12)
			248: R_PPC64_ENTRY	*ABS*
 24c:	14 62 42 7c 	add     r2,r2,r12
 250:	a6 02 08 7c 	mflr    r0
 254:	01 00 00 48 	bl      254 <torture_rwlock_write_lock_irq+0xc>
			254: R_PPC64_REL24	_mcount
 258:	a6 02 08 7c 	mflr    r0
 25c:	f8 ff e1 fb 	std     r31,-8(r1)
 260:	10 00 01 f8 	std     r0,16(r1)
 264:	d1 ff 21 f8 	stdu    r1,-48(r1)
 268:	00 00 22 3d 	addis   r9,r2,0
			268: R_PPC64_TOC16_HA	.toc
 26c:	00 00 e9 eb 	ld      r31,0(r9)
			26c: R_PPC64_TOC16_LO_DS	.toc
 270:	30 00 7f 38 	addi    r3,r31,48
 274:	01 00 00 48 	bl      274 <torture_rwlock_write_lock_irq+0x2c>
			274: R_PPC64_REL24	_raw_write_lock_irqsave
 278:	00 00 00 60 	nop
 27c:	18 00 3f e9 	ld      r9,24(r31)
 280:	40 00 69 f8 	std     r3,64(r9)
 284:	00 00 60 38 	li      r3,0
 288:	30 00 21 38 	addi    r1,r1,48

so torture_rwlock_write_lock_irq+0x30/0x60 is

 278:	00 00 00 60 	nop

right after the return of the call to _raw_write_lock_irqsave, which should be fixed up by the instruction that recovers the TOC, IIUC. How that ended up being a call to _cond_resched() is still unknown.

Besides, Pooja, I cannot reproduce this in a PowerKVM guest with 16 VCPUs. What's your setup for the machine that hit the problem? And how many times do you need to run the testcase in order to hit it?
Regards,
Boqun

> [332850.161157] [c000001fcec4fcf0] [d00000001c810988] lock_torture_writer+0xc0/0x270 [locktorture]
> [332850.170046] [c000001fcec4fd80] [c0000000000f53c0] kthread+0x110/0x130
> [332850.173108] [c000001fcec4fe30] [c00000000000c0e8] ret_from_kernel_thread+0x5c/0x74
> [332850.174169] Task dump for CPU 10:
> [332850.174202] lock_torture_wr R  running task        0 37019      2 0x00000804
> [332850.174734] Call Trace:
> [332850.175218] [c000001fcf3ab980] [c0000000000d0008] irq_exit+0xe8/0x120 (unreliable)
> [332850.175813] [c000001fcf3ab9a0] [c000000000024e10] timer_interrupt+0xa0/0xe0
> [332850.176438] --- interrupt: d00000001c6008c0 at finish_task_switch+0xb0/0x340
> [332850.176438]     LR = stutter_wait+0x8c/0x130 [torture]
> [332850.177950] [c000001fcf3ab9d0] [c000000000009514] decrementer_common+0x114/0x180 (unreliable)
> [332850.178517] --- interrupt: 901 at torture_shutdown_init+0x2a4/0x770 [torture]
> [332850.178517]     LR = stutter_wait+0x8c/0x130 [torture]
> [332850.179557] [c000001fcf3abcc0] [d00000001c810278] torture_rwlock_write_lock_irq+0x30/0x60 [locktorture] (unreliable)
> [332850.180675] [c000001fcf3abcf0] [d00000001c810988] lock_torture_writer+0xc0/0x270 [locktorture]
> [332850.181646] [c000001fcf3abd80] [c0000000000f53c0] kthread+0x110/0x130
> [332850.194048] [c000001fcf3abe30] [c00000000000c0e8] ret_from_kernel_thread+0x5c/0x74
> [332850.196658] Task dump for CPU 12:
> [332850.198840] lock_torture_wr R  running task        0 36921      2 0x00000804
> [332850.201377] Call Trace:
> [332850.201402] [c000001fc3747890] [c00000000016127c] hrtimer_interrupt+0x11c/0x2a0 (unreliable)
> [332850.206296] [c000001fc3747a60] [c00000000001c3a0] __switch_to+0x2e0/0x4c0
> [332850.208647] [c000001fc3747ac0] [c000000000b193a0] __schedule+0x300/0x990
> [332850.211007] [c000001fc3747b00] [c00000000015bc78] lock_timer_base+0xa8/0x100
> [332850.214641] [c000001fc3747b60] [c00000000015bde8] try_to_del_timer_sync+0x68/0xa0
> [332850.215236] [c000001fc3747ba0] [c00000000015be90] del_timer_sync+0x70/0x90
> [332850.215817] [c000001fc3747bd0] [c000000000b1de14] schedule_timeout+0x234/0x470
> [332850.216369] [c000001fc3747cc0] [d00000001c810278] torture_rwlock_write_lock_irq+0x30/0x60 [locktorture]
> [332850.217397] [c000001fc3747cf0] [d00000001c810988] lock_torture_writer+0xc0/0x270 [locktorture]
> [332850.218433] [c000001fc3747d80] [c0000000000f53c0] kthread+0x110/0x130
> [332850.219013] [c000001fc3747e30] [c00000000000c0e8] ret_from_kernel_thread+0x5c/0x74
>
> Thanks,
> Pooja

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1483343

Title:
  NMI watchdog: BUG: soft lockup errors when we execute lock_torture_wr
  tests

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1483343/+subscriptions

-- 
ubuntu-bugs mailing list
[email protected]
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
