Re: lockdep badness
Hi, On Thu, 24 Jul 2008 14:23:00 -0500 Nathan Lynch [EMAIL PROTECTED] wrote: I'm seeing warnings from the lockdep code itself in recent kernels on a Power6 blade (v2.6.26 and benh's -next branch). Something to do with powerpc's lazy interrupt-disabling, perhaps? A couple of stack traces below, the first is from benh's tree, the second is from 2.6.26. The lockdep self-tests all pass at boot. RTAS daemon started RTAS: event: 295, Type: Dump Notification Event, Severity: 2 [ cut here ] Badness at kernel/lockdep.c:2719 snip I'm observing nearly the same thing (minus the ipr related badness) here with Ben's powerpc.git (master) on a JS22 blade: Freeing initrd memory: 2676k freed RTAS daemon started RTAS: event: 94, Type: Platform Information Event, Severity: 1 [ cut here ] Badness at kernel/lockdep.c:2079 NIP: c00846dc LR: c00846c0 CTR: REGS: c001e353f9c0 TRAP: 0700 Not tainted (2.6.26) MSR: 80029032 EE,ME,IR,DR CR: 2442 XER: 2010 TASK = c000efa58640[115] 'rtasd' THREAD: c001e353c000 CPU: 3 GPR00: c001e353fc40 c04efe50 0001 GPR04: 0001 c003cc94 0001 636820202d203230 GPR08: 34653030 c09627a8 c000efa58e88 0001 GPR12: 3465303038200d0a c0525900 c0395ba0 40c0 GPR16: c03943c8 0029d000 0101c920 GPR20: 0001 0744 0001 GPR24: c0538cf0 c001e3e74000 c041b48c c04c0060 GPR28: c00847a4 c000efa58640 c04b18d0 c001e353fc40 NIP [c00846dc] .trace_hardirqs_on_caller+0xc4/0x170 LR [c00846c0] .trace_hardirqs_on_caller+0xa8/0x170 Call Trace: [c001e353fcd0] [c00847a4] .trace_hardirqs_on+0x1c/0x30 [c001e353fd50] [c02df554] ._spin_unlock_irqrestore+0x68/0xb8 [c001e353fde0] [c003cc94] .pSeries_log_error+0x3a4/0x400 [c001e353fef0] [c003cf28] .rtasd+0x98/0x100 [c001e353ff90] [c00295f8] .kernel_thread+0x4c/0x68 Instruction dump: 901d07ec 880d01ca 2fa0 41be002c 481300a1 6000 2fa3 419e00a0 e93e80f8 8009 2f80 409e0090 0fe0 4888 e92d01a0 8009082c Installing knfsd (copyright (C) 1996 [EMAIL PROTECTED]). msgmni has been set to 15071 Block layer SCSI generic (bsg) driver version 0.4 loaded (major 254) ... Sebastien. ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: lockdep badness
Nathan Lynch wrote: A couple of stack traces below, the first is from benh's tree, the second is from 2.6.26. The lockdep self-tests all pass at boot. Sorry, should have pointed out the code that is warning more specifically. RTAS daemon started RTAS: event: 295, Type: Dump Notification Event, Severity: 2 [ cut here ] Badness at kernel/lockdep.c:2719 check_flags(): if (irqs_disabled_flags(flags)) { if (DEBUG_LOCKS_WARN_ON(current-hardirqs_enabled)) { printk(possible reason: unannotated irqs-off.\n); } } else { if (DEBUG_LOCKS_WARN_ON(!current-hardirqs_enabled)) { printk(possible reason: unannotated irqs-on.\n); } } NIP: c00b06dc LR: c00b06c0 CTR: REGS: c000e787b9a0 TRAP: 0700 Not tainted (2.6.26) MSR: 80029032 EE,ME,IR,DR CR: 2442 XER: 2000 TASK = c000e91a6100[428] 'rtasd' THREAD: c000e7878000 CPU: 1 GPR00: c000e787bc20 c0a71270 0001 GPR04: 0001 c004d648 0401 0401 GPR08: c0f1f858 0020 0001 GPR12: 0001 c0aac600 002146f4 00214744 GPR16: 4204 029015b0 002a3c00 c08cc988 GPR20: 02901598 0124 0002 GPR24: 0001 0008 0001 c09fdf00 GPR28: c004d648 c0943e98 c09d9b40 c000e787bc20 NIP [c00b06dc] .check_flags+0x9c/0x174 LR [c00b06c0] .check_flags+0x80/0x174 Call Trace: [c000e787bc20] [c000e787bc70] 0xc000e787bc70 (unreliable) [c000e787bca0] [c00b5ac8] .lock_release+0x7c/0x208 [c000e787bd50] [c05e12c0] ._spin_unlock_irqrestore+0x34/0x94 [c000e787bde0] [c004d648] .pSeries_log_error+0x380/0x3f0 [c000e787bef0] [c004d8e4] .rtasd+0x98/0x100 [c000e787bf90] [c0029d20] .kernel_thread+0x4c/0x68 Instruction dump: e92d01b0 80090904 2f80 40be002c 481ece4d 6000 2fa3 419e00c4 e93e80c0 8009 2f80 409e00b4 0fe0 48ac 78290464 80090014 possible reason: unannotated irqs-on. irq event stamp: 20 hardirqs last enabled at (19): [c00b405c] .trace_hardirqs_on+0x1c/0x3 0 hardirqs last disabled at (20): [c00b0d80] .trace_hardirqs_off+0x1c/0x 30 softirqs last enabled at (0): [c008293c] .copy_process+0x3c8/0x1040 softirqs last disabled at (0): [] 0x0 === ipr: IBM Power RAID SCSI Device Driver version: 2.4.1 (April 24, 2007) ipr :00:01.0: Found IOA with IRQ: 289 ipr :00:01.0: Starting IOA initialization sequence. ipr :00:01.0: Adapter firmware version: 02200023 ipr :00:01.0: IOA initialized. [ cut here ] Badness at kernel/lockdep.c:2037 trace_hardirqs_on(): if (DEBUG_LOCKS_WARN_ON(!irqs_disabled())) return; if (DEBUG_LOCKS_WARN_ON(current-hardirq_context)) return; NIP: c00b29fc LR: c00b29e0 CTR: REGS: cfffb890 TRAP: 0700 Not tainted (2.6.26) MSR: 80021032 ME,IR,DR CR: 2822 XER: 0004 TASK = c0996730[0] 'swapper' THREAD: c0a6 CPU: 0 GPR00: cfffbb10 c0a61ad8 0001 GPR04: 00035112 c0426708 0970 GPR08: c0ec6ad0 0001 0001 GPR12: 00043bd8 c0a9d400 002146f4 00214744 GPR16: 4204 028f37a8 002a3c00 c08b3790 GPR20: 028f3790 01a3f920 c08b37a8 c000e6252000 GPR24: c000e6454a48 00100100 00200200 c05d8824 GPR28: c0afba18 c0996730 c09cb7f0 cfffbb10 NIP [c00b29fc] .trace_hardirqs_on+0x120/0x1b0 LR [c00b29e0] .trace_hardirqs_on+0x104/0x1b0 Call Trace: [cfffbb10] [cfffbbb0] 0xcfffbbb0 (unreliable) [cfffbbb0] [c05d8824] ._spin_unlock_irq+0x40/0x68 [cfffbc40] [c0426708] .ipr_ioa_reset_done+0x218/0x2ac [cfffbd00] [c041bdb8] .ipr_reset_ioa_job+0xc8/0xf4 [cfffbd90] [c0424ffc] .ipr_isr+0x280/0x628 [cfffbe50] [c00ccc70] .handle_IRQ_event+0x58/0xd4 [cfffbef0] [c00cef4c] .handle_fasteoi_irq+0x128/0x1c8 [cfffbf90] [c0029918] .call_handle_irq+0x1c/0x2c [c0a63a20] [c000d9cc] .do_IRQ+0x138/0x248 [c0a63ad0] [c0004ca8] hardware_interrupt_entry+0x28/0x2c --- Exception: 501 at .raw_local_irq_restore+0x8c/0xa4
Re: lockdep badness
On Thu, 2008-07-24 at 14:23 -0500, Nathan Lynch wrote: I'm seeing warnings from the lockdep code itself in recent kernels on a Power6 blade (v2.6.26 and benh's -next branch). Something to do with powerpc's lazy interrupt-disabling, perhaps? A couple of stack traces below, the first is from benh's tree, the second is from 2.6.26. The lockdep self-tests all pass at boot. Interesting. [c000e787bc20] [c000e787bc70] 0xc000e787bc70 (unreliable) [c000e787bca0] [c00b5ac8] .lock_release+0x7c/0x208 [c000e787bd50] [c05e12c0] ._spin_unlock_irqrestore+0x34/0x94 [c000e787bde0] [c004d648] .pSeries_log_error+0x380/0x3f0 [c000e787bef0] [c004d8e4] .rtasd+0x98/0x100 [c000e787bf90] [c0029d20] .kernel_thread+0x4c/0x68 Instruction dump: This one is one I haven't managed to reproduce and didn't quite find out what could be causing it, but it was already reported by Badari (and in fact is referenced as a regression in Rafael list). Call Trace: [cfffbb10] [cfffbbb0] 0xcfffbbb0 (unreliable) [cfffbbb0] [c05d8824] ._spin_unlock_irq+0x40/0x68 [cfffbc40] [c0426708] .ipr_ioa_reset_done+0x218/0x2ac [cfffbd00] [c041bdb8] .ipr_reset_ioa_job+0xc8/0xf4 [cfffbd90] [c0424ffc] .ipr_isr+0x280/0x628 [cfffbe50] [c00ccc70] .handle_IRQ_event+0x58/0xd4 [cfffbef0] [c00cef4c] .handle_fasteoi_irq+0x128/0x1c8 [cfffbf90] [c0029918] .call_handle_irq+0x1c/0x2c [c0a63a20] [c000d9cc] .do_IRQ+0x138/0x248 [c0a63ad0] [c0004ca8] hardware_interrupt_entry+0x28/0x2c --- Exception: 501 at .raw_local_irq_restore+0x8c/0xa4 LR = .cpu_idle+0x140/0x210 [c0a63e60] [c05da07c] .rest_init+0x7c/0x98 [c0a63ee0] [c0866f10] .start_kernel+0x488/0x4b0 [c0a63f90] [c0008584] .start_here_common+0x4c/0xc8 Instruction dump: This one is new to me. I will have a look. What machine is this ? I suspect the error is to do spin_lock/unlock_irq rather than save/restore variants at IRQ time, which would be an IPR bug... or rather something legal that Ingo decided shouldn't be anymore :-) Cheers, Ben. ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: lockdep badness
On Thu, 2008-07-24 at 14:38 -0500, Nathan Lynch wrote: Nathan Lynch wrote: A couple of stack traces below, the first is from benh's tree, the second is from 2.6.26. The lockdep self-tests all pass at boot. Sorry, should have pointed out the code that is warning more specifically. RTAS daemon started RTAS: event: 295, Type: Dump Notification Event, Severity: 2 [ cut here ] Badness at kernel/lockdep.c:2719 check_flags(): if (irqs_disabled_flags(flags)) { if (DEBUG_LOCKS_WARN_ON(current-hardirqs_enabled)) { printk(possible reason: unannotated irqs-off.\n); } } else { if (DEBUG_LOCKS_WARN_ON(!current-hardirqs_enabled)) { printk(possible reason: unannotated irqs-on.\n); } } Yes, it looks like at some point the IRQ state gets out of sync, and I haven't found out where yet. Having a repro case would help, what is the setup of your machine ? I seems to be related to having an RTAS event at boot. ipr: IBM Power RAID SCSI Device Driver version: 2.4.1 (April 24, 2007) ipr :00:01.0: Found IOA with IRQ: 289 ipr :00:01.0: Starting IOA initialization sequence. ipr :00:01.0: Adapter firmware version: 02200023 ipr :00:01.0: IOA initialized. [ cut here ] Badness at kernel/lockdep.c:2037 trace_hardirqs_on(): if (DEBUG_LOCKS_WARN_ON(!irqs_disabled())) return; if (DEBUG_LOCKS_WARN_ON(current-hardirq_context)) return; As I said before, I'm not 100% sure what that is, it smells like something legal that Ingo made not so anymore but I need to double check. Cheers, Ben. ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: lockdep badness
Benjamin Herrenschmidt wrote: On Thu, 2008-07-24 at 14:23 -0500, Nathan Lynch wrote: I'm seeing warnings from the lockdep code itself in recent kernels on a Power6 blade (v2.6.26 and benh's -next branch). Something to do with powerpc's lazy interrupt-disabling, perhaps? A couple of stack traces below, the first is from benh's tree, the second is from 2.6.26. The lockdep self-tests all pass at boot. Interesting. [c000e787bc20] [c000e787bc70] 0xc000e787bc70 (unreliable) [c000e787bca0] [c00b5ac8] .lock_release+0x7c/0x208 [c000e787bd50] [c05e12c0] ._spin_unlock_irqrestore+0x34/0x94 [c000e787bde0] [c004d648] .pSeries_log_error+0x380/0x3f0 [c000e787bef0] [c004d8e4] .rtasd+0x98/0x100 [c000e787bf90] [c0029d20] .kernel_thread+0x4c/0x68 Instruction dump: This one is one I haven't managed to reproduce and didn't quite find out what could be causing it, but it was already reported by Badari (and in fact is referenced as a regression in Rafael list). Okay. Call Trace: [cfffbb10] [cfffbbb0] 0xcfffbbb0 (unreliable) [cfffbbb0] [c05d8824] ._spin_unlock_irq+0x40/0x68 [cfffbc40] [c0426708] .ipr_ioa_reset_done+0x218/0x2ac [cfffbd00] [c041bdb8] .ipr_reset_ioa_job+0xc8/0xf4 [cfffbd90] [c0424ffc] .ipr_isr+0x280/0x628 [cfffbe50] [c00ccc70] .handle_IRQ_event+0x58/0xd4 [cfffbef0] [c00cef4c] .handle_fasteoi_irq+0x128/0x1c8 [cfffbf90] [c0029918] .call_handle_irq+0x1c/0x2c [c0a63a20] [c000d9cc] .do_IRQ+0x138/0x248 [c0a63ad0] [c0004ca8] hardware_interrupt_entry+0x28/0x2c --- Exception: 501 at .raw_local_irq_restore+0x8c/0xa4 LR = .cpu_idle+0x140/0x210 [c0a63e60] [c05da07c] .rest_init+0x7c/0x98 [c0a63ee0] [c0866f10] .start_kernel+0x488/0x4b0 [c0a63f90] [c0008584] .start_here_common+0x4c/0xc8 Instruction dump: This one is new to me. I will have a look. What machine is this ? Power6 blade - JS22 (four cores), with single disk attached via IPR, HEA for network... nothing exotic, I guess. Not sure how recreatable the ipr trace is, I've only seen it once (and with 2.6.26 only). The rtasd trace is pretty consistent on powerpc/next. ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: lockdep badness
On Friday 25 July 2008, Benjamin Herrenschmidt wrote: Call Trace: [cfffbb10] [cfffbbb0] 0xcfffbbb0 (unreliable) [cfffbbb0] [c05d8824] ._spin_unlock_irq+0x40/0x68 [cfffbc40] [c0426708] .ipr_ioa_reset_done+0x218/0x2ac [cfffbd00] [c041bdb8] .ipr_reset_ioa_job+0xc8/0xf4 [cfffbd90] [c0424ffc] .ipr_isr+0x280/0x628 [cfffbe50] [c00ccc70] .handle_IRQ_event+0x58/0xd4 [cfffbef0] [c00cef4c] .handle_fasteoi_irq+0x128/0x1c8 [cfffbf90] [c0029918] .call_handle_irq+0x1c/0x2c [c0a63a20] [c000d9cc] .do_IRQ+0x138/0x248 [c0a63ad0] [c0004ca8] hardware_interrupt_entry+0x28/0x2c --- Exception: 501 at .raw_local_irq_restore+0x8c/0xa4 LR = .cpu_idle+0x140/0x210 [c0a63e60] [c05da07c] .rest_init+0x7c/0x98 [c0a63ee0] [c0866f10] .start_kernel+0x488/0x4b0 [c0a63f90] [c0008584] .start_here_common+0x4c/0xc8 Instruction dump: This one is new to me. I will have a look. What machine is this ? I suspect the error is to do spin_lock/unlock_irq rather than save/restore variants at IRQ time, which would be an IPR bug... or rather something legal that Ingo decided shouldn't be anymore :-) Almost: The ipr_ioa_reset_done function does spin_unlock_irq(ioa_cfg-host-host_lock); scsi_unblock_requests(ioa_cfg-host); spin_lock_irq(ioa_cfg-host-host_lock); It seem that it is always called with interrupts disabled and ioa_cfg-host-host_lock held (otherwise we would get a different warning), so to get rid of the lockdep warning, it should be replaced with spin_unlock(ioa_cfg-host-host_lock); scsi_unblock_requests(ioa_cfg-host); spin_lock(ioa_cfg-host-host_lock); I.e. leave the interrupts off, but still give up the lock. It still feels wrong to do this, but I don't understand much about the driver either. Arnd ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev