Re: lockdep badness

2008-07-25 Thread Sebastien Dugue

  Hi,

On Thu, 24 Jul 2008 14:23:00 -0500 Nathan Lynch [EMAIL PROTECTED] wrote:

 I'm seeing warnings from the lockdep code itself in recent kernels on
 a Power6 blade (v2.6.26 and benh's -next branch).
 
 Something to do with powerpc's lazy interrupt-disabling, perhaps?
 
 A couple of stack traces below, the first is from benh's tree, the
 second is from 2.6.26.  The lockdep self-tests all pass at boot.
 
 RTAS daemon started
 RTAS: event: 295, Type: Dump Notification Event, Severity: 2
 [ cut here ]
 Badness at kernel/lockdep.c:2719

snip

  I'm observing nearly the same thing (minus the ipr related badness) here with
Ben's powerpc.git (master) on a JS22 blade:

Freeing initrd memory: 2676k freed
RTAS daemon started
RTAS: event: 94, Type: Platform Information Event, Severity: 1
[ cut here ]
Badness at kernel/lockdep.c:2079
NIP: c00846dc LR: c00846c0 CTR: 
REGS: c001e353f9c0 TRAP: 0700   Not tainted  (2.6.26)
MSR: 80029032 EE,ME,IR,DR  CR: 2442  XER: 2010
TASK = c000efa58640[115] 'rtasd' THREAD: c001e353c000 CPU: 3
GPR00:  c001e353fc40 c04efe50 0001 
GPR04: 0001 c003cc94 0001 636820202d203230 
GPR08: 34653030 c09627a8 c000efa58e88 0001 
GPR12: 3465303038200d0a c0525900 c0395ba0 40c0 
GPR16: c03943c8  0029d000 0101c920 
GPR20:  0001 0744 0001 
GPR24: c0538cf0 c001e3e74000 c041b48c c04c0060 
GPR28: c00847a4 c000efa58640 c04b18d0 c001e353fc40 
NIP [c00846dc] .trace_hardirqs_on_caller+0xc4/0x170
LR [c00846c0] .trace_hardirqs_on_caller+0xa8/0x170
Call Trace:
[c001e353fcd0] [c00847a4] .trace_hardirqs_on+0x1c/0x30
[c001e353fd50] [c02df554] ._spin_unlock_irqrestore+0x68/0xb8
[c001e353fde0] [c003cc94] .pSeries_log_error+0x3a4/0x400
[c001e353fef0] [c003cf28] .rtasd+0x98/0x100
[c001e353ff90] [c00295f8] .kernel_thread+0x4c/0x68
Instruction dump:
901d07ec 880d01ca 2fa0 41be002c 481300a1 6000 2fa3 419e00a0 
e93e80f8 8009 2f80 409e0090 0fe0 4888 e92d01a0 8009082c 
Installing knfsd (copyright (C) 1996 [EMAIL PROTECTED]).
msgmni has been set to 15071
Block layer SCSI generic (bsg) driver version 0.4 loaded (major 254)

 ...


  Sebastien.
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: lockdep badness

2008-07-24 Thread Nathan Lynch
Nathan Lynch wrote:
 
 A couple of stack traces below, the first is from benh's tree, the
 second is from 2.6.26.  The lockdep self-tests all pass at boot.

Sorry, should have pointed out the code that is warning more
specifically.


 RTAS daemon started
 RTAS: event: 295, Type: Dump Notification Event, Severity: 2
 [ cut here ]
 Badness at kernel/lockdep.c:2719

check_flags():

if (irqs_disabled_flags(flags)) {
if (DEBUG_LOCKS_WARN_ON(current-hardirqs_enabled)) {
printk(possible reason: unannotated
irqs-off.\n);
}
} else {
 if (DEBUG_LOCKS_WARN_ON(!current-hardirqs_enabled)) {
printk(possible reason: unannotated
irqs-on.\n);
}
}

 NIP: c00b06dc LR: c00b06c0 CTR: 
 REGS: c000e787b9a0 TRAP: 0700   Not tainted  (2.6.26)
 MSR: 80029032 EE,ME,IR,DR  CR: 2442  XER: 2000
 TASK = c000e91a6100[428] 'rtasd' THREAD: c000e7878000 CPU: 1
 GPR00:  c000e787bc20 c0a71270 0001 
 GPR04: 0001 c004d648 0401 0401 
 GPR08:  c0f1f858 0020 0001 
 GPR12: 0001 c0aac600 002146f4 00214744 
 GPR16: 4204 029015b0 002a3c00 c08cc988 
 GPR20: 02901598  0124 0002 
 GPR24: 0001 0008 0001 c09fdf00 
 GPR28: c004d648 c0943e98 c09d9b40 c000e787bc20 
 NIP [c00b06dc] .check_flags+0x9c/0x174
 LR [c00b06c0] .check_flags+0x80/0x174
 Call Trace:
 [c000e787bc20] [c000e787bc70] 0xc000e787bc70 (unreliable)
 [c000e787bca0] [c00b5ac8] .lock_release+0x7c/0x208
 [c000e787bd50] [c05e12c0] ._spin_unlock_irqrestore+0x34/0x94
 [c000e787bde0] [c004d648] .pSeries_log_error+0x380/0x3f0
 [c000e787bef0] [c004d8e4] .rtasd+0x98/0x100
 [c000e787bf90] [c0029d20] .kernel_thread+0x4c/0x68
 Instruction dump:
 e92d01b0 80090904 2f80 40be002c 481ece4d 6000 2fa3 419e00c4 
 e93e80c0 8009 2f80 409e00b4 0fe0 48ac 78290464 80090014 
 possible reason: unannotated irqs-on.
 irq event stamp: 20
 hardirqs last  enabled at (19): [c00b405c] 
 .trace_hardirqs_on+0x1c/0x3
 0
 hardirqs last disabled at (20): [c00b0d80] 
 .trace_hardirqs_off+0x1c/0x
 30
 softirqs last  enabled at (0): [c008293c] .copy_process+0x3c8/0x1040
 softirqs last disabled at (0): [] 0x0
 
 ===
 
 ipr: IBM Power RAID SCSI Device Driver version: 2.4.1 (April 24, 2007)
 ipr :00:01.0: Found IOA with IRQ: 289
 ipr :00:01.0: Starting IOA initialization sequence.
 ipr :00:01.0: Adapter firmware version: 02200023
 ipr :00:01.0: IOA initialized.
 [ cut here ]
 Badness at kernel/lockdep.c:2037

trace_hardirqs_on():

if (DEBUG_LOCKS_WARN_ON(!irqs_disabled()))
return;
 if (DEBUG_LOCKS_WARN_ON(current-hardirq_context))
return;


 NIP: c00b29fc LR: c00b29e0 CTR: 
 REGS: cfffb890 TRAP: 0700   Not tainted  (2.6.26)
 MSR: 80021032 ME,IR,DR  CR: 2822  XER: 0004
 TASK = c0996730[0] 'swapper' THREAD: c0a6 CPU: 0
 GPR00:  cfffbb10 c0a61ad8 0001 
 GPR04: 00035112 c0426708  0970 
 GPR08:  c0ec6ad0 0001 0001 
 GPR12: 00043bd8 c0a9d400 002146f4 00214744 
 GPR16: 4204 028f37a8 002a3c00 c08b3790 
 GPR20: 028f3790 01a3f920 c08b37a8 c000e6252000 
 GPR24: c000e6454a48 00100100 00200200 c05d8824 
 GPR28: c0afba18 c0996730 c09cb7f0 cfffbb10 
 NIP [c00b29fc] .trace_hardirqs_on+0x120/0x1b0
 LR [c00b29e0] .trace_hardirqs_on+0x104/0x1b0
 Call Trace:
 [cfffbb10] [cfffbbb0] 0xcfffbbb0 (unreliable)
 [cfffbbb0] [c05d8824] ._spin_unlock_irq+0x40/0x68
 [cfffbc40] [c0426708] .ipr_ioa_reset_done+0x218/0x2ac
 [cfffbd00] [c041bdb8] .ipr_reset_ioa_job+0xc8/0xf4
 [cfffbd90] [c0424ffc] .ipr_isr+0x280/0x628
 [cfffbe50] [c00ccc70] .handle_IRQ_event+0x58/0xd4
 [cfffbef0] [c00cef4c] .handle_fasteoi_irq+0x128/0x1c8
 [cfffbf90] [c0029918] .call_handle_irq+0x1c/0x2c
 [c0a63a20] [c000d9cc] .do_IRQ+0x138/0x248
 [c0a63ad0] [c0004ca8] hardware_interrupt_entry+0x28/0x2c
 --- Exception: 501 at .raw_local_irq_restore+0x8c/0xa4

Re: lockdep badness

2008-07-24 Thread Benjamin Herrenschmidt
On Thu, 2008-07-24 at 14:23 -0500, Nathan Lynch wrote:
 I'm seeing warnings from the lockdep code itself in recent kernels on
 a Power6 blade (v2.6.26 and benh's -next branch).
 
 Something to do with powerpc's lazy interrupt-disabling, perhaps?
 
 A couple of stack traces below, the first is from benh's tree, the
 second is from 2.6.26.  The lockdep self-tests all pass at boot.

Interesting.

 [c000e787bc20] [c000e787bc70] 0xc000e787bc70 (unreliable)
 [c000e787bca0] [c00b5ac8] .lock_release+0x7c/0x208
 [c000e787bd50] [c05e12c0] ._spin_unlock_irqrestore+0x34/0x94
 [c000e787bde0] [c004d648] .pSeries_log_error+0x380/0x3f0
 [c000e787bef0] [c004d8e4] .rtasd+0x98/0x100
 [c000e787bf90] [c0029d20] .kernel_thread+0x4c/0x68
 Instruction dump:

This one is one I haven't managed to reproduce and didn't quite find out
what could be causing it, but it was already reported by Badari (and in
fact is referenced as a regression in Rafael list).

 Call Trace:
 [cfffbb10] [cfffbbb0] 0xcfffbbb0 (unreliable)
 [cfffbbb0] [c05d8824] ._spin_unlock_irq+0x40/0x68
 [cfffbc40] [c0426708] .ipr_ioa_reset_done+0x218/0x2ac
 [cfffbd00] [c041bdb8] .ipr_reset_ioa_job+0xc8/0xf4
 [cfffbd90] [c0424ffc] .ipr_isr+0x280/0x628
 [cfffbe50] [c00ccc70] .handle_IRQ_event+0x58/0xd4
 [cfffbef0] [c00cef4c] .handle_fasteoi_irq+0x128/0x1c8
 [cfffbf90] [c0029918] .call_handle_irq+0x1c/0x2c
 [c0a63a20] [c000d9cc] .do_IRQ+0x138/0x248
 [c0a63ad0] [c0004ca8] hardware_interrupt_entry+0x28/0x2c
 --- Exception: 501 at .raw_local_irq_restore+0x8c/0xa4
 LR = .cpu_idle+0x140/0x210
 [c0a63e60] [c05da07c] .rest_init+0x7c/0x98
 [c0a63ee0] [c0866f10] .start_kernel+0x488/0x4b0
 [c0a63f90] [c0008584] .start_here_common+0x4c/0xc8
 Instruction dump:

This one is new to me. I will have a look. What machine is this ?

I suspect the error is to do spin_lock/unlock_irq rather than
save/restore variants at IRQ time, which would be an IPR bug... or
rather something legal that Ingo decided shouldn't be anymore :-)

Cheers,
Ben.


___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: lockdep badness

2008-07-24 Thread Benjamin Herrenschmidt
On Thu, 2008-07-24 at 14:38 -0500, Nathan Lynch wrote:
 Nathan Lynch wrote:
  
  A couple of stack traces below, the first is from benh's tree, the
  second is from 2.6.26.  The lockdep self-tests all pass at boot.
 
 Sorry, should have pointed out the code that is warning more
 specifically.
 
 
  RTAS daemon started
  RTAS: event: 295, Type: Dump Notification Event, Severity: 2
  [ cut here ]
  Badness at kernel/lockdep.c:2719
 
 check_flags():
 
 if (irqs_disabled_flags(flags)) {
 if (DEBUG_LOCKS_WARN_ON(current-hardirqs_enabled)) {
 printk(possible reason: unannotated
 irqs-off.\n);
 }
 } else {
  if (DEBUG_LOCKS_WARN_ON(!current-hardirqs_enabled)) {
 printk(possible reason: unannotated
 irqs-on.\n);
 }
 }

Yes, it looks like at some point the IRQ state gets out of sync, and I
haven't found out where yet. Having a repro case would help, what is the
setup of your machine ? I seems to be related to having an RTAS event at
boot.


  ipr: IBM Power RAID SCSI Device Driver version: 2.4.1 (April 24, 2007)
  ipr :00:01.0: Found IOA with IRQ: 289
  ipr :00:01.0: Starting IOA initialization sequence.
  ipr :00:01.0: Adapter firmware version: 02200023
  ipr :00:01.0: IOA initialized.
  [ cut here ]
  Badness at kernel/lockdep.c:2037
 
 trace_hardirqs_on():
 
 if (DEBUG_LOCKS_WARN_ON(!irqs_disabled()))
 return;
  if (DEBUG_LOCKS_WARN_ON(current-hardirq_context))
 return;

As I said before, I'm not 100% sure what that is, it smells like
something legal that Ingo made not so anymore but I need to double
check.

Cheers,
Ben.


___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: lockdep badness

2008-07-24 Thread Nathan Lynch
Benjamin Herrenschmidt wrote:
 On Thu, 2008-07-24 at 14:23 -0500, Nathan Lynch wrote:
  I'm seeing warnings from the lockdep code itself in recent kernels on
  a Power6 blade (v2.6.26 and benh's -next branch).
  
  Something to do with powerpc's lazy interrupt-disabling, perhaps?
  
  A couple of stack traces below, the first is from benh's tree, the
  second is from 2.6.26.  The lockdep self-tests all pass at boot.
 
 Interesting.
 
  [c000e787bc20] [c000e787bc70] 0xc000e787bc70 (unreliable)
  [c000e787bca0] [c00b5ac8] .lock_release+0x7c/0x208
  [c000e787bd50] [c05e12c0] ._spin_unlock_irqrestore+0x34/0x94
  [c000e787bde0] [c004d648] .pSeries_log_error+0x380/0x3f0
  [c000e787bef0] [c004d8e4] .rtasd+0x98/0x100
  [c000e787bf90] [c0029d20] .kernel_thread+0x4c/0x68
  Instruction dump:
 
 This one is one I haven't managed to reproduce and didn't quite find out
 what could be causing it, but it was already reported by Badari (and in
 fact is referenced as a regression in Rafael list).

Okay.



  Call Trace:
  [cfffbb10] [cfffbbb0] 0xcfffbbb0 (unreliable)
  [cfffbbb0] [c05d8824] ._spin_unlock_irq+0x40/0x68
  [cfffbc40] [c0426708] .ipr_ioa_reset_done+0x218/0x2ac
  [cfffbd00] [c041bdb8] .ipr_reset_ioa_job+0xc8/0xf4
  [cfffbd90] [c0424ffc] .ipr_isr+0x280/0x628
  [cfffbe50] [c00ccc70] .handle_IRQ_event+0x58/0xd4
  [cfffbef0] [c00cef4c] .handle_fasteoi_irq+0x128/0x1c8
  [cfffbf90] [c0029918] .call_handle_irq+0x1c/0x2c
  [c0a63a20] [c000d9cc] .do_IRQ+0x138/0x248
  [c0a63ad0] [c0004ca8] hardware_interrupt_entry+0x28/0x2c
  --- Exception: 501 at .raw_local_irq_restore+0x8c/0xa4
  LR = .cpu_idle+0x140/0x210
  [c0a63e60] [c05da07c] .rest_init+0x7c/0x98
  [c0a63ee0] [c0866f10] .start_kernel+0x488/0x4b0
  [c0a63f90] [c0008584] .start_here_common+0x4c/0xc8
  Instruction dump:
 
 This one is new to me. I will have a look. What machine is this ?

Power6 blade - JS22 (four cores), with single disk attached via IPR,
HEA for network... nothing exotic, I guess.  Not sure how recreatable
the ipr trace is, I've only seen it once (and with 2.6.26 only).  The
rtasd trace is pretty consistent on powerpc/next.
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: lockdep badness

2008-07-24 Thread Arnd Bergmann
On Friday 25 July 2008, Benjamin Herrenschmidt wrote:
  Call Trace:
  [cfffbb10] [cfffbbb0] 0xcfffbbb0 (unreliable)
  [cfffbbb0] [c05d8824] ._spin_unlock_irq+0x40/0x68
  [cfffbc40] [c0426708] .ipr_ioa_reset_done+0x218/0x2ac
  [cfffbd00] [c041bdb8] .ipr_reset_ioa_job+0xc8/0xf4
  [cfffbd90] [c0424ffc] .ipr_isr+0x280/0x628
  [cfffbe50] [c00ccc70] .handle_IRQ_event+0x58/0xd4
  [cfffbef0] [c00cef4c] .handle_fasteoi_irq+0x128/0x1c8
  [cfffbf90] [c0029918] .call_handle_irq+0x1c/0x2c
  [c0a63a20] [c000d9cc] .do_IRQ+0x138/0x248
  [c0a63ad0] [c0004ca8] hardware_interrupt_entry+0x28/0x2c
  --- Exception: 501 at .raw_local_irq_restore+0x8c/0xa4
      LR = .cpu_idle+0x140/0x210
  [c0a63e60] [c05da07c] .rest_init+0x7c/0x98
  [c0a63ee0] [c0866f10] .start_kernel+0x488/0x4b0
  [c0a63f90] [c0008584] .start_here_common+0x4c/0xc8
  Instruction dump:
 
 This one is new to me. I will have a look. What machine is this ?
 
 I suspect the error is to do spin_lock/unlock_irq rather than
 save/restore variants at IRQ time, which would be an IPR bug... or
 rather something legal that Ingo decided shouldn't be anymore :-)

Almost: The ipr_ioa_reset_done function does

spin_unlock_irq(ioa_cfg-host-host_lock);
scsi_unblock_requests(ioa_cfg-host);
spin_lock_irq(ioa_cfg-host-host_lock);

It seem that it is always called with interrupts disabled and
ioa_cfg-host-host_lock held (otherwise we would get a different
warning), so to get rid of the lockdep warning, it should be
replaced with

spin_unlock(ioa_cfg-host-host_lock);
scsi_unblock_requests(ioa_cfg-host);
spin_lock(ioa_cfg-host-host_lock);

I.e. leave the interrupts off, but still give up the lock. It
still feels wrong to do this, but I don't understand much about
the driver either.

Arnd 
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev