Re: [PATCH] [scsi]: Add offline state checking while dispatch a scsi cmd

2007-03-11 Thread Andrew Morton
> On Mon, 12 Mar 2007 10:52:22 +0800 Joe Jin <[EMAIL PROTECTED]> wrote:
> > The 2.6.9 base is very old in mainline terms.  Are you sure the bug hasn't
> > been fixed in mainline by other means?
> 
> I cannot confirm whether it has been fixed in the latest kernel. The server
> is a production system, so it is hard to debug or to try to reproduce there.

Well.  That makes it hard to run tests, but perhaps it can be determined
from code review.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] [scsi]: Add offline state checking while dispatch a scsi cmd

2007-03-11 Thread Joe Jin
> The 2.6.9 base is very old in mainline terms.  Are you sure the bug hasn't
> been fixed in mainline by other means?

I cannot confirm whether it has been fixed in the latest kernel. The server
is a production system, so it is hard to debug or to try to reproduce there.



Re: [PATCH] [scsi]: Add offline state checking while dispatch a scsi cmd

2007-03-11 Thread Joe Jin
> 
> This is a bug actually in the megaraid.

Aha, I'll track it.

> 
> And this is a direct command submission path:  it already passed both
> online check gates in this path *after* the device was offlined, so
> adding a third won't fix this. 

Yes, I noticed that. However, according to the logs the device had already
gone offline, so why could a command still be sent to it? Isn't the sequence
of the printk messages suspicious?

> single disk, so the I/O was definitely bound for sda?  Secondly, can you
> reproduce with a modern (2.6.20) kernel.  Your trace strongly suggests
> that the device came back online for some reason and then the megaraid
> driver died.

It is hard to update the kernel because this is a production system, and we
cannot debug on the box :(

I don't know whether you noticed, but the logs come from diskdump. Could this
have been caused by diskdump?

Thanks,
Joe


Re: [PATCH] [scsi]: Add offline state checking while dispatch a scsi cmd

2007-03-11 Thread Andrew Morton
> On Fri, 9 Mar 2007 09:40:40 +0800 Joe Jin <[EMAIL PROTECTED]> wrote:
> > What's the error you're trying to fix?  scsi_dispatch_cmd() is only
> > called from scsi_request_fn() which already has an equivalent of this
> > check in it just prior to calling dispatch.
> 
> Yes, I have seen the check in scsi_request_fn(). We recently got the
> following crash on RHEL4, 2.6.9-42.0.2.ELsmp:

The 2.6.9 base is very old in mainline terms.  Are you sure the bug hasn't
been fixed in mainline by other means?



Re: [PATCH] [scsi]: Add offline state checking while dispatch a scsi cmd

2007-03-11 Thread James Bottomley
On Fri, 2007-03-09 at 09:40 +0800, Joe Jin wrote:
> > What's the error you're trying to fix?  scsi_dispatch_cmd() is only
> > called from scsi_request_fn() which already has an equivalent of this
> > check in it just prior to calling dispatch.
>
> Yes, I have seen the check in scsi_request_fn(). We recently got the
> following crash on RHEL4, 2.6.9-42.0.2.ELsmp:

This kernel is way too old to debug ...

However: 
> scsi0 (0:0): rejecting I/O to offline device
> ...
> EXT3-fs error (device sda8) in start_transaction: Journal has aborted
>
> Unable to handle kernel NULL pointer dereference at  RIP:
> a0031e66{:megaraid_mbox:megaraid_queue_command+2634}

This is a bug actually in the megaraid.

> PML4 21a25d067 PGD 2170ac067 PMD 0
> Oops: 0002 [1] SMP
> CPU 0
> Modules linked in: hangcheck_timer mptctl mptbase ipmi_devintf ipmi_si
> ipmi_msghandler dell_rbu netconsole netdump autofs4 i2c_dev i2c_core ocfs2(U)
> debugfs(U) nfs lockd nfs_acl ocfs2_dlmfs(U) ocfs2_dlm(U) ocfs2_nodemanager(U)
> configfs(U) sunrpc ds yenta_socket pcmcia_core ide_dump scsi_dump diskdump
> zlib_deflate dm_mirror dm_multipath dm_mod emcphr(U) emcpmpap(U) emcpmpaa(U)
> emcpmpc(U) emcpmp(U) emcp(U) emcplib(U) button battery ac joydev uhci_hcd
> ehci_hcd hw_random tg3 e1000 bond0(U) floppy sg ext3 jbd lpfc
> scsi_transport_fc megaraid_mbox megaraid_mm sd_mod scsi_mod
> Pid: 13238, comm: emagent Tainted: P  2.6.9-42.0.2.ELsmp
> RIP: 0010:[a0031e66]
> a0031e66{:megaraid_mbox:megaraid_queue_command+2634}
> RSP: 0018:01019b5a9b48  EFLAGS: 00010002
> RAX: 000220b8e000 RBX: 0102ffd1b048 RCX:
> RDX:  RSI: 0001 RDI: 010431124bf0
> RBP: 0001 R08:  R09: 010133ce5b80
> R10: 0102ffd3e5a0 R11: 0060 R12: 010133ce5b80
> R13: 0102ffd3e480 R14: 0100bfb4c8b8 R15: 0101ffcf4000
> FS:  () GS:804e5180(005b) knlGS:f47ffbb0
> CS:  0010 DS: 002b ES: 002b CR0: 8005003b
> CR2:  CR3: 00101000 CR4: 06e0
> Process emagent (pid: 13238, threadinfo 01019b5a8000, task
> 01003e5a8030)
> Stack:  0046 0046 0102ffd3e480
>    0101fff73980 8015cb38 0100bfb4d4aa 0100bfb4d4a2
>    0100bfb4c8b8 01010080
> Call Trace:8015cb38{mempool_alloc+129}
> a0002874{:scsi_mod:scsi_done+0}
>    8013fc00{__mod_timer+113}
> a0002adf{:scsi_mod:scsi_dispatch_cmd+595}
>    a0007a72{:scsi_mod:scsi_request_fn+990}
> 8024e385{generic_unplug_device+24}
>    8017a6d3{__wait_on_buffer+120}
> 8017a55e{bh_wake_function+0}
>    8017a55e{bh_wake_function+0}
> a00877fe{:ext3:ext3_bread+96}
>    a008935c{:ext3:htree_dirblock_to_tree+50}
>    a008952c{:ext3:ext3_htree_fill_tree+295}
>    8018b232{filldir64+122} 8018b1b8{filldir64+0}
>    a0083ace{:ext3:ext3_readdir+371} 8018f019{dput+56}
>    8018b1b8{filldir64+0} 8018599c{path_release+12}
>    8019e335{compat_sys_statfs+105}
> 8018b1b8{filldir64+0}
>    8018aef7{vfs_readdir+155}
> 8018b2e8{sys_getdents64+118}
>    80125bbb{sysenter_do_call+27}

And this is a direct command submission path:  it already passed both
online check gates in this path *after* the device was offlined, so
adding a third won't fix this.  Firstly, I'm assuming you have only a
single disk, so the I/O was definitely bound for sda?  Secondly, can you
reproduce with a modern (2.6.20) kernel?  Your trace strongly suggests
that the device came back online for some reason and then the megaraid
driver died.

James
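
The argument above, that a third online check cannot help, can be illustrated
with a tiny user-space model in C. This is only a sketch under stated
assumptions, not kernel code: reaches_driver() and its gate count are
hypothetical, and the real checks live in the SCSI mid-layer. The point is
that identical checks all read the same state, so if the device is seen as
online again, every gate passes regardless of how many there are.

```c
#include <stdbool.h>

/* Simplified device states; the kernel's enum has more values. */
enum sdev_state { SDEV_RUNNING, SDEV_OFFLINE };

/*
 * Hypothetical model: returns true if a command would reach the low-level
 * driver after passing `gates` identical "is the device online?" checks.
 */
static bool reaches_driver(enum sdev_state state, int gates)
{
    for (int i = 0; i < gates; i++) {
        if (state == SDEV_OFFLINE)
            return false;   /* any single gate already rejects the command */
    }
    return true;            /* all gates saw an online device */
}
```

With the state at SDEV_OFFLINE, two gates already stop the command; with the
state back at SDEV_RUNNING, two gates or three gates both let it through,
which is why the trace points at the device coming back online rather than at
a missing check.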




Re: [PATCH] [scsi]: Add offline state checking while dispatch a scsi cmd

2007-03-08 Thread James Bottomley
On Thu, 2007-03-08 at 17:22 +0800, Joe Jin wrote:
> When a SCSI device hits a hardware error, its state may be set to
> SDEV_OFFLINE. So scsi_dispatch_cmd() should check whether the device
> is offline and, if it is, do nothing and just return an error to the
> user directly.

What's the error you're trying to fix?  scsi_dispatch_cmd() is only
called from scsi_request_fn() which already has an equivalent of this
check in it just prior to calling dispatch.

James
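
The call order described above can be pictured with a small user-space model
in C. This is a minimal sketch, not the kernel source: struct model_device,
model_request_fn() and model_dispatch_cmd() are hypothetical stand-ins for the
mid-layer structures and for scsi_request_fn() and scsi_dispatch_cmd(). It
only shows that the sole caller checks the device state before dispatching.

```c
#include <stdio.h>

/* Simplified device states; the kernel's enum has more values. */
enum sdev_state { SDEV_RUNNING, SDEV_OFFLINE };

/* Hypothetical stand-in for the per-device state kept by the mid-layer. */
struct model_device {
    enum sdev_state state;
    int dispatched;   /* commands handed to the low-level driver */
    int rejected;     /* commands refused because the device was offline */
};

static int model_device_online(const struct model_device *dev)
{
    return dev->state != SDEV_OFFLINE;
}

/* Models scsi_dispatch_cmd(): hands the command to the driver. */
static void model_dispatch_cmd(struct model_device *dev)
{
    dev->dispatched++;
}

/* Models scsi_request_fn(): the only caller of dispatch, checking first. */
static void model_request_fn(struct model_device *dev)
{
    if (!model_device_online(dev)) {
        dev->rejected++;
        printf("rejecting I/O to offline device\n");
        return;
    }
    model_dispatch_cmd(dev);
}
```

Calling model_request_fn() on a device in SDEV_RUNNING increments dispatched;
after the state is set to SDEV_OFFLINE the same call increments rejected and
never reaches model_dispatch_cmd().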




Re: [PATCH] [scsi]: Add offline state checking while dispatch a scsi cmd

2007-03-08 Thread Joe Jin
> What's the error you're trying to fix?  scsi_dispatch_cmd() is only
> called from scsi_request_fn() which already has an equivalent of this
> check in it just prior to calling dispatch.

Yes, I have seen the check in scsi_request_fn(). We recently got the
following crash on RHEL4, 2.6.9-42.0.2.ELsmp:


megaraid: aborting-150766876 cmd=2a c=2 t=0 l=0
megaraid abort: 150766876:15[255:128], fw owner
...
megaraid: aborting-150767541 cmd=2a c=2 t=0 l=0
megaraid abort: 150767541[255:128], driver owner
megaraid: resetting the host...
megaraid: 150766876:129[65535:65535], reset from pending list
megaraid: 1 outstanding commands. Max wait 180 sec
megaraid mbox: Wait for 1 commands to complete:180
...
megaraid mbox: Wait for 1 commands to complete:0
megaraid mbox: critical hardware error!
megaraid: resetting the host...
megaraid: hw error, cannot reset
megaraid: resetting the host...
megaraid: hw error, cannot reset
scsi: Device offlined - not ready after error recovery: host 0 channel 2 id 0 lun 0
SCSI error : <0 2 0 0> return code = 0x600
end_request: I/O error, dev sda, sector 24117409
Buffer I/O error on device sda5, logical block 327797
...
EXT3-fs error (device sda8) in start_transaction: Journal has aborted
scsi0 (0:0): rejecting I/O to offline device
printk: 85 messages suppressed.
Buffer I/O error on device sda5, logical block 327691
lost page write due to I/O error on sda5
scsi0 (0:0): rejecting I/O to offline device
...
EXT3-fs error (device sda8) in start_transaction: Journal has aborted

Unable to handle kernel NULL pointer dereference at  RIP: 
a0031e66{:megaraid_mbox:megaraid_queue_command+2634}
PML4 21a25d067 PGD 2170ac067 PMD 0 
Oops: 0002 [1] SMP 
CPU 0 
Modules linked in: hangcheck_timer mptctl mptbase ipmi_devintf ipmi_si 
ipmi_msghandler dell_rbu netconsole netdump autofs4 i2c_dev i2c_core ocfs2(U) 
debugfs(U) nfs lockd nfs_acl ocfs2_dlmfs(U) ocfs2_dlm(U) ocfs2_nodemanager(U) 
configfs(U) sunrpc ds yenta_socket pcmcia_core ide_dump scsi_dump diskdump 
zlib_deflate dm_mirror dm_multipath dm_mod emcphr(U) emcpmpap(U) emcpmpaa(U) 
emcpmpc(U) emcpmp(U) emcp(U) emcplib(U) button battery ac joydev uhci_hcd 
ehci_hcd hw_random tg3 e1000 bond0(U) floppy sg ext3 jbd lpfc scsi_transport_fc 
megaraid_mbox megaraid_mm sd_mod scsi_mod
Pid: 13238, comm: emagent Tainted: P  2.6.9-42.0.2.ELsmp
RIP: 0010:[a0031e66] 
a0031e66{:megaraid_mbox:megaraid_queue_command+2634}
RSP: 0018:01019b5a9b48  EFLAGS: 00010002
RAX: 000220b8e000 RBX: 0102ffd1b048 RCX: 
RDX:  RSI: 0001 RDI: 010431124bf0
RBP: 0001 R08:  R09: 010133ce5b80
R10: 0102ffd3e5a0 R11: 0060 R12: 010133ce5b80
R13: 0102ffd3e480 R14: 0100bfb4c8b8 R15: 0101ffcf4000
FS:  () GS:804e5180(005b) knlGS:f47ffbb0
CS:  0010 DS: 002b ES: 002b CR0: 8005003b
CR2:  CR3: 00101000 CR4: 06e0
Process emagent (pid: 13238, threadinfo 01019b5a8000, task 01003e5a8030)
Stack:  0046 0046 0102ffd3e480 
   0101fff73980 8015cb38 0100bfb4d4aa 0100bfb4d4a2 
   0100bfb4c8b8 01010080 
Call Trace:8015cb38{mempool_alloc+129} 
a0002874{:scsi_mod:scsi_done+0} 
   8013fc00{__mod_timer+113} 
a0002adf{:scsi_mod:scsi_dispatch_cmd+595} 
   a0007a72{:scsi_mod:scsi_request_fn+990} 
8024e385{generic_unplug_device+24} 
   8017a6d3{__wait_on_buffer+120} 
8017a55e{bh_wake_function+0} 
   8017a55e{bh_wake_function+0} 
a00877fe{:ext3:ext3_bread+96} 
   a008935c{:ext3:htree_dirblock_to_tree+50} 
   a008952c{:ext3:ext3_htree_fill_tree+295} 
   8018b232{filldir64+122} 8018b1b8{filldir64+0} 
   a0083ace{:ext3:ext3_readdir+371} 8018f019{dput+56} 
   8018b1b8{filldir64+0} 8018599c{path_release+12} 
   8019e335{compat_sys_statfs+105} 
8018b1b8{filldir64+0} 
   8018aef7{vfs_readdir+155} 
8018b2e8{sys_getdents64+118} 
   80125bbb{sysenter_do_call+27} 

Code: 48 89 04 11 41 8b 44 24 18 49 83 c4 20 49 8b 56 20 89 44 11 
RIP a0031e66{:megaraid_mbox:megaraid_queue_command+2634} RSP 
01019b5a9b48
CR2: 


The full crash info has been uploaded to
http://patch.linux-security.cn/crashinfo/megaraid_crashinfo.log

According to the crash info, the device state had already been set to OFFLINE
before the kernel panic, yet SCSI commands were still being sent to the device.

Any advice?

-Joe
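
As a reading aid for the "Oops: 0002" line in the trace above, here is a
minimal C sketch that decodes the low bits of an x86 page-fault error code.
It assumes the number printed after "Oops:" is that error code in hex; struct
fault_info and decode_fault() are hypothetical names for illustration only.

```c
#include <stdbool.h>

/* Hypothetical helper type holding the three lowest error-code bits. */
struct fault_info {
    bool protection;  /* bit 0: set = protection violation, clear = page not present */
    bool write;       /* bit 1: set = write access, clear = read access */
    bool user_mode;   /* bit 2: set = fault in user mode, clear = kernel mode */
};

static struct fault_info decode_fault(unsigned long error_code)
{
    struct fault_info f;

    f.protection = (error_code & 0x1) != 0;
    f.write      = (error_code & 0x2) != 0;
    f.user_mode  = (error_code & 0x4) != 0;
    return f;
}
```

Under that assumption, decode_fault(0x0002) describes a write to a
not-present page while in kernel mode, which matches the "NULL pointer
dereference" message in the log.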