Re: PROBLEM: null pointer dereference in cfq_dispatch_requests (2.6.21-rc2 and 2.6.20)

2007-03-22 Thread Johannes Weiner
Hi,

On Thu, Mar 22, 2007 at 07:42:35PM +0100, Jens Axboe wrote:
> > diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
> > index b6491c0..ca84f0b 100644
> > --- a/block/cfq-iosched.c
> > +++ b/block/cfq-iosched.c
> > @@ -961,8 +961,8 @@ __cfq_dispatch_requests(struct cfq_data *cfqd, struct cfq_queue *cfqq,
> >  		/*
> >  		 * follow expired path, else get first next available
> >  		 */
> > -		if ((rq = cfq_check_fifo(cfqq)) == NULL)
> > -			rq = cfqq->next_rq;
> > +		if (!(rq = cfq_check_fifo(cfqq)) && !(rq = cfqq->next_rq))
> > +			break;
> 
> That still only hides a bug. It is illegal for ->next_rq to be NULL
> while the RB tree is non-empty.

As I noticed afterwards, this isn't even the point where the NULL pointer
is dereferenced.  It must be in the next line of code: cfqd->queue or
cfqd->queue->elevator was NULL when the oops occurred.  Or am I wrong?

'Hannes
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: PROBLEM: null pointer dereference in cfq_dispatch_requests (2.6.21-rc2 and 2.6.20)

2007-03-22 Thread Jens Axboe
On Wed, Mar 21 2007, Johannes Weiner wrote:
> Hi,
> 
> I think I found where the NULL may come from.  Please, anybody, do not
> apply this patch before a trusted person has reviewed it... Jens? ;)
> 
> My thinking is that there are two ways cfqq->next_rq could be NULL:
> end of list, or a bug when it is set (or not set).
> But why does RB_EMPTY_ROOT(), the last call in this loop, not trigger?
> 
> Did I even get the right place on where the NULL pointer dereference
> happens? :)
> 
> =Hannes
> 
> Signed-off-by: Johannes Weiner <[EMAIL PROTECTED]>

> diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
> index b6491c0..ca84f0b 100644
> --- a/block/cfq-iosched.c
> +++ b/block/cfq-iosched.c
> @@ -961,8 +961,8 @@ __cfq_dispatch_requests(struct cfq_data *cfqd, struct cfq_queue *cfqq,
>  		/*
>  		 * follow expired path, else get first next available
>  		 */
> -		if ((rq = cfq_check_fifo(cfqq)) == NULL)
> -			rq = cfqq->next_rq;
> +		if (!(rq = cfq_check_fifo(cfqq)) && !(rq = cfqq->next_rq))
> +			break;

That still only hides a bug. It is illegal for ->next_rq to be NULL
while the RB tree is non-empty.


-- 
Jens Axboe



Re: PROBLEM: null pointer dereference in cfq_dispatch_requests (2.6.21-rc2 and 2.6.20)

2007-03-22 Thread Johannes Weiner
Hi,

On Wed, Mar 21, 2007 at 08:04:00PM +0100, Johannes Weiner wrote:
> Did I even get the right place on where the NULL pointer dereference
> happens? :)

No, I did not. Sorry for the noise.

=Hannes


Re: PROBLEM: null pointer dereference in cfq_dispatch_requests (2.6.21-rc2 and 2.6.20)

2007-03-22 Thread Dale Blount
On Wed, 2007-03-21 at 20:59 +0100, Jens Axboe wrote:
> On Wed, Mar 21 2007, Dale Blount wrote:
> > On Wed, 2007-03-21 at 14:09 -0400, Chuck Ebbert wrote:
> > > Dale Blount wrote:
> > > >> I'm puzzled why this is hitting Dan, but no one else has reported
> > > >> anything. Dan, did 2.6.19 work for you?
> > > > 
> > > > Actually, I believe it is happening to me too.  This is on a 4-disk 
> > > > raid5 with
> > > > one failed disk on two 2-port sata_sil pci controller cards.
> > > > 
> > > > The BUG below is from 2.6.20.3, but I will try 2.6.21-rc4 tonight.
> > > > The disk didn't fail until 2.6.20, so I don't know if 2.6.19 would have 
> > > > worked
> > > > or not.
> > > > 
> > > > 
> > > 
> > > Just to be clear: you have a RAID array (what level?) with a failed disk
> > > and it's giving you the same error that Dan gets?
> > 
> > 
> > Yes, the BUG looks to be pretty similar to me.  It's a 4 disk raid5 with
> > one disk missing (dead and sent in for RMA).
> 
> Interesting, I'll definitely see if I can reproduce it like that.
> 

FWIW, the server made it through a round using 2.6.21-rc4 without
errors.  2.6.20.x failed 5 times straight and worked 4 times prior to
that (also with a missing disk), so it could be a coincidence too.

The disk is back from RMA, do you need me to hold off replacing it?

Thanks,

Dale
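
Dale's caveat about coincidence can be put into rough numbers. The sketch below assumes that test runs are independent and that the 2.6.20.x failure rate is about 5 in 9, which is only an estimate from the nine runs he mentions. Under those assumptions a single clean run has roughly a 44% chance of happening by luck, so one pass on 2.6.21-rc4 says little. The function name and the 5% threshold used in the example are illustrative choices, not something from the thread.

```c
#include <assert.h>

/*
 * Smallest number of consecutive clean runs after which the chance of
 * having seen no failure purely by luck drops below `alpha`, assuming
 * every run fails independently with probability `p_fail`.
 * Returns -1 if either argument is outside the open interval (0, 1).
 */
int clean_runs_needed(double p_fail, double alpha)
{
	double p_all_pass = 1.0;
	int runs = 0;

	if (p_fail <= 0.0 || p_fail >= 1.0 || alpha <= 0.0 || alpha >= 1.0)
		return -1;

	while (p_all_pass >= alpha) {
		p_all_pass *= 1.0 - p_fail;	/* one more run passes by luck */
		runs++;
	}

	assert(runs > 0);
	return runs;
}
```

With p_fail = 5.0/9.0 and alpha = 0.05 this returns 4, since (4/9)^3 is about 0.088 and (4/9)^4 is about 0.039.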



Re: PROBLEM: null pointer dereference in cfq_dispatch_requests (2.6.21-rc2 and 2.6.20)

2007-03-21 Thread Jens Axboe
On Wed, Mar 21 2007, Dale Blount wrote:
> On Wed, 2007-03-21 at 14:09 -0400, Chuck Ebbert wrote:
> > Dale Blount wrote:
> > >> I'm puzzled why this is hitting Dan, but no one else has reported
> > >> anything. Dan, did 2.6.19 work for you?
> > > 
> > > Actually, I believe it is happening to me too.  This is on a 4-disk raid5 
> > > with
> > > one failed disk on two 2-port sata_sil pci controller cards.
> > > 
> > > The BUG below is from 2.6.20.3, but I will try 2.6.21-rc4 tonight.
> > > The disk didn't fail until 2.6.20, so I don't know if 2.6.19 would have 
> > > worked
> > > or not.
> > > 
> > > 
> > 
> > Just to be clear: you have a RAID array (what level?) with a failed disk
> > and it's giving you the same error that Dan gets?
> 
> 
> Yes, the BUG looks to be pretty similar to me.  It's a 4 disk raid5 with
> one disk missing (dead and sent in for RMA).

Interesting, I'll definitely see if I can reproduce it like that.

-- 
Jens Axboe



Re: PROBLEM: null pointer dereference in cfq_dispatch_requests (2.6.21-rc2 and 2.6.20)

2007-03-21 Thread Johannes Weiner
Hi,

I think I found where the NULL may come from.  Please, anybody, do not
apply this patch before a trusted person has reviewed it... Jens? ;)

My thinking is that there are two ways cfqq->next_rq could be NULL:
end of list, or a bug when it is set (or not set).
But why does RB_EMPTY_ROOT(), the last call in this loop, not trigger?

Did I even get the right place on where the NULL pointer dereference
happens? :)

=Hannes

Signed-off-by: Johannes Weiner <[EMAIL PROTECTED]>
diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index b6491c0..ca84f0b 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -961,8 +961,8 @@ __cfq_dispatch_requests(struct cfq_data *cfqd, struct cfq_queue *cfqq,
 		/*
 		 * follow expired path, else get first next available
 		 */
-		if ((rq = cfq_check_fifo(cfqq)) == NULL)
-			rq = cfqq->next_rq;
+		if (!(rq = cfq_check_fifo(cfqq)) && !(rq = cfqq->next_rq))
+			break;
 
 		/*
 		 * finally, insert request into driver dispatch list
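
For readers unsure what the combined condition in the patch above does, here is a minimal user-space sketch of the selection order. It relies on C's short-circuit evaluation: the FIFO candidate is tried first, ->next_rq is consulted only when the FIFO yields nothing, and the loop is left when both are NULL. `struct request` and `pick_request()` are simplified stand-ins invented for this illustration, with the two arguments playing the roles of cfq_check_fifo(cfqq) and cfqq->next_rq. The sketch only shows the control flow and says nothing about whether breaking out is the right response.

```c
#include <assert.h>
#include <stddef.h>

struct request { int id; };	/* stand-in, not the kernel's struct request */

/*
 * Mirrors the selection order of the patched condition:
 *   if (!(rq = <fifo candidate>) && !(rq = <next_rq>)) break;
 * Returns the chosen request, or NULL where the patched loop would break.
 */
struct request *pick_request(struct request *fifo_rq, struct request *next_rq)
{
	struct request *rq;

	/* && short-circuits: next_rq is only read when fifo_rq is NULL */
	if (!(rq = fifo_rq) && !(rq = next_rq))
		return NULL;

	assert(rq != NULL);
	return rq;
}
```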


Re: PROBLEM: null pointer dereference in cfq_dispatch_requests (2.6.21-rc2 and 2.6.20)

2007-03-21 Thread Chuck Ebbert
Dale Blount wrote:
> On Wed, 2007-03-21 at 14:09 -0400, Chuck Ebbert wrote:
>> Dale Blount wrote:
 I'm puzzled why this is hitting Dan, but no one else has reported
 anything. Dan, did 2.6.19 work for you?
>>> Actually, I believe it is happening to me too.  This is on a 4-disk raid5 
>>> with
>>> one failed disk on two 2-port sata_sil pci controller cards.
>>>
>>> The BUG below is from 2.6.20.3, but I will try 2.6.21-rc4 tonight.
>>> The disk didn't fail until 2.6.20, so I don't know if 2.6.19 would have 
>>> worked
>>> or not.
>>>
>>>
>> Just to be clear: you have a RAID array (what level?) with a failed disk
>> and it's giving you the same error that Dan gets?
> 
> 
> Yes, the BUG looks to be pretty similar to me.  It's a 4 disk raid5 with
> one disk missing (dead and sent in for RMA).
> 

OK, I'm adding NeilB to the cc:

(Dan has a RAID6 array with two failed disks.)




Re: PROBLEM: null pointer dereference in cfq_dispatch_requests (2.6.21-rc2 and 2.6.20)

2007-03-21 Thread Dale Blount
On Wed, 2007-03-21 at 14:09 -0400, Chuck Ebbert wrote:
> Dale Blount wrote:
> >> I'm puzzled why this is hitting Dan, but no one else has reported
> >> anything. Dan, did 2.6.19 work for you?
> > 
> > Actually, I believe it is happening to me too.  This is on a 4-disk raid5 
> > with
> > one failed disk on two 2-port sata_sil pci controller cards.
> > 
> > The BUG below is from 2.6.20.3, but I will try 2.6.21-rc4 tonight.
> > The disk didn't fail until 2.6.20, so I don't know if 2.6.19 would have 
> > worked
> > or not.
> > 
> > 
> 
> Just to be clear: you have a RAID array (what level?) with a failed disk
> and it's giving you the same error that Dan gets?


Yes, the BUG looks to be pretty similar to me.  It's a 4 disk raid5 with
one disk missing (dead and sent in for RMA).

Dale



Re: PROBLEM: null pointer dereference in cfq_dispatch_requests (2.6.21-rc2 and 2.6.20)

2007-03-21 Thread Chuck Ebbert
Dale Blount wrote:
>> I'm puzzled why this is hitting Dan, but no one else has reported
>> anything. Dan, did 2.6.19 work for you?
> 
> Actually, I believe it is happening to me too.  This is on a 4-disk raid5 with
> one failed disk on two 2-port sata_sil pci controller cards.
> 
> The BUG below is from 2.6.20.3, but I will try 2.6.21-rc4 tonight.
> The disk didn't fail until 2.6.20, so I don't know if 2.6.19 would have worked
> or not.
> 
> 

Just to be clear: you have a RAID array (what level?) with a failed disk
and it's giving you the same error that Dan gets?



Re: PROBLEM: null pointer dereference in cfq_dispatch_requests (2.6.21-rc2 and 2.6.20)

2007-03-21 Thread Dale Blount
> I'm puzzled why this is hitting Dan, but no one else has reported
> anything. Dan, did 2.6.19 work for you?

Actually, I believe it is happening to me too.  This is on a 4-disk raid5 with
one failed disk on two 2-port sata_sil pci controller cards.

The BUG below is from 2.6.20.3, but I will try 2.6.21-rc4 tonight.
The disk didn't fail until 2.6.20, so I don't know if 2.6.19 would have worked
or not.


BUG: unable to handle kernel NULL pointer dereference at virtual address 005c
 printing eip:
*pde = 
Oops:  [#1]
PREEMPT SMP 
Modules linked in: ipv6 raid456 xor md_mod e1000 usbcore ext3 jbd mbcache
sata_sil sd_mod sr_mod cdrom generic ide_core ata_piix libata
CPU:    0
EIP:    0060:[<c0220a29>]    Not tainted VLI
EFLAGS: 00010086   (2.6.20-ARCH #1)
EIP is at cfq_dispatch_insert+0x19/0x70
eax: c2bc1340   ebx:    ecx: 0049   edx: 
esi: c298802c   edi: f7c17cc0   ebp: 00e47352   esp: f18addfc
ds: 007b   es: 007b   ss: 0068
Process rsync (pid: 3698, ti=f18ac000 task=dfe97570 task.ti=f18ac000)
Stack:  c298ce3c f7c17cc0 00e47352 c0220bd9 0040 c2b3cd40 c2b3c4a8 
   cafa6500 c0294750   0004 d8ed1480  cafa6500 
   c298802c c29f0800 c2b3c000 c298802c c0215690  c0294750 cafa6500 
Call Trace:
 [<c0220bd9>] cfq_dispatch_requests+0xf9/0x4e0
 [<c0294750>] scsi_done+0x0/0x20
 [<c0215690>] elv_next_request+0x20/0x1d0
 [<c0294750>] scsi_done+0x0/0x20
 [<c0294ab6>] scsi_dispatch_cmd+0x146/0x230
 [<c0299904>] scsi_request_fn+0x194/0x2d0
 [<c0219518>] blk_run_queue+0x58/0x70
 [<c0298460>] scsi_next_command+0x30/0x50
 [<c029866b>] scsi_end_request+0xab/0xe0
 [<c0298776>] scsi_io_completion+0x86/0x370
 [<f882c770>] ata_hsm_qc_complete+0x90/0x110 [libata]
 [<f88127eb>] sd_rw_intr+0x2b/0x200 [sd_mod]
 [<c02945a9>] scsi_finish_command+0x49/0x60
 [<c0219d88>] blk_done_softirq+0x58/0x70
 [<c012bc12>] __do_softirq+0x82/0xf0
 [<c012bcb7>] do_softirq+0x37/0x40
 [<c012bee5>] irq_exit+0x45/0x50
 [<c0105dc5>] do_IRQ+0x45/0x80
 [<c0103c0f>] common_interrupt+0x23/0x28
 ===
Code: ae 40 ee ff e9 25 ff ff ff 0f 0b eb fe 0f 0b eb fe 90 83 ec 10 89 1c 24 89
d3 89 74 24 04 89 c6 89 7c 24 08 89 6c 24 0c 8b 40 0c <8b> 7a 5c 8b 68 04 89 d0
e8 7a fe ff ff 8b 43 14 89 da 25 01 80
EIP: [<c0220a29>] cfq_dispatch_insert+0x19/0x70 SS:ESP 0068:f18addfc
 <0>Kernel panic - not syncing: Fatal exception in interrupt
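
A note on reading the Code: line of the oops above. The byte marked <8b> is the start of the faulting instruction. On 32-bit x86, opcode 8b is a register load (mov r32, r/m32), the following byte 7a is its ModRM byte, and 5c is an 8-bit displacement. Decoding 7a gives mod=1, reg=7 (edi), rm=2 (edx), so the instruction is a load from edx+0x5c into edi, which agrees with the reported fault address 005c if edx held 0. The small decoder below is only a sketch for checking that arithmetic; the struct and function names are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* The three fields packed into an x86 ModRM byte. */
struct modrm {
	unsigned mod;	/* addressing mode, bits 7..6 */
	unsigned reg;	/* register operand, bits 5..3 */
	unsigned rm;	/* base register or memory form, bits 2..0 */
};

/* 32-bit general-purpose register names in x86 encoding order. */
static const char *const reg32[8] = {
	"eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"
};

/* Split a ModRM byte into its mod, reg and rm fields. */
struct modrm decode_modrm(uint8_t byte)
{
	struct modrm m;

	m.mod = (byte >> 6) & 0x3;
	m.reg = (byte >> 3) & 0x7;
	m.rm = byte & 0x7;
	return m;
}

/* Name of the 32-bit register with encoding number n (0..7). */
const char *reg32_name(unsigned n)
{
	assert(n < 8);
	return reg32[n];
}
```

For the byte 7a, mod == 1 means "base register plus 8-bit displacement", so `8b 7a 5c` reads as mov 0x5c(%edx),%edi in AT&T syntax.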




Re: PROBLEM: null pointer dereference in cfq_dispatch_requests (2.6.21-rc2 and 2.6.20)

2007-03-01 Thread Dan Williams

On 3/1/07, Jens Axboe <[EMAIL PROTECTED]> wrote:
> On Thu, Mar 01 2007, Frank Seidel wrote:
> > On Wednesday, 28 February 2007 19:02, Dan Williams wrote:
> > > I can reliably reproduce a null pointer dereference on 2.6.20 and
> > > 2.6.21-rc2.  I will keep digging to find the kernel version where
> > > this last worked, but wanted to see if there were any immediate
> > > experiments I should try.
> > > ...
> > > Kernel 2.6.21-rc2 on an i686
> > > ...
> > > [  431.709022] BUG: unable to handle kernel NULL pointer dereference
> > > at virtual address 005c [  431.717993]  printing eip:
> > > ...
> > > [  431.825386] EIP is at cfq_dispatch_insert+0xb/0x53
> > > ...
> > > [  431.887396]  [] cfq_dispatch_requests+0x138/0x3f0
> > Hi,
> > unfortunately i yet don't really have much/enough knowledge of cfq and
> > the kernels inwards at the moment...
> > but looking at cfq_dispatch_insert+0xb it seems the struct request
> > pointer given (as second parameter by cfq_dispatch_request) was NULL
> > and dereferencing it in the RQ_CFQQ macro leads to this oops.
> >
> > The "break"-out patch below for __cfq_dispatch_request might be at least
> > a possible workaround for this, but it could also be total bullsh..
> > Perhaps someone smarter might pick this up.. and give a real fix.
> >
> > Have fun,
> > Frank
> > ---
> >
> >  block/cfq-iosched.c |3 ++-
> >  1 files changed, 2 insertions(+), 1 deletion(-)
> >
> > Index: linux-2.6/block/cfq-iosched.c
> > ===
> > --- linux-2.6.orig/block/cfq-iosched.c
> > +++ linux-2.6/block/cfq-iosched.c
> > @@ -962,7 +962,8 @@ __cfq_dispatch_requests(struct cfq_data
> >  * follow expired path, else get first next available
> >  */
> > if ((rq = cfq_check_fifo(cfqq)) == NULL)
> > -   rq = cfqq->next_rq;
> > +   if ((rq = cfqq->next_rq) == NULL)
> > +   break;
> >
> > /*
> >  * finally, insert request into driver dispatch list
>
> That is not the right fix. A little further up in this function, a check
> (well BUG_ON()) is done for a non-empty sort list. So we know at this
> point, that we have requests pending for this queue. When that is the
> case, ->next_rq must always be kept uptodate and non-NULL. The oops at
> least tells us this, it should not be papered around. The real fix is
> finding out _where_ this now isn't being updated.
>
> I'm puzzled why this is hitting Dan, but no one else has reported
> anything. Dan, did 2.6.19 work for you?

I am puzzled as well, although I do not think many people run raid6
arrays with two failed disks, so it might be an under-tested path, but a
non-degraded array runs fine...

I fired up a 2.6.19 kernel and tiobench ran past the point (in terms
of time) where it had failed on .20 and .21-rc.  However I noticed
things were running much slower since the cpu optimizations had fallen
back to Pentium-Pro from Core2 which affects the raid6 p+q calculation
speed among other things.  So I need to re-baseline the failure
against a more common config to say whether it is actually gone in
2.6.19.

I should have time to try these tests next week.

> --
> Jens Axboe


Regards,
Dan


Re: PROBLEM: null pointer dereference in cfq_dispatch_requests (2.6.21-rc2 and 2.6.20)

2007-03-01 Thread Jens Axboe
On Thu, Mar 01 2007, Frank Seidel wrote:
> On Wednesday, 28 February 2007 19:02, Dan Williams wrote:
> > I can reliably reproduce a null pointer dereference on 2.6.20 and
> > 2.6.21-rc2.  I will keep digging to find the kernel version where
> > this last worked, but wanted to see if there were any immediate
> > experiments I should try.
> > ...
> > Kernel 2.6.21-rc2 on an i686
> > ...
> > [  431.709022] BUG: unable to handle kernel NULL pointer dereference
> > at virtual address 005c [  431.717993]  printing eip:
> > ...
> > [  431.825386] EIP is at cfq_dispatch_insert+0xb/0x53
> > ...
> > [  431.887396]  [] cfq_dispatch_requests+0x138/0x3f0
> Hi,
> unfortunately i yet don't really have much/enough knowledge of cfq and 
> the kernels inwards at the moment...
> but looking at cfq_dispatch_insert+0xb it seems the struct request 
> pointer given (as second parameter by cfq_dispatch_request) was NULL 
> and dereferencing it in the RQ_CFQQ macro leads to this oops.
> 
> The "break"-out patch below for __cfq_dispatch_request might be at least 
> a possible workaround for this, but it could also be total bullsh.. 
> Perhaps someone smarter might pick this up.. and give a real fix.
> 
> Have fun,
> Frank
> ---
> 
>  block/cfq-iosched.c |3 ++-
>  1 files changed, 2 insertions(+), 1 deletion(-)
> 
> Index: linux-2.6/block/cfq-iosched.c
> ===
> --- linux-2.6.orig/block/cfq-iosched.c
> +++ linux-2.6/block/cfq-iosched.c
> @@ -962,7 +962,8 @@ __cfq_dispatch_requests(struct cfq_data
>  * follow expired path, else get first next available
>  */
> if ((rq = cfq_check_fifo(cfqq)) == NULL)
> -   rq = cfqq->next_rq;
> +   if ((rq = cfqq->next_rq) == NULL)
> +   break;
> 
> /*
>  * finally, insert request into driver dispatch list

That is not the right fix. A little further up in this function, a check
(well, a BUG_ON()) is done for a non-empty sort list. So we know at this
point that we have requests pending for this queue. When that is the
case, ->next_rq must always be kept up to date and non-NULL. The oops at
least tells us this; it should not be papered over. The real fix is
finding out _where_ this now isn't being updated.

I'm puzzled why this is hitting Dan, but no one else has reported
anything. Dan, did 2.6.19 work for you?

-- 
Jens Axboe
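
The rule Jens describes above (a non-empty sort list implies a non-NULL ->next_rq) can be written down as a checkable invariant. The sketch below is a toy user-space model, not CFQ code: a singly linked list stands in for the sort tree, and `struct req`, `struct queue` and all function names are invented for the illustration. It contrasts a removal that keeps the cached pointer in step with the list against one that forgets to, which is the kind of missed update Jens asks people to hunt for. The thread itself does not identify where the real update is missed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct req { struct req *next; };	/* hypothetical stand-in for a request */

struct queue {
	struct req *sorted;	/* stand-in for the sort tree, NULL means empty */
	struct req *next_rq;	/* cached "next request to dispatch" */
};

/* Non-empty sort list implies next_rq != NULL. */
bool queue_invariant_holds(const struct queue *q)
{
	return q->sorted == NULL || q->next_rq != NULL;
}

/* Correct removal of the head: the cache is repointed with the list. */
void remove_head(struct queue *q)
{
	struct req *head = q->sorted;

	assert(head != NULL);
	q->sorted = head->next;
	if (q->next_rq == head)
		q->next_rq = q->sorted;
}

/* Buggy removal: the cache is cleared but never repointed. */
void remove_head_buggy(struct queue *q)
{
	struct req *head = q->sorted;

	assert(head != NULL);
	q->sorted = head->next;
	if (q->next_rq == head)
		q->next_rq = NULL;
}
```

Checking such an invariant right after every update, rather than only at dispatch time, is one way to catch the offending site instead of the later crash.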



Re: PROBLEM: null pointer dereference in cfq_dispatch_requests (2.6.21-rc2 and 2.6.20)

2007-03-01 Thread Frank Seidel
On Wednesday, 28 February 2007 19:02, Dan Williams wrote:
> I can reliably reproduce a null pointer dereference on 2.6.20 and
> 2.6.21-rc2.  I will keep digging to find the kernel version where
> this last worked, but wanted to see if there were any immediate
> experiments I should try.
> ...
> Kernel 2.6.21-rc2 on an i686
> ...
> [  431.709022] BUG: unable to handle kernel NULL pointer dereference
> at virtual address 005c [  431.717993]  printing eip:
> ...
> [  431.825386] EIP is at cfq_dispatch_insert+0xb/0x53
> ...
> [  431.887396]  [] cfq_dispatch_requests+0x138/0x3f0
Hi,
Unfortunately I don't yet know enough about cfq and the kernel's
internals...
but looking at cfq_dispatch_insert+0xb, it seems the struct request
pointer passed (as the second parameter, by cfq_dispatch_requests) was
NULL, and dereferencing it in the RQ_CFQQ macro leads to this oops.

The "break"-out patch below for __cfq_dispatch_requests might be at least
a possible workaround for this, but it could also be total bullsh..
Perhaps someone smarter might pick this up.. and give a real fix.

Have fun,
Frank
---

 block/cfq-iosched.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletion(-)

Index: linux-2.6/block/cfq-iosched.c
===================================================================
--- linux-2.6.orig/block/cfq-iosched.c
+++ linux-2.6/block/cfq-iosched.c
@@ -962,7 +962,8 @@ __cfq_dispatch_requests(struct cfq_data
 		 * follow expired path, else get first next available
 		 */
 		if ((rq = cfq_check_fifo(cfqq)) == NULL)
-			rq = cfqq->next_rq;
+			if ((rq = cfqq->next_rq) == NULL)
+				break;
 
 		/*
 		 * finally, insert request into driver dispatch list


Re: PROBLEM: null pointer dereference in cfq_dispatch_requests (2.6.21-rc2 and 2.6.20)

2007-03-01 Thread Dan Williams

On 3/1/07, Jens Axboe <[EMAIL PROTECTED]> wrote:
> On Thu, Mar 01 2007, Frank Seidel wrote:
> > On Wednesday, 28 February 2007 19:02, Dan Williams wrote:
> > > I can reliably reproduce a null pointer dereference on 2.6.20 and
> > > 2.6.21-rc2.  I will keep digging to find the kernel version where
> > > this last worked, but wanted to see if there were any immediate
> > > experiments I should try.
> > > ...
> > > Kernel 2.6.21-rc2 on an i686
> > > ...
> > > [  431.709022] BUG: unable to handle kernel NULL pointer dereference
> > > at virtual address 005c [  431.717993]  printing eip:
> > > ...
> > > [  431.825386] EIP is at cfq_dispatch_insert+0xb/0x53
> > > ...
> > > [  431.887396]  [<c01e1fc9>] cfq_dispatch_requests+0x138/0x3f0
> > Hi,
> > Unfortunately I don't yet know enough about cfq and the kernel's
> > internals...
> > but looking at cfq_dispatch_insert+0xb, it seems the struct request
> > pointer passed (as the second parameter, by cfq_dispatch_requests) was
> > NULL, and dereferencing it in the RQ_CFQQ macro leads to this oops.
> >
> > The "break"-out patch below for __cfq_dispatch_requests might be at least
> > a possible workaround for this, but it could also be total bullsh..
> > Perhaps someone smarter might pick this up.. and give a real fix.
> >
> > Have fun,
> > Frank
> > ---
> >
> >  block/cfq-iosched.c |    3 ++-
> >  1 files changed, 2 insertions(+), 1 deletion(-)
> >
> > Index: linux-2.6/block/cfq-iosched.c
> > ===================================================================
> > --- linux-2.6.orig/block/cfq-iosched.c
> > +++ linux-2.6/block/cfq-iosched.c
> > @@ -962,7 +962,8 @@ __cfq_dispatch_requests(struct cfq_data
> >  		 * follow expired path, else get first next available
> >  		 */
> >  		if ((rq = cfq_check_fifo(cfqq)) == NULL)
> > -			rq = cfqq->next_rq;
> > +			if ((rq = cfqq->next_rq) == NULL)
> > +				break;
> >
> >  		/*
> >  		 * finally, insert request into driver dispatch list
>
> That is not the right fix. A little further up in this function, a check
> (well, a BUG_ON()) is done for a non-empty sort list. So we know at this
> point that we have requests pending for this queue. When that is the
> case, ->next_rq must always be kept up to date and non-NULL. The oops at
> least tells us this; it should not be papered over. The real fix is
> finding out _where_ this now isn't being updated.
>
> I'm puzzled why this is hitting Dan, but no one else has reported
> anything. Dan, did 2.6.19 work for you?


I am puzzled as well, although I do not think many people run raid6
arrays with two failed disks, so it might be an under-tested path; a
non-degraded array runs fine.

I fired up a 2.6.19 kernel and tiobench ran past the point (in terms
of time) where it had failed on .20 and .21-rc.  However, I noticed
things were running much slower since the CPU optimizations had fallen
back to Pentium-Pro from Core2, which affects the raid6 p+q calculation
speed among other things.  So I need to re-baseline the failure
against a more common config to say whether it is actually gone in
2.6.19.

I should have time to try these tests next week.


> --
> Jens Axboe


Regards,
Dan


Re: PROBLEM: null pointer dereference in cfq_dispatch_requests (2.6.21-rc2 and 2.6.20)

2007-02-28 Thread Chuck Ebbert
Chuck Ebbert wrote:
> There are two patches for raid5/6 out there that might fix this. I'll
> attach them (the second just fixes a minor bug in the first one.)

Never mind, those patches are already in 2.6.21-rc.



Re: PROBLEM: null pointer dereference in cfq_dispatch_requests (2.6.21-rc2 and 2.6.20)

2007-02-28 Thread Chuck Ebbert
Dan Williams wrote:
> I can reliably reproduce a null pointer dereference on 2.6.20 and
> 2.6.21-rc2.  I will keep digging to find the kernel version where this
> last worked, but wanted to see if there were any immediate experiments I
> should try.
> 
> The failure is caused by running tiobench on a MD raid6 array with 6 out
> of 8 disks available.  The commands I issued to reproduce this are:
> 
>   mdadm -A /dev/md0 /dev/sd[bcdefg]
>   mount /dev/md0 /mnt/raid
>   tiobench --numruns 5 --size 2048 --dir /mnt/raid
> 
> The filesystem is ext3.  The controller is an LSI 1068.  Here are the
> two BUG messages first 2.6.21-rc2 followed by 2.6.20.  I will reply to
> this message with the config.
> Kernel 2.6.20 on an i686
> 
> [  177.299787] BUG: unable to handle kernel NULL pointer dereference at 
> virtual address 005c
> [  177.308526]  printing eip:
> [  177.311287] c01de510
> [  177.313521] *pde = 34d40001
> [  177.316353] Oops:  [#1]
> [  177.319202] SMP 
> [  177.321107] Modules linked in: raid456 xor nfsd exportfs lockd nfs_acl 
> sunrpc autofs4 hidp l2cap bluetooth iptable_raw xt_policy xt_multiport 
> ipt_ULOG ipt_TTL ipt_ttl ipt_TOS ipt_tos ipt_SAME ipt_REJECT ipt_REDIRECT 
> ipt_recent ipt_owner ipt_NETMAP ipt_MASQUERADE ipt_LOG ipt_iprange ipt_ECN 
> ipt_ecn ipt_CLUSTERIP ipt_ah ipt_addrtype xt_tcpmss xt_pkttype xt_physdev 
> xt_NFQUEUE xt_MARK xt_mark xt_mac xt_limit xt_length xt_helper xt_dccp 
> xt_conntrack xt_CONNMARK xt_connmark xt_CLASSIFY xt_tcpudp xt_state 
> iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack iptable_mangle nfnetlink 
> iptable_filter ip_tables x_tables video sbs i2c_ec dock button battery 
> asus_acpi ac radeon drm ipv6 lp parport_pc parport e1000 uhci_hcd floppy 
> mptsas mptscsih mptbase sg ehci_hcd scsi_transport_sas i2c_i801 i2c_core 
> pcspkr dm_snapshot dm_zero dm_mirror dm_mod ata_piix ata_generic libata 
> sd_mod scsi_mod ext3 jbd
> [  177.402252] CPU:2
> [  177.402253] EIP:0060:[<c01de510>] Not tainted VLI
> [  177.402253] EFLAGS: 00210016   (2.6.20 #5)
> [  177.414194] EIP is at cfq_dispatch_insert+0xb/0x53
> [  177.419056] eax: f7773ec0   ebx:    ecx: f7773cc0   edx: 
> [  177.425982] esi: f70abae0   edi: f7773cc0   ebp:    esp: f34dbcbc
> [  177.432953] ds: 007b   es: 007b   ss: 0068
> [  177.437127] Process tiotest (pid: 5405, ti=f34db000 task=f7efc030 
> task.ti=f34db000)
> [  177.444763] Stack: 0049 f77d3b9c f7773cc0  c01de6ce c014041e 
> f8a26806 0082 
> [  177.453456]f7efc030 fffe22d6    0004 
> f7efc030 f7773cc0 
> [  177.462121]   f70abae0 f7cd5800 f70abae0 
> c01d4fcc 0001 
> [  177.470798] Call Trace:
> [  177.473503]  [<c01de6ce>] cfq_dispatch_requests+0x12d/0x466
> [  177.479223]  [<c014041e>] __lock_acquire+0x9e9/0xa72
> [  177.484285]  [<f8a26806>] scsi_request_fn+0x286/0x336 [scsi_mod]
> [  177.490485]  [<c01d4fcc>] elv_next_request+0x1a2/0x1b2
> [  177.495766]  [<f8a26806>] scsi_request_fn+0x286/0x336 [scsi_mod]
> [  177.501912]  [<c0315ba8>] _spin_lock_irq+0x38/0x43
> [  177.506840]  [<f8a265d9>] scsi_request_fn+0x59/0x336 [scsi_mod]
> [  177.512981]  [<c01d7e7d>] blk_remove_plug+0x5a/0x66
> [  177.517983]  [<c01d7ea6>] __generic_unplug_device+0x1d/0x1f
> [  177.523705]  [<c01d8278>] generic_unplug_device+0x15/0x21
> [  177.529272]  [<f97ee054>] unplug_slaves+0x54/0x88 [raid456]
> [  177.535013]  [<c01d997a>] blk_backing_dev_unplug+0x73/0x7b
> [  177.540657]  [<c0315d82>] _spin_unlock_irqrestore+0x3e/0x4d
> [  177.546382]  [<c0154b26>] sync_page+0x0/0x3b
> [  177.550774]  [<c013f5f4>] trace_hardirqs_on+0x12e/0x158
> [  177.556108]  [<c0154b26>] sync_page+0x0/0x3b
> [  177.560471]  [<c018caa5>] block_sync_page+0x31/0x32
> [  177.565449]  [<c0154b59>] sync_page+0x33/0x3b
> [  177.569916]  [<c0313d9e>] __wait_on_bit_lock+0x2a/0x52
> [  177.575201]  [<c0154b18>] __lock_page+0x58/0x5e
> [  177.579810]  [<c0139612>] wake_bit_function+0x0/0x3c
> [  177.584905]  [<c0155228>] do_generic_mapping_read+0x1db/0x44f
> [  177.590911]  [<c01570cb>] generic_file_aio_read+0x173/0x1a4
> [  177.596617]  [<c0154930>] file_read_actor+0x0/0xdb
> [  177.601525]  [<c0171b47>] do_sync_read+0xc7/0x10a
> [  177.606365]  [<c01395dd>] autoremove_wake_function+0x0/0x35
> [  177.612130]  [<c0171a80>] do_sync_read+0x0/0x10a
> [  177.616867]  [<c01723ce>] vfs_read+0xa6/0x152
> [  177.621362]  [<c0172830>] sys_read+0x41/0x67
> [  177.625794]  [<c0103e24>] syscall_call+0x7/0xb
> [  177.630403]  ===
> [  177.634031] Code: da 11 3b c0 c7 04 24 51 9d 39 c0 e8 c9 a1 f4 ff e8 ca 6e 
> f2 ff ff 4f 34 83 c4 18 5b 5e 5f 5d c3 55 57 56 89 c6 53 8b 40 0c 89 d3 <8b> 
> 7a 5c 8b 68 04 89 d0 e8 b5 fe ff ff 8b 43 14 89 da 25 01 80 
> [  177.654378] EIP: [<c01de510>] cfq_dispatch_insert+0xb/0x53 SS:ESP 
> 0068:f34dbcbc

cfq_dispatch_requests() has called cfq_dispatch_insert() with a NULL
second argument (struct request *rq)

There are two patches for raid5/6 out there that might fix this. I'll
attach them (the second just fixes a minor bug in the first one.)

From: Neil Brown <[EMAIL PROTECTED]>

On Sunday February 11, [EMAIL PROTECTED] wrote:
> > Greetings,
> > 
> > I've been running md on my server for some time now and a few days ago one 
> > of

PROBLEM: null pointer dereference in cfq_dispatch_requests (2.6.21-rc2 and 2.6.20)

2007-02-28 Thread Dan Williams
I can reliably reproduce a null pointer dereference on 2.6.20 and
2.6.21-rc2.  I will keep digging to find the kernel version where this
last worked, but wanted to see if there were any immediate experiments I
should try.

The failure is caused by running tiobench on a MD raid6 array with 6 out
of 8 disks available.  The commands I issued to reproduce this are:

mdadm -A /dev/md0 /dev/sd[bcdefg]
mount /dev/md0 /mnt/raid
tiobench --numruns 5 --size 2048 --dir /mnt/raid

The filesystem is ext3.  The controller is an LSI 1068.  Here are the
two BUG messages first 2.6.21-rc2 followed by 2.6.20.  I will reply to
this message with the config.

Fedora Core release 5 (Bordeaux)
Kernel 2.6.21-rc2 on an i686

[  431.709022] BUG: unable to handle kernel NULL pointer dereference at virtual 
address 005c
[  431.717993]  printing eip:
[  431.720825] c01e1e00
[  431.723112] *pde = 32e70001
[  431.726065] Oops:  [#1]
[  431.728997] SMP 
[  431.730922] Modules linked in: raid456 xor nfsd exportfs lockd nfs_acl 
sunrpc autofs4 hidp l2cap bluetooth iptable_raw xt_policy xt_multiport ipt_ULOG 
ipt_TTL ipt_ttl ipt_TOS ipt_tos ipt_SAME ipt_REJECT ipt_REDIRECT ipt_recent 
ipt_owner ipt_NETMAP ipt_MASQUERADE ipt_LOG ipt_iprange ipt_ECN ipt_ecn 
ipt_CLUSTERIP ipt_ah ipt_addrtype xt_tcpmss xt_pkttype xt_physdev xt_NFQUEUE 
xt_MARK xt_mark xt_mac xt_limit xt_length xt_helper xt_dccp xt_conntrack 
xt_CONNMARK xt_connmark xt_CLASSIFY xt_tcpudp xt_state iptable_nat nf_nat 
nf_conntrack_ipv4 nf_conntrack iptable_mangle nfnetlink iptable_filter 
ip_tables x_tables video sbs i2c_ec dock button battery asus_acpi ac radeon drm 
ipv6 lp parport_pc parport floppy uhci_hcd ehci_hcd e1000 i2c_i801 sg mptsas 
mptscsih mptbase i2c_core scsi_transport_sas pcspkr dm_snapshot dm_zero 
dm_mirror dm_mod ata_piix ata_generic libata sd_mod scsi_mod ext3 jbd
[  431.812682] CPU:0
[  431.812682] EIP:0060:[<c01e1e00>] Not tainted VLI
[  431.812683] EFLAGS: 00010002   (2.6.21-rc2 #4)
[  431.825386] EIP is at cfq_dispatch_insert+0xb/0x53
[  431.830413] eax: f6c96ec0   ebx:    ecx: c0410568   edx: 
[  431.837608] esi: f7e956a4   edi:    ebp: f6c96cc0   esp: c0491e54
[  431.844760] ds: 007b   es: 007b   fs: 00d8  gs:   ss: 0068
[  431.850847] Process swapper (pid: 0, ti=c0491000 task=c03ff4c0 
task.ti=c0447000)
[  431.858360] Stack: f76ae3bc f6c96cc0  f6c96cc0 c01e1fc9  
00e7  
[  431.867165]c03ffa10 c0143123   0004 c03ff4c0 
 f7e957ac 
[  431.875998]f7e956a4 f7e956a4 f7d39000 f7e956a4 c01d8767 0001 
0046  
[  431.884656] Call Trace:
[  431.887396]  [<c01e1fc9>] cfq_dispatch_requests+0x138/0x3f0
[  431.893274]  [<c0143123>] __lock_acquire+0xb64/0xbf4
[  431.898513]  [<c01d8767>] elv_next_request+0x1a1/0x1b1
[  431.903923]  [<f8a26621>] scsi_request_fn+0x59/0x336 [scsi_mod]
[  431.910148]  [<c01dbb20>] blk_run_queue+0x37/0x63
[  431.915100]  [<f8a25561>] scsi_next_command+0x25/0x2f [scsi_mod]
[  431.921330]  [<f8a2571f>] scsi_end_request+0x9e/0xa8 [scsi_mod]
[  431.927493]  [<f8a258c0>] scsi_io_completion+0x15a/0x32b [scsi_mod]
[  431.934113]  [<f882c5fb>] sd_rw_intr+0x21b/0x245 [sd_mod]
[  431.939787]  [<c031b23a>] _spin_unlock_irqrestore+0x3e/0x4d
[  431.945640]  [<f8a213f6>] scsi_finish_command+0x84/0x8b [scsi_mod]
[  431.952051]  [<c0142166>] trace_hardirqs_on+0x116/0x158
[  431.957446]  [<c012e181>] __do_softirq+0x5a/0xe9
[  431.962329]  [<c01dc291>] blk_done_softirq+0x68/0x73
[  431.967447]  [<c012e199>] __do_softirq+0x72/0xe9
[  431.972290]  [<c0107033>] do_softirq+0x6f/0xec
[  431.976888]  [<c031b0ce>] _spin_unlock_irq+0x20/0x2c
[  431.982064]  [<c0318b1b>] __sched_text_start+0x96b/0x9f3
[  431.987574]  [<c01553a1>] handle_fasteoi_irq+0x0/0xab
[  431.992823]  [<c010716d>] do_IRQ+0xbd/0xd4
[  431.997061]  [<c0105886>] common_interrupt+0x2e/0x34
[  432.002301]  [<c0103240>] mwait_idle_with_hints+0x3b/0x3f
[  432.007931]  [<c01033b9>] cpu_idle+0xb5/0xce
[  432.012368]  [<c044ca9a>] start_kernel+0x4a5/0x4ad
[  432.017398]  [<c044c1b8>] unknown_bootoption+0x0/0x202
[  432.022829]  ===
[  432.026511] Code: 1f e9 3b c0 c7 04 24 51 6d 3a c0 e8 43 83 f4 ff e8 77 46 
f2 ff ff 4f 34 83 c4 18 5b 5e 5f 5d c3 55 57 56 89 c6 53 8b 40 0c 89 d3 <8b> 7a 
5c 8b 68 04 89 d0 e8 b5 fe ff ff 8b 43 14 89 da 25 01 80 
[  432.046781] EIP: [<c01e1e00>] cfq_dispatch_insert+0xb/0x53 SS:ESP 
0068:c0491e54
[  432.054403] Kernel panic - not syncing: Fatal exception in interrupt
[  432.060912] BUG: at arch/i386/kernel/smp.c:546 smp_call_function()
[  432.067203]  [<c0118c63>] smp_call_function+0x64/0xd0
[  432.072473]  [<c023df9a>] do_unblank_screen+0x25/0x11b
[  432.077910]  [<c0118cea>] smp_send_stop+0x1b/0x40
[  432.082848]  [<c01296cb>] panic+0x54/0xfd
[  432.087033]  [<c010639c>] die+0x202/0x236
[  432.091222]  [<c031cc58>] do_page_fault+0x507/0x5e0
[  432.096323]  [<c01716e2>] kmem_cache_free+0xa1/0xb2
[  432.101353]  [<c01716e2>] kmem_cache_free+0xa1/0xb2
[  432.106415]  [] do_page_fault+0x0/0x5e0
[  432.111334]  [] error_code+0x7c/0x84
[  432.115934]  [] cfq_dispatch_insert+0xb/0x53
[  432.121304]  [] cfq_dispatch_requests+0x138/0x3f0
[  432.127161]  [] __lock_acquire+0xb64/0xbf4
[  432.132338]  [] elv_next_request+0x1a1/0x1b1
[  432.137608]  [] 
