Re: [OOPS] 2.6.21-rc6-git5 in cfq_dispatch_insert
Hi Jens!

On 25 Apr 2007, at 12:18, Jens Axboe wrote:
> On Wed, Apr 25 2007, Brad Campbell wrote:
> > Jens Axboe wrote:
> > > It looks to be extremely rare. Aliases are extremely rare, front merges
> > > are rare. And you need both to happen with the details you outlined. But
> > > it's a large user base, and we've had 3-4 reports on this in the past
> > > months. So it obviously does happen. I could not make it trigger without
> > > doctoring the unplug code when I used aio.
> >
> > Well, not that rare on this particular machine (I had a case where I could
> > reproduce it in less than an hour of normal use previously on this box),
> > and I've had it occur a number of times on my servers; I just never
> > reported it before as I never took the time to set up a serial console
> > and capture the oops.
>
> Extremely rare in the sense that it takes md and certain conditions for it
> to trigger. So for most people it'll be extremely rare, and for others
> (such as yourself) that hit it, it won't be so rare :-)
>
> > > Here's a fix for it, confirmed.
> >
> > Shall I leave the other debugging in, apply this and run it for a few
> > hard days?
>
> Yes, that would be perfect!

Okay, I left all the debugging patches in, disabled all the kernel-debugging .config options and gave it a spin with our usual "killer" workload (as far as batch systems are repeatable anyway). So far there has not been a single glitch or message, so I preliminarily conclude that the bug is squashed. The final word will come once my 1800 batch jobs are processed and I have my particle-physics result ;-)

Ciao, Roland
--
TU Muenchen, Physik-Department E18, James-Franck-Str., 85748 Garching
Telefon 089/289-12575; Telefax 089/289-12570
--
CERN office: 892-1-D23 phone: +41 22 7676540 mobile: +41 76 487 4482
--
Any society that would give up a little liberty to gain a little security will deserve neither and lose both.
- Benjamin Franklin
Degraded RAID performance - Was : Re: [OOPS] 2.6.21-rc6-git5 in cfq_dispatch_insert
Neil Brown wrote:
> I wonder if we should avoid bypassing the stripe cache if the needed
> stripes are already in the cache... or if at least one needed stripe is...
> or if the array is degraded... Probably in the degraded case we should
> never bypass the cache, as if we do, then a sequential read of a full
> stripe will read every block twice. I'd better do some performance
> measurements.

Ok, that would explain some odd performance issues I've noticed. Let's say I run dstat -D sda,sdb,sdc,sdd,md0 5

total-cpu-usage ---disk/sda----disk/sdb----disk/sdc----disk/sdd----disk/md0-- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq|_read write _read write _read write _read write _read write|_recv _send|__in_ _out_|_int_ _csw_
 25  22   0  47   0   6|20.1M    0 :20.2M    0 :20.1M    0 :    0    0 :40.2M    0 | 146B  662B|    0     0 |1186   661
 26  20   0  46   0   8|19.4M    0 :19.4M    0 :19.4M    0 :    0    0 :38.9M    0 | 160B  549B|    0     0 |1365   650

Given I'm doing a read, I would have expected a read to consist of 2 direct reads, one parity read and some calculation. The numbers I'm seeing, however, show 3 reads' worth of disk traffic for 2 reads' worth of bandwidth.

[EMAIL PROTECTED]:~# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sda[0] sdc[2] sdb[1]
      585934080 blocks level 5, 64k chunk, algorithm 2 [4/3] [UUU_]

(Dropped Jens and Chuck from the cc as this likely has little interest for them)

Brad
--
"Human beings, who are almost unique in having the ability to learn from the experience of others, are also remarkable for their apparent disinclination to do so." -- Douglas Adams
- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [OOPS] 2.6.21-rc6-git5 in cfq_dispatch_insert
On Wednesday April 25, [EMAIL PROTECTED] wrote:
>
> [  756.311074] BUG: at block/cfq-iosched.c:543 cfq_reposition_rq_rb()
> [  756.329615] [] cfq_merged_request+0x71/0x80
> [  756.345046] [] cfq_merged_request+0x0/0x80
> [  756.360216] [] elv_merged_request+0x4e/0x50
> [  756.375647] [] __make_request+0x1a7/0x2f0
> [  756.390557] [] generic_make_request+0x127/0x190
> [  756.407025] [] chunk_aligned_read+0x111/0x1c0
>                   ^
> [  756.422974] [] generic_make_request+0x127/0x190
> [  756.439443] [] make_request+0x388/0x3b0
> [  756.453834] [] __bio_add_page+0x147/0x1c0
> [  756.468742] [] bio_alloc_bioset+0x7f/0x150
> [  756.483913] [] raid5_mergeable_bvec+0x0/0x90
> [  756.499604] [] bio_add_page+0x37/0x50
> [  756.513473] [] generic_make_request+0x127/0x190
> [  756.529943] [] mempool_free+0x2a/0x60
> [  756.543812] [] bio_free+0x1d/0x40
> [  756.556645] [] submit_bio+0x46/0xc0
> [  756.571389] [] radix_tree_node_alloc+0x16/0x60
> [  756.587610] [] radix_tree_insert+0xe2/0x130
> [  756.603039] [] __pagevec_lru_add+0x75/0x80
> [  756.618207] [] mpage_bio_submit+0x11/0x20
> [  756.633117] [] mpage_readpages+0x100/0x140
>                   ^^^

So it is a regular data read (not metadata of any sort) which is bypassing the stripe cache. I was expecting some sort of metadata read.

I wonder what this is merging in front of... maybe a stripe-cache read. I guess it doesn't tell us anything really interesting, which is probably a good thing - it means everything else is working properly.

I wonder if we should avoid bypassing the stripe cache if the needed stripes are already in the cache... or if at least one needed stripe is... or if the array is degraded... Probably in the degraded case we should never bypass the cache, as if we do, then a sequential read of a full stripe will read every block twice. I'd better do some performance measurements.

Thanks for all the patient testing.
NeilBrown
Re: [OOPS] 2.6.21-rc6-git5 in cfq_dispatch_insert
On Wed, Apr 25 2007, Neil Brown wrote:
> On Wednesday April 25, [EMAIL PROTECTED] wrote:
> >
> > Here's a fix for it, confirmed.
>
> Patch looks good to me.

Good! And thanks for taking the time to look at this bug, btw; it's had me puzzled a bit this week. I should not have ruled out the aliases, I guess; they must be coming from somewhere other than O_DIRECT.

> Hopefully Brad can still wait for the WARN_ON to fire - it might give
> useful clues to why this is happening.  It might be interesting.

But he did, and reported it about an hour ago, so all should be well.

--
Jens Axboe
Re: [OOPS] 2.6.21-rc6-git5 in cfq_dispatch_insert
On Wednesday April 25, [EMAIL PROTECTED] wrote:
>
> Here's a fix for it, confirmed.

Patch looks good to me. Hopefully Brad can still wait for the WARN_ON to fire - it might give useful clues to why this is happening. It might be interesting.

Thanks,
NeilBrown
Re: [OOPS] 2.6.21-rc6-git5 in cfq_dispatch_insert
On Wed, Apr 25 2007, Brad Campbell wrote:
> Jens Axboe wrote:
>
> > It looks to be extremely rare. Aliases are extremely rare, front merges
> > are rare. And you need both to happen with the details you outlined. But
> > it's a large user base, and we've had 3-4 reports on this in the past
> > months. So it obviously does happen. I could not make it trigger without
> > doctoring the unplug code when I used aio.
>
> Well, not that rare on this particular machine (I had a case where I could
> reproduce it in less than an hour of normal use previously on this box),
> and I've had it occur a number of times on my servers, I just never
> reported it before as I never took the time to set up a serial console and
> capture the oops.

Extremely rare in the sense that it takes md and certain conditions for it to trigger. So for most people it'll be extremely rare, and for others (such as yourself) that hit it, it won't be so rare :-)

> > Here's a fix for it, confirmed.
>
> Shall I leave the other debugging in, apply this and run it for a few hard
> days?

Yes, that would be perfect!

--
Jens Axboe
Re: [OOPS] 2.6.21-rc6-git5 in cfq_dispatch_insert
Neil Brown wrote:
> You could test this theory by putting a
>    WARN_ON(cfqq->next_rq == NULL);
> at the end of cfq_reposition_rq_rb, just after the cfq_add_rq_rb call.

[  756.311074] BUG: at block/cfq-iosched.c:543 cfq_reposition_rq_rb()
[  756.329615] [] cfq_merged_request+0x71/0x80
[  756.345046] [] cfq_merged_request+0x0/0x80
[  756.360216] [] elv_merged_request+0x4e/0x50
[  756.375647] [] __make_request+0x1a7/0x2f0
[  756.390557] [] generic_make_request+0x127/0x190
[  756.407025] [] chunk_aligned_read+0x111/0x1c0
[  756.422974] [] generic_make_request+0x127/0x190
[  756.439443] [] make_request+0x388/0x3b0
[  756.453834] [] __bio_add_page+0x147/0x1c0
[  756.468742] [] bio_alloc_bioset+0x7f/0x150
[  756.483913] [] raid5_mergeable_bvec+0x0/0x90
[  756.499604] [] bio_add_page+0x37/0x50
[  756.513473] [] generic_make_request+0x127/0x190
[  756.529943] [] mempool_free+0x2a/0x60
[  756.543812] [] bio_free+0x1d/0x40
[  756.556645] [] submit_bio+0x46/0xc0
[  756.571389] [] radix_tree_node_alloc+0x16/0x60
[  756.587610] [] radix_tree_insert+0xe2/0x130
[  756.603039] [] __pagevec_lru_add+0x75/0x80
[  756.618207] [] mpage_bio_submit+0x11/0x20
[  756.633117] [] mpage_readpages+0x100/0x140
[  756.648287] [] ext3_get_block+0x0/0xe0
[  756.662419] [] scsi_end_request+0x9b/0xc0
[  756.677329] [] scsi_delete_timer+0xb/0x20
[  756.692240] [] ext3_readpages+0x0/0x20
[  756.706369] [] read_pages+0xd1/0xe0
[  756.719719] [] ext3_get_block+0x0/0xe0
[  756.733853] [] __alloc_pages+0x54/0x2d0
[  756.748242] [] scsi_softirq_done+0x6f/0xe0
[  756.763413] [] sil_interrupt+0x81/0x90
[  756.777544] [] blk_done_softirq+0x58/0x70
[  756.792454] [] __do_page_cache_readahead+0xa8/0x110
[  756.809961] [] blockable_page_cache_readahead+0x51/0xd0
[  756.828508] [] make_ahead_window+0x74/0xc0
[  756.843679] [] page_cache_readahead+0x9a/0x180
[  756.859886] [] file_read_actor+0xc4/0xd0
[  756.874537] [] do_generic_mapping_read+0x467/0x470
[  756.891786] [] generic_file_aio_read+0x183/0x1c0
[  756.908514] [] file_read_actor+0x0/0xd0
[  756.922903] [] scsi_next_command+0x30/0x50
[  756.938073] [] do_sync_read+0xc0/0x100
[  756.952204] [] autoremove_wake_function+0x0/0x50
[  756.968934] [] scsi_finish_command+0x40/0x60
[  756.984622] [] schedule+0x2e5/0x590
[  756.997972] [] vfs_read+0x86/0x110
[  757.011065] [] sys_read+0x47/0x80
[  757.023897] [] syscall_call+0x7/0xb
[  757.037250] ===

I know this is late given a fix has been issued, but I just got this in the logs with no other debug info at all. This kernel has *all* of Jens' debugging patches plus the snippet from Neil above. I'll apply the fix now and run it for a couple of days.

Brad
--
"Human beings, who are almost unique in having the ability to learn from the experience of others, are also remarkable for their apparent disinclination to do so." -- Douglas Adams
Re: [OOPS] 2.6.21-rc6-git5 in cfq_dispatch_insert
Jens Axboe wrote:
> It looks to be extremely rare. Aliases are extremely rare, front merges
> are rare. And you need both to happen with the details you outlined. But
> it's a large user base, and we've had 3-4 reports on this in the past
> months. So it obviously does happen. I could not make it trigger without
> doctoring the unplug code when I used aio.

Well, not that rare on this particular machine (I had a case where I could reproduce it in less than an hour of normal use previously on this box), and I've had it occur a number of times on my servers; I just never reported it before as I never took the time to set up a serial console and capture the oops.

> Here's a fix for it, confirmed.

Shall I leave the other debugging in, apply this and run it for a few hard days?

Brad
--
"Human beings, who are almost unique in having the ability to learn from the experience of others, are also remarkable for their apparent disinclination to do so." -- Douglas Adams
Re: [OOPS] 2.6.21-rc6-git5 in cfq_dispatch_insert
Neil Brown wrote:
> On Wednesday April 25, [EMAIL PROTECTED] wrote:
> > BUT... That may explain why we are only seeing it on md. Would md
> > ever be issuing such requests that trigger this condition?
>
> Can someone remind me which raid level(s) was/were involved?

RAID-5, degraded, here. But I've had it on my server on a fully functioning RAID-5 also. I don't think I've seen it on any of my RAID-6 arrays yet.

> I guess it could happen with raid4/5/6. A read request that was
> properly aligned (and we do encourage proper alignment) will be passed
> directly to the underlying device. A concurrent write elsewhere could
> require the same block to be read into the stripe-cache to enable a
> parity calculation. So you could get two reads at the same block
> address.

Except all the filesystems I've had this occur on are ext3, and all are mounted noatime with no processes ever writing to the filesystems. So here it occurs with reads only.

Brad
--
"Human beings, who are almost unique in having the ability to learn from the experience of others, are also remarkable for their apparent disinclination to do so." -- Douglas Adams
Re: [OOPS] 2.6.21-rc6-git5 in cfq_dispatch_insert
On Wed, Apr 25 2007, Neil Brown wrote:
> On Wednesday April 25, [EMAIL PROTECTED] wrote:
> >
> > That's pretty close to where I think the problem is (the front merging
> > and cfq_reposition_rq_rb()). The issue with that is that you'd only get
> > aliases for O_DIRECT and/or raw IO, and that doesn't seem to be the case
> > here. Given that front merges are equally not very likely, I'd be
> > surprised if something like that has ever happened.
>
> Well it certainly doesn't happen very often.
> And I can imagine a filesystem genuinely wanting to read the same
> block twice - maybe a block that contained packed tails of two
> different files.

But that usually happens through the page cache, so it's nicely serialized. You'd only ever get duplicate/alias blocks if you go raw, or have some layer on top of your block device (like md) that issues IO on its own.

> > BUT... That may explain why we are only seeing it on md. Would md
> > ever be issuing such requests that trigger this condition?
>
> Can someone remind me which raid level(s) was/were involved?
>
> I think one was raid0 - that just passes requests on from the
> filesystem, so md would only issue requests like that if the
> filesystem did.

Yep, one was raid0.

> I guess it could happen with raid4/5/6. A read request that was
> properly aligned (and we do encourage proper alignment) will be passed
> directly to the underlying device. A concurrent write elsewhere could
> require the same block to be read into the stripe-cache to enable a
> parity calculation. So you could get two reads at the same block
> address.
> Getting a front-merge would probably require two stripe-heads to be
> processed in reverse-sector order, and it tends to preserve the order
> of incoming requests (though that isn't firmly enforced).
>
> raid1 is much like raid0 (with totally different code) in that the
> request pattern seen by the underlying device matches the request
> pattern generated by the filesystem.
> If I read the debugging output correctly, the request which I
> hypothesise was the subject of a front-merge is a 'sync' request.
> raid5 does not generate sync requests to fill the stripe cache (maybe
> it should?) so I really think it must have come directly from the
> filesystem.
>
> (just checked previous email for more detail of when it hits)
> The fact that it hits degraded arrays more easily is interesting.
> Maybe we try to read a block on the missing device and so schedule
> reads for the other devices. Then we try to read a block on a good
> device and issue a request for exactly the same block that raid5 asked
> for. That still doesn't explain the 'sync' and the 'front merge'.
> But that is quite possible, just not common maybe.
>
> It doesn't help us understand the raid0 example though. Maybe it is
> just a 'can happen, but only rarely' thing.

It looks to be extremely rare. Aliases are extremely rare, front merges are rare. And you need both to happen with the details you outlined. But it's a large user base, and we've had 3-4 reports on this in the past months. So it obviously does happen. I could not make it trigger without doctoring the unplug code when I used aio.

Here's a fix for it, confirmed.
diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index 9e37971..f965be7 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -532,6 +532,11 @@ static void cfq_add_rq_rb(struct request *rq)
 
 	if (!cfq_cfqq_on_rr(cfqq))
 		cfq_add_cfqq_rr(cfqd, cfqq);
+
+	/*
+	 * check if this request is a better next-serve candidate
+	 */
+	cfqq->next_rq = cfq_choose_req(cfqd, cfqq->next_rq, rq);
 }
 
 static inline void
@@ -1639,12 +1644,6 @@ cfq_rq_enqueued(struct cfq_data *cfqd, struct cfq_queue *cfqq,
 		cfqq->meta_pending++;
 
 	/*
-	 * check if this request is a better next-serve candidate
-	 */
-	cfqq->next_rq = cfq_choose_req(cfqd, cfqq->next_rq, rq);
-	BUG_ON(!cfqq->next_rq);
-
-	/*
 	 * we never wait for an async request and we don't allow preemption
 	 * of an async request. so just return early
 	 */

--
Jens Axboe
Re: [OOPS] 2.6.21-rc6-git5 in cfq_dispatch_insert
On Wednesday April 25, [EMAIL PROTECTED] wrote:
>
> That's pretty close to where I think the problem is (the front merging
> and cfq_reposition_rq_rb()). The issue with that is that you'd only get
> aliases for O_DIRECT and/or raw IO, and that doesn't seem to be the case
> here. Given that front merges are equally not very likely, I'd be
> surprised if something like that has ever happened.

Well, it certainly doesn't happen very often. And I can imagine a filesystem genuinely wanting to read the same block twice - maybe a block that contained packed tails of two different files.

> BUT... That may explain why we are only seeing it on md. Would md
> ever be issuing such requests that trigger this condition?

Can someone remind me which raid level(s) was/were involved?

I think one was raid0 - that just passes requests on from the filesystem, so md would only issue requests like that if the filesystem did.

I guess it could happen with raid4/5/6. A read request that was properly aligned (and we do encourage proper alignment) will be passed directly to the underlying device. A concurrent write elsewhere could require the same block to be read into the stripe-cache to enable a parity calculation. So you could get two reads at the same block address.

Getting a front-merge would probably require two stripe-heads to be processed in reverse-sector order, and it tends to preserve the order of incoming requests (though that isn't firmly enforced).

raid1 is much like raid0 (with totally different code) in that the request pattern seen by the underlying device matches the request pattern generated by the filesystem.

If I read the debugging output correctly, the request which I hypothesise was the subject of a front-merge is a 'sync' request. raid5 does not generate sync requests to fill the stripe cache (maybe it should?) so I really think it must have come directly from the filesystem.

(just checked previous email for more detail of when it hits)

The fact that it hits degraded arrays more easily is interesting. Maybe we try to read a block on the missing device and so schedule reads for the other devices. Then we try to read a block on a good device and issue a request for exactly the same block that raid5 asked for. That still doesn't explain the 'sync' and the 'front merge'. But that is quite possible, just not common maybe.

It doesn't help us understand the raid0 example though. Maybe it is just a 'can happen, but only rarely' thing.

NeilBrown
Re: [OOPS] 2.6.21-rc6-git5 in cfq_dispatch_insert
On Wed, Apr 25 2007, Jens Axboe wrote:
> On Wed, Apr 25 2007, Neil Brown wrote:
> > On Tuesday April 24, [EMAIL PROTECTED] wrote:
> > > [105449.653682] cfq: rbroot not empty, but ->next_rq == NULL! Fixing up,
> > > report the issue to [EMAIL PROTECTED]
> > > [105449.683646] cfq: busy=1,drv=0,timer=0
> > > [105449.694871] cfq rr_list:
> > > [105449.702715]   3108: sort=0,next=,q=0/1,a=1/0,d=0/0,f=69
> > > [105449.720693] cfq busy_list:
> > > [105449.729054] cfq idle_list:
> > > [105449.737418] cfq cur_rr:
> >
> > Ok, I have a theory.
> >
> > An ELEVATOR_FRONT_MERGE occurs which changes req->sector and calls
> > ->elevator_merged_fn which is cfq_merged_request.
> >
> > At this time there is already a request in cfq with the same sector
> > number, and that request is the only other request on the queue.
> >
> > cfq_merged_request calls cfq_reposition_rq_rb which removes the
> > req from ->sortlist and then calls cfq_add_rq_rb to add it back (at
> > the new location because ->sector has changed).
> >
> > cfq_add_rq_rb finds there is already a request with the same sector
> > number and so elv_rb_add returns an __alias which is passed to
> > cfq_dispatch_insert.
> > This calls cfq_remove_request and as that is the only request present,
> > ->next_rq gets set to NULL.
> > The old request with the new sector number is then added to the
> > ->sortlist, but ->next_rq is never set - it remains NULL.
> >
> > How likely it would be to get two requests with the same sector number
> > I don't know. I wouldn't expect it to ever happen - I have seen it
> > before, but it was due to a bug in ext3. Maybe XFS does it
> > intentionally some times?
> >
> > You could test this theory by putting a
> >    WARN_ON(cfqq->next_rq == NULL);
> > at the end of cfq_reposition_rq_rb, just after the cfq_add_rq_rb call.
> >
> > I will leave the development of a suitable fix up to Jens if he agrees
> > that this is possible.
>
> That's pretty close to where I think the problem is (the front merging
> and cfq_reposition_rq_rb()). The issue with that is that you'd only get
> aliases for O_DIRECT and/or raw IO, and that doesn't seem to be the case
> here. Given that front merges are equally not very likely, I'd be
> surprised if something like that has ever happened.
>
> BUT... That may explain why we are only seeing it on md. Would md
> ever be issuing such requests that trigger this condition?
>
> I'll try and concoct a test case.

I made a test case and reproduced it; this is indeed what is happening. md must be issuing direct requests in such a manner that you do get a front merge and then an alias to an existing request. Will test a fix!

--
Jens Axboe
Re: [OOPS] 2.6.21-rc6-git5 in cfq_dispatch_insert
Neil Brown wrote:
> How likely it would be to get two requests with the same sector number
> I don't know. I wouldn't expect it to ever happen - I have seen it
> before, but it was due to a bug in ext3. Maybe XFS does it
> intentionally some times?

It certainly sounds like an odd thing to occur. Even stranger that it's easier to hit on a degraded array or an array being checked. I *am* using ext3 on this box (and on all my boxes, in fact).

> You could test this theory by putting a
>    WARN_ON(cfqq->next_rq == NULL);
> at the end of cfq_reposition_rq_rb, just after the cfq_add_rq_rb call.

I've done that... now to wait for it to hit again.

Brad
--
"Human beings, who are almost unique in having the ability to learn from the experience of others, are also remarkable for their apparent disinclination to do so." -- Douglas Adams
Re: [OOPS] 2.6.21-rc6-git5 in cfq_dispatch_insert
On Wed, Apr 25 2007, Neil Brown wrote:
> On Tuesday April 24, [EMAIL PROTECTED] wrote:
> > [105449.653682] cfq: rbroot not empty, but ->next_rq == NULL! Fixing up,
> > report the issue to [EMAIL PROTECTED]
> > [105449.683646] cfq: busy=1,drv=0,timer=0
> > [105449.694871] cfq rr_list:
> > [105449.702715]   3108: sort=0,next=,q=0/1,a=1/0,d=0/0,f=69
> > [105449.720693] cfq busy_list:
> > [105449.729054] cfq idle_list:
> > [105449.737418] cfq cur_rr:
>
> Ok, I have a theory.
>
> An ELEVATOR_FRONT_MERGE occurs which changes req->sector and calls
> ->elevator_merged_fn which is cfq_merged_request.
>
> At this time there is already a request in cfq with the same sector
> number, and that request is the only other request on the queue.
>
> cfq_merged_request calls cfq_reposition_rq_rb which removes the
> req from ->sortlist and then calls cfq_add_rq_rb to add it back (at
> the new location because ->sector has changed).
>
> cfq_add_rq_rb finds there is already a request with the same sector
> number and so elv_rb_add returns an __alias which is passed to
> cfq_dispatch_insert.
> This calls cfq_remove_request and as that is the only request present,
> ->next_rq gets set to NULL.
> The old request with the new sector number is then added to the
> ->sortlist, but ->next_rq is never set - it remains NULL.
>
> How likely it would be to get two requests with the same sector number
> I don't know. I wouldn't expect it to ever happen - I have seen it
> before, but it was due to a bug in ext3. Maybe XFS does it
> intentionally some times?
>
> You could test this theory by putting a
>    WARN_ON(cfqq->next_rq == NULL);
> at the end of cfq_reposition_rq_rb, just after the cfq_add_rq_rb call.
>
> I will leave the development of a suitable fix up to Jens if he agrees
> that this is possible.

That's pretty close to where I think the problem is (the front merging and cfq_reposition_rq_rb()). The issue with that is that you'd only get aliases for O_DIRECT and/or raw IO, and that doesn't seem to be the case here. Given that front merges are equally not very likely, I'd be surprised if something like that has ever happened.

BUT... That may explain why we are only seeing it on md. Would md ever be issuing such requests that trigger this condition?

I'll try and concoct a test case.

--
Jens Axboe
Re: [OOPS] 2.6.21-rc6-git5 in cfq_dispatch_insert
On Tuesday April 24, [EMAIL PROTECTED] wrote:
> [105449.653682] cfq: rbroot not empty, but ->next_rq == NULL! Fixing up,
> report the issue to [EMAIL PROTECTED]
> [105449.683646] cfq: busy=1,drv=0,timer=0
> [105449.694871] cfq rr_list:
> [105449.702715]   3108: sort=0,next=,q=0/1,a=1/0,d=0/0,f=69
> [105449.720693] cfq busy_list:
> [105449.729054] cfq idle_list:
> [105449.737418] cfq cur_rr:

Ok, I have a theory.

An ELEVATOR_FRONT_MERGE occurs which changes req->sector and calls ->elevator_merged_fn which is cfq_merged_request.

At this time there is already a request in cfq with the same sector number, and that request is the only other request on the queue.

cfq_merged_request calls cfq_reposition_rq_rb which removes the req from ->sortlist and then calls cfq_add_rq_rb to add it back (at the new location because ->sector has changed).

cfq_add_rq_rb finds there is already a request with the same sector number and so elv_rb_add returns an __alias which is passed to cfq_dispatch_insert. This calls cfq_remove_request and as that is the only request present, ->next_rq gets set to NULL. The old request with the new sector number is then added to the ->sortlist, but ->next_rq is never set - it remains NULL.

How likely it would be to get two requests with the same sector number I don't know. I wouldn't expect it to ever happen - I have seen it before, but it was due to a bug in ext3. Maybe XFS does it intentionally some times?

You could test this theory by putting a
    WARN_ON(cfqq->next_rq == NULL);
at the end of cfq_reposition_rq_rb, just after the cfq_add_rq_rb call.

I will leave the development of a suitable fix up to Jens if he agrees that this is possible.

NeilBrown
Re: [OOPS] 2.6.21-rc6-git5 in cfq_dispatch_insert
On Wed, Apr 25 2007, Neil Brown wrote: On Tuesday April 24, [EMAIL PROTECTED] wrote: [105449.653682] cfq: rbroot not empty, but ->next_rq == NULL! Fixing up, report the issue to [EMAIL PROTECTED] [105449.683646] cfq: busy=1,drv=0,timer=0 [105449.694871] cfq rr_list: [105449.702715] 3108: sort=0,next=,q=0/1,a=1/0,d=0/0,f=69 [105449.720693] cfq busy_list: [105449.729054] cfq idle_list: [105449.737418] cfq cur_rr: Ok, I have a theory. An ELEVATOR_FRONT_MERGE occurs which changes req->sector and calls ->elevator_merged_fn which is cfq_merged_request. At this time there is already a request in cfq with the same sector number, and that request is the only other request on the queue. cfq_merged_request calls cfq_reposition_rq_rb which removes the req from ->sortlist and then calls cfq_add_rq_rb to add it back (at the new location because ->sector has changed). cfq_add_rq_rb finds there is already a request with the same sector number and so elv_rb_add returns an __alias which is passed to cfq_dispatch_insert. This calls cfq_remove_request and as that is the only request present, ->next_rq gets set to NULL. The old request with the new sector number is then added to the ->sortlist, but ->next_rq is never set - it remains NULL. How likely it would be to get two requests with the same sector number I don't know. I wouldn't expect it to ever happen - I have seen it before, but it was due to a bug in ext3. Maybe XFS does it intentionally some times? You could test this theory by putting a WARN_ON(cfqq->next_rq == NULL); at the end of cfq_reposition_rq_rb, just after the cfq_add_rq_rb call. I will leave the development of a suitable fix up to Jens if he agrees that this is possible. That's pretty close to where I think the problem is (the front merging and cfq_reposition_rq_rb()). The issue with that is that you'd only get aliases for O_DIRECT and/or raw IO, and that doesn't seem to be the case here.
Given that front merges are equally not very likely, I'd be surprised if something like that has ever happened. BUT... That may explain why we are only seeing it on md. Would md ever be issuing such requests that trigger this condition? I'll try and concoct a test case. -- Jens Axboe
Re: [OOPS] 2.6.21-rc6-git5 in cfq_dispatch_insert
Neil Brown wrote: How likely it would be to get two requests with the same sector number I don't know. I wouldn't expect it to ever happen - I have seen it before, but it was due to a bug in ext3. Maybe XFS does it intentionally some times? It certainly sounds like an odd thing to occur. Even stranger that it's easier to hit on a degraded array or an array being checked. I *am* using ext3 on this box (and all my boxes in fact) You could test this theory by putting a WARN_ON(cfqq->next_rq == NULL); at the end of cfq_reposition_rq_rb, just after the cfq_add_rq_rb call. I've done that.. now to wait for it to hit again. Brad -- Human beings, who are almost unique in having the ability to learn from the experience of others, are also remarkable for their apparent disinclination to do so. -- Douglas Adams
Re: [OOPS] 2.6.21-rc6-git5 in cfq_dispatch_insert
On Wed, Apr 25 2007, Jens Axboe wrote: On Wed, Apr 25 2007, Neil Brown wrote: On Tuesday April 24, [EMAIL PROTECTED] wrote: [105449.653682] cfq: rbroot not empty, but ->next_rq == NULL! Fixing up, report the issue to [EMAIL PROTECTED] [105449.683646] cfq: busy=1,drv=0,timer=0 [105449.694871] cfq rr_list: [105449.702715] 3108: sort=0,next=,q=0/1,a=1/0,d=0/0,f=69 [105449.720693] cfq busy_list: [105449.729054] cfq idle_list: [105449.737418] cfq cur_rr: Ok, I have a theory. An ELEVATOR_FRONT_MERGE occurs which changes req->sector and calls ->elevator_merged_fn which is cfq_merged_request. At this time there is already a request in cfq with the same sector number, and that request is the only other request on the queue. cfq_merged_request calls cfq_reposition_rq_rb which removes the req from ->sortlist and then calls cfq_add_rq_rb to add it back (at the new location because ->sector has changed). cfq_add_rq_rb finds there is already a request with the same sector number and so elv_rb_add returns an __alias which is passed to cfq_dispatch_insert. This calls cfq_remove_request and as that is the only request present, ->next_rq gets set to NULL. The old request with the new sector number is then added to the ->sortlist, but ->next_rq is never set - it remains NULL. How likely it would be to get two requests with the same sector number I don't know. I wouldn't expect it to ever happen - I have seen it before, but it was due to a bug in ext3. Maybe XFS does it intentionally some times? You could test this theory by putting a WARN_ON(cfqq->next_rq == NULL); at the end of cfq_reposition_rq_rb, just after the cfq_add_rq_rb call. I will leave the development of a suitable fix up to Jens if he agrees that this is possible. That's pretty close to where I think the problem is (the front merging and cfq_reposition_rq_rb()). The issue with that is that you'd only get aliases for O_DIRECT and/or raw IO, and that doesn't seem to be the case here.
Given that front merges are equally not very likely, I'd be surprised if something like that has ever happened. BUT... That may explain why we are only seeing it on md. Would md ever be issuing such requests that trigger this condition? I'll try and concoct a test case. I made a test case and reproduced it, this is indeed what is happening. md must be issuing direct requests in such a manner, that you do get a front merge and then an alias to an existing request. Will test a fix! -- Jens Axboe
Re: [OOPS] 2.6.21-rc6-git5 in cfq_dispatch_insert
On Wednesday April 25, [EMAIL PROTECTED] wrote: That's pretty close to where I think the problem is (the front merging and cfq_reposition_rq_rb()). The issue with that is that you'd only get aliases for O_DIRECT and/or raw IO, and that doesn't seem to be the case here. Given that front merges are equally not very likely, I'd be surprised if something like that has ever happened. Well it certainly doesn't happen very often. And I can imagine a filesystem genuinely wanting to read the same block twice - maybe a block that contained packed tails of two different files. BUT... That may explain why we are only seeing it on md. Would md ever be issuing such requests that trigger this condition? Can someone remind me which raid level(s) was/were involved? I think one was raid0 - that just passes requests on from the filesystem, so md would only issue requests like that if the filesystem did. I guess it could happen with raid4/5/6. A read request that was properly aligned (and we do encourage proper alignment) will be passed directly to the underlying device. A concurrent write elsewhere could require the same block to be read into the stripe-cache to enable a parity calculation. So you could get two reads at the same block address. Getting a front-merge would probably require two stripe-heads to be processed in reverse-sector order, and it tends to preserve the order of incoming requests (though that isn't firmly enforced). raid1 is much like raid0 (with totally different code) in that the request pattern seen by the underlying device matches the request pattern generated by the filesystem. If I read the debugging output correctly, the request which I hypothesise was the subject of a front-merge is a 'sync' request. raid5 does not generate sync requests to fill the stripe cache (maybe it should?) so I really think it must have come directly from the filesystem.
(just checked previous email for more detail of when it hits) The fact that it hits degraded arrays more easily is interesting. Maybe we try to read a block on the missing device and so schedule reads for the other devices. Then we try to read a block on a good device and issue a request for exactly the same block that raid5 asked for. That still doesn't explain the 'sync' and the 'front merge'. But that is quite possible, just not common maybe. It doesn't help us understand the raid0 example though. Maybe it is just a 'can happen, but only rarely' thing. NeilBrown
Re: [OOPS] 2.6.21-rc6-git5 in cfq_dispatch_insert
On Wed, Apr 25 2007, Neil Brown wrote: On Wednesday April 25, [EMAIL PROTECTED] wrote: That's pretty close to where I think the problem is (the front merging and cfq_reposition_rq_rb()). The issue with that is that you'd only get aliases for O_DIRECT and/or raw IO, and that doesn't seem to be the case here. Given that front merges are equally not very likely, I'd be surprised if something like that has ever happened. Well it certainly doesn't happen very often. And I can imagine a filesystem genuinely wanting to read the same block twice - maybe a block that contained packed tails of two different files. But that usually happens through the page cache, so it's nicely serialized. You'd only ever get duplicate/alias blocks if you go raw, or have some layer on top of your block device (like md) that issues io on its own. BUT... That may explain why we are only seeing it on md. Would md ever be issuing such requests that trigger this condition? Can someone remind me which raid level(s) was/were involved? I think one was raid0 - that just passes requests on from the filesystem, so md would only issue requests like that if the filesystem did. Yep, one was raid0. I guess it could happen with raid4/5/6. A read request that was properly aligned (and we do encourage proper alignment) will be passed directly to the underlying device. A concurrent write elsewhere could require the same block to be read into the stripe-cache to enable a parity calculation. So you could get two reads at the same block address. Getting a front-merge would probably require two stripe-heads to be processed in reverse-sector order, and it tends to preserve the order of incoming requests (though that isn't firmly enforced). raid1 is much like raid0 (with totally different code) in that the request pattern seen by the underlying device matches the request pattern generated by the filesystem.
If I read the debugging output correctly, the request which I hypothesise was the subject of a front-merge is a 'sync' request. raid5 does not generate sync requests to fill the stripe cache (maybe it should?) so I really think it must have come directly from the filesystem. (just checked previous email for more detail of when it hits) The fact that it hits degraded arrays more easily is interesting. Maybe we try to read a block on the missing device and so schedule reads for the other devices. Then we try to read a block on a good device and issue a request for exactly the same block that raid5 asked for. That still doesn't explain the 'sync' and the 'front merge'. But that is quite possible, just not common maybe. It doesn't help us understand the raid0 example though. Maybe it is just a 'can happen, but only rarely' thing. It looks to be extremely rare. Aliases are extremely rare, front merges are rare. And you need both to happen with the details you outlined. But it's a large user base, and we've had 3-4 reports on this in the past months. So it obviously does happen. I could not make it trigger without doctoring the unplug code when I used aio. Here's a fix for it, confirmed.

diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index 9e37971..f965be7 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -532,6 +532,11 @@ static void cfq_add_rq_rb(struct request *rq)
 	if (!cfq_cfqq_on_rr(cfqq))
 		cfq_add_cfqq_rr(cfqd, cfqq);
+
+	/*
+	 * check if this request is a better next-serve candidate
+	 */
+	cfqq->next_rq = cfq_choose_req(cfqd, cfqq->next_rq, rq);
 }
 
 static inline void
@@ -1639,12 +1644,6 @@ cfq_rq_enqueued(struct cfq_data *cfqd, struct cfq_queue *cfqq,
 		cfqq->meta_pending++;
 
 	/*
-	 * check if this request is a better next-serve candidate)) {
-	 */
-	cfqq->next_rq = cfq_choose_req(cfqd, cfqq->next_rq, rq);
-	BUG_ON(!cfqq->next_rq);
-
-	/*
 	 * we never wait for an async request and we don't allow preemption
 	 * of an async request.
 	 * so just return early
 	 */

-- Jens Axboe
Re: [OOPS] 2.6.21-rc6-git5 in cfq_dispatch_insert
Neil Brown wrote: On Wednesday April 25, [EMAIL PROTECTED] wrote: BUT... That may explain why we are only seeing it on md. Would md ever be issuing such requests that trigger this condition? Can someone remind me which raid level(s) was/were involved? Raid-5 degraded here, but I've had it on my server on a fully functioning raid-5 also. Don't think I've seen it on any of my raid-6 yet. I guess it could happen with raid4/5/6. A read request that was properly aligned (and we do encourage proper alignment) will be passed directly to the underlying device. A concurrent write elsewhere could require the same block to be read into the stripe-cache to enable a parity calculation. So you could get two reads at the same block address. Except all the filesystems I've had this occur on are Ext3 and all are mounted noatime, with no processes ever writing to the filesystems. So here it occurs with reads only. Brad
Re: [OOPS] 2.6.21-rc6-git5 in cfq_dispatch_insert
Jens Axboe wrote: It looks to be extremely rare. Aliases are extremely rare, front merges are rare. And you need both to happen with the details you outlined. But it's a large user base, and we've had 3-4 reports on this in the past months. So it obviously does happen. I could not make it trigger without doctoring the unplug code when I used aio. Well, not that rare on this particular machine (I had a case where I could reproduce it in less than an hour of normal use previously on this box), and I've had it occur a number of times on my servers, I just never reported it before as I never took the time to set up a serial console and capture the oops. Here's a fix for it, confirmed. Shall I leave the other debugging in, apply this and run it for a few hard days? Brad
Re: [OOPS] 2.6.21-rc6-git5 in cfq_dispatch_insert
Neil Brown wrote: You could test this theory by putting a WARN_ON(cfqq->next_rq == NULL); at the end of cfq_reposition_rq_rb, just after the cfq_add_rq_rb call.

[  756.311074] BUG: at block/cfq-iosched.c:543 cfq_reposition_rq_rb()
[  756.329615]  [c0204fa1] cfq_merged_request+0x71/0x80
[  756.345046]  [c0204f30] cfq_merged_request+0x0/0x80
[  756.360216]  [c01fb4ae] elv_merged_request+0x4e/0x50
[  756.375647]  [c01ff1a7] __make_request+0x1a7/0x2f0
[  756.390557]  [c01ff497] generic_make_request+0x127/0x190
[  756.407025]  [c02bb011] chunk_aligned_read+0x111/0x1c0
[  756.422974]  [c01ff497] generic_make_request+0x127/0x190
[  756.439443]  [c02bb448] make_request+0x388/0x3b0
[  756.453834]  [c016e027] __bio_add_page+0x147/0x1c0
[  756.468742]  [c016dbff] bio_alloc_bioset+0x7f/0x150
[  756.483913]  [c02bac50] raid5_mergeable_bvec+0x0/0x90
[  756.499604]  [c016e107] bio_add_page+0x37/0x50
[  756.513473]  [c01ff497] generic_make_request+0x127/0x190
[  756.529943]  [c013586a] mempool_free+0x2a/0x60
[  756.543812]  [c016dadd] bio_free+0x1d/0x40
[  756.556645]  [c01ff546] submit_bio+0x46/0xc0
[  756.571389]  [c020a396] radix_tree_node_alloc+0x16/0x60
[  756.587610]  [c020a5c2] radix_tree_insert+0xe2/0x130
[  756.603039]  [c0139575] __pagevec_lru_add+0x75/0x80
[  756.618207]  [c0171971] mpage_bio_submit+0x11/0x20
[  756.633117]  [c01720e0] mpage_readpages+0x100/0x140
[  756.648287]  [c0187d50] ext3_get_block+0x0/0xe0
[  756.662419]  [c026fc0b] scsi_end_request+0x9b/0xc0
[  756.677329]  [c026dd2b] scsi_delete_timer+0xb/0x20
[  756.692240]  [c0188ae0] ext3_readpages+0x0/0x20
[  756.706369]  [c0138ac1] read_pages+0xd1/0xe0
[  756.719719]  [c0187d50] ext3_get_block+0x0/0xe0
[  756.733853]  [c0136e24] __alloc_pages+0x54/0x2d0
[  756.748242]  [c02704ff] scsi_softirq_done+0x6f/0xe0
[  756.763413]  [c02861a1] sil_interrupt+0x81/0x90
[  756.777544]  [c01ffa78] blk_done_softirq+0x58/0x70
[  756.792454]  [c0138b78] __do_page_cache_readahead+0xa8/0x110
[  756.809961]  [c0138d21] blockable_page_cache_readahead+0x51/0xd0
[  756.828508]  [c0138e14] make_ahead_window+0x74/0xc0
[  756.843679]  [c0138efa] page_cache_readahead+0x9a/0x180
[  756.859886]  [c0133704] file_read_actor+0xc4/0xd0
[  756.874537]  [c0133637] do_generic_mapping_read+0x467/0x470
[  756.891786]  [c0133893] generic_file_aio_read+0x183/0x1c0
[  756.908514]  [c0133640] file_read_actor+0x0/0xd0
[  756.922903]  [c026fb20] scsi_next_command+0x30/0x50
[  756.938073]  [c014e250] do_sync_read+0xc0/0x100
[  756.952204]  [c0124120] autoremove_wake_function+0x0/0x50
[  756.968934]  [c026bf00] scsi_finish_command+0x40/0x60
[  756.984622]  [c03319a5] schedule+0x2e5/0x590
[  756.997972]  [c014e316] vfs_read+0x86/0x110
[  757.011065]  [c014e5f7] sys_read+0x47/0x80
[  757.023897]  [c0102894] syscall_call+0x7/0xb
[  757.037250] ===

I know this is late given a fix has been issued, but just got this in the logs with no other debug info at all. This kernel has *all* Jens debugging patches + the snippet from Neil above. I'll apply the fix now and run it for a couple of days. Brad
Re: [OOPS] 2.6.21-rc6-git5 in cfq_dispatch_insert
On Wed, Apr 25 2007, Brad Campbell wrote: Jens Axboe wrote: It looks to be extremely rare. Aliases are extremely rare, front merges are rare. And you need both to happen with the details you outlined. But it's a large user base, and we've had 3-4 reports on this in the past months. So it obviously does happen. I could not make it trigger without doctoring the unplug code when I used aio. Well, not that rare on this particular machine (I had a case where I could reproduce it in less than an hour of normal use previously on this box), and I've had it occur a number of times on my servers, I just never reported it before as I never took the time to set up a serial console and capture the oops. Extremely rare in the sense that it takes md and some certain conditions to happen for it to trigger. So for most people it'll be extremely rare, and for others (such as yourself) that hit it, it won't be so rare :-) Here's a fix for it, confirmed. Shall I leave the other debugging in, apply this and run it for a few hard days? Yes, that would be perfect! -- Jens Axboe
Re: [OOPS] 2.6.21-rc6-git5 in cfq_dispatch_insert
On Wednesday April 25, [EMAIL PROTECTED] wrote: Here's a fix for it, confirmed. Patch looks good to me. Hopefully Brad can still wait for the WARN_ON to fire - it might give useful clues to why this is happening. It might be interesting. Thanks, NeilBrown
Re: [OOPS] 2.6.21-rc6-git5 in cfq_dispatch_insert
On Wed, Apr 25 2007, Neil Brown wrote: On Wednesday April 25, [EMAIL PROTECTED] wrote: Here's a fix for it, confirmed. Patch looks good to me. Good! And thanks for taking the time to look at this bug btw, it's had me puzzled a bit this week. I should not have ruled out the aliases I guess, they must be coming elsewhere than from O_DIRECT. Hopefully Brad can still wait for the WARN_ON to fire - it might give useful clues to why this is happening. It might be interesting. But he did and reported it about an hour ago, so all should be well. -- Jens Axboe
Re: [OOPS] 2.6.21-rc6-git5 in cfq_dispatch_insert
On Wednesday April 25, [EMAIL PROTECTED] wrote:

[  756.311074] BUG: at block/cfq-iosched.c:543 cfq_reposition_rq_rb()
[  756.329615]  [c0204fa1] cfq_merged_request+0x71/0x80
[  756.345046]  [c0204f30] cfq_merged_request+0x0/0x80
[  756.360216]  [c01fb4ae] elv_merged_request+0x4e/0x50
[  756.375647]  [c01ff1a7] __make_request+0x1a7/0x2f0
[  756.390557]  [c01ff497] generic_make_request+0x127/0x190
[  756.407025]  [c02bb011] chunk_aligned_read+0x111/0x1c0
                            ^
[  756.422974]  [c01ff497] generic_make_request+0x127/0x190
[  756.439443]  [c02bb448] make_request+0x388/0x3b0
[  756.453834]  [c016e027] __bio_add_page+0x147/0x1c0
[  756.468742]  [c016dbff] bio_alloc_bioset+0x7f/0x150
[  756.483913]  [c02bac50] raid5_mergeable_bvec+0x0/0x90
[  756.499604]  [c016e107] bio_add_page+0x37/0x50
[  756.513473]  [c01ff497] generic_make_request+0x127/0x190
[  756.529943]  [c013586a] mempool_free+0x2a/0x60
[  756.543812]  [c016dadd] bio_free+0x1d/0x40
[  756.556645]  [c01ff546] submit_bio+0x46/0xc0
[  756.571389]  [c020a396] radix_tree_node_alloc+0x16/0x60
[  756.587610]  [c020a5c2] radix_tree_insert+0xe2/0x130
[  756.603039]  [c0139575] __pagevec_lru_add+0x75/0x80
[  756.618207]  [c0171971] mpage_bio_submit+0x11/0x20
[  756.633117]  [c01720e0] mpage_readpages+0x100/0x140
                            ^^^

So it is a regular data read (not metadata of any sort) which is by-passing the stripe cache. I was expecting some sort of metadata read. I wonder what this is merging in-front of... maybe a stripe-cache read. I guess it doesn't tell us anything really interesting, which is probably a good thing - it means everything else is working properly. I wonder if we should avoid bypassing the stripe cache if the needed stripes are already in the cache... or if at least one needed stripe is or if the array is degraded... Probably in the degraded case we should never bypass the cache, as if we do, then a sequential read of a full stripe will read every block twice. I'd better do some performance measurements. Thanks for all the patient testing.
NeilBrown
Degraded RAID performance - Was : Re: [OOPS] 2.6.21-rc6-git5 in cfq_dispatch_insert
Neil Brown wrote: I wonder if we should avoid bypassing the stripe cache if the needed stripes are already in the cache... or if at least one needed stripe is or if the array is degraded... Probably in the degraded case we should never bypass the cache, as if we do, then a sequential read of a full stripe will read every block twice. I'd better do some performance measurements. Ok, that would explain some odd performance issues I've noticed. Let's say I run dstat -D sda,sdb,sdc,sdd,md0 5

----total-cpu-usage---- --disk/sda- --disk/sdb- --disk/sdc- --disk/sdd- --disk/md0- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq| read write: read write: read write: read write: read write| recv  send|  in   out | int   csw
 25  22   0  47   0   6|20.1M    0 :20.2M    0 :20.1M    0 :    0    0 :40.2M    0 | 146B  662B|   0     0 |1186   661
 26  20   0  46   0   8|19.4M    0 :19.4M    0 :19.4M    0 :    0    0 :38.9M    0 | 160B  549B|   0     0 |1365   650

Given I'm doing a read, I would have expected a read to consist of 2 direct reads, one parity read and some calculation. The numbers I'm seeing however show 3 reads for 2 reads worth of bandwidth.

[EMAIL PROTECTED]:~# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sda[0] sdc[2] sdb[1]
      585934080 blocks level 5, 64k chunk, algorithm 2 [4/3] [UUU_]

(Dropped Jens and Chuck from the cc as this likely has little interest for them) Brad
Re: [OOPS] 2.6.21-rc6-git5 in cfq_dispatch_insert
Hi Jens! On 25 Apr 2007, at 12:18, Jens Axboe wrote: On Wed, Apr 25 2007, Brad Campbell wrote: Jens Axboe wrote: It looks to be extremely rare. Aliases are extremely rare, front merges are rare. And you need both to happen with the details you outlined. But it's a large user base, and we've had 3-4 reports on this in the past months. So it obviously does happen. I could not make it trigger without doctoring the unplug code when I used aio. Well, not that rare on this particular machine (I had a case where I could reproduce it in less than an hour of normal use previously on this box), and I've had it occur a number of times on my servers, I just never reported it before as I never took the time to set up a serial console and capture the oops. Extremely rare in the sense that it takes md and some certain conditions to happen for it to trigger. So for most people it'll be extremely rare, and for others (such as yourself) that hit it, it won't be so rare :-) Here's a fix for it, confirmed. Shall I leave the other debugging in, apply this and run it for a few hard days? Yes, that would be perfect! Okay, I left all debugging patches in, disabled all kernel debugging .config stuff and gave it a spin with our usual "killer" workload (as far as batch systems are repeatable anyway) and so far there was not a single glitch or message, so I preliminarily conclude that the bug is squashed. The final word will come once my 1800 batch jobs are processed and I have my particle physics result ;-) Ciao, Roland -- TU Muenchen, Physik-Department E18, James-Franck-Str., 85748 Garching Telefon 089/289-12575; Telefax 089/289-12570 -- CERN office: 892-1-D23 phone: +41 22 7676540 mobile: +41 76 487 4482 -- Any society that would give up a little liberty to gain a little security will deserve neither and lose both.
- Benjamin Franklin

-----BEGIN GEEK CODE BLOCK-----
Version: 3.12
GS/CS/M/MU d-(++) s:+ a-> C+++ UL P+++ L+++ E(+) W+ !N K- w--- M+ !V Y+ PGP++ t+(++) 5 R+ tv-- b+ DI++ e+++> h y+++
-----END GEEK CODE BLOCK-----
Re: [OOPS] 2.6.21-rc6-git5 in cfq_dispatch_insert
Jens Axboe wrote: Ok, can you try and reproduce with this one applied? It'll keep the system running (unless there are other corruptions going on), so it should help you a bit as well. It will dump some cfq state info when the condition triggers that can perhaps help diagnose this. So if you can apply this patch and reproduce + send the output, I'd much appreciate it! I think I'm wearing holes in my platters. This is being a swine to hit, but I finally got some.. It seems to respond to a combination of high cpu usage and relatively high disk utilisation. I tried all sorts of acrobatics with multiple readers and hammering the array while reading from individual drives.. The only reliable way I can reproduce this seems to be on a degraded array running a "while true ; do md5sum -c md5 ; done" on about a 180GB directory of files. It is taking anywhere from 4 to 96 hours to hit it though.. but at least it's reproducible.

[105449.653682] cfq: rbroot not empty, but ->next_rq == NULL! Fixing up, report the issue to [EMAIL PROTECTED]
[105449.683646] cfq: busy=1,drv=0,timer=0
[105449.694871] cfq rr_list:
[105449.702715]   3108: sort=0,next=,q=0/1,a=1/0,d=0/0,f=69
[105449.720693] cfq busy_list:
[105449.729054] cfq idle_list:
[105449.737418] cfq cur_rr:
[115435.022192] cfq: rbroot not empty, but ->next_rq == NULL! Fixing up, report the issue to [EMAIL PROTECTED]
[115435.052160] cfq: busy=1,drv=0,timer=0
[115435.063383] cfq rr_list:
[115435.071227]   3196: sort=0,next=,q=0/1,a=1/0,d=0/0,f=69
[115435.089205] cfq busy_list:
[115435.097566] cfq idle_list:
[115435.105930] cfq cur_rr:
[115616.651883] cfq: rbroot not empty, but ->next_rq == NULL! Fixing up, report the issue to [EMAIL PROTECTED]
[115616.681848] cfq: busy=1,drv=0,timer=0
[115616.693071] cfq rr_list:
[115616.700916]   3196: sort=0,next=,q=0/1,a=1/0,d=0/0,f=61
[115616.718893] cfq busy_list:
[115616.727253] cfq idle_list:
[115616.735617] cfq cur_rr:
[119679.564753] cfq: rbroot not empty, but ->next_rq == NULL! Fixing up, report the issue to [EMAIL PROTECTED]
[119679.594732] cfq: busy=1,drv=0,timer=0
[119679.605955] cfq rr_list:
[119679.613799]   3241: sort=0,next=,q=0/1,a=1/0,d=0/0,f=69
[119679.631778] cfq busy_list:
[119679.640136] cfq idle_list:
[119679.648502] cfq cur_rr:

Brad
Re: [OOPS] 2.6.21-rc6-git5 in cfq_dispatch_insert
On Tue, Apr 24 2007, Roland Kuhn wrote: > Hi Jens! > > On 24 Apr 2007, at 14:32, Jens Axboe wrote: > > >On Tue, Apr 24 2007, Roland Kuhn wrote: > >>Hi Jens! > >> > >>[I made a typo in the Cc: list so that lkml is only included as of > >>now. Actually I copied the typo from you ;-) ] > > > >Well no, you started the typo, I merely propagated it and forgot to > >fix > >it up :-) > > > Actually, I copied it from your printk() ;-) (thinking helps...) Ahhh! Yeah that one indeed has a typo, tsk tsk. > >Sure. You might want to include NFS file access into your tests, > >since we've not triggered this with locally accessing the disks. > >BTW: > > How are you exporting the directory (what exports options) - how > is it > mounted by the client(s)? What chunksize is your raid6 using? > >>> > >>>And what are the nature of the files on the raid (huge, small, ?) > >>>and > >>>what are the client(s) doing? Just approximately, I know these > >>>things > >>>can be hard/difficult/impossible to specify. > >>> > >>The files are 100-400MB in size and the client is merging them into a > >>new file in the same directory using the ROOT library, which does in > >>essence alternating sequences of > >> > >>_llseek(somewhere) > >>read(n bytes) > >>_llseek(somewhere+n) > >>read(m bytes) > >>... > >> > >>and then > >> > >>_llseek(somewhere) > >>rt_sigaction(ignore INT) > >>write(n bytes) > >>rt_sigaction(INT->DFL) > >>time() > >>_llseek(somewhere+n) > >>... > >> > >>where n is of the order of 30kB. The input files are treated > >>sequentially, not randomly. > > > >Ok, I'll see if I can reproduce it. No luck so far, I'm afraid. > > > Too bad. > > >>BTW: the machine just stopped dead, no sign whatsoever on console or > >>netconsole, so I rebooted with elevator=deadline > >>(need to get some work done besides ;-) ) > > > >Unfortunately expected, if we can race and lose an update to ->next_rq, > >we can race and corrupt some of the internal data structures as
> >well. If > >you have the time and inclination, it would be interesting to see > >if you > >can reproduce with some debugging options enabled: > > > >- Enable all preempt, spinlock and lockdep debugging measures > >- Possibly slab poisoning, although that may slow you down somewhat > > > Kernel compilation under way, will report back. Thanks! > >Are you using 4kb stacks? > > > No idea, 'grep -i stack .config' gives no indication, but ISTR that > 4k was made the default some time back? You are on x86-64, my mistake. -- Jens Axboe
Re: [OOPS] 2.6.21-rc6-git5 in cfq_dispatch_insert
Hi Jens!

On 24 Apr 2007, at 14:32, Jens Axboe wrote:

On Tue, Apr 24 2007, Roland Kuhn wrote:

Hi Jens!

[I made a typo in the Cc: list so that lkml is only included as of now. Actually I copied the typo from you ;-) ]

Well no, you started the typo, I merely propagated it and forgot to fix it up :-)

Actually, I copied it from your printk() ;-) (thinking helps...)

Sure. You might want to include NFS file access into your tests, since we've not triggered this with locally accessing the disks. BTW:

How are you exporting the directory (what exports options) - how is it mounted by the client(s)? What chunksize is your raid6 using?

And what are the nature of the files on the raid (huge, small, ?) and what are the client(s) doing? Just approximately, I know these things can be hard/difficult/impossible to specify.

The files are 100-400MB in size and the client is merging them into a new file in the same directory using the ROOT library, which does in essence alternating sequences of

_llseek(somewhere)
read(n bytes)
_llseek(somewhere+n)
read(m bytes)
...

and then

_llseek(somewhere)
rt_sigaction(ignore INT)
write(n bytes)
rt_sigaction(INT->DFL)
time()
_llseek(somewhere+n)
...

where n is of the order of 30kB. The input files are treated sequentially, not randomly.

Ok, I'll see if I can reproduce it. No luck so far, I'm afraid.

Too bad.

BTW: the machine just stopped dead, no sign whatsoever on console or netconsole, so I rebooted with elevator=deadline (need to get some work done besides ;-) )

Unfortunately expected, if we can race and lose an update to ->next_rq, we can race and corrupt some of the internal data structures as well. If you have the time and inclination, it would be interesting to see if you can reproduce with some debugging options enabled:

- Enable all preempt, spinlock and lockdep debugging measures
- Possibly slab poisoning, although that may slow you down somewhat

Kernel compilation under way, will report back.

Are you using 4kb stacks?

No idea, 'grep -i stack .config' gives no indication, but ISTR that 4k was made the default some time back?

Ciao,
Roland

--
TU Muenchen, Physik-Department E18, James-Franck-Str., 85748 Garching
Telefon 089/289-12575; Telefax 089/289-12570
--
CERN office: 892-1-D23 phone: +41 22 7676540 mobile: +41 76 487 4482
--
Any society that would give up a little liberty to gain a little
security will deserve neither and lose both.
- Benjamin Franklin

-BEGIN GEEK CODE BLOCK-
Version: 3.12
GS/CS/M/MU d-(++) s:+ a-> C+++ UL P+++ L+++ E(+) W+ !N K- w--- M+ !V
Y+ PGP++ t+(++) 5 R+ tv-- b+ DI++ e+++> h y+++
--END GEEK CODE BLOCK--
Re: [OOPS] 2.6.21-rc6-git5 in cfq_dispatch_insert
On Tue, Apr 24 2007, Roland Kuhn wrote:
> Hi Jens!
>
> [I made a typo in the Cc: list so that lkml is only included as of
> now. Actually I copied the typo from you ;-) ]

Well no, you started the typo, I merely propagated it and forgot to
fix it up :-)

> >On Tue, Apr 24 2007, Jens Axboe wrote:
> >>On Tue, Apr 24 2007, Roland Kuhn wrote:
> >>>Hi Jens!
> >>>
> >>>On 24 Apr 2007, at 11:18, Jens Axboe wrote:
> >>>
> >>>>On Tue, Apr 24 2007, Roland Kuhn wrote:
> >>>>>Hi Jens!
> >>>>>
> >>>>>We're using a custom built fileserver (dual core Athlon64, using
> >>>>>the x86_64 arch) with 22 disks in a RAID6, and while resyncing
> >>>>>/dev/md2 (9.1GB ext3) after a hardware incident (cable pulled on
> >>>>>one disk) the machine would reliably oops while serving some
> >>>>>large files over NFSv3. The oops message scrolled partly off the
> >>>>>screen, but the IP was in cfq_dispatch_insert, so I tried your
> >>>>>debug patch from yesterday with 2.6.21-rc7. I used netconsole
> >>>>>for capturing the output (which works nicely, thanks Matt!) and
> >>>>>as usual the condition triggered after about half a minute, this
> >>>>>time with the following printout instead of crashing (still
> >>>>>works fine):
> >>>>>
> >>>>>cfq: rbroot not empty, but ->next_rq == NULL! Fixing up, report
> >>>>>the issue to [EMAIL PROTECTED]
> >>>>>cfq: busy=1,drv=1,timer=0
> >>>>>cfq rr_list:
> >>>>>cfq busy_list:
> >>>>>  4272: sort=0,next=,q=0/1,a=2/0,d=0/1,f=221
> >>>>>cfq idle_list:
> >>>>>cfq cur_rr:
> >>>>>cfq: rbroot not empty, but ->next_rq == NULL! Fixing up, report
> >>>>>the issue to [EMAIL PROTECTED]
> >>>>>cfq: busy=1,drv=1,timer=0
> >>>>>cfq rr_list:
> >>>>>cfq busy_list:
> >>>>>  4276: sort=0,next=,q=0/1,a=2/0,d=0/1,f=221
> >>>>>cfq idle_list:
> >>>>>cfq cur_rr:
> >>>>>
> >>>>>There was no backtrace, so the only thing I can tell is that for
> >>>>>the previous crashes some nfs threads were always involved; only
> >>>>>once did it happen inside an interrupt handler (with the "aieee"
> >>>>>kind of message).
> >>>>>
> >>>>>If you want me to try something else, don't hesitate to ask!
> >>>>
> >>>>Nifty, great that you can reproduce so quickly. I'll try a
> >>>>3-drive raid6 here and see if read activity along with a resync
> >>>>will trigger anything. If that doesn't work for me, I'll provide
> >>>>you with a more extensive debug patch (if you don't mind).
> >>>
> >>>Sure. You might want to include NFS file access into your tests,
> >>>since we've not triggered this with locally accessing the disks.
> >>>BTW:
> >>
> >>How are you exporting the directory (what exports options) - how is
> >>it mounted by the client(s)? What chunksize is your raid6 using?
> >
> >And what are the nature of the files on the raid (huge, small, ?)
> >and what are the client(s) doing? Just approximately, I know these
> >things can be hard/difficult/impossible to specify.
>
> The files are 100-400MB in size and the client is merging them into a
> new file in the same directory using the ROOT library, which does in
> essence alternating sequences of
>
> _llseek(somewhere)
> read(n bytes)
> _llseek(somewhere+n)
> read(m bytes)
> ...
>
> and then
>
> _llseek(somewhere)
> rt_sigaction(ignore INT)
> write(n bytes)
> rt_sigaction(INT->DFL)
> time()
> _llseek(somewhere+n)
> ...
>
> where n is of the order of 30kB. The input files are treated
> sequentially, not randomly.

Ok, I'll see if I can reproduce it. No luck so far, I'm afraid.

> BTW: the machine just stopped dead, no sign whatsoever on console or
> netconsole, so I rebooted with elevator=deadline
> (need to get some work done besides ;-) )

Unfortunately expected, if we can race and lose an update to ->next_rq,
we can race and corrupt some of the internal data structures as well.
If you have the time and inclination, it would be interesting to see if
you can reproduce with some debugging options enabled:

- Enable all preempt, spinlock and lockdep debugging measures
- Possibly slab poisoning, although that may slow you down somewhat

Are you using 4kb stacks?

--
Jens Axboe
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [OOPS] 2.6.21-rc6-git5 in cfq_dispatch_insert
Hi Jens!

[I made a typo in the Cc: list so that lkml is only included as of now. Actually I copied the typo from you ;-) ]

On 24 Apr 2007, at 11:40, Jens Axboe wrote:

On Tue, Apr 24 2007, Jens Axboe wrote:

On Tue, Apr 24 2007, Roland Kuhn wrote:

Hi Jens!

On 24 Apr 2007, at 11:18, Jens Axboe wrote:

On Tue, Apr 24 2007, Roland Kuhn wrote:

Hi Jens!

We're using a custom built fileserver (dual core Athlon64, using the x86_64 arch) with 22 disks in a RAID6, and while resyncing /dev/md2 (9.1GB ext3) after a hardware incident (cable pulled on one disk) the machine would reliably oops while serving some large files over NFSv3. The oops message scrolled partly off the screen, but the IP was in cfq_dispatch_insert, so I tried your debug patch from yesterday with 2.6.21-rc7. I used netconsole for capturing the output (which works nicely, thanks Matt!) and as usual the condition triggered after about half a minute, this time with the following printout instead of crashing (still works fine):

cfq: rbroot not empty, but ->next_rq == NULL! Fixing up, report the issue to [EMAIL PROTECTED]
cfq: busy=1,drv=1,timer=0
cfq rr_list:
cfq busy_list:
  4272: sort=0,next=,q=0/1,a=2/0,d=0/1,f=221
cfq idle_list:
cfq cur_rr:
cfq: rbroot not empty, but ->next_rq == NULL! Fixing up, report the issue to [EMAIL PROTECTED]
cfq: busy=1,drv=1,timer=0
cfq rr_list:
cfq busy_list:
  4276: sort=0,next=,q=0/1,a=2/0,d=0/1,f=221
cfq idle_list:
cfq cur_rr:

There was no backtrace, so the only thing I can tell is that for the previous crashes some nfs threads were always involved; only once did it happen inside an interrupt handler (with the "aieee" kind of message).

If you want me to try something else, don't hesitate to ask!

Nifty, great that you can reproduce so quickly. I'll try a 3-drive raid6 here and see if read activity along with a resync will trigger anything. If that doesn't work for me, I'll provide you with a more extensive debug patch (if you don't mind).

Sure. You might want to include NFS file access into your tests, since we've not triggered this with locally accessing the disks. BTW:

How are you exporting the directory (what exports options) - how is it mounted by the client(s)? What chunksize is your raid6 using?

And what are the nature of the files on the raid (huge, small, ?) and what are the client(s) doing? Just approximately, I know these things can be hard/difficult/impossible to specify.

The files are 100-400MB in size and the client is merging them into a new file in the same directory using the ROOT library, which does in essence alternating sequences of

_llseek(somewhere)
read(n bytes)
_llseek(somewhere+n)
read(m bytes)
...

and then

_llseek(somewhere)
rt_sigaction(ignore INT)
write(n bytes)
rt_sigaction(INT->DFL)
time()
_llseek(somewhere+n)
...

where n is of the order of 30kB. The input files are treated sequentially, not randomly.

BTW: the machine just stopped dead, no sign whatsoever on console or netconsole, so I rebooted with elevator=deadline (need to get some work done besides ;-) )

Ciao,
Roland
Re: [OOPS] 2.6.21-rc6-git5 in cfq_dispatch_insert
Jens Axboe wrote:

Ok, can you try and reproduce with this one applied? It'll keep the system running (unless there are other corruptions going on), so it should help you a bit as well. It will dump some cfq state info when the condition triggers that can perhaps help diagnose this. So if you can apply this patch and reproduce + send the output, I'd much appreciate it!

I think I'm wearing holes in my platters. This is being a swine to hit, but I finally got some. It seems to respond to a combination of high cpu usage and relatively high disk utilisation. I tried all sorts of acrobatics with multiple readers and hammering the array while reading from individual drives. The only reliable way I can reproduce this seems to be on a degraded array running a

while true ; do md5sum -c md5 ; done

on about a 180GB directory of files. It is taking anywhere from 4 to 96 hours to hit it though, but at least it's reproducible.

[105449.653682] cfq: rbroot not empty, but ->next_rq == NULL! Fixing up, report the issue to [EMAIL PROTECTED]
[105449.683646] cfq: busy=1,drv=0,timer=0
[105449.694871] cfq rr_list:
[105449.702715]   3108: sort=0,next=,q=0/1,a=1/0,d=0/0,f=69
[105449.720693] cfq busy_list:
[105449.729054] cfq idle_list:
[105449.737418] cfq cur_rr:
[115435.022192] cfq: rbroot not empty, but ->next_rq == NULL! Fixing up, report the issue to [EMAIL PROTECTED]
[115435.052160] cfq: busy=1,drv=0,timer=0
[115435.063383] cfq rr_list:
[115435.071227]   3196: sort=0,next=,q=0/1,a=1/0,d=0/0,f=69
[115435.089205] cfq busy_list:
[115435.097566] cfq idle_list:
[115435.105930] cfq cur_rr:
[115616.651883] cfq: rbroot not empty, but ->next_rq == NULL! Fixing up, report the issue to [EMAIL PROTECTED]
[115616.681848] cfq: busy=1,drv=0,timer=0
[115616.693071] cfq rr_list:
[115616.700916]   3196: sort=0,next=,q=0/1,a=1/0,d=0/0,f=61
[115616.718893] cfq busy_list:
[115616.727253] cfq idle_list:
[115616.735617] cfq cur_rr:
[119679.564753] cfq: rbroot not empty, but ->next_rq == NULL! Fixing up, report the issue to [EMAIL PROTECTED]
[119679.594732] cfq: busy=1,drv=0,timer=0
[119679.605955] cfq rr_list:
[119679.613799]   3241: sort=0,next=,q=0/1,a=1/0,d=0/0,f=69
[119679.631778] cfq busy_list:
[119679.640136] cfq idle_list:
[119679.648502] cfq cur_rr:

Brad
--
Human beings, who are almost unique in having the ability to learn from
the experience of others, are also remarkable for their apparent
disinclination to do so.
-- Douglas Adams
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [OOPS] 2.6.21-rc6-git5 in cfq_dispatch_insert
On Sun, Apr 22 2007, Brad Campbell wrote:
> Jens Axboe wrote:
> >
> >Thanks for testing Brad, be sure to use the next patch I sent instead.
> >The one from this mail shouldn't even get you booted. So double check
> >that you are still using CFQ :-)
>
> [184901.576773] BUG: unable to handle kernel NULL pointer dereference at
> virtual address 005c
> [184901.602612] printing eip:
> [184901.610990] c0205399
> [184901.617796] *pde =
> [184901.626421] Oops: [#1]
> [184901.635044] Modules linked in:
> [184901.644500] CPU: 0
> [184901.644501] EIP: 0060:[<c0205399>] Not tainted VLI
> [184901.644503] EFLAGS: 00010082 (2.6.21-rc7 #7)
> [184901.681294] EIP is at cfq_dispatch_insert+0x19/0x70
> [184901.696168] eax: f7f078e0 ebx: f7ca2794 ecx: 0004 edx:
> [184901.716743] esi: c1acaa1c edi: f7c9c6c0 ebp: esp: dbaefde0
> [184901.737316] ds: 007b es: 007b fs: 00d8 gs: 0033 ss: 0068
> [184901.755032] Process md5sum (pid: 4268, ti=dbaee000 task=f794a5a0
> task.ti=dbaee000)
> [184901.777422] Stack: c1acaa1c f7c9c6c0 c0205509 e6b61bd8 c0133451 1000
> [184901.803121]        0008 0004 0e713800 c1acaa1c f7c9c6c0 c1acaa1c
> [184901.828837]        c0205749 f7ca2794 f7ca2794 f79bc000 0282 c01fb829 c016ea8d
> [184901.854552] Call Trace:
> [184901.862723]  [<c0205509>] __cfq_dispatch_requests+0x79/0x170
> [184901.879971]  [<c0133451>] do_generic_mapping_read+0x281/0x470
> [184901.897478]  [<c0205749>] cfq_dispatch_requests+0x69/0x90
> [184901.913946]  [<c01fb829>] elv_next_request+0x39/0x130
> [184901.929375]  [<c016ea8d>] bio_endio+0x5d/0x90
> [184901.942725]  [<c0270375>] scsi_request_fn+0x45/0x280
> [184901.957896]  [<c01fde92>] blk_run_queue+0x32/0x70
> [184901.972286]  [<c026f8e0>] scsi_next_command+0x30/0x50
> [184901.987716]  [<c026f9cb>] scsi_end_request+0x9b/0xc0
> [184902.002886]  [<c026fb31>] scsi_io_completion+0x81/0x330
> [184902.018835]  [<c026daeb>] scsi_delete_timer+0xb/0x20
> [184902.034006]  [<c027ee35>] ata_scsi_qc_complete+0x65/0xd0
> [184902.050214]  [<c02751bb>] sd_rw_intr+0x8b/0x220
> [184902.064085]  [<c0280c0c>] ata_altstatus+0x1c/0x20
> [184902.078475]  [<c027b68d>] ata_hsm_move+0x14d/0x3f0
> [184902.093126]  [<c026bcc0>] scsi_finish_command+0x40/0x60
> [184902.109075]  [<c02702bf>] scsi_softirq_done+0x6f/0xe0
> [184902.124506]  [<c0285f61>] sil_interrupt+0x81/0x90
> [184902.138895]  [<c01ffa78>] blk_done_softirq+0x58/0x70
> [184902.154066]  [<c011721f>] __do_softirq+0x6f/0x80
> [184902.181806]  [<c0104cee>] do_IRQ+0x3e/0x80
> [184902.194380]  [<c010322f>] common_interrupt+0x23/0x28
> [184902.209551] ===
> [184902.220512] Code: 0e e3 ef ff e9 47 ff ff ff 89 f6 8d bc 27 00 00 00 00
> 83 ec 10 89 1c 24 89 6c 24 0c 89 74 24 04 89 7c 24 08 89 c3 89 d5 8b 40 0c
> <8b> 72 5c 8b 78 04 89 d0 e8 1a fa ff ff 8b 45 14 89 ea 25 01 80
> [184902.280564] EIP: [<c0205399>] cfq_dispatch_insert+0x19/0x70 SS:ESP
> 0068:dbaefde0
> [184902.303418] Kernel panic - not syncing: Fatal exception in interrupt
> [184902.322746] Rebooting in 60 seconds..
>
> Ok, it's taken me _ages_ to get the system to a point I can reproduce
> this, but I think it's now reproducible with a couple of hours beating.
> The bad news is it looks like it has not tickled any of your debugging
> markers! This was the 1st thing printed on a clean serial console, so
> nothing above that for days.
>
> I did double check and I was/am certainly running the kernel with the
> debug patch compiled in.

Ok, can you try and reproduce with this one applied? It'll keep the
system running (unless there are other corruptions going on), so it
should help you a bit as well. It will dump some cfq state info when the
condition triggers that can perhaps help diagnose this. So if you can
apply this patch and reproduce + send the output, I'd much appreciate
it!
diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index b6491c0..2aba928 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -947,6 +947,36 @@ keep_queue:
 	return cfqq;
 }
 
+static void cfq_dump_queue(struct cfq_queue *cfqq)
+{
+	printk("  %d: sort=%d,next=%p,q=%d/%d,a=%d/%d,d=%d/%d,f=%x\n",
+		cfqq->key, RB_EMPTY_ROOT(&cfqq->sort_list), cfqq->next_rq,
+		cfqq->queued[0], cfqq->queued[1], cfqq->allocated[0],
+		cfqq->allocated[1], cfqq->on_dispatch[0],
+		cfqq->on_dispatch[1], cfqq->flags);
+}
+
+static void cfq_dump_state(struct cfq_data *cfqd)
+{
+	struct cfq_queue *cfqq;
+	int i;
+
+	printk("cfq: busy=%d,drv=%d,timer=%d\n", cfqd->busy_queues,
+		cfqd->rq_in_driver, timer_pending(&cfqd->idle_slice_timer));
+
+	printk("cfq rr_list:\n");
+	for (i = 0; i < CFQ_PRIO_LISTS; i++)
+		list_for_each_entry(cfqq, &cfqd->rr_list[i], cfq_list)
+			cfq_dump_queue(cfqq);
+
+	printk("cfq busy_list:\n");
+	list_for_each_entry(cfqq, &cfqd->busy_rr, cfq_list)
+		cfq_dump_queue(cfqq);
+
+	printk("cfq idle_list:\n");
+	list_for_each_entry(cfqq, &cfqd->idle_rr, cfq_list)
+		cfq_dump_queue(cfqq);
+
+	printk("cfq cur_rr:\n");
+	list_for_each_entry(cfqq, &cfqd->cur_rr, cfq_list)
+		cfq_dump_queue(cfqq);
Re: [OOPS] 2.6.21-rc6-git5 in cfq_dispatch_insert
On Sun, Apr 22 2007, Brad Campbell wrote: Jens Axboe wrote: Thanks for testing Brad, be sure to use the next patch I sent instead. The one from this mail shouldn't even get you booted. So double check that you are still using CFQ :-) [184901.576773] BUG: unable to handle kernel NULL pointer dereference at virtual address 005c [184901.602612] printing eip: [184901.610990] c0205399 [184901.617796] *pde = [184901.626421] Oops: [#1] [184901.635044] Modules linked in: [184901.644500] CPU:0 [184901.644501] EIP:0060:[c0205399]Not tainted VLI [184901.644503] EFLAGS: 00010082 (2.6.21-rc7 #7) [184901.681294] EIP is at cfq_dispatch_insert+0x19/0x70 [184901.696168] eax: f7f078e0 ebx: f7ca2794 ecx: 0004 edx: [184901.716743] esi: c1acaa1c edi: f7c9c6c0 ebp: esp: dbaefde0 [184901.737316] ds: 007b es: 007b fs: 00d8 gs: 0033 ss: 0068 [184901.755032] Process md5sum (pid: 4268, ti=dbaee000 task=f794a5a0 task.ti=dbaee000) [184901.777422] Stack: c1acaa1c f7c9c6c0 c0205509 e6b61bd8 c0133451 1000 [184901.803121]0008 0004 0e713800 c1acaa1c f7c9c6c0 c1acaa1c [184901.828837]c0205749 f7ca2794 f7ca2794 f79bc000 0282 c01fb829 c016ea8d [184901.854552] Call Trace: [184901.862723] [c0205509] __cfq_dispatch_requests+0x79/0x170 [184901.879971] [c0133451] do_generic_mapping_read+0x281/0x470 [184901.897478] [c0205749] cfq_dispatch_requests+0x69/0x90 [184901.913946] [c01fb829] elv_next_request+0x39/0x130 [184901.929375] [c016ea8d] bio_endio+0x5d/0x90 [184901.942725] [c0270375] scsi_request_fn+0x45/0x280 [184901.957896] [c01fde92] blk_run_queue+0x32/0x70 [184901.972286] [c026f8e0] scsi_next_command+0x30/0x50 [184901.987716] [c026f9cb] scsi_end_request+0x9b/0xc0 [184902.002886] [c026fb31] scsi_io_completion+0x81/0x330 [184902.018835] [c026daeb] scsi_delete_timer+0xb/0x20 [184902.034006] [c027ee35] ata_scsi_qc_complete+0x65/0xd0 [184902.050214] [c02751bb] sd_rw_intr+0x8b/0x220 [184902.064085] [c0280c0c] ata_altstatus+0x1c/0x20 [184902.078475] [c027b68d] ata_hsm_move+0x14d/0x3f0 [184902.093126] [c026bcc0] 
scsi_finish_command+0x40/0x60 [184902.109075] [c02702bf] scsi_softirq_done+0x6f/0xe0 [184902.124506] [c0285f61] sil_interrupt+0x81/0x90 [184902.138895] [c01ffa78] blk_done_softirq+0x58/0x70 [184902.154066] [c011721f] __do_softirq+0x6f/0x80 [184902.181806] [c0104cee] do_IRQ+0x3e/0x80 [184902.194380] [c010322f] common_interrupt+0x23/0x28 [184902.209551] === [184902.220512] Code: 0e e3 ef ff e9 47 ff ff ff 89 f6 8d bc 27 00 00 00 00 83 ec 10 89 1c 24 89 6c 24 0c 89 74 24 04 89 7c 24 08 89 c3 89 d5 8b 40 0c <8b> 72 5c 8b 78 04 89 d0 e8 1a fa ff ff 8b 45 14 89 ea 25 01 80 [184902.280564] EIP: [c0205399] cfq_dispatch_insert+0x19/0x70 SS:ESP 0068:dbaefde0 [184902.303418] Kernel panic - not syncing: Fatal exception in interrupt [184902.322746] Rebooting in 60 seconds.. Ok, it's taken me _ages_ to get the system to a point I can reproduce this, but I think it's now reproducible with a couple of hours beating. The bad news is it looks like it has not tickled any of your debugging markers! This was the 1st thing printed on a clean serial console, so nothing above that for days. I did double check and I was/am certainly running the kernel with the debug patch compiled in. Ok, can you try and reproduce with this one applied? It'll keep the system running (unless there are other corruptions going on), so it should help you a bit as well. It will dump some cfq state info when the condition triggers that can perhaps help diagnose this. So if you can apply this patch and reproduce + send the output, I'd much appreciate it!
diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index b6491c0..2aba928 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -947,6 +947,36 @@ keep_queue:
 	return cfqq;
 }
 
+static void cfq_dump_queue(struct cfq_queue *cfqq)
+{
+	printk("%d: sort=%d,next=%p,q=%d/%d,a=%d/%d,d=%d/%d,f=%x\n",
+		cfqq->key, RB_EMPTY_ROOT(&cfqq->sort_list), cfqq->next_rq,
+		cfqq->queued[0], cfqq->queued[1], cfqq->allocated[0],
+		cfqq->allocated[1], cfqq->on_dispatch[0],
+		cfqq->on_dispatch[1], cfqq->flags);
+}
+
+static void cfq_dump_state(struct cfq_data *cfqd)
+{
+	struct cfq_queue *cfqq;
+	int i;
+
+	printk("cfq: busy=%d,drv=%d,timer=%d\n", cfqd->busy_queues,
+		cfqd->rq_in_driver, timer_pending(&cfqd->idle_slice_timer));
+
+	printk("cfq rr_list:\n");
+	for (i = 0; i < CFQ_PRIO_LISTS; i++)
+		list_for_each_entry(cfqq, &cfqd->rr_list[i], cfq_list)
+			cfq_dump_queue(cfqq);
+
+	printk("cfq busy_list:\n");
+	list_for_each_entry(cfqq, &cfqd->busy_rr, cfq_list)
+		cfq_dump_queue(cfqq);
+
+	printk("cfq idle_list:\n");
+	list_for_each_entry(cfqq, &cfqd->idle_rr, cfq_list)
+		cfq_dump_queue(cfqq);
Re: [OOPS] 2.6.21-rc6-git5 in cfq_dispatch_insert
Jens Axboe wrote: Thanks for testing Brad, be sure to use the next patch I sent instead. The one from this mail shouldn't even get you booted. So double check that you are still using CFQ :-) [184901.576773] BUG: unable to handle kernel NULL pointer dereference at virtual address 005c [184901.602612] printing eip: [184901.610990] c0205399 [184901.617796] *pde = [184901.626421] Oops: [#1] [184901.635044] Modules linked in: [184901.644500] CPU:0 [184901.644501] EIP:0060:[c0205399]Not tainted VLI [184901.644503] EFLAGS: 00010082 (2.6.21-rc7 #7) [184901.681294] EIP is at cfq_dispatch_insert+0x19/0x70 [184901.696168] eax: f7f078e0 ebx: f7ca2794 ecx: 0004 edx: [184901.716743] esi: c1acaa1c edi: f7c9c6c0 ebp: esp: dbaefde0 [184901.737316] ds: 007b es: 007b fs: 00d8 gs: 0033 ss: 0068 [184901.755032] Process md5sum (pid: 4268, ti=dbaee000 task=f794a5a0 task.ti=dbaee000) [184901.777422] Stack: c1acaa1c f7c9c6c0 c0205509 e6b61bd8 c0133451 1000 [184901.803121]0008 0004 0e713800 c1acaa1c f7c9c6c0 c1acaa1c [184901.828837]c0205749 f7ca2794 f7ca2794 f79bc000 0282 c01fb829 c016ea8d [184901.854552] Call Trace: [184901.862723] [c0205509] __cfq_dispatch_requests+0x79/0x170 [184901.879971] [c0133451] do_generic_mapping_read+0x281/0x470 [184901.897478] [c0205749] cfq_dispatch_requests+0x69/0x90 [184901.913946] [c01fb829] elv_next_request+0x39/0x130 [184901.929375] [c016ea8d] bio_endio+0x5d/0x90 [184901.942725] [c0270375] scsi_request_fn+0x45/0x280 [184901.957896] [c01fde92] blk_run_queue+0x32/0x70 [184901.972286] [c026f8e0] scsi_next_command+0x30/0x50 [184901.987716] [c026f9cb] scsi_end_request+0x9b/0xc0 [184902.002886] [c026fb31] scsi_io_completion+0x81/0x330 [184902.018835] [c026daeb] scsi_delete_timer+0xb/0x20 [184902.034006] [c027ee35] ata_scsi_qc_complete+0x65/0xd0 [184902.050214] [c02751bb] sd_rw_intr+0x8b/0x220 [184902.064085] [c0280c0c] ata_altstatus+0x1c/0x20 [184902.078475] [c027b68d] ata_hsm_move+0x14d/0x3f0 [184902.093126] [c026bcc0] scsi_finish_command+0x40/0x60 [184902.109075] [c02702bf] scsi_softirq_done+0x6f/0xe0 [184902.124506] [c0285f61] sil_interrupt+0x81/0x90 [184902.138895] [c01ffa78] blk_done_softirq+0x58/0x70
[184902.154066] [c011721f] __do_softirq+0x6f/0x80 [184902.181806] [c0104cee] do_IRQ+0x3e/0x80 [184902.194380] [c010322f] common_interrupt+0x23/0x28 [184902.209551] === [184902.220512] Code: 0e e3 ef ff e9 47 ff ff ff 89 f6 8d bc 27 00 00 00 00 83 ec 10 89 1c 24 89 6c 24 0c 89 74 24 04 89 7c 24 08 89 c3 89 d5 8b 40 0c <8b> 72 5c 8b 78 04 89 d0 e8 1a fa ff ff 8b 45 14 89 ea 25 01 80 [184902.280564] EIP: [c0205399] cfq_dispatch_insert+0x19/0x70 SS:ESP 0068:dbaefde0 [184902.303418] Kernel panic - not syncing: Fatal exception in interrupt [184902.322746] Rebooting in 60 seconds.. Ok, it's taken me _ages_ to get the system to a point I can reproduce this, but I think it's now reproducible with a couple of hours beating. The bad news is it looks like it has not tickled any of your debugging markers! This was the 1st thing printed on a clean serial console, so nothing above that for days. I did double check and I was/am certainly running the kernel with the debug patch compiled in. Brad -- "Human beings, who are almost unique in having the ability to learn from the experience of others, are also remarkable for their apparent disinclination to do so." -- Douglas Adams - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [OOPS] 2.6.21-rc6-git5 in cfq_dispatch_insert
On Wed, Apr 18 2007, Brad Campbell wrote: > Jens Axboe wrote: > > >I had something similar for generic_unplug_request() as well, but didn't > >see/hear any reports of it being tried out. Here's a complete debugging > >patch for this and other potential dangers. > > > > I had a clean 2.6.21-rc7 that I forgot to change the default sched on take > down my main server last night. Another oops of the same variety. > Unfortunately only a digital photo to go by though. > > I've removed Neil's patch, applied yours and fired it up on my 4 drive test > box (the one that was causing the trouble before). I've finally scrounged > another machine for a serial logger and I'll beat it until it breaks. > Perhaps if I can get this to reliably trigger again we can track it down. Thanks for testing Brad, be sure to use the next patch I sent instead. The one from this mail shouldn't even get you booted. So double check that you are still using CFQ :-) -- Jens Axboe - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [OOPS] 2.6.21-rc6-git5 in cfq_dispatch_insert
On Wed, Apr 18 2007, Jens Axboe wrote: > I had something similar for generic_unplug_request() as well, but didn't > see/hear any reports of it being tried out. Here's a complete debugging > patch for this and other potential dangers. Which had a bug (do the check _after_ deleting from the rbtree, otherwise it'll go rightfully BUG() the first time).

diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index 6853fb7..9aa7bf7 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -644,12 +644,15 @@ static void cfq_remove_request(struct request *rq)
 {
 	struct cfq_queue *cfqq = RQ_CFQQ(rq);
 
+	BUG_ON(!irqs_disabled());
 	if (cfqq->next_rq == rq)
 		cfqq->next_rq = cfq_find_next_rq(cfqq->cfqd, cfqq, rq);
 	list_del_init(&rq->queuelist);
 	cfq_del_rq_rb(rq);
+	BUG_ON(!cfqq->next_rq && !RB_EMPTY_ROOT(&cfqq->sort_list));
+
 	if (rq_is_meta(rq)) {
 		WARN_ON(!cfqq->meta_pending);
 		cfqq->meta_pending--;
@@ -1231,6 +1234,8 @@ static void cfq_exit_single_io_context(struct cfq_io_context *cic)
 {
 	struct cfq_data *cfqd = cic->key;
 
+	BUG_ON(irqs_disabled());
+
 	if (cfqd) {
 		request_queue_t *q = cfqd->queue;
@@ -1699,6 +1704,7 @@ cfq_rq_enqueued(struct cfq_data *cfqd, struct cfq_queue *cfqq,
 	/*
 	 * check if this request is a better next-serve candidate)) {
 	 */
+	BUG_ON(!irqs_disabled());
 	cfqq->next_rq = cfq_choose_req(cfqd, cfqq->next_rq, rq);
 	BUG_ON(!cfqq->next_rq);
@@ -2014,6 +2020,7 @@ static void cfq_exit_queue(elevator_t *e)
 	cfq_shutdown_timer_wq(cfqd);
 
+	BUG_ON(irqs_disabled());
 	spin_lock_irq(q->queue_lock);
 	if (cfqd->active_queue)
diff --git a/block/ll_rw_blk.c b/block/ll_rw_blk.c
index 3de0695..c16863e 100644
--- a/block/ll_rw_blk.c
+++ b/block/ll_rw_blk.c
@@ -1602,6 +1602,8 @@ EXPORT_SYMBOL(__generic_unplug_device);
 **/
 void generic_unplug_device(request_queue_t *q)
 {
+	BUG_ON(irqs_disabled());
+
 	spin_lock_irq(q->queue_lock);
 	__generic_unplug_device(q);
 	spin_unlock_irq(q->queue_lock);

-- Jens Axboe
Re: [OOPS] 2.6.21-rc6-git5 in cfq_dispatch_insert
Jens Axboe wrote: I had something similar for generic_unplug_request() as well, but didn't see/hear any reports of it being tried out. Here's a complete debugging patch for this and other potential dangers. I had a clean 2.6.21-rc7 that I forgot to change the default sched on take down my main server last night. Another oops of the same variety. Unfortunately only a digital photo to go by though. I've removed Neil's patch, applied yours and fired it up on my 4 drive test box (the one that was causing the trouble before). I've finally scrounged another machine for a serial logger and I'll beat it until it breaks. Perhaps if I can get this to reliably trigger again we can track it down. Brad -- "Human beings, who are almost unique in having the ability to learn from the experience of others, are also remarkable for their apparent disinclination to do so." -- Douglas Adams - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [OOPS] 2.6.21-rc6-git5 in cfq_dispatch_insert
On Tue, Apr 17 2007, Neil Brown wrote: > On Monday April 16, [EMAIL PROTECTED] wrote: > > > > cfq_dispatch_insert() was called with rq == 0. This one is getting really > > annoying... and md is involved again (RAID0 this time.) > > Yeah... weird. > RAID0 is so light-weight and so different from RAID1 or RAID5 that I > feel fairly safe concluding that the problem isn't in or near md. > But that doesn't help you. Well the fact is that we have 3-4 distinct reports of this oops, and all of them are using md. No reports have been filed where md isn't managing the disks. While this doesn't conclusively put the finger of blame on md, it is still likely. It could bug a CFQ bug too of course, or (more likely), some bad interaction between md and CFQ. > This really feels like a locking problem. Very much. > The problem occurs when ->next_rq is NULL, but ->sort_list.rb_node is > not NULL. That happens plenty of times in the code (particularly as > the first request is inserted) but always under ->queue_lock so it > should never be visible to cfq_dispatch_insert.. > > Except that drivers/scsi/ide-scsi.c:idescsi_eh_reset calls > elv_next_request which could ultimately call __cfq_dispatch_requests > without taking ->queue_lock (that I can see). But you probably aren't > using ide-scsi (does anyone?). > > Given that interrupts are always disabled when queue_lock is taken, it > might be useful to add >WARN_ON(!irqs_disabled()); > every time ->next_rq is set. > Something like the following. > > It might show something useful if we are lucky. I had something similar for generic_unplug_request() as well, but didn't see/hear any reports of it being tried out. Here's a complete debugging patch for this and other potential dangers. 
diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index b6491c0..8f749aa 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -587,8 +587,11 @@ static void cfq_remove_request(struct request *rq)
 {
 	struct cfq_queue *cfqq = RQ_CFQQ(rq);
 
-	if (cfqq->next_rq == rq)
+	BUG_ON(!irqs_disabled());
+	if (cfqq->next_rq == rq) {
 		cfqq->next_rq = cfq_find_next_rq(cfqq->cfqd, cfqq, rq);
+		BUG_ON(!cfqq->next_rq && !RB_EMPTY_ROOT(&cfqq->sort_list));
+	}
 	list_del_init(&rq->queuelist);
 	cfq_del_rq_rb(rq);
@@ -1637,6 +1640,7 @@ cfq_rq_enqueued(struct cfq_data *cfqd, struct cfq_queue *cfqq,
 	/*
 	 * check if this request is a better next-serve candidate)) {
 	 */
+	BUG_ON(!irqs_disabled());
 	cfqq->next_rq = cfq_choose_req(cfqd, cfqq->next_rq, rq);
 	BUG_ON(!cfqq->next_rq);
diff --git a/block/ll_rw_blk.c b/block/ll_rw_blk.c
index 3de0695..c16863e 100644
--- a/block/ll_rw_blk.c
+++ b/block/ll_rw_blk.c
@@ -1602,6 +1602,8 @@ EXPORT_SYMBOL(__generic_unplug_device);
 **/
 void generic_unplug_device(request_queue_t *q)
 {
+	BUG_ON(irqs_disabled());
+
 	spin_lock_irq(q->queue_lock);
 	__generic_unplug_device(q);
 	spin_unlock_irq(q->queue_lock);

-- Jens Axboe
Re: [OOPS] 2.6.21-rc6-git5 in cfq_dispatch_insert
Hi, On Tuesday 17 April 2007, Neil Brown wrote: > On Monday April 16, [EMAIL PROTECTED] wrote: > > > > cfq_dispatch_insert() was called with rq == 0. This one is getting really > > annoying... and md is involved again (RAID0 this time.) > > Yeah... weird. > RAID0 is so light-weight and so different from RAID1 or RAID5 that I > feel fairly safe concluding that the problem isn't in or near md. > But that doesn't help you. > > This really feels like a locking problem. > > The problem occurs when ->next_rq is NULL, but ->sort_list.rb_node is > not NULL. That happens plenty of times in the code (particularly as > the first request is inserted) but always under ->queue_lock so it > should never be visible to cfq_dispatch_insert.. > > Except that drivers/scsi/ide-scsi.c:idescsi_eh_reset calls > elv_next_request which could ultimately call __cfq_dispatch_requests > without taking ->queue_lock (that I can see). But you probably aren't > using ide-scsi (does anyone?). ide-scsi is holding ide_lock while calling elv_next_request() (for ide ide_lock == ->queue_lock) Also from the original report: On Sunday 15 April 2007, Brad Campbell wrote: > > The box is booted with PXE and runs an nfsroot. It's Debian 3.1. It has 2 SIL > 3112 controllers in it > with 4 WD 200GB ATA drives all on PATA->SATA bridges. and you can even see libata functions in the OOPS... Bart - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [OOPS] 2.6.21-rc6-git5 in cfq_dispatch_insert
Neil Brown wrote: On Monday April 16, [EMAIL PROTECTED] wrote: cfq_dispatch_insert() was called with rq == 0. This one is getting really annoying... and md is involved again (RAID0 this time.) Yeah... weird. RAID0 is so light-weight and so different from RAID1 or RAID5 that I feel fairly safe concluding that the problem isn't in or near md. Just a quick thought. Could this issue in any way cause silent data corruption on write to the md? I've been chasing a bug for months now on this box which resulted in a small bit of seemingly random corruption in large copies to /dev/md0. Problem is, I changed the PSU in the box (which appeared to help a little) but then about the same time I discovered the oops was in cfq so changed the scheduler also. I've not been able to oops or reproduce the data corruption since. I've now got to go back to cfq and see if I can get it to panic reproducibly again, but I'm wondering now if there may have been another issue there. Brad -- "Human beings, who are almost unique in having the ability to learn from the experience of others, are also remarkable for their apparent disinclination to do so." -- Douglas Adams - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [OOPS] 2.6.21-rc6-git5 in cfq_dispatch_insert
Neil Brown wrote: On Monday April 16, [EMAIL PROTECTED] wrote: cfq_dispatch_insert() was called with rq == 0. This one is getting really annoying... and md is involved again (RAID0 this time.) Yeah... weird. RAID0 is so light-weight and so different from RAID1 or RAID5 that I feel fairly safe concluding that the problem isn't in or near md. But that doesn't help you. It's raid-5 here and it was degraded. I've just added the extra disk now and I've been running with the deadline scheduler to work around the issue. I'll add Neil's patch and try some tests with cfq tonight if I get the chance. Brad -- "Human beings, who are almost unique in having the ability to learn from the experience of others, are also remarkable for their apparent disinclination to do so." -- Douglas Adams - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [OOPS] 2.6.21-rc6-git5 in cfq_dispatch_insert
On Monday April 16, [EMAIL PROTECTED] wrote:
>
> cfq_dispatch_insert() was called with rq == 0. This one is getting really
> annoying... and md is involved again (RAID0 this time.)

Yeah... weird. RAID0 is so light-weight and so different from RAID1 or RAID5 that I feel fairly safe concluding that the problem isn't in or near md. But that doesn't help you.

This really feels like a locking problem. The problem occurs when ->next_rq is NULL, but ->sort_list.rb_node is not NULL. That happens plenty of times in the code (particularly as the first request is inserted) but always under ->queue_lock, so it should never be visible to cfq_dispatch_insert...

Except that drivers/scsi/ide-scsi.c:idescsi_eh_reset calls elv_next_request, which could ultimately call __cfq_dispatch_requests without taking ->queue_lock (that I can see). But you probably aren't using ide-scsi (does anyone?).

Given that interrupts are always disabled when queue_lock is taken, it might be useful to add WARN_ON(!irqs_disabled()); every time ->next_rq is set. Something like the following. It might show something useful if we are lucky.

NeilBrown

diff .prev/block/cfq-iosched.c ./block/cfq-iosched.c
--- .prev/block/cfq-iosched.c	2007-04-17 15:01:36.0 +1000
+++ ./block/cfq-iosched.c	2007-04-17 15:02:25.0 +1000
@@ -628,6 +628,7 @@ static void cfq_remove_request(struct re
 {
 	struct cfq_queue *cfqq = RQ_CFQQ(rq);
 
+	BUG_ON(!irqs_disabled());
 	if (cfqq->next_rq == rq)
 		cfqq->next_rq = cfq_find_next_rq(cfqq->cfqd, cfqq, rq);
 
@@ -1810,6 +1811,7 @@ cfq_rq_enqueued(struct cfq_data *cfqd, s
 	/*
 	 * check if this request is a better next-serve candidate
 	 */
+	BUG_ON(!irqs_disabled());
 	cfqq->next_rq = cfq_choose_req(cfqd, cfqq->next_rq, rq);
 	BUG_ON(!cfqq->next_rq);
Re: [OOPS] 2.6.21-rc6-git5 in cfq_dispatch_insert
Brad Campbell wrote:
> Brad Campbell wrote:
>> G'day all,
>>
>> All I have is a digital photo of this oops. (It's 3.5mb). I have
>> serial console configured, but Murphy is watching me carefully and I
>> just can't seem to reproduce it while logging the console output.
>
> And as usual, after trying to capture one for 4 days, I get one 10 mins
> after I've sent the E-mail :)
>
> I think I've just found a way to make this easier to reproduce, as
> /dev/sdd was not even mounted this time. I just cold booted and started
> an md5sum -c run on a directory of about 180GB.
>
> [ 2566.192665] BUG: unable to handle kernel NULL pointer dereference at virtual address 005c
> [ 2566.218242] printing eip:
> [ 2566.226362] c0203169
> [ 2566.232906] *pde =
> [ 2566.241274] Oops: [#1]
> [ 2566.249637] Modules linked in:
> [ 2566.258832] CPU:0
> [ 2566.258833] EIP:0060:[<c0203169>]Not tainted VLI
> [ 2566.258834] EFLAGS: 00010082 (2.6.21-rc6-git5 #1)
> [ 2566.296146] EIP is at cfq_dispatch_insert+0x19/0x70
> [ 2566.310761] eax: f7a0eae0 ebx: f7a0cb28 ecx: e2f869e8 edx:
> [ 2566.331076] esi: f79fea7c edi: f7d04ac0 ebp: esp: f6945de0
> [ 2566.351388] ds: 007b es: 007b fs: 00d8 gs: 0033 ss: 0068
> [ 2566.368843] Process md5sum (pid: 2875, ti=f6944000 task=f68f4ad0 task.ti=f6944000)
> [ 2566.390975] Stack: f79fea7c f7d04ac0 c02032d9 f6ae5ef0 c0133411 1000
> [ 2566.416414] 0008 0004 0b582fd4 f79fea7c f7d04ac0 f79fea7c
> [ 2566.441870] c0203519 f7a0cb28 f7a0cb28 f79e 0282 c01fb7a9 c016ea4d
> [ 2566.467326] Call Trace:
> [ 2566.475236] [<c02032d9>] __cfq_dispatch_requests+0x79/0x170
> [ 2566.492224] [<c0133411>] do_generic_mapping_read+0x281/0x470
> [ 2566.509473] [<c0203519>] cfq_dispatch_requests+0x69/0x90
> [ 2566.525681] [<c01fb7a9>] elv_next_request+0x39/0x130
> [ 2566.540850] [<c016ea4d>] bio_endio+0x5d/0x90
> [ 2566.553942] [<c0268635>] scsi_request_fn+0x45/0x280
> [ 2566.568851] [<c01fdde2>] blk_run_queue+0x32/0x70
> [ 2566.582982] [<c0267ba0>] scsi_next_command+0x30/0x50
> [ 2566.598154] [<c0267c8b>] scsi_end_request+0x9b/0xc0
> [ 2566.613063] [<c0267df1>] scsi_io_completion+0x81/0x330
> [ 2566.628751] [<c0265dab>] scsi_delete_timer+0xb/0x20
> [ 2566.643661] [<c02770f5>] ata_scsi_qc_complete+0x65/0xd0
> [ 2566.659613] [<c026d47b>] sd_rw_intr+0x8b/0x220
> [ 2566.673222] [<c0278ecc>] ata_altstatus+0x1c/0x20
> [ 2566.687352] [<c027394d>] ata_hsm_move+0x14d/0x3f0
> [ 2566.701744] [<c0263f80>] scsi_finish_command+0x40/0x60
> [ 2566.717434] [<c026857f>] scsi_softirq_done+0x6f/0xe0
> [ 2566.732604] [<c027d521>] sil_interrupt+0x81/0x90
> [ 2566.746733] [<c01ff9b8>] blk_done_softirq+0x58/0x70
> [ 2566.761644] [<c011721f>] __do_softirq+0x6f/0x80
> [ 2566.775516] [<c0117257>] do_softirq+0x27/0x30
> [ 2566.788866] [<c0104cee>] do_IRQ+0x3e/0x80
> [ 2566.801177] [<c010322f>] common_interrupt+0x23/0x28
> [ 2566.816090] ===
> [ 2566.826793] Code: 3e 05 f0 ff e9 47 ff ff ff 89 f6 8d bc 27 00 00 00 00 83 ec 10 89 1c 24 89 6c 24 0c 89 74 24 04 89 7c 24 08 89 c3 89 d5 8b 40 0c <8b> 72 5c 8b 78 04 89 d0 e8 4a fa ff ff 8b 45 14 89 ea 25 01 80
> [ 2566.886586] EIP: [<c0203169>] cfq_dispatch_insert+0x19/0x70 SS:ESP 0068:f6945de0
> [ 2566.909179] Kernel panic - not syncing: Fatal exception in interrupt

cfq_dispatch_insert() was called with rq == 0. This one is getting really annoying... and md is involved again (RAID0 this time.)
Re: [OOPS] 2.6.21-rc6-git5 in cfq_dispatch_insert
Adrian Bunk wrote:
> [ Cc's added, additional information is in http://lkml.org/lkml/2007/4/15/32 ]
>
> On Sun, Apr 15, 2007 at 02:49:29PM +0400, Brad Campbell wrote:
>> Brad Campbell wrote:
>>> G'day all,
>>>
>>> All I have is a digital photo of this oops. (It's 3.5mb). I have
>>> serial console configured, but Murphy is watching me carefully and I
>>> just can't seem to reproduce it while logging the console output.
>>
>> And as usual, after trying to capture one for 4 days, I get one 10 mins
>> after I've sent the E-mail :)
>>
>> I think I've just found a way to make this easier to reproduce, as
>> /dev/sdd was not even mounted this time. I just cold booted and started
>> an md5sum -c run on a directory of about 180GB.
>
> Is this also present with kernel 2.6.20 or is it a regression?

Yes, I reproduced it with 2.6.20 but appear to be unable to do so with 2.6.19.
Re: [OOPS] 2.6.21-rc6-git5 in cfq_dispatch_insert
[ Cc's added, additional information is in http://lkml.org/lkml/2007/4/15/32 ]

On Sun, Apr 15, 2007 at 02:49:29PM +0400, Brad Campbell wrote:
> Brad Campbell wrote:
>> G'day all,
>>
>> All I have is a digital photo of this oops. (It's 3.5mb). I have
>> serial console configured, but Murphy is watching me carefully and I
>> just can't seem to reproduce it while logging the console output.
>
> And as usual, after trying to capture one for 4 days, I get one 10 mins
> after I've sent the E-mail :)
>
> I think I've just found a way to make this easier to reproduce, as
> /dev/sdd was not even mounted this time. I just cold booted and started
> an md5sum -c run on a directory of about 180GB.

Is this also present with kernel 2.6.20 or is it a regression?

> [ 2566.192665] BUG: unable to handle kernel NULL pointer dereference at virtual address 005c
> [ 2566.218242] printing eip:
> [ 2566.226362] c0203169
> [ 2566.232906] *pde =
> [ 2566.241274] Oops: [#1]
> [ 2566.249637] Modules linked in:
> [ 2566.258832] CPU:0
> [ 2566.258833] EIP:0060:[<c0203169>]Not tainted VLI
> [ 2566.258834] EFLAGS: 00010082 (2.6.21-rc6-git5 #1)
> [ 2566.296146] EIP is at cfq_dispatch_insert+0x19/0x70
> [ 2566.310761] eax: f7a0eae0 ebx: f7a0cb28 ecx: e2f869e8 edx:
> [ 2566.331076] esi: f79fea7c edi: f7d04ac0 ebp: esp: f6945de0
> [ 2566.351388] ds: 007b es: 007b fs: 00d8 gs: 0033 ss: 0068
> [ 2566.368843] Process md5sum (pid: 2875, ti=f6944000 task=f68f4ad0 task.ti=f6944000)
> [ 2566.390975] Stack: f79fea7c f7d04ac0 c02032d9 f6ae5ef0 c0133411 1000
> [ 2566.416414] 0008 0004 0b582fd4 f79fea7c f7d04ac0 f79fea7c
> [ 2566.441870] c0203519 f7a0cb28 f7a0cb28 f79e 0282 c01fb7a9 c016ea4d
> [ 2566.467326] Call Trace:
> [ 2566.475236] [<c02032d9>] __cfq_dispatch_requests+0x79/0x170
> [ 2566.492224] [<c0133411>] do_generic_mapping_read+0x281/0x470
> [ 2566.509473] [<c0203519>] cfq_dispatch_requests+0x69/0x90
> [ 2566.525681] [<c01fb7a9>] elv_next_request+0x39/0x130
> [ 2566.540850] [<c016ea4d>] bio_endio+0x5d/0x90
> [ 2566.553942] [<c0268635>] scsi_request_fn+0x45/0x280
> [ 2566.568851] [<c01fdde2>] blk_run_queue+0x32/0x70
> [ 2566.582982] [<c0267ba0>] scsi_next_command+0x30/0x50
> [ 2566.598154] [<c0267c8b>] scsi_end_request+0x9b/0xc0
> [ 2566.613063] [<c0267df1>] scsi_io_completion+0x81/0x330
> [ 2566.628751] [<c0265dab>] scsi_delete_timer+0xb/0x20
> [ 2566.643661] [<c02770f5>] ata_scsi_qc_complete+0x65/0xd0
> [ 2566.659613] [<c026d47b>] sd_rw_intr+0x8b/0x220
> [ 2566.673222] [<c0278ecc>] ata_altstatus+0x1c/0x20
> [ 2566.687352] [<c027394d>] ata_hsm_move+0x14d/0x3f0
> [ 2566.701744] [<c0263f80>] scsi_finish_command+0x40/0x60
> [ 2566.717434] [<c026857f>] scsi_softirq_done+0x6f/0xe0
> [ 2566.732604] [<c027d521>] sil_interrupt+0x81/0x90
> [ 2566.746733] [<c01ff9b8>] blk_done_softirq+0x58/0x70
> [ 2566.761644] [<c011721f>] __do_softirq+0x6f/0x80
> [ 2566.775516] [<c0117257>] do_softirq+0x27/0x30
> [ 2566.788866] [<c0104cee>] do_IRQ+0x3e/0x80
> [ 2566.801177] [<c010322f>] common_interrupt+0x23/0x28
> [ 2566.816090] ===
> [ 2566.826793] Code: 3e 05 f0 ff e9 47 ff ff ff 89 f6 8d bc 27 00 00 00 00 83 ec 10 89 1c 24 89 6c 24 0c 89 74 24 04 89 7c 24 08 89 c3 89 d5 8b 40 0c <8b> 72 5c 8b 78 04 89 d0 e8 4a fa ff ff 8b 45 14 89 ea 25 01 80
> [ 2566.886586] EIP: [<c0203169>] cfq_dispatch_insert+0x19/0x70 SS:ESP 0068:f6945de0
> [ 2566.909179] Kernel panic - not syncing: Fatal exception in interrupt

cu
Adrian

--
"Is there not promise of rain?" Ling Tan asked suddenly out of the darkness. There had been need of rain for many days. "Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed
Re: [OOPS] 2.6.21-rc6-git5 in cfq_dispatch_insert
Brad Campbell wrote:
> G'day all,
>
> All I have is a digital photo of this oops. (It's 3.5mb). I have serial
> console configured, but Murphy is watching me carefully and I just
> can't seem to reproduce it while logging the console output.

And as usual, after trying to capture one for 4 days, I get one 10 mins after I've sent the E-mail :)

I think I've just found a way to make this easier to reproduce, as /dev/sdd was not even mounted this time. I just cold booted and started an md5sum -c run on a directory of about 180GB.

[ 2566.192665] BUG: unable to handle kernel NULL pointer dereference at virtual address 005c
[ 2566.218242] printing eip:
[ 2566.226362] c0203169
[ 2566.232906] *pde =
[ 2566.241274] Oops: [#1]
[ 2566.249637] Modules linked in:
[ 2566.258832] CPU:0
[ 2566.258833] EIP:0060:[<c0203169>]Not tainted VLI
[ 2566.258834] EFLAGS: 00010082 (2.6.21-rc6-git5 #1)
[ 2566.296146] EIP is at cfq_dispatch_insert+0x19/0x70
[ 2566.310761] eax: f7a0eae0 ebx: f7a0cb28 ecx: e2f869e8 edx:
[ 2566.331076] esi: f79fea7c edi: f7d04ac0 ebp: esp: f6945de0
[ 2566.351388] ds: 007b es: 007b fs: 00d8 gs: 0033 ss: 0068
[ 2566.368843] Process md5sum (pid: 2875, ti=f6944000 task=f68f4ad0 task.ti=f6944000)
[ 2566.390975] Stack: f79fea7c f7d04ac0 c02032d9 f6ae5ef0 c0133411 1000
[ 2566.416414] 0008 0004 0b582fd4 f79fea7c f7d04ac0 f79fea7c
[ 2566.441870] c0203519 f7a0cb28 f7a0cb28 f79e 0282 c01fb7a9 c016ea4d
[ 2566.467326] Call Trace:
[ 2566.475236] [<c02032d9>] __cfq_dispatch_requests+0x79/0x170
[ 2566.492224] [<c0133411>] do_generic_mapping_read+0x281/0x470
[ 2566.509473] [<c0203519>] cfq_dispatch_requests+0x69/0x90
[ 2566.525681] [<c01fb7a9>] elv_next_request+0x39/0x130
[ 2566.540850] [<c016ea4d>] bio_endio+0x5d/0x90
[ 2566.553942] [<c0268635>] scsi_request_fn+0x45/0x280
[ 2566.568851] [<c01fdde2>] blk_run_queue+0x32/0x70
[ 2566.582982] [<c0267ba0>] scsi_next_command+0x30/0x50
[ 2566.598154] [<c0267c8b>] scsi_end_request+0x9b/0xc0
[ 2566.613063] [<c0267df1>] scsi_io_completion+0x81/0x330
[ 2566.628751] [<c0265dab>] scsi_delete_timer+0xb/0x20
[ 2566.643661] [<c02770f5>] ata_scsi_qc_complete+0x65/0xd0
[ 2566.659613] [<c026d47b>] sd_rw_intr+0x8b/0x220
[ 2566.673222] [<c0278ecc>] ata_altstatus+0x1c/0x20
[ 2566.687352] [<c027394d>] ata_hsm_move+0x14d/0x3f0
[ 2566.701744] [<c0263f80>] scsi_finish_command+0x40/0x60
[ 2566.717434] [<c026857f>] scsi_softirq_done+0x6f/0xe0
[ 2566.732604] [<c027d521>] sil_interrupt+0x81/0x90
[ 2566.746733] [<c01ff9b8>] blk_done_softirq+0x58/0x70
[ 2566.761644] [<c011721f>] __do_softirq+0x6f/0x80
[ 2566.775516] [<c0117257>] do_softirq+0x27/0x30
[ 2566.788866] [<c0104cee>] do_IRQ+0x3e/0x80
[ 2566.801177] [<c010322f>] common_interrupt+0x23/0x28
[ 2566.816090] ===
[ 2566.826793] Code: 3e 05 f0 ff e9 47 ff ff ff 89 f6 8d bc 27 00 00 00 00 83 ec 10 89 1c 24 89 6c 24 0c 89 74 24 04 89 7c 24 08 89 c3 89 d5 8b 40 0c <8b> 72 5c 8b 78 04 89 d0 e8 4a fa ff ff 8b 45 14 89 ea 25 01 80
[ 2566.886586] EIP: [<c0203169>] cfq_dispatch_insert+0x19/0x70 SS:ESP 0068:f6945de0
[ 2566.909179] Kernel panic - not syncing: Fatal exception in interrupt