Re: [OOPS] 2.6.21-rc6-git5 in cfq_dispatch_insert
Hi Jens!

On 25 Apr 2007, at 12:18, Jens Axboe wrote:
> On Wed, Apr 25 2007, Brad Campbell wrote:
> > Jens Axboe wrote:
> > > It looks to be extremely rare. Aliases are extremely rare, front merges
> > > are rare. And you need both to happen with the details you outlined. But
> > > it's a large user base, and we've had 3-4 reports on this in the past
> > > months. So it obviously does happen. I could not make it trigger without
> > > doctoring the unplug code when I used aio.
> >
> > Well, not that rare on this particular machine (I had a case where I could
> > reproduce it in less than an hour of normal use previously on this box),
> > and I've had it occur a number of times on my servers; I just never
> > reported it before as I never took the time to set up a serial console
> > and capture the oops.
>
> Extremely rare in the sense that it takes md and certain conditions for it
> to trigger. So for most people it'll be extremely rare, and for others
> (such as yourself) that hit it, it won't be so rare :-)
>
> > > Here's a fix for it, confirmed.
> >
> > Shall I leave the other debugging in, apply this and run it for a few
> > hard days?
>
> Yes, that would be perfect!

Okay, I left all the debugging patches in, disabled all the kernel-debugging .config options and gave it a spin with our usual "killer" workload (as far as batch systems are repeatable anyway). So far there has not been a single glitch or message, so I preliminarily conclude that the bug is squashed. The final word will come once my 1800 batch jobs are processed and I have my particle-physics result ;-)

Ciao, Roland
--
TU Muenchen, Physik-Department E18, James-Franck-Str., 85748 Garching
Telefon 089/289-12575; Telefax 089/289-12570
--
CERN office: 892-1-D23 phone: +41 22 7676540 mobile: +41 76 487 4482
--
Any society that would give up a little liberty to gain a little security will deserve neither and lose both.
- Benjamin Franklin
Degraded RAID performance - Was : Re: [OOPS] 2.6.21-rc6-git5 in cfq_dispatch_insert
Neil Brown wrote:
> I wonder if we should avoid bypassing the stripe cache if the needed
> stripes are already in the cache... or if at least one needed stripe is...
> or if the array is degraded... Probably in the degraded case we should
> never bypass the cache, as if we do, then a sequential read of a full
> stripe will read every block twice. I'd better do some performance
> measurements.

Ok, that would explain some odd performance issues I've noticed. Let's say I run dstat -D sda,sdb,sdc,sdd,md0 5

total-cpu-usage ---disk/sda----disk/sdb----disk/sdc----disk/sdd----disk/md0-- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq|_read write _read write _read write _read write _read write|_recv _send|__in_ _out_|_int_ _csw_
 25  22   0  47   0   6|20.1M    0 :20.2M    0 :20.1M    0 :    0    0 :40.2M    0 | 146B  662B|    0     0 |1186   661
 26  20   0  46   0   8|19.4M    0 :19.4M    0 :19.4M    0 :    0    0 :38.9M    0 | 160B  549B|    0     0 |1365   650

Given I'm doing a read, I would have expected a read to consist of 2 direct reads, one parity read and some calculation. The numbers I'm seeing, however, show 3 reads' worth of disk traffic for 2 reads' worth of bandwidth.

[EMAIL PROTECTED]:~# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sda[0] sdc[2] sdb[1]
      585934080 blocks level 5, 64k chunk, algorithm 2 [4/3] [UUU_]

(Dropped Jens and Chuck from the cc as this likely has little interest for them)

Brad
--
"Human beings, who are almost unique in having the ability to learn from the experience of others, are also remarkable for their apparent disinclination to do so." -- Douglas Adams
- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [OOPS] 2.6.21-rc6-git5 in cfq_dispatch_insert
On Wednesday April 25, [EMAIL PROTECTED] wrote:
>
> [  756.311074] BUG: at block/cfq-iosched.c:543 cfq_reposition_rq_rb()
> [  756.329615] [] cfq_merged_request+0x71/0x80
> [  756.345046] [] cfq_merged_request+0x0/0x80
> [  756.360216] [] elv_merged_request+0x4e/0x50
> [  756.375647] [] __make_request+0x1a7/0x2f0
> [  756.390557] [] generic_make_request+0x127/0x190
> [  756.407025] [] chunk_aligned_read+0x111/0x1c0
>                   ^
> [  756.422974] [] generic_make_request+0x127/0x190
> [  756.439443] [] make_request+0x388/0x3b0
> [  756.453834] [] __bio_add_page+0x147/0x1c0
> [  756.468742] [] bio_alloc_bioset+0x7f/0x150
> [  756.483913] [] raid5_mergeable_bvec+0x0/0x90
> [  756.499604] [] bio_add_page+0x37/0x50
> [  756.513473] [] generic_make_request+0x127/0x190
> [  756.529943] [] mempool_free+0x2a/0x60
> [  756.543812] [] bio_free+0x1d/0x40
> [  756.556645] [] submit_bio+0x46/0xc0
> [  756.571389] [] radix_tree_node_alloc+0x16/0x60
> [  756.587610] [] radix_tree_insert+0xe2/0x130
> [  756.603039] [] __pagevec_lru_add+0x75/0x80
> [  756.618207] [] mpage_bio_submit+0x11/0x20
> [  756.633117] [] mpage_readpages+0x100/0x140
>                   ^^^

So it is a regular data read (not metadata of any sort) which is bypassing the stripe cache. I was expecting some sort of metadata read.

I wonder what this is merging in front of... maybe a stripe-cache read. I guess it doesn't tell us anything really interesting, which is probably a good thing - it means everything else is working properly.

I wonder if we should avoid bypassing the stripe cache if the needed stripes are already in the cache... or if at least one needed stripe is... or if the array is degraded... Probably in the degraded case we should never bypass the cache, as if we do, then a sequential read of a full stripe will read every block twice. I'd better do some performance measurements.

Thanks for all the patient testing.
NeilBrown
Re: [OOPS] 2.6.21-rc6-git5 in cfq_dispatch_insert
On Wed, Apr 25 2007, Neil Brown wrote:
> On Wednesday April 25, [EMAIL PROTECTED] wrote:
> >
> > Here's a fix for it, confirmed.
>
> Patch looks good to me.

Good! And thanks for taking the time to look at this bug, btw; it's had me puzzled a bit this week. I should not have ruled out the aliases, I guess; they must be coming from somewhere other than O_DIRECT.

> Hopefully Brad can still wait for the WARN_ON to fire - it might give
> useful clues to why this is happening.  It might be interesting.

But he did, and reported it about an hour ago, so all should be well.

--
Jens Axboe
Re: [OOPS] 2.6.21-rc6-git5 in cfq_dispatch_insert
On Wednesday April 25, [EMAIL PROTECTED] wrote:
>
> Here's a fix for it, confirmed.

Patch looks good to me. Hopefully Brad can still wait for the WARN_ON to fire - it might give useful clues to why this is happening. It might be interesting.

Thanks,
NeilBrown
Re: [OOPS] 2.6.21-rc6-git5 in cfq_dispatch_insert
On Wed, Apr 25 2007, Brad Campbell wrote:
> Jens Axboe wrote:
>
> > It looks to be extremely rare. Aliases are extremely rare, front merges
> > are rare. And you need both to happen with the details you outlined. But
> > it's a large user base, and we've had 3-4 reports on this in the past
> > months. So it obviously does happen. I could not make it trigger without
> > doctoring the unplug code when I used aio.
>
> Well, not that rare on this particular machine (I had a case where I could
> reproduce it in less than an hour of normal use previously on this box),
> and I've had it occur a number of times on my servers, I just never
> reported it before as I never took the time to set up a serial console and
> capture the oops.

Extremely rare in the sense that it takes md and certain conditions for it to trigger. So for most people it'll be extremely rare, and for others (such as yourself) that hit it, it won't be so rare :-)

> > Here's a fix for it, confirmed.
>
> Shall I leave the other debugging in, apply this and run it for a few hard
> days?

Yes, that would be perfect!

--
Jens Axboe
Re: [OOPS] 2.6.21-rc6-git5 in cfq_dispatch_insert
Neil Brown wrote:
> You could test this theory by putting a
>    WARN_ON(cfqq->next_rq == NULL);
> at the end of cfq_reposition_rq_rb, just after the cfq_add_rq_rb call.

[  756.311074] BUG: at block/cfq-iosched.c:543 cfq_reposition_rq_rb()
[  756.329615] [] cfq_merged_request+0x71/0x80
[  756.345046] [] cfq_merged_request+0x0/0x80
[  756.360216] [] elv_merged_request+0x4e/0x50
[  756.375647] [] __make_request+0x1a7/0x2f0
[  756.390557] [] generic_make_request+0x127/0x190
[  756.407025] [] chunk_aligned_read+0x111/0x1c0
[  756.422974] [] generic_make_request+0x127/0x190
[  756.439443] [] make_request+0x388/0x3b0
[  756.453834] [] __bio_add_page+0x147/0x1c0
[  756.468742] [] bio_alloc_bioset+0x7f/0x150
[  756.483913] [] raid5_mergeable_bvec+0x0/0x90
[  756.499604] [] bio_add_page+0x37/0x50
[  756.513473] [] generic_make_request+0x127/0x190
[  756.529943] [] mempool_free+0x2a/0x60
[  756.543812] [] bio_free+0x1d/0x40
[  756.556645] [] submit_bio+0x46/0xc0
[  756.571389] [] radix_tree_node_alloc+0x16/0x60
[  756.587610] [] radix_tree_insert+0xe2/0x130
[  756.603039] [] __pagevec_lru_add+0x75/0x80
[  756.618207] [] mpage_bio_submit+0x11/0x20
[  756.633117] [] mpage_readpages+0x100/0x140
[  756.648287] [] ext3_get_block+0x0/0xe0
[  756.662419] [] scsi_end_request+0x9b/0xc0
[  756.677329] [] scsi_delete_timer+0xb/0x20
[  756.692240] [] ext3_readpages+0x0/0x20
[  756.706369] [] read_pages+0xd1/0xe0
[  756.719719] [] ext3_get_block+0x0/0xe0
[  756.733853] [] __alloc_pages+0x54/0x2d0
[  756.748242] [] scsi_softirq_done+0x6f/0xe0
[  756.763413] [] sil_interrupt+0x81/0x90
[  756.777544] [] blk_done_softirq+0x58/0x70
[  756.792454] [] __do_page_cache_readahead+0xa8/0x110
[  756.809961] [] blockable_page_cache_readahead+0x51/0xd0
[  756.828508] [] make_ahead_window+0x74/0xc0
[  756.843679] [] page_cache_readahead+0x9a/0x180
[  756.859886] [] file_read_actor+0xc4/0xd0
[  756.874537] [] do_generic_mapping_read+0x467/0x470
[  756.891786] [] generic_file_aio_read+0x183/0x1c0
[  756.908514] [] file_read_actor+0x0/0xd0
[  756.922903] [] scsi_next_command+0x30/0x50
[  756.938073] [] do_sync_read+0xc0/0x100
[  756.952204] [] autoremove_wake_function+0x0/0x50
[  756.968934] [] scsi_finish_command+0x40/0x60
[  756.984622] [] schedule+0x2e5/0x590
[  756.997972] [] vfs_read+0x86/0x110
[  757.011065] [] sys_read+0x47/0x80
[  757.023897] [] syscall_call+0x7/0xb
[  757.037250] ===

I know this is late given a fix has been issued, but I just got this in the logs with no other debug info at all. This kernel has *all* of Jens' debugging patches plus the snippet from Neil above. I'll apply the fix now and run it for a couple of days.

Brad
--
"Human beings, who are almost unique in having the ability to learn from the experience of others, are also remarkable for their apparent disinclination to do so." -- Douglas Adams
Re: [OOPS] 2.6.21-rc6-git5 in cfq_dispatch_insert
Jens Axboe wrote:
> It looks to be extremely rare. Aliases are extremely rare, front merges
> are rare. And you need both to happen with the details you outlined. But
> it's a large user base, and we've had 3-4 reports on this in the past
> months. So it obviously does happen. I could not make it trigger without
> doctoring the unplug code when I used aio.

Well, not that rare on this particular machine (I had a case where I could reproduce it in less than an hour of normal use previously on this box), and I've had it occur a number of times on my servers; I just never reported it before as I never took the time to set up a serial console and capture the oops.

> Here's a fix for it, confirmed.

Shall I leave the other debugging in, apply this and run it for a few hard days?

Brad
--
"Human beings, who are almost unique in having the ability to learn from the experience of others, are also remarkable for their apparent disinclination to do so." -- Douglas Adams
Re: [OOPS] 2.6.21-rc6-git5 in cfq_dispatch_insert
Neil Brown wrote:
> On Wednesday April 25, [EMAIL PROTECTED] wrote:
> > BUT... That may explain why we are only seeing it on md. Would md
> > ever be issuing such requests that trigger this condition?
>
> Can someone remind me which raid level(s) was/were involved?

RAID-5, degraded, here. But I've had it on my server on a fully functioning RAID-5 also. I don't think I've seen it on any of my RAID-6 arrays yet.

> I guess it could happen with raid4/5/6. A read request that was
> properly aligned (and we do encourage proper alignment) will be passed
> directly to the underlying device. A concurrent write elsewhere could
> require the same block to be read into the stripe-cache to enable a
> parity calculation. So you could get two reads at the same block
> address.

Except all the filesystems I've had this occur on are ext3, and all are mounted noatime with no processes ever writing to the filesystems. So here it occurs with reads only.

Brad
--
"Human beings, who are almost unique in having the ability to learn from the experience of others, are also remarkable for their apparent disinclination to do so." -- Douglas Adams
Re: [OOPS] 2.6.21-rc6-git5 in cfq_dispatch_insert
On Wed, Apr 25 2007, Neil Brown wrote:
> On Wednesday April 25, [EMAIL PROTECTED] wrote:
> >
> > That's pretty close to where I think the problem is (the front merging
> > and cfq_reposition_rq_rb()). The issue with that is that you'd only get
> > aliases for O_DIRECT and/or raw IO, and that doesn't seem to be the case
> > here. Given that front merges are equally not very likely, I'd be
> > surprised if something like that has ever happened.
>
> Well it certainly doesn't happen very often.
> And I can imagine a filesystem genuinely wanting to read the same
> block twice - maybe a block that contained packed tails of two
> different files.

But that usually happens through the page cache, so it's nicely serialized. You'd only ever get duplicate/alias blocks if you go raw, or have some layer on top of your block device (like md) that issues IO on its own.

> > BUT... That may explain why we are only seeing it on md. Would md
> > ever be issuing such requests that trigger this condition?
>
> Can someone remind me which raid level(s) was/were involved?
>
> I think one was raid0 - that just passes requests on from the
> filesystem, so md would only issue requests like that if the
> filesystem did.

Yep, one was raid0.

> I guess it could happen with raid4/5/6. A read request that was
> properly aligned (and we do encourage proper alignment) will be passed
> directly to the underlying device. A concurrent write elsewhere could
> require the same block to be read into the stripe-cache to enable a
> parity calculation. So you could get two reads at the same block
> address.
> Getting a front-merge would probably require two stripe-heads to be
> processed in reverse-sector order, and it tends to preserve the order
> of incoming requests (though that isn't firmly enforced).
>
> raid1 is much like raid0 (with totally different code) in that the
> request pattern seen by the underlying device matches the request
> pattern generated by the filesystem.
> If I read the debugging output correctly, the request which I
> hypothesise was the subject of a front-merge is a 'sync' request.
> raid5 does not generate sync requests to fill the stripe cache (maybe
> it should?) so I really think it must have come directly from the
> filesystem.
>
> (just checked previous email for more detail of when it hits)
> The fact that it hits degraded arrays more easily is interesting.
> Maybe we try to read a block on the missing device and so schedule
> reads for the other devices. Then we try to read a block on a good
> device and issue a request for exactly the same block that raid5 asked
> for. That still doesn't explain the 'sync' and the 'front merge'.
> But that is quite possible, just not common maybe.
>
> It doesn't help us understand the raid0 example though. Maybe it is
> just a 'can happen, but only rarely' thing.

It looks to be extremely rare. Aliases are extremely rare, front merges are rare. And you need both to happen with the details you outlined. But it's a large user base, and we've had 3-4 reports on this in the past months. So it obviously does happen. I could not make it trigger without doctoring the unplug code when I used aio.

Here's a fix for it, confirmed.
diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index 9e37971..f965be7 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -532,6 +532,11 @@ static void cfq_add_rq_rb(struct request *rq)
 
 	if (!cfq_cfqq_on_rr(cfqq))
 		cfq_add_cfqq_rr(cfqd, cfqq);
+
+	/*
+	 * check if this request is a better next-serve candidate
+	 */
+	cfqq->next_rq = cfq_choose_req(cfqd, cfqq->next_rq, rq);
 }
 
 static inline void
@@ -1639,12 +1644,6 @@ cfq_rq_enqueued(struct cfq_data *cfqd, struct cfq_queue *cfqq,
 		cfqq->meta_pending++;
 
 	/*
-	 * check if this request is a better next-serve candidate
-	 */
-	cfqq->next_rq = cfq_choose_req(cfqd, cfqq->next_rq, rq);
-	BUG_ON(!cfqq->next_rq);
-
-	/*
 	 * we never wait for an async request and we don't allow preemption
 	 * of an async request. so just return early
 	 */

--
Jens Axboe
Re: [OOPS] 2.6.21-rc6-git5 in cfq_dispatch_insert
On Wednesday April 25, [EMAIL PROTECTED] wrote:
>
> That's pretty close to where I think the problem is (the front merging
> and cfq_reposition_rq_rb()). The issue with that is that you'd only get
> aliases for O_DIRECT and/or raw IO, and that doesn't seem to be the case
> here. Given that front merges are equally not very likely, I'd be
> surprised if something like that has ever happened.

Well, it certainly doesn't happen very often. And I can imagine a filesystem genuinely wanting to read the same block twice - maybe a block that contained packed tails of two different files.

> BUT... That may explain why we are only seeing it on md. Would md
> ever be issuing such requests that trigger this condition?

Can someone remind me which raid level(s) was/were involved?

I think one was raid0 - that just passes requests on from the filesystem, so md would only issue requests like that if the filesystem did.

I guess it could happen with raid4/5/6. A read request that was properly aligned (and we do encourage proper alignment) will be passed directly to the underlying device. A concurrent write elsewhere could require the same block to be read into the stripe-cache to enable a parity calculation. So you could get two reads at the same block address.

Getting a front-merge would probably require two stripe-heads to be processed in reverse-sector order, and it tends to preserve the order of incoming requests (though that isn't firmly enforced).

raid1 is much like raid0 (with totally different code) in that the request pattern seen by the underlying device matches the request pattern generated by the filesystem.

If I read the debugging output correctly, the request which I hypothesise was the subject of a front-merge is a 'sync' request. raid5 does not generate sync requests to fill the stripe cache (maybe it should?) so I really think it must have come directly from the filesystem.

(just checked previous email for more detail of when it hits)

The fact that it hits degraded arrays more easily is interesting. Maybe we try to read a block on the missing device and so schedule reads for the other devices. Then we try to read a block on a good device and issue a request for exactly the same block that raid5 asked for. That still doesn't explain the 'sync' and the 'front merge'. But that is quite possible, just not common maybe.

It doesn't help us understand the raid0 example though. Maybe it is just a 'can happen, but only rarely' thing.

NeilBrown
Re: [OOPS] 2.6.21-rc6-git5 in cfq_dispatch_insert
On Wed, Apr 25 2007, Jens Axboe wrote:
> On Wed, Apr 25 2007, Neil Brown wrote:
> > On Tuesday April 24, [EMAIL PROTECTED] wrote:
> > > [105449.653682] cfq: rbroot not empty, but ->next_rq == NULL! Fixing up,
> > > report the issue to [EMAIL PROTECTED]
> > > [105449.683646] cfq: busy=1,drv=0,timer=0
> > > [105449.694871] cfq rr_list:
> > > [105449.702715]   3108: sort=0,next=,q=0/1,a=1/0,d=0/0,f=69
> > > [105449.720693] cfq busy_list:
> > > [105449.729054] cfq idle_list:
> > > [105449.737418] cfq cur_rr:
> >
> > Ok, I have a theory.
> >
> > An ELEVATOR_FRONT_MERGE occurs which changes req->sector and calls
> > ->elevator_merged_fn which is cfq_merged_request.
> >
> > At this time there is already a request in cfq with the same sector
> > number, and that request is the only other request on the queue.
> >
> > cfq_merged_request calls cfq_reposition_rq_rb which removes the
> > req from ->sortlist and then calls cfq_add_rq_rb to add it back (at
> > the new location because ->sector has changed).
> >
> > cfq_add_rq_rb finds there is already a request with the same sector
> > number and so elv_rb_add returns an __alias which is passed to
> > cfq_dispatch_insert.
> > This calls cfq_remove_request and as that is the only request present,
> > ->next_rq gets set to NULL.
> > The old request with the new sector number is then added to the
> > ->sortlist, but ->next_rq is never set - it remains NULL.
> >
> > How likely it would be to get two requests with the same sector number
> > I don't know. I wouldn't expect it to ever happen - I have seen it
> > before, but it was due to a bug in ext3. Maybe XFS does it
> > intentionally some times?
> >
> > You could test this theory by putting a
> >    WARN_ON(cfqq->next_rq == NULL);
> > at the end of cfq_reposition_rq_rb, just after the cfq_add_rq_rb call.
> >
> > I will leave the development of a suitable fix up to Jens if he agrees
> > that this is possible.
>
> That's pretty close to where I think the problem is (the front merging
> and cfq_reposition_rq_rb()). The issue with that is that you'd only get
> aliases for O_DIRECT and/or raw IO, and that doesn't seem to be the case
> here. Given that front merges are equally not very likely, I'd be
> surprised if something like that has ever happened.
>
> BUT... That may explain why we are only seeing it on md. Would md
> ever be issuing such requests that trigger this condition?
>
> I'll try and concoct a test case.

I made a test case and reproduced it; this is indeed what is happening. md must be issuing direct requests in such a manner that you do get a front merge and then an alias to an existing request. Will test a fix!

--
Jens Axboe
Re: [OOPS] 2.6.21-rc6-git5 in cfq_dispatch_insert
Neil Brown wrote:
> How likely it would be to get two requests with the same sector number
> I don't know. I wouldn't expect it to ever happen - I have seen it
> before, but it was due to a bug in ext3. Maybe XFS does it
> intentionally some times?

It certainly sounds like an odd thing to occur. Even stranger that it's easier to hit on a degraded array or an array being checked. I *am* using ext3 on this box (and on all my boxes, in fact).

> You could test this theory by putting a
>    WARN_ON(cfqq->next_rq == NULL);
> at the end of cfq_reposition_rq_rb, just after the cfq_add_rq_rb call.

I've done that... now to wait for it to hit again.

Brad
--
"Human beings, who are almost unique in having the ability to learn from the experience of others, are also remarkable for their apparent disinclination to do so." -- Douglas Adams
Re: [OOPS] 2.6.21-rc6-git5 in cfq_dispatch_insert
On Wed, Apr 25 2007, Neil Brown wrote:
> On Tuesday April 24, [EMAIL PROTECTED] wrote:
> > [105449.653682] cfq: rbroot not empty, but ->next_rq == NULL! Fixing up,
> > report the issue to [EMAIL PROTECTED]
> > [105449.683646] cfq: busy=1,drv=0,timer=0
> > [105449.694871] cfq rr_list:
> > [105449.702715]   3108: sort=0,next=,q=0/1,a=1/0,d=0/0,f=69
> > [105449.720693] cfq busy_list:
> > [105449.729054] cfq idle_list:
> > [105449.737418] cfq cur_rr:
>
> Ok, I have a theory.
>
> An ELEVATOR_FRONT_MERGE occurs which changes req->sector and calls
> ->elevator_merged_fn which is cfq_merged_request.
>
> At this time there is already a request in cfq with the same sector
> number, and that request is the only other request on the queue.
>
> cfq_merged_request calls cfq_reposition_rq_rb which removes the
> req from ->sortlist and then calls cfq_add_rq_rb to add it back (at
> the new location because ->sector has changed).
>
> cfq_add_rq_rb finds there is already a request with the same sector
> number and so elv_rb_add returns an __alias which is passed to
> cfq_dispatch_insert.
> This calls cfq_remove_request and as that is the only request present,
> ->next_rq gets set to NULL.
> The old request with the new sector number is then added to the
> ->sortlist, but ->next_rq is never set - it remains NULL.
>
> How likely it would be to get two requests with the same sector number
> I don't know. I wouldn't expect it to ever happen - I have seen it
> before, but it was due to a bug in ext3. Maybe XFS does it
> intentionally some times?
>
> You could test this theory by putting a
>    WARN_ON(cfqq->next_rq == NULL);
> at the end of cfq_reposition_rq_rb, just after the cfq_add_rq_rb call.
>
> I will leave the development of a suitable fix up to Jens if he agrees
> that this is possible.

That's pretty close to where I think the problem is (the front merging and cfq_reposition_rq_rb()). The issue with that is that you'd only get aliases for O_DIRECT and/or raw IO, and that doesn't seem to be the case here. Given that front merges are equally not very likely, I'd be surprised if something like that has ever happened.

BUT... That may explain why we are only seeing it on md. Would md ever be issuing such requests that trigger this condition?

I'll try and concoct a test case.

--
Jens Axboe
Re: [OOPS] 2.6.21-rc6-git5 in cfq_dispatch_insert
On Tuesday April 24, [EMAIL PROTECTED] wrote:
> [105449.653682] cfq: rbroot not empty, but ->next_rq == NULL! Fixing up,
> report the issue to [EMAIL PROTECTED]
> [105449.683646] cfq: busy=1,drv=0,timer=0
> [105449.694871] cfq rr_list:
> [105449.702715]   3108: sort=0,next=,q=0/1,a=1/0,d=0/0,f=69
> [105449.720693] cfq busy_list:
> [105449.729054] cfq idle_list:
> [105449.737418] cfq cur_rr:

Ok, I have a theory.

An ELEVATOR_FRONT_MERGE occurs which changes req->sector and calls ->elevator_merged_fn which is cfq_merged_request.

At this time there is already a request in cfq with the same sector number, and that request is the only other request on the queue.

cfq_merged_request calls cfq_reposition_rq_rb which removes the req from ->sortlist and then calls cfq_add_rq_rb to add it back (at the new location because ->sector has changed).

cfq_add_rq_rb finds there is already a request with the same sector number and so elv_rb_add returns an __alias which is passed to cfq_dispatch_insert. This calls cfq_remove_request and as that is the only request present, ->next_rq gets set to NULL. The old request with the new sector number is then added to the ->sortlist, but ->next_rq is never set - it remains NULL.

How likely it would be to get two requests with the same sector number I don't know. I wouldn't expect it to ever happen - I have seen it before, but it was due to a bug in ext3. Maybe XFS does it intentionally some times?

You could test this theory by putting a
    WARN_ON(cfqq->next_rq == NULL);
at the end of cfq_reposition_rq_rb, just after the cfq_add_rq_rb call.

I will leave the development of a suitable fix up to Jens if he agrees that this is possible.

NeilBrown
Re: [OOPS] 2.6.21-rc6-git5 in cfq_dispatch_insert
On Wed, Apr 25 2007, Neil Brown wrote: On Tuesday April 24, [EMAIL PROTECTED] wrote: [105449.653682] cfq: rbroot not empty, but ->next_rq == NULL! Fixing up, report the issue to [EMAIL PROTECTED] [105449.683646] cfq: busy=1,drv=0,timer=0 [105449.694871] cfq rr_list: [105449.702715] 3108: sort=0,next=,q=0/1,a=1/0,d=0/0,f=69 [105449.720693] cfq busy_list: [105449.729054] cfq idle_list: [105449.737418] cfq cur_rr: Ok, I have a theory. An ELEVATOR_FRONT_MERGE occurs which changes req->sector and calls ->elevator_merged_fn which is cfq_merged_request. At this time there is already a request in cfq with the same sector number, and that request is the only other request on the queue. cfq_merged_request calls cfq_reposition_rq_rb which removes the req from ->sortlist and then calls cfq_add_rq_rb to add it back (at the new location because ->sector has changed). cfq_add_rq_rb finds there is already a request with the same sector number and so elv_rb_add returns an __alias which is passed to cfq_dispatch_insert. This calls cfq_remove_request and as that is the only request present, ->next_rq gets set to NULL. The old request with the new sector number is then added to the ->sortlist, but ->next_rq is never set - it remains NULL. How likely it would be to get two requests with the same sector number I don't know. I wouldn't expect it to ever happen - I have seen it before, but it was due to a bug in ext3. Maybe XFS does it intentionally some times? You could test this theory by putting a WARN_ON(cfqq->next_rq == NULL); at the end of cfq_reposition_rq_rb, just after the cfq_add_rq_rb call. I will leave the development of a suitable fix up to Jens if he agrees that this is possible. That's pretty close to where I think the problem is (the front merging and cfq_reposition_rq_rb()). The issue with that is that you'd only get aliases for O_DIRECT and/or raw IO, and that doesn't seem to be the case here.
Given that front merges are equally not very likely, I'd be surprised if something like that has ever happened. BUT... That may explain why we are only seeing it on md. Would md ever be issuing such requests that trigger this condition? I'll try and concoct a test case. -- Jens Axboe
Re: [OOPS] 2.6.21-rc6-git5 in cfq_dispatch_insert
Neil Brown wrote: How likely it would be to get two requests with the same sector number I don't know. I wouldn't expect it to ever happen - I have seen it before, but it was due to a bug in ext3. Maybe XFS does it intentionally some times? It certainly sounds like an odd thing to occur. Even stranger that it's easier to hit on a degraded array or an array being checked. I *am* using ext3 on this box (and all my boxes in fact) You could test this theory by putting a WARN_ON(cfqq->next_rq == NULL); at the end of cfq_reposition_rq_rb, just after the cfq_add_rq_rb call. I've done that.. now to wait for it to hit again. Brad -- Human beings, who are almost unique in having the ability to learn from the experience of others, are also remarkable for their apparent disinclination to do so. -- Douglas Adams
Re: [OOPS] 2.6.21-rc6-git5 in cfq_dispatch_insert
On Wed, Apr 25 2007, Jens Axboe wrote: On Wed, Apr 25 2007, Neil Brown wrote: On Tuesday April 24, [EMAIL PROTECTED] wrote: [105449.653682] cfq: rbroot not empty, but ->next_rq == NULL! Fixing up, report the issue to [EMAIL PROTECTED] [105449.683646] cfq: busy=1,drv=0,timer=0 [105449.694871] cfq rr_list: [105449.702715] 3108: sort=0,next=,q=0/1,a=1/0,d=0/0,f=69 [105449.720693] cfq busy_list: [105449.729054] cfq idle_list: [105449.737418] cfq cur_rr: Ok, I have a theory. An ELEVATOR_FRONT_MERGE occurs which changes req->sector and calls ->elevator_merged_fn which is cfq_merged_request. At this time there is already a request in cfq with the same sector number, and that request is the only other request on the queue. cfq_merged_request calls cfq_reposition_rq_rb which removes the req from ->sortlist and then calls cfq_add_rq_rb to add it back (at the new location because ->sector has changed). cfq_add_rq_rb finds there is already a request with the same sector number and so elv_rb_add returns an __alias which is passed to cfq_dispatch_insert. This calls cfq_remove_request and as that is the only request present, ->next_rq gets set to NULL. The old request with the new sector number is then added to the ->sortlist, but ->next_rq is never set - it remains NULL. How likely it would be to get two requests with the same sector number I don't know. I wouldn't expect it to ever happen - I have seen it before, but it was due to a bug in ext3. Maybe XFS does it intentionally some times? You could test this theory by putting a WARN_ON(cfqq->next_rq == NULL); at the end of cfq_reposition_rq_rb, just after the cfq_add_rq_rb call. I will leave the development of a suitable fix up to Jens if he agrees that this is possible. That's pretty close to where I think the problem is (the front merging and cfq_reposition_rq_rb()). The issue with that is that you'd only get aliases for O_DIRECT and/or raw IO, and that doesn't seem to be the case here.
Given that front merges are equally not very likely, I'd be surprised if something like that has ever happened. BUT... That may explain why we are only seeing it on md. Would md ever be issuing such requests that trigger this condition? I'll try and concoct a test case. I made a test case and reproduced it, this is indeed what is happening. md must be issuing direct requests in such a manner, that you do get a front merge and then an alias to an existing request. Will test a fix! -- Jens Axboe
Re: [OOPS] 2.6.21-rc6-git5 in cfq_dispatch_insert
On Wednesday April 25, [EMAIL PROTECTED] wrote: That's pretty close to where I think the problem is (the front merging and cfq_reposition_rq_rb()). The issue with that is that you'd only get aliases for O_DIRECT and/or raw IO, and that doesn't seem to be the case here. Given that front merges are equally not very likely, I'd be surprised if something like that has ever happened. Well it certainly doesn't happen very often. And I can imagine a filesystem genuinely wanting to read the same block twice - maybe a block that contained packed tails of two different files. BUT... That may explain why we are only seeing it on md. Would md ever be issuing such requests that trigger this condition? Can someone remind me which raid level(s) was/were involved? I think one was raid0 - that just passes requests on from the filesystem, so md would only issue requests like that if the filesystem did. I guess it could happen with raid4/5/6. A read request that was properly aligned (and we do encourage proper alignment) will be passed directly to the underlying device. A concurrent write elsewhere could require the same block to be read into the stripe-cache to enable a parity calculation. So you could get two reads at the same block address. Getting a front-merge would probably require two stripe-heads to be processed in reverse-sector order, and it tends to preserve the order of incoming requests (though that isn't firmly enforced). raid1 is much like raid0 (with totally different code) in that the request pattern seen by the underlying device matches the request pattern generated by the filesystem. If I read the debugging output correctly, the request which I hypothesise was the subject of a front-merge is a 'sync' request. raid5 does not generate sync requests to fill the stripe cache (maybe it should?) so I really think it must have come directly from the filesystem.
(just checked previous email for more detail of when it hits) The fact that it hits degraded arrays more easily is interesting. Maybe we try to read a block on the missing device and so schedule reads for the other devices. Then we try to read a block on a good device and issue a request for exactly the same block that raid5 asked for. That still doesn't explain the 'sync' and the 'front merge'. But that is quite possible, just not common maybe. It doesn't help us understand the raid0 example though. Maybe it is just a 'can happen, but only rarely' thing. NeilBrown
Re: [OOPS] 2.6.21-rc6-git5 in cfq_dispatch_insert
On Wed, Apr 25 2007, Neil Brown wrote: On Wednesday April 25, [EMAIL PROTECTED] wrote: That's pretty close to where I think the problem is (the front merging and cfq_reposition_rq_rb()). The issue with that is that you'd only get aliases for O_DIRECT and/or raw IO, and that doesn't seem to be the case here. Given that front merges are equally not very likely, I'd be surprised if something like that has ever happened. Well it certainly doesn't happen very often. And I can imagine a filesystem genuinely wanting to read the same block twice - maybe a block that contained packed tails of two different files. But that usually happens through the page cache, so it's nicely serialized. You'd only ever get duplicate/alias blocks if you go raw, or have some layer on top of your block device (like md) that issues io on its own. BUT... That may explain why we are only seeing it on md. Would md ever be issuing such requests that trigger this condition? Can someone remind me which raid level(s) was/were involved? I think one was raid0 - that just passes requests on from the filesystem, so md would only issue requests like that if the filesystem did. Yep, one was raid0. I guess it could happen with raid4/5/6. A read request that was properly aligned (and we do encourage proper alignment) will be passed directly to the underlying device. A concurrent write elsewhere could require the same block to be read into the stripe-cache to enable a parity calculation. So you could get two reads at the same block address. Getting a front-merge would probably require two stripe-heads to be processed in reverse-sector order, and it tends to preserve the order of incoming requests (though that isn't firmly enforced). raid1 is much like raid0 (with totally different code) in that the request pattern seen by the underlying device matches the request pattern generated by the filesystem.
If I read the debugging output correctly, the request which I hypothesise was the subject of a front-merge is a 'sync' request. raid5 does not generate sync requests to fill the stripe cache (maybe it should?) so I really think it must have come directly from the filesystem. (just checked previous email for more detail of when it hits) The fact that it hits degraded arrays more easily is interesting. Maybe we try to read a block on the missing device and so schedule reads for the other devices. Then we try to read a block on a good device and issue a request for exactly the same block that raid5 asked for. That still doesn't explain the 'sync' and the 'front merge'. But that is quite possible, just not common maybe. It doesn't help us understand the raid0 example though. Maybe it is just a 'can happen, but only rarely' thing. It looks to be extremely rare. Aliases are extremely rare, front merges are rare. And you need both to happen with the details you outlined. But it's a large user base, and we've had 3-4 reports on this in the past months. So it obviously does happen. I could not make it trigger without doctoring the unplug code when I used aio. Here's a fix for it, confirmed.

diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index 9e37971..f965be7 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -532,6 +532,11 @@ static void cfq_add_rq_rb(struct request *rq)
 	if (!cfq_cfqq_on_rr(cfqq))
 		cfq_add_cfqq_rr(cfqd, cfqq);
+
+	/*
+	 * check if this request is a better next-serve candidate
+	 */
+	cfqq->next_rq = cfq_choose_req(cfqd, cfqq->next_rq, rq);
 }
 
 static inline void
@@ -1639,12 +1644,6 @@ cfq_rq_enqueued(struct cfq_data *cfqd, struct cfq_queue *cfqq,
 		cfqq->meta_pending++;
 
 	/*
-	 * check if this request is a better next-serve candidate)) {
-	 */
-	cfqq->next_rq = cfq_choose_req(cfqd, cfqq->next_rq, rq);
-	BUG_ON(!cfqq->next_rq);
-
-	/*
 	 * we never wait for an async request and we don't allow preemption
 	 * of an async request.
 	 * so just return early
 	 */

-- Jens Axboe
Re: [OOPS] 2.6.21-rc6-git5 in cfq_dispatch_insert
Neil Brown wrote: On Wednesday April 25, [EMAIL PROTECTED] wrote: BUT... That may explain why we are only seeing it on md. Would md ever be issuing such requests that trigger this condition? Can someone remind me which raid level(s) was/were involved? Raid-5 degraded here, but I've had it on my server on a fully functioning raid-5 also. Don't think I've seen it on any of my raid-6 yet. I guess it could happen with raid4/5/6. A read request that was properly aligned (and we do encourage proper alignment) will be passed directly to the underlying device. A concurrent write elsewhere could require the same block to be read into the stripe-cache to enable a parity calculation. So you could get two reads at the same block address. Except all the filesystems I've had this occur on are Ext3 and all are mounted noatime, with no processes ever writing to the filesystems. So here it occurs with reads only. Brad
Re: [OOPS] 2.6.21-rc6-git5 in cfq_dispatch_insert
Jens Axboe wrote: It looks to be extremely rare. Aliases are extremely rare, front merges are rare. And you need both to happen with the details you outlined. But it's a large user base, and we've had 3-4 reports on this in the past months. So it obviously does happen. I could not make it trigger without doctoring the unplug code when I used aio. Well, not that rare on this particular machine (I had a case where I could reproduce it in less than an hour of normal use previously on this box), and I've had it occur a number of times on my servers, I just never reported it before as I never took the time to set up a serial console and capture the oops. Here's a fix for it, confirmed. Shall I leave the other debugging in, apply this and run it for a few hard days? Brad
Re: [OOPS] 2.6.21-rc6-git5 in cfq_dispatch_insert
Neil Brown wrote: You could test this theory by putting a WARN_ON(cfqq->next_rq == NULL); at the end of cfq_reposition_rq_rb, just after the cfq_add_rq_rb call.

[  756.311074] BUG: at block/cfq-iosched.c:543 cfq_reposition_rq_rb()
[  756.329615]  [c0204fa1] cfq_merged_request+0x71/0x80
[  756.345046]  [c0204f30] cfq_merged_request+0x0/0x80
[  756.360216]  [c01fb4ae] elv_merged_request+0x4e/0x50
[  756.375647]  [c01ff1a7] __make_request+0x1a7/0x2f0
[  756.390557]  [c01ff497] generic_make_request+0x127/0x190
[  756.407025]  [c02bb011] chunk_aligned_read+0x111/0x1c0
[  756.422974]  [c01ff497] generic_make_request+0x127/0x190
[  756.439443]  [c02bb448] make_request+0x388/0x3b0
[  756.453834]  [c016e027] __bio_add_page+0x147/0x1c0
[  756.468742]  [c016dbff] bio_alloc_bioset+0x7f/0x150
[  756.483913]  [c02bac50] raid5_mergeable_bvec+0x0/0x90
[  756.499604]  [c016e107] bio_add_page+0x37/0x50
[  756.513473]  [c01ff497] generic_make_request+0x127/0x190
[  756.529943]  [c013586a] mempool_free+0x2a/0x60
[  756.543812]  [c016dadd] bio_free+0x1d/0x40
[  756.556645]  [c01ff546] submit_bio+0x46/0xc0
[  756.571389]  [c020a396] radix_tree_node_alloc+0x16/0x60
[  756.587610]  [c020a5c2] radix_tree_insert+0xe2/0x130
[  756.603039]  [c0139575] __pagevec_lru_add+0x75/0x80
[  756.618207]  [c0171971] mpage_bio_submit+0x11/0x20
[  756.633117]  [c01720e0] mpage_readpages+0x100/0x140
[  756.648287]  [c0187d50] ext3_get_block+0x0/0xe0
[  756.662419]  [c026fc0b] scsi_end_request+0x9b/0xc0
[  756.677329]  [c026dd2b] scsi_delete_timer+0xb/0x20
[  756.692240]  [c0188ae0] ext3_readpages+0x0/0x20
[  756.706369]  [c0138ac1] read_pages+0xd1/0xe0
[  756.719719]  [c0187d50] ext3_get_block+0x0/0xe0
[  756.733853]  [c0136e24] __alloc_pages+0x54/0x2d0
[  756.748242]  [c02704ff] scsi_softirq_done+0x6f/0xe0
[  756.763413]  [c02861a1] sil_interrupt+0x81/0x90
[  756.777544]  [c01ffa78] blk_done_softirq+0x58/0x70
[  756.792454]  [c0138b78] __do_page_cache_readahead+0xa8/0x110
[  756.809961]  [c0138d21] blockable_page_cache_readahead+0x51/0xd0
[  756.828508]  [c0138e14] make_ahead_window+0x74/0xc0
[  756.843679]  [c0138efa] page_cache_readahead+0x9a/0x180
[  756.859886]  [c0133704] file_read_actor+0xc4/0xd0
[  756.874537]  [c0133637] do_generic_mapping_read+0x467/0x470
[  756.891786]  [c0133893] generic_file_aio_read+0x183/0x1c0
[  756.908514]  [c0133640] file_read_actor+0x0/0xd0
[  756.922903]  [c026fb20] scsi_next_command+0x30/0x50
[  756.938073]  [c014e250] do_sync_read+0xc0/0x100
[  756.952204]  [c0124120] autoremove_wake_function+0x0/0x50
[  756.968934]  [c026bf00] scsi_finish_command+0x40/0x60
[  756.984622]  [c03319a5] schedule+0x2e5/0x590
[  756.997972]  [c014e316] vfs_read+0x86/0x110
[  757.011065]  [c014e5f7] sys_read+0x47/0x80
[  757.023897]  [c0102894] syscall_call+0x7/0xb
[  757.037250] ===

I know this is late given a fix has been issued, but just got this in the logs with no other debug info at all. This kernel has *all* Jens debugging patches + the snippet from Neil above. I'll apply the fix now and run it for a couple of days. Brad
Re: [OOPS] 2.6.21-rc6-git5 in cfq_dispatch_insert
On Wed, Apr 25 2007, Brad Campbell wrote: Jens Axboe wrote: It looks to be extremely rare. Aliases are extremely rare, front merges are rare. And you need both to happen with the details you outlined. But it's a large user base, and we've had 3-4 reports on this in the past months. So it obviously does happen. I could not make it trigger without doctoring the unplug code when I used aio. Well, not that rare on this particular machine (I had a case where I could reproduce it in less than an hour of normal use previously on this box), and I've had it occur a number of times on my servers, I just never reported it before as I never took the time to set up a serial console and capture the oops. Extremely rare in the sense that it takes md and some certain conditions to happen for it to trigger. So for most people it'll be extremely rare, and for others (such as yourself) that hit it, it won't be so rare :-) Here's a fix for it, confirmed. Shall I leave the other debugging in, apply this and run it for a few hard days? Yes, that would be perfect! -- Jens Axboe
Re: [OOPS] 2.6.21-rc6-git5 in cfq_dispatch_insert
On Wednesday April 25, [EMAIL PROTECTED] wrote: Here's a fix for it, confirmed. Patch looks good to me. Hopefully Brad can still wait for the WARN_ON to fire - it might give useful clues to why this is happening. It might be interesting. Thanks, NeilBrown
Re: [OOPS] 2.6.21-rc6-git5 in cfq_dispatch_insert
On Wed, Apr 25 2007, Neil Brown wrote: On Wednesday April 25, [EMAIL PROTECTED] wrote: Here's a fix for it, confirmed. Patch looks good to me. Good! And thanks for taking the time to look at this bug btw, it's had me puzzled a bit this week. I should not have ruled out the aliases I guess, they must be coming elsewhere than from O_DIRECT. Hopefully Brad can still wait for the WARN_ON to fire - it might give useful clues to why this is happening. It might be interesting. But he did and reported it about an hour ago, so all should be well. -- Jens Axboe
Re: [OOPS] 2.6.21-rc6-git5 in cfq_dispatch_insert
On Wednesday April 25, [EMAIL PROTECTED] wrote:

[  756.311074] BUG: at block/cfq-iosched.c:543 cfq_reposition_rq_rb()
[  756.329615]  [c0204fa1] cfq_merged_request+0x71/0x80
[  756.345046]  [c0204f30] cfq_merged_request+0x0/0x80
[  756.360216]  [c01fb4ae] elv_merged_request+0x4e/0x50
[  756.375647]  [c01ff1a7] __make_request+0x1a7/0x2f0
[  756.390557]  [c01ff497] generic_make_request+0x127/0x190
[  756.407025]  [c02bb011] chunk_aligned_read+0x111/0x1c0
                            ^
[  756.422974]  [c01ff497] generic_make_request+0x127/0x190
[  756.439443]  [c02bb448] make_request+0x388/0x3b0
[  756.453834]  [c016e027] __bio_add_page+0x147/0x1c0
[  756.468742]  [c016dbff] bio_alloc_bioset+0x7f/0x150
[  756.483913]  [c02bac50] raid5_mergeable_bvec+0x0/0x90
[  756.499604]  [c016e107] bio_add_page+0x37/0x50
[  756.513473]  [c01ff497] generic_make_request+0x127/0x190
[  756.529943]  [c013586a] mempool_free+0x2a/0x60
[  756.543812]  [c016dadd] bio_free+0x1d/0x40
[  756.556645]  [c01ff546] submit_bio+0x46/0xc0
[  756.571389]  [c020a396] radix_tree_node_alloc+0x16/0x60
[  756.587610]  [c020a5c2] radix_tree_insert+0xe2/0x130
[  756.603039]  [c0139575] __pagevec_lru_add+0x75/0x80
[  756.618207]  [c0171971] mpage_bio_submit+0x11/0x20
[  756.633117]  [c01720e0] mpage_readpages+0x100/0x140
                            ^^^

So it is a regular data read (not metadata of any sort) which is by-passing the stripe cache. I was expecting some sort of metadata read. I wonder what this is merging in-front of... maybe a stripe-cache read. I guess it doesn't tell us anything really interesting, which is probably a good thing - it means everything else is working properly. I wonder if we should avoid bypassing the stripe cache if the needed stripes are already in the cache... or if at least one needed stripe is or if the array is degraded... Probably in the degraded case we should never bypass the cache, as if we do, then a sequential read of a full stripe will read every block twice. I'd better do some performance measurements. Thanks for all the patient testing.
NeilBrown
Degraded RAID performance - Was : Re: [OOPS] 2.6.21-rc6-git5 in cfq_dispatch_insert
Neil Brown wrote: I wonder if we should avoid bypassing the stripe cache if the needed stripes are already in the cache... or if at least one needed stripe is or if the array is degraded... Probably in the degraded case we should never bypass the cache, as if we do, then a sequential read of a full stripe will read every block twice. I'd better do some performance measurements. Ok, that would explain some odd performance issues I've noticed. Let's say I run dstat -D sda,sdb,sdc,sdd,md0 5

----total-cpu-usage---- --disk/sda- --disk/sdb- --disk/sdc- --disk/sdd- --disk/md0- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq| read write: read write: read write: read write: read write| recv  send|  in   out | int   csw
 25  22   0  47   0   6|20.1M    0 :20.2M    0 :20.1M    0 :    0    0 :40.2M    0 | 146B  662B|   0     0 |1186   661
 26  20   0  46   0   8|19.4M    0 :19.4M    0 :19.4M    0 :    0    0 :38.9M    0 | 160B  549B|   0     0 |1365   650

Given I'm doing a read, I would have expected a read to consist of 2 direct reads, one parity read and some calculation. The numbers I'm seeing however show 3 reads for 2 reads worth of bandwidth.

[EMAIL PROTECTED]:~# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sda[0] sdc[2] sdb[1]
      585934080 blocks level 5, 64k chunk, algorithm 2 [4/3] [UUU_]

(Dropped Jens and Chuck from the cc as this likely has little interest for them) Brad
Re: [OOPS] 2.6.21-rc6-git5 in cfq_dispatch_insert
Hi Jens! On 25 Apr 2007, at 12:18, Jens Axboe wrote: On Wed, Apr 25 2007, Brad Campbell wrote: Jens Axboe wrote: It looks to be extremely rare. Aliases are extremely rare, front merges are rare. And you need both to happen with the details you outlined. But it's a large user base, and we've had 3-4 reports on this in the past months. So it obviously does happen. I could not make it trigger without doctoring the unplug code when I used aio. Well, not that rare on this particular machine (I had a case where I could reproduce it in less than an hour of normal use previously on this box), and I've had it occur a number of times on my servers, I just never reported it before as I never took the time to set up a serial console and capture the oops. Extremely rare in the sense that it takes md and some certain conditions to happen for it to trigger. So for most people it'll be extremely rare, and for others (such as yourself) that hit it, it won't be so rare :-) Here's a fix for it, confirmed. Shall I leave the other debugging in, apply this and run it for a few hard days? Yes, that would be perfect! Okay, I left all debugging patches in, disabled all kernel debugging .config stuff and gave it a spin with our usual "killer" workload (as far as batch systems are repeatable anyway) and so far there was not a single glitch or message, so I preliminarily conclude that the bug is squashed. The final word will come once my 1800 batch jobs are processed and I have my particle physics result ;-) Ciao, Roland -- TU Muenchen, Physik-Department E18, James-Franck-Str., 85748 Garching Telefon 089/289-12575; Telefax 089/289-12570 -- CERN office: 892-1-D23 phone: +41 22 7676540 mobile: +41 76 487 4482 -- Any society that would give up a little liberty to gain a little security will deserve neither and lose both.
- Benjamin Franklin

-----BEGIN GEEK CODE BLOCK-----
Version: 3.12
GS/CS/M/MU d-(++) s:+ a-> C+++ UL P+++ L+++ E(+) W+ !N K- w--- M+ !V Y+ PGP++ t+(++) 5 R+ tv-- b+ DI++ e+++> h y+++
-----END GEEK CODE BLOCK-----
Re: [OOPS] 2.6.21-rc6-git5 in cfq_dispatch_insert
Jens Axboe wrote: Ok, can you try and reproduce with this one applied? It'll keep the system running (unless there are other corruptions going on), so it should help you a bit as well. It will dump some cfq state info when the condition triggers that can perhaps help diagnose this. So if you can apply this patch and reproduce + send the output, I'd much appreciate it! I think I'm wearing holes in my platters. This is being a swine to hit, but I finally got some.. It seems to respond to a combination of high cpu usage and relatively high disk utilisation. I tried all sorts of acrobatics with multiple readers and hammering the array while reading from individual drives.. The only reliable way I can reproduce this seems to be on a degraded array running a "while true ; do md5sum -c md5 ; done" on about a 180GB directory of files. It is taking anywhere from 4 to 96 hours to hit it though.. but at least it's reproducible.

[105449.653682] cfq: rbroot not empty, but ->next_rq == NULL! Fixing up, report the issue to [EMAIL PROTECTED]
[105449.683646] cfq: busy=1,drv=0,timer=0
[105449.694871] cfq rr_list:
[105449.702715]   3108: sort=0,next=,q=0/1,a=1/0,d=0/0,f=69
[105449.720693] cfq busy_list:
[105449.729054] cfq idle_list:
[105449.737418] cfq cur_rr:
[115435.022192] cfq: rbroot not empty, but ->next_rq == NULL! Fixing up, report the issue to [EMAIL PROTECTED]
[115435.052160] cfq: busy=1,drv=0,timer=0
[115435.063383] cfq rr_list:
[115435.071227]   3196: sort=0,next=,q=0/1,a=1/0,d=0/0,f=69
[115435.089205] cfq busy_list:
[115435.097566] cfq idle_list:
[115435.105930] cfq cur_rr:
[115616.651883] cfq: rbroot not empty, but ->next_rq == NULL! Fixing up, report the issue to [EMAIL PROTECTED]
[115616.681848] cfq: busy=1,drv=0,timer=0
[115616.693071] cfq rr_list:
[115616.700916]   3196: sort=0,next=,q=0/1,a=1/0,d=0/0,f=61
[115616.718893] cfq busy_list:
[115616.727253] cfq idle_list:
[115616.735617] cfq cur_rr:
[119679.564753] cfq: rbroot not empty, but ->next_rq == NULL! Fixing up, report the issue to [EMAIL PROTECTED]
[119679.594732] cfq: busy=1,drv=0,timer=0
[119679.605955] cfq rr_list:
[119679.613799]   3241: sort=0,next=,q=0/1,a=1/0,d=0/0,f=69
[119679.631778] cfq busy_list:
[119679.640136] cfq idle_list:
[119679.648502] cfq cur_rr:

Brad
Re: [OOPS] 2.6.21-rc6-git5 in cfq_dispatch_insert
On Tue, Apr 24 2007, Roland Kuhn wrote: > Hi Jens! > > On 24 Apr 2007, at 14:32, Jens Axboe wrote: > > >On Tue, Apr 24 2007, Roland Kuhn wrote: > >>Hi Jens! > >> > >>[I made a typo in the Cc: list so that lkml is only included as of > >>now. Actually I copied the typo from you ;-) ] > > > >Well no, you started the typo, I merely propagated it and forgot to > >fix > >it up :-) > > > Actually, I copied it from your printk() ;-) (thinking helps...) Ahhh! Yeah that one indeed has a typo, tsk tsk. > >Sure. You might want to include NFS file access into your tests, > >since we've not triggered this with locally accessing the disks. > >BTW: > > How are you exporting the directory (what exports options) - how > is it > mounted by the client(s)? What chunksize is your raid6 using? > >>> > >>>And what are the nature of the files on the raid (huge, small, ?) > >>>and > >>>what are the client(s) doing? Just approximately, I know these > >>>things > >>>can be hard/difficult/impossible to specify. > >>> > >>The files are 100-400MB in size and the client is merging them into a > >>new file in the same directory using the ROOT library, which does in > >>essence alternating sequences of > >> > >>_llseek(somewhere) > >>read(n bytes) > >>_llseek(somewhere+n) > >>read(m bytes) > >>... > >> > >>and then > >> > >>_llseek(somewhere) > >>rt_sigaction(ignore INT) > >>write(n bytes) > >>rt_sigaction(INT->DFL) > >>time() > >>_llseek(somewhere+n) > >>... > >> > >>where n is of the order of 30kB. The input files are treated > >>sequentially, not randomly. > > > >Ok, I'll see if I can reproduce it. No luck so far, I'm afraid. > > > Too bad. > > >>BTW: the machine just stopped dead, no sign whatsoever on console or > >>netconsole, so I rebooted with elevator=deadline > >>(need to get some work done besides ;-) ) > > > >Unfortunately expected, if we can race and lose an update to ->next_rq, > >we can race and corrupt some of the internal data structures as
> >well. If > >you have the time and inclination, it would be interesting to see > >if you > >can reproduce with some debugging options enabled: > > > >- Enable all preempt, spinlock and lockdep debugging measures > >- Possibly slab poisoning, although that may slow you down somewhat > > > Kernel compilation under way, will report back. Thanks! > >Are you using 4kb stacks? > > > No idea, 'grep -i stack .config' gives no indication, but ISTR that > 4k was made the default some time back? You are on x86-64, my mistake. -- Jens Axboe
Re: [OOPS] 2.6.21-rc6-git5 in cfq_dispatch_insert
Hi Jens!

On 24 Apr 2007, at 14:32, Jens Axboe wrote:

On Tue, Apr 24 2007, Roland Kuhn wrote:

Hi Jens!

[I made a typo in the Cc: list so that lkml is only included as of now. Actually I copied the typo from you ;-) ]

Well no, you started the typo, I merely propagated it and forgot to fix it up :-)

Actually, I copied it from your printk() ;-) (thinking helps...)

Sure. You might want to include NFS file access into your tests, since we've not triggered this with locally accessing the disks. BTW:

How are you exporting the directory (what exports options) - how is it mounted by the client(s)? What chunksize is your raid6 using?

And what are the nature of the files on the raid (huge, small, ?) and what are the client(s) doing? Just approximately, I know these things can be hard/difficult/impossible to specify.

The files are 100-400MB in size and the client is merging them into a new file in the same directory using the ROOT library, which does in essence alternating sequences of

_llseek(somewhere)
read(n bytes)
_llseek(somewhere+n)
read(m bytes)
...

and then

_llseek(somewhere)
rt_sigaction(ignore INT)
write(n bytes)
rt_sigaction(INT->DFL)
time()
_llseek(somewhere+n)
...

where n is of the order of 30kB. The input files are treated sequentially, not randomly.

Ok, I'll see if I can reproduce it. No luck so far, I'm afraid.

Too bad.

BTW: the machine just stopped dead, no sign whatsoever on console or netconsole, so I rebooted with elevator=deadline (need to get some work done besides ;-) )

Unfortunately expected, if we can race and lose an update to ->next_rq, we can race and corrupt some of the internal data structures as well. If you have the time and inclination, it would be interesting to see if you can reproduce with some debugging options enabled:

- Enable all preempt, spinlock and lockdep debugging measures
- Possibly slab poisoning, although that may slow you down somewhat

Kernel compilation under way, will report back.

Are you using 4kb stacks?

No idea, 'grep -i stack .config' gives no indication, but ISTR that 4k was made the default some time back?

Ciao,
Roland

--
TU Muenchen, Physik-Department E18, James-Franck-Str., 85748 Garching
Telefon 089/289-12575; Telefax 089/289-12570
--
CERN office: 892-1-D23 phone: +41 22 7676540 mobile: +41 76 487 4482
--
Any society that would give up a little liberty to gain a little
security will deserve neither and lose both.
- Benjamin Franklin

-BEGIN GEEK CODE BLOCK-
Version: 3.12
GS/CS/M/MU d-(++) s:+ a-> C+++ UL P+++ L+++ E(+) W+ !N K- w--- M+ !V
Y+ PGP++ t+(++) 5 R+ tv-- b+ DI++ e+++> h y+++
--END GEEK CODE BLOCK--
Re: [OOPS] 2.6.21-rc6-git5 in cfq_dispatch_insert
On Tue, Apr 24 2007, Roland Kuhn wrote:
> Hi Jens!
>
> [I made a typo in the Cc: list so that lkml is only included as of
> now. Actually I copied the typo from you ;-) ]

Well no, you started the typo, I merely propagated it and forgot to
fix it up :-)

> >On Tue, Apr 24 2007, Jens Axboe wrote:
> >>On Tue, Apr 24 2007, Roland Kuhn wrote:
> >>>Hi Jens!
> >>>
> >>>On 24 Apr 2007, at 11:18, Jens Axboe wrote:
> >>>
> >>>>On Tue, Apr 24 2007, Roland Kuhn wrote:
> >>>>>Hi Jens!
> >>>>>
> >>>>>We're using a custom built fileserver (dual core Athlon64, using
> >>>>>the x86_64 arch) with 22 disks in a RAID6, and while resyncing
> >>>>>/dev/md2 (9.1GB ext3) after a hardware incident (cable pulled on
> >>>>>one disk) the machine would reliably oops while serving some
> >>>>>large files over NFSv3. The oops message scrolled partly off the
> >>>>>screen, but the IP was in cfq_dispatch_insert, so I tried your
> >>>>>debug patch from yesterday with 2.6.21-rc7. I used netconsole
> >>>>>for capturing the output (which works nicely, thanks Matt!) and
> >>>>>as usual the condition triggered after about half a minute, this
> >>>>>time with the following printout instead of crashing (still
> >>>>>works fine):
> >>>>>
> >>>>>cfq: rbroot not empty, but ->next_rq == NULL! Fixing up, report
> >>>>>the issue to [EMAIL PROTECTED]
> >>>>>cfq: busy=1,drv=1,timer=0
> >>>>>cfq rr_list:
> >>>>>cfq busy_list:
> >>>>>  4272: sort=0,next=,q=0/1,a=2/0,d=0/1,f=221
> >>>>>cfq idle_list:
> >>>>>cfq cur_rr:
> >>>>>cfq: rbroot not empty, but ->next_rq == NULL! Fixing up, report
> >>>>>the issue to [EMAIL PROTECTED]
> >>>>>cfq: busy=1,drv=1,timer=0
> >>>>>cfq rr_list:
> >>>>>cfq busy_list:
> >>>>>  4276: sort=0,next=,q=0/1,a=2/0,d=0/1,f=221
> >>>>>cfq idle_list:
> >>>>>cfq cur_rr:
> >>>>>
> >>>>>There was no backtrace, so the only thing I can tell is that for
> >>>>>the previous crashes some nfs threads were always involved; only
> >>>>>once did it happen inside an interrupt handler (with the "aieee"
> >>>>>kind of message).
> >>>>>
> >>>>>If you want me to try something else, don't hesitate to ask!
> >>>>
> >>>>Nifty, great that you can reproduce so quickly. I'll try a
> >>>>3-drive raid6 here and see if read activity along with a resync
> >>>>will trigger anything. If that doesn't work for me, I'll provide
> >>>>you with a more extensive debug patch (if you don't mind).
> >>>
> >>>Sure. You might want to include NFS file access into your tests,
> >>>since we've not triggered this with locally accessing the disks.
> >>>BTW:
> >>
> >>How are you exporting the directory (what exports options) - how is
> >>it mounted by the client(s)? What chunksize is your raid6 using?
> >
> >And what are the nature of the files on the raid (huge, small, ?)
> >and what are the client(s) doing? Just approximately, I know these
> >things can be hard/difficult/impossible to specify.
>
> The files are 100-400MB in size and the client is merging them into a
> new file in the same directory using the ROOT library, which does in
> essence alternating sequences of
>
> _llseek(somewhere)
> read(n bytes)
> _llseek(somewhere+n)
> read(m bytes)
> ...
>
> and then
>
> _llseek(somewhere)
> rt_sigaction(ignore INT)
> write(n bytes)
> rt_sigaction(INT->DFL)
> time()
> _llseek(somewhere+n)
> ...
>
> where n is of the order of 30kB. The input files are treated
> sequentially, not randomly.

Ok, I'll see if I can reproduce it. No luck so far, I'm afraid.

> BTW: the machine just stopped dead, no sign whatsoever on console or
> netconsole, so I rebooted with elevator=deadline
> (need to get some work done besides ;-) )

Unfortunately expected, if we can race and lose an update to ->next_rq,
we can race and corrupt some of the internal data structures as well.
If you have the time and inclination, it would be interesting to see if
you can reproduce with some debugging options enabled:

- Enable all preempt, spinlock and lockdep debugging measures
- Possibly slab poisoning, although that may slow you down somewhat

Are you using 4kb stacks?

--
Jens Axboe
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [OOPS] 2.6.21-rc6-git5 in cfq_dispatch_insert
Hi Jens!

[I made a typo in the Cc: list so that lkml is only included as of now. Actually I copied the typo from you ;-) ]

On 24 Apr 2007, at 11:40, Jens Axboe wrote:

On Tue, Apr 24 2007, Jens Axboe wrote:

On Tue, Apr 24 2007, Roland Kuhn wrote:

Hi Jens!

On 24 Apr 2007, at 11:18, Jens Axboe wrote:

On Tue, Apr 24 2007, Roland Kuhn wrote:

Hi Jens!

We're using a custom built fileserver (dual core Athlon64, using the x86_64 arch) with 22 disks in a RAID6, and while resyncing /dev/md2 (9.1GB ext3) after a hardware incident (cable pulled on one disk) the machine would reliably oops while serving some large files over NFSv3. The oops message scrolled partly off the screen, but the IP was in cfq_dispatch_insert, so I tried your debug patch from yesterday with 2.6.21-rc7. I used netconsole for capturing the output (which works nicely, thanks Matt!) and as usual the condition triggered after about half a minute, this time with the following printout instead of crashing (still works fine):

cfq: rbroot not empty, but ->next_rq == NULL! Fixing up, report the issue to [EMAIL PROTECTED]
cfq: busy=1,drv=1,timer=0
cfq rr_list:
cfq busy_list:
  4272: sort=0,next=,q=0/1,a=2/0,d=0/1,f=221
cfq idle_list:
cfq cur_rr:
cfq: rbroot not empty, but ->next_rq == NULL! Fixing up, report the issue to [EMAIL PROTECTED]
cfq: busy=1,drv=1,timer=0
cfq rr_list:
cfq busy_list:
  4276: sort=0,next=,q=0/1,a=2/0,d=0/1,f=221
cfq idle_list:
cfq cur_rr:

There was no backtrace, so the only thing I can tell is that for the previous crashes some nfs threads were always involved; only once did it happen inside an interrupt handler (with the "aieee" kind of message).

If you want me to try something else, don't hesitate to ask!

Nifty, great that you can reproduce so quickly. I'll try a 3-drive raid6 here and see if read activity along with a resync will trigger anything. If that doesn't work for me, I'll provide you with a more extensive debug patch (if you don't mind).

Sure. You might want to include NFS file access into your tests, since we've not triggered this with locally accessing the disks. BTW:

How are you exporting the directory (what exports options) - how is it mounted by the client(s)? What chunksize is your raid6 using?

And what are the nature of the files on the raid (huge, small, ?) and what are the client(s) doing? Just approximately, I know these things can be hard/difficult/impossible to specify.

The files are 100-400MB in size and the client is merging them into a new file in the same directory using the ROOT library, which does in essence alternating sequences of

_llseek(somewhere)
read(n bytes)
_llseek(somewhere+n)
read(m bytes)
...

and then

_llseek(somewhere)
rt_sigaction(ignore INT)
write(n bytes)
rt_sigaction(INT->DFL)
time()
_llseek(somewhere+n)
...

where n is of the order of 30kB. The input files are treated sequentially, not randomly.

BTW: the machine just stopped dead, no sign whatsoever on console or netconsole, so I rebooted with elevator=deadline (need to get some work done besides ;-) )

Ciao,
Roland
Re: [OOPS] 2.6.21-rc6-git5 in cfq_dispatch_insert
Jens Axboe wrote:

Ok, can you try and reproduce with this one applied? It'll keep the system running (unless there are other corruptions going on), so it should help you a bit as well. It will dump some cfq state info when the condition triggers that can perhaps help diagnose this. So if you can apply this patch and reproduce + send the output, I'd much appreciate it!

I think I'm wearing holes in my platters. This is being a swine to hit, but I finally got some. It seems to respond to a combination of high cpu usage and relatively high disk utilisation. I tried all sorts of acrobatics with multiple readers and hammering the array while reading from individual drives. The only reliable way I can reproduce this seems to be on a degraded array running a

while true ; do md5sum -c md5 ; done

on about a 180GB directory of files. It is taking anywhere from 4 to 96 hours to hit it though, but at least it's reproducible.

[105449.653682] cfq: rbroot not empty, but ->next_rq == NULL! Fixing up, report the issue to [EMAIL PROTECTED]
[105449.683646] cfq: busy=1,drv=0,timer=0
[105449.694871] cfq rr_list:
[105449.702715]   3108: sort=0,next=,q=0/1,a=1/0,d=0/0,f=69
[105449.720693] cfq busy_list:
[105449.729054] cfq idle_list:
[105449.737418] cfq cur_rr:
[115435.022192] cfq: rbroot not empty, but ->next_rq == NULL! Fixing up, report the issue to [EMAIL PROTECTED]
[115435.052160] cfq: busy=1,drv=0,timer=0
[115435.063383] cfq rr_list:
[115435.071227]   3196: sort=0,next=,q=0/1,a=1/0,d=0/0,f=69
[115435.089205] cfq busy_list:
[115435.097566] cfq idle_list:
[115435.105930] cfq cur_rr:
[115616.651883] cfq: rbroot not empty, but ->next_rq == NULL! Fixing up, report the issue to [EMAIL PROTECTED]
[115616.681848] cfq: busy=1,drv=0,timer=0
[115616.693071] cfq rr_list:
[115616.700916]   3196: sort=0,next=,q=0/1,a=1/0,d=0/0,f=61
[115616.718893] cfq busy_list:
[115616.727253] cfq idle_list:
[115616.735617] cfq cur_rr:
[119679.564753] cfq: rbroot not empty, but ->next_rq == NULL! Fixing up, report the issue to [EMAIL PROTECTED]
[119679.594732] cfq: busy=1,drv=0,timer=0
[119679.605955] cfq rr_list:
[119679.613799]   3241: sort=0,next=,q=0/1,a=1/0,d=0/0,f=69
[119679.631778] cfq busy_list:
[119679.640136] cfq idle_list:
[119679.648502] cfq cur_rr:

Brad
--
Human beings, who are almost unique in having the ability to learn from
the experience of others, are also remarkable for their apparent
disinclination to do so.
-- Douglas Adams
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [OOPS] 2.6.21-rc6-git5 in cfq_dispatch_insert
On Sun, Apr 22 2007, Brad Campbell wrote:
> Jens Axboe wrote:
> >
> >Thanks for testing Brad, be sure to use the next patch I sent instead.
> >The one from this mail shouldn't even get you booted. So double check
> >that you are still using CFQ :-)
>
> [184901.576773] BUG: unable to handle kernel NULL pointer dereference at
> virtual address 005c
> [184901.602612] printing eip:
> [184901.610990] c0205399
> [184901.617796] *pde =
> [184901.626421] Oops: [#1]
> [184901.635044] Modules linked in:
> [184901.644500] CPU: 0
> [184901.644501] EIP: 0060:[<c0205399>] Not tainted VLI
> [184901.644503] EFLAGS: 00010082 (2.6.21-rc7 #7)
> [184901.681294] EIP is at cfq_dispatch_insert+0x19/0x70
> [184901.696168] eax: f7f078e0 ebx: f7ca2794 ecx: 0004 edx:
> [184901.716743] esi: c1acaa1c edi: f7c9c6c0 ebp: esp: dbaefde0
> [184901.737316] ds: 007b es: 007b fs: 00d8 gs: 0033 ss: 0068
> [184901.755032] Process md5sum (pid: 4268, ti=dbaee000 task=f794a5a0
> task.ti=dbaee000)
> [184901.777422] Stack: c1acaa1c f7c9c6c0 c0205509 e6b61bd8 c0133451 1000
> [184901.803121]        0008 0004 0e713800 c1acaa1c f7c9c6c0 c1acaa1c
> [184901.828837]        c0205749 f7ca2794 f7ca2794 f79bc000 0282 c01fb829 c016ea8d
> [184901.854552] Call Trace:
> [184901.862723]  [<c0205509>] __cfq_dispatch_requests+0x79/0x170
> [184901.879971]  [<c0133451>] do_generic_mapping_read+0x281/0x470
> [184901.897478]  [<c0205749>] cfq_dispatch_requests+0x69/0x90
> [184901.913946]  [<c01fb829>] elv_next_request+0x39/0x130
> [184901.929375]  [<c016ea8d>] bio_endio+0x5d/0x90
> [184901.942725]  [<c0270375>] scsi_request_fn+0x45/0x280
> [184901.957896]  [<c01fde92>] blk_run_queue+0x32/0x70
> [184901.972286]  [<c026f8e0>] scsi_next_command+0x30/0x50
> [184901.987716]  [<c026f9cb>] scsi_end_request+0x9b/0xc0
> [184902.002886]  [<c026fb31>] scsi_io_completion+0x81/0x330
> [184902.018835]  [<c026daeb>] scsi_delete_timer+0xb/0x20
> [184902.034006]  [<c027ee35>] ata_scsi_qc_complete+0x65/0xd0
> [184902.050214]  [<c02751bb>] sd_rw_intr+0x8b/0x220
> [184902.064085]  [<c0280c0c>] ata_altstatus+0x1c/0x20
> [184902.078475]  [<c027b68d>] ata_hsm_move+0x14d/0x3f0
> [184902.093126]  [<c026bcc0>] scsi_finish_command+0x40/0x60
> [184902.109075]  [<c02702bf>] scsi_softirq_done+0x6f/0xe0
> [184902.124506]  [<c0285f61>] sil_interrupt+0x81/0x90
> [184902.138895]  [<c01ffa78>] blk_done_softirq+0x58/0x70
> [184902.154066]  [<c011721f>] __do_softirq+0x6f/0x80
> [184902.181806]  [<c0104cee>] do_IRQ+0x3e/0x80
> [184902.194380]  [<c010322f>] common_interrupt+0x23/0x28
> [184902.209551] ===
> [184902.220512] Code: 0e e3 ef ff e9 47 ff ff ff 89 f6 8d bc 27 00 00 00 00
> 83 ec 10 89 1c 24 89 6c 24 0c 89 74 24 04 89 7c 24 08 89 c3 89 d5 8b 40 0c
> <8b> 72 5c 8b 78 04 89 d0 e8 1a fa ff ff 8b 45 14 89 ea 25 01 80
> [184902.280564] EIP: [<c0205399>] cfq_dispatch_insert+0x19/0x70 SS:ESP
> 0068:dbaefde0
> [184902.303418] Kernel panic - not syncing: Fatal exception in interrupt
> [184902.322746] Rebooting in 60 seconds..
>
> Ok, it's taken me _ages_ to get the system to a point I can reproduce
> this, but I think it's now reproducible with a couple of hours beating.
> The bad news is it looks like it has not tickled any of your debugging
> markers! This was the 1st thing printed on a clean serial console, so
> nothing above that for days.
>
> I did double check and I was/am certainly running the kernel with the
> debug patch compiled in.

Ok, can you try and reproduce with this one applied? It'll keep the
system running (unless there are other corruptions going on), so it
should help you a bit as well. It will dump some cfq state info when the
condition triggers that can perhaps help diagnose this. So if you can
apply this patch and reproduce + send the output, I'd much appreciate
it!
diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index b6491c0..2aba928 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -947,6 +947,36 @@ keep_queue:
 	return cfqq;
 }
 
+static void cfq_dump_queue(struct cfq_queue *cfqq)
+{
+	printk("  %d: sort=%d,next=%p,q=%d/%d,a=%d/%d,d=%d/%d,f=%x\n",
+		cfqq->key, RB_EMPTY_ROOT(&cfqq->sort_list), cfqq->next_rq,
+		cfqq->queued[0], cfqq->queued[1], cfqq->allocated[0],
+		cfqq->allocated[1], cfqq->on_dispatch[0],
+		cfqq->on_dispatch[1], cfqq->flags);
+}
+
+static void cfq_dump_state(struct cfq_data *cfqd)
+{
+	struct cfq_queue *cfqq;
+	int i;
+
+	printk("cfq: busy=%d,drv=%d,timer=%d\n", cfqd->busy_queues,
+		cfqd->rq_in_driver, timer_pending(&cfqd->idle_slice_timer));
+
+	printk("cfq rr_list:\n");
+	for (i = 0; i < CFQ_PRIO_LISTS; i++)
+		list_for_each_entry(cfqq, &cfqd->rr_list[i], cfq_list)
+			cfq_dump_queue(cfqq);
+
+	printk("cfq busy_list:\n");
+	list_for_each_entry(cfqq, &cfqd->busy_rr, cfq_list)
+		cfq_dump_queue(cfqq);
+
+	printk("cfq idle_list:\n");
+	list_for_each_entry(cfqq, &cfqd->idle_rr, cfq_list)
+		cfq_dump_queue(cfqq);
+
+	printk("cfq cur_rr:\n");
+	list_for_each_entry(cfqq, &cfqd->cur_rr, cfq_list)
+		cfq_dump_queue(cfqq);
Re: [OOPS] 2.6.21-rc6-git5 in cfq_dispatch_insert
On Sun, Apr 22 2007, Brad Campbell wrote: Jens Axboe wrote: Thanks for testing Brad, be sure to use the next patch I sent instead. The one from this mail shouldn't even get you booted. So double check that you are still using CFQ :-) [184901.576773] BUG: unable to handle kernel NULL pointer dereference at virtual address 005c [184901.602612] printing eip: [184901.610990] c0205399 [184901.617796] *pde = [184901.626421] Oops: [#1] [184901.635044] Modules linked in: [184901.644500] CPU:0 [184901.644501] EIP:0060:[c0205399]Not tainted VLI [184901.644503] EFLAGS: 00010082 (2.6.21-rc7 #7) [184901.681294] EIP is at cfq_dispatch_insert+0x19/0x70 [184901.696168] eax: f7f078e0 ebx: f7ca2794 ecx: 0004 edx: [184901.716743] esi: c1acaa1c edi: f7c9c6c0 ebp: esp: dbaefde0 [184901.737316] ds: 007b es: 007b fs: 00d8 gs: 0033 ss: 0068 [184901.755032] Process md5sum (pid: 4268, ti=dbaee000 task=f794a5a0 task.ti=dbaee000) [184901.777422] Stack: c1acaa1c f7c9c6c0 c0205509 e6b61bd8 c0133451 1000 [184901.803121]0008 0004 0e713800 c1acaa1c f7c9c6c0 c1acaa1c [184901.828837]c0205749 f7ca2794 f7ca2794 f79bc000 0282 c01fb829 c016ea8d [184901.854552] Call Trace: [184901.862723] [c0205509] __cfq_dispatch_requests+0x79/0x170 [184901.879971] [c0133451] do_generic_mapping_read+0x281/0x470 [184901.897478] [c0205749] cfq_dispatch_requests+0x69/0x90 [184901.913946] [c01fb829] elv_next_request+0x39/0x130 [184901.929375] [c016ea8d] bio_endio+0x5d/0x90 [184901.942725] [c0270375] scsi_request_fn+0x45/0x280 [184901.957896] [c01fde92] blk_run_queue+0x32/0x70 [184901.972286] [c026f8e0] scsi_next_command+0x30/0x50 [184901.987716] [c026f9cb] scsi_end_request+0x9b/0xc0 [184902.002886] [c026fb31] scsi_io_completion+0x81/0x330 [184902.018835] [c026daeb] scsi_delete_timer+0xb/0x20 [184902.034006] [c027ee35] ata_scsi_qc_complete+0x65/0xd0 [184902.050214] [c02751bb] sd_rw_intr+0x8b/0x220 [184902.064085] [c0280c0c] ata_altstatus+0x1c/0x20 [184902.078475] [c027b68d] ata_hsm_move+0x14d/0x3f0 [184902.093126] [c026bcc0] 
scsi_finish_command+0x40/0x60 [184902.109075] [c02702bf] scsi_softirq_done+0x6f/0xe0 [184902.124506] [c0285f61] sil_interrupt+0x81/0x90 [184902.138895] [c01ffa78] blk_done_softirq+0x58/0x70 [184902.154066] [c011721f] __do_softirq+0x6f/0x80 [184902.181806] [c0104cee] do_IRQ+0x3e/0x80 [184902.194380] [c010322f] common_interrupt+0x23/0x28 [184902.209551] === [184902.220512] Code: 0e e3 ef ff e9 47 ff ff ff 89 f6 8d bc 27 00 00 00 00 83 ec 10 89 1c 24 89 6c 24 0c 89 74 24 04 89 7c 24 08 89 c3 89 d5 8b 40 0c <8b> 72 5c 8b 78 04 89 d0 e8 1a fa ff ff 8b 45 14 89 ea 25 01 80 [184902.280564] EIP: [c0205399] cfq_dispatch_insert+0x19/0x70 SS:ESP 0068:dbaefde0 [184902.303418] Kernel panic - not syncing: Fatal exception in interrupt [184902.322746] Rebooting in 60 seconds.. Ok, it's taken me _ages_ to get the system to a point I can reproduce this, but I think it's now reproducible with a couple of hours beating. The bad news is it looks like it has not tickled any of your debugging markers! This was the 1st thing printed on a clean serial console, so nothing above that for days. I did double check and I was/am certainly running the kernel with the debug patch compiled in. Ok, can you try and reproduce with this one applied? It'll keep the system running (unless there are other corruptions going on), so it should help you a bit as well. It will dump some cfq state info when the condition triggers that can perhaps help diagnose this. So if you can apply this patch and reproduce + send the output, I'd much appreciate it!
diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index b6491c0..2aba928 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -947,6 +947,36 @@ keep_queue:
 	return cfqq;
 }
 
+static void cfq_dump_queue(struct cfq_queue *cfqq)
+{
+	printk("%d: sort=%d,next=%p,q=%d/%d,a=%d/%d,d=%d/%d,f=%x\n",
+		cfqq->key, RB_EMPTY_ROOT(&cfqq->sort_list), cfqq->next_rq,
+		cfqq->queued[0], cfqq->queued[1], cfqq->allocated[0],
+		cfqq->allocated[1], cfqq->on_dispatch[0],
+		cfqq->on_dispatch[1], cfqq->flags);
+}
+
+static void cfq_dump_state(struct cfq_data *cfqd)
+{
+	struct cfq_queue *cfqq;
+	int i;
+
+	printk("cfq: busy=%d,drv=%d,timer=%d\n", cfqd->busy_queues,
+		cfqd->rq_in_driver, timer_pending(&cfqd->idle_slice_timer));
+
+	printk("cfq rr_list:\n");
+	for (i = 0; i < CFQ_PRIO_LISTS; i++)
+		list_for_each_entry(cfqq, &cfqd->rr_list[i], cfq_list)
+			cfq_dump_queue(cfqq);
+
+	printk("cfq busy_list:\n");
+	list_for_each_entry(cfqq, &cfqd->busy_rr, cfq_list)
+		cfq_dump_queue(cfqq);
+
+	printk("cfq idle_list:\n");
+	list_for_each_entry(cfqq, &cfqd->idle_rr, cfq_list)
+		cfq_dump_queue(cfqq);
Re: [OOPS] 2.6.21-rc6-git5 in cfq_dispatch_insert
Jens Axboe wrote: Thanks for testing Brad, be sure to use the next patch I sent instead. The one from this mail shouldn't even get you booted. So double check that you are still using CFQ :-) [184901.576773] BUG: unable to handle kernel NULL pointer dereference at virtual address 005c [184901.602612] printing eip: [184901.610990] c0205399 [184901.617796] *pde = [184901.626421] Oops: [#1] [184901.635044] Modules linked in: [184901.644500] CPU:0 [184901.644501] EIP:0060:[c0205399]Not tainted VLI [184901.644503] EFLAGS: 00010082 (2.6.21-rc7 #7) [184901.681294] EIP is at cfq_dispatch_insert+0x19/0x70 [184901.696168] eax: f7f078e0 ebx: f7ca2794 ecx: 0004 edx: [184901.716743] esi: c1acaa1c edi: f7c9c6c0 ebp: esp: dbaefde0 [184901.737316] ds: 007b es: 007b fs: 00d8 gs: 0033 ss: 0068 [184901.755032] Process md5sum (pid: 4268, ti=dbaee000 task=f794a5a0 task.ti=dbaee000) [184901.777422] Stack: c1acaa1c f7c9c6c0 c0205509 e6b61bd8 c0133451 1000 [184901.803121]0008 0004 0e713800 c1acaa1c f7c9c6c0 c1acaa1c [184901.828837]c0205749 f7ca2794 f7ca2794 f79bc000 0282 c01fb829 c016ea8d [184901.854552] Call Trace: [184901.862723] [c0205509] __cfq_dispatch_requests+0x79/0x170 [184901.879971] [c0133451] do_generic_mapping_read+0x281/0x470 [184901.897478] [c0205749] cfq_dispatch_requests+0x69/0x90 [184901.913946] [c01fb829] elv_next_request+0x39/0x130 [184901.929375] [c016ea8d] bio_endio+0x5d/0x90 [184901.942725] [c0270375] scsi_request_fn+0x45/0x280 [184901.957896] [c01fde92] blk_run_queue+0x32/0x70 [184901.972286] [c026f8e0] scsi_next_command+0x30/0x50 [184901.987716] [c026f9cb] scsi_end_request+0x9b/0xc0 [184902.002886] [c026fb31] scsi_io_completion+0x81/0x330 [184902.018835] [c026daeb] scsi_delete_timer+0xb/0x20 [184902.034006] [c027ee35] ata_scsi_qc_complete+0x65/0xd0 [184902.050214] [c02751bb] sd_rw_intr+0x8b/0x220 [184902.064085] [c0280c0c] ata_altstatus+0x1c/0x20 [184902.078475] [c027b68d] ata_hsm_move+0x14d/0x3f0 [184902.093126] [c026bcc0] scsi_finish_command+0x40/0x60 [184902.109075] [c02702bf] scsi_softirq_done+0x6f/0xe0 [184902.124506] [c0285f61] sil_interrupt+0x81/0x90 [184902.138895] [c01ffa78] blk_done_softirq+0x58/0x70
[184902.154066] [c011721f] __do_softirq+0x6f/0x80 [184902.181806] [c0104cee] do_IRQ+0x3e/0x80 [184902.194380] [c010322f] common_interrupt+0x23/0x28 [184902.209551] === [184902.220512] Code: 0e e3 ef ff e9 47 ff ff ff 89 f6 8d bc 27 00 00 00 00 83 ec 10 89 1c 24 89 6c 24 0c 89 74 24 04 89 7c 24 08 89 c3 89 d5 8b 40 0c <8b> 72 5c 8b 78 04 89 d0 e8 1a fa ff ff 8b 45 14 89 ea 25 01 80 [184902.280564] EIP: [c0205399] cfq_dispatch_insert+0x19/0x70 SS:ESP 0068:dbaefde0 [184902.303418] Kernel panic - not syncing: Fatal exception in interrupt [184902.322746] Rebooting in 60 seconds.. Ok, it's taken me _ages_ to get the system to a point I can reproduce this, but I think it's now reproducible with a couple of hours beating. The bad news is it looks like it has not tickled any of your debugging markers! This was the 1st thing printed on a clean serial console, so nothing above that for days. I did double check and I was/am certainly running the kernel with the debug patch compiled in. Brad -- "Human beings, who are almost unique in having the ability to learn from the experience of others, are also remarkable for their apparent disinclination to do so." -- Douglas Adams - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [OOPS] 2.6.21-rc6-git5 in cfq_dispatch_insert
On Wed, Apr 18 2007, Brad Campbell wrote: > Jens Axboe wrote: > > >I had something similar for generic_unplug_request() as well, but didn't > >see/hear any reports of it being tried out. Here's a complete debugging > >patch for this and other potential dangers. > > > > I had a clean 2.6.21-rc7 that I forgot to change the default sched on take > down my main server last night. Another oops of the same variety. > Unfortunately only a digital photo to go by though. > > I've removed Neil's patch, applied yours and fired it up on my 4 drive test > box (the one that was causing the trouble before). I've finally scrounged > another machine for a serial logger and I'll beat it until it breaks. > Perhaps if I can get this to reliably trigger again we can track it down. Thanks for testing Brad, be sure to use the next patch I sent instead. The one from this mail shouldn't even get you booted. So double check that you are still using CFQ :-) -- Jens Axboe - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [OOPS] 2.6.21-rc6-git5 in cfq_dispatch_insert
On Wed, Apr 18 2007, Jens Axboe wrote: > I had something similar for generic_unplug_request() as well, but didn't > see/hear any reports of it being tried out. Here's a complete debugging > patch for this and other potential dangers. Which had a bug (do the check _after_ deleting from the rbtree, otherwise it'll go rightfully BUG() the first time).

diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index 6853fb7..9aa7bf7 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -644,12 +644,15 @@ static void cfq_remove_request(struct request *rq)
 {
 	struct cfq_queue *cfqq = RQ_CFQQ(rq);
 
+	BUG_ON(!irqs_disabled());
 	if (cfqq->next_rq == rq)
 		cfqq->next_rq = cfq_find_next_rq(cfqq->cfqd, cfqq, rq);
 	list_del_init(&rq->queuelist);
 	cfq_del_rq_rb(rq);
+	BUG_ON(!cfqq->next_rq && !RB_EMPTY_ROOT(&cfqq->sort_list));
+
 	if (rq_is_meta(rq)) {
 		WARN_ON(!cfqq->meta_pending);
 		cfqq->meta_pending--;
@@ -1231,6 +1234,8 @@ static void cfq_exit_single_io_context(struct cfq_io_context *cic)
 {
 	struct cfq_data *cfqd = cic->key;
 
+	BUG_ON(irqs_disabled());
+
 	if (cfqd) {
 		request_queue_t *q = cfqd->queue;
@@ -1699,6 +1704,7 @@ cfq_rq_enqueued(struct cfq_data *cfqd, struct cfq_queue *cfqq,
 	/*
 	 * check if this request is a better next-serve candidate)) {
 	 */
+	BUG_ON(!irqs_disabled());
 	cfqq->next_rq = cfq_choose_req(cfqd, cfqq->next_rq, rq);
 	BUG_ON(!cfqq->next_rq);
@@ -2014,6 +2020,7 @@ static void cfq_exit_queue(elevator_t *e)
 	cfq_shutdown_timer_wq(cfqd);
 
+	BUG_ON(irqs_disabled());
 	spin_lock_irq(q->queue_lock);
 	if (cfqd->active_queue)
diff --git a/block/ll_rw_blk.c b/block/ll_rw_blk.c
index 3de0695..c16863e 100644
--- a/block/ll_rw_blk.c
+++ b/block/ll_rw_blk.c
@@ -1602,6 +1602,8 @@ EXPORT_SYMBOL(__generic_unplug_device);
 **/
 void generic_unplug_device(request_queue_t *q)
 {
+	BUG_ON(irqs_disabled());
+
 	spin_lock_irq(q->queue_lock);
 	__generic_unplug_device(q);
 	spin_unlock_irq(q->queue_lock);

-- Jens Axboe
Re: [OOPS] 2.6.21-rc6-git5 in cfq_dispatch_insert
Jens Axboe wrote: I had something similar for generic_unplug_request() as well, but didn't see/hear any reports of it being tried out. Here's a complete debugging patch for this and other potential dangers. I had a clean 2.6.21-rc7 that I forgot to change the default sched on take down my main server last night. Another oops of the same variety. Unfortunately only a digital photo to go by though. I've removed Neil's patch, applied yours and fired it up on my 4 drive test box (the one that was causing the trouble before). I've finally scrounged another machine for a serial logger and I'll beat it until it breaks. Perhaps if I can get this to reliably trigger again we can track it down. Brad -- "Human beings, who are almost unique in having the ability to learn from the experience of others, are also remarkable for their apparent disinclination to do so." -- Douglas Adams - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [OOPS] 2.6.21-rc6-git5 in cfq_dispatch_insert
On Tue, Apr 17 2007, Neil Brown wrote: > On Monday April 16, [EMAIL PROTECTED] wrote: > > > > cfq_dispatch_insert() was called with rq == 0. This one is getting really > > annoying... and md is involved again (RAID0 this time.) > > Yeah... weird. > RAID0 is so light-weight and so different from RAID1 or RAID5 that I > feel fairly safe concluding that the problem isn't in or near md. > But that doesn't help you. Well the fact is that we have 3-4 distinct reports of this oops, and all of them are using md. No reports have been filed where md isn't managing the disks. While this doesn't conclusively put the finger of blame on md, it is still likely. It could bug a CFQ bug too of course, or (more likely), some bad interaction between md and CFQ. > This really feels like a locking problem. Very much. > The problem occurs when ->next_rq is NULL, but ->sort_list.rb_node is > not NULL. That happens plenty of times in the code (particularly as > the first request is inserted) but always under ->queue_lock so it > should never be visible to cfq_dispatch_insert.. > > Except that drivers/scsi/ide-scsi.c:idescsi_eh_reset calls > elv_next_request which could ultimately call __cfq_dispatch_requests > without taking ->queue_lock (that I can see). But you probably aren't > using ide-scsi (does anyone?). > > Given that interrupts are always disabled when queue_lock is taken, it > might be useful to add >WARN_ON(!irqs_disabled()); > every time ->next_rq is set. > Something like the following. > > It might show something useful if we are lucky. I had something similar for generic_unplug_request() as well, but didn't see/hear any reports of it being tried out. Here's a complete debugging patch for this and other potential dangers. 
diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index b6491c0..8f749aa 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -587,8 +587,11 @@ static void cfq_remove_request(struct request *rq)
 {
 	struct cfq_queue *cfqq = RQ_CFQQ(rq);
 
-	if (cfqq->next_rq == rq)
+	BUG_ON(!irqs_disabled());
+	if (cfqq->next_rq == rq) {
 		cfqq->next_rq = cfq_find_next_rq(cfqq->cfqd, cfqq, rq);
+		BUG_ON(!cfqq->next_rq && !RB_EMPTY_ROOT(&cfqq->sort_list));
+	}
 	list_del_init(&rq->queuelist);
 	cfq_del_rq_rb(rq);
@@ -1637,6 +1640,7 @@ cfq_rq_enqueued(struct cfq_data *cfqd, struct cfq_queue *cfqq,
 	/*
 	 * check if this request is a better next-serve candidate)) {
 	 */
+	BUG_ON(!irqs_disabled());
 	cfqq->next_rq = cfq_choose_req(cfqd, cfqq->next_rq, rq);
 	BUG_ON(!cfqq->next_rq);
diff --git a/block/ll_rw_blk.c b/block/ll_rw_blk.c
index 3de0695..c16863e 100644
--- a/block/ll_rw_blk.c
+++ b/block/ll_rw_blk.c
@@ -1602,6 +1602,8 @@ EXPORT_SYMBOL(__generic_unplug_device);
 **/
 void generic_unplug_device(request_queue_t *q)
 {
+	BUG_ON(irqs_disabled());
+
 	spin_lock_irq(q->queue_lock);
 	__generic_unplug_device(q);
 	spin_unlock_irq(q->queue_lock);

-- Jens Axboe
Re: [OOPS] 2.6.21-rc6-git5 in cfq_dispatch_insert
Hi, On Tuesday 17 April 2007, Neil Brown wrote: > On Monday April 16, [EMAIL PROTECTED] wrote: > > > > cfq_dispatch_insert() was called with rq == 0. This one is getting really > > annoying... and md is involved again (RAID0 this time.) > > Yeah... weird. > RAID0 is so light-weight and so different from RAID1 or RAID5 that I > feel fairly safe concluding that the problem isn't in or near md. > But that doesn't help you. > > This really feels like a locking problem. > > The problem occurs when ->next_rq is NULL, but ->sort_list.rb_node is > not NULL. That happens plenty of times in the code (particularly as > the first request is inserted) but always under ->queue_lock so it > should never be visible to cfq_dispatch_insert.. > > Except that drivers/scsi/ide-scsi.c:idescsi_eh_reset calls > elv_next_request which could ultimately call __cfq_dispatch_requests > without taking ->queue_lock (that I can see). But you probably aren't > using ide-scsi (does anyone?). ide-scsi is holding ide_lock while calling elv_next_request() (for ide ide_lock == ->queue_lock) Also from the original report: On Sunday 15 April 2007, Brad Campbell wrote: > > The box is booted with PXE and runs an nfsroot. It's Debian 3.1. It has 2 SIL > 3112 controllers in it > with 4 WD 200GB ATA drives all on PATA->SATA bridges. and you can even see libata functions in the OOPS... Bart - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [OOPS] 2.6.21-rc6-git5 in cfq_dispatch_insert
Neil Brown wrote: On Monday April 16, [EMAIL PROTECTED] wrote: cfq_dispatch_insert() was called with rq == 0. This one is getting really annoying... and md is involved again (RAID0 this time.) Yeah... weird. RAID0 is so light-weight and so different from RAID1 or RAID5 that I feel fairly safe concluding that the problem isn't in or near md. Just a quick thought. Could this issue in any way cause silent data corruption on write to the md? I've been chasing a bug for months now on this box which resulted in a small bit of seemingly random corruption in large copies to /dev/md0. Problem is, I changed the PSU in the box (which appeared to help a little) but then about the same time I discovered the oops was in cfq so changed the scheduler also. I've not been able to oops or reproduce the data corruption since. I've now got to go back to cfq and see if I can get it to panic reproducibly again, but I'm wondering now if there may have been another issue there. Brad -- "Human beings, who are almost unique in having the ability to learn from the experience of others, are also remarkable for their apparent disinclination to do so." -- Douglas Adams - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [OOPS] 2.6.21-rc6-git5 in cfq_dispatch_insert
Neil Brown wrote: On Monday April 16, [EMAIL PROTECTED] wrote: cfq_dispatch_insert() was called with rq == 0. This one is getting really annoying... and md is involved again (RAID0 this time.) Yeah... weird. RAID0 is so light-weight and so different from RAID1 or RAID5 that I feel fairly safe concluding that the problem isn't in or near md. But that doesn't help you. It's raid-5 here and it was degraded. I've just added the extra disk now and I've been running with the deadline scheduler to work around the issue. I'll add Neil's patch and try some tests with cfq tonight if I get the chance. Brad -- "Human beings, who are almost unique in having the ability to learn from the experience of others, are also remarkable for their apparent disinclination to do so." -- Douglas Adams - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [OOPS] 2.6.21-rc6-git5 in cfq_dispatch_insert
On Monday April 16, [EMAIL PROTECTED] wrote:
>
> cfq_dispatch_insert() was called with rq == 0. This one is getting really
> annoying... and md is involved again (RAID0 this time.)

Yeah... weird. RAID0 is so light-weight and so different from RAID1 or RAID5 that I feel fairly safe concluding that the problem isn't in or near md. But that doesn't help you.

This really feels like a locking problem. The problem occurs when ->next_rq is NULL, but ->sort_list.rb_node is not NULL. That happens plenty of times in the code (particularly as the first request is inserted) but always under ->queue_lock, so it should never be visible to cfq_dispatch_insert...

Except that drivers/scsi/ide-scsi.c:idescsi_eh_reset calls elv_next_request, which could ultimately call __cfq_dispatch_requests without taking ->queue_lock (that I can see). But you probably aren't using ide-scsi (does anyone?).

Given that interrupts are always disabled when queue_lock is taken, it might be useful to add WARN_ON(!irqs_disabled()); every time ->next_rq is set. Something like the following. It might show something useful if we are lucky.

NeilBrown

diff .prev/block/cfq-iosched.c ./block/cfq-iosched.c
--- .prev/block/cfq-iosched.c	2007-04-17 15:01:36.0 +1000
+++ ./block/cfq-iosched.c	2007-04-17 15:02:25.0 +1000
@@ -628,6 +628,7 @@ static void cfq_remove_request(struct re
 {
 	struct cfq_queue *cfqq = RQ_CFQQ(rq);
 
+	BUG_ON(!irqs_disabled());
 	if (cfqq->next_rq == rq)
 		cfqq->next_rq = cfq_find_next_rq(cfqq->cfqd, cfqq, rq);
 
@@ -1810,6 +1811,7 @@ cfq_rq_enqueued(struct cfq_data *cfqd, s
 	/*
 	 * check if this request is a better next-serve candidate
 	 */
+	BUG_ON(!irqs_disabled());
 	cfqq->next_rq = cfq_choose_req(cfqd, cfqq->next_rq, rq);
 	BUG_ON(!cfqq->next_rq);
Re: [OOPS] 2.6.21-rc6-git5 in cfq_dispatch_insert
Brad Campbell wrote:
> Brad Campbell wrote:
>> G'day all,
>>
>> All I have is a digital photo of this oops. (It's 3.5mb). I have
>> serial console configured, but Murphy is watching me carefully and I
>> just can't seem to reproduce it while logging the console output.
>
> And as usual, after trying to capture one for 4 days, I get one 10 mins
> after I've sent the E-mail :)
>
> I think I've just found a way to make this easier to reproduce, as
> /dev/sdd was not even mounted this time. I just cold booted and started
> an md5sum -c run on a directory of about 180GB.
>
> [ 2566.192665] BUG: unable to handle kernel NULL pointer dereference at virtual address 005c
> [ 2566.218242] printing eip:
> [ 2566.226362] c0203169
> [ 2566.232906] *pde =
> [ 2566.241274] Oops: [#1]
> [ 2566.249637] Modules linked in:
> [ 2566.258832] CPU:0
> [ 2566.258833] EIP:0060:[<c0203169>]Not tainted VLI
> [ 2566.258834] EFLAGS: 00010082 (2.6.21-rc6-git5 #1)
> [ 2566.296146] EIP is at cfq_dispatch_insert+0x19/0x70
> [ 2566.310761] eax: f7a0eae0 ebx: f7a0cb28 ecx: e2f869e8 edx:
> [ 2566.331076] esi: f79fea7c edi: f7d04ac0 ebp: esp: f6945de0
> [ 2566.351388] ds: 007b es: 007b fs: 00d8 gs: 0033 ss: 0068
> [ 2566.368843] Process md5sum (pid: 2875, ti=f6944000 task=f68f4ad0 task.ti=f6944000)
> [ 2566.390975] Stack: f79fea7c f7d04ac0 c02032d9 f6ae5ef0 c0133411 1000
> [ 2566.416414] 0008 0004 0b582fd4 f79fea7c f7d04ac0 f79fea7c
> [ 2566.441870] c0203519 f7a0cb28 f7a0cb28 f79e 0282 c01fb7a9 c016ea4d
> [ 2566.467326] Call Trace:
> [ 2566.475236] [<c02032d9>] __cfq_dispatch_requests+0x79/0x170
> [ 2566.492224] [<c0133411>] do_generic_mapping_read+0x281/0x470
> [ 2566.509473] [<c0203519>] cfq_dispatch_requests+0x69/0x90
> [ 2566.525681] [<c01fb7a9>] elv_next_request+0x39/0x130
> [ 2566.540850] [<c016ea4d>] bio_endio+0x5d/0x90
> [ 2566.553942] [<c0268635>] scsi_request_fn+0x45/0x280
> [ 2566.568851] [<c01fdde2>] blk_run_queue+0x32/0x70
> [ 2566.582982] [<c0267ba0>] scsi_next_command+0x30/0x50
> [ 2566.598154] [<c0267c8b>] scsi_end_request+0x9b/0xc0
> [ 2566.613063] [<c0267df1>] scsi_io_completion+0x81/0x330
> [ 2566.628751] [<c0265dab>] scsi_delete_timer+0xb/0x20
> [ 2566.643661] [<c02770f5>] ata_scsi_qc_complete+0x65/0xd0
> [ 2566.659613] [<c026d47b>] sd_rw_intr+0x8b/0x220
> [ 2566.673222] [<c0278ecc>] ata_altstatus+0x1c/0x20
> [ 2566.687352] [<c027394d>] ata_hsm_move+0x14d/0x3f0
> [ 2566.701744] [<c0263f80>] scsi_finish_command+0x40/0x60
> [ 2566.717434] [<c026857f>] scsi_softirq_done+0x6f/0xe0
> [ 2566.732604] [<c027d521>] sil_interrupt+0x81/0x90
> [ 2566.746733] [<c01ff9b8>] blk_done_softirq+0x58/0x70
> [ 2566.761644] [<c011721f>] __do_softirq+0x6f/0x80
> [ 2566.775516] [<c0117257>] do_softirq+0x27/0x30
> [ 2566.788866] [<c0104cee>] do_IRQ+0x3e/0x80
> [ 2566.801177] [<c010322f>] common_interrupt+0x23/0x28
> [ 2566.816090] ===
> [ 2566.826793] Code: 3e 05 f0 ff e9 47 ff ff ff 89 f6 8d bc 27 00 00 00 00 83 ec 10 89 1c 24 89 6c 24 0c 89 74 24 04 89 7c 24 08 89 c3 89 d5 8b 40 0c <8b> 72 5c 8b 78 04 89 d0 e8 4a fa ff ff 8b 45 14 89 ea 25 01 80
> [ 2566.886586] EIP: [<c0203169>] cfq_dispatch_insert+0x19/0x70 SS:ESP 0068:f6945de0
> [ 2566.909179] Kernel panic - not syncing: Fatal exception in interrupt

cfq_dispatch_insert() was called with rq == 0. This one is getting really annoying... and md is involved again (RAID0 this time.)
Re: [OOPS] 2.6.21-rc6-git5 in cfq_dispatch_insert
Adrian Bunk wrote:
> [ Cc's added, additional information is in http://lkml.org/lkml/2007/4/15/32 ]
>
> On Sun, Apr 15, 2007 at 02:49:29PM +0400, Brad Campbell wrote:
>> Brad Campbell wrote:
>>> G'day all,
>>>
>>> All I have is a digital photo of this oops. (It's 3.5mb). I have
>>> serial console configured, but Murphy is watching me carefully and I
>>> just can't seem to reproduce it while logging the console output.
>>
>> And as usual, after trying to capture one for 4 days, I get one 10 mins
>> after I've sent the E-mail :)
>>
>> I think I've just found a way to make this easier to reproduce, as
>> /dev/sdd was not even mounted this time. I just cold booted and started
>> an md5sum -c run on a directory of about 180GB.
>
> Is this also present with kernel 2.6.20 or is it a regression?

Yes, I reproduced it with 2.6.20 but appear to be unable to do so with 2.6.19.
Re: [OOPS] 2.6.21-rc6-git5 in cfq_dispatch_insert
[ Cc's added, additional information is in http://lkml.org/lkml/2007/4/15/32 ]

On Sun, Apr 15, 2007 at 02:49:29PM +0400, Brad Campbell wrote:
> Brad Campbell wrote:
>> G'day all,
>>
>> All I have is a digital photo of this oops. (It's 3.5mb). I have
>> serial console configured, but Murphy is watching me carefully and I
>> just can't seem to reproduce it while logging the console output.
>
> And as usual, after trying to capture one for 4 days, I get one 10 mins
> after I've sent the E-mail :)
>
> I think I've just found a way to make this easier to reproduce, as
> /dev/sdd was not even mounted this time. I just cold booted and started
> an md5sum -c run on a directory of about 180GB.

Is this also present with kernel 2.6.20 or is it a regression?

> [ 2566.192665] BUG: unable to handle kernel NULL pointer dereference at virtual address 005c
> [ 2566.218242] printing eip:
> [ 2566.226362] c0203169
> [ 2566.232906] *pde =
> [ 2566.241274] Oops: [#1]
> [ 2566.249637] Modules linked in:
> [ 2566.258832] CPU:0
> [ 2566.258833] EIP:0060:[<c0203169>]Not tainted VLI
> [ 2566.258834] EFLAGS: 00010082 (2.6.21-rc6-git5 #1)
> [ 2566.296146] EIP is at cfq_dispatch_insert+0x19/0x70
> [ 2566.310761] eax: f7a0eae0 ebx: f7a0cb28 ecx: e2f869e8 edx:
> [ 2566.331076] esi: f79fea7c edi: f7d04ac0 ebp: esp: f6945de0
> [ 2566.351388] ds: 007b es: 007b fs: 00d8 gs: 0033 ss: 0068
> [ 2566.368843] Process md5sum (pid: 2875, ti=f6944000 task=f68f4ad0 task.ti=f6944000)
> [ 2566.390975] Stack: f79fea7c f7d04ac0 c02032d9 f6ae5ef0 c0133411 1000
> [ 2566.416414] 0008 0004 0b582fd4 f79fea7c f7d04ac0 f79fea7c
> [ 2566.441870] c0203519 f7a0cb28 f7a0cb28 f79e 0282 c01fb7a9 c016ea4d
> [ 2566.467326] Call Trace:
> [ 2566.475236] [<c02032d9>] __cfq_dispatch_requests+0x79/0x170
> [ 2566.492224] [<c0133411>] do_generic_mapping_read+0x281/0x470
> [ 2566.509473] [<c0203519>] cfq_dispatch_requests+0x69/0x90
> [ 2566.525681] [<c01fb7a9>] elv_next_request+0x39/0x130
> [ 2566.540850] [<c016ea4d>] bio_endio+0x5d/0x90
> [ 2566.553942] [<c0268635>] scsi_request_fn+0x45/0x280
> [ 2566.568851] [<c01fdde2>] blk_run_queue+0x32/0x70
> [ 2566.582982] [<c0267ba0>] scsi_next_command+0x30/0x50
> [ 2566.598154] [<c0267c8b>] scsi_end_request+0x9b/0xc0
> [ 2566.613063] [<c0267df1>] scsi_io_completion+0x81/0x330
> [ 2566.628751] [<c0265dab>] scsi_delete_timer+0xb/0x20
> [ 2566.643661] [<c02770f5>] ata_scsi_qc_complete+0x65/0xd0
> [ 2566.659613] [<c026d47b>] sd_rw_intr+0x8b/0x220
> [ 2566.673222] [<c0278ecc>] ata_altstatus+0x1c/0x20
> [ 2566.687352] [<c027394d>] ata_hsm_move+0x14d/0x3f0
> [ 2566.701744] [<c0263f80>] scsi_finish_command+0x40/0x60
> [ 2566.717434] [<c026857f>] scsi_softirq_done+0x6f/0xe0
> [ 2566.732604] [<c027d521>] sil_interrupt+0x81/0x90
> [ 2566.746733] [<c01ff9b8>] blk_done_softirq+0x58/0x70
> [ 2566.761644] [<c011721f>] __do_softirq+0x6f/0x80
> [ 2566.775516] [<c0117257>] do_softirq+0x27/0x30
> [ 2566.788866] [<c0104cee>] do_IRQ+0x3e/0x80
> [ 2566.801177] [<c010322f>] common_interrupt+0x23/0x28
> [ 2566.816090] ===
> [ 2566.826793] Code: 3e 05 f0 ff e9 47 ff ff ff 89 f6 8d bc 27 00 00 00 00 83 ec 10 89 1c 24 89 6c 24 0c 89 74 24 04 89 7c 24 08 89 c3 89 d5 8b 40 0c <8b> 72 5c 8b 78 04 89 d0 e8 4a fa ff ff 8b 45 14 89 ea 25 01 80
> [ 2566.886586] EIP: [<c0203169>] cfq_dispatch_insert+0x19/0x70 SS:ESP 0068:f6945de0
> [ 2566.909179] Kernel panic - not syncing: Fatal exception in interrupt

cu
Adrian

--
"Is there not promise of rain?" Ling Tan asked suddenly out of the darkness. There had been need of rain for many days. "Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed
Re: [OOPS] 2.6.21-rc6-git5 in cfq_dispatch_insert
Brad Campbell wrote:
> G'day all,
>
> All I have is a digital photo of this oops. (It's 3.5mb). I have serial
> console configured, but Murphy is watching me carefully and I just
> can't seem to reproduce it while logging the console output.

And as usual, after trying to capture one for 4 days, I get one 10 mins after I've sent the E-mail :)

I think I've just found a way to make this easier to reproduce, as /dev/sdd was not even mounted this time. I just cold booted and started an md5sum -c run on a directory of about 180GB.

[ 2566.192665] BUG: unable to handle kernel NULL pointer dereference at virtual address 005c
[ 2566.218242] printing eip:
[ 2566.226362] c0203169
[ 2566.232906] *pde =
[ 2566.241274] Oops: [#1]
[ 2566.249637] Modules linked in:
[ 2566.258832] CPU:0
[ 2566.258833] EIP:0060:[<c0203169>]Not tainted VLI
[ 2566.258834] EFLAGS: 00010082 (2.6.21-rc6-git5 #1)
[ 2566.296146] EIP is at cfq_dispatch_insert+0x19/0x70
[ 2566.310761] eax: f7a0eae0 ebx: f7a0cb28 ecx: e2f869e8 edx:
[ 2566.331076] esi: f79fea7c edi: f7d04ac0 ebp: esp: f6945de0
[ 2566.351388] ds: 007b es: 007b fs: 00d8 gs: 0033 ss: 0068
[ 2566.368843] Process md5sum (pid: 2875, ti=f6944000 task=f68f4ad0 task.ti=f6944000)
[ 2566.390975] Stack: f79fea7c f7d04ac0 c02032d9 f6ae5ef0 c0133411 1000
[ 2566.416414] 0008 0004 0b582fd4 f79fea7c f7d04ac0 f79fea7c
[ 2566.441870] c0203519 f7a0cb28 f7a0cb28 f79e 0282 c01fb7a9 c016ea4d
[ 2566.467326] Call Trace:
[ 2566.475236] [<c02032d9>] __cfq_dispatch_requests+0x79/0x170
[ 2566.492224] [<c0133411>] do_generic_mapping_read+0x281/0x470
[ 2566.509473] [<c0203519>] cfq_dispatch_requests+0x69/0x90
[ 2566.525681] [<c01fb7a9>] elv_next_request+0x39/0x130
[ 2566.540850] [<c016ea4d>] bio_endio+0x5d/0x90
[ 2566.553942] [<c0268635>] scsi_request_fn+0x45/0x280
[ 2566.568851] [<c01fdde2>] blk_run_queue+0x32/0x70
[ 2566.582982] [<c0267ba0>] scsi_next_command+0x30/0x50
[ 2566.598154] [<c0267c8b>] scsi_end_request+0x9b/0xc0
[ 2566.613063] [<c0267df1>] scsi_io_completion+0x81/0x330
[ 2566.628751] [<c0265dab>] scsi_delete_timer+0xb/0x20
[ 2566.643661] [<c02770f5>] ata_scsi_qc_complete+0x65/0xd0
[ 2566.659613] [<c026d47b>] sd_rw_intr+0x8b/0x220
[ 2566.673222] [<c0278ecc>] ata_altstatus+0x1c/0x20
[ 2566.687352] [<c027394d>] ata_hsm_move+0x14d/0x3f0
[ 2566.701744] [<c0263f80>] scsi_finish_command+0x40/0x60
[ 2566.717434] [<c026857f>] scsi_softirq_done+0x6f/0xe0
[ 2566.732604] [<c027d521>] sil_interrupt+0x81/0x90
[ 2566.746733] [<c01ff9b8>] blk_done_softirq+0x58/0x70
[ 2566.761644] [<c011721f>] __do_softirq+0x6f/0x80
[ 2566.775516] [<c0117257>] do_softirq+0x27/0x30
[ 2566.788866] [<c0104cee>] do_IRQ+0x3e/0x80
[ 2566.801177] [<c010322f>] common_interrupt+0x23/0x28
[ 2566.816090] ===
[ 2566.826793] Code: 3e 05 f0 ff e9 47 ff ff ff 89 f6 8d bc 27 00 00 00 00 83 ec 10 89 1c 24 89 6c 24 0c 89 74 24 04 89 7c 24 08 89 c3 89 d5 8b 40 0c <8b> 72 5c 8b 78 04 89 d0 e8 4a fa ff ff 8b 45 14 89 ea 25 01 80
[ 2566.886586] EIP: [<c0203169>] cfq_dispatch_insert+0x19/0x70 SS:ESP 0068:f6945de0
[ 2566.909179] Kernel panic - not syncing: Fatal exception in interrupt