Re: 2.6.23-rc6-mm1 -- mkfs stuck in 'D'
On Mon, Sep 24, 2007 at 09:35:23AM +0200, Peter Zijlstra wrote:
> On Mon, 24 Sep 2007 11:01:10 +0800 Fengguang Wu <[EMAIL PROTECTED]>
> wrote:
>
> > > That is an interesting idea how about this:
> >
> > It looks like a workaround, but it does solve the most important problem.
> > And it is a good logic by itself. So I'd vote for it.
> >
> > The fundamental problem is that the per-bdi-writeback-completion based
> > estimation is not accurate under light loads. The problem remains for
> > a light-load sda when there is a heavy-load sdb.
>
> Well, sure, in that case sda would get to write out a lot of small
> things. But in that case it would be fair wrt the other writers.

Hmm, I can't agree that it is fair - but it is pretty acceptable ;-)
Your patch already brings great improvements in the multi-bdi case.

> > One more workaround could be to grant bdi(s) a minimal bdi_thresh.
>
> Ah, no, that is no good. For if there were a lot of BDIs this might
> happen: nr_bdis * min_thresh > dirty_limit.

Sure, in the extreme case. However, the limit could still be enforced
if we really wanted it (which I'm really not sure about ;-)):

        if (nr_reclaimable + nr_writeback < dirty_thresh &&
            bdi_nr_reclaimable + bdi_nr_writeback <= bdi_min_thresh)
                break;

> > Or better to adjust the estimation logic?
>
> Not sure what we can do here. The current thing is simple, fast and fair.

Agreed.

> > > +	/*
> > > +	 * break out early when:
> > > +	 * - we're below the bdi limit
> > > +	 * - we're below half the total limit
> > > +	 *
> > > +	 * we let the numbers exceed the strict bdi limit if the total
> > > +	 * numbers are too low, this avoids (excessive) small writeouts.
> > > +	 */
> > > +	if (bdi_nr_reclaimable + bdi_nr_writeback <= bdi_thresh ||
> > > +	    nr_reclaimable + nr_writeback < dirty_thresh / 2)
> > > 		break;
> >
> > This may be slightly better:
> >
> > 	if (bdi_nr_reclaimable + bdi_nr_writeback <= bdi_thresh)
> > 		break;
> > 	/*
> > 	 * Throttle it only when the background writeback cannot
> > 	 * catchup.
> > 	 */
> > 	if (nr_reclaimable + nr_writeback <
> > 			(background_thresh + dirty_thresh) / 2)
> > 		break;
>
> Ah, indeed. Good idea.

Thank you :-)
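For reference, the two exit tests compared above can be modelled in a few lines of standalone userspace C. Everything below is illustrative: bdi_dirty stands for bdi_nr_reclaimable + bdi_nr_writeback, total_dirty for nr_reclaimable + nr_writeback, and the numbers are invented, not measured.

#include <stdio.h>
#include <stdbool.h>

/* Peter's test: tolerate a bdi overshoot while total dirty pages are
 * below half of the hard dirty limit. */
static bool break_early_v1(long bdi_dirty, long bdi_thresh,
                           long total_dirty, long dirty_thresh)
{
        return bdi_dirty <= bdi_thresh ||
               total_dirty < dirty_thresh / 2;
}

/* The suggested variant: throttle only once background writeback can
 * no longer keep up, i.e. past the midpoint between the background and
 * the hard threshold. */
static bool break_early_v2(long bdi_dirty, long bdi_thresh,
                           long total_dirty,
                           long background_thresh, long dirty_thresh)
{
        if (bdi_dirty <= bdi_thresh)
                return true;
        return total_dirty < (background_thresh + dirty_thresh) / 2;
}

int main(void)
{
        long background_thresh = 49000, dirty_thresh = 196000;
        long bdi_thresh = 400, bdi_dirty = 800;  /* small, lightly used bdi */
        long total_dirty = 110000;               /* moderate global load */

        printf("dirty_thresh/2 test breaks early: %d\n",
               break_early_v1(bdi_dirty, bdi_thresh,
                              total_dirty, dirty_thresh));
        printf("(background+dirty)/2 test breaks early: %d\n",
               break_early_v2(bdi_dirty, bdi_thresh, total_dirty,
                              background_thresh, dirty_thresh));
        return 0;
}

With these sample numbers the dirty_thresh / 2 test already throttles the writer, while the (background_thresh + dirty_thresh) / 2 variant still lets it break out - that window is exactly what the last two quoted hunks differ on.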
Re: 2.6.23-rc6-mm1 -- mkfs stuck in 'D'
On Mon, 24 Sep 2007 11:01:10 +0800 Fengguang Wu <[EMAIL PROTECTED]>
wrote:

> > That is an interesting idea how about this:
>
> It looks like a workaround, but it does solve the most important problem.
> And it is a good logic by itself. So I'd vote for it.
>
> The fundamental problem is that the per-bdi-writeback-completion based
> estimation is not accurate under light loads. The problem remains for
> a light-load sda when there is a heavy-load sdb.

Well, sure, in that case sda would get to write out a lot of small
things. But in that case it would be fair wrt the other writers.

> One more workaround could be to grant bdi(s) a minimal bdi_thresh.

Ah, no, that is no good. For if there were a lot of BDIs this might
happen: nr_bdis * min_thresh > dirty_limit.

> Or better to adjust the estimation logic?

Not sure what we can do here. The current thing is simple, fast and fair.

> > +	/*
> > +	 * break out early when:
> > +	 * - we're below the bdi limit
> > +	 * - we're below half the total limit
> > +	 *
> > +	 * we let the numbers exceed the strict bdi limit if the total
> > +	 * numbers are too low, this avoids (excessive) small writeouts.
> > +	 */
> > +	if (bdi_nr_reclaimable + bdi_nr_writeback <= bdi_thresh ||
> > +	    nr_reclaimable + nr_writeback < dirty_thresh / 2)
> > 		break;
>
> This may be slightly better:
>
> 	if (bdi_nr_reclaimable + bdi_nr_writeback <= bdi_thresh)
> 		break;
> 	/*
> 	 * Throttle it only when the background writeback cannot
> 	 * catchup.
> 	 */
> 	if (nr_reclaimable + nr_writeback <
> 			(background_thresh + dirty_thresh) / 2)
> 		break;

Ah, indeed. Good idea.
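The objection to a per-bdi floor is plain arithmetic; a tiny standalone sketch with invented numbers makes it concrete:

#include <stdio.h>

int main(void)
{
        long dirty_limit = 196000;  /* global dirty limit in pages, invented */
        long min_thresh  = 1024;    /* hypothetical per-bdi floor */
        long nr_bdis     = 256;     /* lots of loop/USB/NFS backing devices */

        long floors = nr_bdis * min_thresh;
        printf("%ld bdis * %ld pages = %ld, global limit = %ld -> %s\n",
               nr_bdis, min_thresh, floors, dirty_limit,
               floors > dirty_limit ? "the floors alone exceed the limit"
                                    : "still within the limit");
        return 0;
}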
Re: 2.6.23-rc6-mm1 -- mkfs stuck in 'D'
On Sun, Sep 23, 2007 at 03:02:35PM +0200, Peter Zijlstra wrote: > On Sun, 23 Sep 2007 09:20:49 +0800 Fengguang Wu <[EMAIL PROTECTED]> > wrote: > > > On Sat, Sep 22, 2007 at 03:16:22PM +0200, Peter Zijlstra wrote: > > > On Sat, 22 Sep 2007 09:55:09 +0800 Fengguang Wu <[EMAIL PROTECTED]> > > > wrote: > > > > > > > --- linux-2.6.22.orig/mm/page-writeback.c > > > > +++ linux-2.6.22/mm/page-writeback.c > > > > @@ -426,6 +426,14 @@ static void balance_dirty_pages(struct a > > > > bdi_nr_writeback = bdi_stat(bdi, BDI_WRITEBACK); > > > > } > > > > > > > > + printk(KERN_DEBUG "balance_dirty_pages written %lu %lu > > > > congested %d limits %lu %lu %lu %lu %lu %ld\n", > > > > + pages_written, > > > > + write_chunk - wbc.nr_to_write, > > > > + bdi_write_congested(bdi), > > > > + background_thresh, dirty_thresh, > > > > + bdi_thresh, bdi_nr_reclaimable, > > > > bdi_nr_writeback, > > > > + bdi_thresh - bdi_nr_reclaimable - > > > > bdi_nr_writeback); > > > > + > > > > if (bdi_nr_reclaimable + bdi_nr_writeback <= bdi_thresh) > > > > break; > > > > if (pages_written >= write_chunk) > > > > > > > > > > > [ 1305.361511] balance_dirty_pages written 0 0 congested 0 limits 48869 > > > > 195477 5801 5760 288 -247 > > > > > > > > > > > > Could you perhaps instrument the writeback_inodes() path to see why > > > nothing is written out? - the attached patch would be a nice start. > > > > Curiously the lockup problem disappeared after upgrading to 2.6.23-rc6-mm1. > > (need to watch it in a longer time window). > > > > Anyway here's the output of your patch: > > sb_locked 0 > > sb_empty 97011 > > It this the delta during one of these lockups? If so, it would seem delta since boot time, for 2.6.23-rc6-mm1, no lockups ;-) > that although dirty pages are reported against the BDI, no actual dirty > inodes could be found. no lockups, therefore not necessarily. There are many other calls into writeback_inodes(). > [ note to self: writeback_inodes() seems to write out to any superblock > in the system. Might want to limit that to superblocks on wbc->bdi ] generic_sync_sb_inodes() does have something like: if (wbc->bdi && bdi != wbc->bdi) continue; > You say that switching to .23-rc6-mm1 solved it in your case. You are > developing in the writeback_inodes() path, right? Could it be one of > your local changes that confused it here? There are a lot of changes between them: - bdi-v9 vs bdi-v10; - a lot writeback patches in -mm - some writeback patches maintained locally I just rebased my patches to .23-rc6-mm1... > > > Most peculiar. It seems writeback_inodes() doesn't even attempt to > > > write out stuff. Nor are outstanding writeback pages completed. > > > > Still true. Another problem is that balance_dirty_pages() is being called > > even > > when there are only 54 dirty pages. That could slow down writers > > unnecessarily. > > > > balance_dirty_pages() should not be entered at all with small nr_dirty. 
> > > > Look at these lines: > > [ 197.471619] balance_dirty_pages for tar written 405 405 congested 0 > > global 196554 54 403 196097 bdi 0 0 398 -398 > > [ 197.472196] balance_dirty_pages for tar written 405 0 congested 0 global > > 196554 54 372 196128 bdi 0 0 380 -380 > > [ 197.472893] balance_dirty_pages for tar written 405 0 congested 0 global > > 196554 54 372 196128 bdi 23 0 369 -346 > > [ 197.473158] balance_dirty_pages for tar written 405 0 congested 0 global > > 196554 54 372 196128 bdi 23 0 366 -343 > > [ 197.473403] balance_dirty_pages for tar written 405 0 congested 0 global > > 196554 54 372 196128 bdi 23 0 365 -342 > > [ 197.473674] balance_dirty_pages for tar written 405 0 congested 0 global > > 196554 54 372 196128 bdi 23 0 364 -341 > > [ 197.474265] balance_dirty_pages for tar written 405 0 congested 0 global > > 196554 54 372 196128 bdi 23 0 362 -339 > > [ 197.475440] balance_dirty_pages for tar written 405 0 congested 0 global > > 196554 54 341 196159 bdi 47 0 327 -280 > > [ 197.476970] balance_dirty_pages for tar written 405 0 congested 0 global > > 196546 54 279 196213 bdi 95 0 279 -184 > > [ 197.43] balance_dirty_pages for tar written 405 0 congested 0 global > > 196546 54 248 196244 bdi 95 0 255 -160 > > [ 197.479463] balance_dirty_pages for tar written 405 0 congested 0 global > > 196546 54 217 196275 bdi 143 0 210 -67 > > [ 197.479656] balance_dirty_pages for tar written 405 0 congested 0 global > > 196546 54 217 196275 bdi 143 0 209 -66 > > [ 197.481159] balance_dirty_pages for tar written 405 0 congested 0 global > > 196546 54 155 196337 bdi 167 0 163 4 > > That is an interesting idea
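The per-bdi filter mentioned above (skip inodes that live on a different backing device than the one being flushed) is easy to model outside the kernel; the structures and names below are invented for illustration and are not the kernel's:

#include <stdio.h>

struct bdi   { const char *name; };
struct inode { const char *path; struct bdi *bdi; };
struct wbc   { struct bdi *bdi; };      /* NULL means "any bdi" */

static void sync_inodes(struct inode *list, int n, struct wbc *wbc)
{
        for (int i = 0; i < n; i++) {
                if (wbc->bdi && list[i].bdi != wbc->bdi)
                        continue;       /* the check in question */
                printf("writing back %s (on %s)\n",
                       list[i].path, list[i].bdi->name);
        }
}

int main(void)
{
        struct bdi sda = { "sda" }, sdb = { "sdb" };
        struct inode dirty[] = {
                { "/var/log/messages",   &sda },
                { "/mnt/data/big.img",   &sdb },
                { "/home/wfg/build.log", &sda },
        };
        struct wbc wbc = { .bdi = &sda };       /* only flush sda */

        sync_inodes(dirty, 3, &wbc);
        return 0;
}

Only the two sda inodes are written back; the sdb one is skipped, which is the behaviour the quoted generic_sync_sb_inodes() check provides.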
Re: 2.6.23-rc6-mm1 -- mkfs stuck in 'D'
On Sun, 23 Sep 2007 09:20:49 +0800 Fengguang Wu <[EMAIL PROTECTED]> wrote: > On Sat, Sep 22, 2007 at 03:16:22PM +0200, Peter Zijlstra wrote: > > On Sat, 22 Sep 2007 09:55:09 +0800 Fengguang Wu <[EMAIL PROTECTED]> > > wrote: > > > > > --- linux-2.6.22.orig/mm/page-writeback.c > > > +++ linux-2.6.22/mm/page-writeback.c > > > @@ -426,6 +426,14 @@ static void balance_dirty_pages(struct a > > > bdi_nr_writeback = bdi_stat(bdi, BDI_WRITEBACK); > > > } > > > > > > + printk(KERN_DEBUG "balance_dirty_pages written %lu %lu > > > congested %d limits %lu %lu %lu %lu %lu %ld\n", > > > + pages_written, > > > + write_chunk - wbc.nr_to_write, > > > + bdi_write_congested(bdi), > > > + background_thresh, dirty_thresh, > > > + bdi_thresh, bdi_nr_reclaimable, > > > bdi_nr_writeback, > > > + bdi_thresh - bdi_nr_reclaimable - > > > bdi_nr_writeback); > > > + > > > if (bdi_nr_reclaimable + bdi_nr_writeback <= bdi_thresh) > > > break; > > > if (pages_written >= write_chunk) > > > > > > > > [ 1305.361511] balance_dirty_pages written 0 0 congested 0 limits 48869 > > > 195477 5801 5760 288 -247 > > > > > > > > Could you perhaps instrument the writeback_inodes() path to see why > > nothing is written out? - the attached patch would be a nice start. > > Curiously the lockup problem disappeared after upgrading to 2.6.23-rc6-mm1. > (need to watch it in a longer time window). > > Anyway here's the output of your patch: > sb_locked 0 > sb_empty 97011 It this the delta during one of these lockups? If so, it would seem that although dirty pages are reported against the BDI, no actual dirty inodes could be found. [ note to self: writeback_inodes() seems to write out to any superblock in the system. Might want to limit that to superblocks on wbc->bdi ] You say that switching to .23-rc6-mm1 solved it in your case. You are developing in the writeback_inodes() path, right? Could it be one of your local changes that confused it here? > > Most peculiar. It seems writeback_inodes() doesn't even attempt to > > write out stuff. Nor are outstanding writeback pages completed. > > Still true. Another problem is that balance_dirty_pages() is being called even > when there are only 54 dirty pages. That could slow down writers > unnecessarily. > > balance_dirty_pages() should not be entered at all with small nr_dirty. 
> > Look at these lines: > [ 197.471619] balance_dirty_pages for tar written 405 405 congested 0 global > 196554 54 403 196097 bdi 0 0 398 -398 > [ 197.472196] balance_dirty_pages for tar written 405 0 congested 0 global > 196554 54 372 196128 bdi 0 0 380 -380 > [ 197.472893] balance_dirty_pages for tar written 405 0 congested 0 global > 196554 54 372 196128 bdi 23 0 369 -346 > [ 197.473158] balance_dirty_pages for tar written 405 0 congested 0 global > 196554 54 372 196128 bdi 23 0 366 -343 > [ 197.473403] balance_dirty_pages for tar written 405 0 congested 0 global > 196554 54 372 196128 bdi 23 0 365 -342 > [ 197.473674] balance_dirty_pages for tar written 405 0 congested 0 global > 196554 54 372 196128 bdi 23 0 364 -341 > [ 197.474265] balance_dirty_pages for tar written 405 0 congested 0 global > 196554 54 372 196128 bdi 23 0 362 -339 > [ 197.475440] balance_dirty_pages for tar written 405 0 congested 0 global > 196554 54 341 196159 bdi 47 0 327 -280 > [ 197.476970] balance_dirty_pages for tar written 405 0 congested 0 global > 196546 54 279 196213 bdi 95 0 279 -184 > [ 197.43] balance_dirty_pages for tar written 405 0 congested 0 global > 196546 54 248 196244 bdi 95 0 255 -160 > [ 197.479463] balance_dirty_pages for tar written 405 0 congested 0 global > 196546 54 217 196275 bdi 143 0 210 -67 > [ 197.479656] balance_dirty_pages for tar written 405 0 congested 0 global > 196546 54 217 196275 bdi 143 0 209 -66 > [ 197.481159] balance_dirty_pages for tar written 405 0 congested 0 global > 196546 54 155 196337 bdi 167 0 163 4 That is an interesting idea how about this: --- Subject: mm: speed up writeback ramp-up on clean systems We allow violation of bdi limits if there is a lot of room on the system. Once we hit half the total limit we start enforcing bdi limits and bdi ramp-up should happen. Doing it this way avoids many small writeouts on an otherwise idle system and should also speed up the ramp-up. Signed-off-by: Peter Zijlstra <[EMAIL PROTECTED]> --- Index: linux-2.6/mm/page-writeback.c === --- linux-2.6.orig/mm/page-writeback.c +++ linux-2.6/mm/page-writeback.c @@ -355,8 +355,8 @@ get_dirty_limits(long *pbackground, long */ static void balance_dirty_pages(struct address_space *mapping) { - long bdi_nr_reclaimable; - long bdi_nr_writeback; + long nr_reclaimable,
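The patch body is cut off above, but its changelog describes the intended behaviour precisely enough to model: a bdi that exceeds its own (still tiny) threshold is left alone until total dirty pages reach half of the global limit. A standalone sketch with invented numbers, not the patch itself:

#include <stdio.h>
#include <stdbool.h>

/* Enforce the per-bdi limit only once total dirty pages reach half of
 * the global dirty limit; below that, let the writer break out early. */
static bool throttle(long bdi_dirty, long bdi_thresh,
                     long total_dirty, long dirty_thresh)
{
        if (bdi_dirty <= bdi_thresh)
                return false;
        if (total_dirty < dirty_thresh / 2)
                return false;           /* plenty of room: keep ramping up */
        return true;
}

int main(void)
{
        long dirty_thresh = 196000;
        long bdi_thresh = 400, bdi_dirty = 800; /* always over its tiny share */

        for (long total = 20000; total <= 180000; total += 40000)
                printf("total dirty %6ld -> %s\n", total,
                       throttle(bdi_dirty, bdi_thresh, total, dirty_thresh) ?
                       "throttle and write out" : "break out early");
        return 0;
}

Below half the global limit the writer keeps breaking out early, so an otherwise idle system avoids the many small writeouts; past that point the per-bdi limit is enforced as before.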
Re: 2.6.23-rc6-mm1 -- mkfs stuck in 'D'
On Sat, Sep 22, 2007 at 03:16:22PM +0200, Peter Zijlstra wrote: > On Sat, 22 Sep 2007 09:55:09 +0800 Fengguang Wu <[EMAIL PROTECTED]> > wrote: > > > --- linux-2.6.22.orig/mm/page-writeback.c > > +++ linux-2.6.22/mm/page-writeback.c > > @@ -426,6 +426,14 @@ static void balance_dirty_pages(struct a > > bdi_nr_writeback = bdi_stat(bdi, BDI_WRITEBACK); > > } > > > > + printk(KERN_DEBUG "balance_dirty_pages written %lu %lu > > congested %d limits %lu %lu %lu %lu %lu %ld\n", > > + pages_written, > > + write_chunk - wbc.nr_to_write, > > + bdi_write_congested(bdi), > > + background_thresh, dirty_thresh, > > + bdi_thresh, bdi_nr_reclaimable, > > bdi_nr_writeback, > > + bdi_thresh - bdi_nr_reclaimable - > > bdi_nr_writeback); > > + > > if (bdi_nr_reclaimable + bdi_nr_writeback <= bdi_thresh) > > break; > > if (pages_written >= write_chunk) > > > > > [ 1305.361511] balance_dirty_pages written 0 0 congested 0 limits 48869 > > 195477 5801 5760 288 -247 > > > > Could you perhaps instrument the writeback_inodes() path to see why > nothing is written out? - the attached patch would be a nice start. Curiously the lockup problem disappeared after upgrading to 2.6.23-rc6-mm1. (need to watch it in a longer time window). Anyway here's the output of your patch: sb_locked 0 sb_empty 97011 > Most peculiar. It seems writeback_inodes() doesn't even attempt to > write out stuff. Nor are outstanding writeback pages completed. Still true. Another problem is that balance_dirty_pages() is being called even when there are only 54 dirty pages. That could slow down writers unnecessarily. balance_dirty_pages() should not be entered at all with small nr_dirty. Look at these lines: [ 197.471619] balance_dirty_pages for tar written 405 405 congested 0 global 196554 54 403 196097 bdi 0 0 398 -398 [ 197.472196] balance_dirty_pages for tar written 405 0 congested 0 global 196554 54 372 196128 bdi 0 0 380 -380 [ 197.472893] balance_dirty_pages for tar written 405 0 congested 0 global 196554 54 372 196128 bdi 23 0 369 -346 [ 197.473158] balance_dirty_pages for tar written 405 0 congested 0 global 196554 54 372 196128 bdi 23 0 366 -343 [ 197.473403] balance_dirty_pages for tar written 405 0 congested 0 global 196554 54 372 196128 bdi 23 0 365 -342 [ 197.473674] balance_dirty_pages for tar written 405 0 congested 0 global 196554 54 372 196128 bdi 23 0 364 -341 [ 197.474265] balance_dirty_pages for tar written 405 0 congested 0 global 196554 54 372 196128 bdi 23 0 362 -339 [ 197.475440] balance_dirty_pages for tar written 405 0 congested 0 global 196554 54 341 196159 bdi 47 0 327 -280 [ 197.476970] balance_dirty_pages for tar written 405 0 congested 0 global 196546 54 279 196213 bdi 95 0 279 -184 [ 197.43] balance_dirty_pages for tar written 405 0 congested 0 global 196546 54 248 196244 bdi 95 0 255 -160 [ 197.479463] balance_dirty_pages for tar written 405 0 congested 0 global 196546 54 217 196275 bdi 143 0 210 -67 [ 197.479656] balance_dirty_pages for tar written 405 0 congested 0 global 196546 54 217 196275 bdi 143 0 209 -66 [ 197.481159] balance_dirty_pages for tar written 405 0 congested 0 global 196546 54 155 196337 bdi 167 0 163 4 The trace messages are generated by the following code: --- linux-2.6.23-rc6-mm1.orig/mm/page-writeback.c +++ linux-2.6.23-rc6-mm1/mm/page-writeback.c @@ -421,6 +421,18 @@ static void balance_dirty_pages(struct a bdi_nr_writeback = bdi_stat(bdi, BDI_WRITEBACK); } + printk(KERN_DEBUG "balance_dirty_pages for %s written %lu %lu congested %d " + "global %lu %lu %lu %ld bdi %lu %lu %lu %ld\n", + 
current->comm, + pages_written, write_chunk - wbc.nr_to_write, + bdi_write_congested(bdi), + dirty_thresh, + global_dirty_unstable_pages(), global_page_state(NR_WRITEBACK), + dirty_thresh - + global_dirty_unstable_pages() - global_page_state(NR_WRITEBACK), + bdi_thresh, bdi_nr_reclaimable, bdi_nr_writeback, + bdi_thresh - bdi_nr_reclaimable - bdi_nr_writeback); + if (bdi_nr_reclaimable + bdi_nr_writeback <= bdi_thresh) break; if (pages_written >= write_chunk) - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
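For anyone reading the "for tar" lines above, a small standalone decoder for that trace format may help; the sample line is copied from the mail, the field names follow the printk arguments in the patch, and the program itself is only an illustration:

#include <stdio.h>

int main(void)
{
        const char *line =
                "balance_dirty_pages for tar written 405 405 congested 0 "
                "global 196554 54 403 196097 bdi 0 0 398 -398";
        char comm[32];
        unsigned long written, this_pass, thresh, unstable, wb;
        unsigned long bdi_thresh, bdi_rec, bdi_wb;
        long headroom, bdi_headroom;
        int congested;

        if (sscanf(line,
                   "balance_dirty_pages for %31s written %lu %lu congested %d "
                   "global %lu %lu %lu %ld bdi %lu %lu %lu %ld",
                   comm, &written, &this_pass, &congested,
                   &thresh, &unstable, &wb, &headroom,
                   &bdi_thresh, &bdi_rec, &bdi_wb, &bdi_headroom) != 12) {
                fprintf(stderr, "unexpected format\n");
                return 1;
        }
        printf("%s: wrote %lu pages so far (%lu this pass), congested=%d\n",
               comm, written, this_pass, congested);
        printf("global: thresh=%lu dirty+unstable=%lu writeback=%lu headroom=%ld\n",
               thresh, unstable, wb, headroom);
        printf("bdi:    thresh=%lu reclaimable=%lu writeback=%lu headroom=%ld\n",
               bdi_thresh, bdi_rec, bdi_wb, bdi_headroom);
        return 0;
}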
Re: 2.6.23-rc6-mm1 -- mkfs stuck in 'D'
On Sat, 22 Sep 2007 09:55:09 +0800 Fengguang Wu <[EMAIL PROTECTED]> wrote: > --- linux-2.6.22.orig/mm/page-writeback.c > +++ linux-2.6.22/mm/page-writeback.c > @@ -426,6 +426,14 @@ static void balance_dirty_pages(struct a > bdi_nr_writeback = bdi_stat(bdi, BDI_WRITEBACK); > } > > + printk(KERN_DEBUG "balance_dirty_pages written %lu %lu > congested %d limits %lu %lu %lu %lu %lu %ld\n", > + pages_written, > + write_chunk - wbc.nr_to_write, > + bdi_write_congested(bdi), > + background_thresh, dirty_thresh, > + bdi_thresh, bdi_nr_reclaimable, > bdi_nr_writeback, > + bdi_thresh - bdi_nr_reclaimable - > bdi_nr_writeback); > + > if (bdi_nr_reclaimable + bdi_nr_writeback <= bdi_thresh) > break; > if (pages_written >= write_chunk) > > [ 1305.361511] balance_dirty_pages written 0 0 congested 0 limits 48869 > 195477 5801 5760 288 -247 Most peculiar. It seems writeback_inodes() doesn't even attempt to write out stuff. Nor are outstanding writeback pages completed. Could you perhaps instrument the writeback_inodes() path to see why nothing is written out? - the attached patch would be a nice start. > Here are some messages when doing large dds: > [ 511.733791] balance_dirty_pages written 1540 1540 congested 0 limits 49728 > 198913 10999 18288 0 -7289 > [ 511.735048] balance_dirty_pages written 1540 1540 congested 0 limits 49728 > 198913 12012 16752 0 -4740 > [ 511.736506] balance_dirty_pages written 1540 1540 congested 0 limits 49728 > 198913 12306 15192 1056 -3942 > [ 512.002169] balance_dirty_pages written 1547 1547 congested 2 limits 49726 > 198905 13471 12624 1848 -1001 > [ 512.003795] balance_dirty_pages written 1540 1540 congested 2 limits 49723 > 198892 13470 11088 3384 -1002 > [ 512.083517] balance_dirty_pages written 1540 1540 congested 2 limits 49712 > 198850 13572 9336 4992 -756 > [ 512.085145] balance_dirty_pages written 1540 1540 congested 2 limits 49706 > 198825 13569 7800 6528 -759 > [ 512.086773] balance_dirty_pages written 1540 1540 congested 2 limits 49702 > 198808 13568 6240 8064 -736 > [ 512.184267] balance_dirty_pages written 1539 1539 congested 2 limits 49697 > 198791 13649 5592 8592 -535 > [ 512.185903] balance_dirty_pages written 1540 1540 congested 2 limits 49694 > 198778 13649 4056 10152 -559 > [ 512.187506] balance_dirty_pages written 1540 1540 congested 2 limits 49688 > 198753 13647 2496 11688 -537 > [ 512.259848] balance_dirty_pages written 1546 1546 congested 2 limits 49682 > 198728 13769 744 13248 -223 > [ 512.554646] balance_dirty_pages written 618 618 congested 2 limits 49678 > 198712 14242 1 13368 873 > [ 512.585204] balance_dirty_pages written 794 794 congested 2 limits 49657 > 198630 14500 1 12936 1563 > [ 527.714294] balance_dirty_pages written 1540 1540 congested 0 limits 49608 > 198432 29502 28080 0 1422 This looks like a sane series, we have a surplus of reclaimable pages, start writeout, which increases writeback pages, and congests the device, and eventually all subsides and we finish congestion and quit. 
> [ 529.298022] balance_dirty_pages written 1540 1540 congested 0 limits 49579 > 198318 30307 34704 0 -4397 > [ 529.304975] balance_dirty_pages written 1539 1539 congested 0 limits 49575 > 198302 32451 30600 0 1851 > [ 529.305205] balance_dirty_pages written 1538 1538 congested 0 limits 49576 > 198306 32571 30384 0 2187 > [ 529.306988] balance_dirty_pages written 1537 1537 congested 0 limits 49580 > 198320 32702 30120 0 2582 > [ 530.893830] balance_dirty_pages written 1541 1541 congested 0 limits 49553 > 198214 34216 35352 0 -1136 > [ 530.893970] balance_dirty_pages written 1538 1538 congested 0 limits 49553 > 198214 34290 35160 0 -870 > [ 530.899873] balance_dirty_pages written 1546 1546 congested 0 limits 49556 > 198227 36248 31248 0 5000 > [ 530.900282] balance_dirty_pages written 1546 1546 congested 0 limits 49557 > 198231 36442 30864 0 5578 > [ 530.900586] balance_dirty_pages written 1539 1539 congested 0 limits 49558 > 198235 36601 30552 0 6049 > [ 532.343097] balance_dirty_pages written 1541 1541 congested 0 limits 49530 > 198120 37862 36072 0 1790 > [ 532.343595] balance_dirty_pages written 1547 1547 congested 0 limits 49533 > 198132 38081 35640 0 2441 > [ 533.872355] balance_dirty_pages written 1540 1540 congested 0 limits 49502 > 198009 41148 37224 0 3924 > [ 542.566083] balance_dirty_pages written 1542 1542 congested 0 limits 49367 > 197469 51948 52680 0 -732 > [ 542.567093] balance_dirty_pages written 1537 1537 congested 0 limits 49370 > 197482 52663 50952 0 1711 > [ 542.586552] balance_dirty_pages written 1540 1540 congested 0 limits 49352 > 197410 54545 46032 0 8513 > [ 542.606002] balance_dirty_pages written 1540 1540
Re: 2.6.23-rc6-mm1 -- mkfs stuck in 'D'
On Thu, Sep 20, 2007 at 12:31:39PM +0100, Hugh Dickins wrote: > On Wed, 19 Sep 2007, Peter Zijlstra wrote: > > On Wed, 19 Sep 2007 21:03:19 +0100 (BST) Hugh Dickins > > <[EMAIL PROTECTED]> wrote: > > > > > On Wed, 19 Sep 2007, Andy Whitcroft wrote: > > > > Seems I have a case of a largish i386 NUMA (NUMA-Q) which has a mkfs > > > > stuck in a 'D' wait: > > > > > > > > === > > > > mkfs.ext2 D c10220f4 0 6233 6222 > > > > [] io_schedule_timeout+0x1e/0x28 > > > > [] congestion_wait+0x62/0x7a > > > > [] get_dirty_limits+0x16a/0x172 > > > > [] balance_dirty_pages+0x154/0x1be > > > > [] generic_perform_write+0x168/0x18a > > > > [] generic_file_buffered_write+0x73/0x107 > > > > [] __generic_file_aio_write_nolock+0x47a/0x4a5 > > > > [] generic_file_aio_write_nolock+0x48/0x9b > > > > [] do_sync_write+0xbf/0xfc > > > > [] vfs_write+0x8d/0x108 > > > > [] sys_write+0x41/0x67 > > > > [] syscall_call+0x7/0xb > > > > === > > > > > > [edited out some bogus lines from stale stack] > > > > > > > This machine and others have run numerous test runs on this kernel and > > > > this is the first time I've see a hang like this. > > > > > > I've been seeing something like that on 4-way PPC64: in my case I've > > > shells hanging in D state trying to append to kernel build log on ext3 > > > (the builds themselves going on elsewhere, in tmpfs): one of the shells > > > holding i_mutex and stuck doing congestion_waits from balance_dirty_pages. > > > > > > > I wonder if this is the ultimate cause of the couple of mainline hangs > > > > which were seen, but not diagnosed. > > > > > > My *guess* is that this is peculiar to 2.6.23-rc6-mm1, and from Peter's > > > mm-per-device-dirty-threshold.patch. printks showed bdi_nr_reclaimable > > > 0, bdi_nr_writeback 24, bdi_thresh 1 in balance_dirty_pages (though I've > > > not done enough to check if those really correlate with the hangs), > > > and I'm wondering if the bdi_stat_sum business is needed on the > > > !nr_reclaimable path. > > > > FWIW my tired brain seems to think it the !nr_reclaimable path needs it > > just the same. So this change seems to make sense for now :-) > > Thanks. > > > > So I'm running now with the patch below, good so far, but can't judge > > > until tomorrow whether it has actually addressed the problem seen. > > Last night's run went well: that patch does indeed seem to have fixed it. > Looking at the timings (some variance but _very_ much less than the night > before), there does appear to be some other occasional slight slowdown - > but I've no reason to suspect your patch for it, nor to suppose it's > something new: it may just be an artifact of my heavy swap thrashing. > > > [PATCH mm] mm per-device dirty threshold fix > > Fix occasional hang when a task couldn't get out of balance_dirty_pages: > mm-per-device-dirty-threshold.patch needs to reevaluate bdi_nr_writeback > across all cpus when bdi_thresh is low, even in the case when there was > no bdi_nr_reclaimable. > > Signed-off-by: Hugh Dickins <[EMAIL PROTECTED]> Thank you Hugh. I ran into similar problems with many dd(large file) operations. This patch seems to fix it. But now my desktop was locked up again when writing a lot of small files. 
The problem is repeatable with the command

	$ ketchup 2.6.23-rc6-mm1

I wrote up two debug patches:

---
 mm/page-writeback.c |    9 +
 1 file changed, 9 insertions(+)

--- linux-2.6.22.orig/mm/page-writeback.c
+++ linux-2.6.22/mm/page-writeback.c
@@ -426,6 +426,14 @@ static void balance_dirty_pages(struct a
 			bdi_nr_writeback = bdi_stat(bdi, BDI_WRITEBACK);
 		}

+		printk(KERN_DEBUG "balance_dirty_pages written %lu %lu congested %d limits %lu %lu %lu %lu %lu %ld\n",
+				pages_written,
+				write_chunk - wbc.nr_to_write,
+				bdi_write_congested(bdi),
+				background_thresh, dirty_thresh,
+				bdi_thresh, bdi_nr_reclaimable, bdi_nr_writeback,
+				bdi_thresh - bdi_nr_reclaimable - bdi_nr_writeback);
+
 		if (bdi_nr_reclaimable + bdi_nr_writeback <= bdi_thresh)
 			break;
 		if (pages_written >= write_chunk)

---
 mm/page-writeback.c |    5 +
 1 file changed, 5 insertions(+)

--- linux-2.6.22.orig/mm/page-writeback.c
+++ linux-2.6.22/mm/page-writeback.c
@@ -373,6 +373,7 @@ static void balance_dirty_pages(struct a
 	long bdi_thresh;
 	unsigned long pages_written = 0;
 	unsigned long write_chunk = sync_writeback_pages();
+	int i = 0;

 	struct backing_dev_info *bdi = mapping->backing_dev_info;

@@ -434,6 +435,10 @@ static void balance_dirty_pages(struct a
 				bdi_thresh, bdi_nr_reclaimable, bdi_nr_writeback,
 				bdi_thresh - bdi_nr_reclaimable - bdi_nr_writeback);

+		if (i++ > 200) {
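The second debug patch is cut off right after the guard it introduces, but the counter makes the intent clear enough: bail out of the loop after a bounded number of spins instead of leaving the task hung in D state. A generic userspace sketch of that debugging pattern - the details are assumed, not taken from the actual patch:

#include <stdio.h>
#include <stdbool.h>

/* Stand-in for "did this iteration make progress?"; in the real loop
 * this would be the bdi/global threshold checks. */
static bool made_progress(void)
{
        return false;           /* simulate a writer that never gets out */
}

int main(void)
{
        int i = 0;

        for (;;) {
                if (made_progress())
                        break;
                /*
                 * Debugging guard: a correctly behaving loop should never
                 * spin this long, so log and give up instead of leaving
                 * the task stuck in D state.
                 */
                if (i++ > 200) {
                        fprintf(stderr,
                                "balance loop stuck after %d iterations\n", i);
                        break;
                }
        }
        return 0;
}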
Re: 2.6.23-rc6-mm1 -- mkfs stuck in 'D'
On Thu, 20 Sep 2007 12:31:39 +0100 (BST) Hugh Dickins <[EMAIL PROTECTED]> wrote: Thanks Hugh! > [PATCH mm] mm per-device dirty threshold fix > > Fix occasional hang when a task couldn't get out of balance_dirty_pages: > mm-per-device-dirty-threshold.patch needs to reevaluate bdi_nr_writeback > across all cpus when bdi_thresh is low, even in the case when there was > no bdi_nr_reclaimable. > > Signed-off-by: Hugh Dickins <[EMAIL PROTECTED]> Acked-by: Peter Zijlstra <[EMAIL PROTECTED]> > --- > mm/page-writeback.c | 53 +++--- > 1 file changed, 24 insertions(+), 29 deletions(-) > > --- 2.6.23-rc6-mm1/mm/page-writeback.c2007-09-18 12:28:25.0 > +0100 > +++ linux/mm/page-writeback.c 2007-09-19 20:00:46.0 +0100 > @@ -379,7 +379,7 @@ static void balance_dirty_pages(struct a > bdi_nr_reclaimable = bdi_stat(bdi, BDI_RECLAIMABLE); > bdi_nr_writeback = bdi_stat(bdi, BDI_WRITEBACK); > if (bdi_nr_reclaimable + bdi_nr_writeback <= bdi_thresh) > - break; > + break; > > if (!bdi->dirty_exceeded) > bdi->dirty_exceeded = 1; > @@ -392,39 +392,34 @@ static void balance_dirty_pages(struct a >*/ > if (bdi_nr_reclaimable) { > writeback_inodes(); > - > + pages_written += write_chunk - wbc.nr_to_write; > get_dirty_limits(_thresh, _thresh, > _thresh, bdi); > + } > > - /* > - * In order to avoid the stacked BDI deadlock we need > - * to ensure we accurately count the 'dirty' pages when > - * the threshold is low. > - * > - * Otherwise it would be possible to get thresh+n pages > - * reported dirty, even though there are thresh-m pages > - * actually dirty; with m+n sitting in the percpu > - * deltas. > - */ > - if (bdi_thresh < 2*bdi_stat_error(bdi)) { > - bdi_nr_reclaimable = > - bdi_stat_sum(bdi, BDI_RECLAIMABLE); > - bdi_nr_writeback = > - bdi_stat_sum(bdi, BDI_WRITEBACK); > - } else { > - bdi_nr_reclaimable = > - bdi_stat(bdi, BDI_RECLAIMABLE); > - bdi_nr_writeback = > - bdi_stat(bdi, BDI_WRITEBACK); > - } > + /* > + * In order to avoid the stacked BDI deadlock we need > + * to ensure we accurately count the 'dirty' pages when > + * the threshold is low. > + * > + * Otherwise it would be possible to get thresh+n pages > + * reported dirty, even though there are thresh-m pages > + * actually dirty; with m+n sitting in the percpu > + * deltas. > + */ > + if (bdi_thresh < 2*bdi_stat_error(bdi)) { > + bdi_nr_reclaimable = bdi_stat_sum(bdi, BDI_RECLAIMABLE); > + bdi_nr_writeback = bdi_stat_sum(bdi, BDI_WRITEBACK); > + } else if (bdi_nr_reclaimable) { > + bdi_nr_reclaimable = bdi_stat(bdi, BDI_RECLAIMABLE); > + bdi_nr_writeback = bdi_stat(bdi, BDI_WRITEBACK); > + } > > - if (bdi_nr_reclaimable + bdi_nr_writeback <= bdi_thresh) > - break; > + if (bdi_nr_reclaimable + bdi_nr_writeback <= bdi_thresh) > + break; > + if (pages_written >= write_chunk) > + break; /* We've done our duty */ > > - pages_written += write_chunk - wbc.nr_to_write; > - if (pages_written >= write_chunk) > - break; /* We've done our duty */ > - } > congestion_wait(WRITE, HZ/10); > } > - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
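The control-flow point of the patch can be seen in a toy re-creation of the loop: with nothing reclaimable, the old ordering never falls back to the exact per-cpu sums, so a task on a bdi with a tiny threshold and a slightly stale writeback count spins in congestion_wait() forever. Everything below is stubbed userspace code with invented numbers, not the kernel's:

#include <stdio.h>
#include <stdbool.h>

/* Invented numbers modelling the reported hang: bdi_thresh is 1, the
 * cheap (approximate) writeback count is stuck at 24, the exact
 * cross-cpu sum is really 0, and nothing is left to reclaim. */
static const long bdi_thresh = 1;
static const long approx_writeback = 24;
static const long exact_writeback = 0;
static const long reclaimable = 0;
static const long stat_error = 8;       /* stand-in for bdi_stat_error() */

static bool loop_exits(bool reevaluate_when_idle)
{
        for (int iter = 0; iter < 100; iter++) {
                long wb = approx_writeback;

                if (reclaimable) {
                        /* old code: exact re-read happened only on this
                         * branch, right after writeback_inodes() */
                        wb = exact_writeback;
                } else if (reevaluate_when_idle && bdi_thresh < 2 * stat_error) {
                        /* what the fix adds: re-read exactly even when
                         * there was nothing reclaimable */
                        wb = exact_writeback;
                }

                if (reclaimable + wb <= bdi_thresh)
                        return true;    /* break out of balance_dirty_pages */
                /* otherwise: congestion_wait(WRITE, HZ/10) and loop again */
        }
        return false;                   /* still stuck after 100 spins */
}

int main(void)
{
        printf("old ordering exits the loop:     %s\n",
               loop_exits(false) ? "yes" : "no");
        printf("patched ordering exits the loop: %s\n",
               loop_exits(true) ? "yes" : "no");
        return 0;
}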
Re: 2.6.23-rc6-mm1 -- mkfs stuck in 'D'
On Wed, 19 Sep 2007, Peter Zijlstra wrote:
> On Wed, 19 Sep 2007 21:03:19 +0100 (BST) Hugh Dickins
> <[EMAIL PROTECTED]> wrote:
>
> > On Wed, 19 Sep 2007, Andy Whitcroft wrote:
> > > Seems I have a case of a largish i386 NUMA (NUMA-Q) which has a mkfs
> > > stuck in a 'D' wait:
> > >
> > > ===
> > > mkfs.ext2 D c10220f4 0 6233 6222
> > > [<c12194da>] io_schedule_timeout+0x1e/0x28
> > > [<c10454b4>] congestion_wait+0x62/0x7a
> > > [<c10402af>] get_dirty_limits+0x16a/0x172
> > > [<c104040b>] balance_dirty_pages+0x154/0x1be
> > > [<c103bda3>] generic_perform_write+0x168/0x18a
> > > [<c103be38>] generic_file_buffered_write+0x73/0x107
> > > [<c103c346>] __generic_file_aio_write_nolock+0x47a/0x4a5
> > > [<c103c3b9>] generic_file_aio_write_nolock+0x48/0x9b
> > > [<c105d2d6>] do_sync_write+0xbf/0xfc
> > > [<c105d3a0>] vfs_write+0x8d/0x108
> > > [<c105d4c3>] sys_write+0x41/0x67
> > > [<c100260a>] syscall_call+0x7/0xb
> > > ===
> >
> > [edited out some bogus lines from stale stack]
> >
> > > This machine and others have run numerous test runs on this kernel and
> > > this is the first time I've seen a hang like this.
> >
> > I've been seeing something like that on 4-way PPC64: in my case I've
> > shells hanging in D state trying to append to kernel build log on ext3
> > (the builds themselves going on elsewhere, in tmpfs): one of the shells
> > holding i_mutex and stuck doing congestion_waits from balance_dirty_pages.
> >
> > > I wonder if this is the ultimate cause of the couple of mainline hangs
> > > which were seen, but not diagnosed.
> >
> > My *guess* is that this is peculiar to 2.6.23-rc6-mm1, and from Peter's
> > mm-per-device-dirty-threshold.patch. printks showed bdi_nr_reclaimable
> > 0, bdi_nr_writeback 24, bdi_thresh 1 in balance_dirty_pages (though I've
> > not done enough to check if those really correlate with the hangs),
> > and I'm wondering if the bdi_stat_sum business is needed on the
> > !nr_reclaimable path.
>
> FWIW my tired brain seems to think the !nr_reclaimable path needs it
> just the same. So this change seems to make sense for now :-)

Thanks.

> > So I'm running now with the patch below, good so far, but can't judge
> > until tomorrow whether it has actually addressed the problem seen.

Last night's run went well: that patch does indeed seem to have fixed it.
Looking at the timings (some variance but _very_ much less than the night
before), there does appear to be some other occasional slight slowdown -
but I've no reason to suspect your patch for it, nor to suppose it's
something new: it may just be an artifact of my heavy swap thrashing.

[PATCH mm] mm per-device dirty threshold fix

Fix occasional hang when a task couldn't get out of balance_dirty_pages:
mm-per-device-dirty-threshold.patch needs to reevaluate bdi_nr_writeback
across all cpus when bdi_thresh is low, even in the case when there was
no bdi_nr_reclaimable.

Signed-off-by: Hugh Dickins <[EMAIL PROTECTED]>
---
 mm/page-writeback.c | 53 +++---
 1 file changed, 24 insertions(+), 29 deletions(-)

--- 2.6.23-rc6-mm1/mm/page-writeback.c  2007-09-18 12:28:25.0 +0100
+++ linux/mm/page-writeback.c   2007-09-19 20:00:46.0 +0100
@@ -379,7 +379,7 @@ static void balance_dirty_pages(struct a
         bdi_nr_reclaimable = bdi_stat(bdi, BDI_RECLAIMABLE);
         bdi_nr_writeback = bdi_stat(bdi, BDI_WRITEBACK);
         if (bdi_nr_reclaimable + bdi_nr_writeback <= bdi_thresh)
-                break;
+                break;

         if (!bdi->dirty_exceeded)
                 bdi->dirty_exceeded = 1;
@@ -392,39 +392,34 @@ static void balance_dirty_pages(struct a
          */
         if (bdi_nr_reclaimable) {
                 writeback_inodes(&wbc);
-
+                pages_written += write_chunk - wbc.nr_to_write;
                 get_dirty_limits(&background_thresh, &dirty_thresh,
                                  &bdi_thresh, bdi);
+        }

-                /*
-                 * In order to avoid the stacked BDI deadlock we need
-                 * to ensure we accurately count the 'dirty' pages when
-                 * the threshold is low.
-                 *
-                 * Otherwise it would be possible to get thresh+n pages
-                 * reported dirty, even though there are thresh-m pages
-                 * actually dirty; with m+n sitting in the percpu
-                 * deltas.
-                 */
-                if (bdi_thresh < 2*bdi_stat_error(bdi)) {
-                        bdi_nr_reclaimable =
-                                bdi_stat_sum(bdi, BDI_RECLAIMABLE);
-                        bdi_nr_writeback =
-                                bdi_stat_sum(bdi, BDI_WRITEBACK);
-                } else {
-                        bdi_nr_reclaimable =
-                                bdi_stat(bdi,
Re: 2.6.23-rc6-mm1 -- mkfs stuck in 'D'
On Wed, 19 Sep 2007 21:03:19 +0100 (BST) Hugh Dickins <[EMAIL PROTECTED]> wrote:

> On Wed, 19 Sep 2007, Andy Whitcroft wrote:
> > Seems I have a case of a largish i386 NUMA (NUMA-Q) which has a mkfs
> > stuck in a 'D' wait:
> >
> > ===
> > mkfs.ext2 D c10220f4 0 6233 6222
> > [<c12194da>] io_schedule_timeout+0x1e/0x28
> > [<c10454b4>] congestion_wait+0x62/0x7a
> > [<c10402af>] get_dirty_limits+0x16a/0x172
> > [<c104040b>] balance_dirty_pages+0x154/0x1be
> > [<c103bda3>] generic_perform_write+0x168/0x18a
> > [<c103be38>] generic_file_buffered_write+0x73/0x107
> > [<c103c346>] __generic_file_aio_write_nolock+0x47a/0x4a5
> > [<c103c3b9>] generic_file_aio_write_nolock+0x48/0x9b
> > [<c105d2d6>] do_sync_write+0xbf/0xfc
> > [<c105d3a0>] vfs_write+0x8d/0x108
> > [<c105d4c3>] sys_write+0x41/0x67
> > [<c100260a>] syscall_call+0x7/0xb
> > ===
>
> [edited out some bogus lines from stale stack]
>
> > This machine and others have run numerous test runs on this kernel and
> > this is the first time I've seen a hang like this.
>
> I've been seeing something like that on 4-way PPC64: in my case I've
> shells hanging in D state trying to append to kernel build log on ext3
> (the builds themselves going on elsewhere, in tmpfs): one of the shells
> holding i_mutex and stuck doing congestion_waits from balance_dirty_pages.
>
> > I wonder if this is the ultimate cause of the couple of mainline hangs
> > which were seen, but not diagnosed.
>
> My *guess* is that this is peculiar to 2.6.23-rc6-mm1, and from Peter's
> mm-per-device-dirty-threshold.patch. printks showed bdi_nr_reclaimable
> 0, bdi_nr_writeback 24, bdi_thresh 1 in balance_dirty_pages (though I've
> not done enough to check if those really correlate with the hangs),
> and I'm wondering if the bdi_stat_sum business is needed on the
> !nr_reclaimable path.

FWIW my tired brain seems to think the !nr_reclaimable path needs it
just the same. So this change seems to make sense for now :-)

> So I'm running now with the patch below, good so far, but can't judge
> until tomorrow whether it has actually addressed the problem seen.
>
> Not-yet-Signed-off-by: Hugh Dickins <[EMAIL PROTECTED]>
> ---
>  mm/page-writeback.c | 53 +++---
>  1 file changed, 24 insertions(+), 29 deletions(-)
>
> --- 2.6.23-rc6-mm1/mm/page-writeback.c  2007-09-18 12:28:25.0 +0100
> +++ linux/mm/page-writeback.c   2007-09-19 20:00:46.0 +0100
> @@ -379,7 +379,7 @@ static void balance_dirty_pages(struct a
>          bdi_nr_reclaimable = bdi_stat(bdi, BDI_RECLAIMABLE);
>          bdi_nr_writeback = bdi_stat(bdi, BDI_WRITEBACK);
>          if (bdi_nr_reclaimable + bdi_nr_writeback <= bdi_thresh)
> -                break;
> +                break;
>
>          if (!bdi->dirty_exceeded)
>                  bdi->dirty_exceeded = 1;
> @@ -392,39 +392,34 @@ static void balance_dirty_pages(struct a
>           */
>          if (bdi_nr_reclaimable) {
>                  writeback_inodes(&wbc);
> -
> +                pages_written += write_chunk - wbc.nr_to_write;
>                  get_dirty_limits(&background_thresh, &dirty_thresh,
>                                   &bdi_thresh, bdi);
> +        }
>
> -                /*
> -                 * In order to avoid the stacked BDI deadlock we need
> -                 * to ensure we accurately count the 'dirty' pages when
> -                 * the threshold is low.
> -                 *
> -                 * Otherwise it would be possible to get thresh+n pages
> -                 * reported dirty, even though there are thresh-m pages
> -                 * actually dirty; with m+n sitting in the percpu
> -                 * deltas.
> -                 */
> -                if (bdi_thresh < 2*bdi_stat_error(bdi)) {
> -                        bdi_nr_reclaimable =
> -                                bdi_stat_sum(bdi, BDI_RECLAIMABLE);
> -                        bdi_nr_writeback =
> -                                bdi_stat_sum(bdi, BDI_WRITEBACK);
> -                } else {
> -                        bdi_nr_reclaimable =
> -                                bdi_stat(bdi, BDI_RECLAIMABLE);
> -                        bdi_nr_writeback =
> -                                bdi_stat(bdi, BDI_WRITEBACK);
> -                }
> +        /*
> +         * In order to avoid the stacked BDI deadlock we need
> +         * to ensure we accurately count the 'dirty' pages when
> +         * the threshold is low.
> +         *
> +         * Otherwise it would be possible to get thresh+n pages
> +         * reported dirty, even though there are thresh-m pages
> +         * actually dirty; with m+n sitting in the percpu
> +         * deltas.
> +         */
> +        if (bdi_thresh < 2*bdi_stat_error(bdi)) {
> +                bdi_nr_reclaimable =
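The hunk quoted above ends right at the bdi_thresh < 2*bdi_stat_error(bdi) test, which is presumably the crux of why the cheap reads cannot be trusted at tiny thresholds: each of bdi_nr_reclaimable and bdi_nr_writeback comes from a per-CPU counter whose cheap read can be off by up to bdi_stat_error(bdi), so the sum used in the break test can be off by twice that. A back-of-the-envelope sketch, using made-up figures (4 CPUs, a fold batch of 8) rather than the kernel's actual configuration:

/*
 * Back-of-the-envelope only; 4 CPUs and a fold batch of 8 are assumed
 * figures, not the kernel's.  Each cheap counter read can be off by up
 * to roughly ncpus * batch, and the break test adds two such counters,
 * so a bdi_thresh below twice that bound cannot safely be compared
 * against the approximate reads.
 */
#include <stdio.h>

int main(void)
{
        long ncpus = 4, batch = 8;
        long stat_error = ncpus * batch;  /* plays the role of bdi_stat_error(bdi) */
        long bdi_thresh = 1;              /* the value seen in the printks above */

        printf("per-counter error bound:          %ld pages\n", stat_error);
        printf("error of reclaimable + writeback: %ld pages\n", 2 * stat_error);
        printf("must take the exact sums:         %s\n",
               bdi_thresh < 2 * stat_error ? "yes" : "no");
        return 0;
}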
Re: 2.6.23-rc6-mm1 -- mkfs stuck in 'D'
On Wed, 19 Sep 2007, Andy Whitcroft wrote:
> Seems I have a case of a largish i386 NUMA (NUMA-Q) which has a mkfs
> stuck in a 'D' wait:
>
> ===
> mkfs.ext2 D c10220f4 0 6233 6222
> [<c12194da>] io_schedule_timeout+0x1e/0x28
> [<c10454b4>] congestion_wait+0x62/0x7a
> [<c10402af>] get_dirty_limits+0x16a/0x172
> [<c104040b>] balance_dirty_pages+0x154/0x1be
> [<c103bda3>] generic_perform_write+0x168/0x18a
> [<c103be38>] generic_file_buffered_write+0x73/0x107
> [<c103c346>] __generic_file_aio_write_nolock+0x47a/0x4a5
> [<c103c3b9>] generic_file_aio_write_nolock+0x48/0x9b
> [<c105d2d6>] do_sync_write+0xbf/0xfc
> [<c105d3a0>] vfs_write+0x8d/0x108
> [<c105d4c3>] sys_write+0x41/0x67
> [<c100260a>] syscall_call+0x7/0xb
> ===

[edited out some bogus lines from stale stack]

> This machine and others have run numerous test runs on this kernel and
> this is the first time I've seen a hang like this.

I've been seeing something like that on 4-way PPC64: in my case I've
shells hanging in D state trying to append to kernel build log on ext3
(the builds themselves going on elsewhere, in tmpfs): one of the shells
holding i_mutex and stuck doing congestion_waits from balance_dirty_pages.

> I wonder if this is the ultimate cause of the couple of mainline hangs
> which were seen, but not diagnosed.

My *guess* is that this is peculiar to 2.6.23-rc6-mm1, and from Peter's
mm-per-device-dirty-threshold.patch. printks showed bdi_nr_reclaimable
0, bdi_nr_writeback 24, bdi_thresh 1 in balance_dirty_pages (though I've
not done enough to check if those really correlate with the hangs),
and I'm wondering if the bdi_stat_sum business is needed on the
!nr_reclaimable path.

So I'm running now with the patch below, good so far, but can't judge
until tomorrow whether it has actually addressed the problem seen.

Not-yet-Signed-off-by: Hugh Dickins <[EMAIL PROTECTED]>
---
 mm/page-writeback.c | 53 +++---
 1 file changed, 24 insertions(+), 29 deletions(-)

--- 2.6.23-rc6-mm1/mm/page-writeback.c  2007-09-18 12:28:25.0 +0100
+++ linux/mm/page-writeback.c   2007-09-19 20:00:46.0 +0100
@@ -379,7 +379,7 @@ static void balance_dirty_pages(struct a
         bdi_nr_reclaimable = bdi_stat(bdi, BDI_RECLAIMABLE);
         bdi_nr_writeback = bdi_stat(bdi, BDI_WRITEBACK);
         if (bdi_nr_reclaimable + bdi_nr_writeback <= bdi_thresh)
-                break;
+                break;

         if (!bdi->dirty_exceeded)
                 bdi->dirty_exceeded = 1;
@@ -392,39 +392,34 @@ static void balance_dirty_pages(struct a
          */
         if (bdi_nr_reclaimable) {
                 writeback_inodes(&wbc);
-
+                pages_written += write_chunk - wbc.nr_to_write;
                 get_dirty_limits(&background_thresh, &dirty_thresh,
                                  &bdi_thresh, bdi);
+        }

-                /*
-                 * In order to avoid the stacked BDI deadlock we need
-                 * to ensure we accurately count the 'dirty' pages when
-                 * the threshold is low.
-                 *
-                 * Otherwise it would be possible to get thresh+n pages
-                 * reported dirty, even though there are thresh-m pages
-                 * actually dirty; with m+n sitting in the percpu
-                 * deltas.
-                 */
-                if (bdi_thresh < 2*bdi_stat_error(bdi)) {
-                        bdi_nr_reclaimable =
-                                bdi_stat_sum(bdi, BDI_RECLAIMABLE);
-                        bdi_nr_writeback =
-                                bdi_stat_sum(bdi, BDI_WRITEBACK);
-                } else {
-                        bdi_nr_reclaimable =
-                                bdi_stat(bdi, BDI_RECLAIMABLE);
-                        bdi_nr_writeback =
-                                bdi_stat(bdi, BDI_WRITEBACK);
-                }
+        /*
+         * In order to avoid the stacked BDI deadlock we need
+         * to ensure we accurately count the 'dirty' pages when
+         * the threshold is low.
+         *
+         * Otherwise it would be possible to get thresh+n pages
+         * reported dirty, even though there are thresh-m pages
+         * actually dirty; with m+n sitting in the percpu
+         * deltas.
+         */
+        if (bdi_thresh < 2*bdi_stat_error(bdi)) {
+                bdi_nr_reclaimable = bdi_stat_sum(bdi, BDI_RECLAIMABLE);
+                bdi_nr_writeback = bdi_stat_sum(bdi, BDI_WRITEBACK);
+        } else if (bdi_nr_reclaimable) {
+                bdi_nr_reclaimable = bdi_stat(bdi, BDI_RECLAIMABLE);
+                bdi_nr_writeback = bdi_stat(bdi, BDI_WRITEBACK);
+        }
-
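The comment's "thresh+n pages reported dirty, even though there are thresh-m pages actually dirty; with m+n sitting in the percpu deltas" is easiest to see with a toy counter. The sketch below is illustration only, not the kernel's percpu_counter implementation, and its CPU count and batch size are made up: each CPU batches small updates locally and folds them into the shared total only once they reach the batch size, so the cheap read (playing the role of bdi_stat()) can keep reporting pages under writeback long after their completion has been counted on other CPUs. Only the full sum, the bdi_stat_sum() analogue, sees through it:

/*
 * Illustration only: a toy per-CPU counter.  Each CPU batches up to
 * BATCH events in a local delta before folding it into the shared
 * total, so a cheap read of the shared total can be off by as much as
 * NCPUS * BATCH in either direction -- the "m+n sitting in the percpu
 * deltas" from the comment in the patch.  NCPUS and BATCH are made up.
 */
#include <stdio.h>

#define NCPUS 4
#define BATCH 8

struct toy_counter {
        long shared;            /* what the cheap read returns */
        long delta[NCPUS];      /* per-CPU, not yet folded in */
};

static void toy_add(struct toy_counter *c, int cpu, long n)
{
        c->delta[cpu] += n;
        if (c->delta[cpu] >= BATCH || c->delta[cpu] <= -BATCH) {
                c->shared += c->delta[cpu];     /* fold into shared total */
                c->delta[cpu] = 0;
        }
}

static long toy_read(const struct toy_counter *c)       /* ~bdi_stat() */
{
        return c->shared;
}

static long toy_sum(const struct toy_counter *c)        /* ~bdi_stat_sum() */
{
        long sum = c->shared;
        for (int cpu = 0; cpu < NCPUS; cpu++)
                sum += c->delta[cpu];
        return sum;
}

int main(void)
{
        struct toy_counter writeback = { 0 };

        /* 24 pages go under writeback, counted on CPU 0 (big enough to fold). */
        toy_add(&writeback, 0, 24);
        /* Their completions are counted one page at a time across all CPUs,
         * in amounts that never reach the batch size, so they stay parked
         * in the per-CPU deltas. */
        for (int i = 0; i < 24; i++)
                toy_add(&writeback, i % NCPUS, -1);

        printf("approximate read: %ld\n", toy_read(&writeback)); /* 24 */
        printf("exact sum:        %ld\n", toy_sum(&writeback));  /* 0  */
        return 0;
}

With these made-up numbers the cheap read stays at 24 while the exact sum is already 0, so a bdi_thresh of 1 page can never be satisfied without summing the deltas, which is the situation the printks above describe.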
2.6.23-rc6-mm1 -- mkfs stuck in 'D'
Seems I have a case of a largish i386 NUMA (NUMA-Q) which has a mkfs
stuck in a 'D' wait:

===
mkfs.ext2 D c10220f4 0 6233 6222
       c344fc80 0082 0286 c10220f4 c344fc90 002ed099 c2963340 c2b9f640
       c142bce0 c2b9f640 c344fc90 002ed099 c344fcfc c344fcc0 c1219563 c1109bf2
       c344fcc4 c186e4d4 c186e4d4 002ed099 c1022612 c2b9f640 c186e000 c104000c
Call Trace:
 [<c10220f4>] lock_timer_base+0x19/0x35
 [<c1219563>] schedule_timeout+0x70/0x8d
 [<c1109bf2>] prop_fraction_single+0x37/0x5d
 [<c1022612>] process_timeout+0x0/0x5
 [<c104000c>] task_dirty_limit+0x3a/0xb5
 [<c12194da>] io_schedule_timeout+0x1e/0x28
 [<c10454b4>] congestion_wait+0x62/0x7a
 [<c102b021>] autoremove_wake_function+0x0/0x33
 [<c10402af>] get_dirty_limits+0x16a/0x172
 [<c102b021>] autoremove_wake_function+0x0/0x33
 [<c104040b>] balance_dirty_pages+0x154/0x1be
 [<c103bda3>] generic_perform_write+0x168/0x18a
 [<c103be38>] generic_file_buffered_write+0x73/0x107
 [<c103c346>] __generic_file_aio_write_nolock+0x47a/0x4a5
 [<c11b0fef>] do_sock_write+0x92/0x99
 [<c11b1048>] sock_aio_write+0x52/0x5e
 [<c103c3b9>] generic_file_aio_write_nolock+0x48/0x9b
 [<c105d2d6>] do_sync_write+0xbf/0xfc
 [<c102b021>] autoremove_wake_function+0x0/0x33
 [<c1010311>] do_page_fault+0x2cc/0x739
 [<c105d3a0>] vfs_write+0x8d/0x108
 [<c105d4c3>] sys_write+0x41/0x67
 [<c100260a>] syscall_call+0x7/0xb
===

This machine and others have run numerous test runs on this kernel and
this is the first time I've seen a hang like this.

I wonder if this is the ultimate cause of the couple of mainline hangs
which were seen, but not diagnosed.

-apw