Re: [PATCH v2 -mm] memcg: prevent from OOM with too many dirty pages

2012-07-16 Thread Michal Hocko
On Mon 16-07-12 01:10:47, Hugh Dickins wrote:
> On Thu, 12 Jul 2012, Michal Hocko wrote:
> > On Wed 11-07-12 18:57:43, Hugh Dickins wrote:
> > > 
> > > I mentioned in Johannes's [03/11] thread a couple of days ago, that
> > > I was having a problem with your wait_on_page_writeback() in mmotm.
> > > 
> > > It turns out that your original patch was fine, but you let dark angels
> > > whisper into your ear, to persuade you to remove the "&& may_enter_fs".
> > > 
> > > Part of my load builds kernels on extN over loop over tmpfs: loop does
> > > mapping_set_gfp_mask(mapping, lo->old_gfp_mask & ~(__GFP_IO|__GFP_FS))
> > > because it knows it will deadlock, if the loop thread enters reclaim,
> > > and reclaim tries to write back a dirty page, one which needs the loop
> > > thread to perform the write.
> > 
> > Good catch! I have totally missed the loop driver.
> > 
> > > With the may_enter_fs check restored, all is well.
> 
> Not as well as I thought when I wrote that: but those issues I'll deal
> with in separate mail (and my alternative patch was no better).
> 
> > > I don't entirely
> > > like your patch: I think it would be much better to wait in the same
> > > place as the wait_iff_congested(), when the pages gathered have been
> > > sent for writing and unlocked and putback and freed; 
> > 
> > I guess you mean
> > 	if (nr_writeback && nr_writeback >=
> > 			(nr_taken >> (DEF_PRIORITY - sc->priority)))
> > 		wait_iff_congested(zone, BLK_RW_ASYNC, HZ/10);
> 
> Yes, I've appended the patch I was meaning below; but although it's
> the way I had approached the issue, I don't in practice see any better
> behaviour from mine than from yours.  So unless a good reason appears
> later, to do it my way instead of yours, let's just forget about mine.

OK

> > I have tried to hook here but it has some issues. First of all we do not
> > know how long we should wait. Waiting for specific pages sounded more
> > event based and more precise.
> > 
> > We can surely do better but I wanted to stop the OOM first without any
> > other possible side effects on the global reclaim. I have tried to make
> > the band aid as simple as possible. Memcg dirty pages accounting is
> > already taking shape, so we are one (tiny) step closer to throttling.
> >  
> > > and I also wonder if it should go beyond the !global_reclaim case for
> > > swap pages, because they don't participate in dirty limiting.
> > 
> > Worth a separate patch?
> 
> If I could ever generate a suitable testcase, yes.  But in practice,
> the only way I've managed to generate such a preponderance of swapping
> over file reclaim, is by using memcgs, which your patch already catches.
> And if there actually is the swapping issue I suggest, then it's been
> around for a very long time, apparently without complaint.
> 
> Here is the patch I had in mind: I'm posting it as illustration, so we
> can look back to it in the archives if necessary; but it's definitely
> not signed-off, I've seen no practical advantage over yours, probably
> we just forget about this one below now.
> 
> But more mail to follow, returning to yours...
> 
> Hugh
> 
> p.s. KAMEZAWA-san, if you wonder why you're suddenly brought into this
> conversation, it's because there was a typo in your email address before.

Sorry, my fault. I misspelled the domain (jp.fujtisu.com).

> --- 3.5-rc6/vmscan.c	2012-06-03 06:42:11.000000000 -0700
> +++ linux/vmscan.c	2012-07-13 11:53:20.372087273 -0700
> @@ -675,7 +675,8 @@ static unsigned long shrink_page_list(st
> struct zone *zone,
> struct scan_control *sc,
> unsigned long *ret_nr_dirty,
> -   unsigned long *ret_nr_writeback)
> +   unsigned long *ret_nr_writeback,
> +   struct page **slow_page)
>  {
>   LIST_HEAD(ret_pages);
>   LIST_HEAD(free_pages);
> @@ -720,6 +721,27 @@ static unsigned long shrink_page_list(st
>   (PageSwapCache(page) && (sc->gfp_mask & __GFP_IO));
>  
>   if (PageWriteback(page)) {
> + /*
> +  * memcg doesn't have any dirty pages throttling so we
> +  * could easily OOM just because too many pages are in
> +  * writeback from reclaim and there is nothing else to
> +  * reclaim.  Nor is swap subject to dirty throttling.
> +  *
> +  * Check may_enter_fs, certainly because a loop driver
> +  * thread might enter reclaim, and deadlock if it waits
> +  * on a page for which it is needed to do the write
> +  * (loop masks off __GFP_IO|__GFP_FS for this reason);
> +  * but more thought would probably show more reasons.
> +  *
> +  * Just use one page per shrink for this: wait on its
> +  * writeback once we have done the rest.  If device is
> +  * slow, in due course we shall choose one of its pages.
> +  */
> + if (!*slow_page && may_enter_fs &&

Re: [PATCH v2 -mm] memcg: prevent from OOM with too many dirty pages

2012-07-16 Thread Hugh Dickins
On Fri, 13 Jul 2012, Michal Hocko wrote:
> On Thu 12-07-12 15:42:53, Hugh Dickins wrote:
> > On Thu, 12 Jul 2012, Andrew Morton wrote:
> > > 
> > > I wasn't planning on 3.5, given the way it's been churning around.
> > 
> > I don't know if you had been intending to send it in for 3.5 earlier;
> > but I'm sorry if my late intervention on may_enter_fs has delayed it.
> 
> Well, I should have investigated more when the question came up...
>  
> > > How about we put it into 3.6 and tag it for a -stable backport, so
> > > it gets a bit of a run in mainline before we inflict it upon -stable
> > > users?
> > 
> > That sounds good enough to me, but does fall short of Michal's hope.
> 
> I would be happier if it went into 3.5 already, because the problem (OOM
> on too many dirty pages) is real and long-standing (basically forever).
> We have had the patch in SLES11-SP2 for quite some time (the original one
> with the may_enter_fs check) and it has helped a lot.
> The patch was designed as a band aid primarily because it is very simple
> that way, with the hope that the real fix would come later.
> The decision is up to you, Andrew, but I vote for pushing it as soon as
> possible and trying to come up with something more clever for 3.6.

Once I got to trying dd in memcg to FS on USB stick, yes, I very much
agree that the problem is real and well worth fixing, and that your
patch takes us most of the way there.

But Andrew's caution has proved to be well founded: in the last
few days I've found several problems with it.

I guess it makes more sense to go into detail in the patch I'm about
to send, fixing up what is (I think) currently in mmotm.

But in brief: my insistence on may_enter_fs actually took us backwards
on ext4, because that does __GFP_NOFS page allocations when writing.
I still don't understand how this showed up in none of my testing at
the end of the week, and only hit me today (er, yesterday).  But not
as big a problem as I thought at first, because loop also turns off
__GFP_IO, so we can go by that instead.

And though I found your patch works most of the time, one in five
or ten attempts would OOM just as before: we actually have a problem
also with PageWriteback pages which are not PageReclaim, but the
answer is to mark those PageReclaim.

Patch follows separately in a moment.  I'm pretty happy with it now,
but I've not yet tried xfs, btrfs, vfat, tmpfs.  I notice now that
you specifically describe testing on ext3, but don't mention ext4:
I wonder if you got bogged down in the problems I've fixed on that.

Hugh
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2 -mm] memcg: prevent from OOM with too many dirty pages

2012-07-16 Thread Hugh Dickins
On Thu, 12 Jul 2012, Michal Hocko wrote:
> On Wed 11-07-12 18:57:43, Hugh Dickins wrote:
> > 
> > I mentioned in Johannes's [03/11] thread a couple of days ago, that
> > I was having a problem with your wait_on_page_writeback() in mmotm.
> > 
> > It turns out that your original patch was fine, but you let dark angels
> > whisper into your ear, to persuade you to remove the "&& may_enter_fs".
> > 
> > Part of my load builds kernels on extN over loop over tmpfs: loop does
> > mapping_set_gfp_mask(mapping, lo->old_gfp_mask & ~(__GFP_IO|__GFP_FS))
> > because it knows it will deadlock, if the loop thread enters reclaim,
> > and reclaim tries to write back a dirty page, one which needs the loop
> > thread to perform the write.
> 
> Good catch! I have totally missed the loop driver.
> 
> > With the may_enter_fs check restored, all is well.

Not as well as I thought when I wrote that: but those issues I'll deal
with in separate mail (and my alternative patch was no better).

> > I don't entirely
> > like your patch: I think it would be much better to wait in the same
> > place as the wait_iff_congested(), when the pages gathered have been
> > sent for writing and unlocked and putback and freed; 
> 
> I guess you mean
> 	if (nr_writeback && nr_writeback >=
> 			(nr_taken >> (DEF_PRIORITY - sc->priority)))
> 		wait_iff_congested(zone, BLK_RW_ASYNC, HZ/10);

Yes, I've appended the patch I was meaning below; but although it's
the way I had approached the issue, I don't in practice see any better
behaviour from mine than from yours.  So unless a good reason appears
later, to do it my way instead of yours, let's just forget about mine.

> 
> I have tried to hook here but it has some issues. First of all we do not
> know how long we should wait. Waiting for specific pages sounded more
> event based and more precise.
> 
> We can surely do better but I wanted to stop the OOM first without any
> other possible side effects on the global reclaim. I have tried to make
> the band aid as simple as possible. Memcg dirty pages accounting is
> already taking shape, so we are one (tiny) step closer to throttling.
>  
> > and I also wonder if it should go beyond the !global_reclaim case for
> > swap pages, because they don't participate in dirty limiting.
> 
> Worth a separate patch?

If I could ever generate a suitable testcase, yes.  But in practice,
the only way I've managed to generate such a preponderance of swapping
over file reclaim, is by using memcgs, which your patch already catches.
And if there actually is the swapping issue I suggest, then it's been
around for a very long time, apparently without complaint.

Here is the patch I had in mind: I'm posting it as illustration, so we
can look back to it in the archives if necessary; but it's definitely
not signed-off, I've seen no practical advantage over yours, probably
we just forget about this one below now.

But more mail to follow, returning to yours...

Hugh

p.s. KAMEZAWA-san, if you wonder why you're suddenly brought into this
conversation, it's because there was a typo in your email address before.

--- 3.5-rc6/vmscan.c	2012-06-03 06:42:11.000000000 -0700
+++ linux/vmscan.c	2012-07-13 11:53:20.372087273 -0700
@@ -675,7 +675,8 @@ static unsigned long shrink_page_list(st
  struct zone *zone,
  struct scan_control *sc,
  unsigned long *ret_nr_dirty,
- unsigned long *ret_nr_writeback)
+ unsigned long *ret_nr_writeback,
+ struct page **slow_page)
 {
LIST_HEAD(ret_pages);
LIST_HEAD(free_pages);
@@ -720,6 +721,27 @@ static unsigned long shrink_page_list(st
(PageSwapCache(page) && (sc->gfp_mask & __GFP_IO));
 
if (PageWriteback(page)) {
+   /*
+* memcg doesn't have any dirty pages throttling so we
+* could easily OOM just because too many pages are in
+* writeback from reclaim and there is nothing else to
+* reclaim.  Nor is swap subject to dirty throttling.
+*
+* Check may_enter_fs, certainly because a loop driver
+* thread might enter reclaim, and deadlock if it waits
+* on a page for which it is needed to do the write
+* (loop masks off __GFP_IO|__GFP_FS for this reason);
+* but more thought would probably show more reasons.
+*
+* Just use one page per shrink for this: wait on its
+* writeback once we have done the rest.  If device is
+* slow, in due course we shall choose one of its pages.
+  


Re: [PATCH v2 -mm] memcg: prevent from OOM with too many dirty pages

2012-07-13 Thread Michal Hocko
On Thu 12-07-12 15:42:53, Hugh Dickins wrote:
> On Thu, 12 Jul 2012, Andrew Morton wrote:
> > On Thu, 12 Jul 2012 09:05:01 +0200
> > Michal Hocko  wrote:
> > 
> > > When we are back to the patch. Is it going into 3.5? I hope so and I
> > > think it is really worth stable as well. Andrew?
> > 
> > What patch.   "memcg: prevent OOM with too many dirty pages"?
> 
> Yes.
> 
> > 
> > I wasn't planning on 3.5, given the way it's been churning around.
> 
> I don't know if you had been intending to send it in for 3.5 earlier;
> but I'm sorry if my late intervention on may_enter_fs has delayed it.

Well, I should have investigated more when the question came up...
 
> > How about we put it into 3.6 and tag it for a -stable backport, so
> > it gets a bit of a run in mainline before we inflict it upon -stable
> > users?
> 
> That sounds good enough to me, but does fall short of Michal's hope.

I would be happier if it went into 3.5 already, because the problem (OOM
on too many dirty pages) is real and long-standing (basically forever).
We have had the patch in SLES11-SP2 for quite some time (the original one
with the may_enter_fs check) and it has helped a lot.
The patch was designed as a band aid primarily because it is very simple
that way, with the hope that the real fix would come later.
The decision is up to you, Andrew, but I vote for pushing it as soon as
possible and trying to come up with something more clever for 3.6.

> 
> Hugh

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9
Czech Republic


Re: [PATCH v2 -mm] memcg: prevent from OOM with too many dirty pages

2012-07-12 Thread Hugh Dickins
On Thu, 12 Jul 2012, Andrew Morton wrote:
> On Thu, 12 Jul 2012 09:05:01 +0200
> Michal Hocko  wrote:
> 
> > When we are back to the patch. Is it going into 3.5? I hope so and I
> > think it is really worth stable as well. Andrew?
> 
> What patch.   "memcg: prevent OOM with too many dirty pages"?

Yes.

> 
> I wasn't planning on 3.5, given the way it's been churning around.

I don't know if you had been intending to send it in for 3.5 earlier;
but I'm sorry if my late intervention on may_enter_fs has delayed it.

> How
> about we put it into 3.6 and tag it for a -stable backport, so it gets
> a bit of a run in mainline before we inflict it upon -stable users?

That sounds good enough to me, but does fall short of Michal's hope.

Hugh


Re: [PATCH v2 -mm] memcg: prevent from OOM with too many dirty pages

2012-07-12 Thread Andrew Morton
On Thu, 12 Jul 2012 09:05:01 +0200
Michal Hocko  wrote:

> When we are back to the patch. Is it going into 3.5? I hope so and I
> think it is really worth stable as well. Andrew?

What patch.   "memcg: prevent OOM with too many dirty pages"?

I wasn't planning on 3.5, given the way it's been churning around.  How
about we put it into 3.6 and tag it for a -stable backport, so it gets
a bit of a run in mainline before we inflict it upon -stable users?



Re: [PATCH v2 -mm] memcg: prevent from OOM with too many dirty pages

2012-07-12 Thread Michal Hocko
On Wed 11-07-12 18:57:43, Hugh Dickins wrote:
> Hi Michal,

Hi,

> 
> On Wed, 20 Jun 2012, Michal Hocko wrote:
> > Hi Andrew,
> > here is an updated version if it is easier for you to drop the previous
> > one.
> > changes since v1
> > * added Mel's Reviewed-by
> > * updated changelog as per Andrew
> > * updated the condition to be optimized for no-memcg case
> 
> I mentioned in Johannes's [03/11] thread a couple of days ago, that
> I was having a problem with your wait_on_page_writeback() in mmotm.
> 
> It turns out that your original patch was fine, but you let dark angels
> whisper into your ear, to persuade you to remove the "&& may_enter_fs".
> 
> Part of my load builds kernels on extN over loop over tmpfs: loop does
> mapping_set_gfp_mask(mapping, lo->old_gfp_mask & ~(__GFP_IO|__GFP_FS))
> because it knows it will deadlock, if the loop thread enters reclaim,
> and reclaim tries to write back a dirty page, one which needs the loop
> thread to perform the write.

Good catch! I have totally missed the loop driver.

> With the may_enter_fs check restored, all is well.  I don't entirely
> like your patch: I think it would be much better to wait in the same
> place as the wait_iff_congested(), when the pages gathered have been
> sent for writing and unlocked and putback and freed; 

I guess you mean
	if (nr_writeback && nr_writeback >=
			(nr_taken >> (DEF_PRIORITY - sc->priority)))
		wait_iff_congested(zone, BLK_RW_ASYNC, HZ/10);

I have tried to hook here but it has some issues. First of all we do not
know how long we should wait. Waiting for specific pages sounded more
event based and more precise.

We can surely do better but I wanted to stop the OOM first without any
other possible side effects on the global reclaim. I have tried to make
the band aid as simple as possible. Memcg dirty pages accounting is
already taking shape, so we are one (tiny) step closer to throttling.
 
> and I also wonder if it should go beyond the !global_reclaim case for
> swap pages, because they don't participate in dirty limiting.

Worth a separate patch?

> But those are things I should investigate later - I did write a patch
> like that before, when I was having some unexpected OOM trouble with a
> private kernel; but my OOMs then were because of something silly that
> I'd left out, and I'm not at present sure if we have a problem in this
> regard or not.
> 
> The important thing is to get the may_enter_fs back into your patch:
> I can't really Sign-off the below because it's yours, but
> Acked-by: Hugh Dickins 

Thanks a lot Hugh!

When we are back to the patch. Is it going into 3.5? I hope so and I
think it is really worth stable as well. Andrew?

> ---
> 
>  mm/vmscan.c |3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> --- 3.5-rc6-mm1/mm/vmscan.c   2012-07-11 14:42:13.668335884 -0700
> +++ linux/mm/vmscan.c 2012-07-11 16:01:20.712814127 -0700
> @@ -726,7 +726,8 @@ static unsigned long shrink_page_list(st
>* writeback from reclaim and there is nothing else to
>* reclaim.
>*/
> - if (!global_reclaim(sc) && PageReclaim(page))
> + if (!global_reclaim(sc) && PageReclaim(page) &&
> + may_enter_fs)
>   wait_on_page_writeback(page);
>   else {
>   nr_writeback++;

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9
Czech Republic
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2 -mm] memcg: prevent from OOM with too many dirty pages

2012-07-12 Thread Andrew Morton
On Thu, 12 Jul 2012 09:05:01 +0200
Michal Hocko <mho...@suse.cz> wrote:

> When we are back to the patch. Is it going into 3.5? I hope so and I
> think it is really worth stable as well. Andrew?

What patch.   "memcg: prevent OOM with too many dirty pages"?

I wasn't planning on 3.5, given the way it's been churning around.  How
about we put it into 3.6 and tag it for a -stable backport, so it gets
a bit of a run in mainline before we inflict it upon -stable users?



Re: [PATCH v2 -mm] memcg: prevent from OOM with too many dirty pages

2012-07-12 Thread Hugh Dickins
On Thu, 12 Jul 2012, Andrew Morton wrote:
> On Thu, 12 Jul 2012 09:05:01 +0200
> Michal Hocko <mho...@suse.cz> wrote:
> 
> > When we are back to the patch. Is it going into 3.5? I hope so and I
> > think it is really worth stable as well. Andrew?
> 
> What patch.   "memcg: prevent OOM with too many dirty pages"?

Yes.

> 
> I wasn't planning on 3.5, given the way it's been churning around.

I don't know if you had been intending to send it in for 3.5 earlier;
but I'm sorry if my late intervention on may_enter_fs has delayed it.

> How
> about we put it into 3.6 and tag it for a -stable backport, so it gets
> a bit of a run in mainline before we inflict it upon -stable users?

That sounds good enough to me, but does fall short of Michal's hope.

Hugh


Re: [PATCH v2 -mm] memcg: prevent from OOM with too many dirty pages

2012-07-11 Thread Hugh Dickins
On Wed, 11 Jul 2012, Andrew Morton wrote:
> On Wed, 11 Jul 2012 18:57:43 -0700 (PDT) Hugh Dickins  
> wrote:
> 
> > --- 3.5-rc6-mm1/mm/vmscan.c 2012-07-11 14:42:13.668335884 -0700
> > +++ linux/mm/vmscan.c   2012-07-11 16:01:20.712814127 -0700
> > @@ -726,7 +726,8 @@ static unsigned long shrink_page_list(st
> >  * writeback from reclaim and there is nothing else to
> >  * reclaim.
> >  */
> > -   if (!global_reclaim(sc) && PageReclaim(page))
> > +   if (!global_reclaim(sc) && PageReclaim(page) &&
> > +   may_enter_fs)
> > wait_on_page_writeback(page);
> > else {
> > nr_writeback++;
> 
> um, that may_enter_fs test got removed because nobody knew why it was
> there.  Nobody knew why it was there because it was undocumented.  Do
> you see where I'm going with this?

I was hoping you might do that bit ;)  Here's my display of ignorance:

--- 3.5-rc6-mm1/mm/vmscan.c 2012-07-11 14:42:13.668335884 -0700
+++ linux/mm/vmscan.c   2012-07-11 20:09:33.182829986 -0700
@@ -725,8 +725,15 @@ static unsigned long shrink_page_list(st
 * could easily OOM just because too many pages are in
 * writeback from reclaim and there is nothing else to
 * reclaim.
+*
+* Check may_enter_fs, certainly because a loop driver
+* thread might enter reclaim, and deadlock if it waits
+* on a page for which it is needed to do the write
+* (loop masks off __GFP_IO|__GFP_FS for this reason);
+* but more thought would probably show more reasons.
 */
-   if (!global_reclaim(sc) && PageReclaim(page))
+   if (!global_reclaim(sc) && PageReclaim(page) &&
+   may_enter_fs)
wait_on_page_writeback(page);
else {
nr_writeback++;


Re: [PATCH v2 -mm] memcg: prevent from OOM with too many dirty pages

2012-07-11 Thread Andrew Morton
On Wed, 11 Jul 2012 18:57:43 -0700 (PDT) Hugh Dickins  wrote:

> --- 3.5-rc6-mm1/mm/vmscan.c   2012-07-11 14:42:13.668335884 -0700
> +++ linux/mm/vmscan.c 2012-07-11 16:01:20.712814127 -0700
> @@ -726,7 +726,8 @@ static unsigned long shrink_page_list(st
>* writeback from reclaim and there is nothing else to
>* reclaim.
>*/
> - if (!global_reclaim(sc) && PageReclaim(page))
> + if (!global_reclaim(sc) && PageReclaim(page) &&
> + may_enter_fs)
>   wait_on_page_writeback(page);
>   else {
>   nr_writeback++;

um, that may_enter_fs test got removed because nobody knew why it was
there.  Nobody knew why it was there because it was undocumented.  Do
you see where I'm going with this?



Re: [PATCH v2 -mm] memcg: prevent from OOM with too many dirty pages

2012-07-11 Thread Hugh Dickins
Hi Michal,

On Wed, 20 Jun 2012, Michal Hocko wrote:
> Hi Andrew,
> here is an updated version if it is easier for you to drop the previous
> one.
> changes since v1
> * added Mel's Reviewed-by
> * updated changelog as per Andrew
> * updated the condition to be optimized for no-memcg case

I mentioned in Johannes's [03/11] thread a couple of days ago, that
I was having a problem with your wait_on_page_writeback() in mmotm.

It turns out that your original patch was fine, but you let dark angels
whisper into your ear, to persuade you to remove the "&& may_enter_fs".

Part of my load builds kernels on extN over loop over tmpfs: loop does
mapping_set_gfp_mask(mapping, lo->old_gfp_mask & ~(__GFP_IO|__GFP_FS))
because it knows it will deadlock, if the loop thread enters reclaim,
and reclaim tries to write back a dirty page, one which needs the loop
thread to perform the write.

With the may_enter_fs check restored, all is well.  I don't entirely
like your patch: I think it would be much better to wait in the same
place as the wait_iff_congested(), when the pages gathered have been
sent for writing and unlocked and putback and freed; and I also wonder
if it should go beyond the !global_reclaim case for swap pages, because
they don't participate in dirty limiting.

But those are things I should investigate later - I did write a patch
like that before, when I was having some unexpected OOM trouble with a
private kernel; but my OOMs then were because of something silly that
I'd left out, and I'm not at present sure if we have a problem in this
regard or not.

The important thing is to get the may_enter_fs back into your patch:
I can't really Sign-off the below because it's yours, but
Acked-by: Hugh Dickins 
---

 mm/vmscan.c |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- 3.5-rc6-mm1/mm/vmscan.c 2012-07-11 14:42:13.668335884 -0700
+++ linux/mm/vmscan.c   2012-07-11 16:01:20.712814127 -0700
@@ -726,7 +726,8 @@ static unsigned long shrink_page_list(st
 * writeback from reclaim and there is nothing else to
 * reclaim.
 */
-   if (!global_reclaim(sc) && PageReclaim(page))
+   if (!global_reclaim(sc) && PageReclaim(page) &&
+   may_enter_fs)
wait_on_page_writeback(page);
else {
nr_writeback++;

