Re: [PATCH RFC] mm/memcg: recalculate chargeable space after waiting migrating charges

2012-07-12 Thread Wanpeng Li
On Thu, Jul 12, 2012 at 02:29:38PM +0200, Michal Hocko wrote:
>On Thu 12-07-12 19:51:25, Wanpeng Li wrote:
>> On Thu, Jul 12, 2012 at 01:08:38PM +0200, Michal Hocko wrote:
>> >On Thu 12-07-12 18:39:21, Wanpeng Li wrote:
>> >> From: Wanpeng Li 
>> >> 
>> >> Function mem_cgroup_do_charge will call mem_cgroup_reclaim,
>> >> there are two break points in mem_cgroup_reclaim:
>> >> if (total && (flag & MEM_CGROUP_RECLAIM_SHIRINK))
>> >>   break;
>> >> if (mem_cgroup_margin(memcg))
>> >>   break;
>> >> so mem_cgroup_reclaim can't guarantee reclaim enough pages(nr_pages) 
>> >> which is requested from mem_cgroup_do_charge, if mem_cgroup_margin
>> >> (mem_over_limit) >= nr_pages is not true, the process will go to
>> >> mem_cgroup_wait_acct_move to wait doubly charge counted caused by
>> >> task move. 
>> >
>> >I am sorry but I have no idea what you are trying to say. The
>> >mem_cgroup_wait_acct_move just makes sure that we are waiting until
>> >charge is moved (which can potentially free some charges) rather than
>> >OOM which should be the last resort so it makes sense to retry them
>> >charge.
>> >
>> >> But this time still can't guarantee enough pages(nr_pages) is
>> >> ready, directly return CHARGE_RETRY is incorret. 
>> >
>> >So you think it is better to oom? Why? What prevents you from a race
>> >that your mem_cgroup_margin returns true but another CPU consumes those
>> >charges right after that. See? The check is pointless. It doesn't
>> 
>> Hmm, if there are a race as you mentioned it can't guarantee enough pages 
>> is ready. 
>
>And there is no point in guaranteeing anything which I tried to tell you
>by the example... The only thing that matters is whether we get the charge
>on the next attempt and if not whether we are able to reclaim something.
>See?
>
>> But it also means that available memory is too low if this
>> race happen. If available charges still less than nr_pages
>> after mem_cgroup_wait_acct_move(which can potentially
>> free some charges) return, the CHAGE_RETRY will trigged,
>> and then mem_cgroup_do_charge=>meory_cgroup_reclaim
>> =>mem_cgroup_wait_acct_move, if available charges still less than
>> nr_pages in this round, CHAGE_RETRY.
>
>> To avoid this infinite retry when available memory 
>
>I do not see a realistic scenario which would cause this to be infinite loop
>withou OOM jumping in.
>We would have to hit the wait for move after each reclaim and the move would
>have to keep the the usage constant (move is really fast without moving
>charges).
>So what you are trying to address (if I understand it at all) is to fix
>an almost impossible to trigger issue with a bogus change which doesn't
>help at all because it is racy as well.

OK. Thank you Michal! :-)

Thanks & Best Regards,
Wanpeng Li

>
>> in this memcg is very low, go to OOM if mem_cgroup_margin(mem_over_limit) 
>> < nr_pages is a better way I think. Because the codes have already try
>> its best to reclaim some pages. :-)
>
>
>> 
>[...]
>-- 
>Michal Hocko
>SUSE Labs
>SUSE LINUX s.r.o.
>Lihovarska 1060/12
>190 00 Praha 9
>Czech Republic
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH RFC] mm/memcg: recalculate chargeable space after waiting migrating charges

2012-07-12 Thread Michal Hocko
On Thu 12-07-12 19:51:25, Wanpeng Li wrote:
> On Thu, Jul 12, 2012 at 01:08:38PM +0200, Michal Hocko wrote:
> >On Thu 12-07-12 18:39:21, Wanpeng Li wrote:
> >> From: Wanpeng Li 
> >> 
> >> Function mem_cgroup_do_charge will call mem_cgroup_reclaim,
> >> there are two break points in mem_cgroup_reclaim:
> >> if (total && (flag & MEM_CGROUP_RECLAIM_SHIRINK))
> >>break;
> >> if (mem_cgroup_margin(memcg))
> >>break;
> >> so mem_cgroup_reclaim can't guarantee reclaim enough pages(nr_pages) 
> >> which is requested from mem_cgroup_do_charge, if mem_cgroup_margin
> >> (mem_over_limit) >= nr_pages is not true, the process will go to
> >> mem_cgroup_wait_acct_move to wait doubly charge counted caused by
> >> task move. 
> >
> >I am sorry but I have no idea what you are trying to say. The
> >mem_cgroup_wait_acct_move just makes sure that we are waiting until
> >charge is moved (which can potentially free some charges) rather than
> >OOM which should be the last resort so it makes sense to retry them
> >charge.
> >
> >> But this time still can't guarantee enough pages(nr_pages) is
> >> ready, directly return CHARGE_RETRY is incorret. 
> >
> >So you think it is better to oom? Why? What prevents you from a race
> >that your mem_cgroup_margin returns true but another CPU consumes those
> >charges right after that. See? The check is pointless. It doesn't
> 
> Hmm, if there are a race as you mentioned it can't guarantee enough pages 
> is ready. 

And there is no point in guaranteeing anything which I tried to tell you
by the example... The only thing that matters is whether we get the charge
on the next attempt and if not whether we are able to reclaim something.
See?

> But it also means that available memory is too low if this
> race happen. If available charges still less than nr_pages
> after mem_cgroup_wait_acct_move(which can potentially
> free some charges) return, the CHAGE_RETRY will trigged,
> and then mem_cgroup_do_charge=>meory_cgroup_reclaim
> =>mem_cgroup_wait_acct_move, if available charges still less than
> nr_pages in this round, CHAGE_RETRY.

> To avoid this infinite retry when available memory 

I do not see a realistic scenario which would cause this to be infinite loop
withou OOM jumping in.
We would have to hit the wait for move after each reclaim and the move would
have to keep the the usage constant (move is really fast without moving
charges).
So what you are trying to address (if I understand it at all) is to fix
an almost impossible to trigger issue with a bogus change which doesn't
help at all because it is racy as well.

> in this memcg is very low, go to OOM if mem_cgroup_margin(mem_over_limit) 
> < nr_pages is a better way I think. Because the codes have already try
> its best to reclaim some pages. :-)


> 
[...]
-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9
Czech Republic
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH RFC] mm/memcg: recalculate chargeable space after waiting migrating charges

2012-07-12 Thread Wanpeng Li
On Thu, Jul 12, 2012 at 01:08:38PM +0200, Michal Hocko wrote:
>On Thu 12-07-12 18:39:21, Wanpeng Li wrote:
>> From: Wanpeng Li 
>> 
>> Function mem_cgroup_do_charge will call mem_cgroup_reclaim,
>> there are two break points in mem_cgroup_reclaim:
>> if (total && (flag & MEM_CGROUP_RECLAIM_SHIRINK))
>>  break;
>> if (mem_cgroup_margin(memcg))
>>  break;
>> so mem_cgroup_reclaim can't guarantee reclaim enough pages(nr_pages) 
>> which is requested from mem_cgroup_do_charge, if mem_cgroup_margin
>> (mem_over_limit) >= nr_pages is not true, the process will go to
>> mem_cgroup_wait_acct_move to wait doubly charge counted caused by
>> task move. 
>
>I am sorry but I have no idea what you are trying to say. The
>mem_cgroup_wait_acct_move just makes sure that we are waiting until
>charge is moved (which can potentially free some charges) rather than
>OOM which should be the last resort so it makes sense to retry them
>charge.
>
>> But this time still can't guarantee enough pages(nr_pages) is
>> ready, directly return CHARGE_RETRY is incorret. 
>
>So you think it is better to oom? Why? What prevents you from a race
>that your mem_cgroup_margin returns true but another CPU consumes those
>charges right after that. See? The check is pointless. It doesn't

Hmm, if there are a race as you mentioned it can't guarantee enough pages 
is ready. But it also means that available memory is too low if this 
race happen. If available charges still less than nr_pages after 
mem_cgroup_wait_acct_move(which can potentially free some charges) return,
the CHAGE_RETRY will trigged, and then 
mem_cgroup_do_charge=>meory_cgroup_reclaim
=>mem_cgroup_wait_acct_move, if available charges still less than nr_pages in 
this round, CHAGE_RETRY. To avoid this infinite retry when available memory 
in this memcg is very low , go to OOM if mem_cgroup_margin(mem_over_limit) 
< nr_pages is a better way I think. Because the codes have already try
its best to reclaim some pages. :-)

>guarantee anything.
>
>> We should add a check to confirm enough pages is ready, otherwise go
>> to oom.
>
>What you are trying to fix? How have you tested it? Why do you think it
>makes any sense at all?
>You are just changing the behavior without any rationale. And that's a
>_no_go_!
>
>> 
>> Signed-off-by: Wanpeng Li 
>> ---
>>  mm/memcontrol.c |3 ++-
>>  1 files changed, 2 insertions(+), 1 deletions(-)
>> 
>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> index f72b5e5..4ae3848 100644
>> --- a/mm/memcontrol.c
>> +++ b/mm/memcontrol.c
>> @@ -2210,7 +2210,8 @@ static int mem_cgroup_do_charge(struct mem_cgroup 
>> *memcg, gfp_t gfp_mask,
>>   * At task move, charge accounts can be doubly counted. So, it's
>>   * better to wait until the end of task_move if something is going on.
>>   */
>> -if (mem_cgroup_wait_acct_move(mem_over_limit))
>> +if (mem_cgroup_wait_acct_move(mem_over_limit)
>> +&& mem_cgroup_margin(mem_over_limit) >= nr_pages)
>>  return CHARGE_RETRY;
>>  
>>  /* If we don't need to call oom-killer at el, return immediately */
>> -- 
>> 1.7.5.4
>> 
>
>-- 
>Michal Hocko
>SUSE Labs
>SUSE LINUX s.r.o.
>Lihovarska 1060/12
>190 00 Praha 9
>Czech Republic
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH RFC] mm/memcg: recalculate chargeable space after waiting migrating charges

2012-07-12 Thread Michal Hocko
On Thu 12-07-12 18:39:21, Wanpeng Li wrote:
> From: Wanpeng Li 
> 
> Function mem_cgroup_do_charge will call mem_cgroup_reclaim,
> there are two break points in mem_cgroup_reclaim:
> if (total && (flag & MEM_CGROUP_RECLAIM_SHIRINK))
>   break;
> if (mem_cgroup_margin(memcg))
>   break;
> so mem_cgroup_reclaim can't guarantee reclaim enough pages(nr_pages) 
> which is requested from mem_cgroup_do_charge, if mem_cgroup_margin
> (mem_over_limit) >= nr_pages is not true, the process will go to
> mem_cgroup_wait_acct_move to wait doubly charge counted caused by
> task move. 

I am sorry but I have no idea what you are trying to say. The
mem_cgroup_wait_acct_move just makes sure that we are waiting until
charge is moved (which can potentially free some charges) rather than
OOM which should be the last resort so it makes sense to retry the
charge.

> But this time still can't guarantee enough pages(nr_pages) is
> ready, directly return CHARGE_RETRY is incorret. 

So you think it is better to oom? Why? What prevents you from a race
that your mem_cgroup_margin returns true but another CPU consumes those
charges right after that. See? The check is pointless. It doesn't
guarantee anything.

> We should add a check to confirm enough pages is ready, otherwise go
> to oom.

What you are trying to fix? How have you tested it? Why do you think it
makes any sense at all?
You are just changing the behavior without any rationale. And that's a
_no_go_!

> 
> Signed-off-by: Wanpeng Li 
> ---
>  mm/memcontrol.c |3 ++-
>  1 files changed, 2 insertions(+), 1 deletions(-)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index f72b5e5..4ae3848 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -2210,7 +2210,8 @@ static int mem_cgroup_do_charge(struct mem_cgroup 
> *memcg, gfp_t gfp_mask,
>* At task move, charge accounts can be doubly counted. So, it's
>* better to wait until the end of task_move if something is going on.
>*/
> - if (mem_cgroup_wait_acct_move(mem_over_limit))
> + if (mem_cgroup_wait_acct_move(mem_over_limit)
> + && mem_cgroup_margin(mem_over_limit) >= nr_pages)
>   return CHARGE_RETRY;
>  
>   /* If we don't need to call oom-killer at el, return immediately */
> -- 
> 1.7.5.4
> 

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9
Czech Republic
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH RFC] mm/memcg: recalculate chargeable space after waiting migrating charges

2012-07-12 Thread Michal Hocko
On Thu 12-07-12 18:39:21, Wanpeng Li wrote:
 From: Wanpeng Li l...@linux.vnet.ibm.com
 
 Function mem_cgroup_do_charge will call mem_cgroup_reclaim,
 there are two break points in mem_cgroup_reclaim:
 if (total  (flag  MEM_CGROUP_RECLAIM_SHIRINK))
   break;
 if (mem_cgroup_margin(memcg))
   break;
 so mem_cgroup_reclaim can't guarantee reclaim enough pages(nr_pages) 
 which is requested from mem_cgroup_do_charge, if mem_cgroup_margin
 (mem_over_limit) = nr_pages is not true, the process will go to
 mem_cgroup_wait_acct_move to wait doubly charge counted caused by
 task move. 

I am sorry but I have no idea what you are trying to say. The
mem_cgroup_wait_acct_move just makes sure that we are waiting until
charge is moved (which can potentially free some charges) rather than
OOM which should be the last resort so it makes sense to retry the
charge.

 But this time still can't guarantee enough pages(nr_pages) is
 ready, directly return CHARGE_RETRY is incorret. 

So you think it is better to oom? Why? What prevents you from a race
that your mem_cgroup_margin returns true but another CPU consumes those
charges right after that. See? The check is pointless. It doesn't
guarantee anything.

 We should add a check to confirm enough pages is ready, otherwise go
 to oom.

What you are trying to fix? How have you tested it? Why do you think it
makes any sense at all?
You are just changing the behavior without any rationale. And that's a
_no_go_!

 
 Signed-off-by: Wanpeng Li liwp.li...@gmail.com
 ---
  mm/memcontrol.c |3 ++-
  1 files changed, 2 insertions(+), 1 deletions(-)
 
 diff --git a/mm/memcontrol.c b/mm/memcontrol.c
 index f72b5e5..4ae3848 100644
 --- a/mm/memcontrol.c
 +++ b/mm/memcontrol.c
 @@ -2210,7 +2210,8 @@ static int mem_cgroup_do_charge(struct mem_cgroup 
 *memcg, gfp_t gfp_mask,
* At task move, charge accounts can be doubly counted. So, it's
* better to wait until the end of task_move if something is going on.
*/
 - if (mem_cgroup_wait_acct_move(mem_over_limit))
 + if (mem_cgroup_wait_acct_move(mem_over_limit)
 +  mem_cgroup_margin(mem_over_limit) = nr_pages)
   return CHARGE_RETRY;
  
   /* If we don't need to call oom-killer at el, return immediately */
 -- 
 1.7.5.4
 

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9
Czech Republic
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH RFC] mm/memcg: recalculate chargeable space after waiting migrating charges

2012-07-12 Thread Wanpeng Li
On Thu, Jul 12, 2012 at 01:08:38PM +0200, Michal Hocko wrote:
On Thu 12-07-12 18:39:21, Wanpeng Li wrote:
 From: Wanpeng Li l...@linux.vnet.ibm.com
 
 Function mem_cgroup_do_charge will call mem_cgroup_reclaim,
 there are two break points in mem_cgroup_reclaim:
 if (total  (flag  MEM_CGROUP_RECLAIM_SHIRINK))
  break;
 if (mem_cgroup_margin(memcg))
  break;
 so mem_cgroup_reclaim can't guarantee reclaim enough pages(nr_pages) 
 which is requested from mem_cgroup_do_charge, if mem_cgroup_margin
 (mem_over_limit) = nr_pages is not true, the process will go to
 mem_cgroup_wait_acct_move to wait doubly charge counted caused by
 task move. 

I am sorry but I have no idea what you are trying to say. The
mem_cgroup_wait_acct_move just makes sure that we are waiting until
charge is moved (which can potentially free some charges) rather than
OOM which should be the last resort so it makes sense to retry them
charge.

 But this time still can't guarantee enough pages(nr_pages) is
 ready, directly return CHARGE_RETRY is incorret. 

So you think it is better to oom? Why? What prevents you from a race
that your mem_cgroup_margin returns true but another CPU consumes those
charges right after that. See? The check is pointless. It doesn't

Hmm, if there are a race as you mentioned it can't guarantee enough pages 
is ready. But it also means that available memory is too low if this 
race happen. If available charges still less than nr_pages after 
mem_cgroup_wait_acct_move(which can potentially free some charges) return,
the CHAGE_RETRY will trigged, and then 
mem_cgroup_do_charge=meory_cgroup_reclaim
=mem_cgroup_wait_acct_move, if available charges still less than nr_pages in 
this round, CHAGE_RETRY. To avoid this infinite retry when available memory 
in this memcg is very low , go to OOM if mem_cgroup_margin(mem_over_limit) 
 nr_pages is a better way I think. Because the codes have already try
its best to reclaim some pages. :-)

guarantee anything.

 We should add a check to confirm enough pages is ready, otherwise go
 to oom.

What you are trying to fix? How have you tested it? Why do you think it
makes any sense at all?
You are just changing the behavior without any rationale. And that's a
_no_go_!

 
 Signed-off-by: Wanpeng Li liwp.li...@gmail.com
 ---
  mm/memcontrol.c |3 ++-
  1 files changed, 2 insertions(+), 1 deletions(-)
 
 diff --git a/mm/memcontrol.c b/mm/memcontrol.c
 index f72b5e5..4ae3848 100644
 --- a/mm/memcontrol.c
 +++ b/mm/memcontrol.c
 @@ -2210,7 +2210,8 @@ static int mem_cgroup_do_charge(struct mem_cgroup 
 *memcg, gfp_t gfp_mask,
   * At task move, charge accounts can be doubly counted. So, it's
   * better to wait until the end of task_move if something is going on.
   */
 -if (mem_cgroup_wait_acct_move(mem_over_limit))
 +if (mem_cgroup_wait_acct_move(mem_over_limit)
 + mem_cgroup_margin(mem_over_limit) = nr_pages)
  return CHARGE_RETRY;
  
  /* If we don't need to call oom-killer at el, return immediately */
 -- 
 1.7.5.4
 

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9
Czech Republic
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH RFC] mm/memcg: recalculate chargeable space after waiting migrating charges

2012-07-12 Thread Michal Hocko
On Thu 12-07-12 19:51:25, Wanpeng Li wrote:
 On Thu, Jul 12, 2012 at 01:08:38PM +0200, Michal Hocko wrote:
 On Thu 12-07-12 18:39:21, Wanpeng Li wrote:
  From: Wanpeng Li l...@linux.vnet.ibm.com
  
  Function mem_cgroup_do_charge will call mem_cgroup_reclaim,
  there are two break points in mem_cgroup_reclaim:
  if (total  (flag  MEM_CGROUP_RECLAIM_SHIRINK))
 break;
  if (mem_cgroup_margin(memcg))
 break;
  so mem_cgroup_reclaim can't guarantee reclaim enough pages(nr_pages) 
  which is requested from mem_cgroup_do_charge, if mem_cgroup_margin
  (mem_over_limit) = nr_pages is not true, the process will go to
  mem_cgroup_wait_acct_move to wait doubly charge counted caused by
  task move. 
 
 I am sorry but I have no idea what you are trying to say. The
 mem_cgroup_wait_acct_move just makes sure that we are waiting until
 charge is moved (which can potentially free some charges) rather than
 OOM which should be the last resort so it makes sense to retry them
 charge.
 
  But this time still can't guarantee enough pages(nr_pages) is
  ready, directly return CHARGE_RETRY is incorret. 
 
 So you think it is better to oom? Why? What prevents you from a race
 that your mem_cgroup_margin returns true but another CPU consumes those
 charges right after that. See? The check is pointless. It doesn't
 
 Hmm, if there are a race as you mentioned it can't guarantee enough pages 
 is ready. 

And there is no point in guaranteeing anything which I tried to tell you
by the example... The only thing that matters is whether we get the charge
on the next attempt and if not whether we are able to reclaim something.
See?

 But it also means that available memory is too low if this
 race happen. If available charges still less than nr_pages
 after mem_cgroup_wait_acct_move(which can potentially
 free some charges) return, the CHAGE_RETRY will trigged,
 and then mem_cgroup_do_charge=meory_cgroup_reclaim
 =mem_cgroup_wait_acct_move, if available charges still less than
 nr_pages in this round, CHAGE_RETRY.

 To avoid this infinite retry when available memory 

I do not see a realistic scenario which would cause this to be infinite loop
withou OOM jumping in.
We would have to hit the wait for move after each reclaim and the move would
have to keep the the usage constant (move is really fast without moving
charges).
So what you are trying to address (if I understand it at all) is to fix
an almost impossible to trigger issue with a bogus change which doesn't
help at all because it is racy as well.

 in this memcg is very low, go to OOM if mem_cgroup_margin(mem_over_limit) 
  nr_pages is a better way I think. Because the codes have already try
 its best to reclaim some pages. :-)


 
[...]
-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9
Czech Republic
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH RFC] mm/memcg: recalculate chargeable space after waiting migrating charges

2012-07-12 Thread Wanpeng Li
On Thu, Jul 12, 2012 at 02:29:38PM +0200, Michal Hocko wrote:
On Thu 12-07-12 19:51:25, Wanpeng Li wrote:
 On Thu, Jul 12, 2012 at 01:08:38PM +0200, Michal Hocko wrote:
 On Thu 12-07-12 18:39:21, Wanpeng Li wrote:
  From: Wanpeng Li l...@linux.vnet.ibm.com
  
  Function mem_cgroup_do_charge will call mem_cgroup_reclaim,
  there are two break points in mem_cgroup_reclaim:
  if (total  (flag  MEM_CGROUP_RECLAIM_SHIRINK))
break;
  if (mem_cgroup_margin(memcg))
break;
  so mem_cgroup_reclaim can't guarantee reclaim enough pages(nr_pages) 
  which is requested from mem_cgroup_do_charge, if mem_cgroup_margin
  (mem_over_limit) = nr_pages is not true, the process will go to
  mem_cgroup_wait_acct_move to wait doubly charge counted caused by
  task move. 
 
 I am sorry but I have no idea what you are trying to say. The
 mem_cgroup_wait_acct_move just makes sure that we are waiting until
 charge is moved (which can potentially free some charges) rather than
 OOM which should be the last resort so it makes sense to retry them
 charge.
 
  But this time still can't guarantee enough pages(nr_pages) is
  ready, directly return CHARGE_RETRY is incorret. 
 
 So you think it is better to oom? Why? What prevents you from a race
 that your mem_cgroup_margin returns true but another CPU consumes those
 charges right after that. See? The check is pointless. It doesn't
 
 Hmm, if there are a race as you mentioned it can't guarantee enough pages 
 is ready. 

And there is no point in guaranteeing anything which I tried to tell you
by the example... The only thing that matters is whether we get the charge
on the next attempt and if not whether we are able to reclaim something.
See?

 But it also means that available memory is too low if this
 race happen. If available charges still less than nr_pages
 after mem_cgroup_wait_acct_move(which can potentially
 free some charges) return, the CHAGE_RETRY will trigged,
 and then mem_cgroup_do_charge=meory_cgroup_reclaim
 =mem_cgroup_wait_acct_move, if available charges still less than
 nr_pages in this round, CHAGE_RETRY.

 To avoid this infinite retry when available memory 

I do not see a realistic scenario which would cause this to be infinite loop
withou OOM jumping in.
We would have to hit the wait for move after each reclaim and the move would
have to keep the the usage constant (move is really fast without moving
charges).
So what you are trying to address (if I understand it at all) is to fix
an almost impossible to trigger issue with a bogus change which doesn't
help at all because it is racy as well.

OK. Thank you Michal! :-)

Thanks  Best Regards,
Wanpeng Li


 in this memcg is very low, go to OOM if mem_cgroup_margin(mem_over_limit) 
  nr_pages is a better way I think. Because the codes have already try
 its best to reclaim some pages. :-)


 
[...]
-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9
Czech Republic
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/