Re: [PATCH] mm/memcg: set memcg when split pages

2021-03-03 Thread Zhouguanghui (OS Kernel)
在 2021/3/3 15:46, Michal Hocko 写道:
> On Tue 02-03-21 17:56:07, Johannes Weiner wrote:
>> On Tue, Mar 02, 2021 at 12:24:41PM -0800, Hugh Dickins wrote:
>>> On Tue, 2 Mar 2021, Michal Hocko wrote:
 [Cc Johannes for awareness and fixup Nick's email]

 On Tue 02-03-21 01:34:51, Zhou Guanghui wrote:
> When split page, the memory cgroup info recorded in first page is
> not copied to tail pages. In this case, when the tail pages are
> freed, the uncharge operation is not performed. As a result, the
> usage of this memcg keeps increasing, and the OOM may occur.
>
> So, the copying of first page's memory cgroup info to tail pages
> is needed when split page.

 I was not aware that alloc_pages_exact is used for accounted allocations
 but git grep told me otherwise so this is not a theoretical one. Both
 users (arm64 and s390 kvm) are quite recent AFAICS. split_page is also
 used in dma allocator but I got lost in indirection so I have no idea
 whether there are any users there.
>>>
>>> Yes, it's a bit worrying that such a low-level thing as split_page()
>>> can now get caught up in memcg accounting, but I suppose that's okay.
>>>
>>> I feel rather strongly that whichever way it is done, THP splitting
>>> and split_page() should use the same interface to memcg.
>>>
>>> And a look at mem_cgroup_split_huge_fixup() suggests that nowadays
>>> there need to be css_get()s too - or better, a css_get_many().
>>>
>>> Its #ifdef CONFIG_TRANSPARENT_HUGEPAGE should be removed, rename
>>> it mem_cgroup_split_page_fixup(), and take order from caller.
>>
>> +1
>>
>> There is already a split_page_owner() in both these places as well
>> which does a similar thing. Mabye we can match that by calling it
>> split_page_memcg() and having it take a nr of pages?
> 
> Sounds good to me.
> 
  Hi, Michal, Johannes, Hugh, and Zi Yan, thank you for taking time for 
this.

I agree, and will send v2 patches for taking these.

Thanks


Re: [PATCH] mm/memcg: set memcg when split pages

2021-03-03 Thread Michal Hocko
On Tue 02-03-21 17:56:07, Johannes Weiner wrote:
> On Tue, Mar 02, 2021 at 12:24:41PM -0800, Hugh Dickins wrote:
> > On Tue, 2 Mar 2021, Michal Hocko wrote:
> > > [Cc Johannes for awareness and fixup Nick's email]
> > > 
> > > On Tue 02-03-21 01:34:51, Zhou Guanghui wrote:
> > > > When split page, the memory cgroup info recorded in first page is
> > > > not copied to tail pages. In this case, when the tail pages are
> > > > freed, the uncharge operation is not performed. As a result, the
> > > > usage of this memcg keeps increasing, and the OOM may occur.
> > > > 
> > > > So, the copying of first page's memory cgroup info to tail pages
> > > > is needed when split page.
> > > 
> > > I was not aware that alloc_pages_exact is used for accounted allocations
> > > but git grep told me otherwise so this is not a theoretical one. Both
> > > users (arm64 and s390 kvm) are quite recent AFAICS. split_page is also
> > > used in dma allocator but I got lost in indirection so I have no idea
> > > whether there are any users there.
> > 
> > Yes, it's a bit worrying that such a low-level thing as split_page()
> > can now get caught up in memcg accounting, but I suppose that's okay.
> > 
> > I feel rather strongly that whichever way it is done, THP splitting
> > and split_page() should use the same interface to memcg.
> > 
> > And a look at mem_cgroup_split_huge_fixup() suggests that nowadays
> > there need to be css_get()s too - or better, a css_get_many().
> > 
> > Its #ifdef CONFIG_TRANSPARENT_HUGEPAGE should be removed, rename
> > it mem_cgroup_split_page_fixup(), and take order from caller.
> 
> +1
> 
> There is already a split_page_owner() in both these places as well
> which does a similar thing. Mabye we can match that by calling it
> split_page_memcg() and having it take a nr of pages?

Sounds good to me.

-- 
Michal Hocko
SUSE Labs


Re: [PATCH] mm/memcg: set memcg when split pages

2021-03-03 Thread Hugh Dickins
On Tue, 2 Mar 2021, Johannes Weiner wrote:
> On Tue, Mar 02, 2021 at 12:24:41PM -0800, Hugh Dickins wrote:
> > On Tue, 2 Mar 2021, Michal Hocko wrote:
> > > [Cc Johannes for awareness and fixup Nick's email]
> > > 
> > > On Tue 02-03-21 01:34:51, Zhou Guanghui wrote:
> > > > When split page, the memory cgroup info recorded in first page is
> > > > not copied to tail pages. In this case, when the tail pages are
> > > > freed, the uncharge operation is not performed. As a result, the
> > > > usage of this memcg keeps increasing, and the OOM may occur.
> > > > 
> > > > So, the copying of first page's memory cgroup info to tail pages
> > > > is needed when split page.
> > > 
> > > I was not aware that alloc_pages_exact is used for accounted allocations
> > > but git grep told me otherwise so this is not a theoretical one. Both
> > > users (arm64 and s390 kvm) are quite recent AFAICS. split_page is also
> > > used in dma allocator but I got lost in indirection so I have no idea
> > > whether there are any users there.
> > 
> > Yes, it's a bit worrying that such a low-level thing as split_page()
> > can now get caught up in memcg accounting, but I suppose that's okay.
> > 
> > I feel rather strongly that whichever way it is done, THP splitting
> > and split_page() should use the same interface to memcg.
> > 
> > And a look at mem_cgroup_split_huge_fixup() suggests that nowadays
> > there need to be css_get()s too - or better, a css_get_many().
> > 
> > Its #ifdef CONFIG_TRANSPARENT_HUGEPAGE should be removed, rename
> > it mem_cgroup_split_page_fixup(), and take order from caller.
> 
> +1
> 
> There is already a split_page_owner() in both these places as well
> which does a similar thing. Mabye we can match that by calling it
> split_page_memcg() and having it take a nr of pages?

Agreed on both counts :) "fixup" was not an inspiring name.

> 
> > Though I've never much liked that separate pass: would it be
> > better page by page, like this copy_page_memcg() does?  Though
> > mem_cgroup_disabled() and css_getting make that less appealing.
> 
> Agreed on both counts. mem_cgroup_disabled() is a jump label and would
> be okay, IMO, but the refcounting - though it is (usually) per-cpu -
> adds at least two branches and rcu read locking.


Re: [PATCH] mm/memcg: set memcg when split pages

2021-03-03 Thread Johannes Weiner
On Tue, Mar 02, 2021 at 12:24:41PM -0800, Hugh Dickins wrote:
> On Tue, 2 Mar 2021, Michal Hocko wrote:
> > [Cc Johannes for awareness and fixup Nick's email]
> > 
> > On Tue 02-03-21 01:34:51, Zhou Guanghui wrote:
> > > When split page, the memory cgroup info recorded in first page is
> > > not copied to tail pages. In this case, when the tail pages are
> > > freed, the uncharge operation is not performed. As a result, the
> > > usage of this memcg keeps increasing, and the OOM may occur.
> > > 
> > > So, the copying of first page's memory cgroup info to tail pages
> > > is needed when split page.
> > 
> > I was not aware that alloc_pages_exact is used for accounted allocations
> > but git grep told me otherwise so this is not a theoretical one. Both
> > users (arm64 and s390 kvm) are quite recent AFAICS. split_page is also
> > used in dma allocator but I got lost in indirection so I have no idea
> > whether there are any users there.
> 
> Yes, it's a bit worrying that such a low-level thing as split_page()
> can now get caught up in memcg accounting, but I suppose that's okay.
> 
> I feel rather strongly that whichever way it is done, THP splitting
> and split_page() should use the same interface to memcg.
> 
> And a look at mem_cgroup_split_huge_fixup() suggests that nowadays
> there need to be css_get()s too - or better, a css_get_many().
> 
> Its #ifdef CONFIG_TRANSPARENT_HUGEPAGE should be removed, rename
> it mem_cgroup_split_page_fixup(), and take order from caller.

+1

There is already a split_page_owner() in both these places as well
which does a similar thing. Mabye we can match that by calling it
split_page_memcg() and having it take a nr of pages?

> Though I've never much liked that separate pass: would it be
> better page by page, like this copy_page_memcg() does?  Though
> mem_cgroup_disabled() and css_getting make that less appealing.

Agreed on both counts. mem_cgroup_disabled() is a jump label and would
be okay, IMO, but the refcounting - though it is (usually) per-cpu -
adds at least two branches and rcu read locking.


Re: [PATCH] mm/memcg: set memcg when split pages

2021-03-02 Thread Hugh Dickins
On Tue, 2 Mar 2021, Michal Hocko wrote:
> [Cc Johannes for awareness and fixup Nick's email]
> 
> On Tue 02-03-21 01:34:51, Zhou Guanghui wrote:
> > When split page, the memory cgroup info recorded in first page is
> > not copied to tail pages. In this case, when the tail pages are
> > freed, the uncharge operation is not performed. As a result, the
> > usage of this memcg keeps increasing, and the OOM may occur.
> > 
> > So, the copying of first page's memory cgroup info to tail pages
> > is needed when split page.
> 
> I was not aware that alloc_pages_exact is used for accounted allocations
> but git grep told me otherwise so this is not a theoretical one. Both
> users (arm64 and s390 kvm) are quite recent AFAICS. split_page is also
> used in dma allocator but I got lost in indirection so I have no idea
> whether there are any users there.

Yes, it's a bit worrying that such a low-level thing as split_page()
can now get caught up in memcg accounting, but I suppose that's okay.

I feel rather strongly that whichever way it is done, THP splitting
and split_page() should use the same interface to memcg.

And a look at mem_cgroup_split_huge_fixup() suggests that nowadays
there need to be css_get()s too - or better, a css_get_many().

Its #ifdef CONFIG_TRANSPARENT_HUGEPAGE should be removed, rename
it mem_cgroup_split_page_fixup(), and take order from caller.

Though I've never much liked that separate pass: would it be
better page by page, like this copy_page_memcg() does?  Though
mem_cgroup_disabled() and css_getting make that less appealing.

Hugh

> 
> The page itself looks reasonable to me.
> 
> > Signed-off-by: Zhou Guanghui 
> 
> Acked-by: Michal Hocko 
> 
> Minor nit
> 
> > ---
> >  include/linux/memcontrol.h | 10 ++
> >  mm/page_alloc.c|  4 +++-
> >  2 files changed, 13 insertions(+), 1 deletion(-)
> > 
> > diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> > index e6dc793d587d..c7e2b4421dc1 100644
> > --- a/include/linux/memcontrol.h
> > +++ b/include/linux/memcontrol.h
> > @@ -867,6 +867,12 @@ void mem_cgroup_print_oom_group(struct mem_cgroup 
> > *memcg);
> >  extern bool cgroup_memory_noswap;
> >  #endif
> >  
> > +static inline void copy_page_memcg(struct page *dst, struct page *src)
> > +{
> > +   if (src->memcg_data)
> > +   dst->memcg_data = src->memcg_data;
> 
> I would just drop the test. The struct page is a single cache line which
> is dirty by the reference count so another store will unlikely be
> noticeable even when NULL is stored here and you safe a conditional.
> 
> > +}
> > +
> >  struct mem_cgroup *lock_page_memcg(struct page *page);
> >  void __unlock_page_memcg(struct mem_cgroup *memcg);
> >  void unlock_page_memcg(struct page *page);
> > @@ -1291,6 +1297,10 @@ mem_cgroup_print_oom_meminfo(struct mem_cgroup 
> > *memcg)
> >  {
> >  }
> >  
> > +static inline void copy_page_memcg(struct page *dst, struct page *src)
> > +{
> > +}
> > +
> >  static inline struct mem_cgroup *lock_page_memcg(struct page *page)
> >  {
> > return NULL;
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index 3e4b29ee2b1e..ee0a63dc1c9b 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -3307,8 +3307,10 @@ void split_page(struct page *page, unsigned int 
> > order)
> > VM_BUG_ON_PAGE(PageCompound(page), page);
> > VM_BUG_ON_PAGE(!page_count(page), page);
> >  
> > -   for (i = 1; i < (1 << order); i++)
> > +   for (i = 1; i < (1 << order); i++) {
> > set_page_refcounted(page + i);
> > +   copy_page_memcg(page + i, page);
> > +   }
> > split_page_owner(page, 1 << order);
> >  }
> >  EXPORT_SYMBOL_GPL(split_page);
> > -- 
> > 2.25.0
> > 
> 
> -- 
> Michal Hocko
> SUSE Labs


Re: [PATCH] mm/memcg: set memcg when split pages

2021-03-02 Thread Michal Hocko
On Tue 02-03-21 10:17:03, Michal Hocko wrote:
> [Cc Johannes for awareness and fixup Nick's email]
> 
> On Tue 02-03-21 01:34:51, Zhou Guanghui wrote:
> > When split page, the memory cgroup info recorded in first page is
> > not copied to tail pages. In this case, when the tail pages are
> > freed, the uncharge operation is not performed. As a result, the
> > usage of this memcg keeps increasing, and the OOM may occur.
> > 
> > So, the copying of first page's memory cgroup info to tail pages
> > is needed when split page.
> 
> I was not aware that alloc_pages_exact is used for accounted allocations
> but git grep told me otherwise so this is not a theoretical one. Both
> users (arm64 and s390 kvm) are quite recent AFAICS. split_page is also
> used in dma allocator but I got lost in indirection so I have no idea
> whether there are any users there.
> 
> The page itself looks reasonable to me.
> 
> > Signed-off-by: Zhou Guanghui 
> 
> Acked-by: Michal Hocko 
> 
> Minor nit
> 
> > ---
> >  include/linux/memcontrol.h | 10 ++
> >  mm/page_alloc.c|  4 +++-
> >  2 files changed, 13 insertions(+), 1 deletion(-)
> > 
> > diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> > index e6dc793d587d..c7e2b4421dc1 100644
> > --- a/include/linux/memcontrol.h
> > +++ b/include/linux/memcontrol.h
> > @@ -867,6 +867,12 @@ void mem_cgroup_print_oom_group(struct mem_cgroup 
> > *memcg);
> >  extern bool cgroup_memory_noswap;
> >  #endif
> >  
> > +static inline void copy_page_memcg(struct page *dst, struct page *src)
> > +{
> > +   if (src->memcg_data)
> > +   dst->memcg_data = src->memcg_data;
> 
> I would just drop the test. The struct page is a single cache line which
> is dirty by the reference count so another store will unlikely be
> noticeable even when NULL is stored here and you safe a conditional.

Disregard this. As Zi Yan mentioned in other reply, we need to keep the
check and take a css reference along with transfering the memcg.

-- 
Michal Hocko
SUSE Labs


Re: [PATCH] mm/memcg: set memcg when split pages

2021-03-02 Thread Michal Hocko
On Tue 02-03-21 10:37:13, Zi Yan wrote:
[...]
> I have a question on copy_page_memcg above. By reading 
> __memcg_kmem_charge_page
> and __memcg_kmem_uncharge_page, it seems to me that every single page requires
> a css_get(>css) at charge time and a css_put(>css) at uncharge 
> time.
> But your copy_page_memcg does not do css_get for split subpages. Will it cause
> memcg->css underflow when subpages are uncharged?

yes, well spotted. I have completely missed that. This will also discard
my comment on testing the memcg.


-- 
Michal Hocko
SUSE Labs


Re: [PATCH] mm/memcg: set memcg when split pages

2021-03-02 Thread Zi Yan
On 2 Mar 2021, at 2:05, Zhouguanghui (OS Kernel) wrote:

> 在 2021/3/2 10:00, Zi Yan 写道:
>> On 1 Mar 2021, at 20:34, Zhou Guanghui wrote:
>>
>>> When split page, the memory cgroup info recorded in first page is
>>> not copied to tail pages. In this case, when the tail pages are
>>> freed, the uncharge operation is not performed. As a result, the
>>> usage of this memcg keeps increasing, and the OOM may occur.
>>>
>>> So, the copying of first page's memory cgroup info to tail pages
>>> is needed when split page.
>>>
>>> Signed-off-by: Zhou Guanghui 
>>> ---
>>>   include/linux/memcontrol.h | 10 ++
>>>   mm/page_alloc.c|  4 +++-
>>>   2 files changed, 13 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
>>> index e6dc793d587d..c7e2b4421dc1 100644
>>> --- a/include/linux/memcontrol.h
>>> +++ b/include/linux/memcontrol.h
>>> @@ -867,6 +867,12 @@ void mem_cgroup_print_oom_group(struct mem_cgroup 
>>> *memcg);
>>>   extern bool cgroup_memory_noswap;
>>>   #endif
>>>
>>> +static inline void copy_page_memcg(struct page *dst, struct page *src)
>>> +{
>>> +   if (src->memcg_data)
>>> +   dst->memcg_data = src->memcg_data;
>>> +}
>>> +
>>>   struct mem_cgroup *lock_page_memcg(struct page *page);
>>>   void __unlock_page_memcg(struct mem_cgroup *memcg);
>>>   void unlock_page_memcg(struct page *page);
>>> @@ -1291,6 +1297,10 @@ mem_cgroup_print_oom_meminfo(struct mem_cgroup 
>>> *memcg)
>>>   {
>>>   }
>>>
>>> +static inline void copy_page_memcg(struct page *dst, struct page *src)
>>> +{
>>> +}
>>> +
>>>   static inline struct mem_cgroup *lock_page_memcg(struct page *page)
>>>   {
>>> return NULL;
>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>> index 3e4b29ee2b1e..ee0a63dc1c9b 100644
>>> --- a/mm/page_alloc.c
>>> +++ b/mm/page_alloc.c
>>> @@ -3307,8 +3307,10 @@ void split_page(struct page *page, unsigned int 
>>> order)
>>> VM_BUG_ON_PAGE(PageCompound(page), page);
>>> VM_BUG_ON_PAGE(!page_count(page), page);
>>>
>>> -   for (i = 1; i < (1 << order); i++)
>>> +   for (i = 1; i < (1 << order); i++) {
>>> set_page_refcounted(page + i);
>>> +   copy_page_memcg(page + i, page);
>>> +   }
>>> split_page_owner(page, 1 << order);
>>>   }
>>>   EXPORT_SYMBOL_GPL(split_page);
>>> -- 
>>> 2.25.0
>>
>> +memcg maintainers
>>
>> split_page() is used for non-compound higher-order pages. I am not sure
>> if there is any such pages monitored by memcg. Please let me know
>> if I miss anything.
>
> Thank you for taking time for this.
>
> This should be put in kmemcg, and I'll modify it.
>
> When the kmemcg is enabled and _GFP_ACCOUNT is set, the charged and
> uncharged sizes do not match when alloc/free_pages_exact method is used
> to apply for or free memory with exact size. This is because memcg data
> of the tail page is not set during the split page.

Thanks for your clarification. I missed kmemcg.

I have a question on copy_page_memcg above. By reading __memcg_kmem_charge_page
and __memcg_kmem_uncharge_page, it seems to me that every single page requires
a css_get(>css) at charge time and a css_put(>css) at uncharge 
time.
But your copy_page_memcg does not do css_get for split subpages. Will it cause
memcg->css underflow when subpages are uncharged?


—
Best Regards,
Yan Zi


signature.asc
Description: OpenPGP digital signature


Re: [PATCH] mm/memcg: set memcg when split pages

2021-03-02 Thread Michal Hocko
[Cc Johannes for awareness and fixup Nick's email]

On Tue 02-03-21 01:34:51, Zhou Guanghui wrote:
> When split page, the memory cgroup info recorded in first page is
> not copied to tail pages. In this case, when the tail pages are
> freed, the uncharge operation is not performed. As a result, the
> usage of this memcg keeps increasing, and the OOM may occur.
> 
> So, the copying of first page's memory cgroup info to tail pages
> is needed when split page.

I was not aware that alloc_pages_exact is used for accounted allocations
but git grep told me otherwise so this is not a theoretical one. Both
users (arm64 and s390 kvm) are quite recent AFAICS. split_page is also
used in dma allocator but I got lost in indirection so I have no idea
whether there are any users there.

The page itself looks reasonable to me.

> Signed-off-by: Zhou Guanghui 

Acked-by: Michal Hocko 

Minor nit

> ---
>  include/linux/memcontrol.h | 10 ++
>  mm/page_alloc.c|  4 +++-
>  2 files changed, 13 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index e6dc793d587d..c7e2b4421dc1 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -867,6 +867,12 @@ void mem_cgroup_print_oom_group(struct mem_cgroup 
> *memcg);
>  extern bool cgroup_memory_noswap;
>  #endif
>  
> +static inline void copy_page_memcg(struct page *dst, struct page *src)
> +{
> + if (src->memcg_data)
> + dst->memcg_data = src->memcg_data;

I would just drop the test. The struct page is a single cache line which
is dirty by the reference count so another store will unlikely be
noticeable even when NULL is stored here and you safe a conditional.

> +}
> +
>  struct mem_cgroup *lock_page_memcg(struct page *page);
>  void __unlock_page_memcg(struct mem_cgroup *memcg);
>  void unlock_page_memcg(struct page *page);
> @@ -1291,6 +1297,10 @@ mem_cgroup_print_oom_meminfo(struct mem_cgroup *memcg)
>  {
>  }
>  
> +static inline void copy_page_memcg(struct page *dst, struct page *src)
> +{
> +}
> +
>  static inline struct mem_cgroup *lock_page_memcg(struct page *page)
>  {
>   return NULL;
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 3e4b29ee2b1e..ee0a63dc1c9b 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3307,8 +3307,10 @@ void split_page(struct page *page, unsigned int order)
>   VM_BUG_ON_PAGE(PageCompound(page), page);
>   VM_BUG_ON_PAGE(!page_count(page), page);
>  
> - for (i = 1; i < (1 << order); i++)
> + for (i = 1; i < (1 << order); i++) {
>   set_page_refcounted(page + i);
> + copy_page_memcg(page + i, page);
> + }
>   split_page_owner(page, 1 << order);
>  }
>  EXPORT_SYMBOL_GPL(split_page);
> -- 
> 2.25.0
> 

-- 
Michal Hocko
SUSE Labs


Re: [PATCH] mm/memcg: set memcg when split pages

2021-03-02 Thread Zhouguanghui (OS Kernel)
在 2021/3/2 10:00, Zi Yan 写道:
> On 1 Mar 2021, at 20:34, Zhou Guanghui wrote:
> 
>> When split page, the memory cgroup info recorded in first page is
>> not copied to tail pages. In this case, when the tail pages are
>> freed, the uncharge operation is not performed. As a result, the
>> usage of this memcg keeps increasing, and the OOM may occur.
>>
>> So, the copying of first page's memory cgroup info to tail pages
>> is needed when split page.
>>
>> Signed-off-by: Zhou Guanghui 
>> ---
>>   include/linux/memcontrol.h | 10 ++
>>   mm/page_alloc.c|  4 +++-
>>   2 files changed, 13 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
>> index e6dc793d587d..c7e2b4421dc1 100644
>> --- a/include/linux/memcontrol.h
>> +++ b/include/linux/memcontrol.h
>> @@ -867,6 +867,12 @@ void mem_cgroup_print_oom_group(struct mem_cgroup 
>> *memcg);
>>   extern bool cgroup_memory_noswap;
>>   #endif
>>
>> +static inline void copy_page_memcg(struct page *dst, struct page *src)
>> +{
>> +if (src->memcg_data)
>> +dst->memcg_data = src->memcg_data;
>> +}
>> +
>>   struct mem_cgroup *lock_page_memcg(struct page *page);
>>   void __unlock_page_memcg(struct mem_cgroup *memcg);
>>   void unlock_page_memcg(struct page *page);
>> @@ -1291,6 +1297,10 @@ mem_cgroup_print_oom_meminfo(struct mem_cgroup *memcg)
>>   {
>>   }
>>
>> +static inline void copy_page_memcg(struct page *dst, struct page *src)
>> +{
>> +}
>> +
>>   static inline struct mem_cgroup *lock_page_memcg(struct page *page)
>>   {
>>  return NULL;
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index 3e4b29ee2b1e..ee0a63dc1c9b 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -3307,8 +3307,10 @@ void split_page(struct page *page, unsigned int order)
>>  VM_BUG_ON_PAGE(PageCompound(page), page);
>>  VM_BUG_ON_PAGE(!page_count(page), page);
>>
>> -for (i = 1; i < (1 << order); i++)
>> +for (i = 1; i < (1 << order); i++) {
>>  set_page_refcounted(page + i);
>> +copy_page_memcg(page + i, page);
>> +}
>>  split_page_owner(page, 1 << order);
>>   }
>>   EXPORT_SYMBOL_GPL(split_page);
>> -- 
>> 2.25.0
> 
> +memcg maintainers
> 
> split_page() is used for non-compound higher-order pages. I am not sure
> if there is any such pages monitored by memcg. Please let me know
> if I miss anything.

Thank you for taking time for this.

This should be put in kmemcg, and I'll modify it.

When the kmemcg is enabled and _GFP_ACCOUNT is set, the charged and 
uncharged sizes do not match when alloc/free_pages_exact method is used 
to apply for or free memory with exact size. This is because memcg data 
of the tail page is not set during the split page.



Re: [PATCH] mm/memcg: set memcg when split pages

2021-03-01 Thread Zi Yan
On 1 Mar 2021, at 20:34, Zhou Guanghui wrote:

> When split page, the memory cgroup info recorded in first page is
> not copied to tail pages. In this case, when the tail pages are
> freed, the uncharge operation is not performed. As a result, the
> usage of this memcg keeps increasing, and the OOM may occur.
>
> So, the copying of first page's memory cgroup info to tail pages
> is needed when split page.
>
> Signed-off-by: Zhou Guanghui 
> ---
>  include/linux/memcontrol.h | 10 ++
>  mm/page_alloc.c|  4 +++-
>  2 files changed, 13 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index e6dc793d587d..c7e2b4421dc1 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -867,6 +867,12 @@ void mem_cgroup_print_oom_group(struct mem_cgroup 
> *memcg);
>  extern bool cgroup_memory_noswap;
>  #endif
>
> +static inline void copy_page_memcg(struct page *dst, struct page *src)
> +{
> + if (src->memcg_data)
> + dst->memcg_data = src->memcg_data;
> +}
> +
>  struct mem_cgroup *lock_page_memcg(struct page *page);
>  void __unlock_page_memcg(struct mem_cgroup *memcg);
>  void unlock_page_memcg(struct page *page);
> @@ -1291,6 +1297,10 @@ mem_cgroup_print_oom_meminfo(struct mem_cgroup *memcg)
>  {
>  }
>
> +static inline void copy_page_memcg(struct page *dst, struct page *src)
> +{
> +}
> +
>  static inline struct mem_cgroup *lock_page_memcg(struct page *page)
>  {
>   return NULL;
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 3e4b29ee2b1e..ee0a63dc1c9b 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3307,8 +3307,10 @@ void split_page(struct page *page, unsigned int order)
>   VM_BUG_ON_PAGE(PageCompound(page), page);
>   VM_BUG_ON_PAGE(!page_count(page), page);
>
> - for (i = 1; i < (1 << order); i++)
> + for (i = 1; i < (1 << order); i++) {
>   set_page_refcounted(page + i);
> + copy_page_memcg(page + i, page);
> + }
>   split_page_owner(page, 1 << order);
>  }
>  EXPORT_SYMBOL_GPL(split_page);
> -- 
> 2.25.0

+memcg maintainers

split_page() is used for non-compound higher-order pages. I am not sure
if there is any such pages monitored by memcg. Please let me know
if I miss anything.

—
Best Regards,
Yan Zi


signature.asc
Description: OpenPGP digital signature