Re: [Devel] [PATCH rh7 v2] mm/memcg: fix cache growth above cache.limit_in_bytes

2020-07-30 Thread Denis V. Lunev
On 7/30/20 6:57 PM, Evgenii Shatokhin wrote:
> On 30.07.2020 17:00, Denis V. Lunev wrote:
>> On 7/30/20 4:58 PM, Andrey Ryabinin wrote:
>>> Exceeding cache.limit_in_bytes schedules high_work_func(), which
>>> tries to reclaim only 32 pages. If cache is generated fast enough,
>>> this allows the cgroup to steadily grow above cache.limit_in_bytes,
>>> because we don't reclaim enough. Try to reclaim the exceeded amount
>>> of cache instead.
>>>
>>> https://jira.sw.ru/browse/PSBM-106384
>>> Signed-off-by: Andrey Ryabinin 
>>> ---
>>>
>>> Changes since v1: add bug link to changelog
>>>
>>>   mm/memcontrol.c | 10 +++---
>>>   1 file changed, 7 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>>> index 3cf200f506c3..e5adb0e81cbb 100644
>>> --- a/mm/memcontrol.c
>>> +++ b/mm/memcontrol.c
>>> @@ -3080,12 +3080,16 @@ static void reclaim_high(struct mem_cgroup *memcg,
>>>  {
>>>  	do {
>>> +		unsigned long cache_overused;
>>> +
>>>  		if (page_counter_read(&memcg->memory) > memcg->high)
>>>  			try_to_free_mem_cgroup_pages(memcg, nr_pages, gfp_mask, 0);
>>>  
>>> -		if (page_counter_read(&memcg->cache) > memcg->cache.limit)
>>> -			try_to_free_mem_cgroup_pages(memcg, nr_pages, gfp_mask,
>>> -					MEM_CGROUP_RECLAIM_NOSWAP);
>>> +		cache_overused = page_counter_read(&memcg->cache) -
>>> +				 memcg->cache.limit;
>>> +		if (cache_overused)
>>> +			try_to_free_mem_cgroup_pages(memcg, cache_overused,
>>> +					gfp_mask, MEM_CGROUP_RECLAIM_NOSWAP);
>>>  
>>>  	} while ((memcg = parent_mem_cgroup(memcg)));
>>>  }
>> can we run some testing and after that create a custom RK to check with
>> HostEurope on Monday?
>
> 1. Which kernel version(s)?

151.14

>
> 2. Would it be enough to prepare the live patch as a .ko file and load
> it manually ('kpatch load'), or is an RPM package preferable?
>

sure

> If you plan to use the fix on more than one node, I think an RPM
> package is easier to use. For a single node, a *.ko file would be enough.
>
one node - for test purposes

Den


Re: [Devel] [PATCH rh7 v4] mm/memcg: fix cache growth above cache.limit_in_bytes

2020-07-30 Thread Andrey Ryabinin


On 7/30/20 6:52 PM, Evgenii Shatokhin wrote:
> Hi,
> 
> On 30.07.2020 18:02, Andrey Ryabinin wrote:
>> Exceeding cache.limit_in_bytes schedules high_work_func(), which tries
>> to reclaim only 32 pages. If cache is generated fast enough, this allows
>> the cgroup to steadily grow above cache.limit_in_bytes, because we don't
>> reclaim enough. Try to reclaim the exceeded amount of cache instead.
>>
>> https://jira.sw.ru/browse/PSBM-106384
>> Signed-off-by: Andrey Ryabinin 
>> ---
>>
>>   - Changes since v1: add bug link to changelog
>>   - Changes since v2: fix the cache_overused check (we should check that
>> it is positive). Made this stupid bug during cleanup; the patch was
>> tested without the bogus cleanup, so it should work.
>>   - Changes since v3: compilation fixes, properly tested now.
>>
>>   mm/memcontrol.c | 10 +++---
>>   1 file changed, 7 insertions(+), 3 deletions(-)
>>
>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> index 3cf200f506c3..16cbd451a588 100644
>> --- a/mm/memcontrol.c
>> +++ b/mm/memcontrol.c
>> @@ -3080,12 +3080,16 @@ static void reclaim_high(struct mem_cgroup *memcg,
>>  {
>>  	do {
>> +		long cache_overused;
>> +
>>  		if (page_counter_read(&memcg->memory) > memcg->high)
>>  			try_to_free_mem_cgroup_pages(memcg, nr_pages, gfp_mask, 0);
>>  
>> -		if (page_counter_read(&memcg->cache) > memcg->cache.limit)
>> -			try_to_free_mem_cgroup_pages(memcg, nr_pages, gfp_mask,
>> -					MEM_CGROUP_RECLAIM_NOSWAP);
>> +		cache_overused = page_counter_read(&memcg->cache) -
>> +				 memcg->cache.limit;
> 
> If cache_overused is less than 32 pages, the kernel would try to reclaim
> less than before the patch. Is it OK, or should it try to reclaim at
> least 32 pages?

It's OK, try_to_free_mem_cgroup_pages() raises the target to at least
SWAP_CLUSTER_MAX (32 pages):

unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *memcg,
					   unsigned long nr_pages,
					   gfp_t gfp_mask,
					   int flags)

	...
	.nr_to_reclaim = max(nr_pages, SWAP_CLUSTER_MAX),
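
To illustrate the clamp, a minimal standalone userspace sketch (not kernel
code; SWAP_CLUSTER_MAX and MAX() below are local stand-ins for the kernel's
constant and max() macro):

#include <stdio.h>

#define SWAP_CLUSTER_MAX 32UL			/* kernel value: 32 pages */
#define MAX(a, b) ((a) > (b) ? (a) : (b))

int main(void)
{
	/* Sample cache_overused values passed in as nr_pages. */
	unsigned long requests[] = { 1, 5, 31, 32, 100 };
	size_t i;

	for (i = 0; i < sizeof(requests) / sizeof(requests[0]); i++)
		printf("nr_pages=%3lu -> nr_to_reclaim=%lu\n",
		       requests[i], MAX(requests[i], SWAP_CLUSTER_MAX));
	return 0;
}

Requests below 32 pages are raised to 32, so small cache_overused values
still reclaim at least as much as before the patch.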




Re: [Devel] [PATCH rh7 v2] mm/memcg: fix cache growth above cache.limit_in_bytes

2020-07-30 Thread Evgenii Shatokhin

On 30.07.2020 17:00, Denis V. Lunev wrote:

On 7/30/20 4:58 PM, Andrey Ryabinin wrote:

Exceeding cache.limit_in_bytes schedules high_work_func(), which tries to
reclaim only 32 pages. If cache is generated fast enough, this allows the
cgroup to steadily grow above cache.limit_in_bytes, because we don't reclaim
enough. Try to reclaim the exceeded amount of cache instead.

https://jira.sw.ru/browse/PSBM-106384
Signed-off-by: Andrey Ryabinin 
---

Changes since v1: add bug link to changelog

  mm/memcontrol.c | 10 +++---
  1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 3cf200f506c3..e5adb0e81cbb 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3080,12 +3080,16 @@ static void reclaim_high(struct mem_cgroup *memcg,
 {
 	do {
+		unsigned long cache_overused;
+
 		if (page_counter_read(&memcg->memory) > memcg->high)
 			try_to_free_mem_cgroup_pages(memcg, nr_pages, gfp_mask, 0);
 
-		if (page_counter_read(&memcg->cache) > memcg->cache.limit)
-			try_to_free_mem_cgroup_pages(memcg, nr_pages, gfp_mask,
-					MEM_CGROUP_RECLAIM_NOSWAP);
+		cache_overused = page_counter_read(&memcg->cache) -
+				 memcg->cache.limit;
+		if (cache_overused)
+			try_to_free_mem_cgroup_pages(memcg, cache_overused,
+					gfp_mask, MEM_CGROUP_RECLAIM_NOSWAP);
 
 	} while ((memcg = parent_mem_cgroup(memcg)));
 }

can we run some testing and after that create a custom RK to check with
HostEurope on Monday?


1. Which kernel version(s)?

2. Would it be enough to prepare the live patch as a .ko file and load it
manually ('kpatch load'), or is an RPM package preferable?


If you plan to use the fix on more than one node, I think an RPM
package is easier to use. For a single node, a *.ko file would be enough.








Re: [Devel] [PATCH rh7 v4] mm/memcg: fix cache growth above cache.limit_in_bytes

2020-07-30 Thread Evgenii Shatokhin

Hi,

On 30.07.2020 18:02, Andrey Ryabinin wrote:

Exceeding cache.limit_in_bytes schedules high_work_func(), which tries to
reclaim only 32 pages. If cache is generated fast enough, this allows the
cgroup to steadily grow above cache.limit_in_bytes, because we don't reclaim
enough. Try to reclaim the exceeded amount of cache instead.

https://jira.sw.ru/browse/PSBM-106384
Signed-off-by: Andrey Ryabinin 
---

  - Changes since v1: add bug link to changelog
  - Changes since v2: fix the cache_overused check (we should check that it
is positive). Made this stupid bug during cleanup; the patch was tested
without the bogus cleanup, so it should work.
  - Changes since v3: compilation fixes, properly tested now.

  mm/memcontrol.c | 10 +++---
  1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 3cf200f506c3..16cbd451a588 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3080,12 +3080,16 @@ static void reclaim_high(struct mem_cgroup *memcg,
 {
 	do {
+		long cache_overused;
+
 		if (page_counter_read(&memcg->memory) > memcg->high)
 			try_to_free_mem_cgroup_pages(memcg, nr_pages, gfp_mask, 0);
 
-		if (page_counter_read(&memcg->cache) > memcg->cache.limit)
-			try_to_free_mem_cgroup_pages(memcg, nr_pages, gfp_mask,
-					MEM_CGROUP_RECLAIM_NOSWAP);
+		cache_overused = page_counter_read(&memcg->cache) -
+				 memcg->cache.limit;


If cache_overused is less than 32 pages, the kernel would try to reclaim
less than before the patch. Is it OK, or should it try to reclaim at
least 32 pages?



+		if (cache_overused > 0)
+			try_to_free_mem_cgroup_pages(memcg, cache_overused,
+					gfp_mask, MEM_CGROUP_RECLAIM_NOSWAP);
 
 	} while ((memcg = parent_mem_cgroup(memcg)));
 }





[Devel] [PATCH rh7 v4] mm/memcg: fix cache growth above cache.limit_in_bytes

2020-07-30 Thread Andrey Ryabinin
Exceeding cache.limit_in_bytes schedules high_work_func(), which tries to
reclaim only 32 pages. If cache is generated fast enough, this allows the
cgroup to steadily grow above cache.limit_in_bytes, because we don't reclaim
enough. Try to reclaim the exceeded amount of cache instead.

https://jira.sw.ru/browse/PSBM-106384
Signed-off-by: Andrey Ryabinin 
---

 - Changes since v1: add bug link to changelog
 - Changes since v2: fix the cache_overused check (we should check that it
is positive). Made this stupid bug during cleanup; the patch was tested
without the bogus cleanup, so it should work.
 - Changes since v3: compilation fixes, properly tested now.

 mm/memcontrol.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 3cf200f506c3..16cbd451a588 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3080,12 +3080,16 @@ static void reclaim_high(struct mem_cgroup *memcg,
 {
 	do {
+		long cache_overused;
+
 		if (page_counter_read(&memcg->memory) > memcg->high)
 			try_to_free_mem_cgroup_pages(memcg, nr_pages, gfp_mask, 0);
 
-		if (page_counter_read(&memcg->cache) > memcg->cache.limit)
-			try_to_free_mem_cgroup_pages(memcg, nr_pages, gfp_mask,
-					MEM_CGROUP_RECLAIM_NOSWAP);
+		cache_overused = page_counter_read(&memcg->cache) -
+				 memcg->cache.limit;
+		if (cache_overused > 0)
+			try_to_free_mem_cgroup_pages(memcg, cache_overused,
+					gfp_mask, MEM_CGROUP_RECLAIM_NOSWAP);
 
 	} while ((memcg = parent_mem_cgroup(memcg)));
 }
-- 
2.26.2
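
As a footnote to the "check that it is positive" change in the changelog
above, a minimal userspace sketch (illustrative, not kernel code) of why
the check needs a signed type and an explicit "> 0": with unsigned
arithmetic, cache below the limit wraps around to a huge "overuse" value.

#include <stdio.h>

int main(void)
{
	unsigned long cache = 100, limit = 200;	/* cache below the limit */

	/* v2: unsigned subtraction wraps, so "if (cache_overused)" fires. */
	unsigned long v2_overused = cache - limit;

	/* v3/v4: signed result stays negative, so "> 0" correctly skips. */
	long v4_overused = (long)cache - (long)limit;

	printf("v2 (unsigned): %lu -> bogus reclaim request\n", v2_overused);
	printf("v4 (signed):   %ld -> no reclaim\n", v4_overused);
	return 0;
}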



[Devel] [PATCH rh7 v3] mm/memcg: fix cache growth above cache.limit_in_bytes

2020-07-30 Thread Andrey Ryabinin
Exceeding cache.limit_in_bytes schedules high_work_func(), which tries to
reclaim only 32 pages. If cache is generated fast enough, this allows the
cgroup to steadily grow above cache.limit_in_bytes, because we don't reclaim
enough. Try to reclaim the exceeded amount of cache instead.

https://jira.sw.ru/browse/PSBM-106384
Signed-off-by: Andrey Ryabinin 
---

 Changes since v1: add bug link to changelog
 Changes since v2: fix the cache_overused check (we should check that it is
positive). Made this stupid bug during cleanup; the patch was tested without
the bogus cleanup, so it should work.

 mm/memcontrol.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 3cf200f506c3..e23e546fd00f 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3080,12 +3080,16 @@ static void reclaim_high(struct mem_cgroup *memcg,
 {
 	do {
+		long cache_overused;
+
 		if (page_counter_read(&memcg->memory) > memcg->high)
 			try_to_free_mem_cgroup_pages(memcg, nr_pages, gfp_mask, 0);
 
-		if (page_counter_read(&memcg->cache) > memcg->cache.limit)
-			try_to_free_mem_cgroup_pages(memcg, nr_pages, gfp_mask,
-					MEM_CGROUP_RECLAIM_NOSWAP);
+		cache_overused = page_counter_read(&memcg->cache) -
+				 memcg->cache.limit;
+		if (cache_overused > 0)
+			try_to_free_mem_cgroup_pages(memcg, max(CHARGE_BATCH, cache_overused,
+					gfp_mask, MEM_CGROUP_RECLAIM_NOSWAP);
 
 	} while ((memcg = parent_mem_cgroup(memcg)));
 }
-- 
2.26.2



Re: [Devel] [PATCH rh7 v2] mm/memcg: fix cache growth above cache.limit_in_bytes

2020-07-30 Thread Denis V. Lunev
On 7/30/20 4:58 PM, Andrey Ryabinin wrote:
> Exceeding cache.limit_in_bytes schedules high_work_func(), which tries to
> reclaim only 32 pages. If cache is generated fast enough, this allows the
> cgroup to steadily grow above cache.limit_in_bytes, because we don't
> reclaim enough. Try to reclaim the exceeded amount of cache instead.
>
> https://jira.sw.ru/browse/PSBM-106384
> Signed-off-by: Andrey Ryabinin 
> ---
>
> Changes since v1: add bug link to changelog
>
>  mm/memcontrol.c | 10 +++---
>  1 file changed, 7 insertions(+), 3 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 3cf200f506c3..e5adb0e81cbb 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -3080,12 +3080,16 @@ static void reclaim_high(struct mem_cgroup *memcg,
>  {
>  	do {
> +		unsigned long cache_overused;
> +
>  		if (page_counter_read(&memcg->memory) > memcg->high)
>  			try_to_free_mem_cgroup_pages(memcg, nr_pages, gfp_mask, 0);
>  
> -		if (page_counter_read(&memcg->cache) > memcg->cache.limit)
> -			try_to_free_mem_cgroup_pages(memcg, nr_pages, gfp_mask,
> -					MEM_CGROUP_RECLAIM_NOSWAP);
> +		cache_overused = page_counter_read(&memcg->cache) -
> +				 memcg->cache.limit;
> +		if (cache_overused)
> +			try_to_free_mem_cgroup_pages(memcg, cache_overused,
> +					gfp_mask, MEM_CGROUP_RECLAIM_NOSWAP);
>  
>  	} while ((memcg = parent_mem_cgroup(memcg)));
>  }
can we run some testing and after that create a custom RK to check with
HostEurope on Monday?


[Devel] [PATCH rh7 v2] mm/memcg: fix cache growth above cache.limit_in_bytes

2020-07-30 Thread Andrey Ryabinin
Exceeding cache.limit_in_bytes schedules high_work_func(), which tries to
reclaim only 32 pages. If cache is generated fast enough, this allows the
cgroup to steadily grow above cache.limit_in_bytes, because we don't reclaim
enough. Try to reclaim the exceeded amount of cache instead.

https://jira.sw.ru/browse/PSBM-106384
Signed-off-by: Andrey Ryabinin 
---

Changes since v1: add bug link to changelog

 mm/memcontrol.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 3cf200f506c3..e5adb0e81cbb 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3080,12 +3080,16 @@ static void reclaim_high(struct mem_cgroup *memcg,
 {
 	do {
+		unsigned long cache_overused;
+
 		if (page_counter_read(&memcg->memory) > memcg->high)
 			try_to_free_mem_cgroup_pages(memcg, nr_pages, gfp_mask, 0);
 
-		if (page_counter_read(&memcg->cache) > memcg->cache.limit)
-			try_to_free_mem_cgroup_pages(memcg, nr_pages, gfp_mask,
-					MEM_CGROUP_RECLAIM_NOSWAP);
+		cache_overused = page_counter_read(&memcg->cache) -
+				 memcg->cache.limit;
+		if (cache_overused)
+			try_to_free_mem_cgroup_pages(memcg, cache_overused,
+					gfp_mask, MEM_CGROUP_RECLAIM_NOSWAP);
 
 	} while ((memcg = parent_mem_cgroup(memcg)));
 }
-- 
2.26.2



[Devel] [PATCH] mm/memcg: fix cache growth above cache.limit_in_bytes

2020-07-30 Thread Andrey Ryabinin
Exceeding cache.limit_in_bytes schedules high_work_func(), which tries to
reclaim only 32 pages. If cache is generated fast enough, this allows the
cgroup to steadily grow above cache.limit_in_bytes, because we don't reclaim
enough. Try to reclaim the exceeded amount of cache instead.

Signed-off-by: Andrey Ryabinin 
---
 mm/memcontrol.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 3cf200f506c3..e5adb0e81cbb 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3080,12 +3080,16 @@ static void reclaim_high(struct mem_cgroup *memcg,
 {
 	do {
+		unsigned long cache_overused;
+
 		if (page_counter_read(&memcg->memory) > memcg->high)
 			try_to_free_mem_cgroup_pages(memcg, nr_pages, gfp_mask, 0);
 
-		if (page_counter_read(&memcg->cache) > memcg->cache.limit)
-			try_to_free_mem_cgroup_pages(memcg, nr_pages, gfp_mask,
-					MEM_CGROUP_RECLAIM_NOSWAP);
+		cache_overused = page_counter_read(&memcg->cache) -
+				 memcg->cache.limit;
+		if (cache_overused)
+			try_to_free_mem_cgroup_pages(memcg, cache_overused,
+					gfp_mask, MEM_CGROUP_RECLAIM_NOSWAP);
 
 	} while ((memcg = parent_mem_cgroup(memcg)));
 }
-- 
2.26.2



Re: [Devel] [PATCH RHEL7 v21 11/14] ve/cgroup: set release_agent_path for root cgroups separately for each ve.

2020-07-30 Thread Kirill Tkhai
On 28.07.2020 20:53, Valeriy Vdovin wrote:
> This is done so that each container could set it's own release agent.
> Release agent information is now stored in per-cgroup-root data
> structure in ve.
> 
> https://jira.sw.ru/browse/PSBM-83887
> 
> Signed-off-by: Valeriy Vdovin 
> ---
>  include/linux/cgroup.h |   3 --
>  include/linux/ve.h |   6 +++
>  kernel/cgroup.c| 100 -
>  kernel/ve/ve.c |  72 +++
>  4 files changed, 161 insertions(+), 20 deletions(-)
> 
> diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
> index 5f1460d..fc138c0 100644
> --- a/include/linux/cgroup.h
> +++ b/include/linux/cgroup.h
> @@ -429,9 +429,6 @@ struct cgroupfs_root {
>   /* IDs for cgroups in this hierarchy */
>   struct ida cgroup_ida;
>  
> - /* The path to use for release notifications. */
> - char release_agent_path[PATH_MAX];
> -
>   /* The name for this hierarchy - may be empty */
>   char name[MAX_CGROUP_ROOT_NAMELEN];
>  };
> diff --git a/include/linux/ve.h b/include/linux/ve.h
> index 65413d5..b6662637 100644
> --- a/include/linux/ve.h
> +++ b/include/linux/ve.h
> @@ -214,6 +214,12 @@ void do_update_load_avg_ve(void);
>  
>  void ve_add_to_release_list(struct cgroup *cgrp);
>  void ve_rm_from_release_list(struct cgroup *cgrp);
> +
> +int ve_set_release_agent_path(struct ve_struct *ve, struct cgroup *cgroot,
> + const char *release_agent);
> +
> +const char *ve_get_release_agent_path(struct cgroup *cgrp_root);
> +
>  extern struct ve_struct *get_ve(struct ve_struct *ve);
>  extern void put_ve(struct ve_struct *ve);
>  
> diff --git a/kernel/cgroup.c b/kernel/cgroup.c
> index aa93cf2..1d9c889 100644
> --- a/kernel/cgroup.c
> +++ b/kernel/cgroup.c
> @@ -1092,9 +1092,12 @@ static int rebind_subsystems(struct cgroupfs_root 
> *root,
>  
>  static int cgroup_show_options(struct seq_file *seq, struct dentry *dentry)
>  {
> + const char *release_agent;
>   struct cgroupfs_root *root = dentry->d_sb->s_fs_info;
>   struct cgroup_subsys *ss;
> + struct cgroup *root_cgrp = &root->top_cgroup;
>  
> + mutex_lock(&cgroup_mutex);
>   mutex_lock(&cgroup_root_mutex);
>   for_each_subsys(root, ss)
>   seq_printf(seq, ",%s", ss->name);
> @@ -1106,14 +1109,37 @@ static int cgroup_show_options(struct seq_file *seq, 
> struct dentry *dentry)
>   seq_puts(seq, ",xattr");
>   if (root->flags & CGRP_ROOT_CPUSET_V2_MODE)
>   seq_puts(seq, ",cpuset_v2_mode");
> - if (strlen(root->release_agent_path))
> - seq_show_option(seq, "release_agent",
> - root->release_agent_path);
> +#ifdef CONFIG_VE
> + {
> + struct ve_struct *ve = get_exec_env();
> +
> + if (!ve_is_super(ve)) {
> + /*
> +  * ve->init_task is NULL in case when cgroup is accessed
> +  * before ve_start_container has been called.
> +  *
> +  * ve->init_task is synchronized via ve->ve_ns rcu, see
> +  * ve_grab_context/drop_context.
> +  */
> + rcu_read_lock();
> + if (ve->ve_ns)
> + root_cgrp = task_cgroup_from_root(ve->init_task,
> + root);
> + rcu_read_unlock();
> + }
> + }
> +#endif
> + rcu_read_lock();
> + release_agent = ve_get_release_agent_path(root_cgrp);
> + if (release_agent && release_agent[0])
> + seq_show_option(seq, "release_agent", release_agent);
> + rcu_read_unlock();
>   if (test_bit(CGRP_CPUSET_CLONE_CHILDREN, &root->top_cgroup.flags))
>   seq_puts(seq, ",clone_children");
>   if (strlen(root->name))
>   seq_show_option(seq, "name", root->name);
>   mutex_unlock(&cgroup_root_mutex);
> + mutex_unlock(&cgroup_mutex);
>   return 0;
>  }
>  
> @@ -1386,8 +1412,13 @@ static int cgroup_remount(struct super_block *sb, int 
> *flags, char *data)
>   /* re-populate subsystem files */
>   cgroup_populate_dir(cgrp, false, added_mask);
>  
> - if (opts.release_agent)
> - strcpy(root->release_agent_path, opts.release_agent);
> + if (opts.release_agent) {
> + struct cgroup *root_cgrp;
> + root_cgrp = cgroup_get_local_root(cgrp);
> + if (root_cgrp->ve_owner)
> + ret = ve_set_release_agent_path(root_cgrp,
> + opts.release_agent);
> + }
>   out_unlock:
>   kfree(opts.release_agent);
>   kfree(opts.name);
> @@ -1549,8 +1580,6 @@ static struct cgroupfs_root 
> *cgroup_root_from_opts(struct cgroup_sb_opts *opts)
>   root->subsys_mask = opts->subsys_mask;
>   root->flags = opts->flags;
>   ida_init(&root->cgroup_ida);
> - if (opts->release_agent)
> - 
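
For illustration, a minimal usage sketch of what the per-VE storage enables
(the paths and agent binary below are hypothetical, not part of the patch):
a container can set its own release agent without affecting the host's
setting on the same hierarchy.

#include <stdio.h>

int main(void)
{
	/* Run inside the container (VE); the mount path is illustrative. */
	FILE *f = fopen("/sys/fs/cgroup/memory/release_agent", "w");

	if (!f) {
		perror("fopen");
		return 1;
	}
	/* The kernel execs this binary when the last task leaves a cgroup
	 * with notify_on_release enabled; with this patch the path is
	 * looked up per-VE rather than per-hierarchy. */
	fputs("/usr/local/sbin/ve-release-agent\n", f);
	fclose(f);
	return 0;
}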

[Devel] [PATCH RH7 v4] cgroup: add export_operations to cgroup super block

2020-07-30 Thread Andrey Zhadchenko
criu uses the fhandle from fdinfo to dump inotify objects. The cgroup super
block has no export operations, but .encode_fh and .fh_to_dentry are needed
by the inotify_fdinfo function and the open_by_handle_at syscall in order to
correctly open files located on cgroupfs by fhandle.
Add a hash table as storage for inodes with exported fhandles.

v3: use inode->i_generation to protect against i_ino reuse; increase fhandle
size to 2 * u32.
Add an option to take a reference on the inode in cgroup_find_inode, so no
one can delete a recently found inode.
v4: introduced hashtable helper functions to avoid races;
changed i_generation generation from get_seconds() to prandom_u32().

https://jira.sw.ru/browse/PSBM-105889
Signed-off-by: Andrey Zhadchenko 
---
 kernel/cgroup.c | 168 +++-
 1 file changed, 167 insertions(+), 1 deletion(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 9fdba79..956a9ac 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -62,6 +62,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 #include 
 
@@ -765,6 +767,7 @@ static struct inode *cgroup_new_inode(umode_t mode, struct 
super_block *sb)
 
if (inode) {
inode->i_ino = get_next_ino();
+   inode->i_generation = prandom_u32();
inode->i_mode = mode;
inode->i_uid = current_fsuid();
inode->i_gid = current_fsgid();
@@ -1390,9 +1393,171 @@ out:
 }
 #endif
 
+/*
+ * hashtable for inodes that have exported fhandles.
+ * When we export fhandle, we add it's inode into
+ * hashtable so we can find it fast
+ */
+
+#define CGROUP_INODE_HASH_BITS 10
+static DEFINE_HASHTABLE(cgroup_inode_table, CGROUP_INODE_HASH_BITS);
+static DEFINE_SPINLOCK(cgroup_inode_table_lock);
+
+struct cg_inode_hitem {
+   struct inode *inode;
+   struct hlist_node hlist;
+};
+
+static inline unsigned long cgroup_inode_get_hash(unsigned int i_ino)
+{
+   return hash_32(i_ino, CGROUP_INODE_HASH_BITS);
+}
+
+static struct cg_inode_hitem *cgroup_find_item_no_lock(unsigned long fh[2])
+{
+   struct cg_inode_hitem *i;
+   struct hlist_head *head = cgroup_inode_table
+   + cgroup_inode_get_hash(fh[1]);
+   struct cg_inode_hitem *found = NULL;
+
+   hlist_for_each_entry(i, head, hlist) {
+   if (i->inode->i_generation == fh[0] &&
+   i->inode->i_ino == fh[1]) {
+   found = i;
+   break;
+   }
+   }
+
+   return found;
+}
+
+static struct inode *cgroup_find_inode(unsigned long fh[2], char take_ref)
+{
+   struct cg_inode_hitem *item;
+   struct inode *ret = NULL;
+
+   spin_lock(&cgroup_inode_table_lock);
+   item = cgroup_find_item_no_lock(fh);
+
+   /*
+* If we need to increase refcount, we should be aware of possible
+* deadlock. Another thread may have started deleting this inode:
+* iput->iput_final->cgroup_delete_inode->cgroup_hash_del
+* If we just call igrab, it will try to take i_lock and this will
+* result in deadlock, because deleting thread has already taken it
+* and waits on cgroup_inode_table_lock to find inode in hashtable.
+*
+* If i_count is zero, someone is deleting it -> skip.
+*/
+   if (take_ref && item)
+   if (!atomic_inc_not_zero(&item->inode->i_count))
+   item = NULL;
+
+   spin_unlock(&cgroup_inode_table_lock);
+
+   if (item)
+   ret = item->inode;
+
+   return ret;
+}
+
+static int cgroup_hash_add(struct inode *inode)
+{
+   unsigned long fh[2] = {inode->i_generation, inode->i_ino};
+
+   if (!cgroup_find_inode(fh, 0)) {
+   struct cg_inode_hitem *item;
+   struct cg_inode_hitem *existing_item = 0;
+   struct hlist_head *head = cgroup_inode_table
+   + cgroup_inode_get_hash(inode->i_ino);
+
+   item = kmalloc(sizeof(struct cg_inode_hitem), GFP_KERNEL);
+   if (!item)
+   return -ENOMEM;
+   item->inode = inode;
+
+   spin_lock(&cgroup_inode_table_lock);
+   existing_item = cgroup_find_item_no_lock(fh);
+   if (!existing_item)
+   hlist_add_head(&item->hlist, head);
+   spin_unlock(&cgroup_inode_table_lock);
+
+   if (existing_item)
+   kfree(item);
+   }
+
+   return 0;
+}
+
+static void cgroup_hash_del(struct inode *inode)
+{
+   struct cg_inode_hitem *item;
+   unsigned long fh[2] = {inode->i_generation, inode->i_ino};
+
+   spin_lock(&cgroup_inode_table_lock);
+   item = cgroup_find_item_no_lock(fh);
+   if (item)
+   hlist_del(&item->hlist);
+   spin_unlock(&cgroup_inode_table_lock);
+
+   kfree(item);
+   return;
+}
+
+static struct dentry *cgroup_fh_to_dentry(struct super_block *sb,
+   struct fid *fid, int fh_len, int fh_type)
+{
+   struct
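
For context, a userspace sketch of the consumer side this patch serves
(standard Linux syscalls; the cgroupfs paths are illustrative, and
open_by_handle_at() requires CAP_DAC_READ_SEARCH). criu-style tooling
obtains a handle via .encode_fh, then reopens the file through
.fh_to_dentry:

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
	const char *path = "/sys/fs/cgroup/memory/tasks";	/* illustrative */
	struct file_handle *fh;
	int mount_id, mount_fd, fd;

	fh = malloc(sizeof(*fh) + MAX_HANDLE_SZ);
	if (!fh)
		return 1;
	fh->handle_bytes = MAX_HANDLE_SZ;

	/* Calls the filesystem's .encode_fh; with this patch, cgroupfs packs
	 * i_generation and i_ino and registers the inode in the hash table. */
	if (name_to_handle_at(AT_FDCWD, path, fh, &mount_id, 0) < 0) {
		perror("name_to_handle_at");
		return 1;
	}

	mount_fd = open("/sys/fs/cgroup/memory", O_RDONLY | O_DIRECTORY);
	if (mount_fd < 0)
		return 1;

	/* Resolved via the filesystem's .fh_to_dentry. */
	fd = open_by_handle_at(mount_fd, fh, O_RDONLY);
	if (fd < 0) {
		perror("open_by_handle_at");
		return 1;
	}
	printf("reopened %s by handle (fd=%d)\n", path, fd);
	close(fd);
	close(mount_fd);
	free(fh);
	return 0;
}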