[PATCH] Re: [PATCH v3 1/2] mm: memcg: remote memcg charging for kmem allocations
On Thu, Mar 15, 2018 at 10:49 AM, Michal Hocko wrote: > Charging path is still a _hot path_. Especially when the kmem accounting > is enabled by default. You cannot simply downplay the overhead. We have > _one_ user but all users should pay the price. This is simply hard to > justify. Maybe we can thing of something that would put the burden on > the charging context? What do you think of the following? Signed-off-by: Shakeel Butt --- mm/memcontrol.c | 37 ++--- 1 file changed, 18 insertions(+), 19 deletions(-) diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 5d3ea8799a2c..205043283716 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -701,6 +701,20 @@ struct mem_cgroup *get_mem_cgroup_from_mm(struct mm_struct *mm) return memcg; } +static __always_inline struct mem_cgroup *get_mem_cgroup( + struct mem_cgroup *memcg, struct mm_struct *mm) +{ + if (unlikely(memcg)) { + rcu_read_lock(); + if (css_tryget_online(&memcg->css)) { + rcu_read_unlock(); + return memcg; + } + rcu_read_unlock(); + } + return get_mem_cgroup_from_mm(mm); +} + /** * mem_cgroup_iter - iterate over memory cgroup hierarchy * @root: hierarchy root @@ -2119,15 +2133,6 @@ static void commit_charge(struct page *page, struct mem_cgroup *memcg, } #ifndef CONFIG_SLOB -static struct mem_cgroup *get_mem_cgroup(struct mem_cgroup *memcg) -{ - rcu_read_lock(); - if (!css_tryget_online(&memcg->css)) - memcg = NULL; - rcu_read_unlock(); - return memcg; -} - static int memcg_alloc_cache_id(void) { int id, size; @@ -2257,7 +2262,7 @@ static inline bool memcg_kmem_bypass(void) */ struct kmem_cache *memcg_kmem_get_cache(struct kmem_cache *cachep) { - struct mem_cgroup *memcg = NULL; + struct mem_cgroup *memcg; struct kmem_cache *memcg_cachep; int kmemcg_id; @@ -2269,10 +2274,7 @@ struct kmem_cache *memcg_kmem_get_cache(struct kmem_cache *cachep) if (current->memcg_kmem_skip_account) return cachep; - if (current->target_memcg) - memcg = get_mem_cgroup(current->target_memcg); - if (!memcg) - memcg = get_mem_cgroup_from_mm(current->mm); + memcg = get_mem_cgroup(current->target_memcg, current->mm); kmemcg_id = READ_ONCE(memcg->kmemcg_id); if (kmemcg_id < 0) goto out; @@ -2350,16 +2352,13 @@ int memcg_kmem_charge_memcg(struct page *page, gfp_t gfp, int order, */ int memcg_kmem_charge(struct page *page, gfp_t gfp, int order) { - struct mem_cgroup *memcg = NULL; + struct mem_cgroup *memcg; int ret = 0; if (memcg_kmem_bypass()) return 0; - if (current->target_memcg) - memcg = get_mem_cgroup(current->target_memcg); - if (!memcg) - memcg = get_mem_cgroup_from_mm(current->mm); + memcg = get_mem_cgroup(current->target_memcg, current->mm); if (!mem_cgroup_is_root(memcg)) { ret = memcg_kmem_charge_memcg(page, gfp, order, memcg); if (!ret) -- 2.17.0.484.g0c8726318c-goog
Re: [PATCH v3 1/2] mm: memcg: remote memcg charging for kmem allocations
On Thu, Mar 15, 2018 at 10:49 AM, Michal Hocko wrote: > On Tue 13-03-18 10:55:18, Shakeel Butt wrote: >> On Tue, Mar 13, 2018 at 6:49 AM, Michal Hocko wrote: >> > On Wed 21-02-18 14:37:56, Shakeel Butt wrote: >> > [...] >> >> +#ifdef CONFIG_MEMCG >> >> +static inline struct mem_cgroup *memalloc_memcg_save(struct mem_cgroup >> >> *memcg) >> >> +{ >> >> + struct mem_cgroup *old_memcg = current->target_memcg; >> >> + current->target_memcg = memcg; >> >> + return old_memcg; >> >> +} >> > >> > So you are relying that the caller will handle the reference counting >> > properly? I do not think this is a good idea. >> >> For the fsnotify use-case, this assumption makes sense as fsnotify has >> an abstraction of fsnotify_group which is created by the >> person/process interested in the events and thus can be used to hold >> the reference to the person/process's memcg. > > OK, but there is not any direct connection between fsnotify_group and > task_struct lifetimes, is it? This makes the API suspectible to > use-after-free bugs. > For fsnotify, whoever is calling [fanotify|inotify]_handle_event() will have a stable reference to fsnotify_group and fsnotify_group has reference to memcg. These allocations happen within [fanotify|inotify]_handle_event(), so, for fsnotify I don't think there will be use-after-free bugs. Basically whoever is calling memcg variant of kmem_cache_alloc or kmalloc should either have stable direct or indirect reference to the memcg. >> Another use-case I have >> in mind is the filesystem mount. Basically attaching a mount with a >> memcg and thus all user pages and kmem allocations (inodes, dentries) >> for that mount will be charged to the attached memcg. > > So you charge page cache to the origin task but metadata to a different > memcg? > No, both page cache and metadata to a different memcg. >> In this use-case >> the super_block is the perfect structure to hold the reference to the >> memcg. >> >> If in future we find a use-case where this assumption does not make >> sense we can evolve the API and since this is kernel internal API, it >> should not be hard to evolve. >> >> > Also do we need some kind >> > of debugging facility to detect unbalanced save/restore scopes? >> > >> >> I am not sure, I didn't find other similar patterns (like PF_MEMALLOC) >> having debugging facility. > > Maybe we need something more generic here. > Please do let me know if you have something in mind. >> Maybe we can add such debugging facility >> when we find more users other than kmalloc & kmem_cache_alloc. Vmalloc >> may be one but I could not think of a use-case for vmalloc for remote >> charging, so, no need to add more code at this time. >> >> > [...] >> >> @@ -2260,7 +2269,10 @@ struct kmem_cache *memcg_kmem_get_cache(struct >> >> kmem_cache *cachep) >> >> if (current->memcg_kmem_skip_account) >> >> return cachep; >> >> >> >> - memcg = get_mem_cgroup_from_mm(current->mm); >> >> + if (current->target_memcg) >> >> + memcg = get_mem_cgroup(current->target_memcg); >> >> + if (!memcg) >> >> + memcg = get_mem_cgroup_from_mm(current->mm); >> >> kmemcg_id = READ_ONCE(memcg->kmemcg_id); >> >> if (kmemcg_id < 0) >> >> goto out; >> > >> > You are also adding one branch for _each_ charge path even though the >> > usecase is rather limited. >> > >> >> I understand the concern but the charging path, IMO, is much complex >> than just one or couple of additional branches. I can run a simple >> microbenchmark to see if there is anything noticeable here. > > Charging path is still a _hot path_. Especially when the kmem accounting > is enabled by default. You cannot simply downplay the overhead. We have > _one_ user but all users should pay the price. This is simply hard to > justify. Maybe we can thing of something that would put the burden on > the charging context? > I will see if I can find out a way for that.
Re: [PATCH v3 1/2] mm: memcg: remote memcg charging for kmem allocations
On Tue 13-03-18 10:55:18, Shakeel Butt wrote: > On Tue, Mar 13, 2018 at 6:49 AM, Michal Hocko wrote: > > On Wed 21-02-18 14:37:56, Shakeel Butt wrote: > > [...] > >> +#ifdef CONFIG_MEMCG > >> +static inline struct mem_cgroup *memalloc_memcg_save(struct mem_cgroup > >> *memcg) > >> +{ > >> + struct mem_cgroup *old_memcg = current->target_memcg; > >> + current->target_memcg = memcg; > >> + return old_memcg; > >> +} > > > > So you are relying that the caller will handle the reference counting > > properly? I do not think this is a good idea. > > For the fsnotify use-case, this assumption makes sense as fsnotify has > an abstraction of fsnotify_group which is created by the > person/process interested in the events and thus can be used to hold > the reference to the person/process's memcg. OK, but there is not any direct connection between fsnotify_group and task_struct lifetimes, is it? This makes the API suspectible to use-after-free bugs. > Another use-case I have > in mind is the filesystem mount. Basically attaching a mount with a > memcg and thus all user pages and kmem allocations (inodes, dentries) > for that mount will be charged to the attached memcg. So you charge page cache to the origin task but metadata to a different memcg? > In this use-case > the super_block is the perfect structure to hold the reference to the > memcg. > > If in future we find a use-case where this assumption does not make > sense we can evolve the API and since this is kernel internal API, it > should not be hard to evolve. > > > Also do we need some kind > > of debugging facility to detect unbalanced save/restore scopes? > > > > I am not sure, I didn't find other similar patterns (like PF_MEMALLOC) > having debugging facility. Maybe we need something more generic here. > Maybe we can add such debugging facility > when we find more users other than kmalloc & kmem_cache_alloc. Vmalloc > may be one but I could not think of a use-case for vmalloc for remote > charging, so, no need to add more code at this time. > > > [...] > >> @@ -2260,7 +2269,10 @@ struct kmem_cache *memcg_kmem_get_cache(struct > >> kmem_cache *cachep) > >> if (current->memcg_kmem_skip_account) > >> return cachep; > >> > >> - memcg = get_mem_cgroup_from_mm(current->mm); > >> + if (current->target_memcg) > >> + memcg = get_mem_cgroup(current->target_memcg); > >> + if (!memcg) > >> + memcg = get_mem_cgroup_from_mm(current->mm); > >> kmemcg_id = READ_ONCE(memcg->kmemcg_id); > >> if (kmemcg_id < 0) > >> goto out; > > > > You are also adding one branch for _each_ charge path even though the > > usecase is rather limited. > > > > I understand the concern but the charging path, IMO, is much complex > than just one or couple of additional branches. I can run a simple > microbenchmark to see if there is anything noticeable here. Charging path is still a _hot path_. Especially when the kmem accounting is enabled by default. You cannot simply downplay the overhead. We have _one_ user but all users should pay the price. This is simply hard to justify. Maybe we can thing of something that would put the burden on the charging context? -- Michal Hocko SUSE Labs
Re: [PATCH v3 1/2] mm: memcg: remote memcg charging for kmem allocations
On Tue, Mar 13, 2018 at 6:49 AM, Michal Hocko wrote: > On Wed 21-02-18 14:37:56, Shakeel Butt wrote: > [...] >> +#ifdef CONFIG_MEMCG >> +static inline struct mem_cgroup *memalloc_memcg_save(struct mem_cgroup >> *memcg) >> +{ >> + struct mem_cgroup *old_memcg = current->target_memcg; >> + current->target_memcg = memcg; >> + return old_memcg; >> +} > > So you are relying that the caller will handle the reference counting > properly? I do not think this is a good idea. For the fsnotify use-case, this assumption makes sense as fsnotify has an abstraction of fsnotify_group which is created by the person/process interested in the events and thus can be used to hold the reference to the person/process's memcg. Another use-case I have in mind is the filesystem mount. Basically attaching a mount with a memcg and thus all user pages and kmem allocations (inodes, dentries) for that mount will be charged to the attached memcg. In this use-case the super_block is the perfect structure to hold the reference to the memcg. If in future we find a use-case where this assumption does not make sense we can evolve the API and since this is kernel internal API, it should not be hard to evolve. > Also do we need some kind > of debugging facility to detect unbalanced save/restore scopes? > I am not sure, I didn't find other similar patterns (like PF_MEMALLOC) having debugging facility. Maybe we can add such debugging facility when we find more users other than kmalloc & kmem_cache_alloc. Vmalloc may be one but I could not think of a use-case for vmalloc for remote charging, so, no need to add more code at this time. > [...] >> @@ -2260,7 +2269,10 @@ struct kmem_cache *memcg_kmem_get_cache(struct >> kmem_cache *cachep) >> if (current->memcg_kmem_skip_account) >> return cachep; >> >> - memcg = get_mem_cgroup_from_mm(current->mm); >> + if (current->target_memcg) >> + memcg = get_mem_cgroup(current->target_memcg); >> + if (!memcg) >> + memcg = get_mem_cgroup_from_mm(current->mm); >> kmemcg_id = READ_ONCE(memcg->kmemcg_id); >> if (kmemcg_id < 0) >> goto out; > > You are also adding one branch for _each_ charge path even though the > usecase is rather limited. > I understand the concern but the charging path, IMO, is much complex than just one or couple of additional branches. I can run a simple microbenchmark to see if there is anything noticeable here. > I will have to think about this approach more. It is clearly less code > than your previous attempt but I cannot say I would be really impressed. > Thanks for your time.
Re: [PATCH v3 1/2] mm: memcg: remote memcg charging for kmem allocations
On Wed 21-02-18 14:37:56, Shakeel Butt wrote: [...] > +#ifdef CONFIG_MEMCG > +static inline struct mem_cgroup *memalloc_memcg_save(struct mem_cgroup > *memcg) > +{ > + struct mem_cgroup *old_memcg = current->target_memcg; > + current->target_memcg = memcg; > + return old_memcg; > +} So you are relying that the caller will handle the reference counting properly? I do not think this is a good idea. Also do we need some kind of debugging facility to detect unbalanced save/restore scopes? [...] > @@ -2260,7 +2269,10 @@ struct kmem_cache *memcg_kmem_get_cache(struct > kmem_cache *cachep) > if (current->memcg_kmem_skip_account) > return cachep; > > - memcg = get_mem_cgroup_from_mm(current->mm); > + if (current->target_memcg) > + memcg = get_mem_cgroup(current->target_memcg); > + if (!memcg) > + memcg = get_mem_cgroup_from_mm(current->mm); > kmemcg_id = READ_ONCE(memcg->kmemcg_id); > if (kmemcg_id < 0) > goto out; You are also adding one branch for _each_ charge path even though the usecase is rather limited. I will have to think about this approach more. It is clearly less code than your previous attempt but I cannot say I would be really impressed. -- Michal Hocko SUSE Labs