Re: [PATCH v2 07/11] mm: Allocate kernel pages to the right memcg
On 08/15/2012 05:22 PM, Mel Gorman wrote:
>> I believe it to be a better and less complicated approach than letting
>> a page appear and then charging it. Besides being consistent with the
>> rest of memcg, it won't create unnecessary disturbance in the page
>> allocator when the allocation is to fail.
>
> I still don't get why you did not just return a mem_cgroup instead of a
> handle.

Forgot this one, sorry: the reason is to keep the semantics simple. What
should we return if the code is not compiled in? If we return NULL for
failure, the test becomes:

	memcg = memcg_kmem_charge_page(gfp, order);
	if (!memcg)
		exit;

If we're not compiled in, we'd either return positive garbage or we need
to wrap it inside an ifdef. I personally believe it to be a lot clearer to
standardize on true to mean "allocation can proceed". The compiled-out
case becomes:

	if (!true)
		exit;

which is easily compiled away altogether.

Now of course, using struct mem_cgroup makes sense, and I have already
changed that here.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
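The compiled-out argument above can be sketched in plain C. This is a userspace mock, not the kernel code: the function name follows the thread, but the body and the `struct mem_cgroup` layout are placeholders. The point is that the `!CONFIG_MEMCG_KMEM` stub returns a compile-time `true`, so the caller's failure branch is dead code the compiler removes:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct mem_cgroup { long usage; };   /* placeholder, not the kernel struct */

#ifdef CONFIG_MEMCG_KMEM
/* Real version would look up current's memcg and charge it. */
bool memcg_kmem_charge_page(unsigned int gfp, struct mem_cgroup **memcgp,
			    int order);
#else
/* Compiled-out stub: always "allocation can proceed".  The caller's
 * `if (!memcg_kmem_charge_page(...)) return NULL;` reduces to `if (!true)`
 * and the failure branch vanishes at compile time. */
static inline bool memcg_kmem_charge_page(unsigned int gfp,
					  struct mem_cgroup **memcgp,
					  int order)
{
	(void)gfp; (void)memcgp; (void)order;
	return true;
}
#endif
```

A NULL-on-failure return, by contrast, would force every caller to carry an `#ifdef` or accept a garbage pointer in the compiled-out case, which is exactly the asymmetry being avoided.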
Re: [PATCH v2 07/11] mm: Allocate kernel pages to the right memcg
>>>> As for the type, do you think using struct mem_cgroup would be less
>>>> confusing?
>>>
>>> Yes, and returning the mem_cgroup or NULL instead of bool.

Ok. struct mem_cgroup it is.

>> The placeholder is there, but it is later patched to the final thing.
>> With that explained, if you want me to change it to something else, I
>> can do it. Should I?
>
> Not in this patch anyway. I would have preferred a pattern like this but
> that's about it.
>
> #ifdef CONFIG_MEMCG_KMEM
> extern struct static_key memcg_kmem_enabled_key;
> static inline bool memcg_kmem_enabled(void)
> {
>	return static_key_false(&memcg_kmem_enabled_key);
> }
> #else
> static inline bool memcg_kmem_enabled(void)
> {
>	return false;
> }
> #endif

humm, I'll have to think about this name. "memcg_kmem_enabled" means it is
enabled in this cgroup. It is actually used inside memcontrol.c to denote
precisely that. Now the static branch, of course, means it is globally
enabled. Or as I called it here, "on".

> Two reasons. One, it does not use the terms "on" and "enabled"
> interchangeably. The other reason is down to taste as I'm copying the
> pattern I used myself for sk_memalloc_socks(). Of course I am biased.
>
> Also, why is the key exported?

Same reason: the slab will now have inline functions that test against it.
The alloc functions themselves are inside the page allocator, and those
exports can go away. But the static branch will still be tested inside
inlined functions in the slab. That said, for the sake of simplicity, I
can make it go away here, and add it to the right place later.

>>> I also find it *very* strange to have a function named as if it is an
>>> allocation-style function when in fact it's looking up a mem_cgroup
>>> and charging it (and uncharging it in the error path if necessary). If
>>> it was called memcg_kmem_newpage_charge I might have found it a little
>>> better.
>>
>> I don't feel strongly about names in general. I can change it.
>> Will update to memcg_kmem_newpage_charge() and memcg_kmem_page_uncharge().
>
> I would prefer that anyway. Names have meaning and people make
> assumptions on the implementation depending on the name. We should try
> to be as consistent as possible or maintenance becomes harder. I know
> there are areas where we are not consistent at all but we should not
> compound the problem.

memcg_kmem_page_charge() is even better, I believe, and that is what I
changed this to in my tree.

>>> As this thing is called from within the allocator, it's not clear why
>>> __memcg_kmem_new_page is exported. I can't imagine why a module would
>>> call it directly although maybe you cover that somewhere else in the
>>> series.
>>
>> Okay, more people commented on this, so let me clarify: they shouldn't
>> be. They were initially exported when this was about the slab only,
>> because they could be called from inlined functions in the allocators.
>> Now that the charge/uncharge was moved to the page allocator - which
>> already allowed me the big benefit of separating this in two pieces -
>> none of this needs to be exported.
>>
>> Sorry for not noticing this myself, but thanks for the eyes =)
>
> You're welcome. I expect to see all the exports disappear so. If there
> are any exports left I think it would be important to document why they
> have to be exported. This is particularly true because they are
> EXPORT_SYMBOL not EXPORT_SYMBOL_GPL. I think it would be good to know in
> advance why a module (particularly an out-of-tree one) would be
> interested.

I will remove them all for now.
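The static-branch pattern debated in this exchange can be mocked in userspace C. This is only a sketch of the semantics: the kernel's `static_key` is a runtime-patched jump label with near-zero cost when off, which a plain `bool` cannot model, and the names below follow the thread rather than the final kernel API:

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for `struct static_key memcg_kmem_enabled_key`.  Starts off:
 * nothing is accounted until the first memcg gets a kmem limit. */
static bool memcg_kmem_enabled_key = false;

static inline bool memcg_kmem_enabled(void)
{
	/* kernel: return static_key_false(&memcg_kmem_enabled_key); */
	return memcg_kmem_enabled_key;
}

/* Flipped when the first kmem-limited memcg appears
 * (static_key_slow_inc() in the kernel). */
static void memcg_kmem_enable(void)
{
	memcg_kmem_enabled_key = true;
}

static int charge_calls = 0;

/* A charge site guarded by the key: a no-op until accounting is enabled,
 * which is the "no overhead until the first limit is set" claim above. */
static void charge_page(void)
{
	if (!memcg_kmem_enabled())
		return;
	charge_calls++;
}
```

The naming tension in the thread is visible here too: `memcg_kmem_enabled()` reads as a per-cgroup predicate, while the key is a single global switch.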
Re: [PATCH v2 07/11] mm: Allocate kernel pages to the right memcg
On Wed, Aug 15, 2012 at 01:08:08PM +0400, Glauber Costa wrote:
> On 08/14/2012 07:16 PM, Mel Gorman wrote:
> > On Thu, Aug 09, 2012 at 05:01:15PM +0400, Glauber Costa wrote:
> >> When a process tries to allocate a page with the __GFP_KMEMCG flag, the
> >> page allocator will call the corresponding memcg functions to validate
> >> the allocation. Tasks in the root memcg can always proceed.
> >>
> >> To avoid adding markers to the page - and a kmem flag that would
> >> necessarily follow, as much as doing page_cgroup lookups for no reason,
> >
> > As you already guessed, doing a page_cgroup lookup in the page allocator
> > free path would be a no-go.
>
> Specifically yes, but in general, you will be able to observe that I am
> taking all the possible measures to make sure existing paths are
> disturbed as little as possible.
>
> Thanks for your review here
>
> >> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> >> index b956cec..da341dc 100644
> >> --- a/mm/page_alloc.c
> >> +++ b/mm/page_alloc.c
> >> @@ -2532,6 +2532,7 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
> >>	struct page *page = NULL;
> >>	int migratetype = allocflags_to_migratetype(gfp_mask);
> >>	unsigned int cpuset_mems_cookie;
> >> +	void *handle = NULL;
> >>
> >>	gfp_mask &= gfp_allowed_mask;
> >>
> >> @@ -2543,6 +2544,13 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
> >>		return NULL;
> >>
> >>	/*
> >> +	 * Will only have any effect when __GFP_KMEMCG is set.
> >> +	 * This is verified in the (always inline) callee
> >> +	 */
> >> +	if (!memcg_kmem_new_page(gfp_mask, &handle, order))
> >
> > memcg_kmem_new_page takes a void * parameter already but here you are
> > passing in a void **. This probably happens to work because you do this
> >
> >	struct mem_cgroup **handle = (struct mem_cgroup **)_handle;
> >
> > but that appears to defeat the purpose of having an opaque type as a
> > "handle". You have to treat it differently than passing it into the
> > commit function because it expects a void *. The motivation for an
> > opaque type is completely unclear to me and how it is managed with a
> > mix of void * and void ** is very confusing.
>
> okay.
>
> The opaque exists because I am doing speculative charging.

I do not get why speculative charging would mandate an opaque type or
"handle". It looks like a fairly standard prepare/commit pattern to me.

> I believe it to be a better and less complicated approach than letting
> a page appear and then charging it. Besides being consistent with the
> rest of memcg, it won't create unnecessary disturbance in the page
> allocator when the allocation is to fail.

I still don't get why you did not just return a mem_cgroup instead of a
handle.

> Now, tasks can move between memcgs, so we can't rely on grabbing it from
> current in commit_page, so we pass it around as a handle.

You could just as easily have passed around the mem_cgroup and it would
have been less obscure. Maybe this makes sense from a memcg context and
matches some coding pattern there that I'm not aware of.

> Also, even if the task could not move, we already got it once from the
> task, and that is not for free. Better save it.
>
> Aside from the handle needed, the cost is more or less the same compared
> to doing it in one pass. All we do by using speculative charging is to
> split the cost in two, and doing it from two places.
> We'd have to charge + update page_cgroup anyway.
>
> As for the type, do you think using struct mem_cgroup would be less
> confusing?

Yes, and returning the mem_cgroup or NULL instead of bool.

> > On a similar note I spotted #define memcg_kmem_on 1 . That is also
> > different just for the sake of it. The convention is to do something
> > like this
> >
> >	/* This helps us to avoid #ifdef CONFIG_NUMA */
> >	#ifdef CONFIG_NUMA
> >	#define NUMA_BUILD 1
> >	#else
> >	#define NUMA_BUILD 0
> >	#endif
>
> For simple defines, yes. But a later patch will turn this into a static
> branch test. memcg_kmem_on will be always 0 when compile-disabled, but
> when enabled will expand to static_branch(&...).

I see.

> > memcg_kmem_on was difficult to guess based on its name. I thought
> > initially that it would only be active if a memcg existed or at least
> > something like mem_cgroup_disabled() but it's actually enabled if
> > CONFIG_MEMCG_KMEM is set.
>
> For now. And I thought that adding the static branch in this patch would
> only confuse matters.

Ah, I see now. I had stopped reading the series once I reached this
patch. I don't think it would have mattered much to collapse the two
patches together but ok.

The static key handling does look a little suspicious. You appear to do
reference counting in memcg_update_kmem_limit for every
mem_cgroup_write() but decrement it on memcg exit. This does not appear
as if it would be symmetric if the memcg files were written to multiple
times (maybe that's not allowed?). Either way, the comment says it can
never be disabled but as you have static_key_slow_dec calls it would
appear that you *do* support them being disabled. Confusing.
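The prepare/commit ("speculative charge") flow argued over above can be sketched as a userspace mock. Everything here is a simplified stand-in: the names, the `struct mem_cgroup` counter, and `malloc` in place of the page allocator are illustrative only, not the kernel implementation:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct mem_cgroup { long usage, limit; };   /* res_counter stand-in */
struct page { struct mem_cgroup *memcg; };  /* binding a real page_cgroup holds */

/* Prepare: charge speculatively, so an over-limit allocation fails before
 * the page allocator is disturbed at all. */
static bool memcg_charge(struct mem_cgroup *memcg, long pages)
{
	if (memcg->usage + pages > memcg->limit)
		return false;
	memcg->usage += pages;
	return true;
}

/* Error path: roll the speculative charge back. */
static void memcg_uncharge(struct mem_cgroup *memcg, long pages)
{
	memcg->usage -= pages;
}

/* Caller side: prepare, allocate, then commit (bind the already-charged
 * memcg to the page) or uncharge on allocation failure. */
static struct page *alloc_page_accounted(struct mem_cgroup *memcg)
{
	if (!memcg_charge(memcg, 1))
		return NULL;
	struct page *page = malloc(sizeof(*page));
	if (!page) {
		memcg_uncharge(memcg, 1);
		return NULL;
	}
	page->memcg = memcg;	/* commit */
	return page;
}
```

Passing the charged `struct mem_cgroup *` between prepare and commit (rather than an opaque `void *` handle) is exactly what the reviewers are asking for; the charge still has to travel with the allocation because the task may migrate between memcgs in the meantime.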
Re: [PATCH v2 07/11] mm: Allocate kernel pages to the right memcg
On Thu 09-08-12 17:01:15, Glauber Costa wrote:
[...]
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index b956cec..da341dc 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2532,6 +2532,7 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
>	struct page *page = NULL;
>	int migratetype = allocflags_to_migratetype(gfp_mask);
>	unsigned int cpuset_mems_cookie;
> +	void *handle = NULL;
>
>	gfp_mask &= gfp_allowed_mask;
>
> @@ -2543,6 +2544,13 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
>		return NULL;
>
>	/*
> +	 * Will only have any effect when __GFP_KMEMCG is set.
> +	 * This is verified in the (always inline) callee
> +	 */
> +	if (!memcg_kmem_new_page(gfp_mask, &handle, order))
> +		return NULL;

When the previous patch introduced this function I thought the handle
obfuscation was to prevent spreading struct mem_cgroup inside the page
allocator, but memcg_kmem_commit_page uses the type directly. So why that
obfuscation? Even "handle" as a name sounds unnecessarily confusing. I
would go with struct mem_cgroup **memcgp or even return the pointer on
success or NULL otherwise.

[...]
> +EXPORT_SYMBOL(__free_accounted_pages);

Why exported?

Btw. this is called from call_rcu context but it itself calls call_rcu
down the chain in mem_cgroup_put. Is it safe?

[...]
> +EXPORT_SYMBOL(free_accounted_pages);

here again
--
Michal Hocko
SUSE Labs
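The `free_accounted_pages()` contract under review (no per-page kmem marker, so the caller must tell the free path about accounted pages) can be mocked in userspace. The flag value, the global counter, and `malloc`/`free` are all stand-ins for illustration, not the kernel API:

```c
#include <assert.h>
#include <stdlib.h>

#define __GFP_KMEMCG 0x1u	/* mock flag bit, not the kernel's value */

static long kmem_charged;	/* pages currently charged to kmem */

/* Allocation side: charge only when the caller passed __GFP_KMEMCG. */
static void *alloc_page_mock(unsigned int gfp_mask)
{
	if (gfp_mask & __GFP_KMEMCG)
		kmem_charged++;
	return malloc(64);
}

/* The page carries no kmem marker, so the ordinary free path has no way
 * to know whether an uncharge is due... */
static void free_page_mock(void *page)
{
	free(page);
}

/* ...which is why __GFP_KMEMCG callers must free through the accounted
 * variant, keeping the common free path untouched. */
static void free_accounted_page_mock(void *page)
{
	kmem_charged--;
	free(page);
}
```

This is the trade-off the patch description names: no page_cgroup lookup on free, at the price of a paired-API discipline on every `__GFP_KMEMCG` user.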
Re: [PATCH v2 07/11] mm: Allocate kernel pages to the right memcg
On 08/14/2012 07:16 PM, Mel Gorman wrote:
> On Thu, Aug 09, 2012 at 05:01:15PM +0400, Glauber Costa wrote:
>> When a process tries to allocate a page with the __GFP_KMEMCG flag, the
>> page allocator will call the corresponding memcg functions to validate
>> the allocation. Tasks in the root memcg can always proceed.
>>
>> To avoid adding markers to the page - and a kmem flag that would
>> necessarily follow, as much as doing page_cgroup lookups for no reason,
>
> As you already guessed, doing a page_cgroup lookup in the page allocator
> free path would be a no-go.

Specifically yes, but in general, you will be able to observe that I am
taking all the possible measures to make sure existing paths are
disturbed as little as possible.

Thanks for your review here

>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index b956cec..da341dc 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -2532,6 +2532,7 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
>>	struct page *page = NULL;
>>	int migratetype = allocflags_to_migratetype(gfp_mask);
>>	unsigned int cpuset_mems_cookie;
>> +	void *handle = NULL;
>>
>>	gfp_mask &= gfp_allowed_mask;
>>
>> @@ -2543,6 +2544,13 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
>>		return NULL;
>>
>>	/*
>> +	 * Will only have any effect when __GFP_KMEMCG is set.
>> +	 * This is verified in the (always inline) callee
>> +	 */
>> +	if (!memcg_kmem_new_page(gfp_mask, &handle, order))
>
> memcg_kmem_new_page takes a void * parameter already but here you are
> passing in a void **. This probably happens to work because you do this
>
>	struct mem_cgroup **handle = (struct mem_cgroup **)_handle;
>
> but that appears to defeat the purpose of having an opaque type as a
> "handle". You have to treat it differently than passing it into the
> commit function because it expects a void *. The motivation for an
> opaque type is completely unclear to me and how it is managed with a
> mix of void * and void ** is very confusing.

okay.

The opaque exists because I am doing speculative charging. I believe it
to be a better and less complicated approach than letting a page appear
and then charging it. Besides being consistent with the rest of memcg,
it won't create unnecessary disturbance in the page allocator when the
allocation is to fail.

Now, tasks can move between memcgs, so we can't rely on grabbing it from
current in commit_page, so we pass it around as a handle. Also, even if
the task could not move, we already got it once from the task, and that
is not for free. Better save it.

Aside from the handle needed, the cost is more or less the same compared
to doing it in one pass. All we do by using speculative charging is to
split the cost in two, and doing it from two places.
We'd have to charge + update page_cgroup anyway.

As for the type, do you think using struct mem_cgroup would be less
confusing?

> On a similar note I spotted #define memcg_kmem_on 1 . That is also
> different just for the sake of it. The convention is to do something
> like this
>
>	/* This helps us to avoid #ifdef CONFIG_NUMA */
>	#ifdef CONFIG_NUMA
>	#define NUMA_BUILD 1
>	#else
>	#define NUMA_BUILD 0
>	#endif

For simple defines, yes. But a later patch will turn this into a static
branch test. memcg_kmem_on will be always 0 when compile-disabled, but
when enabled will expand to static_branch(&...).

> memcg_kmem_on was difficult to guess based on its name. I thought
> initially that it would only be active if a memcg existed or at least
> something like mem_cgroup_disabled() but it's actually enabled if
> CONFIG_MEMCG_KMEM is set.

For now. And I thought that adding the static branch in this patch would
only confuse matters. The placeholder is there, but it is later patched
to the final thing.

With that explained, if you want me to change it to something else, I
can do it. Should I?

> I also find it *very* strange to have a function named as if it is an
> allocation-style function when in fact it's looking up a mem_cgroup
> and charging it (and uncharging it in the error path if necessary). If
> it was called memcg_kmem_newpage_charge I might have found it a little
> better.

I don't feel strongly about names in general. I can change it.
Will update to memcg_kmem_newpage_charge() and memcg_kmem_page_uncharge().

> This whole operation also looks very expensive (cgroup lookups, RCU
> locks taken etc) but I guess you're willing to take that cost in the
> name of isolating containers from each other. However, I strongly
> suggest that this overhead is measured in advance. It should not stop
> the series being merged as such but it should be understood because if
> the cost is high then this feature will be avoided like the plague. I
> am skeptical that distributions would enable this by default, at least
> not without support for cgroup_disable=kmem

Enabling this feature will bring you nothing, therefore, no (or little)
overhead. Nothing of this will be patched in until the first memcg gets
kmem limited.
Re: [PATCH v2 07/11] mm: Allocate kernel pages to the right memcg
On Thu, Aug 09, 2012 at 05:01:15PM +0400, Glauber Costa wrote:
> When a process tries to allocate a page with the __GFP_KMEMCG flag, the
> page allocator will call the corresponding memcg functions to validate
> the allocation. Tasks in the root memcg can always proceed.
>
> To avoid adding markers to the page - and a kmem flag that would
> necessarily follow, as much as doing page_cgroup lookups for no reason,

As you already guessed, doing a page_cgroup lookup in the page allocator
free path would be a no-go.

This is my first time glancing at the series and I'm only paying close
attention to this patch so pardon me if my observations have been made
already.

> whoever is marking its allocations with __GFP_KMEMCG flag is responsible
> for telling the page allocator that this is such an allocation at
> free_pages() time. This is done by the invocation of
> __free_accounted_pages() and free_accounted_pages().
>
> Signed-off-by: Glauber Costa
> CC: Christoph Lameter
> CC: Pekka Enberg
> CC: Michal Hocko
> CC: Kamezawa Hiroyuki
> CC: Johannes Weiner
> CC: Suleiman Souhlal
> ---
> include/linux/gfp.h | 3 +++
> mm/page_alloc.c | 38 ++
> 2 files changed, 41 insertions(+)
>
> diff --git a/include/linux/gfp.h b/include/linux/gfp.h
> index d8eae4d..029570f 100644
> --- a/include/linux/gfp.h
> +++ b/include/linux/gfp.h
> @@ -370,6 +370,9 @@ extern void free_pages(unsigned long addr, unsigned int order);
> extern void free_hot_cold_page(struct page *page, int cold);
> extern void free_hot_cold_page_list(struct list_head *list, int cold);
>
> +extern void __free_accounted_pages(struct page *page, unsigned int order);
> +extern void free_accounted_pages(unsigned long addr, unsigned int order);
> +
> #define __free_page(page) __free_pages((page), 0)
> #define free_page(addr) free_pages((addr), 0)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index b956cec..da341dc 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2532,6 +2532,7 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
> struct page *page = NULL;
> int migratetype = allocflags_to_migratetype(gfp_mask);
> unsigned int cpuset_mems_cookie;
> + void *handle = NULL;
>
> gfp_mask &= gfp_allowed_mask;
>
> @@ -2543,6 +2544,13 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
> return NULL;
>
> /*
> + * Will only have any effect when __GFP_KMEMCG is set.
> + * This is verified in the (always inline) callee
> + */
> + if (!memcg_kmem_new_page(gfp_mask, &handle, order))

memcg_kmem_new_page takes a void * parameter already but here you are
passing in a void **. This probably happens to work because you do this

struct mem_cgroup **handle = (struct mem_cgroup **)_handle;

but that appears to defeat the purpose of having an opaque type as a
"handle". You have to treat it differently than passing it into the commit
function because it expects a void *. The motivation for an opaque type is
completely unclear to me and how it is managed with a mix of void * and
void ** is very confusing.

On a similar note I spotted #define memcg_kmem_on 1. That is also
different just for the sake of it. The convention is to do something like
this

/* This helps us to avoid #ifdef CONFIG_NUMA */
#ifdef CONFIG_NUMA
#define NUMA_BUILD 1
#else
#define NUMA_BUILD 0
#endif

memcg_kmem_on was difficult to guess based on its name. I thought
initially that it would only be active if a memcg existed or at least
something like mem_cgroup_disabled() but it's actually enabled if
CONFIG_MEMCG_KMEM is set.

I also find it *very* strange to have a function named as if it is an
allocation-style function when in fact it's looking up a mem_cgroup and
charging it (and uncharging it in the error path if necessary). If it was
called memcg_kmem_newpage_charge I might have found it a little better.
While I believe you have to take care to avoid confusion with
mem_cgroup_newpage_charge, it would be preferable if the APIs were
similar. memcg is hard enough as it is to understand without having
different APIs.

This whole operation also looks very expensive (cgroup lookups, RCU locks
taken etc) but I guess you're willing to take that cost in the name of
isolating containers from each other. However, I strongly suggest that
this overhead is measured in advance. It should not stop the series being
merged as such but it should be understood because if the cost is high
then this feature will be avoided like the plague. I am skeptical that
distributions would enable this by default, at least not without support
for cgroup_disable=kmem

As this thing is called from within the allocator, it's not clear why
__memcg_kmem_new_page is exported. I can't imagine why a module would call
it directly although maybe you cover that somewhere else in the series.

From the point of view of a hook, that is acceptable but just barely. I
have slammed other hooks
Re: [PATCH v2 07/11] mm: Allocate kernel pages to the right memcg
On Mon, Aug 13, 2012 at 12:03:38PM +0400, Glauber Costa wrote:
> On 08/10/2012 09:33 PM, Kamezawa Hiroyuki wrote:
> > (2012/08/09 22:01), Glauber Costa wrote:
> >> When a process tries to allocate a page with the __GFP_KMEMCG flag, the
> >> page allocator will call the corresponding memcg functions to validate
> >> the allocation. Tasks in the root memcg can always proceed.
> >>
> >> To avoid adding markers to the page - and a kmem flag that would
> >> necessarily follow, as much as doing page_cgroup lookups for no reason,
> >> whoever is marking its allocations with __GFP_KMEMCG flag is responsible
> >> for telling the page allocator that this is such an allocation at
> >> free_pages() time. This is done by the invocation of
> >> __free_accounted_pages() and free_accounted_pages().
> >>
> >> Signed-off-by: Glauber Costa
> >> CC: Christoph Lameter
> >> CC: Pekka Enberg
> >> CC: Michal Hocko
> >> CC: Kamezawa Hiroyuki
> >> CC: Johannes Weiner
> >> CC: Suleiman Souhlal
> >
> > Ah, ok. free_accounted_page() seems good.
> >
> > Acked-by: KAMEZAWA Hiroyuki
> >
> > I myself is okay with this. But...
> >
> > Because you add a new hook to alloc_pages(), please get Ack from Mel
> > before requesting merge.
> >
> > Thanks,
> > -Kame
>
> Absolutely.
>
> Mel, would you mind taking a look at this series and commenting on this?

It'll take me a few days but I'll get around to it.

-- 
Mel Gorman
SUSE Labs
Re: [PATCH v2 07/11] mm: Allocate kernel pages to the right memcg
On 08/10/2012 09:33 PM, Kamezawa Hiroyuki wrote:
> (2012/08/09 22:01), Glauber Costa wrote:
>> When a process tries to allocate a page with the __GFP_KMEMCG flag, the
>> page allocator will call the corresponding memcg functions to validate
>> the allocation. Tasks in the root memcg can always proceed.
>>
>> To avoid adding markers to the page - and a kmem flag that would
>> necessarily follow, as much as doing page_cgroup lookups for no reason,
>> whoever is marking its allocations with __GFP_KMEMCG flag is responsible
>> for telling the page allocator that this is such an allocation at
>> free_pages() time. This is done by the invocation of
>> __free_accounted_pages() and free_accounted_pages().
>>
>> Signed-off-by: Glauber Costa
>> CC: Christoph Lameter
>> CC: Pekka Enberg
>> CC: Michal Hocko
>> CC: Kamezawa Hiroyuki
>> CC: Johannes Weiner
>> CC: Suleiman Souhlal
>
> Ah, ok. free_accounted_page() seems good.
>
> Acked-by: KAMEZAWA Hiroyuki
>
> I myself is okay with this. But...
>
> Because you add a new hook to alloc_pages(), please get Ack from Mel
> before requesting merge.
>
> Thanks,
> -Kame

Absolutely.

Mel, would you mind taking a look at this series and commenting on this?
Thanks in advance.
Re: [PATCH v2 07/11] mm: Allocate kernel pages to the right memcg
On 08/10/2012 09:36 PM, Greg Thelen wrote:
> On Thu, Aug 09 2012, Glauber Costa wrote:
>
>> When a process tries to allocate a page with the __GFP_KMEMCG flag, the
>> page allocator will call the corresponding memcg functions to validate
>> the allocation. Tasks in the root memcg can always proceed.
>>
>> To avoid adding markers to the page - and a kmem flag that would
>> necessarily follow, as much as doing page_cgroup lookups for no reason,
>> whoever is marking its allocations with __GFP_KMEMCG flag is responsible
>> for telling the page allocator that this is such an allocation at
>> free_pages() time. This is done by the invocation of
>> __free_accounted_pages() and free_accounted_pages().
>>
>> Signed-off-by: Glauber Costa
>> CC: Christoph Lameter
>> CC: Pekka Enberg
>> CC: Michal Hocko
>> CC: Kamezawa Hiroyuki
>> CC: Johannes Weiner
>> CC: Suleiman Souhlal
>> ---
>> include/linux/gfp.h | 3 +++
>> mm/page_alloc.c | 38 ++
>> 2 files changed, 41 insertions(+)
>>
>> diff --git a/include/linux/gfp.h b/include/linux/gfp.h
>> index d8eae4d..029570f 100644
>> --- a/include/linux/gfp.h
>> +++ b/include/linux/gfp.h
>> @@ -370,6 +370,9 @@ extern void free_pages(unsigned long addr, unsigned int order);
>> extern void free_hot_cold_page(struct page *page, int cold);
>> extern void free_hot_cold_page_list(struct list_head *list, int cold);
>>
>> +extern void __free_accounted_pages(struct page *page, unsigned int order);
>> +extern void free_accounted_pages(unsigned long addr, unsigned int order);
>> +
>> #define __free_page(page) __free_pages((page), 0)
>> #define free_page(addr) free_pages((addr), 0)
>>
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index b956cec..da341dc 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -2532,6 +2532,7 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
>> struct page *page = NULL;
>> int migratetype = allocflags_to_migratetype(gfp_mask);
>> unsigned int cpuset_mems_cookie;
>> + void *handle = NULL;
>>
>> gfp_mask &= gfp_allowed_mask;
>>
>> @@ -2543,6 +2544,13 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
>> return NULL;
>>
>> /*
>> + * Will only have any effect when __GFP_KMEMCG is set.
>> + * This is verified in the (always inline) callee
>> + */
>> + if (!memcg_kmem_new_page(gfp_mask, &handle, order))
>> + return NULL;
>> +
>> + /*
>> * Check the zones suitable for the gfp_mask contain at least one
>> * valid zone. It's possible to have an empty zonelist as a result
>> * of GFP_THISNODE and a memoryless node
>> @@ -2583,6 +2591,8 @@ out:
>> if (unlikely(!put_mems_allowed(cpuset_mems_cookie) && !page))
>> goto retry_cpuset;
>>
>> + memcg_kmem_commit_page(page, handle, order);
>> +
>> return page;
>> }
>> EXPORT_SYMBOL(__alloc_pages_nodemask);
>> @@ -2635,6 +2645,34 @@ void free_pages(unsigned long addr, unsigned int order)
>>
>> EXPORT_SYMBOL(free_pages);
>>
>> +/*
>> + * __free_accounted_pages and free_accounted_pages will free pages allocated
>> + * with __GFP_KMEMCG.
>> + *
>> + * Those pages are accounted to a particular memcg, embedded in the
>> + * corresponding page_cgroup. To avoid adding a hit in the allocator to search
>> + * for that information only to find out that it is NULL for users who have no
>> + * interest in that whatsoever, we provide these functions.
>> + *
>> + * The caller knows better which flags it relies on.
>> + */
>> +void __free_accounted_pages(struct page *page, unsigned int order)
>> +{
>> + memcg_kmem_free_page(page, order);
>> + __free_pages(page, order);
>> +}
>> +EXPORT_SYMBOL(__free_accounted_pages);
>> +
>> +void free_accounted_pages(unsigned long addr, unsigned int order)
>> +{
>> + if (addr != 0) {
>> + VM_BUG_ON(!virt_addr_valid((void *)addr));
>> + memcg_kmem_free_page(virt_to_page((void *)addr), order);
>> + __free_pages(virt_to_page((void *)addr), order);
>
> Nit. Is there any reason not to replace the above two lines with:
> __free_accounted_pages(virt_to_page((void *)addr), order);

Not any particular reason. If people prefer it this way, I can do that
with no problems.
Re: [PATCH v2 07/11] mm: Allocate kernel pages to the right memcg
On Thu, Aug 09 2012, Glauber Costa wrote:

> When a process tries to allocate a page with the __GFP_KMEMCG flag, the
> page allocator will call the corresponding memcg functions to validate
> the allocation. Tasks in the root memcg can always proceed.
>
> To avoid adding markers to the page - and a kmem flag that would
> necessarily follow, as much as doing page_cgroup lookups for no reason,
> whoever is marking its allocations with __GFP_KMEMCG flag is responsible
> for telling the page allocator that this is such an allocation at
> free_pages() time. This is done by the invocation of
> __free_accounted_pages() and free_accounted_pages().
>
> Signed-off-by: Glauber Costa
> CC: Christoph Lameter
> CC: Pekka Enberg
> CC: Michal Hocko
> CC: Kamezawa Hiroyuki
> CC: Johannes Weiner
> CC: Suleiman Souhlal
> ---
> include/linux/gfp.h | 3 +++
> mm/page_alloc.c | 38 ++
> 2 files changed, 41 insertions(+)
>
> diff --git a/include/linux/gfp.h b/include/linux/gfp.h
> index d8eae4d..029570f 100644
> --- a/include/linux/gfp.h
> +++ b/include/linux/gfp.h
> @@ -370,6 +370,9 @@ extern void free_pages(unsigned long addr, unsigned int order);
> extern void free_hot_cold_page(struct page *page, int cold);
> extern void free_hot_cold_page_list(struct list_head *list, int cold);
>
> +extern void __free_accounted_pages(struct page *page, unsigned int order);
> +extern void free_accounted_pages(unsigned long addr, unsigned int order);
> +
> #define __free_page(page) __free_pages((page), 0)
> #define free_page(addr) free_pages((addr), 0)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index b956cec..da341dc 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2532,6 +2532,7 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
> struct page *page = NULL;
> int migratetype = allocflags_to_migratetype(gfp_mask);
> unsigned int cpuset_mems_cookie;
> + void *handle = NULL;
>
> gfp_mask &= gfp_allowed_mask;
>
> @@ -2543,6 +2544,13 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
> return NULL;
>
> /*
> + * Will only have any effect when __GFP_KMEMCG is set.
> + * This is verified in the (always inline) callee
> + */
> + if (!memcg_kmem_new_page(gfp_mask, &handle, order))
> + return NULL;
> +
> + /*
> * Check the zones suitable for the gfp_mask contain at least one
> * valid zone. It's possible to have an empty zonelist as a result
> * of GFP_THISNODE and a memoryless node
> @@ -2583,6 +2591,8 @@ out:
> if (unlikely(!put_mems_allowed(cpuset_mems_cookie) && !page))
> goto retry_cpuset;
>
> + memcg_kmem_commit_page(page, handle, order);
> +
> return page;
> }
> EXPORT_SYMBOL(__alloc_pages_nodemask);
> @@ -2635,6 +2645,34 @@ void free_pages(unsigned long addr, unsigned int order)
>
> EXPORT_SYMBOL(free_pages);
>
> +/*
> + * __free_accounted_pages and free_accounted_pages will free pages allocated
> + * with __GFP_KMEMCG.
> + *
> + * Those pages are accounted to a particular memcg, embedded in the
> + * corresponding page_cgroup. To avoid adding a hit in the allocator to search
> + * for that information only to find out that it is NULL for users who have no
> + * interest in that whatsoever, we provide these functions.
> + *
> + * The caller knows better which flags it relies on.
> + */
> +void __free_accounted_pages(struct page *page, unsigned int order)
> +{
> + memcg_kmem_free_page(page, order);
> + __free_pages(page, order);
> +}
> +EXPORT_SYMBOL(__free_accounted_pages);
> +
> +void free_accounted_pages(unsigned long addr, unsigned int order)
> +{
> + if (addr != 0) {
> + VM_BUG_ON(!virt_addr_valid((void *)addr));
> + memcg_kmem_free_page(virt_to_page((void *)addr), order);
> + __free_pages(virt_to_page((void *)addr), order);

Nit. Is there any reason not to replace the above two lines with:
__free_accounted_pages(virt_to_page((void *)addr), order);

> + }
> +}
> +EXPORT_SYMBOL(free_accounted_pages);
> +
> static void *make_alloc_exact(unsigned long addr, unsigned order, size_t size)
> {
> if (addr) {
Re: [PATCH v2 07/11] mm: Allocate kernel pages to the right memcg
(2012/08/09 22:01), Glauber Costa wrote:
> When a process tries to allocate a page with the __GFP_KMEMCG flag, the
> page allocator will call the corresponding memcg functions to validate
> the allocation. Tasks in the root memcg can always proceed.
>
> To avoid adding markers to the page - and a kmem flag that would
> necessarily follow, as much as doing page_cgroup lookups for no reason,
> whoever is marking its allocations with __GFP_KMEMCG flag is responsible
> for telling the page allocator that this is such an allocation at
> free_pages() time. This is done by the invocation of
> __free_accounted_pages() and free_accounted_pages().
>
> Signed-off-by: Glauber Costa
> CC: Christoph Lameter
> CC: Pekka Enberg
> CC: Michal Hocko
> CC: Kamezawa Hiroyuki
> CC: Johannes Weiner
> CC: Suleiman Souhlal

Ah, ok. free_accounted_page() seems good.

Acked-by: KAMEZAWA Hiroyuki

I myself am okay with this. But...

Because you add a new hook to alloc_pages(), please get Ack from Mel
before requesting merge.

Thanks,
-Kame

> ---
> include/linux/gfp.h | 3 +++
> mm/page_alloc.c | 38 ++
> 2 files changed, 41 insertions(+)
>
> diff --git a/include/linux/gfp.h b/include/linux/gfp.h
> index d8eae4d..029570f 100644
> --- a/include/linux/gfp.h
> +++ b/include/linux/gfp.h
> @@ -370,6 +370,9 @@ extern void free_pages(unsigned long addr, unsigned int order);
> extern void free_hot_cold_page(struct page *page, int cold);
> extern void free_hot_cold_page_list(struct list_head *list, int cold);
>
> +extern void __free_accounted_pages(struct page *page, unsigned int order);
> +extern void free_accounted_pages(unsigned long addr, unsigned int order);
> +
> #define __free_page(page) __free_pages((page), 0)
> #define free_page(addr) free_pages((addr), 0)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index b956cec..da341dc 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2532,6 +2532,7 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
> struct page *page = NULL;
> int migratetype = allocflags_to_migratetype(gfp_mask);
> unsigned int cpuset_mems_cookie;
> + void *handle = NULL;
>
> gfp_mask &= gfp_allowed_mask;
>
> @@ -2543,6 +2544,13 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
> return NULL;
>
> /*
> + * Will only have any effect when __GFP_KMEMCG is set.
> + * This is verified in the (always inline) callee
> + */
> + if (!memcg_kmem_new_page(gfp_mask, &handle, order))
> + return NULL;
> +
> + /*
> * Check the zones suitable for the gfp_mask contain at least one
> * valid zone. It's possible to have an empty zonelist as a result
> * of GFP_THISNODE and a memoryless node
> @@ -2583,6 +2591,8 @@ out:
> if (unlikely(!put_mems_allowed(cpuset_mems_cookie) && !page))
> goto retry_cpuset;
>
> + memcg_kmem_commit_page(page, handle, order);
> +
> return page;
> }
> EXPORT_SYMBOL(__alloc_pages_nodemask);
> @@ -2635,6 +2645,34 @@ void free_pages(unsigned long addr, unsigned int order)
>
> EXPORT_SYMBOL(free_pages);
>
> +/*
> + * __free_accounted_pages and free_accounted_pages will free pages allocated
> + * with __GFP_KMEMCG.
> + *
> + * Those pages are accounted to a particular memcg, embedded in the
> + * corresponding page_cgroup. To avoid adding a hit in the allocator to search
> + * for that information only to find out that it is NULL for users who have no
> + * interest in that whatsoever, we provide these functions.
> + *
> + * The caller knows better which flags it relies on.
> + */
> +void __free_accounted_pages(struct page *page, unsigned int order)
> +{
> + memcg_kmem_free_page(page, order);
> + __free_pages(page, order);
> +}
> +EXPORT_SYMBOL(__free_accounted_pages);
> +
> +void free_accounted_pages(unsigned long addr, unsigned int order)
> +{
> + if (addr != 0) {
> + VM_BUG_ON(!virt_addr_valid((void *)addr));
> + memcg_kmem_free_page(virt_to_page((void *)addr), order);
> + __free_pages(virt_to_page((void *)addr), order);
> + }
> +}
> +EXPORT_SYMBOL(free_accounted_pages);
> +
> static void *make_alloc_exact(unsigned long addr, unsigned order, size_t size)
> {
> if (addr) {
Re: [PATCH v2 07/11] mm: Allocate kernel pages to the right memcg
(2012/08/09 22:01), Glauber Costa wrote:
> When a process tries to allocate a page with the __GFP_KMEMCG flag, the
> page allocator will call the corresponding memcg functions to validate
> the allocation. Tasks in the root memcg can always proceed.
>
> To avoid adding markers to the page - and a kmem flag that would
> necessarily follow, as much as doing page_cgroup lookups for no reason,
> whoever is marking its allocations with __GFP_KMEMCG flag is responsible
> for telling the page allocator that this is such an allocation at
> free_pages() time. This is done by the invocation of
> __free_accounted_pages() and free_accounted_pages().
>
> Signed-off-by: Glauber Costa glom...@parallels.com
> CC: Christoph Lameter c...@linux.com
> CC: Pekka Enberg penb...@cs.helsinki.fi
> CC: Michal Hocko mho...@suse.cz
> CC: Kamezawa Hiroyuki kamezawa.hir...@jp.fujitsu.com
> CC: Johannes Weiner han...@cmpxchg.org
> CC: Suleiman Souhlal sulei...@google.com

Ah, ok. free_accounted_page() seems good.

Acked-by: KAMEZAWA Hiroyuki kamezawa.hir...@jp.fujitsu.com

I myself am okay with this. But because you add a new hook to alloc_pages(), please get an Ack from Mel before requesting merge.

Thanks,
-Kame
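For readers skimming the thread, the alloc/free contract being acked here can be modeled in plain userspace C. Everything below is a toy stand-in (the function names mirror the patch's hooks, but a single counter plays the role of the memcg charge and malloc() stands in for the buddy allocator); it is a sketch of the protocol, not the kernel implementation:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

static long memcg_charged_pages;   /* toy stand-in for the memcg charge counter */

/* Charge before the allocation; hand back a "handle" so the charge can be
 * committed (or reverted) once the allocator has a page. This mirrors the
 * memcg_kmem_new_page()/memcg_kmem_commit_page() pair in the patch. */
static int memcg_kmem_new_page(void **handle, unsigned int order)
{
    memcg_charged_pages += 1L << order;
    *handle = &memcg_charged_pages;
    return 1;                      /* charge succeeded: allocation may proceed */
}

static void memcg_kmem_commit_page(void *page, void *handle, unsigned int order)
{
    if (handle && !page)           /* allocation failed after charging: revert */
        memcg_charged_pages -= 1L << order;
}

static void memcg_kmem_free_page(unsigned int order)
{
    memcg_charged_pages -= 1L << order;   /* uncharge at free time */
}

/* Caller side of the contract: whoever allocates with the accounted variant
 * must also free with it, since nothing on the page itself records that it
 * was charged. */
static void *accounted_alloc(unsigned int order)
{
    void *handle = NULL;
    void *page;

    if (!memcg_kmem_new_page(&handle, order))
        return NULL;
    page = malloc((size_t)4096 << order);  /* stands in for the buddy allocator */
    memcg_kmem_commit_page(page, handle, order);
    return page;
}

static void free_accounted(void *page, unsigned int order)
{
    memcg_kmem_free_page(order);
    free(page);
}
```

The design point the commit message makes is visible here: the charge travels through the handle and the caller's free path, never through a marker on the page, which is why the accounted free functions must be used.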
Re: [PATCH v2 07/11] mm: Allocate kernel pages to the right memcg
On Thu, Aug 09 2012, Glauber Costa wrote:

> When a process tries to allocate a page with the __GFP_KMEMCG flag, the
> page allocator will call the corresponding memcg functions to validate
> the allocation. Tasks in the root memcg can always proceed.
>
> To avoid adding markers to the page - and a kmem flag that would
> necessarily follow, as much as doing page_cgroup lookups for no reason,
> whoever is marking its allocations with __GFP_KMEMCG flag is responsible
> for telling the page allocator that this is such an allocation at
> free_pages() time. This is done by the invocation of
> __free_accounted_pages() and free_accounted_pages().
>
> Signed-off-by: Glauber Costa glom...@parallels.com
> CC: Christoph Lameter c...@linux.com
> CC: Pekka Enberg penb...@cs.helsinki.fi
> CC: Michal Hocko mho...@suse.cz
> CC: Kamezawa Hiroyuki kamezawa.hir...@jp.fujitsu.com
> CC: Johannes Weiner han...@cmpxchg.org
> CC: Suleiman Souhlal sulei...@google.com
> ---
>  include/linux/gfp.h | 3 +++
>  mm/page_alloc.c | 38 ++
>  2 files changed, 41 insertions(+)
>
> diff --git a/include/linux/gfp.h b/include/linux/gfp.h
> index d8eae4d..029570f 100644
> --- a/include/linux/gfp.h
> +++ b/include/linux/gfp.h
> @@ -370,6 +370,9 @@ extern void free_pages(unsigned long addr, unsigned int order);
>  extern void free_hot_cold_page(struct page *page, int cold);
>  extern void free_hot_cold_page_list(struct list_head *list, int cold);
>
> +extern void __free_accounted_pages(struct page *page, unsigned int order);
> +extern void free_accounted_pages(unsigned long addr, unsigned int order);
> +
>  #define __free_page(page) __free_pages((page), 0)
>  #define free_page(addr) free_pages((addr), 0)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index b956cec..da341dc 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2532,6 +2532,7 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
>  	struct page *page = NULL;
>  	int migratetype = allocflags_to_migratetype(gfp_mask);
>  	unsigned int cpuset_mems_cookie;
> +	void *handle = NULL;
>
>  	gfp_mask &= gfp_allowed_mask;
>
> @@ -2543,6 +2544,13 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
>  		return NULL;
>
>  	/*
> +	 * Will only have any effect when __GFP_KMEMCG is set.
> +	 * This is verified in the (always inline) callee
> +	 */
> +	if (!memcg_kmem_new_page(gfp_mask, &handle, order))
> +		return NULL;
> +
> +	/*
>  	 * Check the zones suitable for the gfp_mask contain at least one
>  	 * valid zone. It's possible to have an empty zonelist as a result
>  	 * of GFP_THISNODE and a memoryless node
> @@ -2583,6 +2591,8 @@ out:
>  	if (unlikely(!put_mems_allowed(cpuset_mems_cookie) && !page))
>  		goto retry_cpuset;
>
> +	memcg_kmem_commit_page(page, handle, order);
> +
>  	return page;
>  }
>  EXPORT_SYMBOL(__alloc_pages_nodemask);
> @@ -2635,6 +2645,34 @@ void free_pages(unsigned long addr, unsigned int order)
>
>  EXPORT_SYMBOL(free_pages);
>
> +/*
> + * __free_accounted_pages and free_accounted_pages will free pages allocated
> + * with __GFP_KMEMCG.
> + *
> + * Those pages are accounted to a particular memcg, embedded in the
> + * corresponding page_cgroup. To avoid adding a hit in the allocator to search
> + * for that information only to find out that it is NULL for users who have no
> + * interest in that whatsoever, we provide these functions.
> + *
> + * The caller knows better which flags it relies on.
> + */
> +void __free_accounted_pages(struct page *page, unsigned int order)
> +{
> +	memcg_kmem_free_page(page, order);
> +	__free_pages(page, order);
> +}
> +EXPORT_SYMBOL(__free_accounted_pages);
> +
> +void free_accounted_pages(unsigned long addr, unsigned int order)
> +{
> +	if (addr != 0) {
> +		VM_BUG_ON(!virt_addr_valid((void *)addr));
> +		memcg_kmem_free_page(virt_to_page((void *)addr), order);
> +		__free_pages(virt_to_page((void *)addr), order);

Nit. Is there any reason not to replace the above two lines with:

		__free_accounted_pages(virt_to_page((void *)addr), order);

> +	}
> +}
> +EXPORT_SYMBOL(free_accounted_pages);
> +
>  static void *make_alloc_exact(unsigned long addr, unsigned order, size_t size)
>  {
>  	if (addr) {
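Applied, Greg's nit would make the addr-based variant delegate to the page-based one. The sketch below uses userspace stubs for virt_to_page() and the free helpers (and elides the VM_BUG_ON), so only the delegation shape is meaningful:

```c
#include <assert.h>

/* Userspace stubs so the delegation can be exercised; in the kernel these
 * are the real memcg_kmem_free_page()/__free_pages()/virt_to_page(). */
static int pages_freed;
static void memcg_kmem_free_page(void *page, unsigned int order) { (void)page; (void)order; }
static void __free_pages(void *page, unsigned int order) { (void)page; (void)order; pages_freed++; }
static void *virt_to_page(void *addr) { return addr; }

void __free_accounted_pages(void *page, unsigned int order)
{
    memcg_kmem_free_page(page, order);
    __free_pages(page, order);
}

/* With the nit applied: delegate instead of open-coding the same two calls. */
void free_accounted_pages(unsigned long addr, unsigned int order)
{
    if (addr != 0)
        __free_accounted_pages(virt_to_page((void *)addr), order);
}
```

The behavior is unchanged; the refactor just keeps the uncharge-then-free pairing in a single place, so a future change to the free path cannot drift between the two functions.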
Re: [PATCH v2 07/11] mm: Allocate kernel pages to the right memcg
On 08/09/2012 08:33 PM, Greg Thelen wrote: > On Thu, Aug 09 2012, Glauber Costa wrote: > >> When a process tries to allocate a page with the __GFP_KMEMCG flag, the >> page allocator will call the corresponding memcg functions to validate >> the allocation. Tasks in the root memcg can always proceed. >> >> To avoid adding markers to the page - and a kmem flag that would >> necessarily follow, as much as doing page_cgroup lookups for no reason, >> whoever is marking its allocations with __GFP_KMEMCG flag is responsible >> for telling the page allocator that this is such an allocation at >> free_pages() time. This is done by the invocation of >> __free_accounted_pages() and free_accounted_pages(). >> >> Signed-off-by: Glauber Costa >> CC: Christoph Lameter >> CC: Pekka Enberg >> CC: Michal Hocko >> CC: Kamezawa Hiroyuki >> CC: Johannes Weiner >> CC: Suleiman Souhlal >> --- >> include/linux/gfp.h | 3 +++ >> mm/page_alloc.c | 38 ++ >> 2 files changed, 41 insertions(+) >> >> diff --git a/include/linux/gfp.h b/include/linux/gfp.h >> index d8eae4d..029570f 100644 >> --- a/include/linux/gfp.h >> +++ b/include/linux/gfp.h >> @@ -370,6 +370,9 @@ extern void free_pages(unsigned long addr, unsigned int >> order); >> extern void free_hot_cold_page(struct page *page, int cold); >> extern void free_hot_cold_page_list(struct list_head *list, int cold); >> >> +extern void __free_accounted_pages(struct page *page, unsigned int order); >> +extern void free_accounted_pages(unsigned long addr, unsigned int order); >> + >> #define __free_page(page) __free_pages((page), 0) >> #define free_page(addr) free_pages((addr), 0) >> >> diff --git a/mm/page_alloc.c b/mm/page_alloc.c >> index b956cec..da341dc 100644 >> --- a/mm/page_alloc.c >> +++ b/mm/page_alloc.c >> @@ -2532,6 +2532,7 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int >> order, >> struct page *page = NULL; >> int migratetype = allocflags_to_migratetype(gfp_mask); >> unsigned int cpuset_mems_cookie; >> +void *handle = NULL; >> 
>> gfp_mask &= gfp_allowed_mask; >> >> @@ -2543,6 +2544,13 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int >> order, >> return NULL; >> >> /* >> + * Will only have any effect when __GFP_KMEMCG is set. >> + * This is verified in the (always inline) callee >> + */ >> +if (!memcg_kmem_new_page(gfp_mask, &handle, order)) >> +return NULL; >> + >> +/* >> * Check the zones suitable for the gfp_mask contain at least one >> * valid zone. It's possible to have an empty zonelist as a result >> * of GFP_THISNODE and a memoryless node > > If memcg_kmem_new_page() succeeds then it may have obtained a memcg > reference with mem_cgroup_get(). I think this reference is leaked when > returning below: > > /* >  * Check the zones suitable for the gfp_mask contain at least one >  * valid zone. It's possible to have an empty zonelist as a result >  * of GFP_THISNODE and a memoryless node >  */ > if (unlikely(!zonelist->_zonerefs->zone)) > return NULL; > > I suspect the easiest fix is to swap the call to memcg_kmem_new_page() > and the (!zonelist->_zonerefs->zone) check.

You are right, indeed.
Re: [PATCH v2 07/11] mm: Allocate kernel pages to the right memcg
On Thu, Aug 09 2012, Glauber Costa wrote: > When a process tries to allocate a page with the __GFP_KMEMCG flag, the > page allocator will call the corresponding memcg functions to validate > the allocation. Tasks in the root memcg can always proceed. > > To avoid adding markers to the page - and a kmem flag that would > necessarily follow, as much as doing page_cgroup lookups for no reason, > whoever is marking its allocations with __GFP_KMEMCG flag is responsible > for telling the page allocator that this is such an allocation at > free_pages() time. This is done by the invocation of > __free_accounted_pages() and free_accounted_pages(). > > Signed-off-by: Glauber Costa > CC: Christoph Lameter > CC: Pekka Enberg > CC: Michal Hocko > CC: Kamezawa Hiroyuki > CC: Johannes Weiner > CC: Suleiman Souhlal > --- > include/linux/gfp.h | 3 +++ > mm/page_alloc.c | 38 ++ > 2 files changed, 41 insertions(+) > > diff --git a/include/linux/gfp.h b/include/linux/gfp.h > index d8eae4d..029570f 100644 > --- a/include/linux/gfp.h > +++ b/include/linux/gfp.h > @@ -370,6 +370,9 @@ extern void free_pages(unsigned long addr, unsigned int > order); > extern void free_hot_cold_page(struct page *page, int cold); > extern void free_hot_cold_page_list(struct list_head *list, int cold); > > +extern void __free_accounted_pages(struct page *page, unsigned int order); > +extern void free_accounted_pages(unsigned long addr, unsigned int order); > + > #define __free_page(page) __free_pages((page), 0) > #define free_page(addr) free_pages((addr), 0) > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index b956cec..da341dc 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -2532,6 +2532,7 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int > order, > struct page *page = NULL; > int migratetype = allocflags_to_migratetype(gfp_mask); > unsigned int cpuset_mems_cookie; > + void *handle = NULL; > > gfp_mask &= gfp_allowed_mask; > > @@ -2543,6 +2544,13 @@ __alloc_pages_nodemask(gfp_t 
gfp_mask, unsigned int > order, > return NULL; > > /* > + * Will only have any effect when __GFP_KMEMCG is set. > + * This is verified in the (always inline) callee > + */ > + if (!memcg_kmem_new_page(gfp_mask, &handle, order)) > + return NULL; > + > + /* >  * Check the zones suitable for the gfp_mask contain at least one >  * valid zone. It's possible to have an empty zonelist as a result >  * of GFP_THISNODE and a memoryless node

If memcg_kmem_new_page() succeeds then it may have obtained a memcg reference with mem_cgroup_get(). I think this reference is leaked when returning below:

	/*
	 * Check the zones suitable for the gfp_mask contain at least one
	 * valid zone. It's possible to have an empty zonelist as a result
	 * of GFP_THISNODE and a memoryless node
	 */
	if (unlikely(!zonelist->_zonerefs->zone))
		return NULL;

I suspect the easiest fix is to swap the call to memcg_kmem_new_page() and the (!zonelist->_zonerefs->zone) check.
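The fix Greg proposes is purely an ordering change: bail out on the empty zonelist before the charge takes its memcg reference. A userspace sketch (the stubs are illustrative stand-ins, not kernel code; only the ordering mirrors __alloc_pages_nodemask()) shows why the order matters:

```c
#include <assert.h>
#include <stddef.h>

/* Stub bookkeeping: memcg_kmem_new_page() takes a reference when it
 * charges (mem_cgroup_get() in the real code); an early return that
 * skips the commit step would leak that reference. */
static int memcg_refs;

static int memcg_kmem_new_page(void **handle, unsigned int order)
{
    (void)order;
    memcg_refs++;
    *handle = &memcg_refs;
    return 1;
}

static void memcg_kmem_commit_page(void *page, void *handle, unsigned int order)
{
    (void)order;
    if (handle && !page)
        memcg_refs--;            /* failed allocation: drop the reference */
}

/* Order of checks as in the patch: the zonelist check sits after the
 * charge, so this path leaks a reference. */
static void *alloc_buggy(int zonelist_empty, unsigned int order)
{
    void *handle = NULL;
    void *page;

    if (!memcg_kmem_new_page(&handle, order))
        return NULL;

    if (zonelist_empty)          /* early return AFTER the charge: leak */
        return NULL;

    page = &memcg_refs;          /* pretend the buddy allocator succeeded */
    memcg_kmem_commit_page(page, handle, order);
    return page;
}

/* Greg's suggested order: check the zonelist first, charge second. */
static void *alloc_fixed(int zonelist_empty, unsigned int order)
{
    void *handle = NULL;
    void *page;

    if (zonelist_empty)          /* no charge taken on this path */
        return NULL;

    if (!memcg_kmem_new_page(&handle, order))
        return NULL;

    page = &memcg_refs;
    memcg_kmem_commit_page(page, handle, order);
    return page;
}
```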
[PATCH v2 07/11] mm: Allocate kernel pages to the right memcg
When a process tries to allocate a page with the __GFP_KMEMCG flag, the
page allocator will call the corresponding memcg functions to validate
the allocation. Tasks in the root memcg can always proceed.

To avoid adding markers to the page - and a kmem flag that would
necessarily follow, as much as doing page_cgroup lookups for no reason,
whoever is marking its allocations with __GFP_KMEMCG flag is responsible
for telling the page allocator that this is such an allocation at
free_pages() time. This is done by the invocation of
__free_accounted_pages() and free_accounted_pages().

Signed-off-by: Glauber Costa
CC: Christoph Lameter
CC: Pekka Enberg
CC: Michal Hocko
CC: Kamezawa Hiroyuki
CC: Johannes Weiner
CC: Suleiman Souhlal
---
 include/linux/gfp.h | 3 +++
 mm/page_alloc.c | 38 ++
 2 files changed, 41 insertions(+)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index d8eae4d..029570f 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -370,6 +370,9 @@ extern void free_pages(unsigned long addr, unsigned int order);
 extern void free_hot_cold_page(struct page *page, int cold);
 extern void free_hot_cold_page_list(struct list_head *list, int cold);
 
+extern void __free_accounted_pages(struct page *page, unsigned int order);
+extern void free_accounted_pages(unsigned long addr, unsigned int order);
+
 #define __free_page(page) __free_pages((page), 0)
 #define free_page(addr) free_pages((addr), 0)
 
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index b956cec..da341dc 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2532,6 +2532,7 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
 	struct page *page = NULL;
 	int migratetype = allocflags_to_migratetype(gfp_mask);
 	unsigned int cpuset_mems_cookie;
+	void *handle = NULL;
 
 	gfp_mask &= gfp_allowed_mask;
 
@@ -2543,6 +2544,13 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
 		return NULL;
 
 	/*
+	 * Will only have any effect when __GFP_KMEMCG is set.
+	 * This is verified in the (always inline) callee
+	 */
+	if (!memcg_kmem_new_page(gfp_mask, &handle, order))
+		return NULL;
+
+	/*
 	 * Check the zones suitable for the gfp_mask contain at least one
 	 * valid zone. It's possible to have an empty zonelist as a result
 	 * of GFP_THISNODE and a memoryless node
@@ -2583,6 +2591,8 @@ out:
 	if (unlikely(!put_mems_allowed(cpuset_mems_cookie) && !page))
 		goto retry_cpuset;
 
+	memcg_kmem_commit_page(page, handle, order);
+
 	return page;
 }
 EXPORT_SYMBOL(__alloc_pages_nodemask);
@@ -2635,6 +2645,34 @@ void free_pages(unsigned long addr, unsigned int order)
 
 EXPORT_SYMBOL(free_pages);
 
+/*
+ * __free_accounted_pages and free_accounted_pages will free pages allocated
+ * with __GFP_KMEMCG.
+ *
+ * Those pages are accounted to a particular memcg, embedded in the
+ * corresponding page_cgroup. To avoid adding a hit in the allocator to search
+ * for that information only to find out that it is NULL for users who have no
+ * interest in that whatsoever, we provide these functions.
+ *
+ * The caller knows better which flags it relies on.
+ */
+void __free_accounted_pages(struct page *page, unsigned int order)
+{
+	memcg_kmem_free_page(page, order);
+	__free_pages(page, order);
+}
+EXPORT_SYMBOL(__free_accounted_pages);
+
+void free_accounted_pages(unsigned long addr, unsigned int order)
+{
+	if (addr != 0) {
+		VM_BUG_ON(!virt_addr_valid((void *)addr));
+		memcg_kmem_free_page(virt_to_page((void *)addr), order);
+		__free_pages(virt_to_page((void *)addr), order);
+	}
+}
+EXPORT_SYMBOL(free_accounted_pages);
+
 static void *make_alloc_exact(unsigned long addr, unsigned order, size_t size)
 {
 	if (addr) {
-- 
1.7.11.2