Re: regression caused by cgroups optimization in 3.17-rc2

2014-09-10 Thread Michal Hocko
On Wed 10-09-14 09:57:56, Dave Hansen wrote: > On 09/10/2014 09:29 AM, Michal Hocko wrote: > > I do not have a bigger machine to play with unfortunately. I think the > > patch makes sense on its own. I would really appreciate if you could > > give it a try on your machine with !root memcg case to

Re: regression caused by cgroups optimization in 3.17-rc2

2014-09-10 Thread Dave Hansen
On 09/10/2014 09:29 AM, Michal Hocko wrote: > I do not have a bigger machine to play with unfortunately. I think the > patch makes sense on its own. I would really appreciate if you could > give it a try on your machine with !root memcg case to see how much it > helped. I would expect similar

Re: regression caused by cgroups optimization in 3.17-rc2

2014-09-10 Thread Michal Hocko
On Fri 05-09-14 11:25:37, Michal Hocko wrote: > On Thu 04-09-14 13:27:26, Dave Hansen wrote: > > On 09/04/2014 07:27 AM, Michal Hocko wrote: > > > Ouch. free_pages_and_swap_cache completely kills the uncharge batching > > > because it reduces it to PAGEVEC_SIZE batches. > > > > > > I think we

Re: regression caused by cgroups optimization in 3.17-rc2

2014-09-10 Thread Michal Hocko
On Fri 05-09-14 11:25:37, Michal Hocko wrote: On Thu 04-09-14 13:27:26, Dave Hansen wrote: On 09/04/2014 07:27 AM, Michal Hocko wrote: Ouch. free_pages_and_swap_cache completely kills the uncharge batching because it reduces it to PAGEVEC_SIZE batches. I think we really do not need

Re: regression caused by cgroups optimization in 3.17-rc2

2014-09-10 Thread Dave Hansen
On 09/10/2014 09:29 AM, Michal Hocko wrote: I do not have a bigger machine to play with unfortunately. I think the patch makes sense on its own. I would really appreciate if you could give it a try on your machine with !root memcg case to see how much it helped. I would expect similar results

Re: regression caused by cgroups optimization in 3.17-rc2

2014-09-10 Thread Michal Hocko
On Wed 10-09-14 09:57:56, Dave Hansen wrote: On 09/10/2014 09:29 AM, Michal Hocko wrote: I do not have a bigger machine to play with unfortunately. I think the patch makes sense on its own. I would really appreciate if you could give it a try on your machine with !root memcg case to see how

Re: regression caused by cgroups optimization in 3.17-rc2

2014-09-09 Thread Dave Hansen
On 09/09/2014 07:50 AM, Johannes Weiner wrote: > The mctz->lock is only taken when there is, or has been, soft limit > excess. However, the soft limit defaults to infinity, so unless you > set it explicitly on the root level, I can't see how this could be > mctz->lock contention. > > It's more

Re: regression caused by cgroups optimization in 3.17-rc2

2014-09-09 Thread Johannes Weiner
On Mon, Sep 08, 2014 at 08:47:37AM -0700, Dave Hansen wrote: > On 09/05/2014 05:35 AM, Johannes Weiner wrote: > > On Thu, Sep 04, 2014 at 01:27:26PM -0700, Dave Hansen wrote: > >> On 09/04/2014 07:27 AM, Michal Hocko wrote: > >>> Ouch. free_pages_and_swap_cache completely kills the uncharge

Re: regression caused by cgroups optimization in 3.17-rc2

2014-09-09 Thread Johannes Weiner
On Mon, Sep 08, 2014 at 08:47:37AM -0700, Dave Hansen wrote: On 09/05/2014 05:35 AM, Johannes Weiner wrote: On Thu, Sep 04, 2014 at 01:27:26PM -0700, Dave Hansen wrote: On 09/04/2014 07:27 AM, Michal Hocko wrote: Ouch. free_pages_and_swap_cache completely kills the uncharge batching

Re: regression caused by cgroups optimization in 3.17-rc2

2014-09-09 Thread Dave Hansen
On 09/09/2014 07:50 AM, Johannes Weiner wrote: The mctz->lock is only taken when there is, or has been, soft limit excess. However, the soft limit defaults to infinity, so unless you set it explicitly on the root level, I can't see how this could be mctz->lock contention. It's more plausible

Re: regression caused by cgroups optimization in 3.17-rc2

2014-09-08 Thread Dave Hansen
On 09/05/2014 05:35 AM, Johannes Weiner wrote: > On Thu, Sep 04, 2014 at 01:27:26PM -0700, Dave Hansen wrote: >> On 09/04/2014 07:27 AM, Michal Hocko wrote: >>> Ouch. free_pages_and_swap_cache completely kills the uncharge batching >>> because it reduces it to PAGEVEC_SIZE batches. >>> >>> I think

Re: regression caused by cgroups optimization in 3.17-rc2

2014-09-08 Thread Dave Hansen
On 09/05/2014 05:35 AM, Johannes Weiner wrote: On Thu, Sep 04, 2014 at 01:27:26PM -0700, Dave Hansen wrote: On 09/04/2014 07:27 AM, Michal Hocko wrote: Ouch. free_pages_and_swap_cache completely kills the uncharge batching because it reduces it to PAGEVEC_SIZE batches. I think we really do

Re: regression caused by cgroups optimization in 3.17-rc2

2014-09-05 Thread Michal Hocko
On Fri 05-09-14 10:47:23, Johannes Weiner wrote: > On Fri, Sep 05, 2014 at 11:25:37AM +0200, Michal Hocko wrote: > > @@ -900,10 +900,10 @@ void lru_add_drain_all(void) > > * grabbed the page via the LRU. If it did, give up: > > shrink_inactive_list() > > * will free it. > > */ > > -void

Re: regression caused by cgroups optimization in 3.17-rc2

2014-09-05 Thread Johannes Weiner
On Fri, Sep 05, 2014 at 11:25:37AM +0200, Michal Hocko wrote: > @@ -900,10 +900,10 @@ void lru_add_drain_all(void) > * grabbed the page via the LRU. If it did, give up: shrink_inactive_list() > * will free it. > */ > -void release_pages(struct page **pages, int nr, bool cold) > +static void

Re: regression caused by cgroups optimization in 3.17-rc2

2014-09-05 Thread Johannes Weiner
On Thu, Sep 04, 2014 at 01:27:26PM -0700, Dave Hansen wrote: > On 09/04/2014 07:27 AM, Michal Hocko wrote: > > Ouch. free_pages_and_swap_cache completely kills the uncharge batching > > because it reduces it to PAGEVEC_SIZE batches. > > > > I think we really do not need PAGEVEC_SIZE batching

Re: regression caused by cgroups optimization in 3.17-rc2

2014-09-05 Thread Michal Hocko
On Thu 04-09-14 15:53:46, Dave Hansen wrote: > On 09/04/2014 01:27 PM, Dave Hansen wrote: > > On 09/04/2014 07:27 AM, Michal Hocko wrote: > >> Ouch. free_pages_and_swap_cache completely kills the uncharge batching > >> because it reduces it to PAGEVEC_SIZE batches. > >> > >> I think we really do

Re: regression caused by cgroups optimization in 3.17-rc2

2014-09-05 Thread Michal Hocko
On Thu 04-09-14 13:27:26, Dave Hansen wrote: > On 09/04/2014 07:27 AM, Michal Hocko wrote: > > Ouch. free_pages_and_swap_cache completely kills the uncharge batching > > because it reduces it to PAGEVEC_SIZE batches. > > > > I think we really do not need PAGEVEC_SIZE batching anymore. We are > >

Re: regression caused by cgroups optimization in 3.17-rc2

2014-09-05 Thread Michal Hocko
On Thu 04-09-14 11:08:46, Johannes Weiner wrote: [...] > From 6fa7599054868cd0df940d7b0973dd64f8acb0b5 Mon Sep 17 00:00:00 2001 > From: Johannes Weiner > Date: Thu, 4 Sep 2014 10:04:34 -0400 > Subject: [patch] mm: memcontrol: revert use of root_mem_cgroup res_counter > > Dave Hansen reports a

Re: regression caused by cgroups optimization in 3.17-rc2

2014-09-05 Thread Michal Hocko
On Thu 04-09-14 11:08:46, Johannes Weiner wrote: [...] From 6fa7599054868cd0df940d7b0973dd64f8acb0b5 Mon Sep 17 00:00:00 2001 From: Johannes Weiner han...@cmpxchg.org Date: Thu, 4 Sep 2014 10:04:34 -0400 Subject: [patch] mm: memcontrol: revert use of root_mem_cgroup res_counter Dave Hansen

Re: regression caused by cgroups optimization in 3.17-rc2

2014-09-05 Thread Michal Hocko
On Thu 04-09-14 13:27:26, Dave Hansen wrote: On 09/04/2014 07:27 AM, Michal Hocko wrote: Ouch. free_pages_and_swap_cache completely kills the uncharge batching because it reduces it to PAGEVEC_SIZE batches. I think we really do not need PAGEVEC_SIZE batching anymore. We are already

Re: regression caused by cgroups optimization in 3.17-rc2

2014-09-05 Thread Michal Hocko
On Thu 04-09-14 15:53:46, Dave Hansen wrote: On 09/04/2014 01:27 PM, Dave Hansen wrote: On 09/04/2014 07:27 AM, Michal Hocko wrote: Ouch. free_pages_and_swap_cache completely kills the uncharge batching because it reduces it to PAGEVEC_SIZE batches. I think we really do not need

Re: regression caused by cgroups optimization in 3.17-rc2

2014-09-05 Thread Johannes Weiner
On Thu, Sep 04, 2014 at 01:27:26PM -0700, Dave Hansen wrote: On 09/04/2014 07:27 AM, Michal Hocko wrote: Ouch. free_pages_and_swap_cache completely kills the uncharge batching because it reduces it to PAGEVEC_SIZE batches. I think we really do not need PAGEVEC_SIZE batching anymore. We

Re: regression caused by cgroups optimization in 3.17-rc2

2014-09-05 Thread Johannes Weiner
On Fri, Sep 05, 2014 at 11:25:37AM +0200, Michal Hocko wrote: @@ -900,10 +900,10 @@ void lru_add_drain_all(void) * grabbed the page via the LRU. If it did, give up: shrink_inactive_list() * will free it. */ -void release_pages(struct page **pages, int nr, bool cold) +static void

Re: regression caused by cgroups optimization in 3.17-rc2

2014-09-05 Thread Michal Hocko
On Fri 05-09-14 10:47:23, Johannes Weiner wrote: On Fri, Sep 05, 2014 at 11:25:37AM +0200, Michal Hocko wrote: @@ -900,10 +900,10 @@ void lru_add_drain_all(void) * grabbed the page via the LRU. If it did, give up: shrink_inactive_list() * will free it. */ -void

Re: regression caused by cgroups optimization in 3.17-rc2

2014-09-04 Thread Dave Hansen
On 09/04/2014 01:27 PM, Dave Hansen wrote: > On 09/04/2014 07:27 AM, Michal Hocko wrote: >> Ouch. free_pages_and_swap_cache completely kills the uncharge batching >> because it reduces it to PAGEVEC_SIZE batches. >> >> I think we really do not need PAGEVEC_SIZE batching anymore. We are >> already

Re: regression caused by cgroups optimization in 3.17-rc2

2014-09-04 Thread Dave Hansen
On 09/04/2014 08:08 AM, Johannes Weiner wrote: > Dave Hansen reports a massive scalability regression in an uncontained > page fault benchmark with more than 30 concurrent threads, which he > bisected down to 05b843012335 ("mm: memcontrol: use root_mem_cgroup > res_counter") and pin-pointed on

Re: regression caused by cgroups optimization in 3.17-rc2

2014-09-04 Thread Dave Hansen
On 09/04/2014 07:27 AM, Michal Hocko wrote: > Ouch. free_pages_and_swap_cache completely kills the uncharge batching > because it reduces it to PAGEVEC_SIZE batches. > > I think we really do not need PAGEVEC_SIZE batching anymore. We are > already batching on tlb_gather layer. That one is limited

Re: regression caused by cgroups optimization in 3.17-rc2

2014-09-04 Thread Johannes Weiner
On Tue, Sep 02, 2014 at 05:30:38PM -0700, Dave Hansen wrote: > On 09/02/2014 05:10 PM, Johannes Weiner wrote: > > On Tue, Sep 02, 2014 at 03:36:29PM -0700, Dave Hansen wrote: > >> On 09/02/2014 03:18 PM, Johannes Weiner wrote: > >>> Accounting new pages is buffered through per-cpu caches, but

Re: regression caused by cgroups optimization in 3.17-rc2

2014-09-04 Thread Michal Hocko
[Sorry to reply so late] On Tue 02-09-14 13:57:22, Dave Hansen wrote: > I, of course, forgot to include the most important detail. This appears > to be pretty run-of-the-mill spinlock contention in the resource counter > code. Nearly 80% of the CPU is spent spinning in the charge or uncharge >

Re: regression caused by cgroups optimization in 3.17-rc2

2014-09-04 Thread Dave Hansen
On 09/04/2014 07:27 AM, Michal Hocko wrote: Ouch. free_pages_and_swap_cache completely kills the uncharge batching because it reduces it to PAGEVEC_SIZE batches. I think we really do not need PAGEVEC_SIZE batching anymore. We are already batching on tlb_gather layer. That one is limited so I

Re: regression caused by cgroups optimization in 3.17-rc2

2014-09-04 Thread Dave Hansen
On 09/04/2014 08:08 AM, Johannes Weiner wrote: Dave Hansen reports a massive scalability regression in an uncontained page fault benchmark with more than 30 concurrent threads, which he bisected down to 05b843012335 ("mm: memcontrol: use root_mem_cgroup res_counter") and pin-pointed on

Re: regression caused by cgroups optimization in 3.17-rc2

2014-09-04 Thread Dave Hansen
On 09/04/2014 01:27 PM, Dave Hansen wrote: On 09/04/2014 07:27 AM, Michal Hocko wrote: Ouch. free_pages_and_swap_cache completely kills the uncharge batching because it reduces it to PAGEVEC_SIZE batches. I think we really do not need PAGEVEC_SIZE batching anymore. We are already batching on

Re: regression caused by cgroups optimization in 3.17-rc2

2014-09-04 Thread Michal Hocko
[Sorry to reply so late] On Tue 02-09-14 13:57:22, Dave Hansen wrote: I, of course, forgot to include the most important detail. This appears to be pretty run-of-the-mill spinlock contention in the resource counter code. Nearly 80% of the CPU is spent spinning in the charge or uncharge

Re: regression caused by cgroups optimization in 3.17-rc2

2014-09-04 Thread Johannes Weiner
On Tue, Sep 02, 2014 at 05:30:38PM -0700, Dave Hansen wrote: On 09/02/2014 05:10 PM, Johannes Weiner wrote: On Tue, Sep 02, 2014 at 03:36:29PM -0700, Dave Hansen wrote: On 09/02/2014 03:18 PM, Johannes Weiner wrote: Accounting new pages is buffered through per-cpu caches, but taking them

Re: regression caused by cgroups optimization in 3.17-rc2

2014-09-02 Thread Dave Hansen
On 09/02/2014 06:33 PM, Johannes Weiner wrote: > kfree isn't eating 56% of "all cpu time" here, and it wasn't clear to > me whether Dave filtered symbols from only memcontrol.o, memory.o, and > mmap.o in a similar way. I'm not arguing against the regression, I'm > just trying to make sense of the

Re: regression caused by cgroups optimization in 3.17-rc2

2014-09-02 Thread Johannes Weiner
On Tue, Sep 02, 2014 at 05:20:55PM -0700, Linus Torvalds wrote: > On Tue, Sep 2, 2014 at 5:10 PM, Johannes Weiner wrote: > > > > That looks like a partial profile, where did the page allocator, page > > zeroing etc. go? Because the distribution among these listed symbols > > doesn't seem all

Re: regression caused by cgroups optimization in 3.17-rc2

2014-09-02 Thread Dave Hansen
On 09/02/2014 05:10 PM, Johannes Weiner wrote: > On Tue, Sep 02, 2014 at 03:36:29PM -0700, Dave Hansen wrote: >> On 09/02/2014 03:18 PM, Johannes Weiner wrote: >>> Accounting new pages is buffered through per-cpu caches, but taking >>> them off the counters on free is not, so I'm guessing that

Re: regression caused by cgroups optimization in 3.17-rc2

2014-09-02 Thread Linus Torvalds
On Tue, Sep 2, 2014 at 5:10 PM, Johannes Weiner wrote: > > That looks like a partial profile, where did the page allocator, page > zeroing etc. go? Because the distribution among these listed symbols > doesn't seem all that crazy: Please argue this *after* the commit has been reverted. You guys

Re: regression caused by cgroups optimization in 3.17-rc2

2014-09-02 Thread Johannes Weiner
On Tue, Sep 02, 2014 at 03:36:29PM -0700, Dave Hansen wrote: > On 09/02/2014 03:18 PM, Johannes Weiner wrote: > > Accounting new pages is buffered through per-cpu caches, but taking > > them off the counters on free is not, so I'm guessing that above a > > certain allocation rate the cost of

Re: regression caused by cgroups optimization in 3.17-rc2

2014-09-02 Thread Dave Hansen
On 09/02/2014 03:18 PM, Johannes Weiner wrote: > Accounting new pages is buffered through per-cpu caches, but taking > them off the counters on free is not, so I'm guessing that above a > certain allocation rate the cost of locking and changing the counters > takes over. Is there a chance you

Re: regression caused by cgroups optimization in 3.17-rc2

2014-09-02 Thread Johannes Weiner
Hi Dave, On Tue, Sep 02, 2014 at 12:05:41PM -0700, Dave Hansen wrote: > I'm seeing a pretty large regression in 3.17-rc2 vs 3.16 coming from the > memory cgroups code. This is on a kernel with cgroups enabled at > compile time, but not _used_ for anything. See the green lines in the > graph: >

Re: regression caused by cgroups optimization in 3.17-rc2

2014-09-02 Thread Dave Hansen
I, of course, forgot to include the most important detail. This appears to be pretty run-of-the-mill spinlock contention in the resource counter code. Nearly 80% of the CPU is spent spinning in the charge or uncharge paths in the kernel. It is apparently spinning on res_counter->lock in both

Re: regression caused by cgroups optimization in 3.17-rc2

2014-09-02 Thread Dave Hansen
On 09/02/2014 12:05 PM, Dave Hansen wrote: > It does not revert cleanly because of the hunks below. The code in > those hunks was removed, so I tried running without properly merging > them and it spews warnings because counter->usage is seen going negative. > > So, it doesn't appear we can

regression caused by cgroups optimization in 3.17-rc2

2014-09-02 Thread Dave Hansen
I'm seeing a pretty large regression in 3.17-rc2 vs 3.16 coming from the memory cgroups code. This is on a kernel with cgroups enabled at compile time, but not _used_ for anything. See the green lines in the graph: https://www.sr71.net/~dave/intel/regression-from-05b843012.png The

Re: regression caused by cgroups optimization in 3.17-rc2

2014-09-02 Thread Dave Hansen
On 09/02/2014 12:05 PM, Dave Hansen wrote: It does not revert cleanly because of the hunks below. The code in those hunks was removed, so I tried running without properly merging them and it spews warnings because counter->usage is seen going negative. So, it doesn't appear we can quickly

Re: regression caused by cgroups optimization in 3.17-rc2

2014-09-02 Thread Dave Hansen
I, of course, forgot to include the most important detail. This appears to be pretty run-of-the-mill spinlock contention in the resource counter code. Nearly 80% of the CPU is spent spinning in the charge or uncharge paths in the kernel. It is apparently spinning on res_counter->lock in both the

Re: regression caused by cgroups optimization in 3.17-rc2

2014-09-02 Thread Johannes Weiner
Hi Dave, On Tue, Sep 02, 2014 at 12:05:41PM -0700, Dave Hansen wrote: I'm seeing a pretty large regression in 3.17-rc2 vs 3.16 coming from the memory cgroups code. This is on a kernel with cgroups enabled at compile time, but not _used_ for anything. See the green lines in the graph:

Re: regression caused by cgroups optimization in 3.17-rc2

2014-09-02 Thread Dave Hansen
On 09/02/2014 03:18 PM, Johannes Weiner wrote: Accounting new pages is buffered through per-cpu caches, but taking them off the counters on free is not, so I'm guessing that above a certain allocation rate the cost of locking and changing the counters takes over. Is there a chance you could

Re: regression caused by cgroups optimization in 3.17-rc2

2014-09-02 Thread Johannes Weiner
On Tue, Sep 02, 2014 at 03:36:29PM -0700, Dave Hansen wrote: On 09/02/2014 03:18 PM, Johannes Weiner wrote: Accounting new pages is buffered through per-cpu caches, but taking them off the counters on free is not, so I'm guessing that above a certain allocation rate the cost of locking and

Re: regression caused by cgroups optimization in 3.17-rc2

2014-09-02 Thread Linus Torvalds
On Tue, Sep 2, 2014 at 5:10 PM, Johannes Weiner han...@cmpxchg.org wrote: That looks like a partial profile, where did the page allocator, page zeroing etc. go? Because the distribution among these listed symbols doesn't seem all that crazy: Please argue this *after* the commit has been

Re: regression caused by cgroups optimization in 3.17-rc2

2014-09-02 Thread Dave Hansen
On 09/02/2014 05:10 PM, Johannes Weiner wrote: On Tue, Sep 02, 2014 at 03:36:29PM -0700, Dave Hansen wrote: On 09/02/2014 03:18 PM, Johannes Weiner wrote: Accounting new pages is buffered through per-cpu caches, but taking them off the counters on free is not, so I'm guessing that above a

Re: regression caused by cgroups optimization in 3.17-rc2

2014-09-02 Thread Johannes Weiner
On Tue, Sep 02, 2014 at 05:20:55PM -0700, Linus Torvalds wrote: On Tue, Sep 2, 2014 at 5:10 PM, Johannes Weiner han...@cmpxchg.org wrote: That looks like a partial profile, where did the page allocator, page zeroing etc. go? Because the distribution among these listed symbols doesn't

Re: regression caused by cgroups optimization in 3.17-rc2

2014-09-02 Thread Dave Hansen
On 09/02/2014 06:33 PM, Johannes Weiner wrote: kfree isn't eating 56% of "all cpu time" here, and it wasn't clear to me whether Dave filtered symbols from only memcontrol.o, memory.o, and mmap.o in a similar way. I'm not arguing against the regression, I'm just trying to make sense of the