Re: [RFC 0/1] add support for reclaiming priorities per mem cgroup

2017-03-30 Thread Shakeel Butt
> A more useful metric for memory pressure at this point is quantifying > that time you spend thrashing: time the job spends in direct reclaim > and on the flipside time the job waits for recently evicted pages to > come back. Combined, that gives you a good measure of overhead from > memory
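
The overhead measure described here can be written down as a small accumulator: time spent in direct reclaim plus time spent waiting for recently evicted pages to refault, taken as a share of wall-clock time. A minimal illustrative sketch in plain C; the struct and function names are hypothetical, not kernel code:

    #include <stdint.h>

    /* Hypothetical per-job stall counters, in nanoseconds. */
    struct mem_stall {
        uint64_t direct_reclaim_ns;  /* time spent in direct reclaim */
        uint64_t refault_wait_ns;    /* time waiting for recently evicted pages */
    };

    /* Share of a measurement window lost to memory pressure, in percent. */
    static uint64_t mem_stall_overhead_pct(const struct mem_stall *s, uint64_t window_ns)
    {
        if (!window_ns)
            return 0;
        return ((s->direct_reclaim_ns + s->refault_wait_ns) * 100) / window_ns;
    }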

Re: [PATCH 7/9] net: use kvmalloc with __GFP_REPEAT rather than open coded variant

2017-03-30 Thread Shakeel Butt
On Mon, Mar 6, 2017 at 2:33 AM, Michal Hocko wrote: > From: Michal Hocko > > fq_alloc_node, alloc_netdev_mqs and netif_alloc* open code kmalloc > with vmalloc fallback. Use the kvmalloc variant instead. Keep the > __GFP_REPEAT flag based on explanation from
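
The conversion discussed in this message replaces a hand-rolled kmalloc-then-vmalloc fallback with the kvmalloc helper. A hedged sketch of the pattern (function names illustrative, not the exact upstream diff):

    #include <linux/mm.h>
    #include <linux/slab.h>
    #include <linux/vmalloc.h>

    /* Before: open-coded fallback as in fq_alloc_node()/alloc_netdev_mqs(). */
    static void *big_alloc_open_coded(size_t sz, int node)
    {
        void *ptr = kmalloc_node(sz, GFP_KERNEL | __GFP_REPEAT | __GFP_NOWARN, node);

        if (!ptr)
            ptr = vmalloc_node(sz, node);
        return ptr;
    }

    /* After: the same intent in a single call. */
    static void *big_alloc_kvmalloc(size_t sz, int node)
    {
        return kvmalloc_node(sz, GFP_KERNEL | __GFP_REPEAT, node);
    }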

Re: [PATCH] mm/zswap: fix potential deadlock in zswap_frontswap_store()

2017-03-31 Thread Shakeel Butt
On Fri, Mar 31, 2017 at 8:30 AM, Andrey Ryabinin wrote: > zswap_frontswap_store() is called during memory reclaim from > __frontswap_store() from swap_writepage() from shrink_page_list(). > This may happen in NOFS context, thus zswap shouldn't use __GFP_FS, > otherwise we
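
The rule at issue: an allocation made from the reclaim path must not allow filesystem recursion, so __GFP_FS has to be left out of (or masked off) the gfp mask. A minimal sketch of that constraint, not the actual zswap fix:

    #include <linux/gfp.h>
    #include <linux/slab.h>

    /* Called from a reclaim context (e.g. reached via shrink_page_list()). */
    static void *reclaim_safe_alloc(struct kmem_cache *cache, gfp_t caller_gfp)
    {
        /* Strip __GFP_FS so the allocation cannot re-enter fs reclaim. */
        gfp_t gfp = (caller_gfp & ~__GFP_FS) | __GFP_NORETRY | __GFP_NOWARN;

        return kmem_cache_alloc(cache, gfp);
    }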

[PATCH v3] mm: fix condition for throttle_direct_reclaim

2017-03-14 Thread Shakeel Butt
dy to make forward progress. So, add a kswapd_failures check to the throttle_direct_reclaim condition. Signed-off-by: Shakeel Butt <shake...@google.com> Suggested-by: Michal Hocko <mho...@suse.com> Suggested-by: Johannes Weiner <han...@cmpxchg.org> Acked-by: Hillf Danton <hillf...@al

Re: [PATCH v2 RFC] mm/vmscan: more restrictive condition for retry in do_try_to_free_pages

2017-03-16 Thread Shakeel Butt
On Thu, Mar 16, 2017 at 12:57 PM, Johannes Weiner <han...@cmpxchg.org> wrote: > On Sat, Mar 11, 2017 at 09:52:15AM -0800, Shakeel Butt wrote: >> On Sat, Mar 11, 2017 at 5:51 AM, Yisheng Xie <ys...@foxmail.com> wrote: >> > @@ -2808,7 +2826,7 @@ static unsigned lo

Re: [PATCH v2 RFC] mm/vmscan: more restrictive condition for retry in do_try_to_free_pages

2017-03-11 Thread Shakeel Butt
o avoid this time costly and useless retrying, add a stub function > may_thrash and return true when memcg is disabled or on legacy > hierarchy. > > Signed-off-by: Yisheng Xie <xieyishe...@huawei.com> > Suggested-by: Shakeel Butt <shake...@google.com> > --- &

Re: [PATCH v3 RFC] mm/vmscan: more restrictive condition for retry of shrink_zones

2017-03-13 Thread Shakeel Butt
On Mon, Mar 13, 2017 at 1:33 AM, Michal Hocko wrote: > Please do not post new version after a single feedback and try to wait > for more review to accumulate. This is in the 3rd version and it is not > clear why it is still an RFC. > > On Sun 12-03-17 19:06:10, Yisheng Xie

Re: [PATCH] mm: fix condition for throttle_direct_reclaim

2017-03-13 Thread Shakeel Butt
On Mon, Mar 13, 2017 at 2:02 AM, Michal Hocko <mho...@kernel.org> wrote: > On Fri 10-03-17 11:46:20, Shakeel Butt wrote: >> Recently kswapd has been modified to give up after MAX_RECLAIM_RETRIES >> number of unsuccessful iterations. Before going to sleep, kswapd thread &g

Re: [PATCH] mm: fix condition for throttle_direct_reclaim

2017-03-13 Thread Shakeel Butt
On Mon, Mar 13, 2017 at 8:46 AM, Michal Hocko <mho...@kernel.org> wrote: > On Mon 13-03-17 08:07:15, Shakeel Butt wrote: >> On Mon, Mar 13, 2017 at 2:02 AM, Michal Hocko <mho...@kernel.org> wrote: >> > On Fri 10-03-17 11:46:20, Shakeel Butt wrote: >> >> Re

Re: [PATCH v3 RFC] mm/vmscan: more restrictive condition for retry of shrink_zones

2017-03-12 Thread Shakeel Butt
o avoid this time costly and useless retrying, add a stub function > mem_cgroup_thrashed() and return true when memcg is disabled or on > legacy hierarchy. > > Signed-off-by: Yisheng Xie <xieyishe...@huawei.com> > Suggested-by: Shakeel Butt <shake...@google.com> Thanks.

[PATCH] mm: fix condition for throttle_direct_reclaim

2017-03-10 Thread Shakeel Butt
-by: Shakeel Butt <shake...@google.com> --- mm/vmscan.c | 12 +--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/mm/vmscan.c b/mm/vmscan.c index bae698484e8e..b2d24cc7a161 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -2819,6 +2819,12 @@ static bool pfmemalloc_waterm

Re: [PATCH RFC] mm/vmscan: donot retry shrink zones when memcg is disabled

2017-03-10 Thread Shakeel Butt
On Fri, Mar 10, 2017 at 6:19 PM, Yisheng Xie wrote: > From: Yisheng Xie > > When we enter do_try_to_free_pages, the may_thrash is always clear, and > it will retry shrink zones to tap cgroup's reserves memory by setting > may_thrash when the former

[PATCH v2] mm: fix condition for throttle_direct_reclaim

2017-03-13 Thread Shakeel Butt
-by: Shakeel Butt <shake...@google.com> Suggested-by: Michal Hocko <mho...@suse.com> Suggested-by: Johannes Weiner <han...@cmpxchg.org> --- v2: Instead of separate helper function for checking kswapd_failures, added the check into pfmemalloc_watermark_ok() and renamed that funct

Re: [PATCH] mm: fix condition for throttle_direct_reclaim

2017-03-13 Thread Shakeel Butt
On Mon, Mar 13, 2017 at 12:58 PM, Johannes Weiner <han...@cmpxchg.org> wrote: > Hi Shakeel, > > On Fri, Mar 10, 2017 at 11:46:20AM -0800, Shakeel Butt wrote: >> Recently kswapd has been modified to give up after MAX_RECLAIM_RETRIES >> number of unsuccessful iterations. Be

Re: [PATCH 1/9] mm: fix 100% CPU kswapd busyloop on unreclaimable nodes

2017-03-02 Thread Shakeel Butt
On Tue, Feb 28, 2017 at 1:39 PM, Johannes Weiner wrote: > Jia He reports a problem with kswapd spinning at 100% CPU when > requesting more hugepages than memory available in the system: > > $ echo 4000 >/proc/sys/vm/nr_hugepages > > top - 13:42:59 up 3:37, 1 user, load

Re: [PATCH 2/9] mm: support __GFP_REPEAT in kvmalloc_node for >32kB

2017-04-06 Thread Shakeel Butt
On Mon, Mar 6, 2017 at 2:30 AM, Michal Hocko wrote: > From: Michal Hocko > > vhost code uses __GFP_REPEAT when allocating vhost_virtqueue resp. > vhost_vsock because it would really like to prefer kmalloc to the > vmalloc fallback - see 23cc5a991c7a

Re: [RFC PATCH] mm: fadvise: avoid fadvise for fs without backing device

2017-08-18 Thread Shakeel Butt
On Fri, Aug 18, 2017 at 2:34 PM, Andrew Morton <a...@linux-foundation.org> wrote: > On Thu, 17 Aug 2017 18:20:17 -0700 Shakeel Butt <shake...@google.com> wrote: > >> +linux-mm, linux-kernel >> >> On Thu, Aug 17, 2017 at 6:10 PM, Shakeel Butt <shake...@google

Re: [RFC PATCH] mm: fadvise: avoid fadvise for fs without backing device

2017-08-17 Thread Shakeel Butt
+linux-mm, linux-kernel On Thu, Aug 17, 2017 at 6:10 PM, Shakeel Butt <shake...@google.com> wrote: > The fadvise() manpage is silent on fadvise()'s effect on > memory-based filesystems (shmem, hugetlbfs & ramfs) and pseudo > file systems (procfs, sysfs, kernfs). The cu
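
For reference, the call a userspace library issues here looks like the sketch below; the patch proposes to bail out early for filesystems without a backing device (tmpfs, ramfs, etc.), where the hint has nothing useful to do. Illustrative userspace code, error handling kept minimal:

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/types.h>

    /* Hint that a byte range of the file is no longer needed. */
    static int drop_range(int fd, off_t offset, off_t len)
    {
        int err = posix_fadvise(fd, offset, len, POSIX_FADV_DONTNEED);

        if (err)
            fprintf(stderr, "posix_fadvise failed: %d\n", err);
        return err;
    }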

Re: [RFC PATCH] mm: fadvise: avoid fadvise for fs without backing device

2017-08-22 Thread Shakeel Butt
>> It doesn't sound like a risky change to me, although perhaps someone is >> depending on the current behaviour for obscure reasons, who knows. >> >> What are the reasons for this change? Is the current behaviour causing >> some sort of problem for someone? > > Yes, one of our generic library

Re: [PATCH] mm: respect the __GFP_NOWARN flag when warning about stalls

2017-09-13 Thread Shakeel Butt
> > We would have to consider (instead of jiffies) the time the process was > either running, or waiting on something that's related to memory > allocation/reclaim (page lock etc.). I.e. deduct the time the process > was runnable but there was no available cpu. I expect however that such > level of

Re: [PATCH] fs, mm: account filp and names caches to kmemcg

2017-10-06 Thread Shakeel Butt
>> names_cachep = kmem_cache_create("names_cache", PATH_MAX, 0, >> - SLAB_HWCACHE_ALIGN|SLAB_PANIC, NULL); >> + SLAB_HWCACHE_ALIGN|SLAB_PANIC|SLAB_ACCOUNT, NULL); > > I might be wrong but isn't name cache only holding temporary objects > used for path

Re: [v8 0/4] cgroup-aware OOM killer

2017-10-02 Thread Shakeel Butt
(Replying again as format of previous reply got messed up). On Mon, Oct 2, 2017 at 1:00 PM, Tim Hockin wrote: > In the example above: > >root >/\ > A D > / \ >B C > > Does oom_group allow me to express "compare A and D; if A is chosen

Re: [v8 0/4] cgroup-aware OOM killer

2017-10-02 Thread Shakeel Butt
On Mon, Oct 2, 2017 at 12:56 PM, Michal Hocko <mho...@kernel.org> wrote: > On Mon 02-10-17 12:45:18, Shakeel Butt wrote: >> > I am sorry to cut the rest of your proposal because it simply goes over >> > the scope of the proposed solution while the usecase you are mentio

[PATCH] epoll: account epitem and eppoll_entry to kmemcg

2017-10-02 Thread Shakeel Butt
the epoll references and causing a burst of eventpoll_epi and eventpoll_pwq slab allocations. This patch opt-in the charging of eventpoll_epi and eventpoll_pwq slabs. Signed-off-by: Shakeel Butt <shake...@google.com> --- fs/eventpoll.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
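
The opt-in itself is a one-flag change per slab cache: pass SLAB_ACCOUNT at creation so objects are charged to the allocating task's memcg. A generic sketch (cache name and object type are illustrative, not the eventpoll diff):

    #include <linux/errno.h>
    #include <linux/init.h>
    #include <linux/slab.h>

    struct example_item { int data; };

    static struct kmem_cache *example_cachep;

    static int __init example_cache_init(void)
    {
        /* SLAB_ACCOUNT makes allocations from this cache memcg-accounted. */
        example_cachep = kmem_cache_create("example_item",
                                           sizeof(struct example_item), 0,
                                           SLAB_HWCACHE_ALIGN | SLAB_ACCOUNT, NULL);
        return example_cachep ? 0 : -ENOMEM;
    }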

[PATCH] kvm, mm: account kvm related kmem slabs to kmemcg

2017-10-05 Thread Shakeel Butt
by user space applications which have access to kvm and thus a buggy application can leak such memory. So, these caches should be accounted to kmemcg. Signed-off-by: Shakeel Butt <shake...@google.com> --- arch/x86/kvm/mmu.c | 4 ++-- virt/kvm/kvm_main.c | 2 +- 2 files changed, 3 insertions

Re: [PATCH] kvm, mm: account kvm related kmem slabs to kmemcg

2017-10-06 Thread Shakeel Butt
On Thu, Oct 5, 2017 at 9:28 PM, Anshuman Khandual <khand...@linux.vnet.ibm.com> wrote: > On 10/06/2017 06:37 AM, Shakeel Butt wrote: >> The kvm slabs can consume a significant amount of system memory >> and indeed in our production environment we have observed that

Re: [PATCH] mm: memcontrol: use per-cpu stocks for socket memory uncharging

2017-09-07 Thread Shakeel Butt
On Thu, Sep 7, 2017 at 11:47 AM, Roman Gushchin <g...@fb.com> wrote: > On Thu, Sep 07, 2017 at 11:44:12AM -0700, Shakeel Butt wrote: >> >> As far as other types of pages go: page cache and anon are already >> >> batched pretty well, but I think kmem might bene

Re: [PATCH] mm: memcontrol: use per-cpu stocks for socket memory uncharging

2017-09-07 Thread Shakeel Butt
>> As far as other types of pages go: page cache and anon are already >> batched pretty well, but I think kmem might benefit from this >> too. Have you considered using the stock in memcg_kmem_uncharge()? > > Good idea! > I'll try to find an appropriate testcase and check if it really > brings any

Re: [v8 0/4] cgroup-aware OOM killer

2017-10-01 Thread Shakeel Butt
> > Going back to Michal's example, say the user configured the following: > >root > /\ > A D > / \ >B C > > A global OOM event happens and we find this: > - A > D > - B, C, D are oomgroups > > What the user is telling us is that B, C, and D are compound

Re: [v8 0/4] cgroup-aware OOM killer

2017-10-02 Thread Shakeel Butt
> Yes and nobody is disputing that, really. I guess the main disconnect > here is that different people want to have more detailed control over > the victim selection while the patchset tries to handle the most > simplistic scenario when a no userspace control over the selection is > required. And

Re: [v8 0/4] cgroup-aware OOM killer

2017-10-02 Thread Shakeel Butt
> I am sorry to cut the rest of your proposal because it simply goes over > the scope of the proposed solution while the usecase you are mentioning > is still possible. If we want to compare intermediate nodes (which seems > to be the case) then we can always provide a knob to opt-in - be it your

Re: [RFC PATCH] mm: fadvise: avoid fadvise for fs without backing device

2017-08-25 Thread Shakeel Butt
On Fri, Aug 25, 2017 at 2:49 PM, Andrew Morton <a...@linux-foundation.org> wrote: > On Thu, 17 Aug 2017 18:20:17 -0700 Shakeel Butt <shake...@google.com> wrote: > >> +linux-mm, linux-kernel >> >> On Thu, Aug 17, 2017 at 6:10 PM, Shakeel Butt <shake...@google

Re: [v7 5/5] mm, oom: cgroup v2 mount option to disable cgroup-aware OOM killer

2017-09-04 Thread Shakeel Butt
On Mon, Sep 4, 2017 at 7:21 AM, Roman Gushchin wrote: > Introducing of cgroup-aware OOM killer changes the victim selection > algorithm used by default: instead of picking the largest process, > it will pick the largest memcg and then the largest process inside. > > This affects only

Re: [PATCH] epoll: account epitem and eppoll_entry to kmemcg

2017-10-04 Thread Shakeel Butt
> > I am not objecting to the patch I would just like to understand the > runaway case. ep_insert seems to limit the maximum number of watches to > max_user_watches which should be ~4% of lowmem if I am following the > code properly. pwq_cache should be bound by the number of watches as > well, or

Re: [v10 3/6] mm, oom: cgroup-aware OOM killer

2017-10-04 Thread Shakeel Butt
> + > +static void select_victim_memcg(struct mem_cgroup *root, struct oom_control > *oc) > +{ > + struct mem_cgroup *iter; > + > + oc->chosen_memcg = NULL; > + oc->chosen_points = 0; > + > + /* > +* The oom_score is calculated for leaf memory cgroups (including >

Re: [v10 3/6] mm, oom: cgroup-aware OOM killer

2017-10-04 Thread Shakeel Butt
>> > + if (memcg_has_children(iter)) >> > + continue; >> >> && iter != root_mem_cgroup ? > > Oh, sure. I had a stupid bug in my test script, which prevented me from > catching this. Thanks! > > This should fix the problem. > -- > diff --git a/mm/memcontrol.c

[PATCH] fs, mm: account filp and names caches to kmemcg

2017-10-05 Thread Shakeel Butt
that a lot of machines spend very significant amount of memory on these caches. So, these caches should be accounted to kmemcg. Signed-off-by: Shakeel Butt <shake...@google.com> --- fs/dcache.c | 2 +- fs/file_table.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/fs/dcac

[PATCH v2] fs, mm: account filp cache to kmemcg

2017-10-11 Thread Shakeel Butt
not specify that. However the man page also discourages using _sysctl() at all. Signed-off-by: Shakeel Butt <shake...@google.com> --- Changelog since v1: - removed names_cache charging to kmemcg fs/file_table.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/file_t

Re: [PATCH] fs, mm: account filp and names caches to kmemcg

2017-10-10 Thread Shakeel Butt
On Sun, Oct 8, 2017 at 11:24 PM, Michal Hocko <mho...@kernel.org> wrote: > On Fri 06-10-17 12:33:03, Shakeel Butt wrote: >> >> names_cachep = kmem_cache_create("names_cache", PATH_MAX, 0, >> >> -

[PATCH] mm: mlock: remove lru_add_drain_all()

2017-10-18 Thread Shakeel Butt
(). Also there is no need for local lru_add_drain() as it will be called deep inside __mm_populate() (in follow_page_pte()). Signed-off-by: Shakeel Butt <shake...@google.com> --- mm/mlock.c | 5 - 1 file changed, 5 deletions(-) diff --git a/mm/mlock.c b/mm/mlock.c index dfc6f1

Re: [PATCH] mm: mlock: remove lru_add_drain_all()

2017-10-19 Thread Shakeel Butt
On Wed, Oct 18, 2017 at 8:18 PM, Balbir Singh <bsinghar...@gmail.com> wrote: > On Wed, 18 Oct 2017 16:17:30 -0700 > Shakeel Butt <shake...@google.com> wrote: > >> Recently we have observed high latency in mlock() in our generic >> library and noticed that users ha

Re: [PATCH] mm: mlock: remove lru_add_drain_all()

2017-10-19 Thread Shakeel Butt
> [...] >> >> Sorry for the confusion. I wanted to say that if the pages which are >> being mlocked are on caches of remote cpus then lru_add_drain_all will >> move them to their corresponding LRUs and then remaining functionality >> of mlock will move them again from their evictable LRUs to

Re: [PATCH] mm: mlock: remove lru_add_drain_all()

2017-10-19 Thread Shakeel Butt
On Thu, Oct 19, 2017 at 5:32 AM, Michal Hocko <mho...@kernel.org> wrote: > On Wed 18-10-17 16:17:30, Shakeel Butt wrote: >> Recently we have observed high latency in mlock() in our generic >> library and noticed that users have started using tmpfs files even >> without s

Re: [PATCH] mm: mlock: remove lru_add_drain_all()

2017-10-19 Thread Shakeel Butt
On Wed, Oct 18, 2017 at 11:24 PM, Anshuman Khandual <khand...@linux.vnet.ibm.com> wrote: > On 10/19/2017 04:47 AM, Shakeel Butt wrote: >> Recently we have observed high latency in mlock() in our generic >> library and noticed that users have started using tmpfs files

Re: [PATCH] mm: mlock: remove lru_add_drain_all()

2017-10-19 Thread Shakeel Butt
On Thu, Oct 19, 2017 at 3:18 AM, Kirill A. Shutemov <kir...@shutemov.name> wrote: > On Wed, Oct 18, 2017 at 04:17:30PM -0700, Shakeel Butt wrote: >> Recently we have observed high latency in mlock() in our generic >> library and noticed that users have started using tmpfs

Re: [PATCH] mm: mlock: remove lru_add_drain_all()

2017-10-19 Thread Shakeel Butt
On Thu, Oct 19, 2017 at 1:13 PM, Michal Hocko <mho...@kernel.org> wrote: > On Thu 19-10-17 12:46:50, Shakeel Butt wrote: >> > [...] >> >> >> >> Sorry for the confusion. I wanted to say that if the pages which are >> >> being mlock

Re: [PATCH 1/2] mm,vmscan: Kill global shrinker lock.

2017-11-14 Thread Shakeel Butt
On Tue, Nov 14, 2017 at 4:56 PM, Minchan Kim wrote: > On Tue, Nov 14, 2017 at 06:37:42AM +0900, Tetsuo Handa wrote: >> When shrinker_rwsem was introduced, it was assumed that >> register_shrinker()/unregister_shrinker() are really unlikely paths >> which are called during

Re: [PATCH 1/2] mm,vmscan: Kill global shrinker lock.

2017-11-15 Thread Shakeel Butt
On Wed, Nov 15, 2017 at 4:46 PM, Minchan Kim <minc...@kernel.org> wrote: > On Tue, Nov 14, 2017 at 10:28:10PM -0800, Shakeel Butt wrote: >> On Tue, Nov 14, 2017 at 4:56 PM, Minchan Kim <minc...@kernel.org> wrote: >> > On Tue, Nov 14, 2017 at 06:37:42AM +0900,

Re: [PATCH] mm, mlock, vmscan: no more skipping pagevecs

2017-11-15 Thread Shakeel Butt
Ping, really appreciate comments on this patch. On Sat, Nov 4, 2017 at 3:43 PM, Shakeel Butt <shake...@google.com> wrote: > When a thread mlocks an address space backed by file, a new > page is allocated (assuming file page is not in memory), added > to the local pagevec (lru

[PATCH] mm, memcg: fix mem_cgroup_swapout() for THPs

2017-11-28 Thread Shakeel Butt
oups whose THPs were swapped out to become zombies on deletion. Fixes: d6810d730022 ("memcg, THP, swap: make mem_cgroup_swapout() support THP") Signed-off-by: Shakeel Butt <shake...@google.com> Cc: sta...@vger.kernel.org --- mm/memcontrol.c | 2 +- 1 file changed, 1 insertion(+),

Re: [PATCH] mm, memcg: fix mem_cgroup_swapout() for THPs

2017-11-28 Thread Shakeel Butt
On Tue, Nov 28, 2017 at 12:00 PM, Michal Hocko <mho...@kernel.org> wrote: > On Tue 28-11-17 08:19:41, Shakeel Butt wrote: >> The commit d6810d730022 ("memcg, THP, swap: make mem_cgroup_swapout() >> support THP") changed mem_cgroup_swapout() to support transpar

Re: XArray documentation

2017-11-24 Thread Shakeel Butt
On Fri, Nov 24, 2017 at 10:01 AM, Martin Steigerwald wrote: > Hi Matthew. > > Matthew Wilcox - 24.11.17, 18:03: >> On Fri, Nov 24, 2017 at 05:50:41PM +0100, Martin Steigerwald wrote: >> > Matthew Wilcox - 24.11.17, 02:16: >> > > == >> > > XArray >> > > == >> > > >> >

Re: [PATCH] mm: Make count list_lru_one::nr_items lockless

2017-11-29 Thread Shakeel Butt
On Fri, Sep 29, 2017 at 1:15 AM, Kirill Tkhai wrote: > On 29.09.2017 00:02, Andrew Morton wrote: >> On Thu, 28 Sep 2017 10:48:55 +0300 Kirill Tkhai wrote: >> > This patch aims to make super_cache_count() (and other functions, > which count LRU

Re: [RFC PATCH] mm: memcontrol: memory+swap accounting for cgroup-v2

2017-12-19 Thread Shakeel Butt
On Tue, Dec 19, 2017 at 4:49 AM, Michal Hocko <mho...@kernel.org> wrote: > On Mon 18-12-17 16:01:31, Shakeel Butt wrote: >> The memory controller in cgroup v1 provides the memory+swap (memsw) >> interface to account to the combined usage of memory and swap of the >>

Re: [RFC PATCH] mm: memcontrol: memory+swap accounting for cgroup-v2

2017-12-19 Thread Shakeel Butt
On Tue, Dec 19, 2017 at 7:24 AM, Tejun Heo <t...@kernel.org> wrote: > Hello, > > On Tue, Dec 19, 2017 at 07:12:19AM -0800, Shakeel Butt wrote: >> Yes, there are pros & cons, therefore we should give users the option >> to select the API that is better suited for thei

Re: [RFC PATCH] mm: memcontrol: memory+swap accounting for cgroup-v2

2017-12-19 Thread Shakeel Butt
On Tue, Dec 19, 2017 at 1:41 PM, Tejun Heo <t...@kernel.org> wrote: > Hello, > > On Tue, Dec 19, 2017 at 10:25:12AM -0800, Shakeel Butt wrote: >> Making the runtime environment, an invariant is very critical to make >> the management of a job easier whose instances r

Re: [RFC PATCH] mm: memcontrol: memory+swap accounting for cgroup-v2

2017-12-19 Thread Shakeel Butt
On Tue, Dec 19, 2017 at 9:33 AM, Tejun Heo <t...@kernel.org> wrote: > Hello, > > On Tue, Dec 19, 2017 at 09:23:29AM -0800, Shakeel Butt wrote: >> To provide consistent memory usage history using the current >> cgroup-v2's 'swap' interface, an additional metric exp

[RFC PATCH] mm: memcontrol: memory+swap accounting for cgroup-v2

2017-12-18 Thread Shakeel Butt
if there are no descendants of the root cgroup. When memsw accounting is enabled, then "memory.high" is compared with memory+swap usage. So, when the allocating job's memsw usage hits its high mark, the job will be throttled by triggering memory reclaim. Signed-off-by: Shakeel Butt <shake.

Re: [PATCH] mm/shmem: set default tmpfs size according to memcg limit

2017-11-17 Thread Shakeel Butt
On Fri, Nov 17, 2017 at 9:41 AM, Yafang Shao <laoar.s...@gmail.com> wrote: > 2017-11-18 1:35 GMT+08:00 Shakeel Butt <shake...@google.com>: >> On Fri, Nov 17, 2017 at 9:09 AM, Yafang Shao <laoar.s...@gmail.com> wrote: >>> 2017-11-18 0:45 GMT+08:00 Roman Gushchi

Re: [PATCH] mm/shmem: set default tmpfs size according to memcg limit

2017-11-17 Thread Shakeel Butt
>> > On Thu, Nov 16, 2017 at 08:43:17PM -0800, Shakeel Butt wrote: >>> >> On Thu, Nov 16, 2017 at 7:09 PM, Yafang Shao <laoar.s...@gmail.com> >>> >> wrote: >>> >> > Currently the default tmpfs size is totalram_pages / 2 if mount tmpf

Re: [PATCH 1/2] mm,vmscan: Kill global shrinker lock.

2017-11-17 Thread Shakeel Butt
On Fri, Nov 17, 2017 at 9:41 AM, Shakeel Butt <shake...@google.com> wrote: > On Fri, Nov 17, 2017 at 9:35 AM, Christoph Hellwig <h...@infradead.org> wrote: >> On Tue, Nov 14, 2017 at 06:37:42AM +0900, Tetsuo Handa wrote: >>> Since do_shrink_slab() can reschedule, w

Re: [PATCH 1/2] mm,vmscan: Kill global shrinker lock.

2017-11-17 Thread Shakeel Butt
On Fri, Nov 17, 2017 at 9:35 AM, Christoph Hellwig wrote: > On Tue, Nov 14, 2017 at 06:37:42AM +0900, Tetsuo Handa wrote: >> Since do_shrink_slab() can reschedule, we cannot protect shrinker_list >> using one RCU section. But using atomic_inc()/atomic_dec() for each >>

Re: [PATCH v2] mm, shrinker: make shrinker_list lockless

2017-11-10 Thread Shakeel Butt
On Thu, Nov 9, 2017 at 1:46 PM, Tetsuo Handa <penguin-ker...@i-love.sakura.ne.jp> wrote: > Shakeel Butt wrote: >> > If you can accept serialized register_shrinker()/unregister_shrinker(), >> > I think that something like shown below can do it. >> >

Re: [PATCH] mm, mlock, vmscan: no more skipping pagevecs

2017-11-21 Thread Shakeel Butt
On Tue, Nov 21, 2017 at 7:32 AM, Johannes Weiner <han...@cmpxchg.org> wrote: > On Sat, Nov 04, 2017 at 03:43:12PM -0700, Shakeel Butt wrote: >> When a thread mlocks an address space backed by file, a new >> page is allocated (assuming file page is not in memory), added &g

Re: [PATCH] mm, mlock, vmscan: no more skipping pagevecs

2017-11-21 Thread Shakeel Butt
On Tue, Nov 21, 2017 at 7:06 AM, Johannes Weiner <han...@cmpxchg.org> wrote: > On Tue, Nov 21, 2017 at 01:39:57PM +0100, Vlastimil Babka wrote: >> On 11/04/2017 11:43 PM, Shakeel Butt wrote: >> > When a thread mlocks an address space backed by file, a new >> > page

Re: [PATCH] mm, mlock, vmscan: no more skipping pagevecs

2017-11-21 Thread Shakeel Butt
On Tue, Nov 21, 2017 at 7:06 AM, Johannes Weiner <han...@cmpxchg.org> wrote: > On Tue, Nov 21, 2017 at 01:39:57PM +0100, Vlastimil Babka wrote: >> On 11/04/2017 11:43 PM, Shakeel Butt wrote: >> > When a thread mlocks an address space backed by file, a new >> > page

[PATCH v2] mm, mlock, vmscan: no more skipping pagevecs

2017-11-21 Thread Shakeel Butt
out this patch, the pages allocated for System V shared memory segment are added to evictable LRUs even after shmctl(SHM_LOCK) on that segment. This patch will correctly put such pages to unevictable LRU. Signed-off-by: Shakeel Butt <shake...@google.com> Acked-by: Vlastimil Babka <vb
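
The System V case mentioned above, seen from userspace: pages faulted in after shmctl(SHM_LOCK) should end up on the unevictable LRU. A small hedged repro sketch (error handling omitted):

    #include <string.h>
    #include <sys/ipc.h>
    #include <sys/shm.h>

    static void *locked_segment(size_t size)
    {
        int id = shmget(IPC_PRIVATE, size, IPC_CREAT | 0600);
        void *addr = shmat(id, NULL, 0);

        shmctl(id, SHM_LOCK, NULL);   /* lock the whole segment in memory */
        memset(addr, 0, size);        /* fault the pages in *after* SHM_LOCK */
        return addr;
    }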

Re: [PATCH] mm/shmem: set default tmpfs size according to memcg limit

2017-11-16 Thread Shakeel Butt
On Thu, Nov 16, 2017 at 7:09 PM, Yafang Shao wrote: > Currently the default tmpfs size is totalram_pages / 2 if mount tmpfs > without "-o size=XXX". > When we mount tmpfs in a container(i.e. docker), it is also > totalram_pages / 2 regardless of the memory limit on this

Re: [PATCH 1/2] mm,vmscan: Kill global shrinker lock.

2017-11-13 Thread Shakeel Butt
expect that > do_shrink_slab() of unregistering shrinker likely returns shortly, and > we can avoid khungtaskd warnings when do_shrink_slab() of unregistering > shrinker unexpectedly took so long. > > Signed-off-by: Tetsuo Handa <penguin-ker...@i-love.sakura.ne.jp> Reviewed-and-

[PATCH] vfs: remove might_sleep() from clear_inode()

2017-11-07 Thread Shakeel Butt
After these patches there is no sleeping operation in clear_inode(). So, remove might_sleep() from it. Signed-off-by: Shakeel Butt <shake...@google.com> --- fs/inode.c | 1 - 1 file changed, 1 deletion(-) diff --git a/fs/inode.c b/fs/inode.c index d1e35b53bb23..528f3159b928 100644 --- a/fs/inode.c +++ b/f

Re: [PATCH] mm, shrinker: make shrinker_list lockless

2017-11-07 Thread Shakeel Butt
> if (next_deferred >= scanned) > @@ -468,18 +487,9 @@ static unsigned long shrink_slab(gfp_t gfp_mask, int nid, > if (nr_scanned == 0) > nr_scanned = SWAP_CLUSTER_MAX; > > - if (!down_read_trylock(&shrinker_rwsem)) { > - /* > -* If we

[PATCH] mm, shrinker: make shrinker_list lockless

2017-11-07 Thread Shakeel Butt
been introduced to avoid synchronize_rcu() call. The fields of struct shrinker have been rearranged to make sure that the size does not increase. Signed-off-by: Shakeel Butt <shake...@google.com> --- include/linux/shrinker.h | 4 +++- mm/vmscan.c

Re: [PATCH v2] mm, shrinker: make shrinker_list lockless

2017-11-09 Thread Shakeel Butt
> > If you can accept serialized register_shrinker()/unregister_shrinker(), > I think that something like shown below can do it. > Thanks. > -- > diff --git a/include/linux/shrinker.h b/include/linux/shrinker.h > index 388ff29..e2272dd 100644 > --- a/include/linux/shrinker.h > +++

[PATCH v2] mm, shrinker: make shrinker_list lockless

2017-11-08 Thread Shakeel Butt
lock but then ifdefs have to be used as SRCU is behind CONFIG_SRCU. Another way is to just release the rcu read lock before calling the shrinker and reacquire on the return. The atomic counter will make sure that the shrinker entry will not be freed under us. Signed-off-by: Shakeel Butt <sh
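
A sketch of the scheme described in this message: walk shrinker_list under rcu_read_lock(), pin the current entry with an atomic count, drop the RCU lock around the (possibly sleeping) shrinker call, then resume from the pinned entry. The 'ref' field and the control flow are hypothetical, the idea rather than the posted patch:

    struct shrinker *shrinker;

    rcu_read_lock();
    list_for_each_entry_rcu(shrinker, &shrinker_list, list) {
        /* Skip entries that are already being unregistered. */
        if (!atomic_inc_not_zero(&shrinker->ref))
            continue;
        rcu_read_unlock();

        /* ... call do_shrink_slab() here; it may reschedule ... */

        rcu_read_lock();
        /* The held reference kept 'shrinker' alive, so the walk can
         * safely resume from it. */
        atomic_dec(&shrinker->ref);
    }
    rcu_read_unlock();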

Re: [PATCH v2] mm, shrinker: make shrinker_list lockless

2017-11-08 Thread Shakeel Butt
On Wed, Nov 8, 2017 at 4:07 PM, Minchan Kim <minc...@kernel.org> wrote: > Hi, > > On Wed, Nov 08, 2017 at 09:37:40AM -0800, Shakeel Butt wrote: >> In our production, we have observed that the job loader gets stuck for >> 10s of seconds while doing mount operation. It tur

[PATCH] mm, mlock, vmscan: no more skipping pagevecs

2017-11-04 Thread Shakeel Butt
. Without this patch, the pages allocated for System V shared memory segment are added to evictable LRUs even after shmctl(SHM_LOCK) on that segment. This patch will correctly put such pages to unevictable LRU. Signed-off-by: Shakeel Butt <shake...@google.com> --- include/linux/swap.h | 2 -- mm/

Re: [RESEND v12 3/6] mm, oom: cgroup-aware OOM killer

2017-10-31 Thread Shakeel Butt
On Tue, Oct 31, 2017 at 9:40 AM, Johannes Weiner <han...@cmpxchg.org> wrote: > On Tue, Oct 31, 2017 at 08:04:19AM -0700, Shakeel Butt wrote: >> > + >> > +static void select_victim_memcg(struct mem_cgroup *root, struct >> > oom_control *oc) >>

Re: [RESEND v12 3/6] mm, oom: cgroup-aware OOM killer

2017-10-31 Thread Shakeel Butt
> + > +static void select_victim_memcg(struct mem_cgroup *root, struct oom_control > *oc) > +{ > + struct mem_cgroup *iter; > + > + oc->chosen_memcg = NULL; > + oc->chosen_points = 0; > + > + /* > +* The oom_score is calculated for leaf memory cgroups (including >

Re: [PATCH] fs, mm: account filp and names caches to kmemcg

2017-10-30 Thread Shakeel Butt
On Mon, Oct 30, 2017 at 1:29 AM, Michal Hocko <mho...@kernel.org> wrote: > On Fri 27-10-17 13:50:47, Shakeel Butt wrote: >> > Why is OOM-disabling a thing? Why isn't this simply a "kill everything >> > else before you kill me"? It's crashing the kernel

Re: [PATCH 1/2] mm/memcg: try harder to decrease [memory,memsw].limit_in_bytes

2017-12-20 Thread Shakeel Butt
On Wed, Dec 20, 2017 at 3:34 AM, Michal Hocko wrote: > On Wed 20-12-17 14:32:19, Andrey Ryabinin wrote: >> On 12/20/2017 01:33 PM, Michal Hocko wrote: >> > On Wed 20-12-17 13:24:28, Andrey Ryabinin wrote: >> >> mem_cgroup_resize_[memsw]_limit() tries to free only 32

[PATCH] mm: memcontrol: drain stocks on resize limit

2018-05-04 Thread Shakeel Butt
Resizing the memcg limit for cgroup-v2 drains the stocks before triggering the memcg reclaim. Do the same for cgroup-v1 to make the behavior consistent. Signed-off-by: Shakeel Butt <shake...@google.com> --- mm/memcontrol.c | 7 +++ 1 file changed, 7 insertions(+) diff --gi
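
What "drains the stocks" means here, as a fragment: flush the per-cpu cached charges back to the page_counter before asking reclaim whether the usage fits under the new limit. A simplified sketch, not the exact mem_cgroup_resize_limit() diff:

    /* Inside the resize retry loop (simplified): */
    drain_all_stock(memcg);        /* return per-cpu cached charges to the counter */
    if (!try_to_free_mem_cgroup_pages(memcg, 1, GFP_KERNEL, true))
        ret = -EBUSY;              /* could not shrink usage below the new limit */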

[PATCH] mm: memcontrol: drain memcg stock on force_empty

2018-05-07 Thread Shakeel Butt
Junaid Shahid <juan...@google.com> Signed-off-by: Shakeel Butt <shake...@google.com> --- mm/memcontrol.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/mm/memcontrol.c b/mm/memcontrol.c index e2d33a37f971..2c3c69524b49 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -2841,6 +28

Re: [PATCH] mm: memcontrol: drain memcg stock on force_empty

2018-05-07 Thread Shakeel Butt
On Mon, May 7, 2018 at 1:16 PM Shakeel Butt <shake...@google.com> wrote: > From: Junaid Shahid <juna...@google.com> > The per-cpu memcg stock can retain a charge of up to 32 pages. On a > machine with a large number of cpus, this can amount to a decent amount > of memory.

Re: [PATCH v4 01/13] mm: Assign id to every memcg-aware shrinker

2018-05-09 Thread Shakeel Butt
On Wed, May 9, 2018 at 3:55 PM Andrew Morton wrote: > On Wed, 09 May 2018 14:56:55 +0300 Kirill Tkhai wrote: > > The patch introduces shrinker::id number, which is used to enumerate > > memcg-aware shrinkers. The number start from 0, and the

[PATCH] mm: fix race between kmem_cache destroy, create and deactivate

2018-05-21 Thread Shakeel Butt
. This patch make alias count explicit and adds reference counting to the root kmem_caches. The reference of a root kmem cache is elevated on merge and while its memcg kmem_cache is in the process of creation or deactivation. Signed-off-by: Shakeel Butt <shake...@google.com> --- include

Re: [PATCH] mm: fix race between kmem_cache destroy, create and deactivate

2018-05-21 Thread Shakeel Butt
On Mon, May 21, 2018 at 11:42 AM Andrew Morton <a...@linux-foundation.org> wrote: > On Mon, 21 May 2018 10:41:16 -0700 Shakeel Butt <shake...@google.com> wrote: > > The memcg kmem cache creation and deactivation (SLUB only) is > > asynchronous. If a root kmem cache is

[PATCH v2] mm: fix race between kmem_cache destroy, create and deactivate

2018-05-22 Thread Shakeel Butt
kmem_cache is not destroyed in the middle. As the reference of kmem_cache is elevated on sharing, the 'shared_count' does not need any locking protection as at worst it can be out-dated for a small window which is tolerable. Signed-off-by: Shakeel Butt <shake...@google.com> --- Changelog si

[PATCH] memcg: force charge kmem counter too

2018-05-25 Thread Shakeel Butt
m counter is set and reached. Signed-off-by: Shakeel Butt <shake...@google.com> --- mm/memcontrol.c | 21 +++-- 1 file changed, 19 insertions(+), 2 deletions(-) diff --git a/mm/memcontrol.c b/mm/memcontrol.c index ab5673dbfc4e..0a88f824c550 100644 --- a/mm/memcontrol.c +++ b/
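
The distinction at the heart of this patch, sketched with the page_counter API: the try variant respects the limit, while a "force" charge pushes the counter past it so the task can make forward progress. A fragment, not the posted diff:

    struct page_counter *counter;

    if (!page_counter_try_charge(&memcg->kmem, nr_pages, &counter))
        page_counter_charge(&memcg->kmem, nr_pages);  /* force: may exceed the limit */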

Re: [PATCH] doc: document scope NOFS, NOIO APIs

2018-05-24 Thread Shakeel Butt
On Thu, May 24, 2018 at 4:43 AM, Michal Hocko wrote: > From: Michal Hocko > > Although the api is documented in the source code Ted has pointed out > that there is no mention in the core-api Documentation and there are > people looking there to find answers

Re: [PATCH] mm: save two stranding bit in gfp_mask

2018-05-16 Thread Shakeel Butt
On Wed, May 16, 2018 at 1:41 PM Vlastimil Babka <vba...@suse.cz> wrote: > On 05/16/2018 10:20 PM, Shakeel Butt wrote: > > ___GFP_COLD and ___GFP_OTHER_NODE were removed but their bits were > > stranded. Slide existing gfp masks to make those two bits available. > Well, the

[PATCH] mm: save two stranding bit in gfp_mask

2018-05-16 Thread Shakeel Butt
___GFP_COLD and ___GFP_OTHER_NODE were removed but their bits were stranded. Slide existing gfp masks to make those two bits available. Signed-off-by: Shakeel Butt <shake...@google.com> --- include/linux/gfp.h | 42 +- 1 file changed, 21 insertions(

[PATCH v2] mm: save two stranding bit in gfp_mask

2018-05-16 Thread Shakeel Butt
___GFP_COLD and ___GFP_OTHER_NODE were removed but their bits were stranded. Fill the gaps by moving the existing gfp masks around. Signed-off-by: Shakeel Butt <shake...@google.com> Suggested-by: Vlastimil Babka <vba...@suse.cz> Acked-by: Michal Hocko <mho...@suse.com> ---

Re: [PATCH v3] mm: fix race between kmem_cache destroy, create and deactivate

2018-06-10 Thread Shakeel Butt
On Sun, Jun 10, 2018 at 9:32 AM Paul E. McKenney wrote: > > On Sun, Jun 10, 2018 at 07:52:50AM -0700, Shakeel Butt wrote: > > On Sat, Jun 9, 2018 at 3:20 AM Vladimir Davydov > > wrote: > > > > > > On Tue, May 29, 2018 at 05:12:04PM -0700, Shakeel Butt

Re: [PATCH v3] mm: fix race between kmem_cache destroy, create and deactivate

2018-06-10 Thread Shakeel Butt
On Sat, Jun 9, 2018 at 3:20 AM Vladimir Davydov wrote: > > On Tue, May 29, 2018 at 05:12:04PM -0700, Shakeel Butt wrote: > > The memcg kmem cache creation and deactivation (SLUB only) is > > asynchronous. If a root kmem cache is destroyed whose memcg cache is in > >

Re: [PATCH] mm: fix null pointer dereference in mem_cgroup_protected

2018-06-08 Thread Shakeel Butt
happens because parent_mem_cgroup() returns a NULL > pointer, which is dereferenced later without a check. > > As cgroup v1 has no memory guarantee support, let's make > mem_cgroup_protected() immediately return MEMCG_PROT_NONE, > if the given cgroup has no parent (non-hierarchi

Re: [PATCH v7 15/17] mm: Generalize shrink_slab() calls in shrink_node()

2018-06-08 Thread Shakeel Butt
On Tue, May 22, 2018 at 3:09 AM Kirill Tkhai wrote: > > From: Vladimir Davydov > > The patch makes shrink_slab() be called for root_mem_cgroup > in the same way as it's called for the rest of cgroups. > This simplifies the logic and improves the readability. > > Signed-off-by: Vladimir Davydov

Re: [PATCH 6/6] Convert intel uncore to struct_size

2018-06-07 Thread Shakeel Butt
On Thu, Jun 7, 2018 at 10:30 AM Ralph Campbell wrote: > > > > On 06/07/2018 07:57 AM, Matthew Wilcox wrote: > > From: Matthew Wilcox > > > > Need to do a bit of rearranging to make this work. > > > > Signed-off-by: Matthew Wilcox > > --- > > arch/x86/events/intel/uncore.c | 19

[PATCH v4] mm: fix race between kmem_cache destroy, create and deactivate

2018-06-11 Thread Shakeel Butt
includes RCU callback and thus make sure all previous registered RCU callbacks have completed as well. Signed-off-by: Shakeel Butt --- Changelog since v3: - Handle the RCU callbacks for SLUB deactivation Changelog since v2: - Rewrote the patch and used workqueue flushing instead of refcount

Re: Possible regression in "slab, slub: skip unnecessary kasan_cache_shutdown()"

2018-06-19 Thread Shakeel Butt
On Tue, Jun 19, 2018 at 8:19 AM Jason A. Donenfeld wrote: > > On Tue, Jun 19, 2018 at 5:08 PM Shakeel Butt wrote: > > > > Are you using SLAB or SLUB? We stress kernel pretty heavily, but with > > > > SLAB, and I suspect Shakeel may also be using SLAB. So

Re: [PATCH v6 0/3] Directed kmem charging

2018-06-19 Thread Shakeel Butt
ils. Please fold patch 1 and introduce API along with the > users. > Thanks a lot for the review. Ack, I will do as you suggested in next version. > On Mon, Jun 18, 2018 at 10:13:24PM -0700, Shakeel Butt wrote: > > This patchset introduces memcg variant memory allocation functio
