> A more useful metric for memory pressure at this point is quantifying
> the time you spend thrashing: the time the job spends in direct reclaim
> and, on the flip side, the time the job waits for recently evicted pages to
> come back. Combined, that gives you a good measure of overhead from
> memory
On Mon, Mar 6, 2017 at 2:33 AM, Michal Hocko wrote:
> From: Michal Hocko
>
> fq_alloc_node, alloc_netdev_mqs and netif_alloc* open code kmalloc
> with vmalloc fallback. Use the kvmalloc variant instead. Keep the
> __GFP_REPEAT flag based on explanation from
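A minimal sketch of the conversion being described (the surrounding code is
illustrative, not the exact fq_alloc_node code; kvmalloc() and kvfree() are
the real kernel API):

	/* Open-coded kmalloc with vmalloc fallback (the pattern removed): */
	buf = kmalloc(size, GFP_KERNEL | __GFP_NOWARN);
	if (!buf)
		buf = vmalloc(size);

	/* The kvmalloc variant collapses this into one call: */
	buf = kvmalloc(size, GFP_KERNEL);
	/* ... */
	kvfree(buf);	/* frees both kmalloc'ed and vmalloc'ed memory */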
On Fri, Mar 31, 2017 at 8:30 AM, Andrey Ryabinin wrote:
> zswap_frontswap_store() is called during memory reclaim from
> __frontswap_store() from swap_writepage() from shrink_page_list().
> This may happen in NOFS context, thus zswap shouldn't use __GFP_FS,
> otherwise we
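A minimal sketch of the constraint being described (the exact gfp mask in
the final zswap patch may differ):

	/* In reclaim context zswap must not recurse into the filesystem,
	 * so mask __GFP_FS out of the allocation flags.  GFP_NOFS is
	 * defined as exactly this: GFP_KERNEL & ~__GFP_FS. */
	gfp_t gfp = GFP_KERNEL & ~__GFP_FS;
	page = alloc_page(gfp);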
dy to make forward
progress. So, add a kswapd_failures check to the throttle_direct_reclaim
condition.
Signed-off-by: Shakeel Butt <shake...@google.com>
Suggested-by: Michal Hocko <mho...@suse.com>
Suggested-by: Johannes Weiner <han...@cmpxchg.org>
Acked-by: Hillf Danton <hillf...@al
On Thu, Mar 16, 2017 at 12:57 PM, Johannes Weiner <han...@cmpxchg.org> wrote:
> On Sat, Mar 11, 2017 at 09:52:15AM -0800, Shakeel Butt wrote:
>> On Sat, Mar 11, 2017 at 5:51 AM, Yisheng Xie <ys...@foxmail.com> wrote:
>> > @@ -2808,7 +2826,7 @@ static unsigned lo
To avoid this time-costly and useless retrying, add a stub function
> may_thrash and return true when memcg is disabled or on the legacy
> hierarchy.
>
> Signed-off-by: Yisheng Xie <xieyishe...@huawei.com>
> Suggested-by: Shakeel Butt <shake...@google.com>
> ---
>
On Mon, Mar 13, 2017 at 1:33 AM, Michal Hocko wrote:
> Please do not post new version after a single feedback and try to wait
> for more review to accumulate. This is in the 3rd version and it is not
> clear why it is still an RFC.
>
> On Sun 12-03-17 19:06:10, Yisheng Xie
On Mon, Mar 13, 2017 at 2:02 AM, Michal Hocko <mho...@kernel.org> wrote:
> On Fri 10-03-17 11:46:20, Shakeel Butt wrote:
>> Recently kswapd has been modified to give up after MAX_RECLAIM_RETRIES
>> number of unsuccessful iterations. Before going to sleep, the kswapd thread
>
On Mon, Mar 13, 2017 at 8:46 AM, Michal Hocko <mho...@kernel.org> wrote:
> On Mon 13-03-17 08:07:15, Shakeel Butt wrote:
>> On Mon, Mar 13, 2017 at 2:02 AM, Michal Hocko <mho...@kernel.org> wrote:
>> > On Fri 10-03-17 11:46:20, Shakeel Butt wrote:
>> >> Re
To avoid this time-costly and useless retrying, add a stub function
> mem_cgroup_thrashed() and return true when memcg is disabled or on
> the legacy hierarchy.
>
> Signed-off-by: Yisheng Xie <xieyishe...@huawei.com>
> Suggested-by: Shakeel Butt <shake...@google.com>
Thanks.
-by: Shakeel Butt <shake...@google.com>
---
mm/vmscan.c | 12 +++++++++---
1 file changed, 9 insertions(+), 3 deletions(-)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index bae698484e8e..b2d24cc7a161 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2819,6 +2819,12 @@ static bool pfmemalloc_waterm
On Fri, Mar 10, 2017 at 6:19 PM, Yisheng Xie wrote:
> From: Yisheng Xie
>
> When we enter do_try_to_free_pages(), may_thrash is always clear, and
> it will retry shrinking zones to tap the cgroup's reserve memory by
> setting may_thrash when the former
-by: Shakeel Butt <shake...@google.com>
Suggested-by: Michal Hocko <mho...@suse.com>
Suggested-by: Johannes Weiner <han...@cmpxchg.org>
---
v2:
Instead of a separate helper function for checking kswapd_failures,
added the check into pfmemalloc_watermark_ok() and renamed that
funct
On Mon, Mar 13, 2017 at 12:58 PM, Johannes Weiner <han...@cmpxchg.org> wrote:
> Hi Shakeel,
>
> On Fri, Mar 10, 2017 at 11:46:20AM -0800, Shakeel Butt wrote:
>> Recently kswapd has been modified to give up after MAX_RECLAIM_RETRIES
>> number of unsuccessful iterations. Be
On Tue, Feb 28, 2017 at 1:39 PM, Johannes Weiner wrote:
> Jia He reports a problem with kswapd spinning at 100% CPU when
> requesting more hugepages than memory available in the system:
>
> $ echo 4000 >/proc/sys/vm/nr_hugepages
>
> top - 13:42:59 up 3:37, 1 user, load
On Mon, Mar 6, 2017 at 2:30 AM, Michal Hocko wrote:
> From: Michal Hocko
>
> vhost code uses __GFP_REPEAT when allocating vhost_virtqueue resp.
> vhost_vsock because it would really like to prefer kmalloc to the
> vmalloc fallback - see 23cc5a991c7a
On Fri, Aug 18, 2017 at 2:34 PM, Andrew Morton
<a...@linux-foundation.org> wrote:
> On Thu, 17 Aug 2017 18:20:17 -0700 Shakeel Butt <shake...@google.com> wrote:
>
>> +linux-mm, linux-kernel
>>
>> On Thu, Aug 17, 2017 at 6:10 PM, Shakeel Butt <shake...@google
+linux-mm, linux-kernel
On Thu, Aug 17, 2017 at 6:10 PM, Shakeel Butt <shake...@google.com> wrote:
> The fadvise() manpage is silent on fadvise()'s effect on
> memory-based filesystems (shmem, hugetlbfs & ramfs) and pseudo
> file systems (procfs, sysfs, kernfs). The cu
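For reference, the userspace call under discussion; a minimal sketch using
a hypothetical tmpfs-backed path:

	#include <fcntl.h>
	#include <stdio.h>

	int main(void)
	{
		int fd = open("/dev/shm/somefile", O_RDONLY); /* hypothetical tmpfs file */
		if (fd < 0)
			return 1;
		/* What this does on shmem/hugetlbfs/ramfs is exactly what the
		 * proposed manpage text would document. */
		int ret = posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
		printf("posix_fadvise returned %d\n", ret);
		return 0;
	}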
>> It doesn't sound like a risky change to me, although perhaps someone is
>> depending on the current behaviour for obscure reasons, who knows.
>>
>> What are the reasons for this change? Is the current behaviour causing
>> some sort of problem for someone?
>
> Yes, one of our generic library
>
> We would have to consider (instead of jiffies) the time the process was
> either running, or waiting on something that's related to memory
> allocation/reclaim (page lock etc.). I.e. deduct the time the process
> was runnable but there was no available CPU. I expect, however, that such
> level of
>> names_cachep = kmem_cache_create("names_cache", PATH_MAX, 0,
>> - SLAB_HWCACHE_ALIGN|SLAB_PANIC, NULL);
>> + SLAB_HWCACHE_ALIGN|SLAB_PANIC|SLAB_ACCOUNT, NULL);
>
> I might be wrong but isn't name cache only holding temporary objects
> used for path
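The general pattern: passing SLAB_ACCOUNT to kmem_cache_create() makes
objects allocated from that cache charged to the allocating task's memcg.
A minimal sketch with a hypothetical cache:

	static struct kmem_cache *my_cachep;	/* hypothetical cache */

	static int __init my_init(void)
	{
		my_cachep = kmem_cache_create("my_cache", 256, 0,
				SLAB_HWCACHE_ALIGN | SLAB_PANIC | SLAB_ACCOUNT,
				NULL);
		return 0;	/* SLAB_PANIC means failure never returns */
	}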
(Replying again as the format of my previous reply got messed up.)
On Mon, Oct 2, 2017 at 1:00 PM, Tim Hockin wrote:
> In the example above:
>
>      root
>      /  \
>     A    D
>    / \
>   B   C
>
> Does oom_group allow me to express "compare A and D; if A is chosen
On Mon, Oct 2, 2017 at 12:56 PM, Michal Hocko <mho...@kernel.org> wrote:
> On Mon 02-10-17 12:45:18, Shakeel Butt wrote:
>> > I am sorry to cut the rest of your proposal because it simply goes over
>> > the scope of the proposed solution while the usecase you are mentio
the epoll references and
causing a burst of eventpoll_epi and eventpoll_pwq slab
allocations. This patch opts in to charging the eventpoll_epi
and eventpoll_pwq slabs.
Signed-off-by: Shakeel Butt <shake...@google.com>
---
fs/eventpoll.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
by user space applications
which have access to kvm, and thus a buggy application can leak
such memory. So, these caches should be accounted to kmemcg.
Signed-off-by: Shakeel Butt <shake...@google.com>
---
arch/x86/kvm/mmu.c | 4 ++--
virt/kvm/kvm_main.c | 2 +-
2 files changed, 3 insertions(+), 3 deletions(-)
On Thu, Oct 5, 2017 at 9:28 PM, Anshuman Khandual
<khand...@linux.vnet.ibm.com> wrote:
> On 10/06/2017 06:37 AM, Shakeel Butt wrote:
>> The kvm slabs can consume a significant amount of system memory
>> and indeed in our production environment we have observed that
On Thu, Sep 7, 2017 at 11:47 AM, Roman Gushchin <g...@fb.com> wrote:
> On Thu, Sep 07, 2017 at 11:44:12AM -0700, Shakeel Butt wrote:
>> >> As far as other types of pages go: page cache and anon are already
>> >> batched pretty well, but I think kmem might bene
>> As far as other types of pages go: page cache and anon are already
>> batched pretty well, but I think kmem might benefit from this
>> too. Have you considered using the stock in memcg_kmem_uncharge()?
>
> Good idea!
> I'll try to find an appropriate testcase and check if it really
> brings any
>
> Going back to Michal's example, say the user configured the following:
>
>      root
>      /  \
>     A    D
>    / \
>   B   C
>
> A global OOM event happens and we find this:
> - A > D
> - B, C, D are oomgroups
>
> What the user is telling us is that B, C, and D are compound
> Yes and nobody is disputing that, really. I guess the main disconnect
> here is that different people want to have more detailed control over
> the victim selection while the patchset tries to handle the most
> simplistic scenario when no userspace control over the selection is
> required. And
> I am sorry to cut the rest of your proposal because it simply goes over
> the scope of the proposed solution while the usecase you are mentioning
> is still possible. If we want to compare intermediate nodes (which seems
> to be the case) then we can always provide a knob to opt-in - be it your
On Fri, Aug 25, 2017 at 2:49 PM, Andrew Morton
<a...@linux-foundation.org> wrote:
> On Thu, 17 Aug 2017 18:20:17 -0700 Shakeel Butt <shake...@google.com> wrote:
>
>> +linux-mm, linux-kernel
>>
>> On Thu, Aug 17, 2017 at 6:10 PM, Shakeel Butt <shake...@google
On Mon, Sep 4, 2017 at 7:21 AM, Roman Gushchin wrote:
> Introducing of cgroup-aware OOM killer changes the victim selection
> algorithm used by default: instead of picking the largest process,
> it will pick the largest memcg and then the largest process inside.
>
> This affects only
>
> I am not objecting to the patch I would just like to understand the
> runaway case. ep_insert seems to limit the maximum number of watches to
> max_user_watches which should be ~4% of lowmem if I am following the
> code properly. pwq_cache should be bound by the number of watches as
> well, or
> +
> +static void select_victim_memcg(struct mem_cgroup *root, struct oom_control *oc)
> +{
> +	struct mem_cgroup *iter;
> +
> +	oc->chosen_memcg = NULL;
> +	oc->chosen_points = 0;
> +
> +	/*
> +	 * The oom_score is calculated for leaf memory cgroups (including
>
>> > + if (memcg_has_children(iter))
>> > + continue;
>>
>> && iter != root_mem_cgroup ?
>
> Oh, sure. I had a stupid bug in my test script, which prevented me from
> catching this. Thanks!
>
> This should fix the problem.
> --
> diff --git a/mm/memcontrol.c
that a lot of machines spend
a very significant amount of memory on these caches. So, these
caches should be accounted to kmemcg.
Signed-off-by: Shakeel Butt <shake...@google.com>
---
fs/dcache.c | 2 +-
fs/file_table.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/dcac
not specify that.
However, the man page also discourages using _sysctl() at all.
Signed-off-by: Shakeel Butt <shake...@google.com>
---
Changelog since v1:
- removed names_cache charging to kmemcg
fs/file_table.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/file_t
On Sun, Oct 8, 2017 at 11:24 PM, Michal Hocko <mho...@kernel.org> wrote:
> On Fri 06-10-17 12:33:03, Shakeel Butt wrote:
>> >> names_cachep = kmem_cache_create("names_cache", PATH_MAX, 0,
>> >> -
(). Also there is no
need for the local lru_add_drain() as it will be called deep inside
__mm_populate() (in follow_page_pte()).
Signed-off-by: Shakeel Butt <shake...@google.com>
---
mm/mlock.c | 5 -----
1 file changed, 5 deletions(-)
diff --git a/mm/mlock.c b/mm/mlock.c
index dfc6f1
On Wed, Oct 18, 2017 at 8:18 PM, Balbir Singh <bsinghar...@gmail.com> wrote:
> On Wed, 18 Oct 2017 16:17:30 -0700
> Shakeel Butt <shake...@google.com> wrote:
>
>> Recently we have observed high latency in mlock() in our generic
>> library and noticed that users ha
> [...]
>>
>> Sorry for the confusion. I wanted to say that if the pages which are
>> being mlocked are on caches of remote cpus then lru_add_drain_all will
>> move them to their corresponding LRUs, and then the remaining functionality
>> of mlock will move them again from their evictable LRUs to
On Thu, Oct 19, 2017 at 5:32 AM, Michal Hocko <mho...@kernel.org> wrote:
> On Wed 18-10-17 16:17:30, Shakeel Butt wrote:
>> Recently we have observed high latency in mlock() in our generic
>> library and noticed that users have started using tmpfs files even
>> without s
On Wed, Oct 18, 2017 at 11:24 PM, Anshuman Khandual
<khand...@linux.vnet.ibm.com> wrote:
> On 10/19/2017 04:47 AM, Shakeel Butt wrote:
>> Recently we have observed high latency in mlock() in our generic
>> library and noticed that users have started using tmpfs files
On Thu, Oct 19, 2017 at 3:18 AM, Kirill A. Shutemov
<kir...@shutemov.name> wrote:
> On Wed, Oct 18, 2017 at 04:17:30PM -0700, Shakeel Butt wrote:
>> Recently we have observed high latency in mlock() in our generic
>> library and noticed that users have started using tmpfs
On Thu, Oct 19, 2017 at 1:13 PM, Michal Hocko <mho...@kernel.org> wrote:
> On Thu 19-10-17 12:46:50, Shakeel Butt wrote:
>> > [...]
>> >>
>> >> Sorry for the confusion. I wanted to say that if the pages which are
>> >> being mlock
On Tue, Nov 14, 2017 at 4:56 PM, Minchan Kim wrote:
> On Tue, Nov 14, 2017 at 06:37:42AM +0900, Tetsuo Handa wrote:
>> When shrinker_rwsem was introduced, it was assumed that
>> register_shrinker()/unregister_shrinker() are really unlikely paths
>> which are called during
On Wed, Nov 15, 2017 at 4:46 PM, Minchan Kim <minc...@kernel.org> wrote:
> On Tue, Nov 14, 2017 at 10:28:10PM -0800, Shakeel Butt wrote:
>> On Tue, Nov 14, 2017 at 4:56 PM, Minchan Kim <minc...@kernel.org> wrote:
>> > On Tue, Nov 14, 2017 at 06:37:42AM +0900,
Ping. I would really appreciate comments on this patch.
On Sat, Nov 4, 2017 at 3:43 PM, Shakeel Butt <shake...@google.com> wrote:
> When a thread mlocks an address space backed by a file, a new
> page is allocated (assuming the file page is not in memory), added
> to the local pagevec (lru
oups whose THPs were swapped out to become zombies on
deletion.
Fixes: d6810d730022 ("memcg, THP, swap: make mem_cgroup_swapout() support THP")
Signed-off-by: Shakeel Butt <shake...@google.com>
Cc: sta...@vger.kernel.org
---
mm/memcontrol.c | 2 +-
1 file changed, 1 insertion(+),
On Tue, Nov 28, 2017 at 12:00 PM, Michal Hocko <mho...@kernel.org> wrote:
> On Tue 28-11-17 08:19:41, Shakeel Butt wrote:
>> The commit d6810d730022 ("memcg, THP, swap: make mem_cgroup_swapout()
>> support THP") changed mem_cgroup_swapout() to support transpar
On Fri, Nov 24, 2017 at 10:01 AM, Martin Steigerwald
wrote:
> Hi Matthew.
>
> Matthew Wilcox - 24.11.17, 18:03:
>> On Fri, Nov 24, 2017 at 05:50:41PM +0100, Martin Steigerwald wrote:
>> > Matthew Wilcox - 24.11.17, 02:16:
>> > > ==
>> > > XArray
>> > > ==
>> > >
>> >
On Fri, Sep 29, 2017 at 1:15 AM, Kirill Tkhai wrote:
> On 29.09.2017 00:02, Andrew Morton wrote:
>> On Thu, 28 Sep 2017 10:48:55 +0300 Kirill Tkhai wrote:
>>
> This patch aims to make super_cache_count() (and other functions,
> which count LRU
On Tue, Dec 19, 2017 at 4:49 AM, Michal Hocko <mho...@kernel.org> wrote:
> On Mon 18-12-17 16:01:31, Shakeel Butt wrote:
>> The memory controller in cgroup v1 provides the memory+swap (memsw)
>> interface to account for the combined usage of memory and swap of the
>>
On Tue, Dec 19, 2017 at 7:24 AM, Tejun Heo <t...@kernel.org> wrote:
> Hello,
>
> On Tue, Dec 19, 2017 at 07:12:19AM -0800, Shakeel Butt wrote:
>> Yes, there are pros & cons, therefore we should give users the option
>> to select the API that is better suited for thei
On Tue, Dec 19, 2017 at 1:41 PM, Tejun Heo <t...@kernel.org> wrote:
> Hello,
>
> On Tue, Dec 19, 2017 at 10:25:12AM -0800, Shakeel Butt wrote:
>> Making the runtime environment an invariant is very critical to making
>> the management of a job easier whose instances r
On Tue, Dec 19, 2017 at 9:33 AM, Tejun Heo <t...@kernel.org> wrote:
> Hello,
>
> On Tue, Dec 19, 2017 at 09:23:29AM -0800, Shakeel Butt wrote:
>> To provide consistent memory usage history using the current
>> cgroup-v2's 'swap' interface, an additional metric exp
if there are no descendants of the root cgroup.
When memsw accounting is enabled then "memory.high" is compared with
memory+swap usage. So, when the allocating job's memsw usage hits its
high mark, the job will be throttled by triggering memory reclaim.
Signed-off-by: Shakeel Butt <shake.
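A sketch of the comparison described above (the placement is illustrative;
the real check sits in the charge path in mm/memcontrol.c):

	/* With memsw accounting, throttle on memory+swap usage rather
	 * than on memory alone. */
	if (page_counter_read(&memcg->memsw) > memcg->high)
		try_to_free_mem_cgroup_pages(memcg, nr_pages, gfp_mask, true);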
On Fri, Nov 17, 2017 at 9:41 AM, Yafang Shao <laoar.s...@gmail.com> wrote:
> 2017-11-18 1:35 GMT+08:00 Shakeel Butt <shake...@google.com>:
>> On Fri, Nov 17, 2017 at 9:09 AM, Yafang Shao <laoar.s...@gmail.com> wrote:
>>> 2017-11-18 0:45 GMT+08:00 Roman Gushchi
>> > On Thu, Nov 16, 2017 at 08:43:17PM -0800, Shakeel Butt wrote:
>>> >> On Thu, Nov 16, 2017 at 7:09 PM, Yafang Shao <laoar.s...@gmail.com>
>>> >> wrote:
>>> >> > Currently the default tmpfs size is totalram_pages / 2 if mount tmpf
On Fri, Nov 17, 2017 at 9:41 AM, Shakeel Butt <shake...@google.com> wrote:
> On Fri, Nov 17, 2017 at 9:35 AM, Christoph Hellwig <h...@infradead.org> wrote:
>> On Tue, Nov 14, 2017 at 06:37:42AM +0900, Tetsuo Handa wrote:
>>> Since do_shrink_slab() can reschedule, w
On Fri, Nov 17, 2017 at 9:35 AM, Christoph Hellwig wrote:
> On Tue, Nov 14, 2017 at 06:37:42AM +0900, Tetsuo Handa wrote:
>> Since do_shrink_slab() can reschedule, we cannot protect shrinker_list
>> using one RCU section. But using atomic_inc()/atomic_dec() for each
>>
On Thu, Nov 9, 2017 at 1:46 PM, Tetsuo Handa
<penguin-ker...@i-love.sakura.ne.jp> wrote:
> Shakeel Butt wrote:
>> > If you can accept serialized register_shrinker()/unregister_shrinker(),
>> > I think that something like shown below can do it.
>>
>
On Tue, Nov 21, 2017 at 7:32 AM, Johannes Weiner <han...@cmpxchg.org> wrote:
> On Sat, Nov 04, 2017 at 03:43:12PM -0700, Shakeel Butt wrote:
>> When a thread mlocks an address space backed by file, a new
>> page is allocated (assuming file page is not in memory), added
>
On Tue, Nov 21, 2017 at 7:06 AM, Johannes Weiner <han...@cmpxchg.org> wrote:
> On Tue, Nov 21, 2017 at 01:39:57PM +0100, Vlastimil Babka wrote:
>> On 11/04/2017 11:43 PM, Shakeel Butt wrote:
>> > When a thread mlocks an address space backed by file, a new
>> > page
On Tue, Nov 21, 2017 at 7:06 AM, Johannes Weiner <han...@cmpxchg.org> wrote:
> On Tue, Nov 21, 2017 at 01:39:57PM +0100, Vlastimil Babka wrote:
>> On 11/04/2017 11:43 PM, Shakeel Butt wrote:
>> > When a thread mlocks an address space backed by file, a new
>> > page
Without this patch, the
pages allocated for a System V shared memory segment are added to
evictable LRUs even after shmctl(SHM_LOCK) on that segment. This
patch correctly puts such pages on the unevictable LRU.
Signed-off-by: Shakeel Butt <shake...@google.com>
Acked-by: Vlastimil Babka <vb
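From userspace, the affected path is reached as below; a minimal sketch
using the standard System V shm API:

	#include <sys/ipc.h>
	#include <sys/shm.h>

	int main(void)
	{
		/* Create a 1 MiB segment and lock it.  With the patch, pages
		 * faulted in afterwards land on the unevictable LRU. */
		int shmid = shmget(IPC_PRIVATE, 1 << 20, IPC_CREAT | 0600);
		if (shmid < 0)
			return 1;
		return shmctl(shmid, SHM_LOCK, NULL) ? 1 : 0;
	}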
On Thu, Nov 16, 2017 at 7:09 PM, Yafang Shao wrote:
> Currently the default tmpfs size is totalram_pages / 2 if tmpfs is
> mounted without "-o size=XXX".
> When we mount tmpfs in a container(i.e. docker), it is also
> totalram_pages / 2 regardless of the memory limit on this
expect that
> do_shrink_slab() of an unregistering shrinker likely returns shortly, and
> we can avoid khungtaskd warnings when do_shrink_slab() of an unregistering
> shrinker unexpectedly takes too long.
>
> Signed-off-by: Tetsuo Handa <penguin-ker...@i-love.sakura.ne.jp>
Reviewed-and-
After
these patches there is no sleeping operation in clear_inode(). So,
remove might_sleep() from it.
Signed-off-by: Shakeel Butt <shake...@google.com>
---
fs/inode.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/fs/inode.c b/fs/inode.c
index d1e35b53bb23..528f3159b928 100644
--- a/fs/inode.c
+++ b/f
>	if (next_deferred >= scanned)
> @@ -468,18 +487,9 @@ static unsigned long shrink_slab(gfp_t gfp_mask, int nid,
>	if (nr_scanned == 0)
>		nr_scanned = SWAP_CLUSTER_MAX;
>
> -	if (!down_read_trylock(&shrinker_rwsem)) {
> -		/*
> -		 * If we
been introduced to avoid a synchronize_rcu() call. The fields of
struct shrinker have been rearranged to make sure that the size does
not increase.
Signed-off-by: Shakeel Butt <shake...@google.com>
---
include/linux/shrinker.h | 4 +++-
mm/vmscan.c
>
> If you can accept serialized register_shrinker()/unregister_shrinker(),
> I think that something like shown below can do it.
>
Thanks.
> --
> diff --git a/include/linux/shrinker.h b/include/linux/shrinker.h
> index 388ff29..e2272dd 100644
> --- a/include/linux/shrinker.h
> +++
lock but then ifdefs have to be used as SRCU is behind
CONFIG_SRCU. Another way is to just release the rcu read lock before
calling the shrinker and reacquire on the return. The atomic counter
will make sure that the shrinker entry will not be freed under us.
Signed-off-by: Shakeel Butt <sh
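A minimal sketch of the second option described above (the nr_ref field is
hypothetical; shrinker_list and do_shrink_slab() are the real mm/vmscan.c
names, with the signature of that era):

	rcu_read_lock();
	list_for_each_entry_rcu(shrinker, &shrinker_list, list) {
		if (!atomic_inc_not_zero(&shrinker->nr_ref))
			continue;	/* concurrently unregistering */
		rcu_read_unlock();
		/* The elevated counter keeps the entry (and its list
		 * linkage) alive while the read lock is dropped. */
		freed += do_shrink_slab(&sc, shrinker, nr_scanned, nr_eligible);
		rcu_read_lock();
		atomic_dec(&shrinker->nr_ref);
	}
	rcu_read_unlock();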
On Wed, Nov 8, 2017 at 4:07 PM, Minchan Kim <minc...@kernel.org> wrote:
> Hi,
>
> On Wed, Nov 08, 2017 at 09:37:40AM -0800, Shakeel Butt wrote:
>> In our production, we have observed that the job loader gets stuck for
>> 10s of seconds while doing a mount operation. It tur
. Without this patch, the
pages allocated for a System V shared memory segment are added to
evictable LRUs even after shmctl(SHM_LOCK) on that segment. This
patch correctly puts such pages on the unevictable LRU.
Signed-off-by: Shakeel Butt <shake...@google.com>
---
include/linux/swap.h | 2 --
mm/
On Tue, Oct 31, 2017 at 9:40 AM, Johannes Weiner <han...@cmpxchg.org> wrote:
> On Tue, Oct 31, 2017 at 08:04:19AM -0700, Shakeel Butt wrote:
>> > +
>> > +static void select_victim_memcg(struct mem_cgroup *root, struct
>> > oom_control *oc)
>>
> +
> +static void select_victim_memcg(struct mem_cgroup *root, struct oom_control *oc)
> +{
> +	struct mem_cgroup *iter;
> +
> +	oc->chosen_memcg = NULL;
> +	oc->chosen_points = 0;
> +
> +	/*
> +	 * The oom_score is calculated for leaf memory cgroups (including
>
On Mon, Oct 30, 2017 at 1:29 AM, Michal Hocko <mho...@kernel.org> wrote:
> On Fri 27-10-17 13:50:47, Shakeel Butt wrote:
>> > Why is OOM-disabling a thing? Why isn't this simply a "kill everything
>> > else before you kill me"? It's crashing the kernel
On Wed, Dec 20, 2017 at 3:34 AM, Michal Hocko wrote:
> On Wed 20-12-17 14:32:19, Andrey Ryabinin wrote:
>> On 12/20/2017 01:33 PM, Michal Hocko wrote:
>> > On Wed 20-12-17 13:24:28, Andrey Ryabinin wrote:
>> >> mem_cgroup_resize_[memsw]_limit() tries to free only 32
Resizing the memcg limit for cgroup-v2 drains the stocks before
triggering the memcg reclaim. Do the same for cgroup-v1 to make the
behavior consistent.
Signed-off-by: Shakeel Butt <shake...@google.com>
---
mm/memcontrol.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --gi
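A sketch of the shape of the change, assuming the cgroup-v1 resize path
(the real code lives in mem_cgroup_resize_limit() in mm/memcontrol.c;
details abbreviated):

	/* Flush the per-cpu charge caches before looping into reclaim,
	 * so usage reflects reality -- the v2 path already does this. */
	drain_all_stock(memcg);
	if (!try_to_free_mem_cgroup_pages(memcg, 1, GFP_KERNEL, !memsw))
		ret = -EBUSY;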
Junaid Shahid <juan...@google.com>
Signed-off-by: Shakeel Butt <shake...@google.com>
---
mm/memcontrol.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index e2d33a37f971..2c3c69524b49 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2841,6 +28
On Mon, May 7, 2018 at 1:16 PM Shakeel Butt <shake...@google.com> wrote:
> From: Junaid Shahid <juna...@google.com>
> The per-cpu memcg stock can retain a charge of up to 32 pages. On a
> machine with a large number of CPUs, this can amount to a decent amount
> of memory.
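For scale: with 4 KiB pages, 32 pages per CPU is 128 KiB, so a 256-CPU
machine can pin up to 256 x 128 KiB = 32 MiB of charge in the stocks (a
worked example assuming the default page size).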
On Wed, May 9, 2018 at 3:55 PM Andrew Morton
wrote:
> On Wed, 09 May 2018 14:56:55 +0300 Kirill Tkhai
wrote:
> > The patch introduces shrinker::id number, which is used to enumerate
> > memcg-aware shrinkers. The numbers start from 0, and the
.
This patch makes the alias count explicit and adds reference counting to
the root kmem_caches. The reference count of a root kmem cache is
elevated on merge and while its memcg kmem_cache is in the process of
creation or deactivation.
Signed-off-by: Shakeel Butt <shake...@google.com>
---
include
On Mon, May 21, 2018 at 11:42 AM Andrew Morton <a...@linux-foundation.org>
wrote:
> On Mon, 21 May 2018 10:41:16 -0700 Shakeel Butt <shake...@google.com>
wrote:
> > The memcg kmem cache creation and deactivation (SLUB only) is
> > asynchronous. If a root kmem cache is
kmem_cache is not destroyed
in the middle. As the reference count of the kmem_cache is elevated on
sharing, the 'shared_count' does not need any locking protection; at
worst it can be outdated for a small window, which is tolerable.
Signed-off-by: Shakeel Butt <shake...@google.com>
---
Changelog si
m counter is set and reached.
Signed-off-by: Shakeel Butt <shake...@google.com>
---
mm/memcontrol.c | 21 +++++++++++++++++++--
1 file changed, 19 insertions(+), 2 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index ab5673dbfc4e..0a88f824c550 100644
--- a/mm/memcontrol.c
+++ b/
On Thu, May 24, 2018 at 4:43 AM, Michal Hocko wrote:
> From: Michal Hocko
>
> Although the API is documented in the source code, Ted has pointed out
> that there is no mention in the core-api Documentation and there are
> people looking there to find answers
On Wed, May 16, 2018 at 1:41 PM Vlastimil Babka <vba...@suse.cz> wrote:
> On 05/16/2018 10:20 PM, Shakeel Butt wrote:
> > ___GFP_COLD and ___GFP_OTHER_NODE were removed but their bits were
> > stranded. Slide existing gfp masks to make those two bits available.
> Well, the
___GFP_COLD and ___GFP_OTHER_NODE were removed but their bits were
stranded. Slide existing gfp masks to make those two bits available.
Signed-off-by: Shakeel Butt <shake...@google.com>
---
include/linux/gfp.h | 42 +-
1 file changed, 21 insertions(+), 21 deletions(-)
___GFP_COLD and ___GFP_OTHER_NODE were removed but their bits were
stranded. Fill the gaps by moving the existing gfp masks around.
Signed-off-by: Shakeel Butt <shake...@google.com>
Suggested-by: Vlastimil Babka <vba...@suse.cz>
Acked-by: Michal Hocko <mho...@suse.com>
---
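Schematically, the slide looks like this (bit values are illustrative, not
the actual ones in include/linux/gfp.h):

	/* Before: 0x20u and 0x40u stranded where ___GFP_COLD and
	 * ___GFP_OTHER_NODE used to live. */
	#define ___GFP_RECLAIMABLE	0x10u
	#define ___GFP_HIGH		0x80u

	/* After: the masks above the gap slide down two bits. */
	#define ___GFP_RECLAIMABLE	0x10u
	#define ___GFP_HIGH		0x20u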
On Sun, Jun 10, 2018 at 9:32 AM Paul E. McKenney
wrote:
>
> On Sun, Jun 10, 2018 at 07:52:50AM -0700, Shakeel Butt wrote:
> > On Sat, Jun 9, 2018 at 3:20 AM Vladimir Davydov
> > wrote:
> > >
> > > On Tue, May 29, 2018 at 05:12:04PM -0700, Shakeel Butt
On Sat, Jun 9, 2018 at 3:20 AM Vladimir Davydov wrote:
>
> On Tue, May 29, 2018 at 05:12:04PM -0700, Shakeel Butt wrote:
> > The memcg kmem cache creation and deactivation (SLUB only) is
> > asynchronous. If a root kmem cache is destroyed whose memcg cache is in
> >
happens because parent_mem_cgroup() returns a NULL
> pointer, which is dereferenced later without a check.
>
> As cgroup v1 has no memory guarantee support, let's make
> mem_cgroup_protected() immediately return MEMCG_PROT_NONE,
> if the given cgroup has no parent (non-hierarchi
On Tue, May 22, 2018 at 3:09 AM Kirill Tkhai wrote:
>
> From: Vladimir Davydov
>
> The patch makes shrink_slab() be called for root_mem_cgroup
> in the same way as it's called for the rest of cgroups.
> This simplifies the logic and improves the readability.
>
> Signed-off-by: Vladimir Davydov
On Thu, Jun 7, 2018 at 10:30 AM Ralph Campbell wrote:
>
>
>
> On 06/07/2018 07:57 AM, Matthew Wilcox wrote:
> > From: Matthew Wilcox
> >
> > Need to do a bit of rearranging to make this work.
> >
> > Signed-off-by: Matthew Wilcox
> > ---
> > arch/x86/events/intel/uncore.c | 19
includes RCU
callback and thus makes sure all previously registered RCU callbacks
have completed as well.
Signed-off-by: Shakeel Butt
---
Changelog since v3:
- Handle the RCU callbacks for SLUB deactivation
Changelog since v2:
- Rewrote the patch and used workqueue flushing instead of refcount
On Tue, Jun 19, 2018 at 8:19 AM Jason A. Donenfeld wrote:
>
> On Tue, Jun 19, 2018 at 5:08 PM Shakeel Butt wrote:
> > > > Are you using SLAB or SLUB? We stress kernel pretty heavily, but with
> > > > SLAB, and I suspect Shakeel may also be using SLAB. So
ils. Please fold patch 1 and introduce the API along with the
> users.
>
Thanks a lot for the review. Ack, I will do as you suggested in the next version.
> On Mon, Jun 18, 2018 at 10:13:24PM -0700, Shakeel Butt wrote:
> > This patchset introduces memcg variant memory allocation functio