ped exponentially.
Low limits don't affect soft reclaim.
Also, it's possible that a cgroup with memory usage under low limit
will be reclaimed slowly on very low scanning priorities.
Signed-off-by: Roman Gushchin
---
include/linux/memcontrol.h |7 +
include/linux/res_counter.h
lso, it can happen that my preferred cgroup is further above its soft limit
than other cgroups (and that's hard to control), so it will be reclaimed more
intensively than necessary.
>> Signed-off-by: Roman Gushchin
>> ---
>> include/linux/memcontrol.h | 7
27.02.2013, 13:41, "Michal Hocko" :
> Let me restate what I have already mentioned in the private
> communication.
>
> We already have soft limit which can be implemented to achieve the
> same/similar functionality and in fact this is a long term objective (at
> least for me). I hope I will be able
Please find my comments below.
> More comments on the code below.
>
> [...]
>
>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> index 53b8201..d8e6ee6 100644
>> --- a/mm/memcontrol.c
>> +++ b/mm/memcontrol.c
>> @@ -1743,6 +1743,53 @@ static void mem_cgroup_out_of_memory(struct
>> mem_cgr
27.02.2013, 20:14, "Michal Hocko" :
> On Wed 27-02-13 14:39:36, Roman Gushchin wrote:
>
>> 27.02.2013, 13:41, "Michal Hocko" :
>>> Let me restate what I have already mentioned in the private
>>> communication.
>>>
>>> We alrea
Hi Simon,
20.12.2012, 10:21, "Simon Jeons" :
> On Sun, 2012-12-16 at 02:15 +, Eric Wong wrote:
>
>> xtu4 wrote:
>>> resend it, due to format error
>>>
>>> Subject: [PATCH] when system in low memory scenario, imagine there is a mp3
>>> play, or a video play, we need to read mp3 or video fi
cations as soon as memory will be de-fragmented.
Signed-off-by: Roman Gushchin
---
include/linux/gfp.h | 4 +++-
mm/page_alloc.c | 3 +++
mm/slub.c | 3 ++-
3 files changed, 8 insertions(+), 2 deletions(-)
diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 0f615eb..073a90a 10
On 14.06.2013 18:32, Christoph Lameter wrote:
On Fri, 14 Jun 2013, Roman Gushchin wrote:
Slub tries to allocate contiguous pages even if memory is fragmented and
there are no free contiguous pages. In this case it calls direct compaction
to allocate contiguous page. Compaction requires the
On 14.06.2013 20:08, Christoph Lameter wrote:
On Fri, 14 Jun 2013, Roman Gushchin wrote:
But there is an actual problem, that this patch solves.
Sometimes I saw the following issue on some machines:
all CPUs are performing compaction, system time is about 80%,
system is completely unreliable
On 15.06.2013 00:26, David Rientjes wrote:
On Fri, 14 Jun 2013, Christoph Lameter wrote:
It's possible to avoid such problems (or at least to make them less probable)
by avoiding direct compaction. If it's not possible to allocate a contiguous
page without compaction, slub will fall back to ord
On 17.06.2013 18:27, Michal Hocko wrote:
On Mon 17-06-13 16:34:23, Roman Gushchin wrote:
On 15.06.2013 00:26, David Rientjes wrote:
On Fri, 14 Jun 2013, Christoph Lameter wrote:
It's possible to avoid such problems (or at least to make them less probable)
by avoiding direct compactio
Hi!
While investigating some compaction-related problems, I noticed that many
(even most) kernel objects are allocated on slabs with order 2 or 3.
This behavior was introduced by commit 9b2cd506e "slub: Calculate min_objects
based on number of processors." by Christoph Lameter.
As I understan
On 29.05.2013 09:08, Eric Dumazet wrote:
On Tue, 2013-05-28 at 18:31 -0700, Paul E. McKenney wrote:
On Tue, May 28, 2013 at 05:34:53PM -0700, Eric Dumazet wrote:
On Tue, 2013-05-28 at 13:10 +0400, Roman Gushchin wrote:
On 28.05.2013 04:12, Eric Dumazet wrote:
About your earlier question, I
On 29.05.2013 23:06, Eric Dumazet wrote:
On Wed, 2013-05-29 at 14:09 +0400, Roman Gushchin wrote:
True, these lookup functions are usually structured the same around the
hlist_nulls_for_each_entry_rcu() loop.
A barrier() right before the loop seems to be a benefit, the size of
assembly code is
On 30.05.2013 22:04, Johannes Weiner wrote:
+/*
+ * Monotonic workingset clock for non-resident pages.
+ *
+ * The refault distance of a page is the number of ticks that occurred
+ * between that page's eviction and subsequent refault.
+ *
+ * Every page slot that is taken away from the inactive
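The refault-distance definition quoted above can be illustrated with a small sketch. This is not code from the patch: the function name and the use of a plain unsigned long as the tick counter are assumptions made for illustration only.

```c
/*
 * Hypothetical helper: distance, in clock ticks, between a page's
 * eviction and its later refault. Both arguments are snapshots of
 * the same monotonic counter.
 */
static unsigned long refault_distance(unsigned long eviction_tick,
                                      unsigned long refault_tick)
{
    /* Unsigned subtraction stays correct even if the counter wrapped. */
    return refault_tick - eviction_tick;
}
```

A page evicted at tick 100 and refaulted at tick 130 has a refault distance of 30 under this sketch.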
On 07.06.2013 18:12, Christoph Lameter wrote:
On Fri, 7 Jun 2013, Roman Gushchin wrote:
As I understand, the idea was to make kernel allocations cheaper by reducing
the total number of page allocations (allocating 1 page with order 3 is
cheaper than allocating 8 order-0 pages).
Its also
On 18.06.2013 01:44, David Rientjes wrote:
On Mon, 17 Jun 2013, Roman Gushchin wrote:
They certainly aren't enough, the kernel you're running suffers from a
couple different memory compaction issues that were fixed in 3.7. I
couldn't sympathize with your situation more, I
xconn=-100
error: "Invalid argument" setting key "net.core.somaxconn"
Signed-off-by: Roman Gushchin
---
net/core/sysctl_net_core.c | 6 +-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c
index cfdb46a..2ff093b
Original Message
Subject: Re: [PATCH] net: check net.core.somaxconn sysctl values
Date: Wed, 31 Jul 2013 07:37:37 -0700
From: Eric Dumazet
To: Roman Gushchin
CC: David S. Miller , raise.s...@gmail.com,
ebied...@xmission.com, net...@vger.kernel.org, linux-kernel
On 31.07.2013 18:37, Eric Dumazet wrote:
On Wed, 2013-07-31 at 17:57 +0400, Roman Gushchin wrote:
It's possible to assign an invalid value to the net.core.somaxconn
sysctl variable, because there are no checks at all.
The sk_max_ack_backlog field of the sock structure is defined as
uns
On 01.08.2013 04:10, David Miller wrote:
From: Roman Gushchin
Date: Wed, 31 Jul 2013 17:57:35 +0400
---
net/core/sysctl_net_core.c | 6 +-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c
index cfdb46a..2ff093b 100644
xconn=-100
error: "Invalid argument" setting key "net.core.somaxconn"
Based on a prior patch from Changli Gao.
Signed-off-by: Roman Gushchin
Reported-by: Changli Gao
Suggested-by: Eric Dumazet
Acked-by: Eric Dumazet
---
net/core/sysctl_net_core.c | 6 +-
1 file changed
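As a rough illustration of the kind of range check the patch description calls for, the sketch below rejects values that would not fit the backlog field. It is plain userspace C, not the kernel change itself: set_somaxconn() is a hypothetical helper, and the 16-bit unsigned width of the field is an assumption.

```c
#include <errno.h>
#include <limits.h>

/*
 * Hypothetical setter: accept only values that fit an unsigned short
 * (assumed width of the backlog field). On failure the stored value
 * is left untouched and -EINVAL is returned.
 */
static int set_somaxconn(long requested, unsigned short *somaxconn)
{
    if (requested < 0 || requested > USHRT_MAX)
        return -EINVAL;

    *somaxconn = (unsigned short)requested;
    return 0;
}
```

With this check, a request such as -100 fails with -EINVAL instead of being stored silently, which matches the "Invalid argument" error shown in the example above.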
On 03.08.2013 02:19, David Miller wrote:
From: Roman Gushchin
Date: Fri, 2 Aug 2013 18:36:40 +0400
It's possible to assign an invalid value to the net.core.somaxconn
sysctl variable, because there are no checks at all.
The sk_max_ack_backlog field of the sock structure is defined as
uns
head->first
value from the memory before each scan.
Without additional hints, gcc caches this value in a register. In this case,
if a cached node is moved to another chain during the scan, we can loop
forever getting wrong nulls values and restarting the loop uninterruptedly.
Signed-off-by: Ro
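The requirement described above (reload head->first from memory on every restart of the scan) can be sketched in userspace C. Everything here is a simplified stand-in, not the kernel code: struct node, struct head, the per-node slot field, LOAD_ONCE() and the max_restarts bound are all assumptions for illustration. LOAD_ONCE() follows the same idea as ACCESS_ONCE(): reading through a volatile lvalue forces the compiler to emit a fresh load.

```c
#include <stddef.h>

/* Simplified, hypothetical stand-ins for the kernel list types. */
struct node { struct node *next; int key; int slot; };
struct head { struct node *first; };

/* Read through a volatile lvalue so the value cannot be cached in a register. */
#define LOAD_ONCE(x) (*(volatile __typeof__(x) *)&(x))

/*
 * Scan the chain for 'key'. If the scan ends on a node that belongs to a
 * different slot, the chain changed under us and the scan is restarted.
 * 'max_restarts' only bounds the loop so that the demo terminates.
 */
static struct node *lookup(struct head *h, int slot, int key,
                           int max_restarts, int *restarts)
{
    struct node *n, *last;

    *restarts = 0;
begin:
    last = NULL;
    /* h->first is re-read from memory on every pass through 'begin'. */
    for (n = LOAD_ONCE(h->first); n != NULL; n = LOAD_ONCE(n->next)) {
        last = n;
        if (n->key == key && n->slot == slot)
            return n;
    }
    if (last != NULL && last->slot != slot && *restarts < max_restarts) {
        (*restarts)++;
        goto begin;
    }
    return NULL;
}
```

Without the volatile access, a compiler is free to keep the first node in a register across the goto, which is the endless-restart scenario the commit message describes.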
Hi, all!
I think, it's good, but not enough.
We still can't rely on the sk->sk_family field by dereferencing the
inet_sk(sk)->pinet6 field, because we can set the sk_family field to
the PF_INET6 value before setting pinet6 to an appropriate value
(assuming it is NULL just because it was not a
On 21.05.2013 14:40, David Laight wrote:
Some network functions (udp4_lib_lookup2(), for instance) use the
hlist_nulls_for_each_entry_rcu macro in a way that assumes restarting
of a loop. In this case, it is strictly necessary to reread the head->first
value from the memory before each scan.
With
On 21.05.2013 16:09, Paul E. McKenney wrote:
On Tue, May 21, 2013 at 01:05:48PM +0400, Roman Gushchin wrote:
Hi, all!
This is a fix for a problem described here:
https://lkml.org/lkml/2013/4/16/371 .
---
Some network functions (udp4_lib_lookup2(), for instance) use the
On 21.05.2013 17:44, Eric Dumazet wrote:
On Tue, 2013-05-21 at 05:09 -0700, Paul E. McKenney wrote:
-#define hlist_nulls_first_rcu(head) \
- (*((struct hlist_nulls_node __rcu __force **)&(head)->first))
+#define hlist_nulls_first_rcu(head)\
+ (*((struct hlist_nu
On 21.05.2013 19:16, Eric Dumazet wrote:
On Tue, 2013-05-21 at 18:47 +0400, Roman Gushchin wrote:
On 21.05.2013 17:44, Eric Dumazet wrote:
On Tue, 2013-05-21 at 05:09 -0700, Paul E. McKenney wrote:
-#define hlist_nulls_first_rcu(head) \
- (*((struct hlist_nulls_node __rcu __force
On 21.05.2013 19:38, Eric Dumazet wrote:
On Tue, 2013-05-21 at 18:47 +0400, Roman Gushchin wrote:
This code has the same mistake: it is rcu_dereference_raw(head->first),
so there is nothing that prevents gcc from storing the (head->first) value
in a register.
If other rcu accessors have th
ong nulls values and restarting the loop uninterruptedly.
Signed-off-by: Roman Gushchin
Reported-by: Boris Zhmurov
---
include/linux/compiler.h | 6 ++
include/linux/rculist.h | 6 --
include/linux/rculist_nulls.h | 3 ++-
3 files changed, 12 insertions(+), 3 deletions(-)
diff --
On 22.05.2013 01:47, Eric Dumazet wrote:
On Tue, 2013-05-21 at 15:44 +0400, Roman Gushchin wrote:
Hi, all!
I think, it's good, but not enough.
We still can't rely on the sk->sk_family field by dereferencing the
inet_sk(sk)->pinet6 field, because we can set the sk_family field
the scan, we can loop
forever getting wrong nulls values and restarting the loop uninterruptedly.
Signed-off-by: Roman Gushchin
Reported-by: Boris Zhmurov
---
include/linux/compiler.h | 6 ++
include/linux/rculist.h | 9 +
include/linux/rculist_nulls.h | 2 +-
includ
On 22.05.2013 16:30, Eric Dumazet wrote:
On Wed, 2013-05-22 at 15:58 +0400, Roman Gushchin wrote:
+/*
+ * Same as ACCESS_ONCE(), but used for accessing field of a structure.
+ * The main goal is preventing compiler to store &ptr->field in a register.
But &ptr->field is a const
On 22.05.2013 17:27, David Laight wrote:
So yes, the patch appears to fix the bug, but it sounds not logical to
me.
I was confused because the copy of the code I found was different
(it has some checks for reusaddr - which force a function call in the
loop).
The code being compiled is:
begin:
On 22.05.2013 21:45, Paul E. McKenney wrote:
On Wed, May 22, 2013 at 05:07:07PM +0400, Roman Gushchin wrote:
On 22.05.2013 16:30, Eric Dumazet wrote:
On Wed, 2013-05-22 at 15:58 +0400, Roman Gushchin wrote:
+/*
+ * Same as ACCESS_ONCE(), but used for accessing field of a structure.
+ * The
On 25.05.2013 15:37, Paul E. McKenney wrote:
2) A problem occurs when restart_condition is true and we jump to the begin
label.
We do not recalculate the (head + offsetof(head, first)) address; we just
dereference the OLD (head->first) pointer again. So we get a node that WAS
the first node in a
Hi, Paul!
On 25.05.2013 15:37, Paul E. McKenney wrote:
Again, I believe that your retry logic needs to extend back into the
calling function for your some_func() example above.
And what do you think about the following approach (diff below)?
It seems to me it's clear enough (especially with
On 28.05.2013 04:12, Eric Dumazet wrote:
On Mon, 2013-05-27 at 21:55 +0400, Roman Gushchin wrote:
Hi, Paul!
On 25.05.2013 15:37, Paul E. McKenney wrote:
Again, I believe that your retry logic needs to extend back into the
calling function for your some_func() example above.
And what do you
On 29.05.2013 04:34, Eric Dumazet wrote:
On Tue, 2013-05-28 at 13:10 +0400, Roman Gushchin wrote:
On 28.05.2013 04:12, Eric Dumazet wrote:
Adding a barrier() is probably what we want.
I agree, inserting barrier() is also a correct and working fix.
Yeah, but I can not find a clean way to
On Tue, Aug 15, 2017 at 01:56:24PM -0700, David Rientjes wrote:
> On Tue, 15 Aug 2017, Roman Gushchin wrote:
>
> > > > diff --git a/Documentation/cgroup-v2.txt b/Documentation/cgroup-v2.txt
> > > > index dec5afdaa36d..22108f31e09d 100644
> > > > --- a
On Tue, Aug 15, 2017 at 02:47:10PM -0700, David Rientjes wrote:
> On Tue, 15 Aug 2017, Roman Gushchin wrote:
>
> > > I'm curious about the decision made in this conditional and how
> > > oom_kill_memcg_member() ignores task->signal->oom_score_adj. It means
>
Hi Tejun!
On Fri, Aug 11, 2017 at 09:37:54AM -0700, Tejun Heo wrote:
> In cgroup1, while cpuacct isn't actually controlling any resources, it
> is a separate controller due to combinaton of two factors -
s/combinaton/combination
> @@ -4466,6 +4470,8 @@ static void css_free_work_fn(struct work_st
Hi David!
Please find an updated version of the docs patch below.
Thanks!
Roman
--
From 97805b3dcccb9420d2c4380e88e202164ead0e45 Mon Sep 17 00:00:00 2001
From: Roman Gushchin
Date: Fri, 2 Jun 2017 11:29:14 +0100
Subject: [PATCH 4/4] mm, oom, docs: describe the cgroup-aware OOM killer
Upd
Hi Andrew!
Can you please pull this patch?
Thank you!
Roman
On Fri, Jun 02, 2017 at 10:13:38AM +0200, Michal Hocko wrote:
> On Thu 01-06-17 19:41:13, Roman Gushchin wrote:
> > On Wed, May 31, 2017 at 06:39:29PM +0200, Michal Hocko wrote:
> > > On Tue 30-05-17 19:52:31, Ro
oot cgroup are treated as independent memory consumers,
and are compared with other memory consumers (e.g. leaf cgroups).
The root cgroup doesn't support the oom_kill_all_tasks feature.
Signed-off-by: Roman Gushchin
Cc: Michal Hocko
Cc: Vladimir Davydov
Cc: Johannes Weiner
Cc: Tetsuo Handa
Introduce a per-memory-cgroup oom_score_adj setting.
A read-write single value file which exists on non-root
cgroups. The default is "0".
It will have a similar meaning to a per-process value,
available via /proc/<pid>/oom_score_adj.
Should be in a range [-1000, 1000].
Signed-off-by: Roma
issue,
as we have oom_mm pointer/tsk_is_oom_victim(), which are just better.
Signed-off-by: Roman Gushchin
Cc: Michal Hocko
Cc: Vladimir Davydov
Cc: Johannes Weiner
Cc: Tejun Heo
Cc: Tetsuo Handa
Cc: kernel-t...@fb.com
Cc: cgro...@vger.kernel.org
Cc: linux-...@vger.kernel.org
Cc: linux-kernel@
Update cgroups v2 docs.
Signed-off-by: Roman Gushchin
Cc: Michal Hocko
Cc: Vladimir Davydov
Cc: Johannes Weiner
Cc: Tetsuo Handa
Cc: David Rientjes
Cc: Tejun Heo
Cc: kernel-t...@fb.com
Cc: cgro...@vger.kernel.org
Cc: linux-...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux
oking for a new victim.
Signed-off-by: Roman Gushchin
Cc: Michal Hocko
Cc: Vladimir Davydov
Cc: Johannes Weiner
Cc: Tejun Heo
Cc: Tetsuo Handa
Cc: kernel-t...@fb.com
Cc: cgro...@vger.kernel.org
Cc: linux-...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux...@kvack.org
---
mm
] Cgroup /A2/B4: 272969
[ 18.830800] Cgroup /A2/B5: 52
[ 18.831890] Chosen cgroup /A2/B4: 272969
Signed-off-by: Roman Gushchin
Cc: Tejun Heo
Cc: Johannes Weiner
Cc: Li Zefan
Cc: Michal Hocko
Cc: Vladimir Davydov
Cc: Tetsuo Handa
Cc: kernel-t...@fb.com
Cc: cgro...@vger.kernel.org
Cc
nding global counters, which can be confusing.
So, make PGSTEAL*/PGSCAN*/ALLOCSTALL counters reflect the sum of all
reclaim activity in the system.
Signed-off-by: Roman Gushchin
Cc: Balbir Singh
Cc: Michal Hocko
Cc: Johannes Weiner
Cc: Vladimir Davydov
Cc: kernel-t...@fb.com
Cc: linux...@kvack.org
Cc:
aping.
Signed-off-by: Roman Gushchin
Cc: Michal Hocko
Cc: Tetsuo Handa
Cc: Johannes Weiner
Cc: Vladimir Davydov
Cc: kernel-t...@fb.com
Cc: linux-kernel@vger.kernel.org
Cc: linux...@kvack.org
---
include/trace/events/oom.h | 80 ++
mm/oom_k
On Tue, May 30, 2017 at 02:24:36PM +0200, Michal Hocko wrote:
> On Mon 29-05-17 14:01:41, Roman Gushchin wrote:
> > Historically, PGSTEAL*/PGSCAN*/ALLOCSTALL counters were used to
> > account only for global reclaim events, memory cgroup targeted reclaim
> > was ignored.
>
On Tue, May 30, 2017 at 02:34:16PM +0200, Michal Hocko wrote:
> On Tue 30-05-17 13:05:32, Roman Gushchin wrote:
> > Add tracepoints to simplify the debugging of the oom reaper code.
> >
> > Trace the following events:
> > 1) a process is marked as an oom victim,
>
From c57e3674efc609f8364f5e228a2c1309cfe99901 Mon Sep 17 00:00:00 2001
From: Roman Gushchin
Date: Tue, 23 May 2017 17:37:55 +0100
Subject: [PATCH v2] mm,oom: add tracepoints for oom reaper-related events
During the debugging of the problem described in
https://lkml.org/lkml/2017/5/17/542
(3) moves cgroup_helpers.c/h to use them by patch (4).
Patch (4) implements an example of eBPF program which controls access
to device files and corresponding userspace test.
Roman Gushchin (4):
device_cgroup: prepare code for bpf-based device controller
bpf, cgroup: implement eBPF-based
) __devcgroup_check_permission() is exported.
3) devcgroup_check_permission() wrapper is introduced to be used
by both existing and new bpf-based implementations.
Signed-off-by: Roman Gushchin
Acked-by: Tejun Heo
Acked-by: Alexei Starovoitov
---
include/linux/device_cgroup.h | 61
BPF_PROG_TYPE_CGROUP_DEVICE program type.
A program takes major and minor device numbers, device type
(block/character) and access type (mknod/read/write) as parameters
and returns an integer which defines if the operation should be
allowed or terminated with -EPERM.
Signed-off-by: Roman Gushchin
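The decision described above (inspect device numbers, device type and access type, then allow the operation or fail it with -EPERM) can be sketched as ordinary C. This is not an eBPF program and not the patch's code: struct dev_request, enum dev_kind and the policy itself (allow read/write on character device 1:9 only) are hypothetical. The access-type bit values follow the DEV_BPF_ACC_* definitions quoted elsewhere in this thread.

```c
#include <errno.h>

/* Access-type bits, as quoted in the review thread. */
#define DEV_BPF_ACC_MKNOD (1ULL << 0)
#define DEV_BPF_ACC_READ  (1ULL << 1)
#define DEV_BPF_ACC_WRITE (1ULL << 2)

/* Hypothetical request description, for illustration only. */
enum dev_kind { DEV_KIND_BLOCK, DEV_KIND_CHAR };

struct dev_request {
    enum dev_kind kind;
    unsigned int major;
    unsigned int minor;
    unsigned long long access;
};

/* Example policy: only read/write on character device 1:9 is allowed. */
static int policy_allows(const struct dev_request *req)
{
    if (req->kind != DEV_KIND_CHAR)
        return 0;
    if (req->major != 1 || req->minor != 9)
        return 0;
    /* Reject any access bit outside READ|WRITE (mknod, for example). */
    return (req->access & ~(DEV_BPF_ACC_READ | DEV_BPF_ACC_WRITE)) == 0;
}

/* Map the policy verdict onto the result the caller sees. */
static int check_device_access(const struct dev_request *req)
{
    return policy_allows(req) ? 0 : -EPERM;
}
```

Keeping the verdict (allow or deny) separate from the errno mapping mirrors the description: the program only returns an integer, and the caller turns a denial into -EPERM.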
/zero (should fail)
Signed-off-by: Roman Gushchin
Acked-by: Alexei Starovoitov
Acked-by: Tejun Heo
Cc: Daniel Borkmann
---
tools/testing/selftests/bpf/Makefile | 4 +-
tools/testing/selftests/bpf/dev_cgroup.c | 60 +
tools/testing/selftests/bpf/test_dev_cgroup.c
The purpose of this move is to use these files in bpf tests.
Signed-off-by: Roman Gushchin
Acked-by: Alexei Starovoitov
Acked-by: Tejun Heo
Cc: Daniel Borkmann
---
samples/bpf/Makefile | 5 +++--
tools/testing/selftests/bpf/Makefile
) __devcgroup_check_permission() is exported.
3) devcgroup_check_permission() wrapper is introduced to be used
by both existing and new bpf-based implementations.
Signed-off-by: Roman Gushchin
Acked-by: Tejun Heo
Acked-by: Alexei Starovoitov
---
include/linux/device_cgroup.h | 61
The purpose of this move is to use these files in bpf tests.
Signed-off-by: Roman Gushchin
Acked-by: Alexei Starovoitov
Acked-by: Tejun Heo
Cc: Daniel Borkmann
---
samples/bpf/Makefile | 5 +++--
tools/testing/selftests/bpf/Makefile
Rename device type and access type constants defined in
security/device_cgroup.c by adding the DEVCG_ prefix.
The reason behind this renaming is to make them global namespace
friendly, as they will be moved to the corresponding header file
by following patches.
Signed-off-by: Roman Gushchin
Cc
/zero (should fail)
Signed-off-by: Roman Gushchin
Acked-by: Alexei Starovoitov
Acked-by: Tejun Heo
Cc: Daniel Borkmann
---
tools/testing/selftests/bpf/Makefile | 4 +-
tools/testing/selftests/bpf/dev_cgroup.c | 60 +
tools/testing/selftests/bpf/test_dev_cgroup.c
BPF_PROG_TYPE_CGROUP_DEVICE program type.
A program takes major and minor device numbers, device type
(block/character) and access type (mknod/read/write) as parameters
and returns an integer which defines if the operation should be
allowed or terminated with -EPERM.
Signed-off-by: Roman Gushchin
infrastructure.
Patch (4) moves cgroup_helpers.c/h to use them by patch (4).
Patch (5) implements an example of eBPF program which controls access
to device files and corresponding userspace test.
v2:
Added patch (1).
v1:
https://lkml.org/lkml/2017/11/1/363
Roman Gushchin (5):
device_cgroup: add
On Thu, Nov 02, 2017 at 08:11:07AM -0700, Alexei Starovoitov wrote:
> On 11/2/17 7:54 AM, Roman Gushchin wrote:
> > +#define DEV_BPF_ACC_MKNOD (1ULL << 0)
> > +#define DEV_BPF_ACC_READ (1ULL << 1)
> > +#define DEV_BPF_ACC_WRITE (1ULL << 2)
> >
/lkml/2017/11/1/363
Roman Gushchin (5):
device_cgroup: add DEVCG_ prefix to ACC_* and DEV_* constants
device_cgroup: prepare code for bpf-based device controller
bpf, cgroup: implement eBPF-based device controller for cgroup v2
bpf: move cgroup_helpers from samples/bpf/ to
tools/testing
/zero (should fail)
Signed-off-by: Roman Gushchin
Acked-by: Alexei Starovoitov
Acked-by: Tejun Heo
Cc: Daniel Borkmann
---
tools/testing/selftests/bpf/Makefile | 4 +-
tools/testing/selftests/bpf/dev_cgroup.c | 60 +
tools/testing/selftests/bpf/test_dev_cgroup.c
Rename device type and access type constants defined in
security/device_cgroup.c by adding the DEVCG_ prefix.
The reason behind this renaming is to make them global namespace
friendly, as they will be moved to the corresponding header file
by following patches.
Signed-off-by: Roman Gushchin
Cc
) __devcgroup_check_permission() is exported.
3) devcgroup_check_permission() wrapper is introduced to be used
by both existing and new bpf-based implementations.
Signed-off-by: Roman Gushchin
Acked-by: Tejun Heo
Acked-by: Alexei Starovoitov
---
include/linux/device_cgroup.h | 61
BPF_PROG_TYPE_CGROUP_DEVICE program type.
A program takes major and minor device numbers, device type
(block/character) and access type (mknod/read/write) as parameters
and returns an integer which defines if the operation should be
allowed or terminated with -EPERM.
Signed-off-by: Roman Gushchin
The purpose of this move is to use these files in bpf tests.
Signed-off-by: Roman Gushchin
Acked-by: Alexei Starovoitov
Acked-by: Tejun Heo
Cc: Daniel Borkmann
---
samples/bpf/Makefile | 5 +++--
tools/testing/selftests/bpf/Makefile
On Thu, Nov 02, 2017 at 10:54:12AM -0700, Joe Perches wrote:
> On Thu, 2017-11-02 at 13:15 -0400, Roman Gushchin wrote:
> > Rename device type and access type constants defined in
> > security/device_cgroup.c by adding the DEVCG_ prefix.
> >
> > The reason behind th
G_Surp:0
Hugepagesize_1G:1048576 kB
HugePages_Total: 100
HugePages_Free: 100
HugePages_Rsvd:0
HugePages_Surp:0
Hugepagesize: 2048 kB
DirectMap4k: 30584 kB
DirectMap2M: 3115008 kB
DirectMap1G: 7340032 kB
Signed-off-by: Roman Gushchin
C
On Mon, Nov 13, 2017 at 05:11:02PM +0100, Michal Hocko wrote:
> On Mon 13-11-17 16:03:02, Roman Gushchin wrote:
> > Currently we display some hugepage statistics (total, free, etc)
> > in /proc/meminfo, but only for default hugepage size (e.g. 2Mb).
> >
> > If hugep
On Mon, Nov 13, 2017 at 09:06:32AM -0800, Dave Hansen wrote:
> On 11/13/2017 08:03 AM, Roman Gushchin wrote:
> > To solve this problem, let's display stats for all hugepage sizes.
> > To provide the backward compatibility let's save the existing format
> > for the
On Mon, Nov 13, 2017 at 10:30:10AM -0800, Mike Kravetz wrote:
> On 11/13/2017 10:17 AM, Dave Hansen wrote:
> > On 11/13/2017 10:11 AM, Roman Gushchin wrote:
> >> On Mon, Nov 13, 2017 at 09:06:32AM -0800, Dave Hansen wrote:
> >>> On 11/13/2017 08:03 AM, Roman Gushch
On Mon, Nov 13, 2017 at 11:25:21AM -0800, Mike Kravetz wrote:
> On 11/13/2017 11:10 AM, Johannes Weiner wrote:
> > On Mon, Nov 13, 2017 at 06:45:01PM +0000, Roman Gushchin wrote:
> >> Or, at least, some total counter, e.g. how much memory is consumed
> >> by hugetlb pa
p:0
Hugepagesize: 2048 kB
Hugetlb: 4194304 kB
DirectMap4k: 32632 kB
DirectMap2M: 4161536 kB
DirectMap1G: 6291456 kB
Signed-off-by: Roman Gushchin
Cc: Andrew Morton
Cc: Michal Hocko
Cc: Johannes Weiner
Cc: Mike Kravetz
Cc: "Aneesh Kumar K.V"
slab_unreclaimable 454656
hugetlb 1073741824
pgfault 4580
pgmajfault 13
...
Signed-off-by: Roman Gushchin
Cc: Johannes Weiner
Cc: Michal Hocko
Cc: Vladimir Davydov
Cc: Andrew Morton
Cc: Tejun Heo
Cc: Mike Kravetz
Cc: Dave Hansen
Cc: kernel-t...@fb.com
Cc: cgro...@vger.kernel.org
Cc: lin
's export the list of such features
using /sys/kernel/cgroup/features pseudo-file.
The list is hardcoded and has to be extended when new functionality
is added. Each feature is printed on a new line.
Example:
$ cat /sys/kernel/cgroup/features
nsdelegate
Signed-off-by: Roman Gushchin
Cc: Tej
's export the list
via /sys/kernel/cgroup/delegates pseudo-file.
Format is simple: each control file name is printed on a new line.
Example:
$ cat /sys/kernel/cgroup/delegates
cgroup.procs
cgroup.subtree_control
Signed-off-by: Roman Gushchin
Cc: Tejun Heo
Cc: kernel-t...@fb.com
---
kernel/c
The purpose of this move is to use these files in bpf tests.
Signed-off-by: Roman Gushchin
Acked-by: Alexei Starovoitov
Acked-by: Tejun Heo
Cc: Daniel Borkmann
---
samples/bpf/Makefile | 5 +++--
tools/testing/selftests/bpf/Makefile
Rename device type and access type constants defined in
security/device_cgroup.c by adding the DEVCG_ prefix.
The reason behind this renaming is to make them global namespace
friendly, as they will be moved to the corresponding header file
by following patches.
Signed-off-by: Roman Gushchin
Cc
/zero (should fail)
Signed-off-by: Roman Gushchin
Acked-by: Alexei Starovoitov
Acked-by: Tejun Heo
Cc: Daniel Borkmann
---
tools/testing/selftests/bpf/Makefile | 4 +-
tools/testing/selftests/bpf/dev_cgroup.c | 60 +
tools/testing/selftests/bpf/test_dev_cgroup.c
) __devcgroup_check_permission() is exported.
3) devcgroup_check_permission() wrapper is introduced to be used
by both existing and new bpf-based implementations.
Signed-off-by: Roman Gushchin
Acked-by: Tejun Heo
Acked-by: Alexei Starovoitov
---
include/linux/device_cgroup.h | 61
/lkml/2017/11/1/363
Roman Gushchin (5):
device_cgroup: add DEVCG_ prefix to ACC_* and DEV_* constants
device_cgroup: prepare code for bpf-based device controller
bpf, cgroup: implement eBPF-based device controller for cgroup v2
bpf: move cgroup_helpers from samples/bpf/ to
tools/testing
BPF_PROG_TYPE_CGROUP_DEVICE program type.
A program takes major and minor device numbers, device type
(block/character) and access type (mknod/read/write) as parameters
and returns an integer which defines if the operation should be
allowed or terminated with -EPERM.
Signed-off-by: Roman Gushchin
's export the list of such features
using /sys/kernel/cgroup/features pseudo-file.
The list is hardcoded and has to be extended when new functionality
is added. Each feature is printed on a new line.
Example:
$ cat /sys/kernel/cgroup/features
nsdelegate
Signed-off-by: Roman Gushchin
Cc: Tej
's export the list
via /sys/kernel/cgroup/delegate pseudo-file.
Format is simple: each control file name is printed on a new line.
Example:
$ cat /sys/kernel/cgroup/delegate
cgroup.procs
cgroup.subtree_control
Signed-off-by: Roman Gushchin
Cc: Tejun Heo
Cc: kernel-t...@fb.com
---
kernel/c
Document the cgroup-aware OOM killer.
Signed-off-by: Roman Gushchin
Acked-by: Johannes Weiner
Cc: Michal Hocko
Cc: Vladimir Davydov
Cc: Tetsuo Handa
Cc: Andrew Morton
Cc: David Rientjes
Cc: Tejun Heo
Cc: kernel-t...@fb.com
Cc: cgro...@vger.kernel.org
Cc: linux-...@vger.kernel.org
Cc
rg
Cc: linux...@kvack.org
Roman Gushchin (6):
mm, oom: refactor the oom_kill_process() function
mm: implement mem_cgroup_scan_tasks() for the root memory cgroup
mm, oom: cgroup-aware OOM killer
mm, oom: introduce memory.oom_group
mm, oom: add cgroup v2 mount option for cgroup-aware OOM killer
with task selection (considering task's children),
so we can't use the existing oom_kill_process().
Signed-off-by: Roman Gushchin
Acked-by: Michal Hocko
Acked-by: Johannes Weiner
Acked-by: David Rientjes
Cc: Vladimir Davydov
Cc: Tetsuo Handa
Cc: David Rientjes
Cc: Andrew Morton
Cc: Tej
Add a "groupoom" cgroup v2 mount option to enable the cgroup-aware
OOM killer. If not set, the OOM selection is performed in
a "traditional" per-process way.
The behavior can be changed dynamically by remounting the cgroupfs.
Signed-off-by: Roman Gushchin
Cc: Michal Hocko
cgroup are iterated over.
This patch doesn't introduce any functional change as
mem_cgroup_scan_tasks() is never called for the root memcg.
This is preparatory work for the cgroup-aware OOM killer,
which will use this function to iterate over tasks belonging
to the root memcg.
Signed-off-by:
ned-off-by: Roman Gushchin
Acked-by: Michal Hocko
Acked-by: Johannes Weiner
Cc: Vladimir Davydov
Cc: Tetsuo Handa
Cc: David Rientjes
Cc: Andrew Morton
Cc: Tejun Heo
Cc: kernel-t...@fb.com
Cc: cgro...@vger.kernel.org
Cc: linux-...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux...
established way to protect a particular process
from seeing an unexpected SIGKILL from the OOM killer. Ignoring this
user defined configuration might lead to data corruptions or other
misbehavior.
The default value is 0.
Signed-off-by: Roman Gushchin
Acked-by: Michal Hocko
Acked-by: Johannes Weiner
Cc
On Thu, Oct 26, 2017 at 02:03:41PM -0700, David Rientjes wrote:
> On Thu, 26 Oct 2017, Johannes Weiner wrote:
>
> > > The nack is for three reasons:
> > >
> > > (1) unfair comparison of root mem cgroup usage to bias against that mem
> > > cgroup from oom kill in system oom conditions,
> >