[PATCH] memcg: implement low limits

2013-02-27 Thread Roman Gushchin
ped exponentially. Low limits don't affect soft reclaim. Also, it's possible that a cgroup with memory usage under low limit will be reclaimed slowly on very low scanning priorities. Signed-off-by: Roman Gushchin --- include/linux/memcontrol.h |7 + include/linux/res_counter.h

Re: [PATCH] memcg: implement low limits

2013-02-27 Thread Roman Gushchin
lso, it may happen that my preferred cgroup is further above its soft limit than other cgroups (and that is hard to control), so it will be reclaimed more intensively than necessary. >>  Signed-off-by: Roman Gushchin >>  --- >>   include/linux/memcontrol.h  |    7

Re: [PATCH] memcg: implement low limits

2013-02-27 Thread Roman Gushchin
27.02.2013, 13:41, "Michal Hocko" : > Let me restate what I have already mentioned in the private > communication. > > We already have soft limit which can be implemented to achieve the > same/similar functionality and in fact this is a long term objective (at > least for me). I hope I will be able

Re: [PATCH] memcg: implement low limits

2013-02-27 Thread Roman Gushchin
Please find my comments below. > More comments on the code below. > > [...] > >>  diff --git a/mm/memcontrol.c b/mm/memcontrol.c >>  index 53b8201..d8e6ee6 100644 >>  --- a/mm/memcontrol.c >>  +++ b/mm/memcontrol.c >>  @@ -1743,6 +1743,53 @@ static void mem_cgroup_out_of_memory(struct >> mem_cgr

Re: [PATCH] memcg: implement low limits

2013-02-28 Thread Roman Gushchin
27.02.2013, 20:14, "Michal Hocko" : > On Wed 27-02-13 14:39:36, Roman Gushchin wrote: > >>  27.02.2013, 13:41, "Michal Hocko" : >>>  Let me restate what I have already mentioned in the private >>>  communication. >>> >>>  We alrea

Re: resend--[PATCH] improve read ahead in kernel

2012-12-20 Thread Roman Gushchin
Hi Simon, 20.12.2012, 10:21, "Simon Jeons" : > On Sun, 2012-12-16 at 02:15 +, Eric Wong wrote: > >>  xtu4 wrote: >>>  resend it, due to format error >>> >>>  Subject: [PATCH] when system in low memory scenario, imaging there is a mp3 >>>   play, ora video play, we need to read mp3 or video fi

[PATCH] slub: Avoid direct compaction if possible

2013-06-14 Thread Roman Gushchin
cations as soon as memory will be de-fragmented. Signed-off-by: Roman Gushchin --- include/linux/gfp.h | 4 +++- mm/page_alloc.c | 3 +++ mm/slub.c | 3 ++- 3 files changed, 8 insertions(+), 2 deletions(-) diff --git a/include/linux/gfp.h b/include/linux/gfp.h index 0f615eb..073a90a 10

Re: [PATCH] slub: Avoid direct compaction if possible

2013-06-14 Thread Roman Gushchin
On 14.06.2013 18:32, Christoph Lameter wrote: On Fri, 14 Jun 2013, Roman Gushchin wrote: Slub tries to allocate contiguous pages even if memory is fragmented and there are no free contiguous pages. In this case it calls direct compaction to allocate contiguous page. Compaction requires the

Re: [PATCH] slub: Avoid direct compaction if possible

2013-06-14 Thread Roman Gushchin
On 14.06.2013 20:08, Christoph Lameter wrote: On Fri, 14 Jun 2013, Roman Gushchin wrote: But there is an actual problem, that this patch solves. Sometimes I saw the following issue on some machines: all CPUs are performing compaction, system time is about 80%, system is completely unreliable

Re: [PATCH] slub: Avoid direct compaction if possible

2013-06-17 Thread Roman Gushchin
On 15.06.2013 00:26, David Rientjes wrote: On Fri, 14 Jun 2013, Christoph Lameter wrote: It's possible to avoid such problems (or at least to make them less probable) by avoiding direct compaction. If it's not possible to allocate a contiguous page without compaction, slub will fall back to ord

Re: [PATCH] slub: Avoid direct compaction if possible

2013-06-17 Thread Roman Gushchin
On 17.06.2013 18:27, Michal Hocko wrote: On Mon 17-06-13 16:34:23, Roman Gushchin wrote: On 15.06.2013 00:26, David Rientjes wrote: On Fri, 14 Jun 2013, Christoph Lameter wrote: It's possible to avoid such problems (or at least to make them less probable) by avoiding direct compactio

slub: slab order on multi-processor machines

2013-06-07 Thread Roman Gushchin
Hi! While investigating some compaction-related problems, I noticed, that many (even most) kernel objects are allocated on slabs with order 2 or 3. This behavior was introduced by commit 9b2cd506e "slub: Calculate min_objects based on number of processors." by Christoph Lameter. As I understan

Re: [PATCH v2] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro

2013-05-29 Thread Roman Gushchin
On 29.05.2013 09:08, Eric Dumazet wrote: On Tue, 2013-05-28 at 18:31 -0700, Paul E. McKenney wrote: On Tue, May 28, 2013 at 05:34:53PM -0700, Eric Dumazet wrote: On Tue, 2013-05-28 at 13:10 +0400, Roman Gushchin wrote: On 28.05.2013 04:12, Eric Dumazet wrote: About your earlier question, I

Re: [PATCH v2] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro

2013-05-30 Thread Roman Gushchin
On 29.05.2013 23:06, Eric Dumazet wrote: On Wed, 2013-05-29 at 14:09 +0400, Roman Gushchin wrote: True, these lookup functions are usually structured the same around the hlist_nulls_for_each_entry_rcu() loop. A barrier() right before the loop seems to be a benefit, the size of assembly code is

Re: [patch 09/10] mm: thrash detection-based file cache sizing

2013-06-07 Thread Roman Gushchin
On 30.05.2013 22:04, Johannes Weiner wrote: +/* + * Monotonic workingset clock for non-resident pages. + * + * The refault distance of a page is the number of ticks that occurred + * between that page's eviction and subsequent refault. + * + * Every page slot that is taken away from the inactive

Re: slub: slab order on multi-processor machines

2013-06-07 Thread Roman Gushchin
On 07.06.2013 18:12, Christoph Lameter wrote: On Fri, 7 Jun 2013, Roman Gushchin wrote: As I understand, the idea was to make kernel allocations cheaper by reducing the total number of page allocations (allocating 1 page with order 3 is cheaper than allocating 8 1-ordered pages). Its also

Re: [PATCH] slub: Avoid direct compaction if possible

2013-06-27 Thread Roman Gushchin
On 18.06.2013 01:44, David Rientjes wrote: On Mon, 17 Jun 2013, Roman Gushchin wrote: They certainly aren't enough, the kernel you're running suffers from a couple different memory compaction issues that were fixed in 3.7. I couldn't sympathize with your situation more, I

[PATCH] net: check net.core.somaxconn sysctl values

2013-07-31 Thread Roman Gushchin
xconn=-100 error: "Invalid argument" setting key "net.core.somaxconn" Signed-off-by: Roman Gushchin --- net/core/sysctl_net_core.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c index cfdb46a..2ff093b

FYI: Re: [PATCH] net: check net.core.somaxconn sysctl values

2013-07-31 Thread Roman Gushchin
Original Message Subject: Re: [PATCH] net: check net.core.somaxconn sysctl values Date: Wed, 31 Jul 2013 07:37:37 -0700 From: Eric Dumazet To: Roman Gushchin CC: David S. Miller , raise.s...@gmail.com, ebied...@xmission.com, net...@vger.kernel.org, linux-kernel

Re: [PATCH] net: check net.core.somaxconn sysctl values

2013-07-31 Thread Roman Gushchin
On 31.07.2013 18:37, Eric Dumazet wrote: On Wed, 2013-07-31 at 17:57 +0400, Roman Gushchin wrote: It's possible to assign an invalid value to the net.core.somaxconn sysctl variable, because there are no checks at all. The sk_max_ack_backlog field of the sock structure is defined as uns

Re: [PATCH] net: check net.core.somaxconn sysctl values

2013-08-01 Thread Roman Gushchin
On 01.08.2013 04:10, David Miller wrote: From: Roman Gushchin Date: Wed, 31 Jul 2013 17:57:35 +0400 --- net/core/sysctl_net_core.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c index cfdb46a..2ff093b 100644

[PATCH] net: check net.core.somaxconn sysctl values

2013-08-02 Thread Roman Gushchin
xconn=-100 error: "Invalid argument" setting key "net.core.somaxconn" Based on a prior patch from Changli Gao. Signed-off-by: Roman Gushchin Reported-by: Changli Gao Suggested-by: Eric Dumazet Acked-by: Eric Dumazet --- net/core/sysctl_net_core.c | 6 +- 1 file changed

Re: [PATCH] net: check net.core.somaxconn sysctl values

2013-08-03 Thread Roman Gushchin
On 03.08.2013 02:19, David Miller wrote: From: Roman Gushchin Date: Fri, 2 Aug 2013 18:36:40 +0400 It's possible to assign an invalid value to the net.core.somaxconn sysctl variable, because there are no checks at all. The sk_max_ack_backlog field of the sock structure is defined as uns
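
The snippets in this thread are cut off before the type of sk_max_ack_backlog is fully named, so the following is only a minimal user-space sketch of the class of problem being described: storing an unchecked int into a narrower unsigned field. The struct demo_sock, its 16-bit field, both function names, and the -1 error return are assumptions made for illustration and are not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a socket; the 16-bit unsigned field is an
 * assumption made for this sketch, not a quote from the kernel. */
struct demo_sock {
    uint16_t max_ack_backlog;
};

/* Unchecked store: the int is converted modulo 65536, so negative or
 * oversized values silently turn into unrelated backlog limits. */
static void set_backlog_unchecked(struct demo_sock *sk, int somaxconn)
{
    assert(sk != NULL);
    sk->max_ack_backlog = (uint16_t)somaxconn;
}

/* Checked store: refuse anything outside [0, 65535].
 * Returns 0 on success and -1 (hypothetical error code) on rejection. */
static int set_backlog_checked(struct demo_sock *sk, int somaxconn)
{
    assert(sk != NULL);
    if (somaxconn < 0 || somaxconn > UINT16_MAX)
        return -1;
    sk->max_ack_backlog = (uint16_t)somaxconn;
    return 0;
}
```

With the unchecked variant a request for -100 ends up stored as 65436, while the checked variant rejects it, which mirrors the "Invalid argument" behaviour quoted in the patch description.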

[PATCH] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro

2013-05-21 Thread Roman Gushchin
head->first value from the memory before each scan. Without additional hints, gcc caches this value in a register. In this case, if a cached node is moved to another chain during the scan, we can loop forever getting wrong nulls values and restarting the loop uninterruptedly. Signed-off-by: Ro
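
The point about rereading head->first on every restart can be sketched in user space as follows. This is not the kernel macro: READ_FRESH, the demo_* types, the low-bit end marker encoding, and the bounded restart count are all assumptions made for this illustration. The volatile cast only tells the compiler that each evaluation must load from memory instead of reusing a value held in a register.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct demo_node { struct demo_node *next; int key; };
struct demo_head { struct demo_node *first; };

/* Hypothetical helper: force a fresh load from memory on each use. */
#define READ_FRESH(x) (*(volatile __typeof__(x) *)&(x))

/* End-of-chain markers are odd "pointers" that carry the slot number. */
static int is_nulls(const struct demo_node *p) { return ((uintptr_t)p & 1) != 0; }
static unsigned long nulls_value(const struct demo_node *p) { return (unsigned long)((uintptr_t)p >> 1); }
static struct demo_node *make_nulls(unsigned long v) { return (struct demo_node *)(((uintptr_t)v << 1) | 1); }

/* Walk the chain; if the end marker belongs to another slot, the walk may
 * have drifted onto a different chain, so start over from the head. */
static struct demo_node *demo_lookup(struct demo_head *head, unsigned long slot, int key)
{
    struct demo_node *pos;
    int restarts = 0;

    assert(head != NULL);
begin:
    /* head->first is loaded again here on every pass through `begin`. */
    for (pos = READ_FRESH(head->first); !is_nulls(pos); pos = READ_FRESH(pos->next)) {
        if (pos->key == key)
            return pos;
    }
    /* The restart limit is only there to keep the sketch terminating. */
    if (nulls_value(pos) != slot && restarts++ < 8)
        goto begin;
    return NULL;
}
```

If the load of head->first were hoisted out of the restart path, every retry would begin from the same stale node, which is the endless loop the patch description warns about.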

Re: [ 072/102] ipv6: do not clear pinet6 field

2013-05-21 Thread Roman Gushchin
Hi, all! I think, it's good, but not enough. We still can't rely on the sk->sk_family field by dereferencing the inet_sk(sk)->pinet6 field, because we can set the sk_family field to the PF_INET6 value before setting pinet6 to an appropriate value (assuming it is NULL just because it was not a

Re: [PATCH] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro

2013-05-21 Thread Roman Gushchin
On 21.05.2013 14:40, David Laight wrote: Some network functions (udp4_lib_lookup2(), for instance) use the hlist_nulls_for_each_entry_rcu macro in a way that assumes restarting of a loop. In this case, it is strictly necessary to reread the head->first value from the memory before each scan. With

Re: [PATCH] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro

2013-05-21 Thread Roman Gushchin
On 21.05.2013 16:09, Paul E. McKenney wrote: On Tue, May 21, 2013 at 01:05:48PM +0400, Roman Gushchin wrote: Hi, all! This is a fix for a problem described here: https://lkml.org/lkml/2013/4/16/371 . --- Some network functions (udp4_lib_lookup2(), for instance) use the

Re: [PATCH] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro

2013-05-21 Thread Roman Gushchin
On 21.05.2013 17:44, Eric Dumazet wrote: On Tue, 2013-05-21 at 05:09 -0700, Paul E. McKenney wrote: -#define hlist_nulls_first_rcu(head) \ - (*((struct hlist_nulls_node __rcu __force **)&(head)->first)) +#define hlist_nulls_first_rcu(head)\ + (*((struct hlist_nu

Re: [PATCH] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro

2013-05-21 Thread Roman Gushchin
On 21.05.2013 19:16, Eric Dumazet wrote: On Tue, 2013-05-21 at 18:47 +0400, Roman Gushchin wrote: On 21.05.2013 17:44, Eric Dumazet wrote: On Tue, 2013-05-21 at 05:09 -0700, Paul E. McKenney wrote: -#define hlist_nulls_first_rcu(head) \ - (*((struct hlist_nulls_node __rcu __force

Re: [PATCH] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro

2013-05-21 Thread Roman Gushchin
On 21.05.2013 19:38, Eric Dumazet wrote: On Tue, 2013-05-21 at 18:47 +0400, Roman Gushchin wrote: This code has the same mistake: it is rcu_dereference_raw(head->first), so there is nothing that prevents gcc to store the (head->first) value in a register. If other rcu accessors have th

Re: [PATCH v2] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro

2013-05-21 Thread Roman Gushchin
ong nulls values and restarting the loop uninterruptedly. Signed-off-by: Roman Gushchin Reported-by: Boris Zhmurov --- include/linux/compiler.h | 6 ++ include/linux/rculist.h | 6 -- include/linux/rculist_nulls.h | 3 ++- 3 files changed, 12 insertions(+), 3 deletions(-) diff --

Re: [ 072/102] ipv6: do not clear pinet6 field

2013-05-22 Thread Roman Gushchin
On 22.05.2013 01:47, Eric Dumazet wrote: On Tue, 2013-05-21 at 15:44 +0400, Roman Gushchin wrote: Hi, all! I think, it's good, but not enough. We still can't rely on the sk->sk_family field by dereferencing the inet_sk(sk)->pinet6 field, because we can set the sk_family field

Re: [PATCH v2] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro

2013-05-22 Thread Roman Gushchin
the scan, we can loop forever getting wrong nulls values and restarting the loop uninterruptedly. Signed-off-by: Roman Gushchin Reported-by: Boris Zhmurov --- include/linux/compiler.h | 6 ++ include/linux/rculist.h | 9 + include/linux/rculist_nulls.h | 2 +- includ

Re: [PATCH v2] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro

2013-05-22 Thread Roman Gushchin
On 22.05.2013 16:30, Eric Dumazet wrote: On Wed, 2013-05-22 at 15:58 +0400, Roman Gushchin wrote: +/* + * Same as ACCESS_ONCE(), but used for accessing field of a structure. + * The main goal is preventing compiler to store &ptr->field in a register. But &ptr->field is a const

Re: [PATCH v2] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro

2013-05-22 Thread Roman Gushchin
On 22.05.2013 17:27, David Laight wrote: So yes, the patch appears to fix the bug, but it sounds not logical to me. I was confused because the copy of the code I found was different (it has some checks for reusaddr - which force a function call in the loop). The code being compiled is: begin:

Re: [PATCH v2] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro

2013-05-22 Thread Roman Gushchin
On 22.05.2013 21:45, Paul E. McKenney wrote: On Wed, May 22, 2013 at 05:07:07PM +0400, Roman Gushchin wrote: On 22.05.2013 16:30, Eric Dumazet wrote: On Wed, 2013-05-22 at 15:58 +0400, Roman Gushchin wrote: +/* + * Same as ACCESS_ONCE(), but used for accessing field of a structure. + * The

Re: [PATCH v2] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro

2013-05-27 Thread Roman Gushchin
On 25.05.2013 15:37, Paul E. McKenney wrote: 2) A problem occurs when restart_condition is true and we jump to the begin label. We do not recalculate (head + offsetof(head, first)) address, we just dereference again the OLD (head->first) pointer. So, we get a node, that WAS the first node in a

Re: [PATCH v2] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro

2013-05-27 Thread Roman Gushchin
Hi, Paul! On 25.05.2013 15:37, Paul E. McKenney wrote: Again, I believe that your retry logic needs to extend back into the calling function for your some_func() example above. And what do you think about the following approach (diff below)? It seems to me, it's enough clear (especially with

Re: [PATCH v2] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro

2013-05-28 Thread Roman Gushchin
On 28.05.2013 04:12, Eric Dumazet wrote: On Mon, 2013-05-27 at 21:55 +0400, Roman Gushchin wrote: Hi, Paul! On 25.05.2013 15:37, Paul E. McKenney wrote: Again, I believe that your retry logic needs to extend back into the calling function for your some_func() example above. And what do you

Re: [PATCH v2] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro

2013-05-29 Thread Roman Gushchin
On 29.05.2013 04:34, Eric Dumazet wrote: On Tue, 2013-05-28 at 13:10 +0400, Roman Gushchin wrote: On 28.05.2013 04:12, Eric Dumazet wrote: Adding a barrier() is probably what we want. I agree, inserting barrier() is also a correct and working fix. Yeah, but I can not find a clean way to

Re: [v5 4/4] mm, oom, docs: describe the cgroup-aware OOM killer

2017-08-16 Thread Roman Gushchin
On Tue, Aug 15, 2017 at 01:56:24PM -0700, David Rientjes wrote: > On Tue, 15 Aug 2017, Roman Gushchin wrote: > > > > > diff --git a/Documentation/cgroup-v2.txt b/Documentation/cgroup-v2.txt > > > > index dec5afdaa36d..22108f31e09d 100644 > > > > --- a

Re: [v5 2/4] mm, oom: cgroup-aware OOM killer

2017-08-16 Thread Roman Gushchin
On Tue, Aug 15, 2017 at 02:47:10PM -0700, David Rientjes wrote: > On Tue, 15 Aug 2017, Roman Gushchin wrote: > > > > I'm curious about the decision made in this conditional and how > > > oom_kill_memcg_member() ignores task->signal->oom_score_adj. It means

Re: [PATCH 3/3] cgroup: Implement cgroup2 basic CPU usage accounting

2017-08-17 Thread Roman Gushchin
Hi Tejun! On Fri, Aug 11, 2017 at 09:37:54AM -0700, Tejun Heo wrote: > In cgroup1, while cpuacct isn't actually controlling any resources, it > is a separate controller due to combinaton of two factors - s/combinaton/combination > @@ -4466,6 +4470,8 @@ static void css_free_work_fn(struct work_st

Re: [v5 4/4] mm, oom, docs: describe the cgroup-aware OOM killer

2017-08-17 Thread Roman Gushchin
Hi David! Please, find an updated version of docs patch below. Thanks! Roman -- >From 97805b3dcccb9420d2c4380e88e202164ead0e45 Mon Sep 17 00:00:00 2001 From: Roman Gushchin Date: Fri, 2 Jun 2017 11:29:14 +0100 Subject: [PATCH 4/4] mm, oom, docs: describe the cgroup-aware OOM killer Upd

Re: [PATCH v2] mm,oom: add tracepoints for oom reaper-related events

2017-06-20 Thread Roman Gushchin
Hi Andrew! Can you, please, pull this patch? Thank you! Roman On Fri, Jun 02, 2017 at 10:13:38AM +0200, Michal Hocko wrote: > On Thu 01-06-17 19:41:13, Roman Gushchin wrote: > > On Wed, May 31, 2017 at 06:39:29PM +0200, Michal Hocko wrote: > > > On Tue 30-05-17 19:52:31, Ro

[v3 2/6] mm, oom: cgroup-aware OOM killer

2017-06-21 Thread Roman Gushchin
oot cgroup are treated as independent memory consumers, and are compared with other memory consumers (e.g. leaf cgroups). The root cgroup doesn't support the oom_kill_all_tasks feature. Signed-off-by: Roman Gushchin Cc: Michal Hocko Cc: Vladimir Davydov Cc: Johannes Weiner Cc: Tetsuo Handa

[v3 4/6] mm, oom: introduce oom_score_adj for memory cgroups

2017-06-21 Thread Roman Gushchin
Introduce a per-memory-cgroup oom_score_adj setting. A read-write single value file which exists on non-root cgroups. The default is "0". It will have a similar meaning to a per-process value, available via /proc//oom_score_adj. Should be in a range [-1000, 1000]. Signed-off-by: Roma

[v3 5/6] mm, oom: don't mark all oom victims tasks with TIF_MEMDIE

2017-06-21 Thread Roman Gushchin
issue, as we have oom_mm pointer/tsk_is_oom_victim(), which are just better. Signed-off-by: Roman Gushchin Cc: Michal Hocko Cc: Vladimir Davydov Cc: Johannes Weiner Cc: Tejun Heo Cc: Tetsuo Handa Cc: kernel-t...@fb.com Cc: cgro...@vger.kernel.org Cc: linux-...@vger.kernel.org Cc: linux-kernel@

[v3 6/6] mm,oom,docs: describe the cgroup-aware OOM killer

2017-06-21 Thread Roman Gushchin
Update cgroups v2 docs. Signed-off-by: Roman Gushchin Cc: Michal Hocko Cc: Vladimir Davydov Cc: Johannes Weiner Cc: Tetsuo Handa Cc: David Rientjes Cc: Tejun Heo Cc: kernel-t...@fb.com Cc: cgro...@vger.kernel.org Cc: linux-...@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: linux

[v3 1/6] mm, oom: use oom_victims counter to synchronize oom victim selection

2017-06-21 Thread Roman Gushchin
oking for a new victim. Signed-off-by: Roman Gushchin Cc: Michal Hocko Cc: Vladimir Davydov Cc: Johannes Weiner Cc: Tejun Heo Cc: Tetsuo Handa Cc: kernel-t...@fb.com Cc: cgro...@vger.kernel.org Cc: linux-...@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: linux...@kvack.org --- mm

[v3 3/6] mm, oom: cgroup-aware OOM killer debug info

2017-06-21 Thread Roman Gushchin
] Cgroup /A2/B4: 272969 [ 18.830800] Cgroup /A2/B5: 52 [ 18.831890] Chosen cgroup /A2/B4: 272969 Signed-off-by: Roman Gushchin Cc: Tejun Heo Cc: Johannes Weiner Cc: Li Zefan Cc: Michal Hocko Cc: Vladimir Davydov Cc: Tetsuo Handa Cc: kernel-t...@fb.com Cc: cgro...@vger.kernel.org Cc

[PATCH] mm: bump PGSTEAL*/PGSCAN*/ALLOCSTALL counters in memcg reclaim

2017-05-29 Thread Roman Gushchin
nding global counters, what can be confusing. So, make PGSTEAL*/PGSCAN*/ALLOCSTALL counters reflect sum of any reclaim activity in the system. Signed-off-by: Roman Gushchin Cc: Balbir Singh Cc: Michal Hocko Cc: Johannes Weiner Cc: Vladimir Davydov Cc: kernel-t...@fb.com Cc: linux...@kvack.org Cc:

[PATCH] mm,oom: add tracepoints for oom reaper-related events

2017-05-30 Thread Roman Gushchin
aping. Signed-off-by: Roman Gushchin Cc: Michal Hocko Cc: Tetsuo Handa Cc: Johannes Weiner Cc: Vladimir Davydov Cc: kernel-t...@fb.com Cc: linux-kernel@vger.kernel.org Cc: linux...@kvack.org --- include/trace/events/oom.h | 80 ++ mm/oom_k

Re: [PATCH] mm: bump PGSTEAL*/PGSCAN*/ALLOCSTALL counters in memcg reclaim

2017-05-30 Thread Roman Gushchin
On Tue, May 30, 2017 at 02:24:36PM +0200, Michal Hocko wrote: > On Mon 29-05-17 14:01:41, Roman Gushchin wrote: > > Historically, PGSTEAL*/PGSCAN*/ALLOCSTALL counters were used to > > account only for global reclaim events, memory cgroup targeted reclaim > > was ignored. >

Re: [PATCH] mm,oom: add tracepoints for oom reaper-related events

2017-05-30 Thread Roman Gushchin
On Tue, May 30, 2017 at 02:34:16PM +0200, Michal Hocko wrote: > On Tue 30-05-17 13:05:32, Roman Gushchin wrote: > > Add tracepoints to simplify the debugging of the oom reaper code. > > > > Trace the following events: > > 1) a process is marked as an oom victim, >

Re: [PATCH v2] mm,oom: add tracepoints for oom reaper-related events

2017-05-30 Thread Roman Gushchin
>From c57e3674efc609f8364f5e228a2c1309cfe99901 Mon Sep 17 00:00:00 2001 From: Roman Gushchin Date: Tue, 23 May 2017 17:37:55 +0100 Subject: [PATCH v2] mm,oom: add tracepoints for oom reaper-related events During the debugging of the problem described in https://lkml.org/lkml/2017/5/17/542

[PATCH net-next 0/4] eBPF-based device cgroup controller

2017-11-01 Thread Roman Gushchin
(3) moves cgroup_helpers.c/h to use them by patch (4). Patch (4) implements an example of eBPF program which controls access to device files and corresponding userspace test. Roman Gushchin (4): device_cgroup: prepare code for bpf-based device controller bpf, cgroup: implement eBPF-based

[PATCH net-next 1/4] device_cgroup: prepare code for bpf-based device controller

2017-11-01 Thread Roman Gushchin
) __devcgroup_check_permission() is exported. 3) devcgroup_check_permission() wrapper is introduced to be used by both existing and new bpf-based implementations. Signed-off-by: Roman Gushchin Acked-by: Tejun Heo Acked-by: Alexei Starovoitov --- include/linux/device_cgroup.h | 61

[PATCH net-next 2/4] bpf, cgroup: implement eBPF-based device controller for cgroup v2

2017-11-01 Thread Roman Gushchin
BPF_PROG_TYPE_CGROUP_DEVICE program type. A program takes major and minor device numbers, device type (block/character) and access type (mknod/read/write) as parameters and returns an integer which defines if the operation should be allowed or terminated with -EPERM. Signed-off-by: Roman Gushchin
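
The decision shape described here (device numbers, device type and access type in, allow or -EPERM out) can be sketched as an ordinary C function. This is not a BPF program and does not use the kernel's context structure; every DEMO_/demo_ name, the request layout, and the example policy (allow read/write on character device 1:3 only) are assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical encodings for this sketch only. */
enum demo_dev_type { DEMO_DEV_BLOCK = 1, DEMO_DEV_CHAR = 2 };
#define DEMO_ACC_MKNOD (1u << 0)
#define DEMO_ACC_READ  (1u << 1)
#define DEMO_ACC_WRITE (1u << 2)

struct demo_dev_request {
    enum demo_dev_type type;  /* block or character device */
    uint32_t access;          /* bitmask of DEMO_ACC_* */
    uint32_t major;
    uint32_t minor;
};

/* Example policy: returns 1 to allow, 0 to deny. */
static int demo_dev_policy(const struct demo_dev_request *req)
{
    assert(req != NULL);
    if (req->type != DEMO_DEV_CHAR)
        return 0;
    if (req->major != 1 || req->minor != 3)
        return 0;
    if (req->access & DEMO_ACC_MKNOD)
        return 0;
    return 1;
}

/* Caller side: translate the policy verdict into 0 or -EPERM. */
static int demo_check_permission(const struct demo_dev_request *req)
{
    return demo_dev_policy(req) ? 0 : -EPERM;
}
```

Keeping the policy separate from the errno translation mirrors the description: the program itself only returns an integer verdict, and the caller turns a denial into -EPERM.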

[PATCH net-next 4/4] selftests/bpf: add a test for device cgroup controller

2017-11-01 Thread Roman Gushchin
/zero (should fail) Signed-off-by: Roman Gushchin Acked-by: Alexei Starovoitov Acked-by: Tejun Heo Cc: Daniel Borkmann --- tools/testing/selftests/bpf/Makefile | 4 +- tools/testing/selftests/bpf/dev_cgroup.c | 60 + tools/testing/selftests/bpf/test_dev_cgroup.c

[PATCH net-next 3/4] bpf: move cgroup_helpers from samples/bpf/ to tools/testing/selftesting/bpf/

2017-11-01 Thread Roman Gushchin
The purpose of this move is to use these files in bpf tests. Signed-off-by: Roman Gushchin Acked-by: Alexei Starovoitov Acked-by: Tejun Heo Cc: Daniel Borkmann --- samples/bpf/Makefile | 5 +++-- tools/testing/selftests/bpf/Makefile

[PATCH v2 net-next 2/5] device_cgroup: prepare code for bpf-based device controller

2017-11-02 Thread Roman Gushchin
) __devcgroup_check_permission() is exported. 3) devcgroup_check_permission() wrapper is introduced to be used by both existing and new bpf-based implementations. Signed-off-by: Roman Gushchin Acked-by: Tejun Heo Acked-by: Alexei Starovoitov --- include/linux/device_cgroup.h | 61

[PATCH v2 net-next 4/5] bpf: move cgroup_helpers from samples/bpf/ to tools/testing/selftesting/bpf/

2017-11-02 Thread Roman Gushchin
The purpose of this move is to use these files in bpf tests. Signed-off-by: Roman Gushchin Acked-by: Alexei Starovoitov Acked-by: Tejun Heo Cc: Daniel Borkmann --- samples/bpf/Makefile | 5 +++-- tools/testing/selftests/bpf/Makefile

[PATCH v2 net-next 1/5] device_cgroup: add DEVCG_ prefix to ACC_* and DEV_* constants

2017-11-02 Thread Roman Gushchin
Rename device type and access type constants defined in security/device_cgroup.c by adding the DEVCG_ prefix. The reason behind this renaming is to make them global namespace friendly, as they will be moved to the corresponding header file by following patches. Signed-off-by: Roman Gushchin Cc

[PATCH v2 net-next 5/5] selftests/bpf: add a test for device cgroup controller

2017-11-02 Thread Roman Gushchin
/zero (should fail) Signed-off-by: Roman Gushchin Acked-by: Alexei Starovoitov Acked-by: Tejun Heo Cc: Daniel Borkmann --- tools/testing/selftests/bpf/Makefile | 4 +- tools/testing/selftests/bpf/dev_cgroup.c | 60 + tools/testing/selftests/bpf/test_dev_cgroup.c

[PATCH v2 net-next 3/5] bpf, cgroup: implement eBPF-based device controller for cgroup v2

2017-11-02 Thread Roman Gushchin
BPF_PROG_TYPE_CGROUP_DEVICE program type. A program takes major and minor device numbers, device type (block/character) and access type (mknod/read/write) as parameters and returns an integer which defines if the operation should be allowed or terminated with -EPERM. Signed-off-by: Roman Gushchin

[PATCH v2 net-next 0/5] eBPF-based device cgroup controller

2017-11-02 Thread Roman Gushchin
infrastructure. Patch (4) moves cgroup_helpers.c/h so that patch (5) can use them. Patch (5) implements an example of eBPF program which controls access to device files and corresponding userspace test. v2: Added patch (1). v1: https://lkml.org/lkml/2017/11/1/363 Roman Gushchin (5): device_cgroup: add

Re: [PATCH v2 net-next 3/5] bpf, cgroup: implement eBPF-based device controller for cgroup v2

2017-11-02 Thread Roman Gushchin
On Thu, Nov 02, 2017 at 08:11:07AM -0700, Alexei Starovoitov wrote: > On 11/2/17 7:54 AM, Roman Gushchin wrote: > > +#define DEV_BPF_ACC_MKNOD (1ULL << 0) > > +#define DEV_BPF_ACC_READ (1ULL << 1) > > +#define DEV_BPF_ACC_WRITE (1ULL << 2) > >

[PATCH v3 net-next 0/5] eBPF-based device cgroup controller

2017-11-02 Thread Roman Gushchin
/lkml/2017/11/1/363 Roman Gushchin (5): device_cgroup: add DEVCG_ prefix to ACC_* and DEV_* constants device_cgroup: prepare code for bpf-based device controller bpf, cgroup: implement eBPF-based device controller for cgroup v2 bpf: move cgroup_helpers from samples/bpf/ to tools/testing

[PATCH v3 net-next 5/5] selftests/bpf: add a test for device cgroup controller

2017-11-02 Thread Roman Gushchin
/zero (should fail) Signed-off-by: Roman Gushchin Acked-by: Alexei Starovoitov Acked-by: Tejun Heo Cc: Daniel Borkmann --- tools/testing/selftests/bpf/Makefile | 4 +- tools/testing/selftests/bpf/dev_cgroup.c | 60 + tools/testing/selftests/bpf/test_dev_cgroup.c

[PATCH v3 net-next 1/5] device_cgroup: add DEVCG_ prefix to ACC_* and DEV_* constants

2017-11-02 Thread Roman Gushchin
Rename device type and access type constants defined in security/device_cgroup.c by adding the DEVCG_ prefix. The reason behind this renaming is to make them global namespace friendly, as they will be moved to the corresponding header file by following patches. Signed-off-by: Roman Gushchin Cc

[PATCH v3 net-next 2/5] device_cgroup: prepare code for bpf-based device controller

2017-11-02 Thread Roman Gushchin
) __devcgroup_check_permission() is exported. 3) devcgroup_check_permission() wrapper is introduced to be used by both existing and new bpf-based implementations. Signed-off-by: Roman Gushchin Acked-by: Tejun Heo Acked-by: Alexei Starovoitov --- include/linux/device_cgroup.h | 61

[PATCH v3 net-next 3/5] bpf, cgroup: implement eBPF-based device controller for cgroup v2

2017-11-02 Thread Roman Gushchin
BPF_PROG_TYPE_CGROUP_DEVICE program type. A program takes major and minor device numbers, device type (block/character) and access type (mknod/read/write) as parameters and returns an integer which defines if the operation should be allowed or terminated with -EPERM. Signed-off-by: Roman Gushchin

[PATCH v3 net-next 4/5] bpf: move cgroup_helpers from samples/bpf/ to tools/testing/selftesting/bpf/

2017-11-02 Thread Roman Gushchin
The purpose of this move is to use these files in bpf tests. Signed-off-by: Roman Gushchin Acked-by: Alexei Starovoitov Acked-by: Tejun Heo Cc: Daniel Borkmann --- samples/bpf/Makefile | 5 +++-- tools/testing/selftests/bpf/Makefile

Re: [PATCH v3 net-next 1/5] device_cgroup: add DEVCG_ prefix to ACC_* and DEV_* constants

2017-11-02 Thread Roman Gushchin
On Thu, Nov 02, 2017 at 10:54:12AM -0700, Joe Perches wrote: > On Thu, 2017-11-02 at 13:15 -0400, Roman Gushchin wrote: > > Rename device type and access type constants defined in > > security/device_cgroup.c by adding the DEVCG_ prefix. > > > > The reason behind th

[PATCH] mm: show stats for non-default hugepage sizes in /proc/meminfo

2017-11-13 Thread Roman Gushchin
G_Surp:0 Hugepagesize_1G:1048576 kB HugePages_Total: 100 HugePages_Free: 100 HugePages_Rsvd:0 HugePages_Surp:0 Hugepagesize: 2048 kB DirectMap4k: 30584 kB DirectMap2M: 3115008 kB DirectMap1G: 7340032 kB Signed-off-by: Roman Gushchin C

Re: [PATCH] mm: show stats for non-default hugepage sizes in /proc/meminfo

2017-11-13 Thread Roman Gushchin
On Mon, Nov 13, 2017 at 05:11:02PM +0100, Michal Hocko wrote: > On Mon 13-11-17 16:03:02, Roman Gushchin wrote: > > Currently we display some hugepage statistics (total, free, etc) > > in /proc/meminfo, but only for default hugepage size (e.g. 2Mb). > > > > If hugep

Re: [PATCH] mm: show stats for non-default hugepage sizes in /proc/meminfo

2017-11-13 Thread Roman Gushchin
On Mon, Nov 13, 2017 at 09:06:32AM -0800, Dave Hansen wrote: > On 11/13/2017 08:03 AM, Roman Gushchin wrote: > > To solve this problem, let's display stats for all hugepage sizes. > > To provide the backward compatibility let's save the existing format > > for the

Re: [PATCH] mm: show stats for non-default hugepage sizes in /proc/meminfo

2017-11-13 Thread Roman Gushchin
On Mon, Nov 13, 2017 at 10:30:10AM -0800, Mike Kravetz wrote: > On 11/13/2017 10:17 AM, Dave Hansen wrote: > > On 11/13/2017 10:11 AM, Roman Gushchin wrote: > >> On Mon, Nov 13, 2017 at 09:06:32AM -0800, Dave Hansen wrote: > >>> On 11/13/2017 08:03 AM, Roman Gushch

Re: [PATCH] mm: show stats for non-default hugepage sizes in /proc/meminfo

2017-11-14 Thread Roman Gushchin
On Mon, Nov 13, 2017 at 11:25:21AM -0800, Mike Kravetz wrote: > On 11/13/2017 11:10 AM, Johannes Weiner wrote: > > On Mon, Nov 13, 2017 at 06:45:01PM +0000, Roman Gushchin wrote: > >> Or, at least, some total counter, e.g. how much memory is consumed > >> by hugetlb pa

[PATCH] mm: show total hugetlb memory consumption in /proc/meminfo

2017-11-14 Thread Roman Gushchin
p:0 Hugepagesize: 2048 kB Hugetlb: 4194304 kB DirectMap4k: 32632 kB DirectMap2M: 4161536 kB DirectMap1G: 6291456 kB Signed-off-by: Roman Gushchin Cc: Andrew Morton Cc: Michal Hocko Cc: Johannes Weiner Cc: Mike Kravetz Cc: "Aneesh Kumar K.V"
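
The snippet shows only the resulting Hugetlb line, not how it is computed. One plausible reading, sketched below under that assumption, is that the total is the sum over all hugepage sizes of page count times page size. The demo_hstate structure and the function name are hypothetical, and the numbers in any usage are made up rather than taken from the output above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one hugepage pool. */
struct demo_hstate {
    uint64_t page_size_kb;  /* size of one huge page, in kB */
    uint64_t nr_pages;      /* pages currently in this pool */
};

/* Sum the memory held by every pool, in kB. */
static uint64_t demo_hugetlb_total_kb(const struct demo_hstate *pools, size_t n)
{
    uint64_t total = 0;

    assert(pools != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        total += pools[i].page_size_kb * pools[i].nr_pages;
    return total;
}
```

For example, a made-up configuration of 100 pages of 2048 kB plus one page of 1048576 kB adds up to 1253376 kB.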

[PATCH] memcg: hugetlbfs basic usage accounting

2017-11-14 Thread Roman Gushchin
slab_unreclaimable 454656 hugetlb 1073741824 pgfault 4580 pgmajfault 13 ... Signed-off-by: Roman Gushchin Cc: Johannes Weiner Cc: Michal Hocko Cc: Vladimir Davydov Cc: Andrew Morton Cc: Tejun Heo Cc: Mike Kravetz Cc: Dave Hansen Cc: kernel-t...@fb.com Cc: cgro...@vger.kernel.org Cc: lin

[PATCH 2/2] cgroup: export list of cgroups v2 features using sysfs

2017-11-03 Thread Roman Gushchin
's export the list of such features using /sys/kernel/cgroup/features pseudo-file. The list is hardcoded and has to be extended when new functionality is added. Each feature is printed on a new line. Example: $ cat /sys/kernel/cgroup/features nsdelegate Signed-off-by: Roman Gushchin Cc: Tej
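
Because the described format is one entry per line, a consumer can test for a feature with a whole-line match. The helper below is a minimal sketch that works on a buffer already read from such a file; the function name is hypothetical and reading the file itself is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if `name` appears as a complete line in the newline-separated
 * text `list`, 0 otherwise. Prefix matches do not count. */
static int list_has_entry(const char *list, const char *name)
{
    size_t n;
    const char *p;

    assert(list != NULL && name != NULL);
    n = strlen(name);
    p = list;
    while (*p) {
        const char *eol = strchr(p, '\n');
        size_t len = eol ? (size_t)(eol - p) : strlen(p);

        if (len == n && memcmp(p, name, n) == 0)
            return 1;
        if (!eol)
            break;
        p = eol + 1;
    }
    return 0;
}
```

Matching whole lines rather than substrings avoids treating a shorter name as present just because a longer entry starts with it.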

[PATCH 1/2] cgroup: export list of delegatable control files using sysfs

2017-11-03 Thread Roman Gushchin
's export the list via /sys/kernel/cgroup/delegates pseudo-file. Format is simple: each control file name is printed on a new line. Example: $ cat /sys/kernel/cgroup/delegates cgroup.procs cgroup.subtree_control Signed-off-by: Roman Gushchin Cc: Tejun Heo Cc: kernel-t...@fb.com --- kernel/c

[PATCH v3 net-next 4/5] bpf: move cgroup_helpers from samples/bpf/ to tools/testing/selftesting/bpf/

2017-11-05 Thread Roman Gushchin
The purpose of this move is to use these files in bpf tests. Signed-off-by: Roman Gushchin Acked-by: Alexei Starovoitov Acked-by: Tejun Heo Cc: Daniel Borkmann --- samples/bpf/Makefile | 5 +++-- tools/testing/selftests/bpf/Makefile

[PATCH v3 net-next 1/5] device_cgroup: add DEVCG_ prefix to ACC_* and DEV_* constants

2017-11-05 Thread Roman Gushchin
Rename device type and access type constants defined in security/device_cgroup.c by adding the DEVCG_ prefix. The reason behind this renaming is to make them global namespace friendly, as they will be moved to the corresponding header file by following patches. Signed-off-by: Roman Gushchin Cc

[PATCH v3 net-next 5/5] selftests/bpf: add a test for device cgroup controller

2017-11-05 Thread Roman Gushchin
/zero (should fail) Signed-off-by: Roman Gushchin Acked-by: Alexei Starovoitov Acked-by: Tejun Heo Cc: Daniel Borkmann --- tools/testing/selftests/bpf/Makefile | 4 +- tools/testing/selftests/bpf/dev_cgroup.c | 60 + tools/testing/selftests/bpf/test_dev_cgroup.c

[PATCH v3 net-next 2/5] device_cgroup: prepare code for bpf-based device controller

2017-11-05 Thread Roman Gushchin
) __devcgroup_check_permission() is exported. 3) devcgroup_check_permission() wrapper is introduced to be used by both existing and new bpf-based implementations. Signed-off-by: Roman Gushchin Acked-by: Tejun Heo Acked-by: Alexei Starovoitov --- include/linux/device_cgroup.h | 61

[PATCH v3 net-next 0/5] eBPF-based device cgroup controller

2017-11-05 Thread Roman Gushchin
/lkml/2017/11/1/363 Roman Gushchin (5): device_cgroup: add DEVCG_ prefix to ACC_* and DEV_* constants device_cgroup: prepare code for bpf-based device controller bpf, cgroup: implement eBPF-based device controller for cgroup v2 bpf: move cgroup_helpers from samples/bpf/ to tools/testing

[PATCH v3 net-next 3/5] bpf, cgroup: implement eBPF-based device controller for cgroup v2

2017-11-05 Thread Roman Gushchin
BPF_PROG_TYPE_CGROUP_DEVICE program type. A program takes major and minor device numbers, device type (block/character) and access type (mknod/read/write) as parameters and returns an integer which defines if the operation should be allowed or terminated with -EPERM. Signed-off-by: Roman Gushchin
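The decision described above can be modelled in a few lines. This is only an illustrative sketch in Python, not the BPF program from the patch: the constant names and the packing of the access type (device type in the low 16 bits, access mask in the high 16 bits) are assumptions based on the upstream uapi header and may differ in this v3 posting, and the allow-list is hypothetical.

```python
# Assumed values, mirroring names from the upstream uapi header.
BPF_DEVCG_ACC_MKNOD = 1 << 0
BPF_DEVCG_ACC_READ = 1 << 1
BPF_DEVCG_ACC_WRITE = 1 << 2

BPF_DEVCG_DEV_BLOCK = 1 << 0
BPF_DEVCG_DEV_CHAR = 1 << 1

# Hypothetical policy: only these character devices (major, minor) are usable.
ALLOWED_CHAR_DEVICES = {(1, 5), (1, 9)}  # /dev/zero, /dev/urandom


def decode_access_type(raw):
    """Split the packed field into (device type, access mask)."""
    return raw & 0xFFFF, raw >> 16


def device_allowed(raw_access_type, major, minor):
    """Return 1 to allow the operation, 0 to have it fail with -EPERM."""
    dev_type, access = decode_access_type(raw_access_type)
    if dev_type != BPF_DEVCG_DEV_CHAR:
        return 0  # block devices are always refused in this policy
    if access & BPF_DEVCG_ACC_MKNOD:
        return 0  # creating device nodes is refused in this policy
    return 1 if (major, minor) in ALLOWED_CHAR_DEVICES else 0


read_char = (BPF_DEVCG_ACC_READ << 16) | BPF_DEVCG_DEV_CHAR
print(device_allowed(read_char, 1, 9))  # read from /dev/urandom
print(device_allowed(read_char, 1, 3))  # read from /dev/null
```

A real program of this type would be written in restricted C, compiled to eBPF, and attached to a cgroup; the model only shows how the four inputs combine into an allow/deny result.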

[PATCH v2 2/2] cgroup: export list of cgroups v2 features using sysfs

2017-11-06 Thread Roman Gushchin
's export the list of such features using /sys/kernel/cgroup/features pseudo-file. The list is hardcoded and has to be extended when new functionality is added. Each feature is printed on a new line. Example: $ cat /sys/kernel/cgroup/features nsdelegate Signed-off-by: Roman Gushchin Cc: Tej

[PATCH v2 1/2] cgroup: export list of delegatable control files using sysfs

2017-11-06 Thread Roman Gushchin
's export the list via /sys/kernel/cgroup/delegate pseudo-file. Format is simple: each control file name is printed on a new line. Example: $ cat /sys/kernel/cgroup/delegate cgroup.procs cgroup.subtree_control Signed-off-by: Roman Gushchin Cc: Tejun Heo Cc: kernel-t...@fb.com --- kernel/c
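Because the format is one name per line, a consumer needs very little code. The sketch below assumes the path from this v2 posting and treats a missing file as an empty list, so it also runs on kernels without the pseudo-file; the helper name `read_cgroup_list` is hypothetical.

```python
from pathlib import Path


def read_cgroup_list(path):
    """Return the non-empty lines of a one-entry-per-line pseudo-file.

    A missing or unreadable file yields an empty list, which is how
    this sketch represents "the kernel does not export this list".
    """
    try:
        text = Path(path).read_text()
    except OSError:
        return []
    return [line.strip() for line in text.splitlines() if line.strip()]


delegatable = read_cgroup_list("/sys/kernel/cgroup/delegate")
print(delegatable)
```

The same helper would work for the features list from the companion patch, since it uses the same line-oriented format.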

[RESEND v12 6/6] mm, oom, docs: describe the cgroup-aware OOM killer

2017-10-19 Thread Roman Gushchin
Document the cgroup-aware OOM killer. Signed-off-by: Roman Gushchin Acked-by: Johannes Weiner Cc: Michal Hocko Cc: Vladimir Davydov Cc: Tetsuo Handa Cc: Andrew Morton Cc: David Rientjes Cc: Tejun Heo Cc: kernel-t...@fb.com Cc: cgro...@vger.kernel.org Cc: linux-...@vger.kernel.org Cc

[RESEND v12 0/6] cgroup-aware OOM killer

2017-10-19 Thread Roman Gushchin
rg Cc: linux...@kvack.org Roman Gushchin (6): mm, oom: refactor the oom_kill_process() function mm: implement mem_cgroup_scan_tasks() for the root memory cgroup mm, oom: cgroup-aware OOM killer mm, oom: introduce memory.oom_group mm, oom: add cgroup v2 mount option for cgroup-aware OOM killer

[RESEND v12 1/6] mm, oom: refactor the oom_kill_process() function

2017-10-19 Thread Roman Gushchin
with task selection (considering task's children), so we can't use the existing oom_kill_process(). Signed-off-by: Roman Gushchin Acked-by: Michal Hocko Acked-by: Johannes Weiner Acked-by: David Rientjes Cc: Vladimir Davydov Cc: Tetsuo Handa Cc: David Rientjes Cc: Andrew Morton Cc: Tej

[RESEND v12 5/6] mm, oom: add cgroup v2 mount option for cgroup-aware OOM killer

2017-10-19 Thread Roman Gushchin
Add a "groupoom" cgroup v2 mount option to enable the cgroup-aware OOM killer. If not set, the OOM selection is performed in a "traditional" per-process way. The behavior can be changed dynamically by remounting the cgroupfs. Signed-off-by: Roman Gushchin Cc: Michal Hocko

[RESEND v12 2/6] mm: implement mem_cgroup_scan_tasks() for the root memory cgroup

2017-10-19 Thread Roman Gushchin
cgroup are iterated over. This patch doesn't introduce any functional change as mem_cgroup_scan_tasks() is never called for the root memcg. This is preparatory work for the cgroup-aware OOM killer, which will use this function to iterate over tasks belonging to the root memcg. Signed-off-by:

[RESEND v12 3/6] mm, oom: cgroup-aware OOM killer

2017-10-19 Thread Roman Gushchin
ned-off-by: Roman Gushchin Acked-by: Michal Hocko Acked-by: Johannes Weiner Cc: Vladimir Davydov Cc: Tetsuo Handa Cc: David Rientjes Cc: Andrew Morton Cc: Tejun Heo Cc: kernel-t...@fb.com Cc: cgro...@vger.kernel.org Cc: linux-...@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: linux...

[RESEND v12 4/6] mm, oom: introduce memory.oom_group

2017-10-19 Thread Roman Gushchin
established way to protect a particular process from seeing an unexpected SIGKILL from the OOM killer. Ignoring this user defined configuration might lead to data corruptions or other misbehavior. The default value is 0. Signed-off-by: Roman Gushchin Acked-by: Michal Hocko Acked-by: Johannes Weiner Cc

Re: [RESEND v12 0/6] cgroup-aware OOM killer

2017-10-27 Thread Roman Gushchin
On Thu, Oct 26, 2017 at 02:03:41PM -0700, David Rientjes wrote: > On Thu, 26 Oct 2017, Johannes Weiner wrote: > > > > The nack is for three reasons: > > > > > > (1) unfair comparison of root mem cgroup usage to bias against that mem > > > cgroup from oom kill in system oom conditions, > >
