Re: [PATCHSET v3] netfilter, cgroup: implement cgroup2 path match in xt_cgroup
On Mon, Nov 23, 2015 at 03:45:23PM -0500, David Miller wrote:
> > * Refreshed on top of Nina's net_cls dynamic config update fix patch.
> >   I included the fix patch as part of this series to ease reviewing.
>
> I put this into the 'net' tree as it's a bug fix, so can you respin
> this after I next merge 'net' into 'net-next'? I'll let you know.

Sure thing.

Thanks.

--
tejun
--
To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHSET v3] netfilter, cgroup: implement cgroup2 path match in xt_cgroup
From: Tejun Heo
Date: Sat, 21 Nov 2015 11:13:52 -0500

> * Refreshed on top of Nina's net_cls dynamic config update fix patch.
>   I included the fix patch as part of this series to ease reviewing.

I put this into the 'net' tree as it's a bug fix, so can you respin this after I next merge 'net' into 'net-next'? I'll let you know.

There'll probably be at least some minor feedback meanwhile anyways.

Thanks.
Re: [PATCHSET v3] netfilter, cgroup: implement cgroup2 path match in xt_cgroup
Hello,

On Mon, Nov 23, 2015 at 10:53:46AM -0500, Tejun Heo wrote:
> > [ 11.594536] [ cut here ]
> > [ 11.595274] WARNING: CPU: 1 PID: 1 at kernel/cgroup_pids.c:97 pids_cancel.constprop.6+0x31/0x40()
> > [ 11.595958] Modules linked in:
> > [ 11.596199] CPU: 1 PID: 1 Comm: systemd Not tainted 4.4.0-rc1+ #196
> > [ 11.596689] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.2-0-g33fbe13 by qemu-project.org 04/01/2014
> > [ 11.597632] 81f66d8b 88007c04bb90 8155ccdc
> > [ 11.598234] 88007c04bbc8 810de202 8800793dda00 88007a096800
> > [ 11.598877] 88007c04bc80 88007a6b6200 0001 88007c04bbd8
> > [ 11.599547] Call Trace:
> > [ 11.599784] [] dump_stack+0x4e/0x82
> > [ 11.600197] [] warn_slowpath_common+0x82/0xc0
> > [ 11.600705] [] warn_slowpath_null+0x1a/0x20
> > [ 11.601208] [] pids_cancel.constprop.6+0x31/0x40
> > [ 11.601764] [] pids_can_attach+0x6d/0xf0
>
> Yeah, this is a known problem regarding css's lifetime. Working on
> it. The earlier dump, I think, is likely to have been caused by the
> same issue.

Just posted the fix for this issue. Can you please verify the fix?

 http://lkml.kernel.org/g/20151123195541.ga19...@mtj.duckdns.org

Thanks a lot!

--
tejun
Re: [PATCHSET v3] netfilter, cgroup: implement cgroup2 path match in xt_cgroup
On 11/23/2015 04:53 PM, Tejun Heo wrote:
> On Mon, Nov 23, 2015 at 09:54:32AM +0100, Daniel Wagner wrote:
> ...
>>> [3.224665] BUG: spinlock bad magic on CPU#1, systemd/1
>>> [3.225653] lock: cgroup_sk_update_lock+0x0/0x60, .magic: , .owner: systemd/1, .owner_cpu: 1
>>> [3.227034] CPU: 1 PID: 1 Comm: systemd Not tainted 4.4.0-rc1+ #195
>>> [3.227862] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.2-0-g33fbe13 by qemu-project.org 04/01/2014
>>> [3.228906] 834a2160 88007c043ad0 81551edc 88007c028000
>>> [3.229512] 88007c043af0 81136868 834a2160 88007aff5940
>>> [3.230105] 88007c043b08 81136b05 834a2160 88007c043b20
>>> [3.230716] Call Trace:
>>> [3.230906] [] dump_stack+0x4e/0x82
>>> [3.231289] [] spin_dump+0x78/0xc0
>>> [3.231642] [] do_raw_spin_unlock+0x75/0xd0
>>> [3.232039] [] _raw_spin_unlock+0x27/0x50
>>> [3.232431] [] update_classid_sock+0x68/0x80
>>> [3.232836] [] iterate_fd+0x71/0x150
>>> [3.233197] [] update_classid+0x47/0x80
>>> [3.233571] [] cgrp_attach+0x14/0x20
>>> [3.233929] [] cgroup_taskset_migrate+0x1e1/0x330
>>> [3.234366] [] cgroup_migrate+0xf5/0x190
>>> [3.235130] [] cgroup_attach_task+0x176/0x200
>>> [3.235953] [] __cgroup_procs_write+0x2ad/0x460
>>> [3.236805] [] cgroup_procs_write+0x14/0x20
>>> [3.237205] [] cgroup_file_write+0x35/0x1c0
>>> [3.237600] [] kernfs_fop_write+0x141/0x190
>>> [3.237998] [] __vfs_write+0x28/0xe0
>>> [3.239554] [] vfs_write+0xac/0x1a0
>>> [3.240308] [] SyS_write+0x49/0xb0
>>> [3.240656] [] entry_SYSCALL_64_fastpath+0x12/0x76
>>
>> I have enabled a few additional cgroup controllers as well, because I was
>> trying to figure out why I only see the 'memory' cgroup controller in
>> cgroup.controllers. pid and io show up but not net_prio or net_cls.
>> Not sure why systemd (v227) is not mounting them.
>
> net_prio and net_cls aren't gonna be on the v2 hierarchy. The match
> in this patchset is being introduced to replace them; however, you can
> mount them separately on a v1 hierarchy and use the same as before.

Okay, I could have figured that myself I guess. I mounted the v1 hierarchy and it works as you have described it.
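Tejun's point above is that net_cls and net_prio stay on separately mounted v1 hierarchies, while the v2 (unified) hierarchy is a different entry. One way to see where a task sits in each is /proc/<pid>/cgroup, whose lines have the form hierarchy-ID:controller-list:cgroup-path, with the v2 entry using ID 0 and an empty controller list (see cgroups(7)). The C sketch below parses one such line; the names (struct cgroup_entry, parse_cgroup_entry(), is_unified_entry(), has_controller()) and the fixed buffer sizes are hypothetical choices for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* One parsed line of /proc/<pid>/cgroup:
 *   hierarchy-ID:controller-list:cgroup-path
 * Buffer sizes are arbitrary limits chosen for this sketch. */
struct cgroup_entry {
	long hier_id;
	char controllers[128];
	char path[256];
};

/* Split at the first two colons (the path itself may contain ':').
 * Returns false for malformed lines or fields that do not fit. */
static bool parse_cgroup_entry(const char *line, struct cgroup_entry *out)
{
	assert(line != NULL && out != NULL);

	const char *c1 = strchr(line, ':');
	if (!c1)
		return false;
	const char *c2 = strchr(c1 + 1, ':');
	if (!c2)
		return false;

	char *end;
	out->hier_id = strtol(line, &end, 10);
	if (end == line || end != c1)
		return false;

	size_t clen = (size_t)(c2 - (c1 + 1));
	size_t plen = strcspn(c2 + 1, "\n");	/* drop a trailing newline */
	if (clen >= sizeof(out->controllers) || plen >= sizeof(out->path))
		return false;

	memcpy(out->controllers, c1 + 1, clen);
	out->controllers[clen] = '\0';
	memcpy(out->path, c2 + 1, plen);
	out->path[plen] = '\0';
	return true;
}

/* The cgroup v2 entry has hierarchy ID 0 and an empty controller list. */
static bool is_unified_entry(const struct cgroup_entry *e)
{
	return e->hier_id == 0 && e->controllers[0] == '\0';
}

/* Exact match of @name against the comma-separated controller list. */
static bool has_controller(const struct cgroup_entry *e, const char *name)
{
	size_t n = strlen(name);
	const char *p = e->controllers;

	while (*p) {
		size_t len = strcspn(p, ",");
		if (len == n && strncmp(p, name, n) == 0)
			return true;
		p += len;
		if (*p == ',')
			p++;
	}
	return false;
}
```

A caller would read /proc/self/cgroup line by line with fgets() and pass each line to parse_cgroup_entry(); an entry for which has_controller(&e, "net_cls") is true identifies the v1 hierarchy net_cls is mounted on, if there is one.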
Re: [PATCHSET v3] netfilter, cgroup: implement cgroup2 path match in xt_cgroup
On Mon, Nov 23, 2015 at 09:54:32AM +0100, Daniel Wagner wrote:
...
> > [3.224665] BUG: spinlock bad magic on CPU#1, systemd/1
> > [3.225653] lock: cgroup_sk_update_lock+0x0/0x60, .magic: , .owner: systemd/1, .owner_cpu: 1
> > [3.227034] CPU: 1 PID: 1 Comm: systemd Not tainted 4.4.0-rc1+ #195
> > [3.227862] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.2-0-g33fbe13 by qemu-project.org 04/01/2014
> > [3.228906] 834a2160 88007c043ad0 81551edc 88007c028000
> > [3.229512] 88007c043af0 81136868 834a2160 88007aff5940
> > [3.230105] 88007c043b08 81136b05 834a2160 88007c043b20
> > [3.230716] Call Trace:
> > [3.230906] [] dump_stack+0x4e/0x82
> > [3.231289] [] spin_dump+0x78/0xc0
> > [3.231642] [] do_raw_spin_unlock+0x75/0xd0
> > [3.232039] [] _raw_spin_unlock+0x27/0x50
> > [3.232431] [] update_classid_sock+0x68/0x80
> > [3.232836] [] iterate_fd+0x71/0x150
> > [3.233197] [] update_classid+0x47/0x80
> > [3.233571] [] cgrp_attach+0x14/0x20
> > [3.233929] [] cgroup_taskset_migrate+0x1e1/0x330
> > [3.234366] [] cgroup_migrate+0xf5/0x190
> > [3.235130] [] cgroup_attach_task+0x176/0x200
> > [3.235953] [] __cgroup_procs_write+0x2ad/0x460
> > [3.236805] [] cgroup_procs_write+0x14/0x20
> > [3.237205] [] cgroup_file_write+0x35/0x1c0
> > [3.237600] [] kernfs_fop_write+0x141/0x190
> > [3.237998] [] __vfs_write+0x28/0xe0
> > [3.239554] [] vfs_write+0xac/0x1a0
> > [3.240308] [] SyS_write+0x49/0xb0
> > [3.240656] [] entry_SYSCALL_64_fastpath+0x12/0x76
>
> I have enabled a few additional cgroup controllers as well, because I was
> trying to figure out why I only see the 'memory' cgroup controller in
> cgroup.controllers. pid and io show up but not net_prio or net_cls.
> Not sure why systemd (v227) is not mounting them.

net_prio and net_cls aren't gonna be on the v2 hierarchy. The match in this patchset is being introduced to replace them; however, you can mount them separately on a v1 hierarchy and use the same as before.

> Though, after a while a similar call trace is produced. I guess this
> has nothing to do with the current changes.
>
> [ 11.594536] [ cut here ]
> [ 11.595274] WARNING: CPU: 1 PID: 1 at kernel/cgroup_pids.c:97 pids_cancel.constprop.6+0x31/0x40()
> [ 11.595958] Modules linked in:
> [ 11.596199] CPU: 1 PID: 1 Comm: systemd Not tainted 4.4.0-rc1+ #196
> [ 11.596689] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.2-0-g33fbe13 by qemu-project.org 04/01/2014
> [ 11.597632] 81f66d8b 88007c04bb90 8155ccdc
> [ 11.598234] 88007c04bbc8 810de202 8800793dda00 88007a096800
> [ 11.598877] 88007c04bc80 88007a6b6200 0001 88007c04bbd8
> [ 11.599547] Call Trace:
> [ 11.599784] [] dump_stack+0x4e/0x82
> [ 11.600197] [] warn_slowpath_common+0x82/0xc0
> [ 11.600705] [] warn_slowpath_null+0x1a/0x20
> [ 11.601208] [] pids_cancel.constprop.6+0x31/0x40
> [ 11.601764] [] pids_can_attach+0x6d/0xf0

Yeah, this is a known problem regarding css's lifetime. Working on it. The earlier dump, I think, is likely to have been caused by the same issue.

Thanks.

--
tejun
Re: [PATCHSET v3] netfilter, cgroup: implement cgroup2 path match in xt_cgroup
On 11/23/2015 08:11 AM, Daniel Wagner wrote:
> [3.217648] systemd[1]: tmp.mount: Directory /tmp to mount over is not empty, mounting anyway.
> [3.224665] BUG: spinlock bad magic on CPU#1, systemd/1
> [3.225653] lock: cgroup_sk_update_lock+0x0/0x60, .magic: , .owner: systemd/1, .owner_cpu: 1
> [3.227034] CPU: 1 PID: 1 Comm: systemd Not tainted 4.4.0-rc1+ #195
> [3.227862] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.2-0-g33fbe13 by qemu-project.org 04/01/2014
> [3.228906] 834a2160 88007c043ad0 81551edc 88007c028000
> [3.229512] 88007c043af0 81136868 834a2160 88007aff5940
> [3.230105] 88007c043b08 81136b05 834a2160 88007c043b20
> [3.230716] Call Trace:
> [3.230906] [] dump_stack+0x4e/0x82
> [3.231289] [] spin_dump+0x78/0xc0
> [3.231642] [] do_raw_spin_unlock+0x75/0xd0
> [3.232039] [] _raw_spin_unlock+0x27/0x50
> [3.232431] [] update_classid_sock+0x68/0x80
> [3.232836] [] iterate_fd+0x71/0x150
> [3.233197] [] update_classid+0x47/0x80
> [3.233571] [] cgrp_attach+0x14/0x20
> [3.233929] [] cgroup_taskset_migrate+0x1e1/0x330
> [3.234366] [] cgroup_migrate+0xf5/0x190
> [3.234747] [] ? cgroup_migrate+0x5/0x190
> [3.235130] [] cgroup_attach_task+0x176/0x200
> [3.235543] [] ? cgroup_attach_task+0x5/0x200
> [3.235953] [] __cgroup_procs_write+0x2ad/0x460
> [3.236377] [] ? __cgroup_procs_write+0x5e/0x460
> [3.236805] [] cgroup_procs_write+0x14/0x20
> [3.237205] [] cgroup_file_write+0x35/0x1c0
> [3.237600] [] kernfs_fop_write+0x141/0x190
> [3.237998] [] __vfs_write+0x28/0xe0
> [3.238361] [] ? percpu_down_read+0x57/0xa0
> [3.238761] [] ? __sb_start_write+0xb4/0xf0
> [3.239154] [] ? __sb_start_write+0xb4/0xf0
> [3.239554] [] vfs_write+0xac/0x1a0
> [3.239930] [] ? __fget_light+0x66/0x90
> [3.240308] [] SyS_write+0x49/0xb0
> [3.240656] [] entry_SYSCALL_64_fastpath+0x12/0x76

I have enabled a few additional cgroup controllers as well, because I was trying to figure out why I only see the 'memory' cgroup controller in cgroup.controllers. pid and io show up but not net_prio or net_cls. Not sure why systemd (v227) is not mounting them.

Though, after a while a similar call trace is produced. I guess this has nothing to do with the current changes.

[ 11.594536] [ cut here ]
[ 11.595274] WARNING: CPU: 1 PID: 1 at kernel/cgroup_pids.c:97 pids_cancel.constprop.6+0x31/0x40()
[ 11.595958] Modules linked in:
[ 11.596199] CPU: 1 PID: 1 Comm: systemd Not tainted 4.4.0-rc1+ #196
[ 11.596689] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.2-0-g33fbe13 by qemu-project.org 04/01/2014
[ 11.597632] 81f66d8b 88007c04bb90 8155ccdc
[ 11.598234] 88007c04bbc8 810de202 8800793dda00 88007a096800
[ 11.598877] 88007c04bc80 88007a6b6200 0001 88007c04bbd8
[ 11.599547] Call Trace:
[ 11.599784] [] dump_stack+0x4e/0x82
[ 11.600197] [] warn_slowpath_common+0x82/0xc0
[ 11.600705] [] warn_slowpath_null+0x1a/0x20
[ 11.601208] [] pids_cancel.constprop.6+0x31/0x40
[ 11.601764] [] pids_can_attach+0x6d/0xf0
[ 11.602245] [] cgroup_taskset_migrate+0x6a/0x330
[ 11.602795] [] cgroup_migrate+0xf5/0x190
[ 11.603276] [] ? cgroup_migrate+0x5/0x190
[ 11.603788] [] cgroup_attach_task+0x176/0x200
[ 11.604308] [] ? cgroup_attach_task+0x5/0x200
[ 11.604831] [] __cgroup_procs_write+0x2ad/0x460
[ 11.605367] [] ? __cgroup_procs_write+0x5e/0x460
[ 11.605929] [] cgroup_procs_write+0x14/0x20
[ 11.606448] [] cgroup_file_write+0x35/0x1c0
[ 11.606931] [] kernfs_fop_write+0x141/0x190
[ 11.607401] [] __vfs_write+0x28/0xe0
[ 11.607834] [] ? percpu_down_read+0x57/0xa0
[ 11.608366] [] ? __sb_start_write+0xb4/0xf0
[ 11.608874] [] ? __sb_start_write+0xb4/0xf0
[ 11.609343] [] vfs_write+0xac/0x1a0
[ 11.609843] [] ? __fget_light+0x66/0x90
[ 11.610315] [] SyS_write+0x49/0xb0
[ 11.610756] [] entry_SYSCALL_64_fastpath+0x12/0x76
[ 11.611305] ---[ end trace 7f953d0ce5af99ea ]---
Re: [PATCHSET v3] netfilter, cgroup: implement cgroup2 path match in xt_cgroup
Hi Tejun,

On 11/21/2015 05:13 PM, Tejun Heo wrote:
> This is v3 of the xt_cgroup2 patchset. Changes from the last take are
>
> * Folded cgroup2 path matching into xt_cgroup as a new revision rather
>   than a separate xt_cgroup2 match as suggested by Pablo.
>
> * Refreshed on top of Nina's net_cls dynamic config update fix patch.
>   I included the fix patch as part of this series to ease reviewing.

I started to play with your patches and was greeted by this:

[3.217648] systemd[1]: tmp.mount: Directory /tmp to mount over is not empty, mounting anyway.
[3.224665] BUG: spinlock bad magic on CPU#1, systemd/1
[3.225653] lock: cgroup_sk_update_lock+0x0/0x60, .magic: , .owner: systemd/1, .owner_cpu: 1
[3.227034] CPU: 1 PID: 1 Comm: systemd Not tainted 4.4.0-rc1+ #195
[3.227862] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.2-0-g33fbe13 by qemu-project.org 04/01/2014
[3.228906] 834a2160 88007c043ad0 81551edc 88007c028000
[3.229512] 88007c043af0 81136868 834a2160 88007aff5940
[3.230105] 88007c043b08 81136b05 834a2160 88007c043b20
[3.230716] Call Trace:
[3.230906] [] dump_stack+0x4e/0x82
[3.231289] [] spin_dump+0x78/0xc0
[3.231642] [] do_raw_spin_unlock+0x75/0xd0
[3.232039] [] _raw_spin_unlock+0x27/0x50
[3.232431] [] update_classid_sock+0x68/0x80
[3.232836] [] iterate_fd+0x71/0x150
[3.233197] [] update_classid+0x47/0x80
[3.233571] [] cgrp_attach+0x14/0x20
[3.233929] [] cgroup_taskset_migrate+0x1e1/0x330
[3.234366] [] cgroup_migrate+0xf5/0x190
[3.234747] [] ? cgroup_migrate+0x5/0x190
[3.235130] [] cgroup_attach_task+0x176/0x200
[3.235543] [] ? cgroup_attach_task+0x5/0x200
[3.235953] [] __cgroup_procs_write+0x2ad/0x460
[3.236377] [] ? __cgroup_procs_write+0x5e/0x460
[3.236805] [] cgroup_procs_write+0x14/0x20
[3.237205] [] cgroup_file_write+0x35/0x1c0
[3.237600] [] kernfs_fop_write+0x141/0x190
[3.237998] [] __vfs_write+0x28/0xe0
[3.238361] [] ? percpu_down_read+0x57/0xa0
[3.238761] [] ? __sb_start_write+0xb4/0xf0
[3.239154] [] ? __sb_start_write+0xb4/0xf0
[3.239554] [] vfs_write+0xac/0x1a0
[3.239930] [] ? __fget_light+0x66/0x90
[3.240308] [] SyS_write+0x49/0xb0
[3.240656] [] entry_SYSCALL_64_fastpath+0x12/0x76

I am using a Fedora 23 host with systemd.unified_cgroup_hierarchy=1. The config is available here:

http://monom.org/cgroup/config-review-xt_cgroup2

Probably completely rubbish, because it's my random test config.

cheers,
daniel
Re: [PATCHSET v3] netfilter, cgroup: implement cgroup2 path match in xt_cgroup
Oops, made a copy & paste error on Neil Horman's address. Sorry, Neil.

The thread can be found at

 http://lkml.kernel.org/g/1448122441-9335-1-git-send-email...@kernel.org

Thanks.

--
tejun
[PATCHSET v3] netfilter, cgroup: implement cgroup2 path match in xt_cgroup
Hello,

This is v3 of the xt_cgroup2 patchset. Changes from the last take are

* Folded cgroup2 path matching into xt_cgroup as a new revision rather than a separate xt_cgroup2 match as suggested by Pablo.

* Refreshed on top of Nina's net_cls dynamic config update fix patch. I included the fix patch as part of this series to ease reviewing.

The changes from v1 to v2 are

* Instead of adding sock->sk_cgroup separately, sock->sk_cgrp_data now carries either a (prioidx, classid) pair or a cgroup2 pointer. This avoids inflating struct sock with yet another cgroup related field. Unfortunately, this does add some complexity, but that's the trade-off and the complexity is contained in cgroup proper.

* Various small updates as per David and Jan's reviews.

In cgroup v1, dealing with cgroup membership was difficult because the number of membership associations was unbounded. As a result, cgroup v1 grew several controllers whose primary purpose is either tagging membership or pulling in configuration knobs from other subsystems so that cgroup membership tests can be avoided. The net_cls and net_prio controllers are examples of the latter. They allow configuring network-specific attributes from the cgroup side so that the network subsystem can avoid testing cgroup membership; unfortunately, these are not only cumbersome but also problematic.

Neither net_cls nor net_prio is properly hierarchical. Both inherit configuration from the parent on creation, but there's no interaction afterwards. An ancestor doesn't restrict the behavior in its subtree in any way, and configuration changes aren't propagated downwards. Especially when combined with cgroup delegation, this is problematic because delegatees can mess up whatever network configuration is implemented at the system level. net_prio would allow the delegatees to set whatever priority value regardless of CAP_NET_ADMIN, and net_cls would do the same for classid.
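The v1-to-v2 note says sock->sk_cgrp_data now carries either a (prioidx, classid) pair or a cgroup2 pointer. Below is a minimal user-space sketch of one way two representations can share a single 64-bit word, assuming pointers are at least 2-byte aligned so that bit 0 can tag the "v1 data" case. The bit layout, the names (struct skcd, skcd_*(), struct fake_cgroup) and the 0/NULL fallbacks are assumptions for illustration, not the patch's actual definitions. In this layout prioidx gets only 16 bits, which is consistent with patch 0005 limiting the netprio css ID to USHRT_MAX.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct cgroup; only its address is stored. */
struct fake_cgroup {
	int id;
};

/*
 * Hypothetical shared per-socket field: one 64-bit word holding either
 *   - a cgroup pointer (bit 0 clear; pointers assumed 2-byte aligned), or
 *   - v1 data: bit 0 set, bits 16..31 = prioidx, bits 32..63 = classid.
 */
struct skcd {
	uint64_t val;
};

#define SKCD_IS_DATA 1ULL

static void skcd_set_data(struct skcd *d, uint16_t prioidx, uint32_t classid)
{
	d->val = SKCD_IS_DATA | ((uint64_t)prioidx << 16) |
		 ((uint64_t)classid << 32);
}

static void skcd_set_cgroup(struct skcd *d, struct fake_cgroup *cgrp)
{
	uintptr_t p = (uintptr_t)cgrp;

	assert((p & SKCD_IS_DATA) == 0);	/* tag bit must be free */
	d->val = (uint64_t)p;
}

static bool skcd_is_data(const struct skcd *d)
{
	return (d->val & SKCD_IS_DATA) != 0;
}

/* In this sketch, readers get 0 or NULL for the representation not stored. */
static uint16_t skcd_prioidx(const struct skcd *d)
{
	return skcd_is_data(d) ? (uint16_t)(d->val >> 16) : 0;
}

static uint32_t skcd_classid(const struct skcd *d)
{
	return skcd_is_data(d) ? (uint32_t)(d->val >> 32) : 0;
}

static struct fake_cgroup *skcd_cgroup(const struct skcd *d)
{
	return skcd_is_data(d) ? NULL : (struct fake_cgroup *)(uintptr_t)d->val;
}
```

Setting either representation overwrites the other, so in this sketch a socket carries v1 data or a v2 cgroup pointer, never both at once.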
While it is possible to solve these issues from the controller side by implementing hierarchical allowable ranges in both controllers, it would involve quite a bit of complexity in the controllers and further obfuscate network configuration, as it would become even more difficult to tell what's actually being configured when looking from the network side. While not much can be done for v1 at this point, as membership handling is sane on cgroup v2, it'd be better to make cgroup matching behave like other network matches and classifiers than to introduce further complications.

This patchset includes the following nine patches.

 0001-cgroup-record-ancestor-IDs-and-reimplement-cgroup_is.patch
 0002-kernfs-implement-kernfs_walk_and_get.patch
 0003-cgroup-implement-cgroup_get_from_path-and-expose-cgr.patch
 0004-cgroups-Allow-dynamically-changing-net_classid.patch
 0005-netprio_cgroup-limit-the-maximum-css-id-to-USHRT_MAX.patch
 0006-net-wrap-sock-sk_cgrp_prioidx-and-sk_classid-inside-.patch
 0007-sock-cgroup-add-sock-sk_cgroup.patch
 0008-netfilter-prepare-xt_cgroup-for-multi-revisions.patch
 0009-netfilter-implement-xt_cgroup-cgroup2-path-match.patch

0001-0003 are preparatory patches in kernfs and cgroup. These patches are available in the following branch, which will stay stable.

 git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git for-4.5-ancestor-test

0004 is the following net_cls config update fix patch, included in this series to ease reviewing as it causes a conflict with a later patch in this series.

 http://lkml.kernel.org/g/1448051499-1885574-1-git-send-email-nin...@fb.com

0005-0007 consolidate two cgroup related fields in struct sock into cgroup_sock_data and update it so that it can alternatively carry a cgroup pointer.

0008-0009 implement cgroup2 path matching in xt_cgroup.

This patchset is on top of v4.4-rc1 and also available in the following git branch.

 git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git review-xt_cgroup2

I'll post the iptables extension as a reply.
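Patch 0001 records ancestor IDs and reimplements cgroup_is_descendant(), and 0009 adds the cgroup2 path match to xt_cgroup. The following is a minimal sketch of how a per-cgroup array of ancestor IDs can make the descendant test constant-time, plus a match decision with an invert flag in the spirit of iptables' "!". All names here (struct sketch_cgroup, MAX_DEPTH, sketch_cgroup_init(), is_descendant(), path_match()) are hypothetical, IDs are assumed unique, and the kernel's actual structures and xt_cgroup match info differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_DEPTH 8	/* hypothetical fixed depth limit for the sketch */

/* Hypothetical cgroup node: the root is level 0; ancestor_ids[i] is the ID of
 * the ancestor at level i, and ancestor_ids[level] is the node's own ID. */
struct sketch_cgroup {
	int id;
	int level;
	int ancestor_ids[MAX_DEPTH];
};

/* Initialise @cg as a child of @parent, or as a root when @parent is NULL. */
static void sketch_cgroup_init(struct sketch_cgroup *cg, int id,
			       const struct sketch_cgroup *parent)
{
	cg->level = parent ? parent->level + 1 : 0;
	assert(cg->level < MAX_DEPTH);
	for (int i = 0; i < cg->level; i++)
		cg->ancestor_ids[i] = parent->ancestor_ids[i];
	cg->id = id;
	cg->ancestor_ids[cg->level] = id;
}

/* @cg is @ancestor or lies in its subtree iff @cg is at least as deep and its
 * recorded ancestor at @ancestor's level has @ancestor's ID. */
static bool is_descendant(const struct sketch_cgroup *cg,
			  const struct sketch_cgroup *ancestor)
{
	return cg->level >= ancestor->level &&
	       cg->ancestor_ids[ancestor->level] == ancestor->id;
}

/* Match a socket's cgroup against the target cgroup, optionally inverted. */
static bool path_match(const struct sketch_cgroup *sk_cgrp,
		       const struct sketch_cgroup *target, bool invert)
{
	return is_descendant(sk_cgrp, target) != invert;
}
```

The check is a single array lookup regardless of hierarchy depth, which is the property one would want from a test evaluated per packet.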
diffstat follows. Thanks.

 fs/kernfs/dir.c                          |  46 +++
 include/linux/cgroup-defs.h              | 126 +++
 include/linux/cgroup.h                   |  66 +++-
 include/linux/kernfs.h                   |  12 ++
 include/net/cls_cgroup.h                 |  11 +-
 include/net/netprio_cgroup.h             |  16 +++
 include/net/sock.h                       |  13 ---
 include/uapi/linux/netfilter/xt_cgroup.h |  15 +++
 kernel/cgroup.c                          | 126 ---
 net/Kconfig                              |   6 +
 net/core/dev.c                           |   3
 net/core/netclassid_cgroup.c             |  37 ++---
 net/core/netprio_cgroup.c                |  19
 net/core/scm.c                           |   4
 net/core/sock.c                          |  17
 net/netfilter/nft_meta.c                 |   2
 net/netfilter/xt_cgroup.c                | 108 ++
 17 files changed, 531 insertions(+