Re: [PATCH] capabilities: require CAP_SETFCAP to map uid 0 (v3.2)

2021-04-19 Thread Giuseppe Scrivano
wer_first seems unused? >> + >> +if (new_map->nr_extents <= UID_GID_MAP_MAX_BASE_EXTENTS) >> + extent0 = &new_map->extent[idx]; >> +else >> +extent0 = &new_map->forward[idx]; >> +if (extent0->lower_first == 0) >> +break; >> + >> +extent0 = NULL; >> +} Tested-by: Giuseppe Scrivano Regards, Giuseppe

Re: [PATCH] kernel: automatically split user namespace extent

2021-04-02 Thread Giuseppe Scrivano
Hi Serge, "Serge E. Hallyn" writes: > On Wed, Dec 02, 2020 at 05:12:27PM +0100, Giuseppe Scrivano wrote: >> Hi Eric, >> >> ebied...@xmission.com (Eric W. Biederman) writes: >> >> > Nit: The tag should have been "userns:"

[PATCH v2] userns: automatically split user namespace extent

2020-12-03 Thread Giuseppe Scrivano
100 & [1] 1552 $ printf "0 0 100\n" | tee /proc/$!/uid_map 0 0 100 $ cat /proc/$!/uid_map 0 0 1 1 1 99 Signed-off-by: Giuseppe Scrivano --- v2: - move the split logic when the extent are mapped to the parent map to reduce l
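The transcript in this snippet writes a single extent (`0 0 100`) and reads back two (`0 0 1` and `1 1 99`). A minimal userspace sketch of that kind of split follows. It is not the code from the patch: `struct extent`, `split_extent()` and the parent map used in the usage note are hypothetical names and values chosen only for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* One line of an id map file: "first lower_first count". */
struct extent {
	uint32_t first;       /* first ID as seen inside the namespace */
	uint32_t lower_first; /* first ID it maps to in the parent namespace */
	uint32_t count;       /* number of consecutive IDs */
};

/*
 * Split `req` into pieces so that the lower range of every piece fits
 * inside a single extent of the parent map. Returns the number of pieces
 * written to `out`, or -1 if part of the range is not mapped in the
 * parent or `out` is too small. Illustrative only; overflow of the ID
 * space is not handled.
 */
int split_extent(const struct extent *parent, size_t nparent,
		 struct extent req, struct extent *out, size_t max_out)
{
	size_t n = 0;

	while (req.count > 0) {
		const struct extent *p = NULL;

		/* Find the parent extent that contains the current lower ID. */
		for (size_t i = 0; i < nparent; i++) {
			if (req.lower_first >= parent[i].first &&
			    req.lower_first - parent[i].first < parent[i].count) {
				p = &parent[i];
				break;
			}
		}
		if (!p || n == max_out)
			return -1;

		/* Take as many IDs as still fit in this parent extent. */
		uint32_t room = p->count - (req.lower_first - p->first);
		uint32_t len = req.count < room ? req.count : room;

		out[n++] = (struct extent){ req.first, req.lower_first, len };
		req.first += len;
		req.lower_first += len;
		req.count -= len;
	}
	return (int)n;
}
```

Assuming a parent map of `0 1000 1` and `1 100000 65536` (made-up values, typical of a rootless setup), a request of `0 0 100` comes back as the two pieces `0 0 1` and `1 1 99`, which matches the output shown in the transcript.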

Re: [PATCH] kernel: automatically split user namespace extent

2020-12-02 Thread Giuseppe Scrivano
Hi Eric, ebied...@xmission.com (Eric W. Biederman) writes: > Nit: The tag should have been "userns:" rather than kernel. > > Giuseppe Scrivano writes: > >> writing to the id map fails when an extent overlaps multiple mappings >> in the parent user namespace, e.

[PATCH] kernel: automatically split user namespace extent

2020-11-26 Thread Giuseppe Scrivano
100 & [1] 1552 $ printf "0 0 100\n" | tee /proc/$!/uid_map 0 0 100 $ cat /proc/$!/uid_map 0 0 1 1 1 99 Signed-off-by: Giuseppe Scrivano --- kernel/user_namespace.c | 62 ++--- 1 file changed, 52

[PATCH v3 0/2] fs, close_range: add flag CLOSE_RANGE_CLOEXEC

2020-11-18 Thread Giuseppe Scrivano
() to set the close-on-exec bits in the bitmap. - add test with rlimit(RLIMIT_NOFILE) in place. - use "cur_max" that is already used by close_range(..., 0). v1: https://lkml.kernel.org/lkml/20201013140609.2269319-1-gscri...@redhat.com/ Giuseppe Scrivano (2): fs, close_range

[PATCH v3 2/2] selftests: core: add tests for CLOSE_RANGE_CLOEXEC

2020-11-18 Thread Giuseppe Scrivano
doesn't affect the result. Signed-off-by: Giuseppe Scrivano --- .../testing/selftests/core/close_range_test.c | 74 +++ 1 file changed, 74 insertions(+) diff --git a/tools/testing/selftests/core/close_range_test.c b/tools/testing/selftests/core/close_range_test.c index

[PATCH v3 1/2] fs, close_range: add flag CLOSE_RANGE_CLOEXEC

2020-11-18 Thread Giuseppe Scrivano
d multiple times to account for the different ranges that must be closed. Using close_range(..., ..., CLOSE_RANGE_CLOEXEC) solves these issues. The runtime is able to use the existing open fds, the seccomp profile can block close_range() and the syscalls used for its fallback. Signed-off-by: Giusepp

Re: [PATCH v2 0/2] fs, close_range: add flag CLOSE_RANGE_CLOEXEC

2020-10-29 Thread Giuseppe Scrivano
Hi Christian, Christian Brauner writes: > On Mon, Oct 19, 2020 at 12:26:52PM +0200, Giuseppe Scrivano wrote: >> When the new flag is used, close_range will set the close-on-exec bit >> for the file descriptors instead of close()-ing them. >> >> It is useful

Re: LPC 2020 Hackroom Session: summary and next steps for isolated user namespaces

2020-10-19 Thread Giuseppe Scrivano
"Serge E. Hallyn" writes: > On Tue, Oct 13, 2020 at 05:17:36PM +0200, Giuseppe Scrivano wrote: >> "Serge E. Hallyn" writes: >> >> > On Mon, Oct 12, 2020 at 07:05:10PM +0200, Giuseppe Scrivano wrote: >> >> Josh Triplett writes: >>

[PATCH v2 1/2] fs, close_range: add flag CLOSE_RANGE_CLOEXEC

2020-10-19 Thread Giuseppe Scrivano
d multiple times to account for the different ranges that must be closed. Using close_range(..., ..., CLOSE_RANGE_CLOEXEC) solves these issues. The runtime is able to use the open fds and the seccomp profile could block close_range() and the syscalls used for its fallback. Signed-off-by: Giusepp

[PATCH v2 2/2] selftests: add tests for CLOSE_RANGE_CLOEXEC

2020-10-19 Thread Giuseppe Scrivano
Signed-off-by: Giuseppe Scrivano --- .../testing/selftests/core/close_range_test.c | 74 +++ 1 file changed, 74 insertions(+) diff --git a/tools/testing/selftests/core/close_range_test.c b/tools/testing/selftests/core/close_range_test.c index c99b98b0d461..c9db282158bb 100644

[PATCH v2 0/2] fs, close_range: add flag CLOSE_RANGE_CLOEXEC

2020-10-19 Thread Giuseppe Scrivano
_range(..., 0). Giuseppe Scrivano (2): fs, close_range: add flag CLOSE_RANGE_CLOEXEC selftests: add tests for CLOSE_RANGE_CLOEXEC fs/file.c | 44 --- include/uapi/linux/close_range.h | 3 + .../testing/selftests/core/close_rang

Re: [PATCH 1/2] fs, close_range: add flag CLOSE_RANGE_CLOEXEC

2020-10-13 Thread Giuseppe Scrivano
Christian Brauner writes: > On Tue, Oct 13, 2020 at 04:06:08PM +0200, Giuseppe Scrivano wrote: > > Hey Guiseppe, > > Thanks for the patch! > >> When the flag CLOSE_RANGE_CLOEXEC is set, close_range doesn't >> immediately close the files but it sets the close-on-e

Re: LPC 2020 Hackroom Session: summary and next steps for isolated user namespaces

2020-10-13 Thread Giuseppe Scrivano
"Serge E. Hallyn" writes: > On Mon, Oct 12, 2020 at 07:05:10PM +0200, Giuseppe Scrivano wrote: >> Josh Triplett writes: >> >> > On Fri, Oct 09, 2020 at 11:26:06PM -0500, Serge E. Hallyn wrote: >> >> > 3. Find a way to allow setgroups() in

[PATCH 2/2] selftests: add tests for CLOSE_RANGE_CLOEXEC

2020-10-13 Thread Giuseppe Scrivano
Signed-off-by: Giuseppe Scrivano --- .../testing/selftests/core/close_range_test.c | 44 +++ 1 file changed, 44 insertions(+) diff --git a/tools/testing/selftests/core/close_range_test.c b/tools/testing/selftests/core/close_range_test.c index c99b98b0d461..b8789262cd7d 100644

[PATCH 1/2] fs, close_range: add flag CLOSE_RANGE_CLOEXEC

2020-10-13 Thread Giuseppe Scrivano
When the flag CLOSE_RANGE_CLOEXEC is set, close_range doesn't immediately close the files but it sets the close-on-exec bit. Signed-off-by: Giuseppe Scrivano --- fs/file.c| 56 ++-- include/uapi/linux/close_range.h | 3 ++ 2 files changed, 42
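A small userspace sketch of how a caller might use the flag described here. It goes through `syscall()` directly in case the C library has no wrapper. The helper name `cloexec_range()` is hypothetical, and the fallback definitions (syscall number 436, flag value `1U << 2`) are the values used by the mainline UAPI headers on most architectures. Treat them as assumptions if your headers differ.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <fcntl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Fallbacks for older headers (assumed values, see the note above). */
#ifndef __NR_close_range
#define __NR_close_range 436
#endif
#ifndef CLOSE_RANGE_CLOEXEC
#define CLOSE_RANGE_CLOEXEC (1U << 2)
#endif

/*
 * Set the close-on-exec bit on every open fd in [first, last] without
 * closing any of them. Returns 0 on success, -1 with errno set on
 * failure (for example on kernels that do not know the flag).
 */
int cloexec_range(unsigned int first, unsigned int last)
{
	return (int)syscall(__NR_close_range, first, last, CLOSE_RANGE_CLOEXEC);
}
```

A container runtime could call `cloexec_range(3, ~0U)` shortly before `execve()`: the descriptors stay usable until the exec and are dropped by the kernel afterwards. Callers still need a fallback path for kernels where the call fails.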

[PATCH 0/2] fs, close_range: add flag CLOSE_RANGE_CLOEXEC

2020-10-13 Thread Giuseppe Scrivano
the container process is executed. Giuseppe Scrivano (2): fs, close_range: add flag CLOSE_RANGE_CLOEXEC selftests: add tests for CLOSE_RANGE_CLOEXEC fs/file.c | 56 +-- include/uapi/linux/close_range.h | 3 + .../testing/selftests/core

Re: LPC 2020 Hackroom Session: summary and next steps for isolated user namespaces

2020-10-12 Thread Giuseppe Scrivano
Josh Triplett writes: > On Fri, Oct 09, 2020 at 11:26:06PM -0500, Serge E. Hallyn wrote: >> > 3. Find a way to allow setgroups() in a user namespace while keeping >> >in mind the case of groups used for negative access control. >> >This was suggested by Josh Triplett and Geoffrey Thomas.

Re: [PATCH] hugetlb_cgroup: convert comma to semicolon

2020-08-19 Thread Giuseppe Scrivano
Andrew Morton writes: > On Tue, 18 Aug 2020 06:43:33 +0000 Xu Wang wrote: > >> Replace a comma between expression statements by a semicolon. >> >> ... >> >> --- a/mm/hugetlb_cgroup.c >> +++ b/mm/hugetlb_cgroup.c >> @@ -655,7 +655,7 @@ static void __init __hugetlb_cgroup_file_dfl_init(int >>

Re: [PATCH v2 2/3] seccomp: Introduce addfd ioctl to seccomp user notifier

2020-05-29 Thread Giuseppe Scrivano
eplacement of > specific file descriptors, following dup2-like semantics. > > Signed-off-by: Sargun Dhillon > Suggested-by: Matt Denton > Cc: Kees Cook , > Cc: Jann Horn , > Cc: Robert Sesek , > Cc: Chris Palmer > Cc: Christian Brauner > Cc: Tycho Andersen > --- Thanks, this is a really useful feature. Tested-by: Giuseppe Scrivano

Re: [PATCH linux-next] mqueue: fix IPC namespace use-after-free

2017-12-21 Thread Giuseppe Scrivano
Al Viro <v...@zeniv.linux.org.uk> writes: > On Tue, Dec 19, 2017 at 07:40:43PM +0100, Giuseppe Scrivano wrote: >> Giuseppe Scrivano <gscri...@redhat.com> writes: >> >> > The only issue I've seen with my version is that if I do: >> > >> > #

Re: [PATCH linux-next] mqueue: fix IPC namespace use-after-free

2017-12-21 Thread Giuseppe Scrivano
Al Viro writes: > On Tue, Dec 19, 2017 at 07:40:43PM +0100, Giuseppe Scrivano wrote: >> Giuseppe Scrivano writes: >> >> > The only issue I've seen with my version is that if I do: >> > >> > # unshare -im /bin/sh >> > # mount -t mqu

Re: [PATCH linux-next] mqueue: fix IPC namespace use-after-free

2017-12-19 Thread Giuseppe Scrivano
Giuseppe Scrivano <gscri...@redhat.com> writes: > The only issue I've seen with my version is that if I do: > > # unshare -im /bin/sh > # mount -t mqueue mqueue /dev/mqueue > # touch /dev/mqueue/foo > # umount /dev/mqueue > # mount -t mqueue mqueue /dev/mqueue >

Re: [PATCH linux-next] mqueue: fix IPC namespace use-after-free

2017-12-19 Thread Giuseppe Scrivano
Giuseppe Scrivano writes: > The only issue I've seen with my version is that if I do: > > # unshare -im /bin/sh > # mount -t mqueue mqueue /dev/mqueue > # touch /dev/mqueue/foo > # umount /dev/mqueue > # mount -t mqueue mqueue /dev/mqueue > > then /dev/mqueue/foo

Re: [PATCH linux-next] mqueue: fix IPC namespace use-after-free

2017-12-19 Thread Giuseppe Scrivano
Dmitry Vyukov writes: >> Unrelated issue, but register_filesystem() should be the last thing >> module_init() of a filesystem driver does. It's a separate story, >> in any case... > > Giuseppe, what report is this? > If there is a reproducer, you can ask syzbot to test a

Re: [PATCH linux-next] mqueue: fix IPC namespace use-after-free

2017-12-19 Thread Giuseppe Scrivano
Dmitry Vyukov writes: >> Unrelated issue, but register_filesystem() should be the last thing >> module_init() of a filesystem driver does. It's a separate story, >> in any case... > > Giuseppe, what report is this? > If there is a reproducer, you can ask syzbot to test a patch. I have tried

Re: [PATCH linux-next] mqueue: fix IPC namespace use-after-free

2017-12-19 Thread Giuseppe Scrivano
Al Viro <v...@zeniv.linux.org.uk> writes: > On Tue, Dec 19, 2017 at 11:48:19AM +0000, Al Viro wrote: >> On Tue, Dec 19, 2017 at 11:14:40AM +0100, Giuseppe Scrivano wrote: >> > mqueue_evict_inode() doesn't access the ipc namespace if it was >> > already free

Re: [PATCH linux-next] mqueue: fix IPC namespace use-after-free

2017-12-19 Thread Giuseppe Scrivano
Al Viro writes: > On Tue, Dec 19, 2017 at 11:48:19AM +0000, Al Viro wrote: >> On Tue, Dec 19, 2017 at 11:14:40AM +0100, Giuseppe Scrivano wrote: >> > mqueue_evict_inode() doesn't access the ipc namespace if it was >> > already freed. It can happen if in a

[PATCH linux-next] mqueue: fix IPC namespace use-after-free

2017-12-19 Thread Giuseppe Scrivano
b fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb 8801c51bb300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb == Reported-by: syzbot <syzkal...@googlegroups.com> Signed-off-by: Giuseppe Scrivano <gscri...@redhat.c

[PATCH linux-next] mqueue: fix IPC namespace use-after-free

2017-12-19 Thread Giuseppe Scrivano
b fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb 8801c51bb300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb == Reported-by: syzbot Signed-off-by: Giuseppe Scrivano --- include/linux/ipc_namespace.h | 3 ++- ipc/mqueue.c

Re: [PATCH] exit: move exit_task_namespaces() after exit_task_work()

2017-12-16 Thread Giuseppe Scrivano
Cong Wang writes: > On Thu, Dec 14, 2017 at 1:08 PM, Al Viro wrote: >> On Thu, Dec 14, 2017 at 12:17:57PM -0800, Cong Wang wrote: >>> syzbot reported we have a use-after-free when mqueue_evict_inode() >>> is called on __cleanup_mnt() path,

Re: [PATCH] exit: move exit_task_namespaces() after exit_task_work()

2017-12-16 Thread Giuseppe Scrivano
Cong Wang writes: > On Thu, Dec 14, 2017 at 1:08 PM, Al Viro wrote: >> On Thu, Dec 14, 2017 at 12:17:57PM -0800, Cong Wang wrote: >>> syzbot reported we have a use-after-free when mqueue_evict_inode() >>> is called on __cleanup_mnt() path, where the ipc ns is already >>> freed by the previous

[PATCH v3] ipc, mqueue: lazy call kern_mount_data in new namespaces

2017-12-06 Thread Giuseppe Scrivano
at all. On my machine, the time for creating 1000 new IPC namespaces dropped from ~8s to ~2s. Signed-off-by: Giuseppe Scrivano <gscri...@redhat.com> --- v2->v3: rebased on top of linux-next include/linux/ipc_namespace.h | 4 ++-- ipc/mqueue.c

[PATCH v3] ipc, mqueue: lazy call kern_mount_data in new namespaces

2017-12-06 Thread Giuseppe Scrivano
at all. On my machine, the time for creating 1000 new IPC namespaces dropped from ~8s to ~2s. Signed-off-by: Giuseppe Scrivano --- v2->v3: rebased on top of linux-next include/linux/ipc_namespace.h | 4 ++-- ipc/mqueue.c | 49 ++-

[PATCH v2] ipc, mqueue: lazy call kern_mount_data in new namespaces

2017-11-30 Thread Giuseppe Scrivano
at all. On my machine, the time for creating 1000 new IPC namespaces dropped from ~8s to ~2s. Signed-off-by: Giuseppe Scrivano <gscri...@redhat.com> --- v1 here: https://lkml.org/lkml/2017/11/27/427 v1 -> v2: Declare and use a mutex instead of a spinlock. include/linux/ipc_namespa

[PATCH v2] ipc, mqueue: lazy call kern_mount_data in new namespaces

2017-11-30 Thread Giuseppe Scrivano
at all. On my machine, the time for creating 1000 new IPC namespaces dropped from ~8s to ~2s. Signed-off-by: Giuseppe Scrivano --- v1 here: https://lkml.org/lkml/2017/11/27/427 v1 -> v2: Declare and use a mutex instead of a spinlock. include/linux/ipc_namespace.h | 2 +- ipc/mqueu

Re: [RFC PATCH] ipc, mqueue: lazy call kern_mount_data in new namespaces

2017-11-30 Thread Giuseppe Scrivano
Andrew Morton <a...@linux-foundation.org> writes: > On Wed, 29 Nov 2017 11:33:28 +0100 Giuseppe Scrivano <gscri...@redhat.com> > wrote: > >> Andrew Morton <a...@linux-foundation.org> writes: >> >> > OK, but this simply moves the expense

Re: [RFC PATCH] ipc, mqueue: lazy call kern_mount_data in new namespaces

2017-11-30 Thread Giuseppe Scrivano
Andrew Morton writes: > On Wed, 29 Nov 2017 11:33:28 +0100 Giuseppe Scrivano > wrote: > >> Andrew Morton writes: >> >> > OK, but this simply moves the expense so it happens later on. Why is >> > that better? >> >> the optimizati

Re: [RFC PATCH] ipc, mqueue: lazy call kern_mount_data in new namespaces

2017-11-29 Thread Giuseppe Scrivano
Andrew Morton writes: > OK, but this simply moves the expense so it happens later on. Why is > that better? the optimization is for new IPC namespaces that don't use mq_open. In this case there won't be any kern_mount_data cost at all. Regards, Giuseppe

[RFC PATCH] ipc, mqueue: lazy call kern_mount_data in new namespaces

2017-11-27 Thread Giuseppe Scrivano
kern_mount_data is a relatively expensive operation when creating a new IPC namespace, so delay the mount until its first usage when not creating the global namespace. On my machine, the time for creating 1000 new IPC namespaces dropped from ~8s to ~2s. Signed-off-by: Giuseppe Scrivano
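The idea of deferring an expensive setup step until first use can be sketched in userspace C as below. This is only an analogy, not the kernel change: `struct lazy_ns`, `expensive_mount()` and `lazy_ns_get_mount()` are hypothetical names, and a pthread mutex stands in for whatever locking the kernel code uses.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a namespace whose mount is costly to create. */
struct lazy_ns {
	pthread_mutex_t lock;
	void *mnt;        /* NULL until somebody actually needs the mount */
	int mount_calls;  /* how many times the costly step has run */
};

static int dummy_mount; /* placeholder object returned by the fake mount */

/* Placeholder for the costly step (kern_mount_data in the patch). */
static void *expensive_mount(struct lazy_ns *ns)
{
	ns->mount_calls++;
	return &dummy_mount;
}

/* Creating the namespace is cheap: nothing is mounted yet. */
void lazy_ns_init(struct lazy_ns *ns)
{
	pthread_mutex_init(&ns->lock, NULL);
	ns->mnt = NULL;
	ns->mount_calls = 0;
}

/* The first caller pays for the mount; later callers reuse the result. */
void *lazy_ns_get_mount(struct lazy_ns *ns)
{
	void *mnt;

	pthread_mutex_lock(&ns->lock);
	if (!ns->mnt)
		ns->mnt = expensive_mount(ns);
	mnt = ns->mnt;
	pthread_mutex_unlock(&ns->lock);
	return mnt;
}
```

A namespace that never calls `lazy_ns_get_mount()` never pays for the mount at all, which is the case this optimization targets.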
