RE: [PATCH] Make core_pattern support namespace
Hi, Biederman > -Original Message- > From: Eric W. Biederman [mailto:ebied...@xmission.com] > Sent: Friday, February 19, 2016 4:18 AM > To: Zhao Lei > Cc: 'Mateusz Guzik' ; > contain...@lists.linux-foundation.org; linux-kernel@vger.kernel.org > Subject: Re: [PATCH] Make core_pattern support namespace > > Zhao Lei writes: > > > Hi, Mateusz Guzik > > > >> -Original Message- > >> From: Mateusz Guzik [mailto:mgu...@redhat.com] > >> Sent: Thursday, February 18, 2016 4:54 AM > >> To: Eric W. Biederman > >> Cc: Zhao Lei ; > contain...@lists.linux-foundation.org; > >> linux-kernel@vger.kernel.org > >> Subject: Re: [PATCH] Make core_pattern support namespace > >> > >> On Wed, Feb 17, 2016 at 02:15:24PM -0600, Eric W. Biederman wrote: > >> > Mateusz Guzik writes: > >> > > On Tue, Feb 16, 2016 at 07:33:39PM +0800, Zhao Lei wrote: > >> > >> For container based on namespace design, it is good to allow > >> > >> each container keeping their own coredump setting. > >> > > > >> > > Sorry if this is a false alarm, I don't have easy means to test it, but > >> > > is not this an immediate privilege escalation? > >> > > >> > It is. This is why we do not currently have a per namespace setting. > >> > > >> > >> Thanks for confimation. > >> > >> > Solving the user mode helper problem is technically a fair amount of > >> > work, if not theoretically challenging. > >> > > >> > >> Well, I would say custom core_patterns without pipe support are still > >> better than none. > >> > > +1. > > -1. > > The problem is solvable. It is just a matter of effort to build the > necessary infrastructure and make certain everything works correctly. > Writting a pipe for both host and container have some limit: 1: All host who wantting to run container can not custom core_patterns to other value, it is to say, core_patterns will turn to be a const value in linux release with container support. 2: If a host support 2 types of container manager, for example, lxc and docker, each manager will try to modify host's core_patterns to its internal pipe program, and cause competition. 3: container can not modify core_patterns for its need, it is not like a real system. > >> Say one would ensure a stable core_pattern (i.e. that it cannot be > >> modified as it is being parsed) and a restricted set of allowed > >> characters in the pattern (which would not include the pipe), validated > >> when one attempts to set the pattern. > >> > >> Does this sound acceptable? If so, and there are no counter ideas from > >> Lei, I can get around to that. > >> > > If we can let kernel select pipe_program in vm's filesystem, and run > > pipe_program with vm's filesystem, set a pipe for core_patterm in vm > > will be safe. > > What is your opinion on above solution? > > Please see the other thread about user mode helpers that is current > active on the container mailling list. > User mode helpers is discussed in other threads, but we hadn't get a conclusion to answer is user mode helpers better than letting kernel support core_pattern in namespace, just as our discussing in this thread. > > If above way is not acceptable, or impossible to realize, I also > > agree your solution of limit vm setting pipe. > > I honestly think have a fully capable system that we have now that is > capable of using setns and entering a containers context is better than > something half baked. The solution either needs to support everything > core_pattern does today but correctly in a container environment. > If we can fix problem of "the pipe dumping data to host filesystem", both host and container will able to support full core_pattern. > To make the case that something does not need to be supported, a > convincing argument needs to be presented and tested that no one ever > does that. Without such an argument you will be breaking userspace > in a different way. Not actually fixing things. > It is same problem with above. When we fixed it, all container can be free to set core_pattern without breaking host env, and the every container manager don't need to add special argument for setting core_pattern. > My baseline reference implementation of all of this is that it is > possible when a sufficiently privileged process writes to core_pattern > to fork a child with the same environment and context as the writer. > That
Re: [PATCH] Make core_pattern support namespace
Zhao Lei writes: > Hi, Mateusz Guzik > >> -Original Message- >> From: Mateusz Guzik [mailto:mgu...@redhat.com] >> Sent: Thursday, February 18, 2016 4:54 AM >> To: Eric W. Biederman >> Cc: Zhao Lei ; contain...@lists.linux-foundation.org; >> linux-kernel@vger.kernel.org >> Subject: Re: [PATCH] Make core_pattern support namespace >> >> On Wed, Feb 17, 2016 at 02:15:24PM -0600, Eric W. Biederman wrote: >> > Mateusz Guzik writes: >> > > On Tue, Feb 16, 2016 at 07:33:39PM +0800, Zhao Lei wrote: >> > >> For container based on namespace design, it is good to allow >> > >> each container keeping their own coredump setting. >> > > >> > > Sorry if this is a false alarm, I don't have easy means to test it, but >> > > is not this an immediate privilege escalation? >> > >> > It is. This is why we do not currently have a per namespace setting. >> > >> >> Thanks for confimation. >> >> > Solving the user mode helper problem is technically a fair amount of >> > work, if not theoretically challenging. >> > >> >> Well, I would say custom core_patterns without pipe support are still >> better than none. >> > +1. -1. The problem is solvable. It is just a matter of effort to build the necessary infrastructure and make certain everything works correctly. >> Say one would ensure a stable core_pattern (i.e. that it cannot be >> modified as it is being parsed) and a restricted set of allowed >> characters in the pattern (which would not include the pipe), validated >> when one attempts to set the pattern. >> >> Does this sound acceptable? If so, and there are no counter ideas from >> Lei, I can get around to that. >> > If we can let kernel select pipe_program in vm's filesystem, and run > pipe_program with vm's filesystem, set a pipe for core_patterm in vm > will be safe. > What is your opinion on above solution? Please see the other thread about user mode helpers that is current active on the container mailling list. > If above way is not acceptable, or impossible to realize, I also > agree your solution of limit vm setting pipe. I honestly think have a fully capable system that we have now that is capable of using setns and entering a containers context is better than something half baked. The solution either needs to support everything core_pattern does today but correctly in a container environment. To make the case that something does not need to be supported, a convincing argument needs to be presented and tested that no one ever does that. Without such an argument you will be breaking userspace in a different way. Not actually fixing things. My baseline reference implementation of all of this is that it is possible when a sufficiently privileged process writes to core_pattern to fork a child with the same environment and context as the writer. That forked child could then become a kernel thread and fork any additional children needed as user mode helpers. Eric
RE: [PATCH] Make core_pattern support namespace
Hi, Mateusz Guzik > -Original Message- > From: Mateusz Guzik [mailto:mgu...@redhat.com] > Sent: Thursday, February 18, 2016 4:54 AM > To: Eric W. Biederman > Cc: Zhao Lei ; contain...@lists.linux-foundation.org; > linux-kernel@vger.kernel.org > Subject: Re: [PATCH] Make core_pattern support namespace > > On Wed, Feb 17, 2016 at 02:15:24PM -0600, Eric W. Biederman wrote: > > Mateusz Guzik writes: > > > On Tue, Feb 16, 2016 at 07:33:39PM +0800, Zhao Lei wrote: > > >> For container based on namespace design, it is good to allow > > >> each container keeping their own coredump setting. > > > > > > Sorry if this is a false alarm, I don't have easy means to test it, but > > > is not this an immediate privilege escalation? > > > > It is. This is why we do not currently have a per namespace setting. > > > > Thanks for confimation. > > > Solving the user mode helper problem is technically a fair amount of > > work, if not theoretically challenging. > > > > Well, I would say custom core_patterns without pipe support are still > better than none. > +1. > Say one would ensure a stable core_pattern (i.e. that it cannot be > modified as it is being parsed) and a restricted set of allowed > characters in the pattern (which would not include the pipe), validated > when one attempts to set the pattern. > > Does this sound acceptable? If so, and there are no counter ideas from > Lei, I can get around to that. > If we can let kernel select pipe_program in vm's filesystem, and run pipe_program with vm's filesystem, set a pipe for core_patterm in vm will be safe. What is your opinion on above solution? If above way is not acceptable, or impossible to realize, I also agree your solution of limit vm setting pipe. Thanks Zhaolei > -- > Mateusz Guzik
RE: [PATCH] Make core_pattern support namespace
Hi, Mateusz Guzik Thanks for your detailed comment and suggestion on this patch. > -Original Message- > From: Mateusz Guzik [mailto:mgu...@redhat.com] > Sent: Tuesday, February 16, 2016 10:26 PM > To: Zhao Lei > Cc: linux-kernel@vger.kernel.org; contain...@lists.linux-foundation.org > Subject: Re: [PATCH] Make core_pattern support namespace > > On Tue, Feb 16, 2016 at 07:33:39PM +0800, Zhao Lei wrote: > > Currently, each container shared one copy of coredump setting > > with the host system, if host system changed the setting, each > > running containers will be affected. > > > > Moreover, it is not easy to let each container keeping their own > > coredump setting. > > > > We can use some workaround as pipe program to make the second > > requirement possible, but it is not simple, and both host and > > container are limited to set to fixed pipe program. > > In one word, for host running contailer, we can't change core_pattern > > anymore. > > To make the problem more hard, if a host running more than one > > container product, each product will try to snatch the global > > coredump setting to fit their own requirement. > > > > For container based on namespace design, it is good to allow > > each container keeping their own coredump setting. > > > > It will bring us following benefit: > > 1: Each container can change their own coredump setting > >based on operation on /proc/sys/kernel/core_pattern > > 2: Coredump setting changed in host will not affect > >running containers. > > 3: Support both case of "putting coredump in guest" and > >"putting curedump in host". > > > > Each namespace-based software(lxc, docker, ..) can use this function > > to custom their dump setting. > > > > And this function makes each continer working as separate system, > > it fit for design goal of namespace. > > > > Sorry if this is a false alarm, I don't have easy means to test it, but > is not this an immediate privilege escalation? > > In particular, if the new pattern was to include a pipe, would not it > cause the kernel to run whatever the namespace put in there, but on the > "host"? > In my mind, if user don't run vm in with privilege request, the container manager can do following processing: 1: User set custom core_pattern in starting container, as: # lxc-start --core-pattern='xxx' -n vm01 or # docker run --core-pattern='xxx' my_image or set CORE_PATTERN in vm's config file 2: Container manager(lxc or docker) mount /proc as rw in init period, and set core_pattern. 3: Container manager remount /proc as ro, and give vm to user. User/program in vm can not change core_pattern setting, just as lxc and docker's current default behavor. If user run vm with privilege request, it will be a problem. User/process in vm can change core-pattern setting, if change to a file, it is not a problem, because it is a file in vm's filesystem. But if changed to a pipe, it is a problem in current kernel design, the pipe program is in host's filesystem, it means the vm can change host. In short: Run vm in non-privilege | do anything | no problem Run vm in privilege | change core_pattern to file | no problem Run vm in privilege | change core_pattern to pipe | IS problem As a solution, maybe we can modify kernel to search the pipe program in vm's filesystem, and the pipe problem runs in vm's fs context, so a vm can only put dump file in its private filesystem, except vm manager link the dir into host. Another solution is to only allow vm change core_pattern to file(avoid pipe), but is looks strange, the vm should have same core_dump setting function with normal system. > The feature definitely looks useful, but this is another privileged > operation which is now made available to unprivileged users. This poses > some questions: > - should the namespace be allowed to set anything it wants, including > pipes? I would argue the latter should be disallowed for simplicity Yes, current patch does allow, it can be fixed in above way. > - now that the pattern can change at any time by namespace root, it > becomes fishy to parse it for filename generation without any > protection against concurrent modification Before this patch, kernel can works corrent, so I think kernel already have lock to avoid changing and reading the cure_pattern buffer in same time. This patch only modify the buffer place, so the existence lock will still can work, I'll confirm it in source. > - how do you ensure that namespaces which did not explicitely set their > pattern still get updates from the host? > > That sai
Re: [PATCH] Make core_pattern support namespace
On Wed, Feb 17, 2016 at 02:15:24PM -0600, Eric W. Biederman wrote: > Mateusz Guzik writes: > > On Tue, Feb 16, 2016 at 07:33:39PM +0800, Zhao Lei wrote: > >> For container based on namespace design, it is good to allow > >> each container keeping their own coredump setting. > > > > Sorry if this is a false alarm, I don't have easy means to test it, but > > is not this an immediate privilege escalation? > > It is. This is why we do not currently have a per namespace setting. > Thanks for confimation. > Solving the user mode helper problem is technically a fair amount of > work, if not theoretically challenging. > Well, I would say custom core_patterns without pipe support are still better than none. Say one would ensure a stable core_pattern (i.e. that it cannot be modified as it is being parsed) and a restricted set of allowed characters in the pattern (which would not include the pipe), validated when one attempts to set the pattern. Does this sound acceptable? If so, and there are no counter ideas from Lei, I can get around to that. -- Mateusz Guzik
Re: [PATCH] Make core_pattern support namespace
Mateusz Guzik writes: > On Tue, Feb 16, 2016 at 07:33:39PM +0800, Zhao Lei wrote: >> Currently, each container shared one copy of coredump setting >> with the host system, if host system changed the setting, each >> running containers will be affected. >> >> Moreover, it is not easy to let each container keeping their own >> coredump setting. >> >> We can use some workaround as pipe program to make the second >> requirement possible, but it is not simple, and both host and >> container are limited to set to fixed pipe program. >> In one word, for host running contailer, we can't change core_pattern >> anymore. >> To make the problem more hard, if a host running more than one >> container product, each product will try to snatch the global >> coredump setting to fit their own requirement. >> >> For container based on namespace design, it is good to allow >> each container keeping their own coredump setting. >> >> It will bring us following benefit: >> 1: Each container can change their own coredump setting >>based on operation on /proc/sys/kernel/core_pattern >> 2: Coredump setting changed in host will not affect >>running containers. >> 3: Support both case of "putting coredump in guest" and >>"putting curedump in host". >> >> Each namespace-based software(lxc, docker, ..) can use this function >> to custom their dump setting. >> >> And this function makes each continer working as separate system, >> it fit for design goal of namespace. >> > > Sorry if this is a false alarm, I don't have easy means to test it, but > is not this an immediate privilege escalation? It is. This is why we do not currently have a per namespace setting. Solving the user mode helper problem is technically a fair amount of work, if not theoretically challenging. Eric
Re: [PATCH] Make core_pattern support namespace
On Tue, Feb 16, 2016 at 07:33:39PM +0800, Zhao Lei wrote: > Currently, each container shared one copy of coredump setting > with the host system, if host system changed the setting, each > running containers will be affected. > > Moreover, it is not easy to let each container keeping their own > coredump setting. > > We can use some workaround as pipe program to make the second > requirement possible, but it is not simple, and both host and > container are limited to set to fixed pipe program. > In one word, for host running contailer, we can't change core_pattern > anymore. > To make the problem more hard, if a host running more than one > container product, each product will try to snatch the global > coredump setting to fit their own requirement. > > For container based on namespace design, it is good to allow > each container keeping their own coredump setting. > > It will bring us following benefit: > 1: Each container can change their own coredump setting >based on operation on /proc/sys/kernel/core_pattern > 2: Coredump setting changed in host will not affect >running containers. > 3: Support both case of "putting coredump in guest" and >"putting curedump in host". > > Each namespace-based software(lxc, docker, ..) can use this function > to custom their dump setting. > > And this function makes each continer working as separate system, > it fit for design goal of namespace. > Sorry if this is a false alarm, I don't have easy means to test it, but is not this an immediate privilege escalation? In particular, if the new pattern was to include a pipe, would not it cause the kernel to run whatever the namespace put in there, but on the "host"? The feature definitely looks useful, but this is another privileged operation which is now made available to unprivileged users. This poses some questions: - should the namespace be allowed to set anything it wants, including pipes? I would argue the latter should be disallowed for simplicity - now that the pattern can change at any time by namespace root, it becomes fishy to parse it for filename generation without any protection against concurrent modification - how do you ensure that namespaces which did not explicitely set their pattern still get updates from the host? That said, I suggest adding an allocated buffer to hold it or be NULL otherwise. If the pointer is NULL, the "host" pattern is used. For dumping purposes the code could just kmalloc, rcu_read_lock, memcpy and be done with it. Or, if it is acceptable given the size, just have a buffer on the stack. Finally, the code setting it could simply xchg the pointer to avoid locks if there is no good lock to be taken here, and then kfree_rcu the old buffer. Just my $0,03. -- Mateusz Guzik
[PATCH] Make core_pattern support namespace
Currently, each container shared one copy of coredump setting with the host system, if host system changed the setting, each running containers will be affected. Moreover, it is not easy to let each container keeping their own coredump setting. We can use some workaround as pipe program to make the second requirement possible, but it is not simple, and both host and container are limited to set to fixed pipe program. In one word, for host running contailer, we can't change core_pattern anymore. To make the problem more hard, if a host running more than one container product, each product will try to snatch the global coredump setting to fit their own requirement. For container based on namespace design, it is good to allow each container keeping their own coredump setting. It will bring us following benefit: 1: Each container can change their own coredump setting based on operation on /proc/sys/kernel/core_pattern 2: Coredump setting changed in host will not affect running containers. 3: Support both case of "putting coredump in guest" and "putting curedump in host". Each namespace-based software(lxc, docker, ..) can use this function to custom their dump setting. And this function makes each continer working as separate system, it fit for design goal of namespace. Test(in lxc): # In the host # # echo host_core >/proc/sys/kernel/core_pattern # cat /proc/sys/kernel/core_pattern host_core # ulimit -c 1024000 # ./make_dump Segmentation fault (core dumped) # ls -l -rw--- 1 root root 331776 Feb 4 18:02 host_core.2175 -rwxr-xr-x 1 root root 759731 Feb 4 18:01 make_dump # # In the container # # cat /proc/sys/kernel/core_pattern host_core # echo container_core >/proc/sys/kernel/core_pattern # ./make_dump Segmentation fault (core dumped) # ls -l -rwxr-xr-x1 root root 759731 Feb 4 10:45 make_dump -rw---1 root root 331776 Feb 4 10:45 container_core.16 # # Return to host # # cat /proc/sys/kernel/core_pattern host_core # ls host_core.2175 make_dump make_dump.c # rm -f host_core.2175 # ./make_dump Segmentation fault (core dumped) # ls -l -rw--- 1 root root 331776 Feb 4 18:49 host_core.2351 -rwxr-xr-x 1 root root 759731 Feb 4 18:01 make_dump # Signed-off-by: Zhao Lei --- fs/coredump.c | 3 +-- include/linux/pid_namespace.h | 2 ++ kernel/pid.c | 1 + kernel/pid_namespace.c| 3 +++ kernel/sysctl.c | 22 -- 5 files changed, 23 insertions(+), 8 deletions(-) diff --git a/fs/coredump.c b/fs/coredump.c index 9ea87e9..8a7ef9b 100644 --- a/fs/coredump.c +++ b/fs/coredump.c @@ -46,7 +46,6 @@ int core_uses_pid; unsigned int core_pipe_limit; -char core_pattern[CORENAME_MAX_SIZE] = "core"; static int core_name_size = CORENAME_MAX_SIZE; struct core_name { @@ -183,7 +182,7 @@ put_exe_file: static int format_corename(struct core_name *cn, struct coredump_params *cprm) { const struct cred *cred = current_cred(); - const char *pat_ptr = core_pattern; + const char *pat_ptr = current->nsproxy->pid_ns_for_children->core_pattern; int ispipe = (*pat_ptr == '|'); int pid_in_pattern = 0; int err = 0; diff --git a/include/linux/pid_namespace.h b/include/linux/pid_namespace.h index 918b117..a5af1e9 100644 --- a/include/linux/pid_namespace.h +++ b/include/linux/pid_namespace.h @@ -9,6 +9,7 @@ #include #include #include +#include struct pidmap { atomic_t nr_free; @@ -45,6 +46,7 @@ struct pid_namespace { int hide_pid; int reboot; /* group exit code if this pidns was rebooted */ struct ns_common ns; + char core_pattern[CORENAME_MAX_SIZE]; }; extern struct pid_namespace init_pid_ns; diff --git a/kernel/pid.c b/kernel/pid.c index 4d73a83..c79c1d5 100644 --- a/kernel/pid.c +++ b/kernel/pid.c @@ -83,6 +83,7 @@ struct pid_namespace init_pid_ns = { #ifdef CONFIG_PID_NS .ns.ops = &pidns_operations, #endif + .core_pattern = "core", }; EXPORT_SYMBOL_GPL(init_pid_ns); diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c index a65ba13..16d6d21 100644 --- a/kernel/pid_namespace.c +++ b/kernel/pid_namespace.c @@ -123,6 +123,9 @@ static struct pid_namespace *create_pid_namespace(struct user_namespace *user_ns for (i = 1; i < PIDMAP_ENTRIES; i++) atomic_set(&ns->pidmap[i].nr_free, BITS_PER_PAGE); + strncpy(ns->core_pattern, parent_pid_ns->core_pattern, + sizeof(ns->core_pattern)); + return ns; out_free_map: diff --git a/kernel/sysctl.c b/kernel/sysctl.c index 97715fd..70f8af5 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -100,7 +100,6 @@ extern int suid_dumpable; #ifdef CONFIG_COREDUMP extern int core_uses_pid; -extern char core_pattern[]; extern unsigned int core_pipe_limit; #endif extern int pid_max; @@ -469,7 +468,7 @@ stat