RE: [PATCH] Make core_pattern support namespace

2016-02-19 Thread Zhao Lei
Hi, Biederman

> -Original Message-
> From: Eric W. Biederman [mailto:ebied...@xmission.com]
> Sent: Friday, February 19, 2016 4:18 AM
> To: Zhao Lei 
> Cc: 'Mateusz Guzik' ;
> contain...@lists.linux-foundation.org; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH] Make core_pattern support namespace
> 
> Zhao Lei  writes:
> 
> > Hi, Mateusz Guzik
> >
> >> -Original Message-
> >> From: Mateusz Guzik [mailto:mgu...@redhat.com]
> >> Sent: Thursday, February 18, 2016 4:54 AM
> >> To: Eric W. Biederman 
> >> Cc: Zhao Lei ;
> contain...@lists.linux-foundation.org;
> >> linux-kernel@vger.kernel.org
> >> Subject: Re: [PATCH] Make core_pattern support namespace
> >>
> >> On Wed, Feb 17, 2016 at 02:15:24PM -0600, Eric W. Biederman wrote:
> >> > Mateusz Guzik  writes:
> >> > > On Tue, Feb 16, 2016 at 07:33:39PM +0800, Zhao Lei wrote:
> >> > >> For container based on namespace design, it is good to allow
> >> > >> each container keeping their own coredump setting.
> >> > >
> >> > > Sorry if this is a false alarm, I don't have easy means to test it, but
> >> > > is not this an immediate privilege escalation?
> >> >
> >> > It is.  This is why we do not currently have a per namespace setting.
> >> >
> >>
> >> Thanks for confimation.
> >>
> >> > Solving the user mode helper problem is technically a fair amount of
> >> > work, if not theoretically challenging.
> >> >
> >>
> >> Well, I would say custom core_patterns without pipe support are still
> >> better than none.
> >>
> > +1.
> 
> -1.
> 
> The problem is solvable.  It is just a matter of effort to build the
> necessary infrastructure and make certain everything works correctly.
> 
Writting a pipe for both host and container have some limit:
1: All host who wantting to run container can not custom core_patterns
   to other value, it is to say, core_patterns will turn to be a const
   value in linux release with container support.
2: If a host support 2 types of container manager, for example,
   lxc and docker, each manager will try to modify host's core_patterns
   to its internal pipe program, and cause competition.
3: container can not modify core_patterns for its need, it is not like
   a real system.

> >> Say one would ensure a stable core_pattern (i.e. that it cannot be
> >> modified as it is being parsed) and a restricted set of allowed
> >> characters in the pattern (which would not include the pipe), validated
> >> when one attempts to set the pattern.
> >>
> >> Does this sound acceptable? If so, and there are no counter ideas from
> >> Lei, I can get around to that.
> >>
> > If we can let kernel select pipe_program in vm's filesystem, and run
> > pipe_program with vm's filesystem, set a pipe for core_patterm in vm
> > will be safe.
> > What is your opinion on above solution?
> 
> Please see the other thread about user mode helpers that is current
> active on the container mailling list.
> 
User mode helpers is discussed in other threads, but we hadn't get a
conclusion to answer is user mode helpers better than letting kernel
support core_pattern in namespace, just as our discussing in this thread.

> > If above way is not acceptable, or impossible to realize, I also
> > agree your solution of limit vm setting pipe.
> 
> I honestly think have a fully capable system that we have now that is
> capable of using setns and entering a containers context is better than
> something half baked.  The solution either needs to support everything
> core_pattern does today but correctly in a container environment.
> 
If we can fix problem of "the pipe dumping data to host filesystem",
both host and container will able to support full core_pattern.

> To make the case that something does not need to be supported, a
> convincing argument needs to be presented and tested that no one ever
> does that.  Without such an argument you will be breaking userspace
> in a different way.  Not actually fixing things.
> 
It is same problem with above.
When we fixed it, all container can be free to set core_pattern without
breaking host env, and the every container manager don't need to add
special argument for setting core_pattern.

> My baseline reference implementation of all of this is that it is
> possible when a sufficiently privileged process writes to core_pattern
> to fork a child with the same environment and context as the writer.
> That

Re: [PATCH] Make core_pattern support namespace

2016-02-18 Thread Eric W. Biederman
Zhao Lei  writes:

> Hi, Mateusz Guzik
>
>> -Original Message-
>> From: Mateusz Guzik [mailto:mgu...@redhat.com]
>> Sent: Thursday, February 18, 2016 4:54 AM
>> To: Eric W. Biederman 
>> Cc: Zhao Lei ; contain...@lists.linux-foundation.org;
>> linux-kernel@vger.kernel.org
>> Subject: Re: [PATCH] Make core_pattern support namespace
>> 
>> On Wed, Feb 17, 2016 at 02:15:24PM -0600, Eric W. Biederman wrote:
>> > Mateusz Guzik  writes:
>> > > On Tue, Feb 16, 2016 at 07:33:39PM +0800, Zhao Lei wrote:
>> > >> For container based on namespace design, it is good to allow
>> > >> each container keeping their own coredump setting.
>> > >
>> > > Sorry if this is a false alarm, I don't have easy means to test it, but
>> > > is not this an immediate privilege escalation?
>> >
>> > It is.  This is why we do not currently have a per namespace setting.
>> >
>> 
>> Thanks for confimation.
>> 
>> > Solving the user mode helper problem is technically a fair amount of
>> > work, if not theoretically challenging.
>> >
>> 
>> Well, I would say custom core_patterns without pipe support are still
>> better than none.
>> 
> +1.

-1.

The problem is solvable.  It is just a matter of effort to build the
necessary infrastructure and make certain everything works correctly.

>> Say one would ensure a stable core_pattern (i.e. that it cannot be
>> modified as it is being parsed) and a restricted set of allowed
>> characters in the pattern (which would not include the pipe), validated
>> when one attempts to set the pattern.
>> 
>> Does this sound acceptable? If so, and there are no counter ideas from
>> Lei, I can get around to that.
>> 
> If we can let kernel select pipe_program in vm's filesystem, and run
> pipe_program with vm's filesystem, set a pipe for core_patterm in vm
> will be safe.
> What is your opinion on above solution?

Please see the other thread about user mode helpers that is current
active on the container mailling list.

> If above way is not acceptable, or impossible to realize, I also
> agree your solution of limit vm setting pipe.

I honestly think have a fully capable system that we have now that is
capable of using setns and entering a containers context is better than
something half baked.  The solution either needs to support everything
core_pattern does today but correctly in a container environment.

To make the case that something does not need to be supported, a
convincing argument needs to be presented and tested that no one ever
does that.  Without such an argument you will be breaking userspace
in a different way.  Not actually fixing things.

My baseline reference implementation of all of this is that it is
possible when a sufficiently privileged process writes to core_pattern
to fork a child with the same environment and context as the writer.
That forked child could then become a kernel thread and fork any
additional children needed as user mode helpers.

Eric


RE: [PATCH] Make core_pattern support namespace

2016-02-18 Thread Zhao Lei
Hi, Mateusz Guzik

> -Original Message-
> From: Mateusz Guzik [mailto:mgu...@redhat.com]
> Sent: Thursday, February 18, 2016 4:54 AM
> To: Eric W. Biederman 
> Cc: Zhao Lei ; contain...@lists.linux-foundation.org;
> linux-kernel@vger.kernel.org
> Subject: Re: [PATCH] Make core_pattern support namespace
> 
> On Wed, Feb 17, 2016 at 02:15:24PM -0600, Eric W. Biederman wrote:
> > Mateusz Guzik  writes:
> > > On Tue, Feb 16, 2016 at 07:33:39PM +0800, Zhao Lei wrote:
> > >> For container based on namespace design, it is good to allow
> > >> each container keeping their own coredump setting.
> > >
> > > Sorry if this is a false alarm, I don't have easy means to test it, but
> > > is not this an immediate privilege escalation?
> >
> > It is.  This is why we do not currently have a per namespace setting.
> >
> 
> Thanks for confimation.
> 
> > Solving the user mode helper problem is technically a fair amount of
> > work, if not theoretically challenging.
> >
> 
> Well, I would say custom core_patterns without pipe support are still
> better than none.
> 
+1.

> Say one would ensure a stable core_pattern (i.e. that it cannot be
> modified as it is being parsed) and a restricted set of allowed
> characters in the pattern (which would not include the pipe), validated
> when one attempts to set the pattern.
> 
> Does this sound acceptable? If so, and there are no counter ideas from
> Lei, I can get around to that.
> 
If we can let kernel select pipe_program in vm's filesystem, and run
pipe_program with vm's filesystem, set a pipe for core_patterm in vm
will be safe.
What is your opinion on above solution?

If above way is not acceptable, or impossible to realize, I also
agree your solution of limit vm setting pipe.

Thanks
Zhaolei

> --
> Mateusz Guzik






RE: [PATCH] Make core_pattern support namespace

2016-02-18 Thread Zhao Lei
Hi, Mateusz Guzik

Thanks for your detailed comment and suggestion on this patch.

> -Original Message-
> From: Mateusz Guzik [mailto:mgu...@redhat.com]
> Sent: Tuesday, February 16, 2016 10:26 PM
> To: Zhao Lei 
> Cc: linux-kernel@vger.kernel.org; contain...@lists.linux-foundation.org
> Subject: Re: [PATCH] Make core_pattern support namespace
> 
> On Tue, Feb 16, 2016 at 07:33:39PM +0800, Zhao Lei wrote:
> > Currently, each container shared one copy of coredump setting
> > with the host system, if host system changed the setting, each
> > running containers will be affected.
> >
> > Moreover, it is not easy to let each container keeping their own
> > coredump setting.
> >
> > We can use some workaround as pipe program to make the second
> > requirement possible, but it is not simple, and both host and
> > container are limited to set to fixed pipe program.
> > In one word, for host running contailer, we can't change core_pattern
> > anymore.
> > To make the problem more hard, if a host running more than one
> > container product, each product will try to snatch the global
> > coredump setting to fit their own requirement.
> >
> > For container based on namespace design, it is good to allow
> > each container keeping their own coredump setting.
> >
> > It will bring us following benefit:
> > 1: Each container can change their own coredump setting
> >based on operation on /proc/sys/kernel/core_pattern
> > 2: Coredump setting changed in host will not affect
> >running containers.
> > 3: Support both case of "putting coredump in guest" and
> >"putting curedump in host".
> >
> > Each namespace-based software(lxc, docker, ..) can use this function
> > to custom their dump setting.
> >
> > And this function makes each continer working as separate system,
> > it fit for design goal of namespace.
> >
> 
> Sorry if this is a false alarm, I don't have easy means to test it, but
> is not this an immediate privilege escalation?
> 
> In particular, if the new pattern was to include a pipe, would not it
> cause the kernel to run whatever the namespace put in there, but on the
> "host"?
> 
In my mind, if user don't run vm in with privilege request,
the container manager can do following processing:
1: User set custom core_pattern in starting container, as:
   # lxc-start --core-pattern='xxx' -n vm01
   or
   # docker run --core-pattern='xxx' my_image
   or
   set CORE_PATTERN in vm's config file
2: Container manager(lxc or docker) mount /proc as rw in init period,
   and set core_pattern.
3: Container manager remount /proc as ro, and give vm to user.
User/program in vm can not change core_pattern setting, just as lxc
and docker's current default behavor.

If user run vm with privilege request, it will be a problem.
User/process in vm can change core-pattern setting, if change to a file,
it is not a problem, because it is a file in vm's filesystem.
But if changed to a pipe, it is a problem in current kernel design,
the pipe program is in host's filesystem, it means the vm can change host.

In short:
 Run vm in non-privilege | do anything | no problem
 Run vm in privilege | change core_pattern to file | no problem
 Run vm in privilege | change core_pattern to pipe | IS problem

As a solution, maybe we can modify kernel to search the pipe program
in vm's filesystem, and the pipe problem runs in vm's fs context,
so a vm can only put dump file in its private filesystem, except vm
manager link the dir into host.

Another solution is to only allow vm change core_pattern to file(avoid pipe),
but is looks strange, the vm should have same core_dump setting function
with normal system.

> The feature definitely looks useful, but this is another privileged
> operation which is now made available to unprivileged users. This poses
> some questions:
> - should the namespace be allowed to set anything it wants, including
>   pipes? I would argue the latter should be disallowed for simplicity
Yes, current patch does allow, it can be fixed in above way.

> - now that the pattern can change at any time by namespace root, it
>   becomes fishy to parse it for filename generation without any
>   protection against concurrent modification
Before this patch, kernel can works corrent, so I think kernel already
have lock to avoid changing and reading the cure_pattern buffer in
same time.
This patch only modify the buffer place, so the existence lock will
still can work, I'll confirm it in source.

> - how do you ensure that namespaces which did not explicitely set their
>   pattern still get updates from the host?
> 
> That sai

Re: [PATCH] Make core_pattern support namespace

2016-02-17 Thread Mateusz Guzik
On Wed, Feb 17, 2016 at 02:15:24PM -0600, Eric W. Biederman wrote:
> Mateusz Guzik  writes:
> > On Tue, Feb 16, 2016 at 07:33:39PM +0800, Zhao Lei wrote:
> >> For container based on namespace design, it is good to allow
> >> each container keeping their own coredump setting.
> >
> > Sorry if this is a false alarm, I don't have easy means to test it, but
> > is not this an immediate privilege escalation?
> 
> It is.  This is why we do not currently have a per namespace setting.
> 

Thanks for confimation.

> Solving the user mode helper problem is technically a fair amount of
> work, if not theoretically challenging.
> 

Well, I would say custom core_patterns without pipe support are still
better than none.

Say one would ensure a stable core_pattern (i.e. that it cannot be
modified as it is being parsed) and a restricted set of allowed
characters in the pattern (which would not include the pipe), validated
when one attempts to set the pattern.

Does this sound acceptable? If so, and there are no counter ideas from
Lei, I can get around to that.

-- 
Mateusz Guzik


Re: [PATCH] Make core_pattern support namespace

2016-02-17 Thread Eric W. Biederman
Mateusz Guzik  writes:

> On Tue, Feb 16, 2016 at 07:33:39PM +0800, Zhao Lei wrote:
>> Currently, each container shared one copy of coredump setting
>> with the host system, if host system changed the setting, each
>> running containers will be affected.
>> 
>> Moreover, it is not easy to let each container keeping their own
>> coredump setting.
>> 
>> We can use some workaround as pipe program to make the second
>> requirement possible, but it is not simple, and both host and
>> container are limited to set to fixed pipe program.
>> In one word, for host running contailer, we can't change core_pattern
>> anymore.
>> To make the problem more hard, if a host running more than one
>> container product, each product will try to snatch the global
>> coredump setting to fit their own requirement.
>> 
>> For container based on namespace design, it is good to allow
>> each container keeping their own coredump setting.
>> 
>> It will bring us following benefit:
>> 1: Each container can change their own coredump setting
>>based on operation on /proc/sys/kernel/core_pattern
>> 2: Coredump setting changed in host will not affect
>>running containers.
>> 3: Support both case of "putting coredump in guest" and
>>"putting curedump in host".
>> 
>> Each namespace-based software(lxc, docker, ..) can use this function
>> to custom their dump setting.
>> 
>> And this function makes each continer working as separate system,
>> it fit for design goal of namespace.
>> 
>
> Sorry if this is a false alarm, I don't have easy means to test it, but
> is not this an immediate privilege escalation?

It is.  This is why we do not currently have a per namespace setting.

Solving the user mode helper problem is technically a fair amount of
work, if not theoretically challenging.

Eric


Re: [PATCH] Make core_pattern support namespace

2016-02-16 Thread Mateusz Guzik
On Tue, Feb 16, 2016 at 07:33:39PM +0800, Zhao Lei wrote:
> Currently, each container shared one copy of coredump setting
> with the host system, if host system changed the setting, each
> running containers will be affected.
> 
> Moreover, it is not easy to let each container keeping their own
> coredump setting.
> 
> We can use some workaround as pipe program to make the second
> requirement possible, but it is not simple, and both host and
> container are limited to set to fixed pipe program.
> In one word, for host running contailer, we can't change core_pattern
> anymore.
> To make the problem more hard, if a host running more than one
> container product, each product will try to snatch the global
> coredump setting to fit their own requirement.
> 
> For container based on namespace design, it is good to allow
> each container keeping their own coredump setting.
> 
> It will bring us following benefit:
> 1: Each container can change their own coredump setting
>based on operation on /proc/sys/kernel/core_pattern
> 2: Coredump setting changed in host will not affect
>running containers.
> 3: Support both case of "putting coredump in guest" and
>"putting curedump in host".
> 
> Each namespace-based software(lxc, docker, ..) can use this function
> to custom their dump setting.
> 
> And this function makes each continer working as separate system,
> it fit for design goal of namespace.
> 

Sorry if this is a false alarm, I don't have easy means to test it, but
is not this an immediate privilege escalation?

In particular, if the new pattern was to include a pipe, would not it
cause the kernel to run whatever the namespace put in there, but on the
"host"?

The feature definitely looks useful, but this is another privileged
operation which is now made available to unprivileged users. This poses
some questions:
- should the namespace be allowed to set anything it wants, including
  pipes? I would argue the latter should be disallowed for simplicity
- now that the pattern can change at any time by namespace root, it
  becomes fishy to parse it for filename generation without any
  protection against concurrent modification
- how do you ensure that namespaces which did not explicitely set their
  pattern still get updates from the host?

That said, I suggest adding an allocated buffer to hold it or be NULL
otherwise. If the pointer is NULL, the "host" pattern is used.

For dumping purposes the code could just kmalloc, rcu_read_lock, memcpy
and be done with it. Or, if it is acceptable given the size, just have a
buffer on the stack.

Finally, the code setting it could simply xchg the pointer to avoid
locks if there is no good lock to be taken here, and then kfree_rcu the
old buffer.

Just my $0,03.

-- 
Mateusz Guzik


[PATCH] Make core_pattern support namespace

2016-02-16 Thread Zhao Lei
Currently, each container shared one copy of coredump setting
with the host system, if host system changed the setting, each
running containers will be affected.

Moreover, it is not easy to let each container keeping their own
coredump setting.

We can use some workaround as pipe program to make the second
requirement possible, but it is not simple, and both host and
container are limited to set to fixed pipe program.
In one word, for host running contailer, we can't change core_pattern
anymore.
To make the problem more hard, if a host running more than one
container product, each product will try to snatch the global
coredump setting to fit their own requirement.

For container based on namespace design, it is good to allow
each container keeping their own coredump setting.

It will bring us following benefit:
1: Each container can change their own coredump setting
   based on operation on /proc/sys/kernel/core_pattern
2: Coredump setting changed in host will not affect
   running containers.
3: Support both case of "putting coredump in guest" and
   "putting curedump in host".

Each namespace-based software(lxc, docker, ..) can use this function
to custom their dump setting.

And this function makes each continer working as separate system,
it fit for design goal of namespace.

Test(in lxc):
 # In the host
 # 
 # echo host_core >/proc/sys/kernel/core_pattern
 # cat /proc/sys/kernel/core_pattern
 host_core
 # ulimit -c 1024000
 # ./make_dump
 Segmentation fault (core dumped)
 # ls -l
 -rw--- 1 root root 331776 Feb  4 18:02 host_core.2175
 -rwxr-xr-x 1 root root 759731 Feb  4 18:01 make_dump
 #

 # In the container
 # 
 # cat /proc/sys/kernel/core_pattern
 host_core
 # echo container_core >/proc/sys/kernel/core_pattern
 # ./make_dump
 Segmentation fault (core dumped)
 # ls -l
 -rwxr-xr-x1 root root   759731 Feb  4 10:45 make_dump
 -rw---1 root root   331776 Feb  4 10:45 container_core.16
 #

 # Return to host
 # 
 # cat /proc/sys/kernel/core_pattern
 host_core
 # ls
 host_core.2175  make_dump  make_dump.c
 # rm -f host_core.2175
 # ./make_dump
 Segmentation fault (core dumped)
 # ls -l
 -rw--- 1 root root 331776 Feb  4 18:49 host_core.2351
 -rwxr-xr-x 1 root root 759731 Feb  4 18:01 make_dump
 #

Signed-off-by: Zhao Lei 
---
 fs/coredump.c |  3 +--
 include/linux/pid_namespace.h |  2 ++
 kernel/pid.c  |  1 +
 kernel/pid_namespace.c|  3 +++
 kernel/sysctl.c   | 22 --
 5 files changed, 23 insertions(+), 8 deletions(-)

diff --git a/fs/coredump.c b/fs/coredump.c
index 9ea87e9..8a7ef9b 100644
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -46,7 +46,6 @@
 
 int core_uses_pid;
 unsigned int core_pipe_limit;
-char core_pattern[CORENAME_MAX_SIZE] = "core";
 static int core_name_size = CORENAME_MAX_SIZE;
 
 struct core_name {
@@ -183,7 +182,7 @@ put_exe_file:
 static int format_corename(struct core_name *cn, struct coredump_params *cprm)
 {
const struct cred *cred = current_cred();
-   const char *pat_ptr = core_pattern;
+   const char *pat_ptr = 
current->nsproxy->pid_ns_for_children->core_pattern;
int ispipe = (*pat_ptr == '|');
int pid_in_pattern = 0;
int err = 0;
diff --git a/include/linux/pid_namespace.h b/include/linux/pid_namespace.h
index 918b117..a5af1e9 100644
--- a/include/linux/pid_namespace.h
+++ b/include/linux/pid_namespace.h
@@ -9,6 +9,7 @@
 #include 
 #include 
 #include 
+#include 
 
 struct pidmap {
atomic_t nr_free;
@@ -45,6 +46,7 @@ struct pid_namespace {
int hide_pid;
int reboot; /* group exit code if this pidns was rebooted */
struct ns_common ns;
+   char core_pattern[CORENAME_MAX_SIZE];
 };
 
 extern struct pid_namespace init_pid_ns;
diff --git a/kernel/pid.c b/kernel/pid.c
index 4d73a83..c79c1d5 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -83,6 +83,7 @@ struct pid_namespace init_pid_ns = {
 #ifdef CONFIG_PID_NS
.ns.ops = &pidns_operations,
 #endif
+   .core_pattern = "core",
 };
 EXPORT_SYMBOL_GPL(init_pid_ns);
 
diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c
index a65ba13..16d6d21 100644
--- a/kernel/pid_namespace.c
+++ b/kernel/pid_namespace.c
@@ -123,6 +123,9 @@ static struct pid_namespace *create_pid_namespace(struct 
user_namespace *user_ns
for (i = 1; i < PIDMAP_ENTRIES; i++)
atomic_set(&ns->pidmap[i].nr_free, BITS_PER_PAGE);
 
+   strncpy(ns->core_pattern, parent_pid_ns->core_pattern,
+   sizeof(ns->core_pattern));
+
return ns;
 
 out_free_map:
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 97715fd..70f8af5 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -100,7 +100,6 @@
 extern int suid_dumpable;
 #ifdef CONFIG_COREDUMP
 extern int core_uses_pid;
-extern char core_pattern[];
 extern unsigned int core_pipe_limit;
 #endif
 extern int pid_max;
@@ -469,7 +468,7 @@ stat