Re: Introspecting userns relationships to other namespaces?

2016-07-08 Thread Eric W. Biederman
"W. Trevor King"  writes:

> On Thu, Jul 07, 2016 at 08:01:52AM -0700, James Bottomley wrote:
>> In theory, we could get nsfs to show this information as an option
>> (just add a show_options entry to the superblock ops), but the
>> problem is that although each namespace has a parent user_ns,
>> there's no way to get it without digging in the namespace specific
>> structure.  Probably we should restructure to move it into
>> ns_common, then we could display it (and enforce all namespaces
>> having owning user_ns) but it would be a reasonably large (but
>> mechanical) change.
>
> It sounds like everyone is either positive or neutral on this
> groundwork, even if we haven't decided if/how to expose the
> information to userspace.  I'm happy to work up a patch while the rest
> of the discussion continues.  I'm also happy to let someone else work
> up the patch, if anyone else is chomping at the bit ;).

I am dubious about moving all of the user namespace members into ns_common.

I would be happy to be proved wrong, but I suspect that in the cases where
we actually use that user namespace the code will become uglier.  Making
the ordinary uses uglier to make a rare corner case nicer is the wrong
trade-off.

But feel free to try; it is certainly worth doing if it doesn't make the
code that uses the user namespaces uglier.

Eric



Re: Introspecting userns relationships to other namespaces?

2016-07-08 Thread W. Trevor King
On Thu, Jul 07, 2016 at 08:01:52AM -0700, James Bottomley wrote:
> In theory, we could get nsfs to show this information as an option
> (just add a show_options entry to the superblock ops), but the
> problem is that although each namespace has a parent user_ns,
> there's no way to get it without digging in the namespace specific
> structure.  Probably we should restructure to move it into
> ns_common, then we could display it (and enforce all namespaces
> having owning user_ns) but it would be a reasonably large (but
> mechanical) change.

It sounds like everyone is either positive or neutral on this
groundwork, even if we haven't decided if/how to expose the
information to userspace.  I'm happy to work up a patch while the rest
of the discussion continues.  I'm also happy to let someone else work
up the patch, if anyone else is chomping at the bit ;).

Cheers,
Trevor

-- 
This email may be signed or encrypted with GnuPG (http://www.gnupg.org).
For more information, see http://en.wikipedia.org/wiki/Pretty_Good_Privacy


Re: Introspecting userns relationships to other namespaces?

2016-07-08 Thread Michael Kerrisk (man-pages)

On 07/07/2016 09:17 PM, James Bottomley wrote:
> On Thu, 2016-07-07 at 20:21 +0200, Michael Kerrisk (man-pages) wrote:
>> On 7 July 2016 at 17:01, James Bottomley wrote:
>>> [Serge already answered the parenting issue]
>>>
>>> On Thu, 2016-07-07 at 08:36 -0500, Serge E. Hallyn wrote:
>>>> Hm.  Probably best-effort based on the process hierarchy.  So
>>>> yeah you could probably get a tree into a state that would be
>>>> wrongly recreated. Create a new netns, bind mount it, exit;  Have
>>>> another task create a new user_ns, bind mount it, exit;  Third
>>>> task setns()s first to the new netns then to the new user_ns.  I
>>>> suspect criu will recreate that wrongly.
>>>
>>> This is a bit pathological, and you have to be root to do it: so
>>> root can set up a nesting hierarchy, bind it and destroy the pids
>>> but I know of no current orchestration system which does this.
>>>
>>> Actually, I have to back pedal a bit: the way I currently set up
>>> architecture emulation containers does precisely this: I set up the
>>> namespaces unprivileged with child mount namespaces, but then I ask
>>> root to bind the userns and kill the process that created it so I
>>> have a permanent handle to enter the namespace by, so I suspect
>>> that when our current orchestration systems get more sophisticated,
>>> they might eventually want to do something like this as well.
>>>
>>> In theory, we could get nsfs to show this information as an option
>>> (just add a show_options entry to the superblock ops), but the
>>> problem is that although each namespace has a parent user_ns,
>>> there's no way to get it without digging in the namespace specific
>>> structure.  Probably we should restructure to move it into
>>> ns_common, then we could display it (and enforce all namespaces
>>> having owning user_ns) but it would be a
>>
>> I'm missing something here. Is it not already the case that all
>> namespaces have an owning user_ns?
>
> Um, yes, I don't believe I said they don't.  The problem I thought you
> were having is that there's no way of seeing what it is.


Your words "and enforce all namespaces having owning user_ns" were
what left me puzzled--it sounded to me that the implication was
that this is not "enforced" right now.

Cheers,

Michael

--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/


Re: Introspecting userns relationships to other namespaces?

2016-07-08 Thread W. Trevor King
On Thu, Jul 07, 2016 at 11:54:54PM -0700, Andrew Vagin wrote:
> On Thu, Jul 07, 2016 at 10:26:50PM -0700, W. Trevor King wrote:
> > On Thu, Jul 07, 2016 at 08:26:47PM -0700, James Bottomley wrote:
> > > On Thu, 2016-07-07 at 20:00 -0700, Andrew Vagin wrote:
> > > > On Thu, Jul 07, 2016 at 07:16:18PM -0700, Andrew Vagin wrote:
> > > > > I think we can show all required information in fdinfo. We open
> > > > > a namespaces file (/proc/pid/ns/N) and then read
> > > > > /proc/pid/fdinfo/X for it.
> > > > 
> > > > Here is a proof-of-concept patch.
> > > > …
> > > > In [2]: fd = os.open("/proc/self/ns/pid", os.O_RDONLY)
> > > > 
> > > > In [3]: print open("/proc/self/fdinfo/%d" % fd).read()
> > > > pos:    0
> > > > flags:  010
> > > > mnt_id: 2
> > > > userns: 4026531837
> > > > 
> > > > In [4]: print "/proc/self/ns/user -> %s" %
> > > > os.readlink("/proc/self/ns/user")
> > > > /proc/self/ns/user -> user:[4026531837]
> > > 
> > > can't you just do
> > > 
> > > readlink /proc/self/ns/user | sed 's/.*\[\(.*\)\]/\1/'
> > …
> > If you only put one level in fdinfo, you're stuck if one of the
> > namespaces involved has neither bind mounts nor a PID to give you a
> > handle on it [1].  And if you want to put that whole ancestor tree in
> > fdinfo, you have to come up with some way to handle the two-parent
> > branching.
> 
> I think it's a bad idea to draw a tree in fdinfo. Why do we want to know
> this hierarchy? Probably we will want to access these namespaces (setns);
> in that case we need to have a way to open them.
> 
> Maybe we need to extend the functionality of the nsfs filesystem
> (something like /proc/PID for namespaces)?

A similar idea came up during the PID-translation brainstorming [1],
but I'm not sure if anything ever came of that.  Once you're dealing
with a separate pseudo-filesystem, it seems easier to decouple it from
proc and just make a mountable namespace-hierarchy filesystem (like we
have mountable cgroup hierarchy filesystems).  That also gets you an
opt-in playground while the details of the nsfs filesystem view are
worked out.  Are you imagining something like:

  $ tree .
  .
  ├── mnt{inum}
  │   └── user -> ../user{inum}
  ├── pid{inum}
  │   ├── pid{inum}
  │   │   └── user -> ../../user{inum}/user{inum}
  │   └── user -> ../user{inum}
  └── user{inum}
      └── user{inum}
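
To make the proposed layout concrete, here is a minimal Python 3 sketch that derives the same directory nesting and relative `user` symlinks from an in-memory inventory.  Everything in it is an assumption for illustration: the `NAMESPACES` table, the `user1`/`pid2` style names standing in for `{type}{inum}`, and the helper names.  It does not talk to the kernel and is not an existing interface.

```python
import posixpath

# Hypothetical inventory: name -> (parent namespace or None, owning userns or None).
# The names stand in for the "{type}{inum}" directory names in the listing.
NAMESPACES = {
    "user1": (None, None),
    "user2": ("user1", "user1"),
    "mnt1": (None, "user1"),
    "pid1": (None, "user1"),
    "pid2": ("pid1", "user2"),
}


def directory(name, table=NAMESPACES):
    """Path of a namespace directory, nested under its parent namespaces."""
    parent = table[name][0]
    if parent is None:
        return name
    return posixpath.join(directory(parent, table), name)


def layout(table=NAMESPACES):
    """Map each 'user' symlink path to its relative target."""
    links = {}
    for name, (_parent, owner) in table.items():
        # User namespaces are nested by directory only, as in the listing,
        # so they get no 'user' link here (an assumption of this sketch).
        if owner is None or name.startswith("user"):
            continue
        here = directory(name, table)
        target = posixpath.relpath(directory(owner, table), here)
        links[posixpath.join(here, "user")] = target
    return links
```

With this inventory, `layout()` yields `mnt1/user -> ../user1`, `pid1/user -> ../user1` and `pid1/pid2/user -> ../../user1/user2`, which is the shape of the listing.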

Cheers,
Trevor

[1]: http://thread.gmane.org/gmane.linux.kernel.containers/28105/focus=28164
     Subject: RE: [RFC]Pid conversion between pid namespace
     Date: Fri, 25 Jul 2014 10:01:45 +
     Message-ID: <5871495633F38949900D2BF2DC04883E56C7A2@G08CNEXMBPEKD02.g08.fujitsu.local>



Re: Introspecting userns relationships to other namespaces?

2016-07-08 Thread Andrew Vagin
On Thu, Jul 07, 2016 at 10:26:50PM -0700, W. Trevor King wrote:
> On Thu, Jul 07, 2016 at 08:26:47PM -0700, James Bottomley wrote:
> > On Thu, 2016-07-07 at 20:00 -0700, Andrew Vagin wrote:
> > > On Thu, Jul 07, 2016 at 07:16:18PM -0700, Andrew Vagin wrote:
> > > > I think we can show all required information in fdinfo. We open
> > > > a namespaces file (/proc/pid/ns/N) and then read
> > > > /proc/pid/fdinfo/X for it.
> > > 
> > > Here is a proof-of-concept patch.
> > > …
> > > In [2]: fd = os.open("/proc/self/ns/pid", os.O_RDONLY)
> > > 
> > > In [3]: print open("/proc/self/fdinfo/%d" % fd).read()
> > > pos:    0
> > > flags:  010
> > > mnt_id: 2
> > > userns: 4026531837
> > > 
> > > In [4]: print "/proc/self/ns/user -> %s" %
> > > os.readlink("/proc/self/ns/user")
> > > /proc/self/ns/user -> user:[4026531837]
> > 
> > can't you just do
> > 
> > readlink /proc/self/ns/user | sed 's/.*\[\(.*\)\]/\1/'
> 
> With Andrew's fdinfo approach you know the user namespace owning
> /proc/self/ns/pid is 4026531837.  That happens to be
> /proc/self/ns/user in this case, but doesn't have to be in general.
> 
> > But what Michael was asking about was the parent user_ns of all the
> > other namespaces ... I don't think there's any way we can get that
> > out of any information in /proc/self/
> 
> If fdinfo only shows immediate parents, you'd need to walk the tree to
> get back to the root (and at each layer of the PID namespace tree
> there will be another user-namespace parent branching off).  With a
> tree like:
>
>   Namespace         | Parent       | Owning userns
>   ------------------+--------------+------------------
>   Root userns       | -            | -
>   Root PID ns       | -            | Root userns
>   Child userns      | Root userns  | Root userns
>   Child PID ns      | Root PID ns  | Root userns
>   Grandchild userns | Child userns | Child userns
>   Grandchild PID ns | Child PID ns | Grandchild userns
>
> Walking from the grandchild PID namespace would give you:
>
>   Grandchild PID ns
>   |-- Child PID ns
>   |   |-- Root PID ns
>   |   `-- Root userns
>   `-- Grandchild userns
>       `-- Child userns
>           `-- Root userns
>
> If you only put one level in fdinfo, you're stuck if one of the
> namespaces involved has neither bind mounts nor a PID to give you a
> handle on it [1].  And if you want to put that whole ancestor tree in
> fdinfo, you have to come up with some way to handle the two-parent
> branching.

I think it's a bad idea to draw a tree in fdinfo. Why do we want to know
this hierarchy? Probably we will want to access these namespaces (setns);
in that case we need to have a way to open them.

Maybe we need to extend the functionality of the nsfs filesystem
(something like /proc/PID for namespaces)?

> 
> I'm also not sure how exposing nsfs information [2] would handle
> namespaces that had neither a surviving bind mount nor a direct
> process.
> 
> If all the information is available (possibly after a mechanical patch
> [3] makes it more accessible), then it seems easier to put it in a
> separate /proc or /sys file.  There was a stab at this for PID
> namespaces in [4] (the same series that landed NStgid, etc.) with
> additional background and alternative approaches in [5].  There were
> problems with that patch (and it was trying to do more by also listing
> a process's ID in each PID namespace), but the “let's put the whole
> tree in a new file” approach seems sound to me.
> 
> Cheers,
> Trevor
> 
> [1]: http://thread.gmane.org/gmane.linux.kernel.containers/30456/focus=20536
>      Subject: Re: Introspecting userns relationships to other namespaces?
>      Date: Thu, 7 Jul 2016 13:24:42 -0500
>      Message-ID: <20160707182442.ga6...@mail.hallyn.com>
> [2]: http://thread.gmane.org/gmane.linux.kernel.containers/30456/focus=30499
>      Subject: Re: [CRIU] Introspecting userns relationships to other namespaces?
>      Date: Thu, 07 Jul 2016 20:20:05 -0700
>      Message-ID: <1467948005.2322.84.ca...@hansenpartnership.com>
> [3]: http://thread.gmane.org/gmane.linux.kernel.containers/30456/focus=20537
>      Subject: Re: Introspecting userns relationships to other namespaces?
>      Date: Thu, 07 Jul 2016 08:01:52 -0700
>      Message-ID: <1467903712.2347.16.ca...@hansenpartnership.com>
> [4]: http://thread.gmane.org/gmane.linux.kernel.containers/28925/focus=28928
>      Subject: [resend][PATCH v9 1/3] procfs: show hierarchy of pid namespace
>      Date: Tue, 23 Dec 2014 18:20:37 +0800

Re: Introspecting userns relationships to other namespaces?

2016-07-08 Thread W. Trevor King
On Thu, Jul 07, 2016 at 10:26:50PM -0700, W. Trevor King wrote:
> And if you want to put that whole ancestor tree in fdinfo, you have
> to come up with some way to handle the two-parent branching.

Going towards the roots is nice, because you know a given namespace
will have at most two parents (its parent namespace and its owning
user namespace), but it leaks information about the system into the
container.  It's probably better to follow the NStgid, etc. example
and only walk toward the leaves.  So a (privileged?) process in the
root namespace could see the whole tree, while a process in a non-root
namespace could only see its own namespaces and their descendants.  In
situations where you were part of a namespace that belonged to an
external user namespace (e.g. you nsenter a child user namespace but
are still in the root PID namespace), you'd want an “unknown” entry
for the parent you couldn't see.
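
A rough Python 3 sketch of that visibility rule, using the six-namespace example from earlier in the thread.  The table, the function names, and the reading of "descendants" as "anything whose parent or owner is already visible" are my assumptions for illustration, not a settled design.

```python
# Namespace -> (parent, owning userns); the example tree from earlier
# in the thread, written out as a hypothetical in-memory table.
NAMESPACES = {
    "Root userns": (None, None),
    "Root PID ns": (None, "Root userns"),
    "Child userns": ("Root userns", "Root userns"),
    "Child PID ns": ("Root PID ns", "Root userns"),
    "Grandchild userns": ("Child userns", "Child userns"),
    "Grandchild PID ns": ("Child PID ns", "Grandchild userns"),
}


def visible_from(member_of, table=NAMESPACES):
    """Namespaces a process can see: those it is in plus their descendants."""
    seen = set(member_of)
    changed = True
    while changed:
        changed = False
        for name, links in table.items():
            # A namespace becomes visible once its parent or owner is visible.
            if name not in seen and any(link in seen for link in links):
                seen.add(name)
                changed = True
    return seen


def view(member_of, table=NAMESPACES):
    """Map each visible namespace to (parent, owner), masking hidden links."""
    seen = visible_from(member_of, table)

    def mask(link):
        if link is None:
            return None
        return link if link in seen else "unknown"

    return {name: tuple(mask(link) for link in table[name])
            for name in sorted(seen)}
```

For the nsenter case, `view({"Child userns", "Root PID ns"})` lists five namespaces, omits the root userns, and reports the owner of the root PID ns as "unknown".  `view({"Root userns", "Root PID ns"})` shows the whole tree.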

Cheers,
Trevor



Re: Introspecting userns relationships to other namespaces?

2016-07-07 Thread W. Trevor King
On Thu, Jul 07, 2016 at 08:26:47PM -0700, James Bottomley wrote:
> On Thu, 2016-07-07 at 20:00 -0700, Andrew Vagin wrote:
> > On Thu, Jul 07, 2016 at 07:16:18PM -0700, Andrew Vagin wrote:
> > > I think we can show all required information in fdinfo. We open
> > > a namespaces file (/proc/pid/ns/N) and then read
> > > /proc/pid/fdinfo/X for it.
> > 
> > Here is a proof-of-concept patch.
> > …
> > In [2]: fd = os.open("/proc/self/ns/pid", os.O_RDONLY)
> > 
> > In [3]: print open("/proc/self/fdinfo/%d" % fd).read()
> > pos:    0
> > flags:  010
> > mnt_id: 2
> > userns: 4026531837
> > 
> > In [4]: print "/proc/self/ns/user -> %s" %
> > os.readlink("/proc/self/ns/user")
> > /proc/self/ns/user -> user:[4026531837]
> 
> can't you just do
> 
> readlink /proc/self/ns/user | sed 's/.*\[\(.*\)\]/\1/'

With Andrew's fdinfo approach you know the user namespace owning
/proc/self/ns/pid is 4026531837.  That happens to be
/proc/self/ns/user in this case, but doesn't have to be in general.
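
To spell out the difference between the two lookups, here is a minimal Python 3 sketch.  The helper names are made up, and the `userns:` fdinfo line is assumed to exist only on a kernel carrying Andrew's proof-of-concept patch, so `owning_userns()` returns None when that field is absent.

```python
import os
import re


def parse_ns_link(target):
    """Split an nsfs link target such as 'user:[4026531837]' into (type, inum)."""
    match = re.fullmatch(r"(\w+):\[(\d+)\]", target)
    if match is None:
        raise ValueError("not a namespace link: %r" % target)
    return match.group(1), int(match.group(2))


def parse_fdinfo(text):
    """Turn 'key: value' lines from /proc/PID/fdinfo/FD into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def owning_userns(ns_path):
    """Return the inum of the user namespace owning ns_path, or None.

    Relies on the 'userns' fdinfo field from the proof-of-concept patch
    (an assumption); without that field the result is None.
    """
    fd = os.open(ns_path, os.O_RDONLY)
    try:
        with open("/proc/self/fdinfo/%d" % fd) as handle:
            fields = parse_fdinfo(handle.read())
    finally:
        os.close(fd)
    value = fields.get("userns")
    return int(value) if value is not None else None
```

`parse_ns_link(os.readlink("/proc/self/ns/user"))` answers "which user namespace am I in?", which is all the readlink/sed pipeline gives you.  `owning_userns("/proc/self/ns/pid")` answers "which user namespace owns my PID namespace?", and the two inums need not match.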

> But what Michael was asking about was the parent user_ns of all the
> other namespaces ... I don't think there's any way we can get that
> out of any information in /proc/self/

If fdinfo only shows immediate parents, you'd need to walk the tree to
get back to the root (and at each layer of the PID namespace tree
there will be another user-namespace parent branching off).  With a
tree like:

  Namespace         | Parent       | Owning userns
  ------------------+--------------+------------------
  Root userns       | -            | -
  Root PID ns       | -            | Root userns
  Child userns      | Root userns  | Root userns
  Child PID ns      | Root PID ns  | Root userns
  Grandchild userns | Child userns | Child userns
  Grandchild PID ns | Child PID ns | Grandchild userns

Walking from the grandchild PID namespace would give you:

  Grandchild PID ns
  |-- Child PID ns
  |   |-- Root PID ns
  |   `-- Root userns
  `-- Grandchild userns
      `-- Child userns
          `-- Root userns
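
The walk can be sketched in a few lines of Python 3.  This is only an illustration: the table is typed in by hand from the example above, and the function names are made up.  Unlike the hand-drawn tree, it follows both links at every step, so it also lists the root userns under the root PID ns.

```python
# Namespace -> (parent, owning userns), copied from the example table.
NAMESPACES = {
    "Root userns": (None, None),
    "Root PID ns": (None, "Root userns"),
    "Child userns": ("Root userns", "Root userns"),
    "Child PID ns": ("Root PID ns", "Root userns"),
    "Grandchild userns": ("Child userns", "Child userns"),
    "Grandchild PID ns": ("Child PID ns", "Grandchild userns"),
}


def ancestors(name, table=NAMESPACES):
    """Return (name, [branches]) for everything reachable toward the root."""
    parent, owner = table[name]
    branches = []
    for link in (parent, owner):
        # For user namespaces the parent and the owner coincide; list it once.
        if link is not None and link not in [b[0] for b in branches]:
            branches.append(ancestors(link, table))
    return (name, branches)


def render(node, depth=0):
    """Flatten the nested tuples into lines indented two spaces per level."""
    name, branches = node
    lines = ["  " * depth + name]
    for branch in branches:
        lines.extend(render(branch, depth + 1))
    return lines
```

`print("\n".join(render(ancestors("Grandchild PID ns"))))` shows the two-parent branching at each PID namespace layer, which is the part that is awkward to flatten into fdinfo.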

If you only put one level in fdinfo, you're stuck if one of the
namespaces involved has neither bind mounts nor a PID to give you a
handle on it [1].  And if you want to put that whole ancestor tree in
fdinfo, you have to come up with some way to handle the two-parent
branching.

I'm also not sure how exposing nsfs information [2] would handle
namespaces that had neither a surviving bind mount nor a direct
process.

If all the information is available (possibly after a mechanical patch
[3] makes it more accessible), then it seems easier to put it in a
separate /proc or /sys file.  There was a stab at this for PID
namespaces in [4] (the same series that landed NStgid, etc.) with
additional background and alternative approaches in [5].  There were
problems with that patch (and it was trying to do more by also listing
a process's ID in each PID namespace), but the “let's put the whole
tree in a new file” approach seems sound to me.

Cheers,
Trevor

[1]: http://thread.gmane.org/gmane.linux.kernel.containers/30456/focus=20536
     Subject: Re: Introspecting userns relationships to other namespaces?
     Date: Thu, 7 Jul 2016 13:24:42 -0500
     Message-ID: <20160707182442.ga6...@mail.hallyn.com>
[2]: http://thread.gmane.org/gmane.linux.kernel.containers/30456/focus=30499
     Subject: Re: [CRIU] Introspecting userns relationships to other namespaces?
     Date: Thu, 07 Jul 2016 20:20:05 -0700
     Message-ID: <1467948005.2322.84.ca...@hansenpartnership.com>
[3]: http://thread.gmane.org/gmane.linux.kernel.containers/30456/focus=20537
     Subject: Re: Introspecting userns relationships to other namespaces?
     Date: Thu, 07 Jul 2016 08:01:52 -0700
     Message-ID: <1467903712.2347.16.ca...@hansenpartnership.com>
[4]: http://thread.gmane.org/gmane.linux.kernel.containers/28925/focus=28928
     Subject: [resend][PATCH v9 1/3] procfs: show hierarchy of pid namespace
     Date: Tue, 23 Dec 2014 18:20:37 +0800
     Message-ID: <1419330039-29207-2-git-send-email-chenhanx...@cn.fujitsu.com>
[5]: http://thread.gmane.org/gmane.linux.kernel.containers/28105
     Subject: [RFC]Pid conversion between pid namespace
     Date: Thu, 3 Jul 2014 12:18:33 +
     Message-ID: <5871495633F38949900D2BF2DC04883E55C374@G08CNEXMBPEKD02.g08.fujitsu.local>



Re: Introspecting userns relationships to other namespaces?

2016-07-07 Thread W. Trevor King
On Thu, Jul 07, 2016 at 08:26:47PM -0700, James Bottomley wrote:
> On Thu, 2016-07-07 at 20:00 -0700, Andrew Vagin wrote:
> > On Thu, Jul 07, 2016 at 07:16:18PM -0700, Andrew Vagin wrote:
> > > I think we can show all required information in fdinfo. We open
> > > a namespaces file (/proc/pid/ns/N) and then read
> > > /proc/pid/fdinfo/X for it.
> > 
> > Here is a proof-of-concept patch.
> > …
> > In [2]: fd = os.open("/proc/self/ns/pid", os.O_RDONLY)
> > 
> > In [3]: print open("/proc/self/fdinfo/%d" % fd).read()
> > pos:0
> > flags:  010
> > mnt_id: 2
> > userns: 4026531837
> > 
> > In [4]: print "/proc/self/ns/user -> %s" %
> > os.readlink("/proc/self/ns/user")
> > /proc/self/ns/user -> user:[4026531837]
> 
> can't you just do
> 
> readlink /proc/self/ns/user | sed 's/.*\[\(.*\)\]/\1/'

With Andrew's fdinfo approach you know the user namespace owning
/proc/self/ns/pid is 4026531837.  That happens to be
/proc/self/ns/user in this case, but doesn't have to be in general.
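
For illustration, a minimal Python 3 sketch of that lookup.  It assumes a
kernel carrying Andrew's proof-of-concept patch, so that fdinfo for a
namespace file has a "userns:" line; without the patch, owning_userns()
just returns None.  The helper names are made up for this example.

```python
import os
import re


def parse_fdinfo(text):
    """Parse the 'key: value' lines of an fdinfo file into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def ns_inode(link_target):
    """Extract the inode from a 'type:[inode]' readlink result, else None."""
    match = re.fullmatch(r"(\w+):\[(\d+)\]", link_target)
    return int(match.group(2)) if match else None


def owning_userns(ns_path):
    """Return the owning userns inode reported in fdinfo, or None if absent.

    Assumption: the 'userns' field only exists with the proof-of-concept
    patch applied.
    """
    fd = os.open(ns_path, os.O_RDONLY)
    try:
        with open("/proc/self/fdinfo/%d" % fd) as f:
            fields = parse_fdinfo(f.read())
    finally:
        os.close(fd)
    value = fields.get("userns")
    return int(value) if value is not None else None
```

Comparing owning_userns("/proc/self/ns/pid") with
ns_inode(os.readlink("/proc/self/ns/user")) then tells you whether the
PID namespace is owned by the user namespace you are currently in.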

> But what Michael was asking about was the parent user_ns of all the
> other namespaces ... I don't think there's any way we can get that
> out of any information in /proc/self/

If fdinfo only shows immediate parents, you'd need to walk the tree to
get back to the root.  And at each layer of the PID namespace tree
there will be another user-namespace parent branching off.  With a
tree like:

  Namespace         | Parent       | Owning userns
  ------------------+--------------+------------------
  Root userns       | -            | -
  Root PID ns       | -            | Root userns
  Child userns      | Root userns  | Root userns
  Child PID ns      | Root PID ns  | Root userns
  Grandchild userns | Child userns | Child userns
  Grandchild PID ns | Child PID ns | Grandchild userns

Walking from the grandchild PID namespace would give you:

  Grandchild PID ns
  |-- Child PID ns
  |   |-- Root PID ns
  |   `-- Root userns
  `-- Grandchild userns
      `-- Child userns
          `-- Root userns
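
A small Python 3 sketch of that walk, as a toy model only.  The dict just
transcribes the table above; nothing here talks to the kernel, and the
names are hypothetical.  It follows both the parent link and the
owning-userns link at every step, so unlike the drawing it also lists
Root userns underneath Root PID ns.

```python
# name -> (parent, owning userns), transcribed from the table above;
# None marks a link the table shows as "-".
NAMESPACES = {
    "Root userns": (None, None),
    "Root PID ns": (None, "Root userns"),
    "Child userns": ("Root userns", "Root userns"),
    "Child PID ns": ("Root PID ns", "Root userns"),
    "Grandchild userns": ("Child userns", "Child userns"),
    "Grandchild PID ns": ("Child PID ns", "Grandchild userns"),
}


def walk(name, namespaces=NAMESPACES, depth=0):
    """Yield (depth, name) for a namespace and everything reachable from it."""
    yield depth, name
    parent, owner = namespaces[name]
    links = [parent]
    if owner != parent:  # for a userns the owner is the parent; list it once
        links.append(owner)
    for link in links:
        if link is not None:
            yield from walk(link, namespaces, depth + 1)


if __name__ == "__main__":
    for depth, name in walk("Grandchild PID ns"):
        print("  " * depth + name)
```

Every PID namespace contributes two outgoing links, which is the
two-parent branching a flat fdinfo listing would have to encode somehow.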

If you only put one level in fdinfo, you're stuck if one of the
namespaces involved has neither bind mounts nor a PID to give you a
handle on it [1].  And if you want to put that whole ancestor tree in
fdinfo, you have to come up with some way to handle the two-parent
branching.

I'm also not sure how exposing nsfs information [2] would handle
namespaces that had neither a surviving bind mount nor a direct
process.

If all the information is available (possibly after a mechanical patch
[3] makes it more accessible), then it seems easier to put it in a
separate /proc or /sys file.  There was a stab at this for PID
namespaces in [4] (the same series that landed NStgid, etc.) with
additional background and alternative approaches in [5].  There were
problems with that patch (and it was trying to do more by also listing
a process's ID in each PID namespace), but the “let's put the whole
tree in a new file” approach seems sound to me.

Cheers,
Trevor

[1]: http://thread.gmane.org/gmane.linux.kernel.containers/30456/focus=20536
     Subject: Re: Introspecting userns relationships to other namespaces?
     Date: Thu, 7 Jul 2016 13:24:42 -0500
     Message-ID: <20160707182442.ga6...@mail.hallyn.com>
[2]: http://thread.gmane.org/gmane.linux.kernel.containers/30456/focus=30499
     Subject: Re: [CRIU] Introspecting userns relationships to other namespaces?
     Date: Thu, 07 Jul 2016 20:20:05 -0700
     Message-ID: <1467948005.2322.84.ca...@hansenpartnership.com>
[3]: http://thread.gmane.org/gmane.linux.kernel.containers/30456/focus=20537
     Subject: Re: Introspecting userns relationships to other namespaces?
     Date: Thu, 07 Jul 2016 08:01:52 -0700
     Message-ID: <1467903712.2347.16.ca...@hansenpartnership.com>
[4]: http://thread.gmane.org/gmane.linux.kernel.containers/28925/focus=28928
     Subject: [resend][PATCH v9 1/3] procfs: show hierarchy of pid namespace
     Date: Tue, 23 Dec 2014 18:20:37 +0800
     Message-ID: <1419330039-29207-2-git-send-email-chenhanx...@cn.fujitsu.com>
[5]: http://thread.gmane.org/gmane.linux.kernel.containers/28105
     Subject: [RFC]Pid conversion between pid namespace
     Date: Thu, 3 Jul 2014 12:18:33 +
     Message-ID: <5871495633F38949900D2BF2DC04883E55C374@G08CNEXMBPEKD02.g08.fujitsu.local>



Re: Introspecting userns relationships to other namespaces?

2016-07-07 Thread James Bottomley
On Thu, 2016-07-07 at 20:21 +0200, Michael Kerrisk (man-pages) wrote:
> On 7 July 2016 at 17:01, James Bottomley
>  wrote:
[Serge already answered the parenting issue]
> > On Thu, 2016-07-07 at 08:36 -0500, Serge E. Hallyn wrote:
> > > Hm.  Probably best-effort based on the process hierarchy.  So 
> > > yeah you could probably get a tree into a state that would be 
> > > wrongly recreated. Create a new netns, bind mount it, exit;  Have 
> > > another task create a new user_ns, bind mount it, exit;  Third 
> > > task setns()s first to the new netns then to the new user_ns.  I 
> > > suspect criu will recreate that wrongly.
> > 
> > This is a bit pathological, and you have to be root to do it: so 
> > root can set up a nesting hierarchy, bind it and destroy the pids 
> > but I know of no current orchestration system which does this.
> > 
> > Actually, I have to back pedal a bit: the way I currently set up
> > architecture emulation containers does precisely this: I set up the
> > namespaces unprivileged with child mount namespaces, but then I ask
> > root to bind the userns and kill the process that created it so I 
> > have a permanent handle to enter the namespace by, so I suspect 
> > that when our current orchestration systems get more sophisticated, 
> > they might eventually want to do something like this as well.
> > 
> > In theory, we could get nsfs to show this information as an option
> > (just add a show_options entry to the superblock ops), but the 
> > problem is that although each namespace has a parent user_ns, 
> > there's no way to get it without digging in the namespace specific 
> > structure.  Probably we should restructure to move it into 
> > ns_common, then we could display it (and enforce all namespaces 
> > having owning user_ns) but it would be a
> 
> I'm missing something here. Is it not already the case that all
> namespaces have an owning user_ns?

Um, yes, I don't believe I said they don't.  The problem I thought you
were having is that there's no way of seeing what it is.

nsfs is the namespace filesystem, where bound namespaces appear when you
cat /proc/self/mounts.  It can display any information that's in
ns_common (the common core of namespaces) but the owning user_ns
pointer currently isn't in this structure.  Every namespace has a
pointer to it, but they're all privately embedded in the individual
namespace specific structures.  What I was proposing was that since
every current namespace has a pointer somewhere to the owning user
namespace, we could abstract this out into ns_common so it's now
accessible to be displayed by nsfs, probably as a mount option.
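
To make the shape of that proposal concrete, here is a toy Python 3 model
(not kernel code).  The class names mirror the kernel structures only
loosely, the user_ns field on NsCommon is the hypothetical hoisted
pointer, and the "userns=" option string is invented for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UserNs:
    inum: int
    parent: Optional["UserNs"] = None


@dataclass
class NsCommon:
    """Toy stand-in for ns_common with the owner pointer hoisted into it."""
    inum: int
    user_ns: Optional[UserNs] = None  # hypothetical new common field


@dataclass
class PidNs:
    ns: NsCommon  # common core embedded in the type-specific structure
    level: int = 0


@dataclass
class NetNs:
    ns: NsCommon


def show_options(ns: NsCommon) -> str:
    """Format the owner for display without knowing the namespace type."""
    if ns.user_ns is None:
        return ""
    return ",userns=%d" % ns.user_ns.inum  # made-up option name
```

Because show_options() only touches NsCommon, the same code serves PID,
network, and any other namespace type.  That is the benefit being
weighed against the churn of the mechanical change.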

James




Re: Introspecting userns relationships to other namespaces?

2016-07-07 Thread Serge E. Hallyn
Quoting Michael Kerrisk (man-pages) (mtk.manpa...@gmail.com):
> On 7 July 2016 at 17:01, James Bottomley
>  wrote:
> > On Thu, 2016-07-07 at 08:36 -0500, Serge E. Hallyn wrote:
> >> Quoting Michael Kerrisk (man-pages) (mtk.manpa...@gmail.com):
> >> > Hi Serge,
> >> >
> >> > On 6 July 2016 at 16:13, Serge E. Hallyn  wrote:
> >> > > On Wed, Jul 06, 2016 at 10:41:48AM +0200, Michael Kerrisk (man
> >> > > -pages) wrote:
> >> > > > [Rats! Doing now what I should have down to start with. Looping
> >> > > > some lists and CRIU and other possibly relevant people into
> >> > > > this conversation]
> >> > > >
> >> > > > Hi Eric,
> >> > > >
> >> > > > On 5 July 2016 at 23:47, Eric W. Biederman <
> >> > > > ebied...@xmission.com> wrote:
> >> > > > > "Michael Kerrisk (man-pages)" 
> >> > > > > writes:
> >> > > > >
> >> > > > > > Hi Eric,
> >> > > > > >
> >> > > > > > I have a question. Is there any way currently to discover
> >> > > > > > which user namespace a particular nonuser namespace is
> >> > > > > > governed by? Maybe I am missing something, but there does
> >> > > > > > not seem to be a way to do this. Also, can one discover
> >> > > > > > which userns is the parent of a given userns? Again, I
> >> > > > > > can't see a way to do this.
> >> > > > > >
> >> > > > > > The point here is introspecting so that a process might
> >> > > > > > determine what its capabilities are when operating on some
> >> > > > > > resource governed by a (nonuser) namespace.
> >> > > > >
> >> > > > > To the best of my knowledge that there is not an interface to
> >> > > > > get that information.  It would be good to have such an
> >> > > > > interface for no other reason than the CRIU folks are going
> >> > > > > to need it at some point.  I am a bit surprised they have not
> >> > > > > complained yet.
> >> > >
> >> > > I don't think they need it.  They do in fact have what they need.
> >> > >   Assume you have tasks T1, T2, T1_1 and T2_1;  T1 and T2 are in
> >> > > init_user_ns;  T1 spawned T1_1 in a new userns;  T2 spawned T2_1
> >> > > which setns()d to T1_1's ns. There's some {handwave} uid mapping,
> >> > > does not matter.
> >> > >
> >> > > At restart, it doesn't matter which task originally created the
> >> > > new userns. criu knows T1_1 and T2_1 are in the same userns;  it
> >> > > creates the userns, sets up the mapping, and T1_1 and T2_1
> >> > > setns() to it.
> >> >
> >> > I'm missing something here. How does the parental relationships
> >> > between the user namespaces get reconstructed? Those relationships
> >> > will govern what capabilities a process will have in various user
> >> > namespaces.
> >
> > Actually, you get the parent namespace from the process tree by
> > tracking the user namespaces of the parent pids.   Currently non-root
> > users can't bind the namespace, so the only way to keep a new user_ns
> > around if you're not root is to keep the process around, so for
> > multiply nested user namespaces you can usually build the user_ns
> > hierarchy by looking at the process hierarchy.  Conversely, if the
> > process is reparented to init, chances are that the user_ns is also
> > parented to init_user_ns.
> 
> Yes, but "chances are" == this isn't robust.  PR_SET_CHILD_SUBREAPER
> further complicates things.
> 
> By the way, is that really what happens? Do child user namespaces get
> reparented to the grandparent ns if the parent ns disappears (i.e.,

The parent ns cannot disappear.  The child ns pins the creator's cred,
which pins the parent user_ns.
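
A toy Python 3 analogy of that pinning chain, for illustration only.
Python object references stand in for kernel reference counts, and the
class names are made up.  It models the chain exactly as described above
(child userns -> creator's cred -> parent userns) and nothing more.

```python
import gc
import weakref


class Cred:
    """Toy credential: holds a reference to the user namespace it lives in."""

    def __init__(self, user_ns):
        self.user_ns = user_ns


class UserNs:
    """Toy user namespace: holds a reference to its creator's credential."""

    def __init__(self, name, creator_cred=None):
        self.name = name
        self.creator_cred = creator_cred


def parent_survives_child():
    """Return (parent alive while child exists, parent alive after child is gone)."""
    parent = UserNs("parent")
    child = UserNs("child", creator_cred=Cred(parent))
    probe = weakref.ref(parent)  # observes the parent without keeping it alive

    del parent  # the last direct reference to the parent goes away
    gc.collect()
    alive_with_child = probe() is not None

    del child  # dropping the child releases the cred, and with it the parent
    gc.collect()
    alive_without_child = probe() is not None
    return alive_with_child, alive_without_child
```

As long as the child exists the parent cannot be freed, so there is
never a moment at which a child would need reparenting.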



Re: Introspecting userns relationships to other namespaces?

2016-07-07 Thread Michael Kerrisk (man-pages)
On 7 July 2016 at 17:01, James Bottomley
 wrote:
> On Thu, 2016-07-07 at 08:36 -0500, Serge E. Hallyn wrote:
>> Quoting Michael Kerrisk (man-pages) (mtk.manpa...@gmail.com):
>> > Hi Serge,
>> >
>> > On 6 July 2016 at 16:13, Serge E. Hallyn  wrote:
>> > > On Wed, Jul 06, 2016 at 10:41:48AM +0200, Michael Kerrisk (man
>> > > -pages) wrote:
>> > > > [Rats! Doing now what I should have down to start with. Looping
>> > > > some lists and CRIU and other possibly relevant people into
>> > > > this conversation]
>> > > >
>> > > > Hi Eric,
>> > > >
>> > > > On 5 July 2016 at 23:47, Eric W. Biederman <
>> > > > ebied...@xmission.com> wrote:
>> > > > > "Michael Kerrisk (man-pages)" 
>> > > > > writes:
>> > > > >
>> > > > > > Hi Eric,
>> > > > > >
>> > > > > > I have a question. Is there any way currently to discover
>> > > > > > which user namespace a particular nonuser namespace is
>> > > > > > governed by? Maybe I am missing something, but there does
>> > > > > > not seem to be a way to do this. Also, can one discover
>> > > > > > which userns is the parent of a given userns? Again, I
>> > > > > > can't see a way to do this.
>> > > > > >
>> > > > > > The point here is introspecting so that a process might
>> > > > > > determine what its capabilities are when operating on some
>> > > > > > resource governed by a (nonuser) namespace.
>> > > > >
>> > > > > To the best of my knowledge that there is not an interface to
>> > > > > get that information.  It would be good to have such an
>> > > > > interface for no other reason than the CRIU folks are going
>> > > > > to need it at some point.  I am a bit surprised they have not
>> > > > > complained yet.
>> > >
>> > > I don't think they need it.  They do in fact have what they need.
>> > >   Assume you have tasks T1, T2, T1_1 and T2_1;  T1 and T2 are in
>> > > init_user_ns;  T1 spawned T1_1 in a new userns;  T2 spawned T2_1
>> > > which setns()d to T1_1's ns. There's some {handwave} uid mapping,
>> > > does not matter.
>> > >
>> > > At restart, it doesn't matter which task originally created the
>> > > new userns. criu knows T1_1 and T2_1 are in the same userns;  it
>> > > creates the userns, sets up the mapping, and T1_1 and T2_1
>> > > setns() to it.
>> >
>> > I'm missing something here. How does the parental relationships
>> > between the user namespaces get reconstructed? Those relationships
>> > will govern what capabilities a process will have in various user
>> > namespaces.
>
> Actually, you get the parent namespace from the process tree by
> tracking the user namespaces of the parent pids.   Currently non-root
> users can't bind the namespace, so the only way to keep a new user_ns
> around if you're not root is to keep the process around, so for
> multiply nested user namespaces you can usually build the user_ns
> hierarchy by looking at the process hierarchy.  Conversely, if the
> process is reparented to init, chances are that the user_ns is also
> parented to init_user_ns.

Yes, but "chances are" == this isn't robust.  PR_SET_CHILD_SUBREAPER
further complicates things.

By the way, is that really what happens? Do child user namespaces get
reparented to the grandparent ns if the parent ns disappears (i.e.,
ceases to have any members and no bind mounts)? I hadn't thought about
that scenario before. It may be worth documenting in
user_namespaces(7).

>> Hm.  Probably best-effort based on the process hierarchy.  So yeah
>> you could probably get a tree into a state that would be wrongly
>> recreated. Create a new netns, bind mount it, exit;  Have another
>> task create a new user_ns, bind mount it, exit;  Third task setns()s
>> first to the new netns then to the new user_ns.  I suspect criu will
>> recreate that wrongly.
>
> This is a bit pathological, and you have to be root to do it: so root
> can set up a nesting hierarchy, bind it and destroy the pids but I know
> of no current orchestration system which does this.
>
> Actually, I have to back pedal a bit: the way I currently set up
> architecture emulation containers does precisely this: I set up the
> namespaces unprivileged with child mount namespaces, but then I ask
> root to bind the userns and kill the process that created it so I have
> a permanent handle to enter the namespace by, so I suspect that when
> our current orchestration systems get more sophisticated, they might
> eventually want to do something like this as well.
>
> In theory, we could get nsfs to show this information as an option
> (just add a show_options entry to the superblock ops), but the problem
> is that although each namespace has a parent user_ns, there's no way to
> get it without digging in the namespace specific structure.  Probably
> we should restructure to move it into ns_common, then we could display
> it (and enforce all namespaces having owning user_ns) but it would be a

I'm missing something here. Is it not already the case that all
namespaces have an owning user_ns?

Cheers,

Michael

> reasonably large (but 

Re: Introspecting userns relationships to other namespaces?

2016-07-07 Thread James Bottomley
On Thu, 2016-07-07 at 08:36 -0500, Serge E. Hallyn wrote:
> Quoting Michael Kerrisk (man-pages) (mtk.manpa...@gmail.com):
> > Hi Serge,
> > 
> > On 6 July 2016 at 16:13, Serge E. Hallyn  wrote:
> > > On Wed, Jul 06, 2016 at 10:41:48AM +0200, Michael Kerrisk (man
> > > -pages) wrote:
> > > > [Rats! Doing now what I should have down to start with. Looping 
> > > > some lists and CRIU and other possibly relevant people into 
> > > > this conversation]
> > > > 
> > > > Hi Eric,
> > > > 
> > > > On 5 July 2016 at 23:47, Eric W. Biederman <
> > > > ebied...@xmission.com> wrote:
> > > > > "Michael Kerrisk (man-pages)" 
> > > > > writes:
> > > > > 
> > > > > > Hi Eric,
> > > > > > 
> > > > > > I have a question. Is there any way currently to discover 
> > > > > > which user namespace a particular nonuser namespace is 
> > > > > > governed by? Maybe I am missing something, but there does 
> > > > > > not seem to be a way to do this. Also, can one discover 
> > > > > > which userns is the parent of a given userns? Again, I 
> > > > > > can't see a way to do this.
> > > > > > 
> > > > > > The point here is introspecting so that a process might 
> > > > > > determine what its capabilities are when operating on some 
> > > > > > resource governed by a (nonuser) namespace.
> > > > > 
> > > > > To the best of my knowledge that there is not an interface to 
> > > > > get that information.  It would be good to have such an 
> > > > > interface for no other reason than the CRIU folks are going 
> > > > > to need it at some point.  I am a bit surprised they have not
> > > > > complained yet.
> > > 
> > > I don't think they need it.  They do in fact have what they need.
> > >   Assume you have tasks T1, T2, T1_1 and T2_1;  T1 and T2 are in
> > > init_user_ns;  T1 spawned T1_1 in a new userns;  T2 spawned T2_1 
> > > which setns()d to T1_1's ns. There's some {handwave} uid mapping,
> > > does not matter.
> > > 
> > > At restart, it doesn't matter which task originally created the 
> > > new userns. criu knows T1_1 and T2_1 are in the same userns;  it 
> > > creates the userns, sets up the mapping, and T1_1 and T2_1
> > > setns() to it.
> > 
> > I'm missing something here. How does the parental relationships
> > between the user namespaces get reconstructed? Those relationships
> > will govern what capabilities a process will have in various user
> > namespaces.

Actually, you get the parent namespace from the process tree by
tracking the user namespaces of the parent pids.  Currently non-root
users can't bind the namespace, so the only way to keep a new user_ns
around if you're not root is to keep the process around, so for
multiply nested user namespaces you can usually build the user_ns
hierarchy by looking at the process hierarchy.  Conversely, if the
process is reparented to init, chances are that the user_ns is also
parented to init_user_ns.
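
A rough Python 3 sketch of that best-effort reconstruction, under stated
assumptions: it only sees namespaces that still have a process in them,
it needs permission to read /proc/<pid>/ns/user, and a process whose
parent sits in a different userns is merely guessed to live in a child
of the parent's userns.  The function names are hypothetical.

```python
import os
import re


def read_proc_snapshot(proc="/proc"):
    """Return {pid: (ppid, userns_inode)} for every readable process."""
    snapshot = {}
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            link = os.readlink(os.path.join(proc, entry, "ns", "user"))
            with open(os.path.join(proc, entry, "status")) as f:
                status = f.read()
        except OSError:  # process exited, or we lack permission
            continue
        ppid = re.search(r"^PPid:\s+(\d+)$", status, re.M)
        inode = re.fullmatch(r"user:\[(\d+)\]", link)
        if ppid and inode:
            snapshot[int(entry)] = (int(ppid.group(1)), int(inode.group(1)))
    return snapshot


def guess_userns_parents(snapshot):
    """Guess {child_userns: {candidate parent userns}} from the process tree."""
    guesses = {}
    for pid, (ppid, userns) in snapshot.items():
        parent = snapshot.get(ppid)
        if parent is not None and parent[1] != userns:
            guesses.setdefault(userns, set()).add(parent[1])
    return guesses
```

A userns that ends up with more than one candidate parent, or whose only
member has been reparented to init, is exactly the case where this
heuristic can get the hierarchy wrong.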

> Hm.  Probably best-effort based on the process hierarchy.  So yeah
> you could probably get a tree into a state that would be wrongly
> recreated. Create a new netns, bind mount it, exit;  Have another 
> task create a new user_ns, bind mount it, exit;  Third task setns()s 
> first to the new netns then to the new user_ns.  I suspect criu will 
> recreate that wrongly.

This is a bit pathological, and you have to be root to do it: so root
can set up a nesting hierarchy, bind it and destroy the pids but I know
of no current orchestration system which does this.

Actually, I have to back pedal a bit: the way I currently set up
architecture emulation containers does precisely this: I set up the
namespaces unprivileged with child mount namespaces, but then I ask
root to bind the userns and kill the process that created it so I have
a permanent handle to enter the namespace by, so I suspect that when
our current orchestration systems get more sophisticated, they might
eventually want to do something like this as well.

In theory, we could get nsfs to show this information as an option
(just add a show_options entry to the superblock ops), but the problem
is that although each namespace has a parent user_ns, there's no way to
get it without digging in the namespace specific structure.  Probably
we should restructure to move it into ns_common, then we could display
it (and enforce all namespaces having owning user_ns) but it would be a
reasonably large (but mechanical) change.

James



Re: Introspecting userns relationships to other namespaces?

2016-07-07 Thread Serge E. Hallyn
Quoting Michael Kerrisk (man-pages) (mtk.manpa...@gmail.com):
> Hi Serge,
> 
> On 6 July 2016 at 16:13, Serge E. Hallyn  wrote:
> > On Wed, Jul 06, 2016 at 10:41:48AM +0200, Michael Kerrisk (man-pages) wrote:
> >> [Rats! Doing now what I should have done to start with. Looping some
> >> lists and CRIU and other possibly relevant people into this
> >> conversation]
> >>
> >> Hi Eric,
> >>
> >> On 5 July 2016 at 23:47, Eric W. Biederman  wrote:
> >> > "Michael Kerrisk (man-pages)"  writes:
> >> >
> >> >> Hi Eric,
> >> >>
> >> >> I have a question. Is there any way currently to discover which
> >> >> user namespace a particular nonuser namespace is governed by?
> >> >> Maybe I am missing something, but there does not seem to be a
> >> >> way to do this. Also, can one discover which userns is the
> >> >> parent of a given userns? Again, I can't see a way to do this.
> >> >>
> >> >> The point here is introspecting so that a process might determine
> >> >> what its capabilities are when operating on some resource governed
> >> >> by a (nonuser) namespace.
> >> >
> >> > To the best of my knowledge there is not an interface to get that
> >> > information.  It would be good to have such an interface for no other
> >> > reason than the CRIU folks are going to need it at some point.  I am a
> >> > bit surprised they have not complained yet.
> >
> > I don't think they need it.  They do in fact have what they need.  Assume
> > you have tasks T1, T2, T1_1 and T2_1;  T1 and T2 are in init_user_ns;  T1
> > spawned T1_1 in a new userns;  T2 spawned T2_1 which setns()d to T1_1's ns.
> > There's some {handwave} uid mapping, does not matter.
> >
> > At restart, it doesn't matter which task originally created the new userns.
> > criu knows T1_1 and T2_1 are in the same userns;  it creates the userns, sets
> > up the mapping, and T1_1 and T2_1 setns() to it.
> 
> I'm missing something here. How do the parental relationships
> between the user namespaces get reconstructed? Those relationships
> will govern what capabilities a process will have in various user
> namespaces.

Hm.  Probably best-effort based on the process hierarchy.  So yeah you
could probably get a tree into a state that would be wrongly recreated.
Create a new netns, bind mount it, exit;  Have another task create a
new user_ns, bind mount it, exit;  Third task setns()s first to the new
netns then to the new user_ns.  I suspect criu will recreate that
wrongly.
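
A minimal sketch of what "best-effort based on the process hierarchy"
could look like, and of how a bind-mounted user_ns defeats it.  This is
a toy model, not criu's algorithm: the function name, the input layout
({pid: (ppid, userns_id)}) and the namespace labels are all assumptions
made up for illustration.

```python
INIT_USERNS = "init_user_ns"  # illustrative label, not a kernel identifier

def infer_userns_parents(procs):
    """Guess each user namespace's parent from the process tree.

    procs maps pid -> (ppid, userns_id).  Returns {userns_id: guessed_parent}.
    The guess is the userns of the nearest ancestor process that lives in a
    different userns, falling back to init_user_ns if there is none.
    """
    guesses = {}
    for pid, (ppid, ns) in procs.items():
        if ns == INIT_USERNS or ns in guesses:
            continue
        parent_ns = INIT_USERNS
        cur = ppid
        while cur in procs:  # walk up until we leave this userns
            anc_ppid, anc_ns = procs[cur]
            if anc_ns != ns:
                parent_ns = anc_ns
                break
            cur = anc_ppid
        guesses[ns] = parent_ns
    return guesses

# Nested case where the creating processes are still alive: guess is right.
nested = {1: (0, INIT_USERNS), 100: (1, "A"), 101: (100, "B")}

# Pathological case: "B" was created inside "A" and bind mounted, its
# creator exited, and an unrelated child of init setns()d into it.
orphaned = {1: (0, INIT_USERNS), 300: (1, "B")}
```

In the second table the heuristic reports init_user_ns as the parent of
"B" although it was really created inside "A", which is the kind of
wrong recreation described above.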


Re: Introspecting userns relationships to other namespaces?

2016-07-07 Thread Michael Kerrisk (man-pages)
Hi Serge,

On 6 July 2016 at 16:13, Serge E. Hallyn  wrote:
> On Wed, Jul 06, 2016 at 10:41:48AM +0200, Michael Kerrisk (man-pages) wrote:
>> [Rats! Doing now what I should have done to start with. Looping some
>> lists and CRIU and other possibly relevant people into this
>> conversation]
>>
>> Hi Eric,
>>
>> On 5 July 2016 at 23:47, Eric W. Biederman  wrote:
>> > "Michael Kerrisk (man-pages)"  writes:
>> >
>> >> Hi Eric,
>> >>
>> >> I have a question. Is there any way currently to discover which
>> >> user namespace a particular nonuser namespace is governed by?
>> >> Maybe I am missing something, but there does not seem to be a
>> >> way to do this. Also, can one discover which userns is the
>> >> parent of a given userns? Again, I can't see a way to do this.
>> >>
>> >> The point here is introspecting so that a process might determine
>> >> what its capabilities are when operating on some resource governed
>> >> by a (nonuser) namespace.
>> >
>> > To the best of my knowledge there is not an interface to get that
>> > information.  It would be good to have such an interface for no other
>> > reason than the CRIU folks are going to need it at some point.  I am a
>> > bit surprised they have not complained yet.
>
> I don't think they need it.  They do in fact have what they need.  Assume
> you have tasks T1, T2, T1_1 and T2_1;  T1 and T2 are in init_user_ns;  T1
> spawned T1_1 in a new userns;  T2 spawned T2_1 which setns()d to T1_1's ns.
> There's some {handwave} uid mapping, does not matter.
>
> At restart, it doesn't matter which task originally created the new userns.
> criu knows T1_1 and T2_1 are in the same userns;  it creates the userns, sets
> up the mapping, and T1_1 and T2_1 setns() to it.

I'm missing something here. How do the parental relationships
between the user namespaces get reconstructed? Those relationships
will govern what capabilities a process will have in various user
namespaces.

Cheers,

Michael


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/


Re: Introspecting userns relationships to other namespaces?

2016-07-06 Thread Eric W. Biederman
"Serge E. Hallyn"  writes:

> On Wed, Jul 06, 2016 at 10:41:48AM +0200, Michael Kerrisk (man-pages) wrote:
>> [Rats! Doing now what I should have done to start with. Looping some
>> lists and CRIU and other possibly relevant people into this
>> conversation]
>> 
>> Hi Eric,
>> 
>> On 5 July 2016 at 23:47, Eric W. Biederman  wrote:
>> > "Michael Kerrisk (man-pages)"  writes:
>> >
>> >> Hi Eric,
>> >>
>> >> I have a question. Is there any way currently to discover which
>> >> user namespace a particular nonuser namespace is governed by?
>> >> Maybe I am missing something, but there does not seem to be a
>> >> way to do this. Also, can one discover which userns is the
>> >> parent of a given userns? Again, I can't see a way to do this.
>> >>
>> >> The point here is introspecting so that a process might determine
>> >> what its capabilities are when operating on some resource governed
>> >> by a (nonuser) namespace.
>> >
>> > To the best of my knowledge there is not an interface to get that
>> > information.  It would be good to have such an interface for no other
>> > reason than the CRIU folks are going to need it at some point.  I am a
>> > bit surprised they have not complained yet.
>
> I don't think they need it.  They do in fact have what they need.  Assume
> you have tasks T1, T2, T1_1 and T2_1;  T1 and T2 are in init_user_ns;  T1
> spawned T1_1 in a new userns;  T2 spawned T2_1 which setns()d to T1_1's ns.
> There's some {handwave} uid mapping, does not matter.
>
> At restart, it doesn't matter which task originally created the new userns.
> criu knows T1_1 and T2_1 are in the same userns;  it creates the userns, sets
> up the mapping, and T1_1 and T2_1 setns() to it.

Given that the simple cases are so easy it probably doesn't matter in
that sense.

However we now have the case where user namespaces own pid namespaces,
and uts namespaces, and network namespaces, and ipc namespaces, and
filesystems.  Throw in some mount propagation and use of setns and
things could get confusing.   It is something that will need to be
figured out if CRIU is going to properly checkpoint containers
containing containers containing containers containing containers.

Did I mention I like recursion?

>> > That said in a normal use scenario I don't think that information is
>> > needed.
>> >
>> > Do you have a particular use case besides checkpoint/restart where this
>> > is useful?  That might help in coming up with a good userspace interface
>> > for this information.
>> 
>> So, I spend a moderate amount of time working with people to introduce
>> them to the namespaces infrastructure, and one topic that comes up now
>> and then is introspection/visualization tools. For example,
>> nowadays--thanks to the (bizarrely misnamed) NStgid and NSpid fields
>> in /proc/PID/status--it's possible to (and someone I was working with
>> did) write tools that introspect the PID namespace hierarchy to show all
>> of the processes and their PIDs in the various namespace instances. It's a
>> natural enough thing to want to do, when confronted with the
>> complexity of the namespaces.
>> 
>> Someone else then asked me a question that led me to wonder about
>> generally introspecting on the parental relationships between user
>> namespaces and the association of other namespaces types with user
>> namespaces. One use would be visualization, in order to understand the
>> running system. Another would be to answer the question I already
>> mentioned: what capability does process X have to perform operations
>> on a resource governed by namespace Y?
>
> I agree they'll probably want it, but if we wait for a real need and
> use case we can do a better job of providing what's needed.

That too, which is why I mentioned CRIU.  But yeah it will probably take
a little while to get there.

Eric


Re: Introspecting userns relationships to other namespaces?

2016-07-06 Thread Serge E. Hallyn
On Wed, Jul 06, 2016 at 10:41:48AM +0200, Michael Kerrisk (man-pages) wrote:
> [Rats! Doing now what I should have done to start with. Looping some
> lists and CRIU and other possibly relevant people into this
> conversation]
> 
> Hi Eric,
> 
> On 5 July 2016 at 23:47, Eric W. Biederman  wrote:
> > "Michael Kerrisk (man-pages)"  writes:
> >
> >> Hi Eric,
> >>
> >> I have a question. Is there any way currently to discover which
> >> user namespace a particular nonuser namespace is governed by?
> >> Maybe I am missing something, but there does not seem to be a
> >> way to do this. Also, can one discover which userns is the
> >> parent of a given userns? Again, I can't see a way to do this.
> >>
> >> The point here is introspecting so that a process might determine
> >> what its capabilities are when operating on some resource governed
> >> by a (nonuser) namespace.
> >
> > To the best of my knowledge there is not an interface to get that
> > information.  It would be good to have such an interface for no other
> > reason than the CRIU folks are going to need it at some point.  I am a
> > bit surprised they have not complained yet.

I don't think they need it.  They do in fact have what they need.  Assume
you have tasks T1, T2, T1_1 and T2_1;  T1 and T2 are in init_user_ns;  T1
spawned T1_1 in a new userns;  T2 spawned T2_1 which setns()d to T1_1's ns.
There's some {handwave} uid mapping, does not matter.

At restart, it doesn't matter which task originally created the new userns.
criu knows T1_1 and T2_1 are in the same userns;  it creates the userns, sets
up the mapping, and T1_1 and T2_1 setns() to it.
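
A rough sketch of the "knows T1_1 and T2_1 are in the same userns" step,
assuming namespace identity is read from the /proc/PID/ns/user symlink
(whose target looks like "user:[4026531837]").  This is not necessarily
how criu does it; the helper names and the injectable readlink
parameter are made up here so the grouping can be tried without real
processes.

```python
import os
from collections import defaultdict

def userns_id(pid, readlink=os.readlink):
    """Return the user namespace identity string for a pid."""
    return readlink(f"/proc/{pid}/ns/user")

def group_by_userns(pids, readlink=os.readlink):
    """Group pids by the user namespace they are members of.

    Pids that vanish or cannot be inspected (OSError) are skipped.
    """
    groups = defaultdict(list)
    for pid in pids:
        try:
            groups[userns_id(pid, readlink)].append(pid)
        except OSError:
            continue
    return dict(groups)
```

Note that this only says which tasks share a userns.  It says nothing
about which userns is the parent of which, and that is the part the
rest of the thread is about.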

> > That said in a normal use scenario I don't think that information is
> > needed.
> >
> > Do you have a particular use case besides checkpoint/restart where this
> > is useful?  That might help in coming up with a good userspace interface
> > for this information.
> 
> So, I spend a moderate amount of time working with people to introduce
> them to the namespaces infrastructure, and one topic that comes up now
> and then is introspection/visualization tools. For example,
> nowadays--thanks to the (bizarrely misnamed) NStgid and NSpid fields
> in /proc/PID/status--it's possible to (and someone I was working with
> did) write tools that introspect the PID namespace hierarchy to show all
> of the processes and their PIDs in the various namespace instances. It's a
> natural enough thing to want to do, when confronted with the
> complexity of the namespaces.
> 
> Someone else then asked me a question that led me to wonder about
> generally introspecting on the parental relationships between user
> namespaces and the association of other namespaces types with user
> namespaces. One use would be visualization, in order to understand the
> running system. Another would be to answer the question I already
> mentioned: what capability does process X have to perform operations
> on a resource governed by namespace Y?

I agree they'll probably want it, but if we wait for a real need and
use case we can do a better job of providing what's needed.

-serge


Re: Introspecting userns relationships to other namespaces?

2016-07-06 Thread Michael Kerrisk (man-pages)
[Rats! Doing now what I should have done to start with. Looping some
lists and CRIU and other possibly relevant people into this
conversation]

Hi Eric,

On 5 July 2016 at 23:47, Eric W. Biederman  wrote:
> "Michael Kerrisk (man-pages)"  writes:
>
>> Hi Eric,
>>
>> I have a question. Is there any way currently to discover which
>> user namespace a particular nonuser namespace is governed by?
>> Maybe I am missing something, but there does not seem to be a
>> way to do this. Also, can one discover which userns is the
>> parent of a given userns? Again, I can't see a way to do this.
>>
>> The point here is introspecting so that a process might determine
>> what its capabilities are when operating on some resource governed
>> by a (nonuser) namespace.
>
> To the best of my knowledge there is not an interface to get that
> information.  It would be good to have such an interface for no other
> reason than the CRIU folks are going to need it at some point.  I am a
> bit surprised they have not complained yet.
>
> That said in a normal use scenario I don't think that information is
> needed.
>
> Do you have a particular use case besides checkpoint/restart where this
> is useful?  That might help in coming up with a good userspace interface
> for this information.

So, I spend a moderate amount of time working with people to introduce
them to the namespaces infrastructure, and one topic that comes up now
and then is introspection/visualization tools. For example,
nowadays--thanks to the (bizarrely misnamed) NStgid and NSpid fields
in /proc/PID/status--it's possible to (and someone I was working with
did) write tools that introspect the PID namespace hierarchy to show all
of the processes and their PIDs in the various namespace instances. It's a
natural enough thing to want to do, when confronted with the
complexity of the namespaces.
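
For illustration, a minimal sketch of the parsing such a tool needs,
assuming the NStgid and NSpid lines of /proc/PID/status each carry one
ID per nested PID namespace, outermost first.  The function names are
hypothetical and this is not the tool mentioned above.

```python
from pathlib import Path

def parse_ns_ids(status_text):
    """Extract the NStgid and NSpid lists from /proc/PID/status text.

    Each list has one entry per PID namespace the task is visible in,
    outermost namespace first.
    """
    fields = {}
    for line in status_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key in ("NStgid", "NSpid"):
            fields[key] = [int(tok) for tok in value.split()]
    return fields

def read_ns_ids(pid="self"):
    """Read and parse the status file of a live process (Linux only)."""
    return parse_ns_ids(Path(f"/proc/{pid}/status").read_text())
```

The length of the NSpid list gives the nesting depth of the task's PID
namespace, which is enough to draw the PID namespace hierarchy but
gives no hint about user namespace ownership.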

Someone else then asked me a question that led me to wonder about
generally introspecting on the parental relationships between user
namespaces and the association of other namespaces types with user
namespaces. One use would be visualization, in order to understand the
running system. Another would be to answer the question I already
mentioned: what capability does process X have to perform operations
on a resource governed by namespace Y?
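
To make that last question concrete, here is a simplified model of the
check as I understand it: walk up from the user namespace that owns
namespace Y towards the process's own user namespace.  The class and
function names are hypothetical, the model ignores LSMs and everything
else outside the userns rules, and it takes the owning userns of Y as
an input, which is exactly the piece userspace cannot currently look
up.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(eq=False)
class UserNS:
    name: str
    parent: Optional["UserNS"] = None
    owner_uid: int = 0  # euid of the creator, as seen in the parent

    @property
    def level(self):
        return 0 if self.parent is None else self.parent.level + 1

@dataclass
class Proc:
    userns: UserNS
    euid: int
    effective_caps: frozenset = frozenset()

def has_capability(proc, target, cap):
    """Does proc hold cap with respect to the user namespace `target`?"""
    ns = target
    while ns is not None:
        if ns is proc.userns:
            # Same namespace: the effective set decides.
            return cap in proc.effective_caps
        if ns.level <= proc.userns.level:
            # Target is not a descendant of the process's namespace.
            return False
        if ns.parent is proc.userns and ns.owner_uid == proc.euid:
            # The owner of a child namespace has all capabilities in it.
            return True
        ns = ns.parent  # capabilities carry down into descendants
    return False
```

With this model an unprivileged uid 1000 in the initial namespace that
created a child userns has every capability in that child and in
anything nested below it, and none in the initial namespace itself.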

Cheers,

Michael




-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

