Re: Introspecting userns relationships to other namespaces?
"W. Trevor King" writes:

> On Thu, Jul 07, 2016 at 08:01:52AM -0700, James Bottomley wrote:
>> In theory, we could get nsfs to show this information as an option
>> (just add a show_options entry to the superblock ops), but the
>> problem is that although each namespace has a parent user_ns,
>> there's no way to get it without digging in the namespace specific
>> structure. Probably we should restructure to move it into
>> ns_common, then we could display it (and enforce all namespaces
>> having owning user_ns) but it would be a reasonably large (but
>> mechanical) change.
>
> It sounds like everyone is either positive or neutral on this
> groundwork, even if we haven't decided if/how to expose the
> information to userspace. I'm happy to work up a patch while the rest
> of the discussion continues. I'm also happy to let someone else work
> up the patch, if anyone else is chomping at the bit ;).

I am dubious on moving all of the user namespace members into
ns_common. I would be happy to be proved wrong, but I suspect that in
the cases where we actually use that user namespace the code will
become uglier. Making the ordinary uses uglier to make a rare corner
case nicer is the wrong trade-off.

But feel free to try; it is certainly worth doing if it doesn't make
the code that uses the user namespaces uglier.

Eric
Re: Introspecting userns relationships to other namespaces?
On Thu, Jul 07, 2016 at 08:01:52AM -0700, James Bottomley wrote:
> In theory, we could get nsfs to show this information as an option
> (just add a show_options entry to the superblock ops), but the
> problem is that although each namespace has a parent user_ns,
> there's no way to get it without digging in the namespace specific
> structure. Probably we should restructure to move it into
> ns_common, then we could display it (and enforce all namespaces
> having owning user_ns) but it would be a reasonably large (but
> mechanical) change.

It sounds like everyone is either positive or neutral on this
groundwork, even if we haven't decided if/how to expose the
information to userspace. I'm happy to work up a patch while the rest
of the discussion continues. I'm also happy to let someone else work
up the patch, if anyone else is chomping at the bit ;).

Cheers,
Trevor

--
This email may be signed or encrypted with GnuPG (http://www.gnupg.org).
For more information, see http://en.wikipedia.org/wiki/Pretty_Good_Privacy
Re: Introspecting userns relationships to other namespaces?
On 07/07/2016 09:17 PM, James Bottomley wrote:
> On Thu, 2016-07-07 at 20:21 +0200, Michael Kerrisk (man-pages) wrote:
>> On 7 July 2016 at 17:01, James Bottomley wrote:
>>> [Serge already answered the parenting issue]
>>>
>>> On Thu, 2016-07-07 at 08:36 -0500, Serge E. Hallyn wrote:
>>>> Hm. Probably best-effort based on the process hierarchy. So
>>>> yeah you could probably get a tree into a state that would be
>>>> wrongly recreated. Create a new netns, bind mount it, exit;
>>>> Have another task create a new user_ns, bind mount it, exit;
>>>> Third task setns()s first to the new netns then to the new
>>>> user_ns. I suspect criu will recreate that wrongly.
>>>
>>> This is a bit pathological, and you have to be root to do it: so
>>> root can set up a nesting hierarchy, bind it and destroy the pids
>>> but I know of no current orchestration system which does this.
>>>
>>> Actually, I have to back pedal a bit: the way I currently set up
>>> architecture emulation containers does precisely this: I set up
>>> the namespaces unprivileged with child mount namespaces, but then
>>> I ask root to bind the userns and kill the process that created it
>>> so I have a permanent handle to enter the namespace by, so I
>>> suspect that when our current orchestration systems get more
>>> sophisticated, they might eventually want to do something like
>>> this as well.
>>>
>>> In theory, we could get nsfs to show this information as an option
>>> (just add a show_options entry to the superblock ops), but the
>>> problem is that although each namespace has a parent user_ns,
>>> there's no way to get it without digging in the namespace specific
>>> structure. Probably we should restructure to move it into
>>> ns_common, then we could display it (and enforce all namespaces
>>> having owning user_ns) but it would be a
>>
>> I'm missing something here. Is it not already the case that all
>> namespaces have an owning user_ns?
>
> Um, yes, I don't believe I said they don't. The problem I thought you
> were having is that there's no way of seeing what it is.

Your words "and enforce all namespaces having owning user_ns" were
what left me puzzled--it sounded to me that the implication was that
this is not "enforced" right now.

Cheers,

Michael

--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
Re: Introspecting userns relationships to other namespaces?
On Thu, Jul 07, 2016 at 11:54:54PM -0700, Andrew Vagin wrote:
> On Thu, Jul 07, 2016 at 10:26:50PM -0700, W. Trevor King wrote:
> > On Thu, Jul 07, 2016 at 08:26:47PM -0700, James Bottomley wrote:
> > > On Thu, 2016-07-07 at 20:00 -0700, Andrew Vagin wrote:
> > > > On Thu, Jul 07, 2016 at 07:16:18PM -0700, Andrew Vagin wrote:
> > > > > I think we can show all required information in fdinfo. We open
> > > > > a namespaces file (/proc/pid/ns/N) and then read
> > > > > /proc/pid/fdinfo/X for it.
> > > >
> > > > Here is a proof-of-concept patch.
> > > > …
> > > > In [2]: fd = os.open("/proc/self/ns/pid", os.O_RDONLY)
> > > >
> > > > In [3]: print open("/proc/self/fdinfo/%d" % fd).read()
> > > > pos:    0
> > > > flags:  010
> > > > mnt_id: 2
> > > > userns: 4026531837
> > > >
> > > > In [4]: print "/proc/self/ns/user -> %s" %
> > > > os.readlink("/proc/self/ns/user")
> > > > /proc/self/ns/user -> user:[4026531837]
> > >
> > > can't you just do
> > >
> > > readlink /proc/self/ns/user | sed 's/.*\[\(.*\)\]/\1/'
> > …
> > If you only put one level in fdinfo, you're stuck if one of the
> > namespaces involved has neither bind mounts nor a PID to give you
> > handle on it [1]. And if you want to put that whole ancestor tree in
> > fdinfo, you have to come up with some way to handle the two-parent
> > branching.
>
> I think it's a bad idea to draw a tree in fdinfo. Why do we want to know
> this hierarchy? Probably we will want to access these namespaces (setns),
> in this case we need to have a way to open them.
>
> Maybe we need to extend functionality of the nsfs filesystem
> (something like /proc/PID for namespaces)?

A similar idea came up during the PID-translation brainstorming [1],
but I'm not sure if anything ever came of that. Once you're dealing
with a separate pseudo-filesystem, it seems easier to decouple it from
proc and just make a mountable namespace-hierarchy filesystem (like we
have mountable cgroup hierarchy filesystems). That also gets you an
opt-in playground while the details of the nsfs filesystem view are
worked out. Are you imagining something like:

  $ tree .
  .
  ├── mnt{inum}
  │   └── user -> ../user{inum}
  ├── pid{inum}
  │   ├── pid{inum}
  │   │   └── user -> ../../user{inum}/user{inum}
  │   └── user -> ../user{inum}
  └── user{inum}
      └── user{inum}

Cheers,
Trevor

[1]: http://thread.gmane.org/gmane.linux.kernel.containers/28105/focus=28164
     Subject: RE: [RFC]Pid conversion between pid namespace
     Date: Fri, 25 Jul 2014 10:01:45 +
     Message-ID: <5871495633F38949900D2BF2DC04883E56C7A2@G08CNEXMBPEKD02.g08.fujitsu.local>
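For anyone replaying the quoted fdinfo/readlink exchange, here is a minimal Python 3 sketch of the two parsing steps. The helper names parse_ns_link and parse_fdinfo are made up for illustration, and the "userns:" field only exists with Andrew's proof-of-concept patch applied; it is an assumption here, not something a stock kernel is known to print.

```python
import os
import re

# Matches link targets such as "user:[4026531837]" or "pid:[4026531836]".
_NS_LINK = re.compile(r"^(?P<type>[a-z_]+):\[(?P<inum>\d+)\]$")


def parse_ns_link(target):
    """Split a /proc/PID/ns/* link target into (type, inode number).

    Python equivalent of the quoted sed expression.
    """
    match = _NS_LINK.match(target)
    if match is None:
        raise ValueError("unexpected namespace link: %r" % target)
    return match.group("type"), int(match.group("inum"))


def parse_fdinfo(text):
    """Turn 'key: value' fdinfo lines into a dict of strings.

    A 'userns' key is only present with the proof-of-concept patch
    (assumption); callers should use .get() for it.
    """
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


if __name__ == "__main__":
    # Linux only: inspect the current process's own namespaces.
    print(parse_ns_link(os.readlink("/proc/self/ns/user")))
    fd = os.open("/proc/self/ns/pid", os.O_RDONLY)
    try:
        with open("/proc/self/fdinfo/%d" % fd) as handle:
            info = parse_fdinfo(handle.read())
    finally:
        os.close(fd)
    print(info.get("userns", "no userns field (unpatched kernel?)"))
```

Comparing the inode number from the link with the "userns" value from fdinfo is what tells you whether the PID namespace is owned by your own user namespace or by some other one.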
Re: Introspecting userns relationships to other namespaces?
On Thu, Jul 07, 2016 at 10:26:50PM -0700, W. Trevor King wrote:
> On Thu, Jul 07, 2016 at 08:26:47PM -0700, James Bottomley wrote:
> > On Thu, 2016-07-07 at 20:00 -0700, Andrew Vagin wrote:
> > > On Thu, Jul 07, 2016 at 07:16:18PM -0700, Andrew Vagin wrote:
> > > > I think we can show all required information in fdinfo. We open
> > > > a namespaces file (/proc/pid/ns/N) and then read
> > > > /proc/pid/fdinfo/X for it.
> > >
> > > Here is a proof-of-concept patch.
> > > …
> > > In [2]: fd = os.open("/proc/self/ns/pid", os.O_RDONLY)
> > >
> > > In [3]: print open("/proc/self/fdinfo/%d" % fd).read()
> > > pos:    0
> > > flags:  010
> > > mnt_id: 2
> > > userns: 4026531837
> > >
> > > In [4]: print "/proc/self/ns/user -> %s" %
> > > os.readlink("/proc/self/ns/user")
> > > /proc/self/ns/user -> user:[4026531837]
> >
> > can't you just do
> >
> > readlink /proc/self/ns/user | sed 's/.*\[\(.*\)\]/\1/'
>
> With Andrew's fdinfo approach you know the user namespace owning
> /proc/self/ns/pid is 4026531837. That happens to be
> /proc/self/ns/user in this case, but doesn't have to be in general.
>
> > But what Michael was asking about was the parent user_ns of all the
> > other namespaces ... I don't think there's any way we can get that
> > out of any information in /proc/self/
>
> If fdinfo only shows immediate parents, you'd need to walk the tree to
> get back to the root. And at each layer of the PID namespace tree
> there will be another user-namespace parent branching off. With a
> tree like:
>
>   Namespace         | Parent       | Owning userns
>   ------------------+--------------+------------------
>   Root userns       | -            | -
>   Root PID ns       | -            | Root userns
>   Child userns      | Root userns  | Root userns
>   Child PID ns      | Root PID ns  | Root userns
>   Grandchild userns | Child userns | Child userns
>   Grandchild PID ns | Child PID ns | Grandchild userns
>
> Walking from the grandchild PID namespace would give you:
>
>   Grandchild PID ns
>   |-- Child PID ns
>   |   |-- Root PID ns
>   |   `-- Root userns
>   `-- Grandchild userns
>       `-- Child userns
>           `-- Root userns
>
> If you only put one level in fdinfo, you're stuck if one of the
> namespaces involved has neither bind mounts nor a PID to give you
> handle on it [1]. And if you want to put that whole ancestor tree in
> fdinfo, you have to come up with some way to handle the two-parent
> branching.

I think it's a bad idea to draw a tree in fdinfo. Why do we want to know
this hierarchy? Probably we will want to access these namespaces (setns),
in this case we need to have a way to open them.

Maybe we need to extend functionality of the nsfs filesystem
(something like /proc/PID for namespaces)?

>
> I'm also not sure how exposing nsfs information [2] would handle
> namespaces that had neither a surviving bind mount nor a direct
> process.
>
> If all the information is available (possibly after a mechanical patch
> [3] makes it more accessible), then it seems easier to put it in a
> separate /proc or /sys file. There was a stab at this for PID
> namespaces in [4] (the same series that landed NStgid, etc.) with
> additional background and alternative approaches in [5]. There were
> problems with that patch (and it was trying to do more by also listing
> a process's ID in each PID namespace), but the “let's put the whole
> tree in a new file” approach seems sound to me.
>
> Cheers,
> Trevor
>
> [1]: http://thread.gmane.org/gmane.linux.kernel.containers/30456/focus=20536
>      Subject: Re: Introspecting userns relationships to other namespaces?
>      Date: Thu, 7 Jul 2016 13:24:42 -0500
>      Message-ID: <20160707182442.ga6...@mail.hallyn.com>
> [2]: http://thread.gmane.org/gmane.linux.kernel.containers/30456/focus=30499
>      Subject: Re: [CRIU] Introspecting userns relationships to other
>      namespaces?
>      Date: Thu, 07 Jul 2016 20:20:05 -0700
>      Message-ID: <1467948005.2322.84.ca...@hansenpartnership.com>
> [3]: http://thread.gmane.org/gmane.linux.kernel.containers/30456/focus=20537
>      Subject: Re: Introspecting userns relationships to other namespaces?
>      Message-ID: <1467903712.2347.16.ca...@hansenpartnership.com>
>      Date: Thu, 07 Jul 2016 08:01:52 -0700
> [4]: http://thread.gmane.org/gmane.linux.kernel.containers/28925/focus=28928
>      Subject: [resend][PATCH v9 1/3] procfs: show hierarchy of pid namespace
>      Date: Tue, 23 Dec 2014 18:20:37 +0800
Re: Introspecting userns relationships to other namespaces?
On Thu, Jul 07, 2016 at 10:26:50PM -0700, W. Trevor King wrote:
> And if you want to put that whole ancestor tree in fdinfo, you have
> to come up with some way to handle the two-parent branching.

Going towards the roots is nice, because you know a given namespace
will have at most two parents, but it leaks information about the
system into the container. It's probably better to follow the NStgid,
etc. example and only walk toward the leaves. So a (privileged?)
process in the root namespace could see the whole tree, while a
process in a non-root namespace could only see its own namespaces and
their descendants. In situations where you were part of a namespace
that belonged to an external user namespace (e.g. you nsenter a child
user namespace but are still in the root PID namespace), you'd want an
“unknown” entry for the parent you couldn't see.

Cheers,
Trevor
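To make the "walk toward the leaves" idea concrete, here is a rough Python sketch over the example table from earlier in the thread. Everything in it (the PARENT and OWNER dicts, descendants, visible_view) is a hypothetical model for discussion, not a kernel interface, and the rule used for visibility (a process sees the namespaces it is a member of plus their same-type descendants) is one possible reading of the proposal.

```python
# Hypothetical model of the example table: same-type parent and owning
# user namespace for each namespace. Names are illustrative only.
PARENT = {
    "Root userns": None,
    "Root PID ns": None,
    "Child userns": "Root userns",
    "Child PID ns": "Root PID ns",
    "Grandchild userns": "Child userns",
    "Grandchild PID ns": "Child PID ns",
}
OWNER = {
    "Root userns": None,
    "Root PID ns": "Root userns",
    "Child userns": "Root userns",
    "Child PID ns": "Root userns",
    "Grandchild userns": "Child userns",
    "Grandchild PID ns": "Grandchild userns",
}


def descendants(root):
    """Return root plus every namespace reachable by walking toward the leaves."""
    seen = {root}
    changed = True
    while changed:
        changed = False
        for ns, parent in PARENT.items():
            if parent in seen and ns not in seen:
                seen.add(ns)
                changed = True
    return seen


def visible_view(member_of):
    """Map each visible namespace to its owner, hiding owners outside the view."""
    visible = set()
    for ns in member_of:
        visible |= descendants(ns)
    view = {}
    for ns in sorted(visible):
        owner = OWNER[ns]
        if owner is None:
            view[ns] = "-"          # the root user namespace has no owner
        elif owner in visible:
            view[ns] = owner
        else:
            view[ns] = "unknown"    # owner exists but is outside the view
    return view


if __name__ == "__main__":
    # The nsenter case: child user namespace, but still the root PID namespace.
    for ns, owner in visible_view({"Child userns", "Root PID ns"}).items():
        print("%-18s owned by %s" % (ns, owner))
```

In the nsenter case the root user namespace never shows up by name: everything it owns is reported as "unknown", while ownership inside the visible subtree is still reported.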
Re: Introspecting userns relationships to other namespaces?
On Thu, Jul 07, 2016 at 08:26:47PM -0700, James Bottomley wrote:
> On Thu, 2016-07-07 at 20:00 -0700, Andrew Vagin wrote:
> > On Thu, Jul 07, 2016 at 07:16:18PM -0700, Andrew Vagin wrote:
> > > I think we can show all required information in fdinfo. We open
> > > a namespaces file (/proc/pid/ns/N) and then read
> > > /proc/pid/fdinfo/X for it.
> >
> > Here is a proof-of-concept patch.
> > …
> > In [2]: fd = os.open("/proc/self/ns/pid", os.O_RDONLY)
> >
> > In [3]: print open("/proc/self/fdinfo/%d" % fd).read()
> > pos:    0
> > flags:  010
> > mnt_id: 2
> > userns: 4026531837
> >
> > In [4]: print "/proc/self/ns/user -> %s" %
> > os.readlink("/proc/self/ns/user")
> > /proc/self/ns/user -> user:[4026531837]
>
> can't you just do
>
> readlink /proc/self/ns/user | sed 's/.*\[\(.*\)\]/\1/'

With Andrew's fdinfo approach you know the user namespace owning
/proc/self/ns/pid is 4026531837. That happens to be
/proc/self/ns/user in this case, but doesn't have to be in general.

> But what Michael was asking about was the parent user_ns of all the
> other namespaces ... I don't think there's any way we can get that
> out of any information in /proc/self/

If fdinfo only shows immediate parents, you'd need to walk the tree to
get back to the root. And at each layer of the PID namespace tree
there will be another user-namespace parent branching off. With a
tree like:

  Namespace         | Parent       | Owning userns
  ------------------+--------------+------------------
  Root userns       | -            | -
  Root PID ns       | -            | Root userns
  Child userns      | Root userns  | Root userns
  Child PID ns      | Root PID ns  | Root userns
  Grandchild userns | Child userns | Child userns
  Grandchild PID ns | Child PID ns | Grandchild userns

Walking from the grandchild PID namespace would give you:

  Grandchild PID ns
  |-- Child PID ns
  |   |-- Root PID ns
  |   `-- Root userns
  `-- Grandchild userns
      `-- Child userns
          `-- Root userns

If you only put one level in fdinfo, you're stuck if one of the
namespaces involved has neither bind mounts nor a PID to give you
handle on it [1]. And if you want to put that whole ancestor tree in
fdinfo, you have to come up with some way to handle the two-parent
branching.

I'm also not sure how exposing nsfs information [2] would handle
namespaces that had neither a surviving bind mount nor a direct
process.

If all the information is available (possibly after a mechanical patch
[3] makes it more accessible), then it seems easier to put it in a
separate /proc or /sys file. There was a stab at this for PID
namespaces in [4] (the same series that landed NStgid, etc.) with
additional background and alternative approaches in [5]. There were
problems with that patch (and it was trying to do more by also listing
a process's ID in each PID namespace), but the “let's put the whole
tree in a new file” approach seems sound to me.

Cheers,
Trevor

[1]: http://thread.gmane.org/gmane.linux.kernel.containers/30456/focus=20536
     Subject: Re: Introspecting userns relationships to other namespaces?
     Date: Thu, 7 Jul 2016 13:24:42 -0500
     Message-ID: <20160707182442.ga6...@mail.hallyn.com>
[2]: http://thread.gmane.org/gmane.linux.kernel.containers/30456/focus=30499
     Subject: Re: [CRIU] Introspecting userns relationships to other
     namespaces?
     Date: Thu, 07 Jul 2016 20:20:05 -0700
     Message-ID: <1467948005.2322.84.ca...@hansenpartnership.com>
[3]: http://thread.gmane.org/gmane.linux.kernel.containers/30456/focus=20537
     Subject: Re: Introspecting userns relationships to other namespaces?
     Message-ID: <1467903712.2347.16.ca...@hansenpartnership.com>
     Date: Thu, 07 Jul 2016 08:01:52 -0700
[4]: http://thread.gmane.org/gmane.linux.kernel.containers/28925/focus=28928
     Subject: [resend][PATCH v9 1/3] procfs: show hierarchy of pid namespace
     Date: Tue, 23 Dec 2014 18:20:37 +0800
     Message-ID: <1419330039-29207-2-git-send-email-chenhanx...@cn.fujitsu.com>
[5]: http://thread.gmane.org/gmane.linux.kernel.containers/28105
     Subject: [RFC]Pid conversion between pid namespace
     Date: Thu, 3 Jul 2014 12:18:33 +
     Message-ID: <5871495633F38949900D2BF2DC04883E55C374@G08CNEXMBPEKD02.g08.fujitsu.local>
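The two-parent branching in the table above can be sketched in a few lines of Python. This is only a toy model of the example table (the NAMESPACES dict and the ancestors/render helpers are invented for illustration), not anything the kernel exposes. Note one difference from the hand-drawn tree: a strict walk also lists Root userns under Root PID ns, because the table says the root PID namespace is owned by the root user namespace.

```python
# Hypothetical model of the example table; names are illustrative only.
# Each namespace maps to (same-type parent, owning user namespace).
NAMESPACES = {
    "Root userns": (None, None),
    "Root PID ns": (None, "Root userns"),
    "Child userns": ("Root userns", "Root userns"),
    "Child PID ns": ("Root PID ns", "Root userns"),
    "Grandchild userns": ("Child userns", "Child userns"),
    "Grandchild PID ns": ("Child PID ns", "Grandchild userns"),
}


def ancestors(name):
    """Return (name, [subtrees]) following both the parent and the owner edge."""
    parent, owner = NAMESPACES[name]
    # For user namespaces the parent and the owner are the same namespace,
    # so collapse the two edges into one to avoid listing it twice.
    edges = []
    for edge in (parent, owner):
        if edge is not None and edge not in edges:
            edges.append(edge)
    return (name, [ancestors(edge) for edge in edges])


def render(tree, depth=0):
    """Flatten an ancestors() tree into indented lines, one per namespace."""
    name, children = tree
    lines = ["    " * depth + name]
    for child in children:
        lines.extend(render(child, depth + 1))
    return lines


if __name__ == "__main__":
    print("\n".join(render(ancestors("Grandchild PID ns"))))
```

The same root user namespace shows up on several branches, which is the awkward part of squeezing this into a flat fdinfo listing.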
Re: Introspecting userns relationships to other namespaces?
On Thu, Jul 07, 2016 at 08:26:47PM -0700, James Bottomley wrote: > On Thu, 2016-07-07 at 20:00 -0700, Andrew Vagin wrote: > > On Thu, Jul 07, 2016 at 07:16:18PM -0700, Andrew Vagin wrote: > > > I think we can show all required information in fdinfo. We open > > > a namespaces file (/proc/pid/ns/N) and then read > > > /proc/pid/fdinfo/X for it. > > > > Here is a proof-of-concept patch. > > … > > In [2]: fd = os.open("/proc/self/ns/pid", os.O_RDONLY) > > > > In [3]: print open("/proc/self/fdinfo/%d" % fd).read() > > pos:0 > > flags: 010 > > mnt_id: 2 > > userns: 4026531837 > > > > In [4]: print "/proc/self/ns/user -> %s" % > > os.readlink("/proc/self/ns/user") > > /proc/self/ns/user -> user:[4026531837] > > can't you just do > > readlink /proc/self/ns/user | sed 's/.*\[\(.*\)\]/\1/' With Andrew's fdinfo approach you know the user namespace owning /proc/self/ns/pid is 4026531837. That happens to be /proc/self/ns/user in this case, but doesn't have to be in general. > But what Michael was asking about was the parent user_ns of all the > other namespaces ... I don't think there's any way we can get that > out of any information in /proc/self/ If fdinfo only shows immediate parents, you'd need to walk the tree to get back to the root. And at each layer of the PID namespace tree there will be another user-namespace parent branching off). With a tree like: Namespace | Parent | Owning userns ---+--+--- Root userns | -| - Root PID ns | -| Root userns Child userns | Root usens | Root userns Child PID ns | Root PID ns | Root userns Grandchild userns | Child userns | Child userns Grandchild PID ns | Child PID ns | Grandchild userns Walking from the granchild PID namespace would give you: Grandchild PID ns |-- Child PID ns | |-- Root PID ns | `-- Root userns `-- Granchild userns `-- Child userns `-- Root userns If you only put one level in fdinfo, you're stuck if one of the namespaces involved has neither bind mounts nor a PID to give you handle on it [1]. 
And if you want to put that whole ancestor tree in fdinfo, you have to
come up with some way to handle the two-parent branching. I'm also not
sure how exposing nsfs information [2] would handle namespaces that had
neither a surviving bind mount nor a direct process.

If all the information is available (possible after a mechanical patch
[3] makes it more accessible), then it seems easier to put it in a
separate /proc or /sys file. There was a stab at this for PID
namespaces in [4] (the same series that landed NStgid, etc.) with
additional background and alternative approaches in [5]. There were
problems with that patch (and it was trying to do more by also listing
a process's ID in each PID namespace), but the “let's put the whole
tree in a new file” approach seems sound to me.

Cheers,
Trevor

[1]: http://thread.gmane.org/gmane.linux.kernel.containers/30456/focus=20536
     Subject: Re: Introspecting userns relationships to other namespaces?
     Date: Thu, 7 Jul 2016 13:24:42 -0500
     Message-ID: <20160707182442.ga6...@mail.hallyn.com>
[2]: http://thread.gmane.org/gmane.linux.kernel.containers/30456/focus=30499
     Subject: Re: [CRIU] Introspecting userns relationships to other namespaces?
     Date: Thu, 07 Jul 2016 20:20:05 -0700
     Message-ID: <1467948005.2322.84.ca...@hansenpartnership.com>
[3]: http://thread.gmane.org/gmane.linux.kernel.containers/30456/focus=20537
     Subject: Re: Introspecting userns relationships to other namespaces?
     Date: Thu, 07 Jul 2016 08:01:52 -0700
     Message-ID: <1467903712.2347.16.ca...@hansenpartnership.com>
[4]: http://thread.gmane.org/gmane.linux.kernel.containers/28925/focus=28928
     Subject: [resend][PATCH v9 1/3] procfs: show hierarchy of pid namespace
     Date: Tue, 23 Dec 2014 18:20:37 +0800
     Message-ID: <1419330039-29207-2-git-send-email-chenhanx...@cn.fujitsu.com>
[5]: http://thread.gmane.org/gmane.linux.kernel.containers/28105
     Subject: [RFC]Pid conversion between pid namespace
     Date: Thu, 3 Jul 2014 12:18:33 +
     Message-ID: <5871495633F38949900D2BF2DC04883E55C374@G08CNEXMBPEKD02.g08.fujitsu.local>

-- 
This email may be signed or encrypted with GnuPG (http://www.gnupg.org).
For more information, see http://en.wikipedia.org/wiki/Pretty_Good_Privacy
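To make the two-parent branching in Trevor's example concrete, here is a
minimal Python sketch. It assumes the relationships are already known
and held in a plain dictionary; the NAMESPACES table and the render()
helper are hypothetical names for illustration only and do not
correspond to any kernel or /proc interface. Unlike the hand-drawn tree,
it follows every link in the table, so the owning userns of the root PID
namespace is listed as well.

```python
# Hypothetical in-memory copy of the example table from the message.
# name -> (parent namespace of the same type, owning user namespace)
NAMESPACES = {
    "Root userns": (None, None),
    "Root PID ns": (None, "Root userns"),
    "Child userns": ("Root userns", "Root userns"),
    "Child PID ns": ("Root PID ns", "Root userns"),
    "Grandchild userns": ("Child userns", "Child userns"),
    "Grandchild PID ns": ("Child PID ns", "Grandchild userns"),
}


def render(name, table=NAMESPACES, depth=0):
    """Return indented lines listing every ancestor reachable from name.

    Each namespace can branch two ways: to its parent and to its owning
    user namespace. For user namespaces the two are the same, so the
    duplicate link is followed only once.
    """
    lines = ["  " * depth + name]
    parent, owner = table[name]
    followed = []
    for link in (parent, owner):
        if link is not None and link not in followed:
            followed.append(link)
            lines.extend(render(link, table, depth + 1))
    return lines


if __name__ == "__main__":
    print("\n".join(render("Grandchild PID ns")))
```

The recursion only terminates because the example data is a tree rooted
at "Root userns"; a real interface would still have to supply the
per-namespace links, which is the open question in this thread.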
Re: Introspecting userns relationships to other namespaces?
On Thu, 2016-07-07 at 20:21 +0200, Michael Kerrisk (man-pages) wrote:
> On 7 July 2016 at 17:01, James Bottomley wrote:

[Serge already answered the parenting issue]

> > On Thu, 2016-07-07 at 08:36 -0500, Serge E. Hallyn wrote:
> > > Hm. Probably best-effort based on the process hierarchy. So
> > > yeah you could probably get a tree into a state that would be
> > > wrongly recreated. Create a new netns, bind mount it, exit; Have
> > > another task create a new user_ns, bind mount it, exit; Third
> > > task setns()s first to the new netns then to the new user_ns. I
> > > suspect criu will recreate that wrongly.
> >
> > This is a bit pathological, and you have to be root to do it: so
> > root can set up a nesting hierarchy, bind it and destroy the pids
> > but I know of no current orchestration system which does this.
> >
> > Actually, I have to back pedal a bit: the way I currently set up
> > architecture emulation containers does precisely this: I set up the
> > namespaces unprivileged with child mount namespaces, but then I ask
> > root to bind the userns and kill the process that created it so I
> > have a permanent handle to enter the namespace by, so I suspect
> > that when our current orchestration systems get more sophisticated,
> > they might eventually want to do something like this as well.
> >
> > In theory, we could get nsfs to show this information as an option
> > (just add a show_options entry to the superblock ops), but the
> > problem is that although each namespace has a parent user_ns,
> > there's no way to get it without digging in the namespace specific
> > structure. Probably we should restructure to move it into
> > ns_common, then we could display it (and enforce all namespaces
> > having owning user_ns) but it would be a
>
> I'm missing something here. Is it not already the case that all
> namespaces have an owning user_ns?

Um, yes, I don't believe I said they don't. The problem I thought you
were having is that there's no way of seeing what it is.

nsfs is the Namespace filesystem where bound namespaces appear to a cat
of /proc/self/mounts. It can display any information that's in
ns_common (the common core of namespaces) but the owning user_ns
pointer currently isn't in this structure. Every user namespace has a
pointer to it, but they're all privately embedded in the individual
namespace specific structures. What I was proposing was that since
every current namespace has a pointer somewhere to the owning user
namespace, we could abstract this out into ns_common so it's now
accessible to be displayed by nsfs, probably as a mount option.

James
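As a rough illustration of what userspace can see today, the following
Python sketch picks the bound namespaces out of text in the
/proc/self/mounts format. It assumes the kernel reports the filesystem
type of such entries as "nsfs"; the nsfs_mounts() helper and the SAMPLE
text are made up for this example. It shows only the mount points, not
the owning user_ns, which is exactly the information the proposed mount
option would add.

```python
def nsfs_mounts(mounts_text):
    """Return the mount points whose filesystem type is nsfs.

    Each line of /proc/self/mounts holds whitespace-separated fields:
    source, mount point, filesystem type, options, dump, pass.
    """
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[2] == "nsfs":
            points.append(fields[1])
    return points


# Invented sample input, not captured from a real system.
SAMPLE = """\
proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0
nsfs /run/netns/example nsfs rw 0 0
"""

if __name__ == "__main__":
    print(nsfs_mounts(SAMPLE))
```

On a live system the same function could be fed the contents of
/proc/self/mounts instead of SAMPLE.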
Re: Introspecting userns relationships to other namespaces?
Quoting Michael Kerrisk (man-pages) (mtk.manpa...@gmail.com):
> On 7 July 2016 at 17:01, James Bottomley wrote:
> > On Thu, 2016-07-07 at 08:36 -0500, Serge E. Hallyn wrote:
> >> Quoting Michael Kerrisk (man-pages) (mtk.manpa...@gmail.com):
> >> > Hi Serge,
> >> >
> >> > On 6 July 2016 at 16:13, Serge E. Hallyn wrote:
> >> > > On Wed, Jul 06, 2016 at 10:41:48AM +0200, Michael Kerrisk (man
> >> > > -pages) wrote:
> >> > > > [Rats! Doing now what I should have down to start with. Looping
> >> > > > some lists and CRIU and other possibly relevant people into
> >> > > > this conversation]
> >> > > >
> >> > > > Hi Eric,
> >> > > >
> >> > > > On 5 July 2016 at 23:47, Eric W. Biederman <
> >> > > > ebied...@xmission.com> wrote:
> >> > > > > "Michael Kerrisk (man-pages)" writes:
> >> > > > >
> >> > > > > > Hi Eric,
> >> > > > > >
> >> > > > > > I have a question. Is there any way currently to discover
> >> > > > > > which user namespace a particular nonuser namespace is
> >> > > > > > governed by? Maybe I am missing something, but there does
> >> > > > > > not seem to be a way to do this. Also, can one discover
> >> > > > > > which userns is the parent of a given userns? Again, I
> >> > > > > > can't see a way to do this.
> >> > > > > >
> >> > > > > > The point here is introspecting so that a process might
> >> > > > > > determine what its capabilities are when operating on some
> >> > > > > > resource governed by a (nonuser) namespace.
> >> > > > >
> >> > > > > To the best of my knowledge that there is not an interface to
> >> > > > > get that information. It would be good to have such an
> >> > > > > interface for no other reason than the CRIU folks are going
> >> > > > > to need it at some point. I am a bit surprised they have not
> >> > > > > complained yet.
> >> > >
> >> > > I don't think they need it. They do in fact have what they need.
> >> > > Assume you have tasks T1, T2, T1_1 and T2_1; T1 and T2 are in
> >> > > init_user_ns; T1 spawned T1_1 in a new userns; T2 spawned T2_1
> >> > > which setns()d to T1_1's ns. There's some {handwave} uid mapping,
> >> > > does not matter.
> >> > >
> >> > > At restart, it doesn't matter which task originally created the
> >> > > new userns. criu knows T1_1 and T2_1 are in the same userns; it
> >> > > creates the userns, sets up the mapping, and T1_1 and T2_1
> >> > > setns() to it.
> >> >
> >> > I'm missing something here. How does the parental relationships
> >> > between the user namespaces get reconstructed? Those relationships
> >> > will govern what capabilities a process will have in various user
> >> > namespaces.
> >
> > Actually, you get the parent namespace from the process tree by
> > tracking the user namespaces of the parent pids. Currently non-root
> > users can't bind the namespace, so the only way to keep a new user_ns
> > around if you're not root is to keep the process around, so for
> > multiply nested user namespaces you can usually build the user_ns
> > hierarchy by looking at the process hierarchy. Conversely, if the
> > process is reparented to init, chances are that the user_ns is also
> > parented to init_user_ns.
>
> Yes, but "chances are" == this isn't robust. PR_SET_CHILD_SUBREAPER
> further complicates things.
>
> By the way, is that really what happens? Do child user namespaces get
> reparented to the grandparent ns if the parent ns disappears (i.e.,

The parent ns cannot disappear. The child ns pins the creator's cred,
which pins the parent user_ns.
Re: Introspecting userns relationships to other namespaces?
On 7 July 2016 at 17:01, James Bottomley wrote:
> On Thu, 2016-07-07 at 08:36 -0500, Serge E. Hallyn wrote:
>> Quoting Michael Kerrisk (man-pages) (mtk.manpa...@gmail.com):
>> > Hi Serge,
>> >
>> > On 6 July 2016 at 16:13, Serge E. Hallyn wrote:
>> > > On Wed, Jul 06, 2016 at 10:41:48AM +0200, Michael Kerrisk (man
>> > > -pages) wrote:
>> > > > [Rats! Doing now what I should have down to start with. Looping
>> > > > some lists and CRIU and other possibly relevant people into
>> > > > this conversation]
>> > > >
>> > > > Hi Eric,
>> > > >
>> > > > On 5 July 2016 at 23:47, Eric W. Biederman <
>> > > > ebied...@xmission.com> wrote:
>> > > > > "Michael Kerrisk (man-pages)" writes:
>> > > > >
>> > > > > > Hi Eric,
>> > > > > >
>> > > > > > I have a question. Is there any way currently to discover
>> > > > > > which user namespace a particular nonuser namespace is
>> > > > > > governed by? Maybe I am missing something, but there does
>> > > > > > not seem to be a way to do this. Also, can one discover
>> > > > > > which userns is the parent of a given userns? Again, I
>> > > > > > can't see a way to do this.
>> > > > > >
>> > > > > > The point here is introspecting so that a process might
>> > > > > > determine what its capabilities are when operating on some
>> > > > > > resource governed by a (nonuser) namespace.
>> > > > >
>> > > > > To the best of my knowledge that there is not an interface to
>> > > > > get that information. It would be good to have such an
>> > > > > interface for no other reason than the CRIU folks are going
>> > > > > to need it at some point. I am a bit surprised they have not
>> > > > > complained yet.
>> > >
>> > > I don't think they need it. They do in fact have what they need.
>> > > Assume you have tasks T1, T2, T1_1 and T2_1; T1 and T2 are in
>> > > init_user_ns; T1 spawned T1_1 in a new userns; T2 spawned T2_1
>> > > which setns()d to T1_1's ns. There's some {handwave} uid mapping,
>> > > does not matter.
>> > >
>> > > At restart, it doesn't matter which task originally created the
>> > > new userns. criu knows T1_1 and T2_1 are in the same userns; it
>> > > creates the userns, sets up the mapping, and T1_1 and T2_1
>> > > setns() to it.
>> >
>> > I'm missing something here. How does the parental relationships
>> > between the user namespaces get reconstructed? Those relationships
>> > will govern what capabilities a process will have in various user
>> > namespaces.
>
> Actually, you get the parent namespace from the process tree by
> tracking the user namespaces of the parent pids. Currently non-root
> users can't bind the namespace, so the only way to keep a new user_ns
> around if you're not root is to keep the process around, so for
> multiply nested user namespaces you can usually build the user_ns
> hierarchy by looking at the process hierarchy. Conversely, if the
> process is reparented to init, chances are that the user_ns is also
> parented to init_user_ns.

Yes, but "chances are" == this isn't robust. PR_SET_CHILD_SUBREAPER
further complicates things.

By the way, is that really what happens? Do child user namespaces get
reparented to the grandparent ns if the parent ns disappears (i.e.,
ceases to have any members and no bind mounts)? I hadn't thought about
that scenario before. It may be worth documenting in
user_namespaces(7).

>> Hm. Probably best-effort based on the process hierarchy. So yeah
>> you could probably get a tree into a state that would be wrongly
>> recreated. Create a new netns, bind mount it, exit; Have another
>> task create a new user_ns, bind mount it, exit; Third task setns()s
>> first to the new netns then to the new user_ns. I suspect criu will
>> recreate that wrongly.
>
> This is a bit pathological, and you have to be root to do it: so root
> can set up a nesting hierarchy, bind it and destroy the pids but I know
> of no current orchestration system which does this.
>
> Actually, I have to back pedal a bit: the way I currently set up
> architecture emulation containers does precisely this: I set up the
> namespaces unprivileged with child mount namespaces, but then I ask
> root to bind the userns and kill the process that created it so I have
> a permanent handle to enter the namespace by, so I suspect that when
> our current orchestration systems get more sophisticated, they might
> eventually want to do something like this as well.
>
> In theory, we could get nsfs to show this information as an option
> (just add a show_options entry to the superblock ops), but the problem
> is that although each namespace has a parent user_ns, there's no way to
> get it without digging in the namespace specific structure. Probably
> we should restructure to move it into ns_common, then we could display
> it (and enforce all namespaces having owning user_ns) but it would be a

I'm missing something here. Is it not already the case that all
namespaces have an owning user_ns?

Cheers,

Michael

> reasonably large (but
Re: Introspecting userns relationships to other namespaces?
On Thu, 2016-07-07 at 08:36 -0500, Serge E. Hallyn wrote:
> Quoting Michael Kerrisk (man-pages) (mtk.manpa...@gmail.com):
> > Hi Serge,
> >
> > On 6 July 2016 at 16:13, Serge E. Hallyn wrote:
> > > On Wed, Jul 06, 2016 at 10:41:48AM +0200, Michael Kerrisk
> > > (man-pages) wrote:
> > > > [Rats! Doing now what I should have done to start with. Looping
> > > > some lists and CRIU and other possibly relevant people into
> > > > this conversation]
> > > >
> > > > Hi Eric,
> > > >
> > > > On 5 July 2016 at 23:47, Eric W. Biederman
> > > > <ebied...@xmission.com> wrote:
> > > > > "Michael Kerrisk (man-pages)" writes:
> > > > >
> > > > > > Hi Eric,
> > > > > >
> > > > > > I have a question. Is there any way currently to discover
> > > > > > which user namespace a particular nonuser namespace is
> > > > > > governed by? Maybe I am missing something, but there does
> > > > > > not seem to be a way to do this. Also, can one discover
> > > > > > which userns is the parent of a given userns? Again, I
> > > > > > can't see a way to do this.
> > > > > >
> > > > > > The point here is introspecting so that a process might
> > > > > > determine what its capabilities are when operating on some
> > > > > > resource governed by a (nonuser) namespace.
> > > > >
> > > > > To the best of my knowledge there is not an interface to
> > > > > get that information. It would be good to have such an
> > > > > interface for no other reason than the CRIU folks are going
> > > > > to need it at some point. I am a bit surprised they have not
> > > > > complained yet.
> > >
> > > I don't think they need it. They do in fact have what they need.
> > > Assume you have tasks T1, T2, T1_1 and T2_1; T1 and T2 are in
> > > init_user_ns; T1 spawned T1_1 in a new userns; T2 spawned T2_1
> > > which setns()d to T1_1's ns. There's some {handwave} uid mapping,
> > > does not matter.
> > >
> > > At restart, it doesn't matter which task originally created the
> > > new userns. criu knows T1_1 and T2_1 are in the same userns; it
> > > creates the userns, sets up the mapping, and T1_1 and T2_1
> > > setns() to it.
> >
> > I'm missing something here. How do the parental relationships
> > between the user namespaces get reconstructed? Those relationships
> > will govern what capabilities a process will have in various user
> > namespaces.

Actually, you get the parent namespace from the process tree by
tracking the user namespaces of the parent pids. Currently non-root
users can't bind the namespace, so the only way to keep a new user_ns
around if you're not root is to keep the process around. For multiply
nested user namespaces you can therefore usually build the user_ns
hierarchy by looking at the process hierarchy. Conversely, if the
process is reparented to init, chances are that the user_ns is also
parented to init_user_ns.

> Hm. Probably best-effort based on the process hierarchy. So yeah
> you could probably get a tree into a state that would be wrongly
> recreated. Create a new netns, bind mount it, exit; have another
> task create a new user_ns, bind mount it, exit; a third task setns()s
> first to the new netns then to the new user_ns. I suspect criu will
> recreate that wrongly.

This is a bit pathological, and you have to be root to do it: root can
set up a nesting hierarchy, bind it and destroy the pids, but I know of
no current orchestration system which does this.

Actually, I have to backpedal a bit: the way I currently set up
architecture emulation containers does precisely this. I set up the
namespaces unprivileged with child mount namespaces, but then I ask
root to bind the userns and kill the process that created it, so I have
a permanent handle to enter the namespace by. I suspect that when our
current orchestration systems get more sophisticated, they might
eventually want to do something like this as well.

In theory, we could get nsfs to show this information as an option
(just add a show_options entry to the superblock ops), but the problem
is that although each namespace has a parent user_ns, there's no way to
get it without digging in the namespace-specific structure. Probably we
should restructure to move it into ns_common; then we could display it
(and enforce all namespaces having an owning user_ns), but it would be
a reasonably large (but mechanical) change.

James
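To make the "build the user_ns hierarchy by looking at the process
hierarchy" idea concrete, here is a minimal sketch. It is not code from
CRIU or from anyone on this thread; the Task type, the function name and
the "user:[N]" identifiers are all made up for illustration, and the
input is a plain dictionary rather than anything read from /proc.

```python
from typing import Dict, NamedTuple, Optional


class Task(NamedTuple):
    ppid: int    # parent pid as seen in the process tree
    userns: str  # opaque userns identity (hypothetical labels here)


def infer_userns_parents(tasks: Dict[int, Task]) -> Dict[str, Optional[str]]:
    """Guess each userns's parent from the nearest ancestor in another userns.

    Heuristic only: a userns kept alive by a bind mount, or a task that was
    reparented to init, can leave the guess wrong or unresolved (None).
    """
    parents: Dict[str, Optional[str]] = {}
    for task in tasks.values():
        parents.setdefault(task.userns, None)
        if parents[task.userns] is not None:
            continue  # already resolved through another task
        seen = set()
        cur = task
        # Walk up the process tree until an ancestor is in a different userns.
        while cur.ppid in tasks and cur.ppid not in seen:
            seen.add(cur.ppid)
            cur = tasks[cur.ppid]
            if cur.userns != task.userns:
                parents[task.userns] = cur.userns
                break
    return parents


# Illustrative process tree: T1 (pid 10) spawned T1_1 (pid 11) in a new
# userns, T1_1 spawned pid 12 in a nested one, and T2_1 (pid 21) joined
# T1_1's userns with setns().
demo_tasks = {
    1: Task(0, "user:[1]"),
    10: Task(1, "user:[1]"),
    11: Task(10, "user:[2]"),
    12: Task(11, "user:[3]"),
    20: Task(1, "user:[1]"),
    21: Task(20, "user:[2]"),
}
print(infer_userns_parents(demo_tasks))
```

The walk stops at the first ancestor in a different userns, which is why
it can only ever be a best-effort guess once the creating process is gone.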
Re: Introspecting userns relationships to other namespaces?
Quoting Michael Kerrisk (man-pages) (mtk.manpa...@gmail.com):
> Hi Serge,
>
> On 6 July 2016 at 16:13, Serge E. Hallyn wrote:
> > On Wed, Jul 06, 2016 at 10:41:48AM +0200, Michael Kerrisk (man-pages) wrote:
> >> [Rats! Doing now what I should have done to start with. Looping some
> >> lists and CRIU and other possibly relevant people into this
> >> conversation]
> >>
> >> Hi Eric,
> >>
> >> On 5 July 2016 at 23:47, Eric W. Biederman wrote:
> >> > "Michael Kerrisk (man-pages)" writes:
> >> >
> >> >> Hi Eric,
> >> >>
> >> >> I have a question. Is there any way currently to discover which
> >> >> user namespace a particular nonuser namespace is governed by?
> >> >> Maybe I am missing something, but there does not seem to be a
> >> >> way to do this. Also, can one discover which userns is the
> >> >> parent of a given userns? Again, I can't see a way to do this.
> >> >>
> >> >> The point here is introspecting so that a process might determine
> >> >> what its capabilities are when operating on some resource governed
> >> >> by a (nonuser) namespace.
> >> >
> >> > To the best of my knowledge there is not an interface to get that
> >> > information. It would be good to have such an interface for no other
> >> > reason than the CRIU folks are going to need it at some point. I am a
> >> > bit surprised they have not complained yet.
> >
> > I don't think they need it. They do in fact have what they need. Assume
> > you have tasks T1, T2, T1_1 and T2_1; T1 and T2 are in init_user_ns; T1
> > spawned T1_1 in a new userns; T2 spawned T2_1 which setns()d to T1_1's ns.
> > There's some {handwave} uid mapping, does not matter.
> >
> > At restart, it doesn't matter which task originally created the new userns.
> > criu knows T1_1 and T2_1 are in the same userns; it creates the userns,
> > sets up the mapping, and T1_1 and T2_1 setns() to it.
>
> I'm missing something here. How do the parental relationships
> between the user namespaces get reconstructed? Those relationships
> will govern what capabilities a process will have in various user
> namespaces.

Hm. Probably best-effort based on the process hierarchy. So yeah
you could probably get a tree into a state that would be wrongly
recreated. Create a new netns, bind mount it, exit; have another
task create a new user_ns, bind mount it, exit; a third task setns()s
first to the new netns then to the new user_ns. I suspect criu will
recreate that wrongly.
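The three-task scenario above can be modelled in a few lines. This is a
toy model only, with hypothetical class and variable names; it does not
call into the kernel or CRIU. It assumes the usual rule that a new netns
is owned by the user namespace of the task that created it, and it shows
why a restorer that guesses "the netns belongs to whatever userns the
surviving task is in" would get the ownership wrong.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class UserNS:
    name: str
    parent: Optional["UserNS"]


@dataclass(frozen=True)
class NetNS:
    name: str
    owner: UserNS  # userns of the task that created this netns


@dataclass
class Task:
    userns: UserNS
    netns: NetNS


init_user_ns = UserNS("init_user_ns", None)
init_net_ns = NetNS("init_net", init_user_ns)

# Task 1 (root in init_user_ns) creates a netns, bind mounts it, exits.
new_netns = NetNS("netns1", owner=init_user_ns)

# Task 2 creates a user_ns, bind mounts it, exits.
new_userns = UserNS("userns1", parent=init_user_ns)

# Task 3 setns()s first to the new netns, then to the new user_ns.
task3 = Task(userns=init_user_ns, netns=init_net_ns)
task3.netns = new_netns
task3.userns = new_userns


def naive_owner(task: Task) -> UserNS:
    """Restore-time guess: the task's netns is owned by the task's userns."""
    return task.userns


print("actual owner:", new_netns.owner.name)
print("naive guess: ", naive_owner(task3).name)
```

Only task 3 is left to inspect at checkpoint time, and nothing visible on
it says that the netns was created from init_user_ns.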
Re: Introspecting userns relationships to other namespaces?
Hi Serge,

On 6 July 2016 at 16:13, Serge E. Hallyn wrote:
> On Wed, Jul 06, 2016 at 10:41:48AM +0200, Michael Kerrisk (man-pages) wrote:
>> [Rats! Doing now what I should have done to start with. Looping some
>> lists and CRIU and other possibly relevant people into this
>> conversation]
>>
>> Hi Eric,
>>
>> On 5 July 2016 at 23:47, Eric W. Biederman wrote:
>> > "Michael Kerrisk (man-pages)" writes:
>> >
>> >> Hi Eric,
>> >>
>> >> I have a question. Is there any way currently to discover which
>> >> user namespace a particular nonuser namespace is governed by?
>> >> Maybe I am missing something, but there does not seem to be a
>> >> way to do this. Also, can one discover which userns is the
>> >> parent of a given userns? Again, I can't see a way to do this.
>> >>
>> >> The point here is introspecting so that a process might determine
>> >> what its capabilities are when operating on some resource governed
>> >> by a (nonuser) namespace.
>> >
>> > To the best of my knowledge there is not an interface to get that
>> > information. It would be good to have such an interface for no other
>> > reason than the CRIU folks are going to need it at some point. I am a
>> > bit surprised they have not complained yet.
>
> I don't think they need it. They do in fact have what they need. Assume
> you have tasks T1, T2, T1_1 and T2_1; T1 and T2 are in init_user_ns; T1
> spawned T1_1 in a new userns; T2 spawned T2_1 which setns()d to T1_1's ns.
> There's some {handwave} uid mapping, does not matter.
>
> At restart, it doesn't matter which task originally created the new userns.
> criu knows T1_1 and T2_1 are in the same userns; it creates the userns, sets
> up the mapping, and T1_1 and T2_1 setns() to it.

I'm missing something here. How do the parental relationships between
the user namespaces get reconstructed? Those relationships will govern
what capabilities a process will have in various user namespaces.

Cheers,

Michael

--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
Re: Introspecting userns relationships to other namespaces?
"Serge E. Hallyn" writes:

> On Wed, Jul 06, 2016 at 10:41:48AM +0200, Michael Kerrisk (man-pages) wrote:
>> [Rats! Doing now what I should have done to start with. Looping some
>> lists and CRIU and other possibly relevant people into this
>> conversation]
>>
>> Hi Eric,
>>
>> On 5 July 2016 at 23:47, Eric W. Biederman wrote:
>> > "Michael Kerrisk (man-pages)" writes:
>> >
>> >> Hi Eric,
>> >>
>> >> I have a question. Is there any way currently to discover which
>> >> user namespace a particular nonuser namespace is governed by?
>> >> Maybe I am missing something, but there does not seem to be a
>> >> way to do this. Also, can one discover which userns is the
>> >> parent of a given userns? Again, I can't see a way to do this.
>> >>
>> >> The point here is introspecting so that a process might determine
>> >> what its capabilities are when operating on some resource governed
>> >> by a (nonuser) namespace.
>> >
>> > To the best of my knowledge there is not an interface to get that
>> > information. It would be good to have such an interface for no other
>> > reason than the CRIU folks are going to need it at some point. I am a
>> > bit surprised they have not complained yet.
>
> I don't think they need it. They do in fact have what they need. Assume
> you have tasks T1, T2, T1_1 and T2_1; T1 and T2 are in init_user_ns; T1
> spawned T1_1 in a new userns; T2 spawned T2_1 which setns()d to T1_1's ns.
> There's some {handwave} uid mapping, does not matter.
>
> At restart, it doesn't matter which task originally created the new userns.
> criu knows T1_1 and T2_1 are in the same userns; it creates the userns, sets
> up the mapping, and T1_1 and T2_1 setns() to it.

Given that the simple cases are so easy it probably doesn't matter in
that sense. However we now have the case where user namespaces own
pid namespaces, and uts namespaces, and network namespaces, and ipc
namespaces, and filesystems. Throw in some mount propagation and use
of setns and things could get confusing. It is something that will
need to be figured out if CRIU is going to properly checkpoint
containers containing containers containing containers containing
containers. Did I mention I like recursion?

>> > That said in a normal use scenario I don't think that information is
>> > needed.
>> >
>> > Do you have a particular use case besides checkpoint/restart where this
>> > is useful? That might help in coming up with a good userspace interface
>> > for this information.
>>
>> So, I spend a moderate amount of time working with people to introduce
>> them to the namespaces infrastructure, and one topic that comes up now
>> and then is introspection/visualization tools. For example,
>> nowadays--thanks to the (bizarrely misnamed) NStgid and NSpid fields
>> in /proc/PID/status--it's possible to (and someone I was working with
>> did) write tools that introspect the PID namespace hierarchy to show
>> all of the processes and their PIDs in the various namespace instances.
>> It's a natural enough thing to want to do, when confronted with the
>> complexity of the namespaces.
>>
>> Someone else then asked me a question that led me to wonder about
>> generally introspecting on the parental relationships between user
>> namespaces and the association of other namespace types with user
>> namespaces. One use would be visualization, in order to understand the
>> running system. Another would be to answer the question I already
>> mentioned: what capability does process X have to perform operations
>> on a resource governed by namespace Y?
>
> I agree they'll probably want it, but if we wait for a real need and
> use case we can do a better job of providing what's needed.

That too, which is why I mentioned CRIU. But yeah it will probably
take a little while to get there.

Eric
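For context on what userspace can already see, the sketch below reads the
/proc/PID/ns symlinks and groups processes by namespace instance. The
helper names are invented for this note; the only interfaces assumed are
the /proc/PID/ns links, whose targets look like "net:[4026531956]". Note
what is missing: the links identify which namespaces a process is in, but
say nothing about which user namespace owns each of them, which is the gap
being discussed.

```python
import os
import re
from collections import defaultdict
from typing import Dict, List, Tuple

# Link targets under /proc/PID/ns look like "net:[4026531956]".
_NS_LINK = re.compile(r"^(\w+):\[(\d+)\]$")


def parse_ns_link(link: str) -> Tuple[str, int]:
    """Split a namespace link target into (type, inode number)."""
    match = _NS_LINK.match(link)
    if match is None:
        raise ValueError(f"unexpected namespace link: {link!r}")
    return match.group(1), int(match.group(2))


def read_namespaces(pid: str = "self") -> Dict[str, int]:
    """Map each entry in /proc/<pid>/ns to its namespace inode number."""
    ns_dir = f"/proc/{pid}/ns"
    result = {}
    for entry in os.listdir(ns_dir):
        _kind, inode = parse_ns_link(os.readlink(os.path.join(ns_dir, entry)))
        result[entry] = inode
    return result


def group_by_namespace(per_pid: Dict[int, Dict[str, int]],
                       kind: str) -> Dict[int, List[int]]:
    """Group pids by the inode of their namespace of the given kind."""
    groups = defaultdict(list)
    for pid, spaces in sorted(per_pid.items()):
        if kind in spaces:
            groups[spaces[kind]].append(pid)
    return dict(groups)
```

A caller would typically run read_namespaces() over the numeric entries
in /proc (subject to permissions) and pass the result to
group_by_namespace(..., "user") or group_by_namespace(..., "net").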
Re: Introspecting userns relationships to other namespaces?
On Wed, Jul 06, 2016 at 10:41:48AM +0200, Michael Kerrisk (man-pages) wrote:
> [Rats! Doing now what I should have done to start with. Looping some
> lists and CRIU and other possibly relevant people into this
> conversation]
>
> Hi Eric,
>
> On 5 July 2016 at 23:47, Eric W. Biederman wrote:
> > "Michael Kerrisk (man-pages)" writes:
> >
> >> Hi Eric,
> >>
> >> I have a question. Is there any way currently to discover which
> >> user namespace a particular nonuser namespace is governed by?
> >> Maybe I am missing something, but there does not seem to be a
> >> way to do this. Also, can one discover which userns is the
> >> parent of a given userns? Again, I can't see a way to do this.
> >>
> >> The point here is introspecting so that a process might determine
> >> what its capabilities are when operating on some resource governed
> >> by a (nonuser) namespace.
> >
> > To the best of my knowledge there is not an interface to get that
> > information. It would be good to have such an interface for no other
> > reason than the CRIU folks are going to need it at some point. I am a
> > bit surprised they have not complained yet.

I don't think they need it. They do in fact have what they need. Assume
you have tasks T1, T2, T1_1 and T2_1; T1 and T2 are in init_user_ns; T1
spawned T1_1 in a new userns; T2 spawned T2_1 which setns()d to T1_1's ns.
There's some {handwave} uid mapping, does not matter.

At restart, it doesn't matter which task originally created the new userns.
criu knows T1_1 and T2_1 are in the same userns; it creates the userns, sets
up the mapping, and T1_1 and T2_1 setns() to it.

> > That said in a normal use scenario I don't think that information is
> > needed.
> >
> > Do you have a particular use case besides checkpoint/restart where this
> > is useful? That might help in coming up with a good userspace interface
> > for this information.
>
> So, I spend a moderate amount of time working with people to introduce
> them to the namespaces infrastructure, and one topic that comes up now
> and then is introspection/visualization tools. For example,
> nowadays--thanks to the (bizarrely misnamed) NStgid and NSpid fields
> in /proc/PID/status--it's possible to (and someone I was working with
> did) write tools that introspect the PID namespace hierarchy to show
> all of the processes and their PIDs in the various namespace instances.
> It's a natural enough thing to want to do, when confronted with the
> complexity of the namespaces.
>
> Someone else then asked me a question that led me to wonder about
> generally introspecting on the parental relationships between user
> namespaces and the association of other namespace types with user
> namespaces. One use would be visualization, in order to understand the
> running system. Another would be to answer the question I already
> mentioned: what capability does process X have to perform operations
> on a resource governed by namespace Y?

I agree they'll probably want it, but if we wait for a real need and
use case we can do a better job of providing what's needed.

-serge
Re: Introspecting userns relationships to other namespaces?
[Rats! Doing now what I should have done to start with. Looping some
lists and CRIU and other possibly relevant people into this
conversation]

Hi Eric,

On 5 July 2016 at 23:47, Eric W. Biederman wrote:
> "Michael Kerrisk (man-pages)" writes:
>
>> Hi Eric,
>>
>> I have a question. Is there any way currently to discover which
>> user namespace a particular nonuser namespace is governed by?
>> Maybe I am missing something, but there does not seem to be a
>> way to do this. Also, can one discover which userns is the
>> parent of a given userns? Again, I can't see a way to do this.
>>
>> The point here is introspecting so that a process might determine
>> what its capabilities are when operating on some resource governed
>> by a (nonuser) namespace.
>
> To the best of my knowledge there is not an interface to get that
> information. It would be good to have such an interface for no other
> reason than the CRIU folks are going to need it at some point. I am a
> bit surprised they have not complained yet.
>
> That said in a normal use scenario I don't think that information is
> needed.
>
> Do you have a particular use case besides checkpoint/restart where this
> is useful? That might help in coming up with a good userspace interface
> for this information.

So, I spend a moderate amount of time working with people to introduce
them to the namespaces infrastructure, and one topic that comes up now
and then is introspection/visualization tools. For example,
nowadays--thanks to the (bizarrely misnamed) NStgid and NSpid fields
in /proc/PID/status--it's possible to (and someone I was working with
did) write tools that introspect the PID namespace hierarchy to show
all of the processes and their PIDs in the various namespace instances.
It's a natural enough thing to want to do, when confronted with the
complexity of the namespaces.

Someone else then asked me a question that led me to wonder about
generally introspecting on the parental relationships between user
namespaces and the association of other namespace types with user
namespaces. One use would be visualization, in order to understand the
running system. Another would be to answer the question I already
mentioned: what capability does process X have to perform operations
on a resource governed by namespace Y?

Cheers,

Michael

--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
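As an illustration of the kind of PID-namespace introspection tool
mentioned above, here is a small sketch that pulls the NStgid and NSpid
fields out of the text of a /proc/PID/status file. The function name and
the sample text are invented for this note; the sketch assumes the
documented layout in which each field lists one ID per PID namespace the
process belongs to, outermost visible namespace first.

```python
from typing import Dict, List


def parse_ns_ids(status_text: str) -> Dict[str, List[int]]:
    """Extract the NStgid and NSpid ID lists from /proc/PID/status text."""
    wanted = ("NStgid", "NSpid")
    found: Dict[str, List[int]] = {}
    for line in status_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key in wanted:
            # One ID per PID namespace, separated by whitespace.
            found[key] = [int(v) for v in value.split()]
    return found


# Made-up sample: a process that is pid 4321 in the outermost visible PID
# namespace, pid 17 one level down, and pid 1 in its innermost namespace.
sample = "Name:\tsleep\nNStgid:\t4321\t17\t1\nNSpid:\t4321\t17\t1\n"
ids = parse_ns_ids(sample)
print(ids)
print("PID namespace nesting depth:", len(ids["NSpid"]) - 1)
```

On a real system the input would come from reading /proc/<pid>/status; a
list with more than one entry means the process sits in a nested PID
namespace.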