Re: [patch] PID namespace design bug, workaround
Ulrich Drepper wrote:
> Pavel Emelyanov wrote:
>> Having access to the same IPCs in different pid namespaces won't work.
>> Having access to the same filesystem in different IPC namespaces won't work.
>> Having access to the same UID namespace in different VFS namespaces won't
>> work.
>> Having access to the same <any> namespace in different <many others>
>> namespaces won't work.
>> [...]
>
> Then explicitly prevent the cases which cannot work in the clone()
> calls.  Yes, giving people rope to shoot themselves is a Unix tradition
> but it's so unnecessary in this case and will only cause support
> problems for innocent people.

:)

> I bet the result will be that if you have a separate PID namespace you
> need to enforce every other namespace as well.  There are simply too
> many dependencies.

I think that Ted's proposal (about the "namespaces compatibility matrix")
is better. I'd prefer knowing what can stop working if I do something
rather than being forced to keep my hands off it.

Thanks,
Pavel
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] PID namespace design bug, workaround
[EMAIL PROTECTED] writes:

> Pavel Emelianov [EMAIL PROTECTED] wrote:
> | Ulrich Drepper wrote:
> | > Pavel Emelyanov wrote:
> | >>> Isn't it this?
> | >>>
> | >>>     http://lkml.org/lkml/2007/11/1/141
> | >> That was the initial problem, and I already answered Ingo about
> | >> it
> | >
> | > No, look at my old mail which Ingo referenced in that posting.
> |
> | You pointed out only one problem that is not a variation of "how do
> | we handle the case when we pass our pid outside the namespace".
> |
> | This problem with signals is now being resolved at IBM by Sukadev
> | and Serge (I put them in Cc), so this is about to be fixed by the
> | time 2.6.24 releases (I hope).
>
> Yes. We (Oleg, Eric included in Cc) have a patchset to address signals
> issues in child pid namespaces. It is being discussed on Containers list:
>
> https://lists.linux-foundation.org/pipermail/containers/2007-October/008240.html
>
> We will post the patchset to LKML soon.

Yes.  Getting all of the cross namespace cases working that we can
is a goal.

Currently I don't know if we can do better with the futexes that
have pids in the user/kernel ABI.  Plain futexes should be fine.

Implementation-wise, si_pid is a bit of a pain but we should have that
one sorted out shortly.  It is well understood and we just need
to get the code right.

The pids in the sysvipc space should also be fairly simple to handle:
just do a classic struct pid conversion, and convert from struct
pid to a pid_t right at the user/kernel interface.

Unless I missed something, we already properly handle giving people
usable pids from the tty layer.

Getting pids working properly for unix domain socket credentials when
we cross pid namespaces is another case that needs a struct pid
conversion to get things working, but the kernel should be able to
do the right thing at that point.

In summary, when pids are stored inside the kernel we have all of the
needed infrastructure with struct pid to do the right thing when
processes communicate between pid namespaces.  Right now we just need
to go through every place in the kernel and make certain we haven't
overlooked something we can handle.  And there are a lot of places
where the kernel uses pids.

Eric
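For readers following the struct pid argument above, here is a minimal userspace sketch of the idea, not kernel code. It assumes a simplified model in which a task carries one number per nesting level of pid namespace, similar in spirit to the per-namespace numbers kept behind struct pid. The names `model_ns`, `model_pid` and `model_pid_nr_ns` are hypothetical and exist only for this illustration.

```c
#include <assert.h>

#define MAX_NS_LEVEL 4

/* Hypothetical model of a pid namespace: nesting depth plus an identity. */
struct model_ns {
    int level;                  /* 0 = initial namespace */
    int id;                     /* distinguishes namespaces at the same level */
};

/* Hypothetical model of a namespace-independent pid reference. */
struct model_pid {
    int level;                  /* deepest namespace the task lives in */
    int nr[MAX_NS_LEVEL];       /* number seen at each level, 0..level */
    int ns_id[MAX_NS_LEVEL];    /* namespace that each number belongs to */
};

/*
 * Translate the stored reference into the number a viewer in `ns` should
 * be given, i.e. the conversion done "right at the user/kernel interface".
 * Returns 0 when the task is not visible from that namespace.
 */
static int model_pid_nr_ns(const struct model_pid *pid,
                           const struct model_ns *ns)
{
    assert(ns->level >= 0 && ns->level < MAX_NS_LEVEL);

    if (ns->level > pid->level)
        return 0;               /* viewer is nested deeper than the task */
    if (pid->ns_id[ns->level] != ns->id)
        return 0;               /* sibling namespace: task not visible */
    return pid->nr[ns->level];
}
```

In this model, a task that is pid 4711 in the initial namespace and pid 2 in a child namespace is reported as 4711 to a viewer in the initial namespace, as 2 to a viewer in that child, and as 0 (not visible) to a viewer in a sibling namespace. The point of the design is that nothing stores the raw number; it is computed per viewer at the boundary.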
Re: [patch] PID namespace design bug, workaround
On Sat, 3 Nov 2007, Arjan van de Ven wrote:

> On Sat, 3 Nov 2007 15:40:48 -0700 (PDT)
> Linus Torvalds <[EMAIL PROTECTED]> wrote:
>
>> I don't understand how you can call this a "PID namespace design
>> bug", when it clearly has nothing what-so-ever to do with pid
>> namespaces, and everything to do with the *futexes* that blithely
>> assume that pid's are unique and that made it part of the
>> user-visible interface.
>>
>> OF COURSE any pid namespace design will always break such
>> assumptions, but that's not because of any PID namespace bugs. It's
>> what the whole *point* of PID namespaces is. If you use pid's
>> (instead of some opaque cookies), you will not be able to use such
>> things across pid-separation.
>
> well... kind of.
> There are 2 things around pid namespaces: which pids you can see/touch
> (in proc or signals or otherwise), and the non-uniqueness.
>
> For containers you clearly want the first part...
> but... is there a strong reason to not just *not* create duplicate pids
> even across namespaces? there's no rule in posix or anything similar to
> fd's afaik concerning which pids we can hand out... so we could just
> make them unique globally but just with limited visibility

two problems that I can think of

1. the container people would like to eventually have the ability to
   migrate containers from one system to another (or to suspend a
   container). In this sort of case, trying to fit the allocated PIDs from
   the container into a running system is a problem if PIDs are not
   allowed to overlap.

2. it seems to me that there is probably a latent security issue in having
   a global PID namespace with just limited visibility. the types of bugs
   that may let you affect a process seem easier to make if the only
   protection is visibility rather than complete separation.

David Lang
Re: [patch] PID namespace design bug, workaround
On Sat, 3 Nov 2007 15:40:48 -0700 (PDT)
Linus Torvalds <[EMAIL PROTECTED]> wrote:

> I don't understand how you can call this a "PID namespace design
> bug", when it clearly has nothing what-so-ever to do with pid
> namespaces, and everything to do with the *futexes* that blithely
> assume that pid's are unique and that made it part of the
> user-visible interface.
>
> OF COURSE any pid namespace design will always break such
> assumptions, but that's not because of any PID namespace bugs. It's
> what the whole *point* of PID namespaces is. If you use pid's
> (instead of some opaque cookies), you will not be able to use such
> things across pid-separation.

well... kind of.
There are 2 things around pid namespaces: which pids you can see/touch
(in proc or signals or otherwise), and the non-uniqueness.

For containers you clearly want the first part...
but... is there a strong reason to not just *not* create duplicate pids
even across namespaces? there's no rule in posix or anything similar to
fd's afaik concerning which pids we can hand out... so we could just
make them unique globally but just with limited visibility
Re: [patch] PID namespace design bug, workaround
On Sat, 3 Nov 2007, Ingo Molnar wrote:
>
>  - one problem is that this condition is 'invisible'. If two namespaces
>    happen to access the same robust futex (say a yum update from two
>    PID namespaces sharing the same read-mostly filesystem) there's silent
>    breakage and data corruption due to PID overlap.

.. and this is in *no* way different from thousands of applications that
write their pid to lock-files, and others decide that it's "stale" because
using "kill(pid, 0)" returns that the pid doesn't exist any more.

The solution? You can't do that kind of locking over NFS, or across pid
namespaces. Nobody blames NFS or pid namespaces for it.

>  - so via this we isolate an important category of syscalls from
>    cross-namespace use perhaps forever.

So? That's inherent to how those stupid stable mutexes work.

I don't understand how you can call this a "PID namespace design bug",
when it clearly has nothing what-so-ever to do with pid namespaces, and
everything to do with the *futexes* that blithely assume that pid's are
unique and that made it part of the user-visible interface.

OF COURSE any pid namespace design will always break such assumptions, but
that's not because of any PID namespace bugs. It's what the whole *point*
of PID namespaces is. If you use pid's (instead of some opaque cookies),
you will not be able to use such things across pid-separation.

		Linus
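The lock-file pattern mentioned above can be sketched as follows. This is a minimal illustration and not code from any particular application; `lock_owner_looks_alive` is a hypothetical helper name. It only uses the standard `kill(pid, 0)` probe, which checks for existence and permission without delivering a signal.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: decide whether the pid recorded in a lock file
 * still names a live process.  Returns 1 if a process with that pid is
 * visible to the caller, 0 if not.
 *
 * The answer is only meaningful when the pid was written by a process
 * in the caller's own pid namespace on the caller's own host.  A pid
 * written from another host (NFS) or another pid namespace may name an
 * unrelated process here, or none at all.
 */
static int lock_owner_looks_alive(pid_t pid)
{
    assert(pid > 0);        /* 0 and negative values address process groups */

    if (kill(pid, 0) == 0)
        return 1;           /* process exists and we may signal it */
    if (errno == EPERM)
        return 1;           /* process exists but we may not signal it */
    return 0;               /* ESRCH: no such process visible to us */
}
```

Calling `lock_owner_looks_alive(getpid())` returns 1, as expected. The failure case under discussion is the opposite one: a reader in a different pid namespace gets 0 for a perfectly healthy lock holder and declares the lock stale, or gets 1 because an unrelated local process happens to have the same number.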
Re: [patch] PID namespace design bug, workaround
* Linus Torvalds <[EMAIL PROTECTED]> wrote:

> On Fri, 2 Nov 2007, Dave Hansen wrote:
> >
> > There are certainly more of these, but here is one: in the futex
> > userspace address, we install the current pid's vnr into a userspace
> > address.
>
> Now, realistically, why not just say "you can't use these things
> across namespaces"? Does anybody really care? After all, somebody who
> screws this up only screws himself, not anybody else.

i see two main categories of problems:

 - one problem is that this condition is 'invisible'. If two namespaces
   happen to access the same robust futex (say a yum update from two
   PID namespaces sharing the same read-mostly filesystem) there's silent
   breakage and data corruption due to PID overlap. The other namespaces
   have no such problems. I think the "don't do that" answer is lame
   because most apps _will_ work across PID namespaces because things
   like fcntl-based locking do work. And there's no valid technical
   excuse why futexes shouldn't work: it's all controlled by the same
   native kernel, there's no untrusted network separating the nodes,
   etc.

 - so via this we isolate an important category of syscalls from
   cross-namespace use perhaps forever. Pick just about any other kernel
   resource and they can be shared between namespaces. But not futexes -
   which happen to be the most scalable locking primitive and people
   will almost certainly want to use them across namespaces. A
   completely new breed of futexes has to be introduced and trickled
   through userspace and all the architectures to make it work again
   across namespaces. Who will do that work? Generally the people who
   introduce a new concept are the ones who should do that. But in this
   case they are apparently not interested in making it generic enough
   (they are concentrated on their 'isolate it all' aspect) so nobody
   else will do it and we are stuck with an incomplete concept.

The answer of user-space/apps is predictable: they'll gravitate towards
the path of least resistance, and that will be "don't use futexes". PID
namespaces basically single out an important API category and use the
natural pressure of the other 300 syscalls and tens of thousands of apps
against this category. Linux is basically used against itself. The
counter-force is relatively weak and there's no solution available _at
all_ presently, so it's not even the fight of patches against each other,
it's the sheer lack of a feature which has an obvious end-result.

We've already got way too many incomplete concepts and APIs in the
kernel. Maybe i'm over-worrying, but i fear we end up like with
capabilities or sendfile - code merged too soon and never completed for
many years - perhaps never completed at all.

VMS and WNT did those things a bit better i think - their API frameworks
were/are pervasive and complete, even in the corner cases. Whether it's
the right approach to force reasonable perfection of frameworks like
this from the get go is another question - but in practice even for
relatively popular new APIs like epoll we see way too slow movement
towards the 'completion of the API', and that hinders adoption of new
APIs very much. (With splice being a notable exception - there the
central concept was so strong that it quickly pushed itself to total
completion - combined with a capable maintainer of the API.) But it's
not that easy for futexes and we put another roadblock in the path of
futexes.

	Ingo
Re: [patch] PID namespace design bug, workaround
Pavel Emelianov [EMAIL PROTECTED] wrote:
| Ulrich Drepper wrote:
| > Pavel Emelyanov wrote:
| >>> Isn't it this?
| >>>
| >>>     http://lkml.org/lkml/2007/11/1/141
| >> That was the initial problem, and I already answered Ingo about
| >> it
| >
| > No, look at my old mail which Ingo referenced in that posting.
|
| You pointed out only one problem that is not a variation of "how do
| we handle the case when we pass our pid outside the namespace".
|
| This problem with signals is now being resolved at IBM by Sukadev
| and Serge (I put them in Cc), so this is about to be fixed by the
| time 2.6.24 releases (I hope).

Yes. We (Oleg, Eric included in Cc) have a patchset to address signals
issues in child pid namespaces. It is being discussed on Containers list:

https://lists.linux-foundation.org/pipermail/containers/2007-October/008240.html

We will post the patchset to LKML soon.

|
| As far as the "passing the pid outside the namespace" is concerned,
| is my answer "pids should never be used outside the namespace they
| came from, otherwise userspace won't work as expected" satisfactory?
|
| So is "everything else", you mentioned, covered with the problems
| above?
|
| > --
| > ➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
Re: [patch] PID namespace design bug, workaround
Pavel Emelyanov wrote:
> Having access to the same IPCs in different pid namespaces won't work.
> Having access to the same filesystem in different IPC namespaces won't work.
> Having access to the same UID namespace in different VFS namespaces won't
> work.
> Having access to the same <any> namespace in different <many others>
> namespaces won't work.
> [...]

Then explicitly prevent the cases which cannot work in the clone()
calls.  Yes, giving people rope to shoot themselves is a Unix tradition
but it's so unnecessary in this case and will only cause support
problems for innocent people.

I bet the result will be that if you have a separate PID namespace you
need to enforce every other namespace as well.  There are simply too
many dependencies.

--
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
Re: [patch] PID namespace design bug, workaround
On Fri, 2007-11-02 at 10:39 -0700, Linus Torvalds wrote:
>
> On Fri, 2 Nov 2007, Dave Hansen wrote:
> >
> > There are certainly more of these, but here is one: in the futex
> > userspace address, we install the current pid's vnr into a userspace
> > address.
>
> Now, realistically, why not just say "you can't use these things across
> namespaces"? Does anybody really care? After all, somebody who screws this
> up only screws himself, not anybody else.
>
> 		Linus

Accessing the same robust futex from different PID namespaces on the
same machine via a shared file mapping is logically equivalent to
accessing the same robust futex from different machines via a shared
filesystem, and there's no reason to expect either operation to work
correctly.

--
Nicholas Miell <[EMAIL PROTECTED]>
Re: [patch] PID namespace design bug, workaround
On Fri, Nov 02, 2007 at 06:58:47PM +0300, Pavel Emelyanov wrote:
> Having access to the same IPCs in different pid namespaces won't work.
> Having access to the same filesystem in different IPC namespaces won't work.
> Having access to the same UID namespace in different VFS namespaces won't
> work.
> Having access to the same <any> namespace in different <many others>
> namespaces won't work.
>
> That's the idea OpenVZ tried to promote when the story with "containers"
> started, but most of the other participants decided that we can create
> individual namespaces and step-by-step try to make them work in all the
> possible combinations.

Heh.  Well, this won't be the first time that we go around the design
circle with people objecting to the idea eventually figuring out that
the original idea really was the only sane way to do things.  :-)

Maybe it would be instructive to create a matrix which lists areas
where processes that share namespace FOO but not namespace BAR would
result in breakage, with an explanation of what breaks in a particular
instance?

Assuming we continue to go down the path of orthogonal namespaces,
having a file in Documentation/ which lists places where the different
namespaces have dependencies on each other for correct system call
operation would be a Good Thing.

						- Ted
Re: [patch] PID namespace design bug, workaround
On Fri, 2 Nov 2007, Dave Hansen wrote:
>
> There are certainly more of these, but here is one. In the futex
> userspace address, we install the current pid's vnr into a userspace
> address.

Now, realistically, why not just say "you can't use these things across
namespaces"? Does anybody really care? After all, somebody who screws this
up only screws himself, not anybody else.

			Linus
Re: [patch] PID namespace design bug, workaround
On Fri, 2007-11-02 at 01:04 -0700, Andrew Morton wrote:
> > > That is the "fix" you were referring to? I was hoping you have a sketch
> > > for a real solution. If nobody can think of a way to fix this PID
> >
> > Looks like we misunderstood each other. Can you please elaborate on
> > what exactly is broken in pid namespaces?
>
> Isn't it this?
>
> http://lkml.org/lkml/2007/11/1/141

I think we're still a bit murky on exactly what the issues are. Ingo,
Ulrich, is this the right track? The kind of issues that you're
concerned about?

There are certainly more of these, but here is one. In the futex
userspace address, we install the current pid's vnr into a userspace
address.

static int futex_lock_pi(u32 __user *uaddr, struct rw_semaphore *fshared,
			 int detect, ktime_t *time, int trylock)
{
...
	newval = task_pid_vnr(current);

	curval = cmpxchg_futex_value_locked(uaddr, 0, newval);

We obviously don't have any restrictions on who else might be mapping
that address, so that pid can theoretically leak out to any other task.
In another pid namespace, the pid at that userspace address is certainly
nonsensical.

-- Dave
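To make the leak Dave describes concrete, here is a minimal user-space sketch. The three bit masks match the futex word layout in <linux/futex.h>; everything else (the `task_ids` struct and both helper functions) is hypothetical and only models what a reader in another PID namespace would see. It is not kernel code.

```c
#include <stdint.h>

/* Futex word layout for PI/robust futexes, as defined in <linux/futex.h>. */
#define FUTEX_WAITERS    0x80000000u
#define FUTEX_OWNER_DIED 0x40000000u
#define FUTEX_TID_MASK   0x3fffffffu

/* Hypothetical helper: the owner TID a reader extracts from the futex word. */
uint32_t futex_owner_tid(uint32_t futex_word)
{
	return futex_word & FUTEX_TID_MASK;
}

/* Hypothetical model of one task as numbered in two PID namespaces. */
struct task_ids {
	uint32_t tid_in_parent_ns; /* number seen from the parent namespace */
	uint32_t tid_in_child_ns;  /* number task_pid_vnr() yields inside the child */
};

/*
 * Returns 1 if a reader in the parent namespace, looking only at the
 * futex word, would arrive at the owner's parent-namespace TID.
 */
int owner_matches_in_parent_ns(uint32_t futex_word, struct task_ids owner)
{
	return futex_owner_tid(futex_word) == owner.tid_in_parent_ns;
}
```

For a task that is TID 1 inside its own namespace and, say, TID 4242 in the parent, the word written by the lock path carries 1, so a parent-namespace reader that maps the same page resolves the owner to the wrong task.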
Re: [patch] PID namespace design bug, workaround
Ulrich Drepper wrote:
> Pavel Emelyanov wrote:
>> So is "everything else", you mentioned, covered with the problems
>> above?
>
> No, it's not. If you'd read the mail carefully you'd notice that the
> use of PIDs especially in robust futexes is part of the API and that it
> simply isn't acceptable to say "don't do that". A robust mutex can be
> stored in any file and as long as two processes have access to the same
> file (or they can pass each other shared memory) the underlying futex
> functionality simply must work.

This is the case when you export the pid to the user level outside the
namespace. This case is not supposed to work at all. I know it and
there's nothing we can do about it. (Some more comments about this below.)

> This whole approach to allow switching on and off each of the namespaces
> is just wrong. Do it all or nothing, at least for the problematic ones
> like NEWPID. Having access to the same filesystem but using separate
> PID namespaces is simply not going to work.

I'd like to note that the original reason to switch the namespace off
was to help embedded people get rid of the functionality they don't need
and save on vmlinux size. Since Ingo proposed to disable namespace
creation in a ... strange way, I noted that there is a more elegant
way to do this. This was not the "fix" for cross-namespace
communication.

Nevertheless...

Having access to the same IPCs in different pid namespaces won't work.
Having access to the same filesystem in different IPC namespaces won't work.
Having access to the same UID namespace in different VFS namespaces won't
work.
Having access to the same <any> namespace in different <many others>
namespace won't work.

That's the idea OpenVZ tried to promote when the story with "containers"
started, but most of the other participants decided that we can create
individual namespaces and step-by-step try to make them work in all the
possible combinations.

Right now we have a pid namespace, which
a) works fine in the initial namespace (by this I mean that it doesn't
   introduce *new* bugs);
b) mostly works in the sub namespace. Some work is to be done and it
   is being done;
c) doesn't work in some ways (though not in all) when tasks communicate
   across the namespace boundary, but it is not going to, by definition.

I'm also looking for a good solution on how to work around the "c" case,
but I don't agree with the statement that "the pid namespaces are
completely broken". They are not completely broken; there is just some
work to do for case "b" and some way to be invented to disable case "c".

> You also brush completely over the SysV IPC issue.

I did not - this problem is only relevant when you try to set up IPC
communication between processes from different namespaces, and I have
already answered this question. If you use IPC within a single
namespace everything works just fine.

> And I doubt that I spent enough time thinking about all this to arrive
> at the more subtle problems. I don't think especially the PID namespace
> is ready at all at this time.
>
> -- 
> ➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
Re: [patch] PID namespace design bug, workaround
Pavel Emelyanov wrote:
> So is "everything else", you mentioned, covered with the problems
> above?

No, it's not. If you'd read the mail carefully you'd notice that the
use of PIDs especially in robust futexes is part of the API and that it
simply isn't acceptable to say "don't do that". A robust mutex can be
stored in any file and as long as two processes have access to the same
file (or they can pass each other shared memory) the underlying futex
functionality simply must work.

This whole approach to allow switching on and off each of the namespaces
is just wrong. Do it all or nothing, at least for the problematic ones
like NEWPID. Having access to the same filesystem but using separate
PID namespaces is simply not going to work.

You also brush completely over the SysV IPC issue.

And I doubt that I spent enough time thinking about all this to arrive
at the more subtle problems. I don't think especially the PID namespace
is ready at all at this time.

-- 
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
Re: [patch] PID namespace design bug, workaround
Ulrich Drepper wrote:
> Pavel Emelyanov wrote:
>>> Isn't it this?
>>>
>>> http://lkml.org/lkml/2007/11/1/141
>> That was the initial problem, and I already answered to Ingo about
>> it
>
> No, look at my old mail which Ingo referenced in that posting.

You pointed out only one problem that is not a variation of "how do we
handle the case when we pass our pid outside the namespace". This
problem with signals is now being resolved at IBM by Sukadev and Serge
(I put them in Cc), so it should be fixed by the time 2.6.24 is
released (I hope).

As far as "passing the pid outside the namespace" is concerned, is
my answer "pids should never be used outside the namespace they came
from, otherwise userspace won't work as expected" satisfactory?

So is "everything else", you mentioned, covered with the problems
above?

> -- 
> ➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
Re: [patch] PID namespace design bug, workaround
Pavel Emelyanov wrote:
>> Isn't it this?
>>
>> http://lkml.org/lkml/2007/11/1/141
>
> That was the initial problem, and I already answered to Ingo about
> it

No, look at my old mail which Ingo referenced in that posting.

-- 
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
Re: [patch] PID namespace design bug, workaround
Andrew Morton wrote:
> On Fri, 02 Nov 2007 10:55:02 +0300 Pavel Emelyanov <[EMAIL PROTECTED]> wrote:
>
>> Ulrich Drepper wrote:
>>> Pavel Emelyanov wrote:
>>>> The "fix" I mention is just returning -EINVAL in case user orders
>>>> CLONE_NEWPID
>>> That is the "fix" you were referring to? I was hoping you have a sketch
>>> for a real solution. If nobody can think of a way to fix this PID
>> Looks like we misunderstood each other. Can you please elaborate on
>> what exactly is broken in pid namespaces?
>
> Isn't it this?
>
> http://lkml.org/lkml/2007/11/1/141

That was the initial problem, and I already answered to Ingo about
it - a pid obtained in one pid namespace shouldn't be used in another.
This is not a design bug, but a design idea.

If he managed to get two threads in different namespaces, then we
should fix this ability (but I thought that I handled it - the
copy_pid_ns call doesn't allow creating a new thread in a new namespace:

	new_ns = ERR_PTR(-EINVAL);
	if (flags & CLONE_THREAD)
		goto out_put;
)

I should have first asked Ingo how he managed to get two threads in
different namespaces, so that we could fix this, but Ulrich said that
"everything else I have seen simply doesn't work without breaking
something" so I asked him to elaborate on this - what _else_ doesn't work.

Thanks,
Pavel
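The guard Pavel quotes from copy_pid_ns can be modelled outside the kernel as a plain flag check. This is a sketch under stated assumptions: `check_pid_ns_flags()` is a hypothetical name, the two flag values are copied from <linux/sched.h> so the snippet builds without kernel headers, and the real function does more than this (it also creates the namespace).

```c
#include <errno.h>

/* Values as defined in <linux/sched.h>, repeated here for a standalone build. */
#define CLONE_THREAD 0x00010000UL
#define CLONE_NEWPID 0x20000000UL

/*
 * Hypothetical user-space model of the guard: returns 0 if the flag
 * combination is acceptable, -EINVAL if a new thread is being asked
 * to start in a new PID namespace.
 */
int check_pid_ns_flags(unsigned long flags)
{
	if (!(flags & CLONE_NEWPID))
		return 0;       /* no new PID namespace requested */

	if (flags & CLONE_THREAD)
		return -EINVAL; /* threads of one group must share a PID namespace */

	return 0;
}
```

With this rule in place, two threads of the same thread group cannot end up in different PID namespaces through clone() alone, which is the case Pavel believed was already covered.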
Re: [patch] PID namespace design bug, workaround
On Fri, 02 Nov 2007 10:55:02 +0300 Pavel Emelyanov <[EMAIL PROTECTED]> wrote:

> Ulrich Drepper wrote:
> > Pavel Emelyanov wrote:
> >> The "fix" I mention is just returning -EINVAL in case user orders
> >> CLONE_NEWPID
> >
> > That is the "fix" you were referring to? I was hoping you have a sketch
> > for a real solution. If nobody can think of a way to fix this PID
>
> Looks like we misunderstood each other. Can you please elaborate on
> what exactly is broken in pid namespaces?

Isn't it this?

http://lkml.org/lkml/2007/11/1/141
Re: [patch] PID namespace design bug, workaround
Ulrich Drepper wrote:
> Pavel Emelyanov wrote:
>> The "fix" I mention is just returning -EINVAL in case user orders
>> CLONE_NEWPID
>
> That is the "fix" you were referring to? I was hoping you have a sketch
> for a real solution. If nobody can think of a way to fix this PID

Looks like we misunderstood each other. Can you please elaborate on
what exactly is broken in pid namespaces?

> namespaces are IMO not something which should go in at all.

We (mainly me and Sukadev) are fixing the issues we are aware of, but
it looks like somebody found something and didn't notify the authors
about it.

Thanks,
Pavel

> -- 
> ➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
Re: [patch] PID namespace design bug, workaround
Pavel Emelyanov wrote:
> Having access to the same IPCs in different pid namespaces won't work.
> Having access to the same filesystem in different IPC namespaces won't work.
> Having access to the same UID namespace in different VFS namespaces won't
> work.
> Having access to the same <any> namespace in different <many others>
> namespace won't work.
> [...]

Then explicitly prevent the cases which cannot work in the clone()
calls. Yes, giving people rope to shoot themselves is a Unix tradition
but it's so unnecessary in this case and will only cause support
problems for innocent people.

I bet the result will be that if you have a separate PID namespace you
need to enforce every other namespace as well. There are simply too
many dependencies.

-- 
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
Re: [patch] PID namespace design bug, workaround
Ingo Molnar wrote:
> but this problem is still present in the code, and it has been recently
> committed into mainline via:
>
>   commit 30e49c263e36341b60b735cbef5ca37912549264
>   Author: Pavel Emelyanov <[EMAIL PROTECTED]>
>   Date:   Thu Oct 18 23:40:10 2007 -0700
>
>       pid namespaces: allow cloning of new namespace
>
> without these problems having been resolved. A full-scale revert is
> probably too intrusive, but at minimum we need to turn off user-space
> access to this feature via this simple patch. Until this issue is
> resolved properly the new PID namespace code needs to be turned off.
> Letting this into 2.6.24 would be a disaster.
>
> Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>

Acked-by: Ulrich Drepper <[EMAIL PROTECTED]>

-- 
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
Re: [patch] PID namespace design bug, workaround
Pavel Emelyanov wrote:
> The "fix" I mention is just returning -EINVAL in case user orders
> CLONE_NEWPID

That is the "fix" you were referring to? I was hoping you have a sketch
for a real solution. If nobody can think of a way to fix this, PID
namespaces are IMO not something which should go in at all.

-- 
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
Re: [patch] PID namespace design bug, workaround
* Theodore Tso <[EMAIL PROTECTED]> wrote:

> On Thu, Nov 01, 2007 at 04:05:37PM +0100, Ingo Molnar wrote:
> > +	if (clone_flags & CLONE_NEWPID)
> > +		return -ENOSYS;
>
> I'd use EINVAL instead of ENOSYS...

ok, updated patch below.

	Ingo

--->
From: Ingo Molnar <[EMAIL PROTECTED]>
Subject: PID namespaces: turn them off for now

while checking recent commits to the kernel core i took a look at the
PID namespaces implementation, and it has a fatal flaw: it breaks
futexes and various libraries (and other stuff) that use PIDs as the
means of identifying tasks, by not providing any means of global
identification that works across PID namespaces. (PIDs _are_ a very
convenient and global way of identifying contexts.)

i asked Ulrich about this and it turns out he has warned about this
early on:

  http://www.nabble.com/Re%3A-question%3A-pid-space-semantics.-p3409990.html

but this problem is still present in the code, and it has been recently
committed into mainline via:

  commit 30e49c263e36341b60b735cbef5ca37912549264
  Author: Pavel Emelyanov <[EMAIL PROTECTED]>
  Date:   Thu Oct 18 23:40:10 2007 -0700

      pid namespaces: allow cloning of new namespace

without these problems having been resolved. A full-scale revert is
probably too intrusive, but at minimum we need to turn off user-space
access to this feature via this simple patch. Until this issue is
resolved properly the new PID namespace code needs to be turned off.
Letting this into 2.6.24 would be a disaster.

Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
---
 kernel/fork.c |    9 +++++++++
 1 file changed, 9 insertions(+)

Index: v/kernel/fork.c
===================================================================
--- v.orig/kernel/fork.c
+++ v/kernel/fork.c
@@ -1420,6 +1420,15 @@ long do_fork(unsigned long clone_flags,
 	int trace = 0;
 	long nr;
 
+	/*
+	 * PID namespaces are broken at the moment: they do not allow
+	 * certain PID based syscalls (such as futexes) to be used
+	 * across namespaces. This is broken and must not be allowed,
+	 * so we keep this feature turned off until it's properly fixed.
+	 */
+	if (clone_flags & CLONE_NEWPID)
+		return -EINVAL;
+
 	if (unlikely(current->ptrace)) {
 		trace = fork_traceflag (clone_flags);
 		if (trace)
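The two variants of the guard discussed in this thread (rejecting the flag versus clearing it) can be compared in user space. The following is only a minimal sketch, not kernel code: `reject_newpid()` and `mask_newpid()` are hypothetical helper names invented for illustration, and the fallback value of `CLONE_NEWPID` is the one from the kernel's `<linux/sched.h>`.

```c
#include <errno.h>

#ifndef CLONE_NEWPID
#define CLONE_NEWPID 0x20000000	/* fallback; value from <linux/sched.h> */
#endif

/*
 * Hypothetical helper modelled on the updated patch: a request for a
 * new PID namespace is refused outright, so the caller sees an error.
 */
long reject_newpid(unsigned long clone_flags)
{
	if (clone_flags & CLONE_NEWPID)
		return -EINVAL;
	return 0;
}

/*
 * Hypothetical helper modelled on the first version of the patch: the
 * bit is cleared and the call proceeds, so the caller cannot tell that
 * no new namespace was created.
 */
unsigned long mask_newpid(unsigned long clone_flags)
{
	return clone_flags & ~(unsigned long)CLONE_NEWPID;
}
```

With the rejecting variant a caller can detect the failure and fall back; with the masking variant the request appears to succeed while the child stays in the parent's PID namespace, which is the behaviour Ulrich objected to.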
Re: [patch] PID namespace design bug, workaround
On Thu, Nov 01, 2007 at 04:05:37PM +0100, Ingo Molnar wrote:
> +	if (clone_flags & CLONE_NEWPID)
> +		return -ENOSYS;

I'd use EINVAL instead of ENOSYS...

						- Ted
Re: [patch] PID namespace design bug, workaround
On Thu, 2007-11-01 at 07:56 -0700, Ulrich Drepper wrote:
> Pavel Emelyanov wrote:
> > With this set we'll be able to mark pid namespaces as EXPERIMENTAL
> > or even BROKEN, so nobody will be able to create them. So can we, please,
> > keep things as they are for now - the appropriate fix will be ready
> > soon.
>
> You sound far too optimistic for my taste. I probably haven't seen the
> proposal you have in mind but everything else I have seen simply doesn't
> work without breaking something.

Yeah, we definitely realize that this inhibits things that were
perfectly fine before. As Eric mentioned in his reply to your message
last year, the primary goal here is isolation.

We'd eventually like to be able to pick a container up and move it to
another system. That's going to be awfully hard if the container is
sharing a resource with a part of the system which is not moving. Pid
namespaces (along with the others) give us the isolation to keep these
interactions from happening except in a controlled manner, breaking the
ties that might bind it to one particular system.

Think of how many user-visible apis deal with files and filenames.
However, there can certainly be files that are unavailable to certain
processes based on their membership in a particular filesystem
namespace. In fact, we use chroot() to try and _make_ certain files
unavailable.

-- Dave
Re: [patch] PID namespace design bug, workaround
Ingo Molnar wrote:
> * Pavel Emelyanov <[EMAIL PROTECTED]> wrote:
>
>> The "fix" I mention is just returning -EINVAL in case user orders
>> CLONE_NEWPID and compiling out all the namespace cloning code. This
>> is just a more elegant way to get rid of pid namespaces than what
>> Ingo proposed.
>
> unfortunately i have to NACK that approach. We never allowed broken
> user-space visible APIs into the kernel like that because it just gives
> a vector for that breakage to become de-facto used and forced upon the
> core kernel. Even if they can be .config turned off. That's just a lame
> excuse that delays the fixing of it. We may mark features that have a
> good expectation to be fixed as CONFIG_EXPERIMENTAL, and we may mark

Pid namespaces have more than a good expectation to be fixed, so feel
free to mark the (currently pending) PID_NS config option with
"depends on EXPERIMENTAL".

All the problems I know are slowly getting fixed, but most of them
are just related to "bad pid value is reported to the user space when
we work inside some new namespace".

Unfortunately, as I can see, all the discussions of pid namespaces
happen behind my back, so all I can do is just fix the problems I'm
aware of.

> drivers that nobody maintains anymore as CONFIG_BROKEN, but we dont
> introduce new core syscall features with CONFIG_BROKEN! We never did and
> i hope we never will.
>
> The _only_ way to force the fixing of such type of breakages is to not
> offer them _at all_. Really, you are proposing a major new extension to
> lots of important core Linux APIs so please try to solve this problem
> cleanly, it's really severe. Right now as things stand this containers

I'm sure that no *new* problems appear in case you don't enter the
new namespace, so just disable the pid space creation code (with
EXPERIMENTAL option) and live with the original kernel. If you point
me to one, I'd be glad to fix it.

> sub-feature is "a little bit pregnant". This is one of the few cases
> where we really _must_ say no.
>
> 	Ingo
>
Re: [patch] PID namespace design bug, workaround
* Pavel Emelyanov <[EMAIL PROTECTED]> wrote:

> The "fix" I mention is just returning -EINVAL in case user orders
> CLONE_NEWPID and compiling out all the namespace cloning code. This
> is just a more elegant way to get rid of pid namespaces than what
> Ingo proposed.

unfortunately i have to NACK that approach. We never allowed broken
user-space visible APIs into the kernel like that because it just gives
a vector for that breakage to become de-facto used and forced upon the
core kernel. Even if they can be .config turned off. That's just a lame
excuse that delays the fixing of it. We may mark features that have a
good expectation to be fixed as CONFIG_EXPERIMENTAL, and we may mark
drivers that nobody maintains anymore as CONFIG_BROKEN, but we dont
introduce new core syscall features with CONFIG_BROKEN! We never did and
i hope we never will.

The _only_ way to force the fixing of such type of breakages is to not
offer them _at all_. Really, you are proposing a major new extension to
lots of important core Linux APIs so please try to solve this problem
cleanly, it's really severe. Right now as things stand this containers
sub-feature is "a little bit pregnant". This is one of the few cases
where we really _must_ say no.

	Ingo
Re: [patch] PID namespace design bug, workaround
Peter Zijlstra wrote:
> On Thu, 2007-11-01 at 17:51 +0300, Pavel Emelyanov wrote:
>
>> So can we, please,
>> keep things as they are for now - the appropriate fix will be ready
>> soon.
>
> Just for the curious, could you outline on how you intend to fix this?

I have already answered Ulrich about this. Just to make this
sub-thread consistent:

The "fix" I mention is just returning -EINVAL in case user orders
CLONE_NEWPID and compiling out all the namespace cloning code. This
is just a more elegant way to get rid of pid namespaces than what
Ingo proposed.

Here's the root of the set: http://lkml.org/lkml/2007/10/31/118

Thanks,
Pavel
Re: [patch] PID namespace design bug, workaround
Ulrich Drepper wrote:
> Pavel Emelyanov wrote:
>> With this set we'll be able to mark pid namespaces as EXPERIMENTAL
>> or even BROKEN, so nobody will be able to create them. So can we, please,
>> keep things as they are for now - the appropriate fix will be ready
>> soon.
>
> You sound far too optimistic for my taste. I probably haven't seen the
> proposal you have in mind but everything else I have seen simply doesn't
> work without breaking something.

The "fix" I mention is just returning -EINVAL in case user orders
CLONE_NEWPID and compiling out all the namespace cloning code. This
is just a more elegant way to get rid of pid namespaces than what
Ingo proposed.

Here's the root of the set: http://lkml.org/lkml/2007/10/31/118

Thanks,
Pavel
Re: [patch] PID namespace design bug, workaround
* Ulrich Drepper <[EMAIL PROTECTED]> wrote:

> Ingo Molnar wrote:
> > +	clone_flags &= ~CLONE_NEWPID;
>
> I think the call should rather fail than silently drop the bit but
> aside from that I agree. The problems we'd run into if the feature is
> getting used as-is are severe.

does the patch below look OK to you?

	Ingo

--->
From: Ingo Molnar <[EMAIL PROTECTED]>
Subject: PID namespaces: turn them off for now

while checking recent commits to the kernel core i took a look at the
PID namespaces implementation, and it has a fatal flaw: it breaks
futexes and various libraries (and other stuff) that use PIDs as the
means of identifying tasks, by not providing any means of global
identification that works across PID namespaces. (PIDs _are_ a very
convenient and global way of identifying contexts.)

i asked Ulrich about this and it turns out he has warned about this
early on:

  http://www.nabble.com/Re%3A-question%3A-pid-space-semantics.-p3409990.html

but this problem is still present in the code, and it has been recently
committed into mainline via:

  commit 30e49c263e36341b60b735cbef5ca37912549264
  Author: Pavel Emelyanov <[EMAIL PROTECTED]>
  Date:   Thu Oct 18 23:40:10 2007 -0700

      pid namespaces: allow cloning of new namespace

without these problems having been resolved. A full-scale revert is
probably too intrusive, but at minimum we need to turn off user-space
access to this feature via this simple patch. Until this issue is
resolved properly the new PID namespace code needs to be turned off.
Letting this into 2.6.24 would be a disaster.

Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
---
 kernel/fork.c |    9 +++++++++
 1 file changed, 9 insertions(+)

Index: v/kernel/fork.c
===================================================================
--- v.orig/kernel/fork.c
+++ v/kernel/fork.c
@@ -1420,6 +1420,15 @@ long do_fork(unsigned long clone_flags,
 	int trace = 0;
 	long nr;
 
+	/*
+	 * PID namespaces are broken at the moment: they do not allow
+	 * certain PID based syscalls (such as futexes) to be used
+	 * across namespaces. This is broken and must not be allowed,
+	 * so we keep this feature turned off until it's properly fixed.
+	 */
+	if (clone_flags & CLONE_NEWPID)
+		return -ENOSYS;
+
 	if (unlikely(current->ptrace)) {
 		trace = fork_traceflag (clone_flags);
 		if (trace)
Re: [patch] PID namespace design bug, workaround
Ingo Molnar wrote:
> while checking recent commits to the kernel core i took a look at the
> PID namespaces implementation, and it has a fatal flaw: it breaks
> futexes and various libraries (and other stuff) that use PIDs as the
> means of identifying tasks, by not providing any means of global
> identification that works across PID namespaces. (PIDs _are_ a very

You're not 100% correct here. The task_pid_nr() does return you a
unique pid, so you do have the way to identify the task.

Another thing - you should *not* allow tasks to communicate across
pid namespaces using any pids - this just breaks the pid namespaces
idea.

As far as the futexes are concerned - I do not allow threads to live
in different pid namespaces (a more correct fix would be not to allow
tasks to share the mm_struct across pid namespaces, but this is a
one line fix), so the situation when you have two threads in different
namespaces is impossible.

Thanks,
Pavel

> convenient and global way of identifying contexts.)
>
> i asked Ulrich about this and it turns out he has warned about this
> early on:
>
>   http://www.nabble.com/Re%3A-question%3A-pid-space-semantics.-p3409990.html
>
> but this problem is still present in the code, and it has been recently
> committed into mainline via:
>
>   commit 30e49c263e36341b60b735cbef5ca37912549264
>   Author: Pavel Emelyanov <[EMAIL PROTECTED]>
>   Date:   Thu Oct 18 23:40:10 2007 -0700
>
>       pid namespaces: allow cloning of new namespace
>
> without these problems having been resolved. A full-scale revert is
> probably too intrusive, but at minimum we need to turn off user-space
> access to this feature via this simple patch. Until this issue is
> resolved properly the new PID namespace code needs to be turned off.
> Letting this into 2.6.24 would be a disaster.
>
> Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
> ---
>  kernel/fork.c |    8 ++++++++
>  1 file changed, 8 insertions(+)
>
> Index: v/kernel/fork.c
> ===================================================================
> --- v.orig/kernel/fork.c
> +++ v/kernel/fork.c
> @@ -1420,6 +1420,14 @@ long do_fork(unsigned long clone_flags,
>  	int trace = 0;
>  	long nr;
>  
> +	/*
> +	 * PID namespaces are broken at the moment: they do not allow
> +	 * certain PID based syscalls (such as futexes) to be used
> +	 * across namespaces. This is broken and must not be allowed,
> +	 * so we keep this feature turned off until it's properly fixed.
> +	 */
> +	clone_flags &= ~CLONE_NEWPID;
> +
>  	if (unlikely(current->ptrace)) {
>  		trace = fork_traceflag (clone_flags);
>  		if (trace)
>
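Pavel's point that a task keeps one globally unique number alongside its per-namespace numbers can be pictured with a toy model. This is a sketch under simplifying assumptions, not the kernel's data structure: `struct toy_pid`, `toy_pid_nr()` and `toy_pid_nr_at()` are hypothetical names, and namespaces are modelled as a single chain of nesting levels (level 0 being the initial namespace) rather than a tree.

```c
#define TOY_MAX_LEVEL 4

/* Toy model: one PID number for every namespace level the task is visible in. */
struct toy_pid {
	int level;			/* nesting depth of the task's own namespace */
	int numbers[TOY_MAX_LEVEL];	/* numbers[0] is the number in the initial namespace */
};

/* Global number, unique across the system; plays the role Pavel assigns to task_pid_nr(). */
int toy_pid_nr(const struct toy_pid *pid)
{
	return pid->numbers[0];
}

/*
 * Number of the task as seen from nesting level `level`.
 * Returns 0 when the task is not visible from that level,
 * i.e. the observer sits deeper than the task itself.
 */
int toy_pid_nr_at(const struct toy_pid *pid, int level)
{
	if (level < 0 || level >= TOY_MAX_LEVEL || level > pid->level)
		return 0;
	return pid->numbers[level];
}
```

In this model a task created in a nested namespace might be number 1 inside its own namespace and 4321 in the initial one. Anything that stores the number seen inside the namespace and later interprets it elsewhere gets the wrong task, which is the kind of breakage Ingo's commit message describes for PID-based interfaces.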
Re: [patch] PID namespace design bug, workaround
On Thu, 2007-11-01 at 17:51 +0300, Pavel Emelyanov wrote:

> So can we, please,
> keep things as they are for now - the appropriate fix will be ready
> soon.

Just for the curious, could you outline on how you intend to fix this?
Re: [patch] PID namespace design bug, workaround
Pavel Emelyanov wrote:
> With this set we'll be able to mark pid namespaces as EXPERIMENTAL
> or even BROKEN, so nobody will be able to create them. So can we, please,
> keep things as they are for now - the appropriate fix will be ready
> soon.

You sound far too optimistic for my taste. I probably haven't seen the
proposal you have in mind but everything else I have seen simply doesn't
work without breaking something.

-- 
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
Re: [patch] PID namespace design bug, workaround
Ingo Molnar wrote:
> while checking recent commits to the kernel core i took a look at the
> PID namespaces implementation, and it has a fatal flaw: it breaks
> futexes and various libraries (and other stuff) that use PIDs as the
> means of identifying tasks, by not providing any means of global
> identification that works across PID namespaces. (PIDs _are_ a very
> convenient and global way of identifying contexts.)
>
> i asked Ulrich about this and it turns out he has warned about this
> early on:
>
>   http://www.nabble.com/Re%3A-question%3A-pid-space-semantics.-p3409990.html
>
> but this problem is still present in the code, and it has been recently
> committed into mainline via:
>
>   commit 30e49c263e36341b60b735cbef5ca37912549264
>   Author: Pavel Emelyanov <[EMAIL PROTECTED]>
>   Date:   Thu Oct 18 23:40:10 2007 -0700
>
>       pid namespaces: allow cloning of new namespace
>
> without these problems having been resolved. A full-scale revert is
> probably too intrusive, but at minimum we need to turn off user-space
> access to this feature via this simple patch. Until this issue is
> resolved properly the new PID namespace code needs to be turned off.
> Letting this into 2.6.24 would be a disaster.
>
> Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
> ---
>  kernel/fork.c |    8 ++++++++
>  1 file changed, 8 insertions(+)
>
> Index: v/kernel/fork.c
> ===================================================================
> --- v.orig/kernel/fork.c
> +++ v/kernel/fork.c
> @@ -1420,6 +1420,14 @@ long do_fork(unsigned long clone_flags,
>  	int trace = 0;
>  	long nr;
>  
> +	/*
> +	 * PID namespaces are broken at the moment: they do not allow
> +	 * certain PID based syscalls (such as futexes) to be used
> +	 * across namespaces. This is broken and must not be allowed,
> +	 * so we keep this feature turned off until it's properly fixed.
> +	 */
> +	clone_flags &= ~CLONE_NEWPID;
> +

Well, emm. Eric already tried to solve this issue in a similar way
(http://lkml.org/lkml/2007/10/26/414), but I have recently sent a
more generic patch set. It turns all the namespaces off with the
config options, but Andrew said to wait until the next -mm tree to
rework the set.

With this set we'll be able to mark pid namespaces as EXPERIMENTAL
or even BROKEN, so nobody will be able to create them. So can we, please,
keep things as they are for now - the appropriate fix will be ready
soon.

Thanks,
Pavel

>  	if (unlikely(current->ptrace)) {
>  		trace = fork_traceflag (clone_flags);
>  		if (trace)
>
Re: [patch] PID namespace design bug, workaround
Ingo Molnar wrote:
> +	clone_flags &= ~CLONE_NEWPID;

I think the call should rather fail than silently drop the bit but aside
from that I agree. The problems we'd run into if the feature is getting
used as-is are severe.

-- 
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
Re: [patch] PID namespace design bug, workaround
Ingo Molnar wrote: while checking recent commits to the kernel core i took a look at the PID namespaces implementation, and it has a fatal flaw: it breaks futexes and various libraries (and other stuff) that use PIDs as the means of identifying tasks, by not providing any means of global identification that works across PID namespaces. (PIDs _are_ a very convenient and global way of identifying contexts.) i asked Ulrich about this and it turns out he has warned about this early on: http://www.nabble.com/Re%3A-question%3A-pid-space-semantics.-p3409990.html but this problem is still present in the code, and it has been recently committed into mainline via: commit 30e49c263e36341b60b735cbef5ca37912549264 Author: Pavel Emelyanov [EMAIL PROTECTED] Date: Thu Oct 18 23:40:10 2007 -0700 pid namespaces: allow cloning of new namespace without these problems having been resolved. A full-scale revert is probably too intrusive, but at minimum we need to turn off user-space access to this feature via this simple patch. Until this issue is resolved properly the new PID namespace code needs to be turned off. Letting this into 2.6.24 would be a disaster. Signed-off-by: Ingo Molnar [EMAIL PROTECTED] --- kernel/fork.c |8 1 file changed, 8 insertions(+) Index: v/kernel/fork.c === --- v.orig/kernel/fork.c +++ v/kernel/fork.c @@ -1420,6 +1420,14 @@ long do_fork(unsigned long clone_flags, int trace = 0; long nr; + /* + * PID namespaces are broken at the moment: they do not allow + * certain PID based syscalls (such as futexes) to be used + * across namespaces. This is broken and must not be allowed, + * so we keep this feature turned off until it's properly fixed. + */ + clone_flags = ~CLONE_NEWPID; + Well, emm. Eric already tried to solve this issue in the similar way (http://lkml.org/lkml/2007/10/26/414), but I have recently sent a more generic patch set. It turns all the namespaces off with the config options, but Andrew said to wait until the next -mm tree to rework the set. 
With this set we'll be able to mark pid namespaces as EXPERIMENTAL or even BROKEN, so nobody will be able to crate them. So can we, please, keep things as they are for now - the appropriate fix will be ready soon. Thanks, Pavel if (unlikely(current-ptrace)) { trace = fork_traceflag (clone_flags); if (trace) - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] PID namespace design bug, workaround
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Ingo Molnar wrote: + clone_flags = ~CLONE_NEWPID; I think the call should rather fail than silently drop the bit but aside from that I agree. The problems we'd run into if the feature is getting used as-is are severe. - -- ➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖ -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.7 (GNU/Linux) iD8DBQFHKehh2ijCOnn/RHQRAqHAAJkBu7Uj8T5J2ZlLty096zXH7IVcwACfRhlt EpwnZ1UodJXJiPpxGN8FEYo= =S/kB -END PGP SIGNATURE- - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] PID namespace design bug, workaround
On Thu, 2007-11-01 at 17:51 +0300, Pavel Emelyanov wrote: So can we, please, keep things as they are for now - the appropriate fix will be ready soon. Just for the curious, could you outline on how you intend to fix this? - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] PID namespace design bug, workaround
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Pavel Emelyanov wrote: With this set we'll be able to mark pid namespaces as EXPERIMENTAL or even BROKEN, so nobody will be able to crate them. So can we, please, keep things as they are for now - the appropriate fix will be ready soon. You sound far too optimistic for my taste. I probably haven't seen the proposal you have in mind but everything else I have seen simply doesn't work without breaking something. - -- ➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖ -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.7 (GNU/Linux) iD8DBQFHKek22ijCOnn/RHQRAgJIAJ0VYwYHUKdEcnKfZHDdaUr5HTnk9QCgghTH n57LDahLDIIVIIlkrwNVLLQ= =2xU8 -END PGP SIGNATURE- - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] PID namespace design bug, workaround
Ingo Molnar wrote: while checking recent commits to the kernel core i took a look at the PID namespaces implementation, and it has a fatal flaw: it breaks futexes and various libraries (and other stuff) that use PIDs as the means of identifying tasks, by not providing any means of global identification that works across PID namespaces. (PIDs _are_ a very You're not 100% correct here. The task_pid_nr() does return you a unique pid, so you do have the way to identify the task. Another thing - you should *not* allow tasks to communicate across pid namespaces using any pids - this just breaks the pid namespaces idea. As far as the futexes are concerned - I do not allow threads live in different pid namespaces (more correct fix would be not to allow tasks share the mm_struct across pid namespaces, but this is a one line fix), so the situation when you have two threads in different namespaces is impossible. Thanks, Pavel convenient and global way of identifying contexts.) i asked Ulrich about this and it turns out he has warned about this early on: http://www.nabble.com/Re%3A-question%3A-pid-space-semantics.-p3409990.html but this problem is still present in the code, and it has been recently committed into mainline via: commit 30e49c263e36341b60b735cbef5ca37912549264 Author: Pavel Emelyanov [EMAIL PROTECTED] Date: Thu Oct 18 23:40:10 2007 -0700 pid namespaces: allow cloning of new namespace without these problems having been resolved. A full-scale revert is probably too intrusive, but at minimum we need to turn off user-space access to this feature via this simple patch. Until this issue is resolved properly the new PID namespace code needs to be turned off. Letting this into 2.6.24 would be a disaster. 
Signed-off-by: Ingo Molnar [EMAIL PROTECTED] --- kernel/fork.c |8 1 file changed, 8 insertions(+) Index: v/kernel/fork.c === --- v.orig/kernel/fork.c +++ v/kernel/fork.c @@ -1420,6 +1420,14 @@ long do_fork(unsigned long clone_flags, int trace = 0; long nr; + /* + * PID namespaces are broken at the moment: they do not allow + * certain PID based syscalls (such as futexes) to be used + * across namespaces. This is broken and must not be allowed, + * so we keep this feature turned off until it's properly fixed. + */ + clone_flags = ~CLONE_NEWPID; + if (unlikely(current-ptrace)) { trace = fork_traceflag (clone_flags); if (trace) - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] PID namespace design bug, workaround
Ulrich Drepper wrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Pavel Emelyanov wrote: With this set we'll be able to mark pid namespaces as EXPERIMENTAL or even BROKEN, so nobody will be able to crate them. So can we, please, keep things as they are for now - the appropriate fix will be ready soon. You sound far too optimistic for my taste. I probably haven't seen the proposal you have in mind but everything else I have seen simply doesn't work without breaking something. The fix I mention is just returning -EINVAL in case user orders CLONE_NEWPIDS and compiling out all the namespace cloning code. This is just a more elegant way to get rid of pid namespaces rather than Ingo proposed. Here's the root of the set: http://lkml.org/lkml/2007/10/31/118 Thanks, Pavel - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] PID namespace design bug, workaround
* Ulrich Drepper [EMAIL PROTECTED] wrote: Ingo Molnar wrote: + clone_flags = ~CLONE_NEWPID; I think the call should rather fail than silently drop the bit but aside from that I agree. The problems we'd run into if the feature is getting used as-is are severe. does the patch below look OK to you? Ingo --- From: Ingo Molnar [EMAIL PROTECTED] Subject: PID namespaces: turn them off for now while checking recent commits to the kernel core i took a look at the PID namespaces implementation, and it has a fatal flaw: it breaks futexes and various libraries (and other stuff) that use PIDs as the means of identifying tasks, by not providing any means of global identification that works across PID namespaces. (PIDs _are_ a very convenient and global way of identifying contexts.) i asked Ulrich about this and it turns out he has warned about this early on: http://www.nabble.com/Re%3A-question%3A-pid-space-semantics.-p3409990.html but this problem is still present in the code, and it has been recently committed into mainline via: commit 30e49c263e36341b60b735cbef5ca37912549264 Author: Pavel Emelyanov [EMAIL PROTECTED] Date: Thu Oct 18 23:40:10 2007 -0700 pid namespaces: allow cloning of new namespace without these problems having been resolved. A full-scale revert is probably too intrusive, but at minimum we need to turn off user-space access to this feature via this simple patch. Until this issue is resolved properly the new PID namespace code needs to be turned off. Letting this into 2.6.24 would be a disaster. Signed-off-by: Ingo Molnar [EMAIL PROTECTED] --- kernel/fork.c |9 + 1 file changed, 9 insertions(+) Index: v/kernel/fork.c === --- v.orig/kernel/fork.c +++ v/kernel/fork.c @@ -1420,6 +1420,15 @@ long do_fork(unsigned long clone_flags, int trace = 0; long nr; + /* +* PID namespaces are broken at the moment: they do not allow +* certain PID based syscalls (such as futexes) to be used +* across namespaces. 
This is broken and must not be allowed, +* so we keep this feature turned off until it's properly fixed. +*/ + if (clone_flags CLONE_NEWPID) + return -ENOSYS; + if (unlikely(current-ptrace)) { trace = fork_traceflag (clone_flags); if (trace) - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] PID namespace design bug, workaround
Peter Zijlstra wrote: On Thu, 2007-11-01 at 17:51 +0300, Pavel Emelyanov wrote: So can we, please, keep things as they are for now - the appropriate fix will be ready soon. Just for the curious, could you outline on how you intend to fix this? I have already answered to Ulrich about this. Just to make this sub-thread consistent: The fix I mention is just returning -EINVAL in case user orders CLONE_NEWPIDS and compiling out all the namespace cloning code. This is just a more elegant way to get rid of pid namespaces rather than Ingo proposed. Here's the root of the set: http://lkml.org/lkml/2007/10/31/118 Thanks, Pavel - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] PID namespace design bug, workaround
* Pavel Emelyanov [EMAIL PROTECTED] wrote:

> The fix I mention is just returning -EINVAL in case the user orders
> CLONE_NEWPID and compiling out all the namespace cloning code. This is
> just a more elegant way to get rid of pid namespaces than the one Ingo
> proposed.

unfortunately i have to NACK that approach. We never allowed broken
user-space visible APIs into the kernel like that because it just gives
a vector for that breakage to become de-facto used and forced upon the
core kernel. Even if they can be .config turned off. That's just a lame
excuse that delays the fixing of it.

We may mark features that have a good expectation to be fixed as
CONFIG_EXPERIMENTAL, and we may mark drivers that nobody maintains
anymore as CONFIG_BROKEN, but we don't introduce new core syscall
features with CONFIG_BROKEN! We never did and i hope we never will.

The _only_ way to force the fixing of such type of breakages is to not
offer them _at all_. Really, you are proposing a major new extension to
lots of important core Linux APIs so please try to solve this problem
cleanly, it's really severe. Right now as things stand this containers
sub-feature is a little bit pregnant. This is one of the few cases where
we really _must_ say no.

	Ingo
Re: [patch] PID namespace design bug, workaround
Ingo Molnar wrote:
> * Pavel Emelyanov [EMAIL PROTECTED] wrote:
>
>> The fix I mention is just returning -EINVAL in case the user orders
>> CLONE_NEWPID and compiling out all the namespace cloning code. This is
>> just a more elegant way to get rid of pid namespaces than the one Ingo
>> proposed.
>
> unfortunately i have to NACK that approach. We never allowed broken
> user-space visible APIs into the kernel like that because it just gives
> a vector for that breakage to become de-facto used and forced upon the
> core kernel. Even if they can be .config turned off. That's just a lame
> excuse that delays the fixing of it.
>
> We may mark features that have a good expectation to be fixed as
> CONFIG_EXPERIMENTAL, and we may mark

Pid namespaces have more than a good expectation of being fixed, so
feel free to mark the (currently pending) PID_NS config option with
"depends on EXPERIMENTAL". All the problems I know of are slowly getting
fixed, but most of them are just related to a bad pid value being
reported to user space when we work inside some new namespace.

Unfortunately, as far as I can see, all the discussions of pid
namespaces happen behind my back, so all I can do is fix the problems
I'm aware of.

> drivers that nobody maintains anymore as CONFIG_BROKEN, but we don't
> introduce new core syscall features with CONFIG_BROKEN! We never did
> and i hope we never will.
>
> The _only_ way to force the fixing of such type of breakages is to not
> offer them _at all_. Really, you are proposing a major new extension to
> lots of important core Linux APIs so please try to solve this problem
> cleanly, it's really severe. Right now as things stand this containers

I'm sure that no *new* problems appear in case you don't enter the new
namespace, so just disable the pid space creation code (with the
EXPERIMENTAL option) and live with the original kernel. If you point me
to one, I'd be glad to fix it.

> sub-feature is a little bit pregnant. This is one of the few cases
> where we really _must_ say no.
>
> 	Ingo
Re: [patch] PID namespace design bug, workaround
On Thu, 2007-11-01 at 07:56 -0700, Ulrich Drepper wrote:
> Pavel Emelyanov wrote:
>> With this set we'll be able to mark pid namespaces as EXPERIMENTAL or
>> even BROKEN, so nobody will be able to create them.
>>
>> So can we, please, keep things as they are for now - the appropriate
>> fix will be ready soon.
>
> You sound far too optimistic for my taste. I probably haven't seen the
> proposal you have in mind but everything else I have seen simply
> doesn't work without breaking something.

Yeah, we definitely realize that this inhibits things that were
perfectly fine before. As Eric mentioned in his reply to your message
last year, the primary goal here is isolation.

We'd eventually like to be able to pick a container up and move it to
another system. That's going to be awfully hard if the container is
sharing a resource with a part of the system which is not moving. Pid
namespaces (along with the others) give us the isolation to keep these
interactions from happening except in a controlled manner, breaking the
ties that might bind it to one particular system.

Think of how many user-visible APIs deal with files and filenames.
However, there can certainly be files that are unavailable to certain
processes based on their membership in a particular filesystem
namespace. In fact, we use chroot() to try and _make_ certain files
unavailable.

-- Dave
Re: [patch] PID namespace design bug, workaround
On Thu, Nov 01, 2007 at 04:05:37PM +0100, Ingo Molnar wrote:
> +	if (clone_flags & CLONE_NEWPID)
> +		return -ENOSYS;

I'd use EINVAL instead of ENOSYS...

	- Ted
Re: [patch] PID namespace design bug, workaround
* Theodore Tso [EMAIL PROTECTED] wrote:

> On Thu, Nov 01, 2007 at 04:05:37PM +0100, Ingo Molnar wrote:
>> +	if (clone_flags & CLONE_NEWPID)
>> +		return -ENOSYS;
>
> I'd use EINVAL instead of ENOSYS...

ok, updated patch below.

	Ingo

From: Ingo Molnar [EMAIL PROTECTED]
Subject: PID namespaces: turn them off for now

while checking recent commits to the kernel core i took a look at the
PID namespaces implementation, and it has a fatal flaw: it breaks
futexes and various libraries (and other stuff) that use PIDs as the
means of identifying tasks, by not providing any means of global
identification that works across PID namespaces. (PIDs _are_ a very
convenient and global way of identifying contexts.)

i asked Ulrich about this and it turns out he has warned about this
early on:

  http://www.nabble.com/Re%3A-question%3A-pid-space-semantics.-p3409990.html

but this problem is still present in the code, and it has been recently
committed into mainline via:

  commit 30e49c263e36341b60b735cbef5ca37912549264
  Author: Pavel Emelyanov [EMAIL PROTECTED]
  Date:   Thu Oct 18 23:40:10 2007 -0700

      pid namespaces: allow cloning of new namespace

without these problems having been resolved.

A full-scale revert is probably too intrusive, but at minimum we need to
turn off user-space access to this feature via this simple patch.

Until this issue is resolved properly the new PID namespace code needs
to be turned off. Letting this into 2.6.24 would be a disaster.

Signed-off-by: Ingo Molnar [EMAIL PROTECTED]
---
 kernel/fork.c |    9 +++++++++
 1 file changed, 9 insertions(+)

Index: v/kernel/fork.c
===================================================================
--- v.orig/kernel/fork.c
+++ v/kernel/fork.c
@@ -1420,6 +1420,15 @@ long do_fork(unsigned long clone_flags,
 	int trace = 0;
 	long nr;
 
+	/*
+	 * PID namespaces are broken at the moment: they do not allow
+	 * certain PID based syscalls (such as futexes) to be used
+	 * across namespaces. This is broken and must not be allowed,
+	 * so we keep this feature turned off until it's properly fixed.
+	 */
+	if (clone_flags & CLONE_NEWPID)
+		return -EINVAL;
+
 	if (unlikely(current->ptrace)) {
 		trace = fork_traceflag (clone_flags);
 		if (trace)
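The thread has now seen two ways of neutralising the flag: clearing the bit and carrying on, or failing the call as the patch above does. A minimal user-space sketch of the difference follows. The sketch_* names are hypothetical, and the two constants mirror CLONE_NEWPID and CLONE_VM from linux/sched.h.

```c
#include <assert.h>
#include <errno.h>

/* Values match linux/sched.h; redefined so the sketch is self-contained. */
#define SKETCH_CLONE_VM     0x00000100UL
#define SKETCH_CLONE_NEWPID 0x20000000UL

/*
 * Variant 1: silently drop the bit. The call goes ahead and the caller
 * gets no indication that it did not receive a new PID namespace.
 */
static unsigned long sketch_mask_newpid(unsigned long clone_flags)
{
	return clone_flags & ~SKETCH_CLONE_NEWPID;
}

/*
 * Variant 2: refuse the call. The caller sees an error and can tell
 * that the request was not honoured.
 */
static int sketch_reject_newpid(unsigned long clone_flags)
{
	return (clone_flags & SKETCH_CLONE_NEWPID) ? -EINVAL : 0;
}

/* Usage: the same request handled by both variants. */
void sketch_compare(void)
{
	unsigned long req = SKETCH_CLONE_VM | SKETCH_CLONE_NEWPID;

	/* Masking keeps the other flags and reports nothing. */
	assert(sketch_mask_newpid(req) == SKETCH_CLONE_VM);
	/* Rejecting surfaces the problem to the caller. */
	assert(sketch_reject_newpid(req) == -EINVAL);
	assert(sketch_reject_newpid(SKETCH_CLONE_VM) == 0);
}
```

With the masking variant, a program that relies on the isolation would keep running in the parent's PID namespace without knowing it, which is the concern behind preferring an explicit failure.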
Re: [patch] PID namespace design bug, workaround
Pavel Emelyanov wrote:
> The fix I mention is just returning -EINVAL in case the user orders
> CLONE_NEWPID

That is the fix you were referring to? I was hoping you have a sketch
for a real solution.

If nobody can think of a way to fix this, PID namespaces are IMO not
something which should go in at all.

-- 
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
Re: [patch] PID namespace design bug, workaround
Ingo Molnar wrote:
> but this problem is still present in the code, and it has been recently
> committed into mainline via:
>
>   commit 30e49c263e36341b60b735cbef5ca37912549264
>   Author: Pavel Emelyanov [EMAIL PROTECTED]
>   Date:   Thu Oct 18 23:40:10 2007 -0700
>
>       pid namespaces: allow cloning of new namespace
>
> without these problems having been resolved.
>
> A full-scale revert is probably too intrusive, but at minimum we need
> to turn off user-space access to this feature via this simple patch.
>
> Until this issue is resolved properly the new PID namespace code needs
> to be turned off. Letting this into 2.6.24 would be a disaster.
>
> Signed-off-by: Ingo Molnar [EMAIL PROTECTED]

Acked-by: Ulrich Drepper [EMAIL PROTECTED]

-- 
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖