Re: [patch] PID namespace design bug, workaround

2007-11-06 Thread Pavel Emelyanov
Ulrich Drepper wrote:
> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA1
> 
> Pavel Emelyanov wrote:
>> Having access to the same IPCs in different pid namespaces won't work.
>> Having access to the same filesystem in different IPC namespaces won't work.
>> Having access to the same UID namespace in different VFS namespaces won't 
>> work.
>> Having access to the same <any> namespace in different <many others>
>> namespace won't work.
>> [...]
> 
> 
> Then explicitly prevent the cases which cannot work in the clone()
> calls.  Yes, giving people rope to shoot themselves is a Unix tradition
> but it's so unnecessary in this case and will only cause support
> problems for innocent people.

:)

> I bet the result will be that if you have a separate PID namespace you
> need to enforce every other namespace as well.  There are simply too
> many dependencies.

I think Ted's proposal (the "namespaces compatibility matrix") is
better. I'd rather know what can stop working if I do something
than be forced to keep my hands off it.

Thanks,
Pavel
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] PID namespace design bug, workaround

2007-11-04 Thread Eric W. Biederman
[EMAIL PROTECTED] writes:

> Pavel Emelianov [EMAIL PROTECTED] wrote:
> | Ulrich Drepper wrote:
> | > -BEGIN PGP SIGNED MESSAGE-
> | > Hash: SHA1
> | > 
> | > Pavel Emelyanov wrote:
> | >>> Isn't it this?
> | >>>
> | >>> http://lkml.org/lkml/2007/11/1/141
> | >> That was the initial problem, and I already answered to Ingo about
> | >> it
> | > 
> | > No, look at my old mail which Ingo referenced in that posting.
> | 
> | You pointed only one problem that is not a variation of "how do 
> | we handle the case when we pass our pid outside the namespace".
> | 
> | This problem with signals is now being resolved at IBM by Sukadev 
> | and Serge (I put them in Cc), so this is about to be fixed by the
> | time 2.6.24 releases (I hope).
>
> Yes. We (Oleg, Eric included in Cc) have a patchset to address signals
> issues in child pid namespaces. It is being discussed on Containers list:
>
> https://lists.linux-foundation.org/pipermail/containers/2007-October/008240.html
>
> We will post the patchset to LKML soon.

Yes.  Getting all of the cross namespace cases working that we can is a
goal.  Currently I don't know if we can do better with the futexes that
have pids in the user/kernel ABI.  Plain futexes should be fine.

Implementation-wise, si_pid is a bit of a pain, but we should have that
one sorted out shortly.  It is well understood and we just need to get
the code right.

The pids in the sysvipc space should also be fairly simple to handle:
just do a classic struct pid conversion, and convert from struct pid
to a pid_t right at the user/kernel interface.
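
The conversion described above can be pictured with a toy model. This is only a sketch in Python, not kernel code: `PidNamespace`, `StructPid` and `nr_in` are names made up for illustration, and the real struct pid carries more state. The point it shows is that the kernel-side object identifies the task, and a plain number is produced only when a particular namespace asks for one.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PidNamespace:
    """Hypothetical stand-in for a pid namespace (frozen so it can be a dict key)."""
    name: str
    level: int


@dataclass
class StructPid:
    """Toy stand-in for struct pid: one task, one number per namespace it is visible in."""
    numbers: dict = field(default_factory=dict)  # PidNamespace -> int

    def nr_in(self, ns):
        # Convert to a plain number only at the boundary, for the namespace
        # of whoever is asking. 0 means "not visible from that namespace".
        return self.numbers.get(ns, 0)


init_ns = PidNamespace("init", 0)
child_ns = PidNamespace("container", 1)

# The same task is 4321 when seen from the initial namespace and 7 inside the container.
last_sender = StructPid({init_ns: 4321, child_ns: 7})
```

With this shape, an IPC object that remembers `last_sender` can report 4321 to a reader in the initial namespace and 7 to a reader inside the container, instead of storing one number that is only right for one of them.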

Unless I missed something we already properly handle giving people
usable pids from the tty layer.

Getting pids working properly for unix domain socket credentials when
we cross pid namespaces is another case that needs a struct pid
conversion to get things working, but the kernel should be able to
do the right thing at that point.

In summary, when pids are stored inside the kernel we have all of the
needed infrastructure with struct pid to do the right thing when
processes communicate between pid namespaces.

Right now we just need to go through every place in the kernel and make
certain we haven't overlooked something we can handle.  And there
are a lot of places where the kernel uses pids.

Eric


Re: [patch] PID namespace design bug, workaround

2007-11-03 Thread david

On Sat, 3 Nov 2007, Arjan van de Ven wrote:


> On Sat, 3 Nov 2007 15:40:48 -0700 (PDT)
> Linus Torvalds <[EMAIL PROTECTED]> wrote:
>
> > I don't understand how you can call this a "PID namespace design
> > bug", when it clearly has nothing what-so-ever to do with pid
> > namespaces, and everything to do with the *futexes* that blithely
> > assume that pid's are unique and that made it part of the
> > user-visible interface.
> >
> > OF COURSE any pid namespace design will always break such
> > assumptions, but that's not because of any PID namespace bugs. It's
> > what the whole *point* of PID namespaces are. If you use pid's
> > (instead of some opaque cookies), you will not be able to use such
> > things across pid-separation.
>
> well... kind of.
> There are 2 things around pid namespaces: which pids you can see/touch
> (in proc or signals or otherwise), and the non-uniqueness.
>
> For containers you clearly want the first part... but... is there a
> strong reason to not just *not* create duplicate pids even across
> namespaces? there's no rule in posix or anything similar to fd's afaik
> concerning which pids we can hand out... so we could just make them
> unique globally but just with limited visibility


Two problems that I can think of:

1. The container people would like to eventually have the ability to
migrate containers from one system to another (or to suspend a container).
In this sort of case, trying to fit the allocated PIDs from the container
into a running system is a problem if PIDs are not allowed to overlap.

2. It seems to me that there is probably a latent security issue in
having a global PID namespace with just limited visibility. The types of
bugs that may let you affect a process seem easier to make if the only
protection is visibility rather than complete separation.
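
The migration problem in point 1 can be made concrete with a few lines of Python. This is a toy illustration only: `restore_conflicts` and its parameters are hypothetical names, and real checkpoint/restore involves far more than PID numbers.

```python
def restore_conflicts(container_pids, host_pids_in_use, pids_may_overlap):
    """Return the container PIDs that cannot be restored on the destination host.

    With per-namespace numbering (pids_may_overlap=True) the container keeps its
    own numbers regardless of what the host uses. With globally unique numbering,
    every number already taken on the destination is a conflict.
    """
    if pids_may_overlap:
        return set()
    return set(container_pids) & set(host_pids_in_use)


# A container that was running init (1), a daemon (42) and a shell (300)
# arrives on a host where 1 and 42 are already taken.
conflicts = restore_conflicts({1, 42, 300}, {1, 42, 9000}, pids_may_overlap=False)
```

Under the globally-unique scheme the container's init and daemon could not get their old numbers back, and any process that had saved those numbers would now be pointing at the wrong task.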


David Lang


Re: [patch] PID namespace design bug, workaround

2007-11-03 Thread Arjan van de Ven
On Sat, 3 Nov 2007 15:40:48 -0700 (PDT)
Linus Torvalds <[EMAIL PROTECTED]> wrote:

> I don't understand how you can call this a "PID namespace design
> bug", when it clearly has nothing what-so-ever to do with pid
> namespaces, and everything to do with the *futexes* that blithely
> assume that pid's are unique and that made it part of the
> user-visible interface.
> 
> OF COURSE any pid namespace design will always break such
> assumptions, but that's not because of any PID namespace bugs. It's
> what the whole *point* of PID namespaces are. If you use pid's
> (instead of some opaque cookies), you will not be able to use such
> things across pid-separation.

well... kind of.
There are 2 things around pid namespaces: which pids you can see/touch
(in proc or signals or otherwise), and the non-uniqueness.

For containers you clearly want the first part... but... is there a
strong reason to not just *not* create duplicate pids even across
namespaces? there's no rule in posix or anything similar to fd's afaik
concerning which pids we can hand out... so we could just make them
unique globally but just with limited visibility


Re: [patch] PID namespace design bug, workaround

2007-11-03 Thread Linus Torvalds


On Sat, 3 Nov 2007, Ingo Molnar wrote:
> 
> - one problem is that this condition is 'invisible'. If two namespaces 
>   happen to access the same robust futex (say a yum update from two 
>   PID namespaces sharing the same read-mostly filesystem) there's silent
>   breakage and data corruption due to PID overlap.

.. and this is in *no* way different from thousands of applications that 
write their pid to lock-files, and others decide that it's "stale" because 
using "kill(pid, 0)" returns that the pid doesn't exist any more.

The solution? You can't do that kind of locking over NFS, or across pid 
namespaces. Nobody blames NFS or pid namespaces for it. 
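
For reference, the lock-file pattern mentioned above looks roughly like this in Python. It is a minimal sketch: the function names and the one-PID-per-file format are assumptions, and real applications differ in the details. `os.kill(pid, 0)` delivers no signal and only checks whether the PID exists and may be signalled.

```python
import os


def pid_exists(pid):
    """Return True if `pid` names a process visible in the caller's PID namespace."""
    if pid <= 0:
        # 0 and negative values address process groups, not a single process.
        raise ValueError("expected a positive PID")
    try:
        os.kill(pid, 0)  # signal 0: existence/permission check only
    except ProcessLookupError:
        return False     # ESRCH: no such process here
    except PermissionError:
        return True      # EPERM: it exists, we just may not signal it
    return True


def lock_is_stale(lock_path):
    """Treat the lock as stale if the PID written in the file no longer exists."""
    with open(lock_path) as lock_file:
        owner = int(lock_file.read().strip())
    return not pid_exists(owner)
```

The check answers "does this number exist where I am running?", not "is the process that wrote the file still alive?". Across NFS or across PID namespaces those are different questions: the owner may be alive but invisible to the checker, or an unrelated local process may happen to have the same number.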

> - so via this we isolate an important category of syscalls from
>   cross-namespace use perhaps forever.

So? That's inherent to how those stupid stable mutexes work.

I don't understand how you can call this a "PID namespace design bug", 
when it clearly has nothing what-so-ever to do with pid namespaces, and 
everything to do with the *futexes* that blithely assume that pid's are 
unique and that made it part of the user-visible interface.

OF COURSE any pid namespace design will always break such assumptions, but 
that's not because of any PID namespace bugs. It's what the whole *point* 
of PID namespaces are. If you use pid's (instead of some opaque cookies), 
you will not be able to use such things across pid-separation.

Linus


Re: [patch] PID namespace design bug, workaround

2007-11-03 Thread Ingo Molnar

* Linus Torvalds <[EMAIL PROTECTED]> wrote:

> On Fri, 2 Nov 2007, Dave Hansen wrote:
> > 
> > There are certainly more of these, but here is one In the futex 
> > userspace address, we install the current pid's vnr into a userspace 
> > address.
> 
> Now, realistically, why not just say "you can't use these things 
> across namespaces"? Does anybody really care? After all, somebody who 
> screws this up only screws himself, not anybody else.

i see two main categories of problems:

- one problem is that this condition is 'invisible'. If two namespaces
  happen to access the same robust futex (say a yum update from two
  PID namespaces sharing the same read-mostly filesystem) there's silent
  breakage and data corruption due to PID overlap. The other
  namespaces have no such problems. I think the "don't do that" answer is
  lame because most apps _will_ work across PID namespaces because
  things like fcntl-based locking do work. And there's no valid
  technical excuse why futexes shouldn't work: it's all controlled by the
  same native kernel, there's no untrusted network separating the nodes,
  etc.

- so via this we isolate an important category of syscalls from
  cross-namespace use perhaps forever. Pick just about any other kernel
  resource and it can be shared between namespaces. But not futexes -
  which happen to be the most scalable locking primitive and people will
  almost certainly want to use them across namespaces. A
  completely new breed of futexes has to be introduced and trickled
  through userspace and all the architectures to make it work again
  across namespaces. Who will do that work? Generally the people who
  introduce a new concept are the ones who should do that. But in this
  case they are apparently not interested in making it generic enough
  (they are concentrated on their 'isolate it all' aspect) so
  nobody else will do it and we are stuck with an incomplete concept.

The answer of user-space/apps is predictable: they'll gravitate towards 
the path of least resistance, and that will be "don't use futexes". PID 
namespaces basically single out an important API category and use the 
natural pressure of the other 300 syscalls and tens of thousands of apps 
against this category. Linux is basically used against itself. The 
counter-force is relatively weak and there's no solution available _at 
all_ presently so it's not even the fight of patches against each other, 
it's the sheer lack of a feature which has an obvious end-result.

We've already got way too many incomplete concepts and APIs in the 
kernel. Maybe i'm over-worrying, but i fear we end up like with 
capabilities or sendfile - code merged too soon and never completed for 
many years - perhaps never completed at all. VMS and WNT did those 
things a bit better i think - their API frameworks were/are pervasive 
and complete, even in the corner cases.

Whether it's the right approach to force reasonable perfection of 
frameworks like this from the get go is another question - but in 
practice even for relatively popular new APIs like epoll we see a way 
too slow movement towards the 'completion of the API', and that hinders 
adoption of new APIs very much. (With splice being a notable exception - 
there the central concept was so strong that it quickly pushed itself to 
total completion - combined with a capable maintainer of the API.) But 
it's not that easy for futexes and we put another roadblock in the path 
of futexes.

Ingo


Re: [patch] PID namespace design bug, workaround

2007-11-03 Thread sukadev
Pavel Emelianov [EMAIL PROTECTED] wrote:
| Ulrich Drepper wrote:
| > -BEGIN PGP SIGNED MESSAGE-
| > Hash: SHA1
| > 
| > Pavel Emelyanov wrote:
| >>> Isn't it this?
| >>>
| >>> http://lkml.org/lkml/2007/11/1/141
| >> That was the initial problem, and I already answered to Ingo about
| >> it
| > 
| > No, look at my old mail which Ingo referenced in that posting.
| 
| You pointed only one problem that is not a variation of "how do 
| we handle the case when we pass our pid outside the namespace".
| 
| This problem with signals is now being resolved at IBM by Sukadev 
| and Serge (I put them in Cc), so this is about to be fixed by the
| time 2.6.24 releases (I hope).

Yes. We (Oleg, Eric included in Cc) have a patchset to address signals
issues in child pid namespaces. It is being discussed on Containers list:

https://lists.linux-foundation.org/pipermail/containers/2007-October/008240.html

We will post the patchset to LKML soon.

| 
| As far as the "passing the pid outside the namespace" is concerned, 
| is my answer "pids should never be used outside the namespace they
| came from, otherwise userspace won't work as expected" satisfactory?
| 
| So is "everything else", you mentioned, covered with the problems
| above?
| 
| > - --
| > ➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
| > -BEGIN PGP SIGNATURE-
| > Version: GnuPG v1.4.7 (GNU/Linux)
| > 
| > iD8DBQFHKy692ijCOnn/RHQRAtYLAJ98EXTGl3HMlCbVXOkL7TJRFfw4DACfcgYI
| > HHz5f7TfM05Dps+ruPRiUrU=
| > =IjS4
| > -END PGP SIGNATURE-
| > 


Re: [patch] PID namespace design bug, workaround

2007-11-02 Thread Ulrich Drepper
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Pavel Emelyanov wrote:
> Having access to the same IPCs in different pid namespaces won't work.
> Having access to the same filesystem in different IPC namespaces won't work.
> Having access to the same UID namespace in different VFS namespaces won't 
> work.
> Having access to the same <any> namespace in different <many others> namespace
> won't work.
> [...]


Then explicitly prevent the cases which cannot work in the clone()
calls.  Yes, giving people rope to shoot themselves is a Unix tradition
but it's so unnecessary in this case and will only cause support
problems for innocent people.

I bet the result will be that if you have a separate PID namespace you
need to enforce every other namespace as well.  There are simply too
many dependencies.
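
What "explicitly prevent" could look like, as a sketch only: the CLONE_NEW* values below are the ones from <linux/sched.h>, but the rule table and `check_clone_flags` are hypothetical and written in Python for brevity. Nothing in this thread says the kernel performs such a check, and the single rule shown is just the first item from Pavel's list.

```python
# Namespace flags as defined in <linux/sched.h>.
CLONE_NEWNS = 0x00020000
CLONE_NEWUTS = 0x04000000
CLONE_NEWIPC = 0x08000000
CLONE_NEWUSER = 0x10000000
CLONE_NEWPID = 0x20000000
CLONE_NEWNET = 0x40000000

# Hypothetical policy: a flag on the left may only be used together with the
# flags on the right. The one entry mirrors "the same IPCs in different pid
# namespaces won't work"; it is an illustration, not an actual kernel rule.
REQUIRED_WITH = {
    CLONE_NEWPID: CLONE_NEWIPC,
}


def check_clone_flags(flags):
    """Raise ValueError for combinations the hypothetical policy forbids."""
    for flag, needed in REQUIRED_WITH.items():
        if flags & flag and (flags & needed) != needed:
            raise ValueError(
                f"flag {flag:#010x} requires {needed:#010x} to be set as well"
            )
    return flags
```

Under this policy `check_clone_flags(CLONE_NEWPID | CLONE_NEWIPC)` passes and `check_clone_flags(CLONE_NEWPID)` is rejected. If every dependency were listed, the table would grow toward the "all namespaces or none" outcome predicted above.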

- --
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.7 (GNU/Linux)

iD8DBQFHK/pL2ijCOnn/RHQRAtp6AKC8QIRvJa4qVUSx9IVpRq6X+6HPGQCff/hT
m2tpKWmeM+xAfS5ICvB0NVk=
=5ozn
-END PGP SIGNATURE-


Re: [patch] PID namespace design bug, workaround

2007-11-02 Thread Nicholas Miell
On Fri, 2007-11-02 at 10:39 -0700, Linus Torvalds wrote:
> 
> On Fri, 2 Nov 2007, Dave Hansen wrote:
> > 
> > There are certainly more of these, but here is one In the futex
> > userspace address, we install the current pid's vnr into a userspace
> > address.  
> 
> Now, realistically, why not just say "you can't use these things across 
> namespaces"? Does anybody really care? After all, somebody who screws this 
> up only screws himself, not anybody else.
> 
>   Linus

Accessing the same robust futex from different PID namespaces on the
same machine via a shared file mapping is logically equivalent to
accessing the same robust futex from different machines via a shared
filesystem and there's no reason to expect either operation to work
correctly.
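
To see why the number alone is not enough, here is a small Python model of the futex word used by robust futexes. The three masks are the values from <linux/futex.h>; the helper functions are made up for illustration and model only the bit layout, not the locking protocol.

```python
# Bit layout of a robust/PI futex word, from <linux/futex.h>.
FUTEX_WAITERS = 0x80000000
FUTEX_OWNER_DIED = 0x40000000
FUTEX_TID_MASK = 0x3FFFFFFF


def locked_word(tid, waiters=False):
    """Build the 32-bit value a lock holder with this TID stores in the futex."""
    word = tid & FUTEX_TID_MASK
    if waiters:
        word |= FUTEX_WAITERS
    return word


def owner_tid(word):
    """Extract the owner's TID; the word carries no namespace information."""
    return word & FUTEX_TID_MASK


def appears_owned_by(word, my_tid):
    """All a reader of the word can do is compare numbers."""
    return owner_tid(word) == my_tid


# A task that is TID 7 in namespace A takes the lock through a shared mapping.
word = locked_word(7, waiters=True)
# An unrelated task that is also TID 7, but in namespace B, inspects the word.
confused = appears_owned_by(word, 7)
```

Here `confused` is True: the second task cannot tell that the 7 in the word refers to a different task, just as two machines sharing a file cannot tell whose PID 7 is meant.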

-- 
Nicholas Miell <[EMAIL PROTECTED]>



Re: [patch] PID namespace design bug, workaround

2007-11-02 Thread Theodore Tso
On Fri, Nov 02, 2007 at 06:58:47PM +0300, Pavel Emelyanov wrote:
> Having access to the same IPCs in different pid namespaces won't work.
> Having access to the same filesystem in different IPC namespaces won't work.
> Having access to the same UID namespace in different VFS namespaces won't 
> work.
> Having access to the same <any> namespace in different <many others> namespace
>  won't work.
> 
> That's the idea OpenVZ tried to promote when the story with "containers"
> started, but most of the other participants decided that we can create
> individual namespaces and step-by-step try to make them work in all the
> possible combinations.

Heh.  Well, this won't be the first time that we go around the design
circle with people objecting to the idea eventually figuring out
that the original idea really was the only sane way to do things.  :-)

Maybe it would be instructive to create a matrix which lists areas
where processes that share namespace FOO but not namespace BAR would
result in breakage, with an explanation of what breaks in a particular
instance?  Assuming we continue to go down the path of orthogonal
namespaces, having a file in Documentation/ which lists places where
different namespaces have dependencies on each other for correct
system call operation would be a Good Thing.

  - Ted
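
One possible shape for the matrix suggested above, as a minimal sketch
rather than a proposal for the actual Documentation/ file: a table keyed
by (shared namespace, separate namespace) with a note on what breaks.
The enum names, the table and toy_ns_breakage() are all hypothetical.
The entries are only the combinations named in this thread, and where
the thread gives no detail the note says so instead of guessing.

```c
#include <stddef.h>

/* Hypothetical identifiers for the namespaces mentioned in the thread. */
enum toy_ns { TOY_NS_PID, TOY_NS_IPC, TOY_NS_VFS, TOY_NS_UID, TOY_NS_COUNT };

/* One matrix cell: tasks share `shared` but live in different `separate`. */
struct toy_ns_conflict {
	enum toy_ns shared;
	enum toy_ns separate;
	const char *what_breaks;
};

/* Only combinations named in this thread; nothing here is exhaustive. */
static const struct toy_ns_conflict toy_ns_matrix[] = {
	{ TOY_NS_VFS, TOY_NS_PID,
	  "futex in a shared file mapping holds a pid from the owner's namespace" },
	{ TOY_NS_IPC, TOY_NS_PID,
	  "listed as not working (SysV IPC issue; details not given in the thread)" },
	{ TOY_NS_VFS, TOY_NS_IPC,
	  "listed as not working; details not given in the thread" },
	{ TOY_NS_UID, TOY_NS_VFS,
	  "listed as not working; details not given in the thread" },
};

/* Return the note for a combination, or NULL if no breakage is recorded. */
static const char *toy_ns_breakage(enum toy_ns shared, enum toy_ns separate)
{
	size_t n = sizeof(toy_ns_matrix) / sizeof(toy_ns_matrix[0]);

	for (size_t i = 0; i < n; i++)
		if (toy_ns_matrix[i].shared == shared &&
		    toy_ns_matrix[i].separate == separate)
			return toy_ns_matrix[i].what_breaks;
	return NULL;
}
```

A caller would look up, say, toy_ns_breakage(TOY_NS_VFS, TOY_NS_PID) and
treat a NULL result as "nothing recorded", which is not the same as
"known to work".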


Re: [patch] PID namespace design bug, workaround

2007-11-02 Thread Linus Torvalds


On Fri, 2 Nov 2007, Dave Hansen wrote:
> 
> There are certainly more of these, but here is one.  In the futex
> userspace address, we install the current pid's vnr into a userspace
> address.  

Now, realistically, why not just say "you can't use these things across 
namespaces"? Does anybody really care? After all, somebody who screws this 
up only screws himself, not anybody else.

Linus


Re: [patch] PID namespace design bug, workaround

2007-11-02 Thread Dave Hansen
On Fri, 2007-11-02 at 01:04 -0700, Andrew Morton wrote:
> > > That is the "fix" you were referring to?  I was hoping you have a sketch
> > > for a real solution.  If nobody can think of a way to fix this PID
> > 
> > Looks like we misunderstood each other. Can you please elaborate on 
> > what exactly is broken in pid namespaces?
> 
> Isn't it this?
> 
> http://lkml.org/lkml/2007/11/1/141

I think we're still a bit murky on exactly what the issues are.  Ingo,
Ulrich, is this the right track?  The kind of issues that you're
concerned about?

There are certainly more of these, but here is one.  In the futex
code, we install the current pid's vnr into a userspace
address.

static int futex_lock_pi(u32 __user *uaddr, struct rw_semaphore *fshared,
			 int detect, ktime_t *time, int trylock)
{
	...
	newval = task_pid_vnr(current);
	curval = cmpxchg_futex_value_locked(uaddr, 0, newval);

We obviously don't have any restrictions on who else might be mapping
that address, so that pid can theoretically leak out to any other task.
In another pid namespace, the pid at that userspace address is certainly
nonsensical.
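
The leak described above can be modelled in plain userspace C. This is
a toy sketch, not kernel code: the two-namespace task table, the task
names and the helpers toy_pid_vnr(), toy_find_task() and
toy_futex_owner_seen_from() are invented for illustration, with
toy_pid_vnr() standing in for task_pid_vnr().

```c
#include <assert.h>
#include <stddef.h>

/* Toy model: two pid namespaces; a pid of 0 means "not visible there". */
enum { NS_INIT = 0, NS_CHILD = 1, NS_COUNT = 2 };

struct toy_task {
	const char *name;
	int pid[NS_COUNT];	/* pid of this task as seen from each namespace */
};

/* Hypothetical task table, invented for this example. */
static const struct toy_task toy_tasks[] = {
	{ "init",           { 1,    0 } },
	{ "sshd",           { 7,    0 } },
	{ "container-init", { 4321, 1 } },
	{ "worker",         { 4322, 7 } },
};
#define TOY_NTASKS (sizeof(toy_tasks) / sizeof(toy_tasks[0]))

/* Stand-in for task_pid_vnr(): the task's pid as seen from namespace ns. */
static int toy_pid_vnr(const struct toy_task *t, int ns)
{
	assert(ns >= 0 && ns < NS_COUNT);
	return t->pid[ns];
}

/* Resolve a pid found in shared memory inside the reader's namespace. */
static const struct toy_task *toy_find_task(int pid, int ns)
{
	assert(ns >= 0 && ns < NS_COUNT);
	if (pid == 0)
		return NULL;
	for (size_t i = 0; i < TOY_NTASKS; i++)
		if (toy_tasks[i].pid[ns] == pid)
			return &toy_tasks[i];
	return NULL;
}

/*
 * "worker" takes the lock from inside NS_CHILD, so the shared futex word
 * receives its pid in that namespace. A reader in reader_ns then looks
 * that number up in its own namespace.
 */
static const char *toy_futex_owner_seen_from(int reader_ns)
{
	const struct toy_task *locker = &toy_tasks[3];	/* "worker" */
	int futex_word = toy_pid_vnr(locker, NS_CHILD);
	const struct toy_task *owner = toy_find_task(futex_word, reader_ns);

	return owner ? owner->name : "(no such task)";
}
```

In this model a reader inside NS_CHILD resolves the futex word to
"worker", while a reader in NS_INIT resolves the same word to "sshd",
an unrelated task that happens to have the same number there.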

-- Dave



Re: [patch] PID namespace design bug, workaround

2007-11-02 Thread Pavel Emelyanov
Ulrich Drepper wrote:
> Pavel Emelyanov wrote:
>> So is "everything else", you mentioned, covered with the problems
>> above?
> 
> No, it's not.  If you'd read the mail carefully you'd notice that the
> use of PIDs especially in robust futexes is part of the API and that it
> simply isn't acceptable to say "don't do that".  A robust mutex can be
> stored in any file and as long as two processes have access to the same
> file (or they can pass each other shared memory) the underlying futex
> functionality simply must work.

This is the case when you export the pid to the user level outside 
the namespace. This case is not supposed to work at all. I know it 
and there's nothing we can do about it. (Some more comments about this
below.)

> This whole approach to allow switching on and off each of the namespaces
> is just wrong.  Do it all or nothing, at least for the problematic ones
> like NEWPID.  Having access to the same filesystem but using separate
> PID namespaces is simply not going to work.

I'd like to note that the original reason to switch the namespace off 
was to help embedded people get rid of the functionality they don't 
need and save on vmlinux size. Since Ingo proposed to disable the 
namespace creation in a ... strange way, I pointed out that there would be 
a more elegant way to do this. This was not the "fix" for cross-namespace
communications.

Nevertheless...

Having access to the same IPCs in different pid namespaces won't work.
Having access to the same filesystem in different IPC namespaces won't work.
Having access to the same UID namespace in different VFS namespaces won't work.
Having access to the same <any> namespace in different <many others> namespace
 won't work.

That's the idea OpenVZ tried to promote when the story with "containers"
started, but most of the other participants decided that we can create
individual namespaces and step-by-step try to make them work in all the
possible combinations.

Right now we have a pid namespace, which

a) works fine in the initial namespace (by this I mean that it doesn't
   introduce *new* bugs);
b) mostly works in the sub namespace. Some work is to be done and it
   is being done;
c) doesn't work in some ways (but not in all) when tasks communicate 
   across the namespace boundary, and is not going to, by definition.

I'm also looking for a good solution for how to work around the
"c" case, but I don't agree with the statement that "the pid
namespaces are completely broken". They are not completely broken;
there is just some work to do on case "b" and some way
to be invented to disable case "c".

> You also brush completely over the SysV IPC issue.

I did not - this problem is only relevant when you try to set up
IPC communication between processes from different namespaces, and
I have already answered this question.

If you use IPC within a single namespace everything works just fine.

> And I doubt that I spent enough time thinking about all this to arrive
> at the more subtle problems.  I don't think especially the PID namespace
> is ready at all at this time.


Re: [patch] PID namespace design bug, workaround

2007-11-02 Thread Ulrich Drepper
Pavel Emelyanov wrote:
> So is "everything else", you mentioned, covered with the problems
> above?

No, it's not.  If you'd read the mail carefully you'd notice that the
use of PIDs especially in robust futexes is part of the API and that it
simply isn't acceptable to say "don't do that".  A robust mutex can be
stored in any file and as long as two processes have access to the same
file (or they can pass each other shared memory) the underlying futex
functionality simply must work.

This whole approach to allow switching on and off each of the namespaces
is just wrong.  Do it all or nothing, at least for the problematic ones
like NEWPID.  Having access to the same filesystem but using separate
PID namespaces is simply not going to work.

You also brush completely over the SysV IPC issue.

And I doubt that I spent enough time thinking about all this to arrive
at the more subtle problems.  I don't think especially the PID namespace
is ready at all at this time.

- --
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖


Re: [patch] PID namespace design bug, workaround

2007-11-02 Thread Pavel Emelyanov
Ulrich Drepper wrote:
> Pavel Emelyanov wrote:
>>> Isn't it this?
>>>
>>> http://lkml.org/lkml/2007/11/1/141
>> That was the initial problem, and I already answered Ingo about
>> it
> 
> No, look at my old mail which Ingo referenced in that posting.

You pointed out only one problem that is not a variation of "how do 
we handle the case when we pass our pid outside the namespace".

This problem with signals is now being resolved at IBM by Sukadev 
and Serge (I put them in Cc), so this should be fixed by the
time 2.6.24 is released (I hope).

As far as "passing the pid outside the namespace" is concerned, 
is my answer "pids should never be used outside the namespace they
came from, otherwise userspace won't work as expected" satisfactory?

So is the "everything else" you mentioned covered by the problems
above?



Re: [patch] PID namespace design bug, workaround

2007-11-02 Thread Ulrich Drepper
Pavel Emelyanov wrote:
>> Isn't it this?
>>
>> http://lkml.org/lkml/2007/11/1/141
> 
> That was the initial problem, and I already answered Ingo about
> it

No, look at my old mail which Ingo referenced in that posting.

- --
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖


Re: [patch] PID namespace design bug, workaround

2007-11-02 Thread Pavel Emelyanov
Andrew Morton wrote:
> On Fri, 02 Nov 2007 10:55:02 +0300 Pavel Emelyanov <[EMAIL PROTECTED]> wrote:
> 
>> Ulrich Drepper wrote:
>>> Pavel Emelyanov wrote:
>>>> The "fix" I mention is just returning -EINVAL in case user orders 
>>>> CLONE_NEWPIDS
>>> That is the "fix" you were referring to?  I was hoping you have a sketch
>>> for a real solution.  If nobody can think of a way to fix this PID
>> Looks like we misunderstood each other. Can you please elaborate on 
>> what exactly is broken in pid namespaces?
> 
> Isn't it this?
> 
> http://lkml.org/lkml/2007/11/1/141

That was the initial problem, and I already answered Ingo about
it - a pid obtained in one pid namespace shouldn't be used in another.
This is not a design bug, but a design idea. If he managed to get two
threads in different namespaces, then we should fix this ability (but 
I thought that I handled it - the copy_pid_ns call doesn't allow 
creating a new thread in a new namespace:

	new_ns = ERR_PTR(-EINVAL);
	if (flags & CLONE_THREAD)
		goto out_put;

). To fix this, I should have first asked Ingo how he managed to get 
two threads in different namespaces, but Ulrich said that 

 "everything else I have seen simply doesn't work without
  breaking something"

so I asked him to elaborate on this - what _else_ doesn't work.
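
The rule that check enforces can be written as a standalone predicate.
This is a minimal userspace sketch, not the kernel's copy_pid_ns(): the
function name toy_check_pid_ns_flags() and the TOY_ prefixed macros are
hypothetical, and the flag values are written out locally (they match
the Linux values of CLONE_THREAD and CLONE_NEWPID, the mainline
spelling of the flag) so the sketch does not depend on kernel headers.

```c
#include <errno.h>

/* Local copies of the flag bits, so no kernel headers are required. */
#define TOY_CLONE_THREAD 0x00010000UL
#define TOY_CLONE_NEWPID 0x20000000UL

/*
 * Return 0 if the flag combination is acceptable, -EINVAL otherwise.
 * A new thread must stay in its thread group's pid namespace, so asking
 * for a new pid namespace together with CLONE_THREAD is refused.
 */
static int toy_check_pid_ns_flags(unsigned long flags)
{
	if (!(flags & TOY_CLONE_NEWPID))
		return 0;	/* no new pid namespace requested */
	if (flags & TOY_CLONE_THREAD)
		return -EINVAL;	/* thread in a new pid namespace: refuse */
	return 0;
}
```

With this predicate, TOY_CLONE_NEWPID alone is accepted, while
TOY_CLONE_NEWPID | TOY_CLONE_THREAD yields -EINVAL, so two threads of
one process should not end up in different pid namespaces by this route.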

Thanks,
Pavel


Re: [patch] PID namespace design bug, workaround

2007-11-02 Thread Andrew Morton
On Fri, 02 Nov 2007 10:55:02 +0300 Pavel Emelyanov <[EMAIL PROTECTED]> wrote:

> Ulrich Drepper wrote:
> > Pavel Emelyanov wrote:
> >> The "fix" I mention is just returning -EINVAL in case user orders 
> >> CLONE_NEWPIDS
> > 
> > That is the "fix" you were referring to?  I was hoping you have a sketch
> > for a real solution.  If nobody can think of a way to fix this PID
> 
> Looks like we misunderstood each other. Can you please elaborate on 
> what exactly is broken in pid namespaces?

Isn't it this?

http://lkml.org/lkml/2007/11/1/141



Re: [patch] PID namespace design bug, workaround

2007-11-02 Thread Pavel Emelyanov
Ulrich Drepper wrote:
> Pavel Emelyanov wrote:
>> The "fix" I mention is just returning -EINVAL in case user orders 
>> CLONE_NEWPIDS
> 
> That is the "fix" you were referring to?  I was hoping you have a sketch
> for a real solution.  If nobody can think of a way to fix this PID

Looks like we misunderstood each other. Can you please elaborate on 
what exactly is broken in pid namespaces?

> namespaces are IMO not something which should go in at all.

We (mainly Sukadev and I) are fixing the issues we are aware of, but 
it looks like somebody found something and didn't notify the authors 
about it.

Thanks,
Pavel



Re: [patch] PID namespace design bug, workaround

2007-11-02 Thread Pavel Emelyanov
Andrew Morton wrote:
 On Fri, 02 Nov 2007 10:55:02 +0300 Pavel Emelyanov [EMAIL PROTECTED] wrote:
 
 Ulrich Drepper wrote:
 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA1

 Pavel Emelyanov wrote:
 The fix I mention is just returning -EINVAL in case user orders 
 CLONE_NEWPIDS
 That is the fix you were referring to?  I was hoping you have a sketch
 for a real solution.  If nobody can think of a way to fix this PID
 Looks like we misunderstood each other. Can you please elaborate on 
 what exactly is broken in pid namespaces?
 
 Isn't it this?
 
 http://lkml.org/lkml/2007/11/1/141

That was the initial problem, and I already answered to Ingo about
it - pid, obtained in one pid namespace shouldn't be used in another.
This is not a design bug, but a design idea. If he managed to get two
threads in different namespaces, then we should fix this ability (but 
I thought that I handled it - the copy_pid_ns call doesn't allow to 
create a new thread in a new namespace:

new_ns = ERR_PTR(-EINVAL);
if (flags  CLONE_THREAD)
goto out_put;

) I should have first asked Ingo about how he managed to get two 
threads in different namespaces to fix this, but Ulrich said that 

 everything else I have seen simply doesn't work without
  breaking something

so I asked him to elaborate on this - what _else_ doesn't work.

Thanks,
Pavel
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] PID namespace design bug, workaround

2007-11-02 Thread Andrew Morton
On Fri, 02 Nov 2007 10:55:02 +0300 Pavel Emelyanov [EMAIL PROTECTED] wrote:

 Ulrich Drepper wrote:
  -BEGIN PGP SIGNED MESSAGE-
  Hash: SHA1
  
  Pavel Emelyanov wrote:
  The fix I mention is just returning -EINVAL in case user orders 
  CLONE_NEWPIDS
  
  That is the fix you were referring to?  I was hoping you have a sketch
  for a real solution.  If nobody can think of a way to fix this PID
 
 Looks like we misunderstood each other. Can you please elaborate on 
 what exactly is broken in pid namespaces?

Isn't it this?

http://lkml.org/lkml/2007/11/1/141

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] PID namespace design bug, workaround

2007-11-02 Thread Pavel Emelyanov
Ulrich Drepper wrote:
 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA1
 
 Pavel Emelyanov wrote:
 Isn't it this?

 http://lkml.org/lkml/2007/11/1/141
 That was the initial problem, and I already answered to Ingo about
 it
 
 No, look at my old mail which Ingo referenced in that posting.

You pointed only one problem that is not a variation of how do 
we handle the case when we pass our pid outside the namespace.

This problem with signals is now being resolved at IBM by Sukadev 
and Serge (I put them in Cc), so this is about to be fixed by the
time 2.6.24 releases (I hope).

As far as the passing the pid outside the namespace is concerned, 
is my answer pids should never be used outside the namespace they
came from, otherwise userspace won't work as expected satisfactory?

So is everything else, you mentioned, covered with the problems
above?

 - --
 ➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
 -BEGIN PGP SIGNATURE-
 Version: GnuPG v1.4.7 (GNU/Linux)
 
 iD8DBQFHKy692ijCOnn/RHQRAtYLAJ98EXTGl3HMlCbVXOkL7TJRFfw4DACfcgYI
 HHz5f7TfM05Dps+ruPRiUrU=
 =IjS4
 -END PGP SIGNATURE-
 

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] PID namespace design bug, workaround

2007-11-02 Thread Ulrich Drepper
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Pavel Emelyanov wrote:
 Isn't it this?

 http://lkml.org/lkml/2007/11/1/141
 
 That was the initial problem, and I already answered to Ingo about
 it

No, look at my old mail which Ingo referenced in that posting.

- --
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.7 (GNU/Linux)

iD8DBQFHKy692ijCOnn/RHQRAtYLAJ98EXTGl3HMlCbVXOkL7TJRFfw4DACfcgYI
HHz5f7TfM05Dps+ruPRiUrU=
=IjS4
-END PGP SIGNATURE-
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] PID namespace design bug, workaround

2007-11-02 Thread Ulrich Drepper
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Pavel Emelyanov wrote:
 So is everything else, you mentioned, covered with the problems
 above?

No, it's not.  If you'd read the mail carefully you'd notice that the
use of PIDs especially in robust futexes is part of the API and that it
simply isn't acceptable to say don't do that.  A robust mutex can be
stored in any file and as long as two processes have access to the same
file (or they can pass each other shared memory) the underlying futex
functionality simply must work.

This whole approach to allow switching on and off each of the namespaces
is just wrong.  Do it all or nothing, at least for the problematic ones
like NEWPID.  Having access to the same filesystem but using separate
PID namespaces is simply not going to work.

You also brush completely over the SysV IPC issue.

And I doubt that I spent enough time thinking about all this to arrive
at the more subtle problems.  I don't think especially the PID namespace
is ready at all at this time.

- --
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.7 (GNU/Linux)

iD4DBQFHK0N42ijCOnn/RHQRAkPyAJiDR9ZEPUbCdEa2xk+Te80B7avDAJ4mgy7v
jgtZG129yBUGBrpQ8fbn7w==
=ho0Z
-END PGP SIGNATURE-
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] PID namespace design bug, workaround

2007-11-02 Thread Dave Hansen
On Fri, 2007-11-02 at 01:04 -0700, Andrew Morton wrote:
   That is the fix you were referring to?  I was hoping you have a sketch
   for a real solution.  If nobody can think of a way to fix this PID
  
  Looks like we misunderstood each other. Can you please elaborate on 
  what exactly is broken in pid namespaces?
 
 Isn't it this?
 
 http://lkml.org/lkml/2007/11/1/141

I think we're still a bit murky on exactly what the issues are.  Ingo,
Ulrich, is this the right track?  The kind of issues that you're
concerned about?

There are certainly more of these, but here is one In the futex
userspace address, we install the current pid's vnr into a userspace
address.  

static int futex_lock_pi(u32 __user *uaddr, struct rw_semaphore *fshared,
 int detect, ktime_t *time, int trylock)
{
...
newval = task_pid_vnr(current);
curval = cmpxchg_futex_value_locked(uaddr, 0, newval);

We obviously don't have any restrictions on who else might be mapping
that address, so that pid can theoretically leak out to any other task.
In another pid namespace, the pid at that userspace address is certainly
nonsensical.

-- Dave

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] PID namespace design bug, workaround

2007-11-02 Thread Pavel Emelyanov
Ulrich Drepper wrote:
 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA1
 
 Pavel Emelyanov wrote:
 The fix I mention is just returning -EINVAL in case user orders 
 CLONE_NEWPIDS
 
 That is the fix you were referring to?  I was hoping you have a sketch
 for a real solution.  If nobody can think of a way to fix this PID

Looks like we misunderstood each other. Can you please elaborate on 
what exactly is broken in pid namespaces?

 namespaces are IMO not something which should go in at all.

We (mainly me and Sukadev) are fixing the issues we are aware of, but 
looks like somebody found something and didn't notify the authors 
about it.

Thanks,
Pavel

 - --
 ➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
 -BEGIN PGP SIGNATURE-
 Version: GnuPG v1.4.7 (GNU/Linux)
 
 iD8DBQFHKm2R2ijCOnn/RHQRAgjXAKCkU9lcWC9aTR0nG89x47AZO9pVfwCgiaVC
 /Giyp+en+VbtfFyD8D6v4Xk=
 =RnIw
 -END PGP SIGNATURE-
 

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] PID namespace design bug, workaround

2007-11-02 Thread Linus Torvalds


On Fri, 2 Nov 2007, Dave Hansen wrote:
 
 There are certainly more of these, but here is one In the futex
 userspace address, we install the current pid's vnr into a userspace
 address.  

Now, realistically, why not just say you can't use these things across 
namespaces? Does anybody really care? After all, somebody who screws this 
up only screws himself, not anybody else.

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] PID namespace design bug, workaround

2007-11-02 Thread Pavel Emelyanov
Ulrich Drepper wrote:
 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA1
 
 Pavel Emelyanov wrote:
 So is everything else, you mentioned, covered with the problems
 above?
 
 No, it's not.  If you'd read the mail carefully you'd notice that the
 use of PIDs especially in robust futexes is part of the API and that it
 simply isn't acceptable to say don't do that.  A robust mutex can be
 stored in any file and as long as two processes have access to the same
 file (or they can pass each other shared memory) the underlying futex
 functionality simply must work.

This is the case when you export the pid to the user level outside 
the namespace. This case is not supposed to work at all. I know it 
and there's noting we can do with it. (some more comments about this
below)

 This whole approach to allow switching on and off each of the namespaces
 is just wrong.  Do it all or nothing, at least for the problematic ones
 like NEWPID.  Having access to the same filesystem but using separate
 PID namespaces is simply not going to work.

I'd like to note, that the original reason to switch the namespace off 
was to help embedded people get rid of the functionality they don't 
need and save the vmlinux size. Since Ingo proposed to disable the 
namespace creation in a ... strange way, I noticed, that there will be 
a more elegant way to do this. This was not the fix for cross-namespaces
communications.

Nevertheless...

Having access to the same IPCs in different pid namespaces won't work.
Having access to the same filesystem in different IPC namespaces won't work.
Having access to the same UID namespace in different VFS namespaces won't work.
Having access to the same any namespace in different many others namespace
 wont' work.

That's the idea OpenVZ tried to promote when the story with containers
started, but most of the other participants decided that we can create
individual namespaces and step-by-step try to make them work in all the
possible combinations.

Right now we have a pid namespace, which

a) works fine in the initial namespace (by this I mean that it doesn't
   introduce *new* bugs);
b) mostly works in the sub namespace. some work is to be done and it
   is being done;
c) doesn't work in some ways (but not at all) when tasks communicate 
   across the namespace boundary, but is not going to by definition.

I'm also looking for a good solution on how to workaround the
c case, but I'm not agree with the statement that the pid
namespaces are completely broken. They are not completely broken,
but there is just some work to do with the case b and some way
to be invented to disable the case c.

 You also brush completely over the SysV IPC issue.

I did not - this problem is only relevant when you try to setup the
IPC communication between processes from different namespaces, but
I have already answered this question.

If you use IPC within a single namespaces everything works just fine.

 And I doubt that I spent enough time thinking about all this to arrive
 at the more subtle problems.  I don't think especially the PID namespace
 is ready at all at this time.
 
 - --
 ➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
 -BEGIN PGP SIGNATURE-
 Version: GnuPG v1.4.7 (GNU/Linux)
 
 iD4DBQFHK0N42ijCOnn/RHQRAkPyAJiDR9ZEPUbCdEa2xk+Te80B7avDAJ4mgy7v
 jgtZG129yBUGBrpQ8fbn7w==
 =ho0Z
 -END PGP SIGNATURE-
 

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] PID namespace design bug, workaround

2007-11-02 Thread Theodore Tso
On Fri, Nov 02, 2007 at 06:58:47PM +0300, Pavel Emelyanov wrote:
 Having access to the same IPCs in different pid namespaces won't work.
 Having access to the same filesystem in different IPC namespaces won't work.
 Having access to the same UID namespace in different VFS namespaces won't 
 work.
 Having access to the same any namespace in different many others namespace
  wont' work.
 
 That's the idea OpenVZ tried to promote when the story with containers
 started, but most of the other participants decided that we can create
 individual namespaces and step-by-step try to make them work in all the
 possible combinations.

Heh.  Well, this won't be the first time that we go around the design
circle wiht people objecting with the idea eventually figuring out
that the original idea really was the only sane way to do things.  :-)

Maybe it would be instructive to create a matrix which lists areas
where processes that share namespace FOO but not namespace BAR would
result in breakage, with an explanation of what breaks in a particular
instance?  Assuming we continue to go down the path of orthogonal
namespace, having a file in Documentation/ which lists places where
there different namepsaces have dependencies on each other for correct
system call operation would be a Good Thing.

  - Ted
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] PID namespace design bug, workaround

2007-11-02 Thread Nicholas Miell
On Fri, 2007-11-02 at 10:39 -0700, Linus Torvalds wrote:
 
 On Fri, 2 Nov 2007, Dave Hansen wrote:
  
  There are certainly more of these, but here is one In the futex
  userspace address, we install the current pid's vnr into a userspace
  address.  
 
 Now, realistically, why not just say you can't use these things across 
 namespaces? Does anybody really care? After all, somebody who screws this 
 up only screws himself, not anybody else.
 
   Linus

Accessing the same robust futex from different PID namespaces on the
same machine via a shared file mapping is logically equivalent to
accessing the same robust futex from different machines via a shared
filesystem and there's no reason to expect either operation to work
correctly.

-- 
Nicholas Miell [EMAIL PROTECTED]

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] PID namespace design bug, workaround

2007-11-02 Thread Ulrich Drepper
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Pavel Emelyanov wrote:
 Having access to the same IPCs in different pid namespaces won't work.
 Having access to the same filesystem in different IPC namespaces won't work.
 Having access to the same UID namespace in different VFS namespaces won't 
 work.
 Having access to the same any namespace in different many others namespace
  wont' work.
 [...]


Then explicitly prevent the cases which cannot work in the clone()
calls.  Yes, giving people rope to shoot themselves is a Unix tradition
but it's so unnecessary in this case and will only cause support
problems for innocent people.

I bet the result will be that if you have a separate PID namespace you
need to enforce every other namespace as well.  There are simply too
many dependencies.

- --
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖


Re: [patch] PID namespace design bug, workaround

2007-11-01 Thread Ulrich Drepper

Ingo Molnar wrote:

> but this problem is still present in the code, and it has been recently 
> committed into mainline via:
> 
>   commit 30e49c263e36341b60b735cbef5ca37912549264
>   Author: Pavel Emelyanov <[EMAIL PROTECTED]>
>   Date:   Thu Oct 18 23:40:10 2007 -0700
> 
>   pid namespaces: allow cloning of new namespace
> 
> without these problems having been resolved. A full-scale revert is 
> probably too intrusive, but at minimum we need to turn off user-space 
> access to this feature via this simple patch. Until this issue is 
> resolved properly the new PID namespace code needs to be turned off. 
> Letting this into 2.6.24 would be a disaster.
> 
> Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>

Acked-by: Ulrich Drepper <[EMAIL PROTECTED]>


- --
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖


Re: [patch] PID namespace design bug, workaround

2007-11-01 Thread Ulrich Drepper

Pavel Emelyanov wrote:
> The "fix" I mention is just returning -EINVAL in case user orders 
> CLONE_NEWPID

That is the "fix" you were referring to?  I was hoping you have a sketch
for a real solution.  If nobody can think of a way to fix this, PID
namespaces are IMO not something which should go in at all.

- --
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖


Re: [patch] PID namespace design bug, workaround

2007-11-01 Thread Ingo Molnar

* Theodore Tso <[EMAIL PROTECTED]> wrote:

> On Thu, Nov 01, 2007 at 04:05:37PM +0100, Ingo Molnar wrote:
> > +   if (clone_flags & CLONE_NEWPID)
> > +   return -ENOSYS;
> 
> I'd use EINVAL instead of ENOSYS...

ok, updated patch below.

Ingo

--->
From: Ingo Molnar <[EMAIL PROTECTED]>
Subject: PID namespaces: turn them off for now

while checking recent commits to the kernel core i took a look at the 
PID namespaces implementation, and it has a fatal flaw: it breaks 
futexes and various libraries (and other stuff) that use PIDs as the 
means of identifying tasks, by not providing any means of global 
identification that works across PID namespaces. (PIDs _are_ a very 
convenient and global way of identifying contexts.)

i asked Ulrich about this and it turns out he has warned about this 
early on:

  http://www.nabble.com/Re%3A-question%3A-pid-space-semantics.-p3409990.html

but this problem is still present in the code, and it has been recently 
committed into mainline via:

  commit 30e49c263e36341b60b735cbef5ca37912549264
  Author: Pavel Emelyanov <[EMAIL PROTECTED]>
  Date:   Thu Oct 18 23:40:10 2007 -0700

  pid namespaces: allow cloning of new namespace

without these problems having been resolved. A full-scale revert is 
probably too intrusive, but at minimum we need to turn off user-space 
access to this feature via this simple patch. Until this issue is 
resolved properly the new PID namespace code needs to be turned off. 
Letting this into 2.6.24 would be a disaster.

Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
---
 kernel/fork.c |    9 +++++++++
 1 file changed, 9 insertions(+)

Index: v/kernel/fork.c
===================================================================
--- v.orig/kernel/fork.c
+++ v/kernel/fork.c
@@ -1420,6 +1420,15 @@ long do_fork(unsigned long clone_flags,
 	int trace = 0;
 	long nr;
 
+	/*
+	 * PID namespaces are broken at the moment: they do not allow
+	 * certain PID based syscalls (such as futexes) to be used
+	 * across namespaces. This is broken and must not be allowed,
+	 * so we keep this feature turned off until it's properly fixed.
+	 */
+	if (clone_flags & CLONE_NEWPID)
+		return -EINVAL;
+
 	if (unlikely(current->ptrace)) {
 		trace = fork_traceflag (clone_flags);
 		if (trace)


Re: [patch] PID namespace design bug, workaround

2007-11-01 Thread Theodore Tso
On Thu, Nov 01, 2007 at 04:05:37PM +0100, Ingo Molnar wrote:
> + if (clone_flags & CLONE_NEWPID)
> + return -ENOSYS;

I'd use EINVAL instead of ENOSYS...

- Ted
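
The difference between the two error codes matters mostly to the caller. As a rough illustration, here is how userspace might interpret a failed clone() with CLONE_NEWPID. The helper name `explain_clone_failure` and its message strings are hypothetical; the sketch only relies on the usual convention that ENOSYS means "this system call does not exist" while EINVAL means "the call exists but these arguments are not accepted".

```c
#include <errno.h>
#include <string.h>

/*
 * Hypothetical helper: map the errno from a failed clone() call to a
 * human-readable reason. Not taken from any real library.
 */
static const char *explain_clone_failure(int err)
{
    switch (err) {
    case ENOSYS:
        /* Conventionally: the whole syscall is missing. Misleading for
         * clone(), which exists; only one flag is being refused. */
        return "syscall not implemented";
    case EINVAL:
        /* The call exists, but this flag combination is rejected. */
        return "flag combination rejected";
    case EPERM:
        return "insufficient privilege";
    default:
        return strerror(err);
    }
}
```

A caller probing for PID namespace support could treat the EINVAL case as "feature unavailable on this kernel" without concluding that clone() itself is absent.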


Re: [patch] PID namespace design bug, workaround

2007-11-01 Thread Dave Hansen
On Thu, 2007-11-01 at 07:56 -0700, Ulrich Drepper wrote:
> Pavel Emelyanov wrote:
> > With this set we'll be able to mark pid namespaces as EXPERIMENTAL
> > or even BROKEN, so nobody will be able to create them. So can we, please, 
> > keep things as they are for now - the appropriate fix will be ready 
> > soon.
> 
> You sound far too optimistic for my taste.  I probably haven't seen the
> proposal you have in mind but everything else I have seen simply doesn't
> work without breaking something.

Yeah, we definitely realize that this inhibits things that were
perfectly fine before.  

As Eric mentioned in his reply to your message last year, the primary
goal here is isolation.  We'd eventually like to be able to pick a
container up and move it to another system.  That's going to be awfully
hard if the container is sharing a resource with a part of the system
which is not moving.

Pid namespaces (along with the others) give us the isolation to keep
these interactions from happening except in a controlled manner,
breaking the ties that might bind it to one particular system.

Think of how many user-visible apis deal with files and filenames.
However, there can certainly be files that are unavailable to certain
processes based on their membership in a particular filesystem
namespace.  In fact, we use chroot() to try and _make_ certain files
unavailable.

-- Dave



Re: [patch] PID namespace design bug, workaround

2007-11-01 Thread Pavel Emelyanov
Ingo Molnar wrote:
> * Pavel Emelyanov <[EMAIL PROTECTED]> wrote:
> 
>> The "fix" I mention is just returning -EINVAL in case user orders 
>> CLONE_NEWPID and compiling out all the namespace cloning code. This 
>> is just a more elegant way to get rid of pid namespaces rather than 
>> Ingo proposed.
> 
> unfortunately i have to NACK that approach. We never allowed broken 
> user-space visible APIs into the kernel like that because it just gives 
> a vector for that breakage to become de-facto used and forced upon the 
> core kernel. Even if they can be .config turned off. That's just a lame 
> excuse that delays the fixing of it. We may mark features that have a 
> good expectation to be fixed as CONFIG_EXPERIMENTAL, and we may mark 

Pid namespaces have more than a good expectation of being fixed, so
feel free to mark the (currently pending) PID_NS config option with
"depends on EXPERIMENTAL".

All the problems I know are slowly getting fixed, but most of them are
just related to "bad pid value is reported to the user space when we 
work inside some new namespace".

Unfortunately, as far as I can see, all the discussions of pid namespaces
happen behind my back, so all I can do is fix the problems I'm
aware of.

> drivers that nobody maintains anymore as CONFIG_BROKEN, but we dont 
> introduce new core syscall features with CONFIG_BROKEN! We never did and 
> i hope we never will.
> 
> The _only_ way to force the fixing of such type of breakages is to not 
> offer them _at all_. Really, you are proposing a major new extension to 
> lots of important core Linux APIs so please try to solve this problem 
> cleanly, it's really severe. Right now as things stand this containers 

I'm sure, that no *new* problems appear in case you don't enter the new
namespace, so just disable the pid space creation code (with EXPERIMENTAL 
option) and live with the original kernel.

If you point me to one, I'd be glad to fix it.

> sub-feature is "a little bit pregnant". This is one of the few cases 
> where we really _must_ say no.
> 
>   Ingo
> 



Re: [patch] PID namespace design bug, workaround

2007-11-01 Thread Ingo Molnar

* Pavel Emelyanov <[EMAIL PROTECTED]> wrote:

> The "fix" I mention is just returning -EINVAL in case user orders 
> CLONE_NEWPID and compiling out all the namespace cloning code. This 
> is just a more elegant way to get rid of pid namespaces rather than 
> Ingo proposed.

unfortunately i have to NACK that approach. We never allowed broken 
user-space visible APIs into the kernel like that because it just gives 
a vector for that breakage to become de-facto used and forced upon the 
core kernel. Even if they can be .config turned off. That's just a lame 
excuse that delays the fixing of it. We may mark features that have a 
good expectation to be fixed as CONFIG_EXPERIMENTAL, and we may mark 
drivers that nobody maintains anymore as CONFIG_BROKEN, but we dont 
introduce new core syscall features with CONFIG_BROKEN! We never did and 
i hope we never will.

The _only_ way to force the fixing of such type of breakages is to not 
offer them _at all_. Really, you are proposing a major new extension to 
lots of important core Linux APIs so please try to solve this problem 
cleanly, it's really severe. Right now as things stand this containers 
sub-feature is "a little bit pregnant". This is one of the few cases 
where we really _must_ say no.

Ingo


Re: [patch] PID namespace design bug, workaround

2007-11-01 Thread Pavel Emelyanov
Peter Zijlstra wrote:
> On Thu, 2007-11-01 at 17:51 +0300, Pavel Emelyanov wrote:
> 
>>  So can we, please, 
>> keep things as they are for now - the appropriate fix will be ready 
>> soon.
> 
> Just for the curious, could you outline how you intend to fix this?

I have already answered to Ulrich about this. Just to
make this sub-thread consistent:

The "fix" I mention is just returning -EINVAL in case user orders 
CLONE_NEWPID and compiling out all the namespace cloning code. This 
is just a more elegant way to get rid of pid namespaces rather than 
Ingo proposed.

Here's the root of the set:
http://lkml.org/lkml/2007/10/31/118

Thanks,
Pavel


Re: [patch] PID namespace design bug, workaround

2007-11-01 Thread Pavel Emelyanov
Ulrich Drepper wrote:
> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA1
> 
> Pavel Emelyanov wrote:
>> With this set we'll be able to mark pid namespaces as EXPERIMENTAL
>> or even BROKEN, so nobody will be able to create them. So can we, please, 
>> keep things as they are for now - the appropriate fix will be ready 
>> soon.
> 
> You sound far too optimistic for my taste.  I probably haven't seen the
> proposal you have in mind but everything else I have seen simply doesn't
> work without breaking something.

The "fix" I mention is just returning -EINVAL in case user orders 
CLONE_NEWPID and compiling out all the namespace cloning code. This 
is just a more elegant way to get rid of pid namespaces rather than 
Ingo proposed.

Here's the root of the set:
http://lkml.org/lkml/2007/10/31/118

Thanks,
Pavel


Re: [patch] PID namespace design bug, workaround

2007-11-01 Thread Ingo Molnar

* Ulrich Drepper <[EMAIL PROTECTED]> wrote:

> Ingo Molnar wrote:
> > +   clone_flags &= ~CLONE_NEWPID;
> 
> I think the call should rather fail than silently drop the bit but 
> aside from that I agree.  The problems we'd run into if the feature is 
> getting used as-is are severe.

does the patch below look OK to you?

Ingo

--->
From: Ingo Molnar <[EMAIL PROTECTED]>
Subject: PID namespaces: turn them off for now

while checking recent commits to the kernel core i took a look at the 
PID namespaces implementation, and it has a fatal flaw: it breaks 
futexes and various libraries (and other stuff) that use PIDs as the 
means of identifying tasks, by not providing any means of global 
identification that works across PID namespaces. (PIDs _are_ a very 
convenient and global way of identifying contexts.)

i asked Ulrich about this and it turns out he has warned about this 
early on:

  http://www.nabble.com/Re%3A-question%3A-pid-space-semantics.-p3409990.html

but this problem is still present in the code, and it has been recently 
committed into mainline via:

  commit 30e49c263e36341b60b735cbef5ca37912549264
  Author: Pavel Emelyanov <[EMAIL PROTECTED]>
  Date:   Thu Oct 18 23:40:10 2007 -0700

  pid namespaces: allow cloning of new namespace

without these problems having been resolved. A full-scale revert is 
probably too intrusive, but at minimum we need to turn off user-space 
access to this feature via this simple patch. Until this issue is 
resolved properly the new PID namespace code needs to be turned off. 
Letting this into 2.6.24 would be a disaster.

Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
---
 kernel/fork.c |    9 +++++++++
 1 file changed, 9 insertions(+)

Index: v/kernel/fork.c
===================================================================
--- v.orig/kernel/fork.c
+++ v/kernel/fork.c
@@ -1420,6 +1420,15 @@ long do_fork(unsigned long clone_flags,
 	int trace = 0;
 	long nr;
 
+	/*
+	 * PID namespaces are broken at the moment: they do not allow
+	 * certain PID based syscalls (such as futexes) to be used
+	 * across namespaces. This is broken and must not be allowed,
+	 * so we keep this feature turned off until it's properly fixed.
+	 */
+	if (clone_flags & CLONE_NEWPID)
+		return -ENOSYS;
+
 	if (unlikely(current->ptrace)) {
 		trace = fork_traceflag (clone_flags);
 		if (trace)


Re: [patch] PID namespace design bug, workaround

2007-11-01 Thread Pavel Emelyanov
Ingo Molnar wrote:
> while checking recent commits to the kernel core i took a look at the 
> PID namespaces implementation, and it has a fatal flaw: it breaks 
> futexes and various libraries (and other stuff) that use PIDs as the 
> means of identifying tasks, by not providing any means of global 
> identification that works across PID namespaces. (PIDs _are_ a very 

You're not 100% correct here. task_pid_nr() does return a
unique pid, so you do have a way to identify the task.

Another thing - you should *not* allow tasks to communicate across
pid namespaces using any pids - this just breaks the pid namespaces
idea.

As far as the futexes are concerned - I do not allow threads to live
in different pid namespaces (a more correct fix would be to not allow
tasks to share the mm_struct across pid namespaces, but this is a
one-line fix), so the situation where you have two threads in different
namespaces is impossible.

Thanks,
Pavel
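
The distinction drawn above between a globally unique number and a per-namespace number can be sketched with a simplified userspace model. Everything here is an assumption made for illustration: `struct model_pid`, `model_pid_nr()` and `model_pid_nr_at()` are hypothetical names, and the model indexes by nesting level rather than by an actual namespace object, which is cruder than what the kernel does.

```c
#define MAX_LEVELS 4 /* arbitrary bound for this sketch */

/*
 * Hypothetical model: a task has one number per namespace level.
 * Level 0 stands for the initial (global) namespace.
 */
struct model_pid {
    int level;           /* depth of the namespace the task lives in */
    int nr[MAX_LEVELS];  /* nr[i] = the task's PID as seen from level i */
};

/* Global number: unique across the whole system in this model. */
static int model_pid_nr(const struct model_pid *p)
{
    return p->nr[0];
}

/* Number as seen from a given level; 0 if the task is not visible there. */
static int model_pid_nr_at(const struct model_pid *p, int lvl)
{
    if (lvl < 0 || lvl > p->level)
        return 0;
    return p->nr[lvl];
}
```

Two tasks in sibling namespaces can both be PID 1 at level 1 while still having distinct numbers at level 0. That is why a value written to user-visible memory from inside a namespace is ambiguous to an outside reader, whereas the global number is not.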

> convenient and global way of identifying contexts.)
> 
> i asked Ulrich about this and it turns out he has warned about this 
> early on:
> 
>   http://www.nabble.com/Re%3A-question%3A-pid-space-semantics.-p3409990.html
> 
> but this problem is still present in the code, and it has been recently 
> committed into mainline via:
> 
>   commit 30e49c263e36341b60b735cbef5ca37912549264
>   Author: Pavel Emelyanov <[EMAIL PROTECTED]>
>   Date:   Thu Oct 18 23:40:10 2007 -0700
> 
>   pid namespaces: allow cloning of new namespace
> 
> without these problems having been resolved. A full-scale revert is 
> probably too intrusive, but at minimum we need to turn off user-space 
> access to this feature via this simple patch. Until this issue is 
> resolved properly the new PID namespace code needs to be turned off. 
> Letting this into 2.6.24 would be a disaster.
> 
> Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
> ---
>  kernel/fork.c |    8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> Index: v/kernel/fork.c
> ===================================================================
> --- v.orig/kernel/fork.c
> +++ v/kernel/fork.c
> @@ -1420,6 +1420,14 @@ long do_fork(unsigned long clone_flags,
>   int trace = 0;
>   long nr;
>  
> + /*
> +  * PID namespaces are broken at the moment: they do not allow
> +  * certain PID based syscalls (such as futexes) to be used
> +  * across namespaces. This is broken and must not be allowed,
> +  * so we keep this feature turned off until it's properly fixed.
> +  */
> + clone_flags &= ~CLONE_NEWPID;
> +
>   if (unlikely(current->ptrace)) {
>   trace = fork_traceflag (clone_flags);
>   if (trace)
> 



Re: [patch] PID namespace design bug, workaround

2007-11-01 Thread Peter Zijlstra
On Thu, 2007-11-01 at 17:51 +0300, Pavel Emelyanov wrote:

>  So can we, please, 
> keep things as they are for now - the appropriate fix will be ready 
> soon.

Just for the curious, could you outline how you intend to fix this?



Re: [patch] PID namespace design bug, workaround

2007-11-01 Thread Ulrich Drepper

Pavel Emelyanov wrote:
> With this set we'll be able to mark pid namespaces as EXPERIMENTAL
> or even BROKEN, so nobody will be able to create them. So can we, please, 
> keep things as they are for now - the appropriate fix will be ready 
> soon.

You sound far too optimistic for my taste.  I probably haven't seen the
proposal you have in mind but everything else I have seen simply doesn't
work without breaking something.

- --
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖


Re: [patch] PID namespace design bug, workaround

2007-11-01 Thread Pavel Emelyanov
Ingo Molnar wrote:
> while checking recent commits to the kernel core i took a look at the 
> PID namespaces implementation, and it has a fatal flaw: it breaks 
> futexes and various libraries (and other stuff) that use PIDs as the 
> means of identifying tasks, by not providing any means of global 
> identification that works across PID namespaces. (PIDs _are_ a very 
> convenient and global way of identifying contexts.)
> 
> i asked Ulrich about this and it turns out he has warned about this 
> early on:
> 
>   http://www.nabble.com/Re%3A-question%3A-pid-space-semantics.-p3409990.html
> 
> but this problem is still present in the code, and it has been recently 
> committed into mainline via:
> 
>   commit 30e49c263e36341b60b735cbef5ca37912549264
>   Author: Pavel Emelyanov <[EMAIL PROTECTED]>
>   Date:   Thu Oct 18 23:40:10 2007 -0700
> 
>   pid namespaces: allow cloning of new namespace
> 
> without these problems having been resolved. A full-scale revert is 
> probably too intrusive, but at minimum we need to turn off user-space 
> access to this feature via this simple patch. Until this issue is 
> resolved properly the new PID namespace code needs to be turned off. 
> Letting this into 2.6.24 would be a disaster.
> 
> Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
> ---
>  kernel/fork.c |    8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> Index: v/kernel/fork.c
> ===================================================================
> --- v.orig/kernel/fork.c
> +++ v/kernel/fork.c
> @@ -1420,6 +1420,14 @@ long do_fork(unsigned long clone_flags,
>   int trace = 0;
>   long nr;
>  
> + /*
> +  * PID namespaces are broken at the moment: they do not allow
> +  * certain PID based syscalls (such as futexes) to be used
> +  * across namespaces. This is broken and must not be allowed,
> +  * so we keep this feature turned off until it's properly fixed.
> +  */
> + clone_flags &= ~CLONE_NEWPID;
> +

Well, emm. Eric already tried to solve this issue in a similar way
(http://lkml.org/lkml/2007/10/26/414), but I have recently sent a
more generic patch set. It turns all the namespaces off with the 
config options, but Andrew said to wait until the next -mm tree to 
rework the set.

With this set we'll be able to mark pid namespaces as EXPERIMENTAL
or even BROKEN, so nobody will be able to create them. So can we, please, 
keep things as they are for now - the appropriate fix will be ready 
soon.

Thanks,
Pavel

>   if (unlikely(current->ptrace)) {
>   trace = fork_traceflag (clone_flags);
>   if (trace)
> 



Re: [patch] PID namespace design bug, workaround

2007-11-01 Thread Ulrich Drepper

Ingo Molnar wrote:
> + clone_flags &= ~CLONE_NEWPID;

I think the call should rather fail than silently drop the bit but aside
from that I agree.  The problems we'd run into if the feature is getting
used as-is are severe.

- --
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
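
The two behaviours being compared, silently dropping the bit versus failing the call, can be contrasted in a standalone sketch. The CLONE_NEWPID value matches `<linux/sched.h>`; the function names `check_flags_silent` and `check_flags_strict` are hypothetical and only stand in for the flag check at the top of do_fork().

```c
#include <errno.h>

/* Value as in <linux/sched.h>; repeated so the sketch is self-contained. */
#define CLONE_NEWPID 0x20000000UL

/*
 * Variant 1: mask the bit and carry on. The caller gets a successful
 * clone but no new PID namespace, with no way to notice.
 */
static long check_flags_silent(unsigned long *clone_flags)
{
    *clone_flags &= ~CLONE_NEWPID;
    return 0;
}

/*
 * Variant 2: refuse the request. The caller sees an error and can
 * fall back or report that the feature is unavailable.
 */
static long check_flags_strict(unsigned long clone_flags)
{
    if (clone_flags & CLONE_NEWPID)
        return -EINVAL;
    return 0;
}
```

With the silent variant, a container tool that asked for isolation would keep running in the parent's PID namespace while believing it was isolated; the strict variant turns that into a visible failure.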


Re: [patch] PID namespace design bug, workaround

2007-11-01 Thread Pavel Emelyanov
Ingo Molnar wrote:
 while checking recent commits to the kernel core i took a look at the 
 PID namespaces implementation, and it has a fatal flaw: it breaks 
 futexes and various libraries (and other stuff) that use PIDs as the 
 means of identifying tasks, by not providing any means of global 
 identification that works across PID namespaces. (PIDs _are_ a very 

You're not 100% correct here. The task_pid_nr() does return you a
unique pid, so you do have the way to identify the task.

Another thing - you should *not* allow tasks to communicate across
pid namespaces using any pids - this just breaks the pid namespaces
idea.

As far as the futexes are concerned - I do not allow threads live 
in different pid namespaces (more correct fix would be not to allow 
tasks share the mm_struct across pid namespaces, but this is a one
line fix), so the situation when you have two threads in different 
namespaces is impossible.

Thanks,
Pavel

> convenient and global way of identifying contexts.)
>
> i asked Ulrich about this and it turns out he has warned about this
> early on:
>
>   http://www.nabble.com/Re%3A-question%3A-pid-space-semantics.-p3409990.html
>
> but this problem is still present in the code, and it has been recently
> committed into mainline via:
>
>   commit 30e49c263e36341b60b735cbef5ca37912549264
>   Author: Pavel Emelyanov [EMAIL PROTECTED]
>   Date:   Thu Oct 18 23:40:10 2007 -0700
>
>   pid namespaces: allow cloning of new namespace
>
> without these problems having been resolved. A full-scale revert is
> probably too intrusive, but at minimum we need to turn off user-space
> access to this feature via this simple patch. Until this issue is
> resolved properly the new PID namespace code needs to be turned off.
> Letting this into 2.6.24 would be a disaster.
>
> Signed-off-by: Ingo Molnar [EMAIL PROTECTED]
> ---
>  kernel/fork.c |    8 ++++++++
>  1 file changed, 8 insertions(+)
>
> Index: v/kernel/fork.c
> ===================================================================
> --- v.orig/kernel/fork.c
> +++ v/kernel/fork.c
> @@ -1420,6 +1420,14 @@ long do_fork(unsigned long clone_flags,
>  	int trace = 0;
>  	long nr;
>  
> +	/*
> +	 * PID namespaces are broken at the moment: they do not allow
> +	 * certain PID based syscalls (such as futexes) to be used
> +	 * across namespaces. This is broken and must not be allowed,
> +	 * so we keep this feature turned off until it's properly fixed.
> +	 */
> +	clone_flags &= ~CLONE_NEWPID;
> +
>  	if (unlikely(current->ptrace)) {
>  		trace = fork_traceflag (clone_flags);
>  		if (trace)
>



Re: [patch] PID namespace design bug, workaround

2007-11-01 Thread Pavel Emelyanov
Ulrich Drepper wrote:
> Pavel Emelyanov wrote:
>> With this set we'll be able to mark pid namespaces as EXPERIMENTAL
>> or even BROKEN, so nobody will be able to create them. So can we, please,
>> keep things as they are for now - the appropriate fix will be ready
>> soon.
>
> You sound far too optimistic for my taste.  I probably haven't seen the
> proposal you have in mind but everything else I have seen simply doesn't
> work without breaking something.

The fix I mentioned just returns -EINVAL when the user requests
CLONE_NEWPID and compiles out all the namespace cloning code. This
is simply a more elegant way to disable pid namespaces than the one
Ingo proposed.

Here's the root of the set:
http://lkml.org/lkml/2007/10/31/118

Thanks,
Pavel
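
A rough userspace sketch of the "compile it out and return -EINVAL" idea
described above, assuming a build-time switch decides whether the cloning
code exists at all. SKETCH_CONFIG_PID_NS and sketch_copy_pid_ns() are
hypothetical names used only for illustration; SKETCH_CLONE_NEWPID mirrors
the numeric value of the kernel's CLONE_NEWPID flag. This is not the code
from the patch set linked in the message.

```c
#include <assert.h>
#include <errno.h>

/* Same numeric value as the kernel's CLONE_NEWPID clone flag. */
#define SKETCH_CLONE_NEWPID 0x20000000UL

/*
 * Leave SKETCH_CONFIG_PID_NS undefined to model a kernel that was
 * built without pid namespace support.
 */
#ifdef SKETCH_CONFIG_PID_NS
long sketch_copy_pid_ns(unsigned long clone_flags)
{
	/* Feature built in: a real implementation would create the namespace. */
	(void)clone_flags;
	return 0;
}
#else
long sketch_copy_pid_ns(unsigned long clone_flags)
{
	/* Feature compiled out: reject the flag, accept every other request. */
	if (clone_flags & SKETCH_CLONE_NEWPID)
		return -EINVAL;
	return 0;
}
#endif
```

With the switch undefined, callers that never pass the flag see no change in
behaviour, while a request for a new pid namespace fails explicitly.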


Re: [patch] PID namespace design bug, workaround

2007-11-01 Thread Ingo Molnar

* Ulrich Drepper [EMAIL PROTECTED] wrote:

> Ingo Molnar wrote:
> > +	clone_flags &= ~CLONE_NEWPID;
>
> I think the call should rather fail than silently drop the bit but
> aside from that I agree.  The problems we'd run into if the feature is
> getting used as-is are severe.

does the patch below look OK to you?

Ingo

---
From: Ingo Molnar [EMAIL PROTECTED]
Subject: PID namespaces: turn them off for now

while checking recent commits to the kernel core i took a look at the 
PID namespaces implementation, and it has a fatal flaw: it breaks 
futexes and various libraries (and other stuff) that use PIDs as the 
means of identifying tasks, by not providing any means of global 
identification that works across PID namespaces. (PIDs _are_ a very 
convenient and global way of identifying contexts.)

i asked Ulrich about this and it turns out he has warned about this 
early on:

  http://www.nabble.com/Re%3A-question%3A-pid-space-semantics.-p3409990.html

but this problem is still present in the code, and it has been recently 
committed into mainline via:

  commit 30e49c263e36341b60b735cbef5ca37912549264
  Author: Pavel Emelyanov [EMAIL PROTECTED]
  Date:   Thu Oct 18 23:40:10 2007 -0700

  pid namespaces: allow cloning of new namespace

without these problems having been resolved. A full-scale revert is 
probably too intrusive, but at minimum we need to turn off user-space 
access to this feature via this simple patch. Until this issue is 
resolved properly the new PID namespace code needs to be turned off. 
Letting this into 2.6.24 would be a disaster.

Signed-off-by: Ingo Molnar [EMAIL PROTECTED]
---
 kernel/fork.c |    9 +++++++++
 1 file changed, 9 insertions(+)

Index: v/kernel/fork.c
===================================================================
--- v.orig/kernel/fork.c
+++ v/kernel/fork.c
@@ -1420,6 +1420,15 @@ long do_fork(unsigned long clone_flags,
 	int trace = 0;
 	long nr;
 
+	/*
+	 * PID namespaces are broken at the moment: they do not allow
+	 * certain PID based syscalls (such as futexes) to be used
+	 * across namespaces. This is broken and must not be allowed,
+	 * so we keep this feature turned off until it's properly fixed.
+	 */
+	if (clone_flags & CLONE_NEWPID)
+		return -ENOSYS;
+
 	if (unlikely(current->ptrace)) {
 		trace = fork_traceflag (clone_flags);
 		if (trace)
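
The difference between the two versions of the workaround (silently
clearing the bit versus failing the call) can be sketched in userspace as
follows. This is only an illustration under stated assumptions: the
functions sketch_fork_masking() and sketch_fork_rejecting() are
hypothetical stand-ins for the top of do_fork(), and SKETCH_CLONE_NEWPID
mirrors the numeric value of CLONE_NEWPID.

```c
#include <assert.h>
#include <errno.h>

/* Same numeric value as the kernel's CLONE_NEWPID clone flag. */
#define SKETCH_CLONE_NEWPID 0x20000000UL

/*
 * First variant: drop the bit and carry on. The caller gets success
 * but is not told that no new pid namespace was created.
 */
long sketch_fork_masking(unsigned long clone_flags, unsigned long *effective)
{
	*effective = clone_flags & ~SKETCH_CLONE_NEWPID;
	return 0;
}

/*
 * Second variant: refuse the request outright, so the caller can
 * detect that the feature is unavailable. *effective is left untouched
 * on failure.
 */
long sketch_fork_rejecting(unsigned long clone_flags, unsigned long *effective)
{
	if (clone_flags & SKETCH_CLONE_NEWPID)
		return -ENOSYS;
	*effective = clone_flags;
	return 0;
}
```

The masking variant hides the missing isolation from a caller that asked
for it, which is the behaviour the review comment above objects to.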


Re: [patch] PID namespace design bug, workaround

2007-11-01 Thread Pavel Emelyanov
Peter Zijlstra wrote:
> On Thu, 2007-11-01 at 17:51 +0300, Pavel Emelyanov wrote:
>
>> So can we, please,
>> keep things as they are for now - the appropriate fix will be ready
>> soon.
>
> Just for the curious, could you outline on how you intend to fix this?

I have already answered Ulrich about this. Just to
keep this sub-thread consistent:

The fix I mentioned just returns -EINVAL when the user requests
CLONE_NEWPID and compiles out all the namespace cloning code. This
is simply a more elegant way to disable pid namespaces than the one
Ingo proposed.

Here's the root of the set:
http://lkml.org/lkml/2007/10/31/118

Thanks,
Pavel


Re: [patch] PID namespace design bug, workaround

2007-11-01 Thread Ingo Molnar

* Pavel Emelyanov [EMAIL PROTECTED] wrote:

> The fix I mention is just returning -EINVAL in case user orders
> CLONE_NEWPID and compiling out all the namespace cloning code. This
> is just a more elegant way to get rid of pid namespaces rather than
> Ingo proposed.

unfortunately i have to NACK that approach. We never allowed broken 
user-space visible APIs into the kernel like that because it just gives 
a vector for that breakage to become de-facto used and forced upon the 
core kernel. Even if they can be .config turned off. That's just a lame 
excuse that delays the fixing of it. We may mark features that have a 
good expectation to be fixed as CONFIG_EXPERIMENTAL, and we may mark 
drivers that nobody maintains anymore as CONFIG_BROKEN, but we dont 
introduce new core syscall features with CONFIG_BROKEN! We never did and 
i hope we never will.

The _only_ way to force the fixing of such type of breakages is to not 
offer them _at all_. Really, you are proposing a major new extension to 
lots of important core Linux APIs so please try to solve this problem 
cleanly, it's really severe. Right now as things stand this containers 
sub-feature is a little bit pregnant. This is one of the few cases 
where we really _must_ say no.

Ingo


Re: [patch] PID namespace design bug, workaround

2007-11-01 Thread Pavel Emelyanov
Ingo Molnar wrote:
> * Pavel Emelyanov [EMAIL PROTECTED] wrote:
>
>> The fix I mention is just returning -EINVAL in case user orders
>> CLONE_NEWPID and compiling out all the namespace cloning code. This
>> is just a more elegant way to get rid of pid namespaces rather than
>> Ingo proposed.
>
> unfortunately i have to NACK that approach. We never allowed broken
> user-space visible APIs into the kernel like that because it just gives
> a vector for that breakage to become de-facto used and forced upon the
> core kernel. Even if they can be .config turned off. That's just a lame
> excuse that delays the fixing of it. We may mark features that have a
> good expectation to be fixed as CONFIG_EXPERIMENTAL, and we may mark

Pid namespaces have more than a good expectation of being fixed, so
feel free to mark the (currently pending) PID_NS config option with
"depends on EXPERIMENTAL".

All the problems I know of are slowly getting fixed, but most of them
just relate to a wrong pid value being reported to user space when we
work inside a new namespace.

Unfortunately, as far as I can see, all the discussions of pid namespaces
happen behind my back, so all I can do is fix the problems I'm
aware of.

> drivers that nobody maintains anymore as CONFIG_BROKEN, but we dont
> introduce new core syscall features with CONFIG_BROKEN! We never did and
> i hope we never will.
>
> The _only_ way to force the fixing of such type of breakages is to not
> offer them _at all_. Really, you are proposing a major new extension to
> lots of important core Linux APIs so please try to solve this problem
> cleanly, it's really severe. Right now as things stand this containers

I'm sure that no *new* problems appear if you don't enter a new
namespace, so just disable the pid namespace creation code (with the
EXPERIMENTAL option) and live with the original kernel.

If you point one out to me, I'd be glad to fix it.

> sub-feature is a little bit pregnant. This is one of the few cases
> where we really _must_ say no.
>
>	Ingo
>



Re: [patch] PID namespace design bug, workaround

2007-11-01 Thread Dave Hansen
On Thu, 2007-11-01 at 07:56 -0700, Ulrich Drepper wrote:
> Pavel Emelyanov wrote:
> > With this set we'll be able to mark pid namespaces as EXPERIMENTAL
> > or even BROKEN, so nobody will be able to create them. So can we, please,
> > keep things as they are for now - the appropriate fix will be ready
> > soon.
>
> You sound far too optimistic for my taste.  I probably haven't seen the
> proposal you have in mind but everything else I have seen simply doesn't
> work without breaking something.

Yeah, we definitely realize that this inhibits things that were
perfectly fine before.  

As Eric mentioned in his reply to your message last year, the primary
goal here is isolation.  We'd eventually like to be able to pick a
container up and move it to another system.  That's going to be awfully
hard if the container is sharing a resource with a part of the system
which is not moving.

Pid namespaces (along with the others) give us the isolation to keep
these interactions from happening except in a controlled manner,
breaking the ties that might bind it to one particular system.

Think of how many user-visible APIs deal with files and filenames.
However, there can certainly be files that are unavailable to certain
processes based on their membership in a particular filesystem
namespace.  In fact, we use chroot() to try and _make_ certain files
unavailable.

-- Dave



Re: [patch] PID namespace design bug, workaround

2007-11-01 Thread Theodore Tso
On Thu, Nov 01, 2007 at 04:05:37PM +0100, Ingo Molnar wrote:
> +	if (clone_flags & CLONE_NEWPID)
> +		return -ENOSYS;

I'd use EINVAL instead of ENOSYS...

- Ted


Re: [patch] PID namespace design bug, workaround

2007-11-01 Thread Ingo Molnar

* Theodore Tso [EMAIL PROTECTED] wrote:

> On Thu, Nov 01, 2007 at 04:05:37PM +0100, Ingo Molnar wrote:
> > +	if (clone_flags & CLONE_NEWPID)
> > +		return -ENOSYS;
>
> I'd use EINVAL instead of ENOSYS...

ok, updated patch below.

Ingo


From: Ingo Molnar [EMAIL PROTECTED]
Subject: PID namespaces: turn them off for now

while checking recent commits to the kernel core i took a look at the 
PID namespaces implementation, and it has a fatal flaw: it breaks 
futexes and various libraries (and other stuff) that use PIDs as the 
means of identifying tasks, by not providing any means of global 
identification that works across PID namespaces. (PIDs _are_ a very 
convenient and global way of identifying contexts.)

i asked Ulrich about this and it turns out he has warned about this 
early on:

  http://www.nabble.com/Re%3A-question%3A-pid-space-semantics.-p3409990.html

but this problem is still present in the code, and it has been recently 
committed into mainline via:

  commit 30e49c263e36341b60b735cbef5ca37912549264
  Author: Pavel Emelyanov [EMAIL PROTECTED]
  Date:   Thu Oct 18 23:40:10 2007 -0700

  pid namespaces: allow cloning of new namespace

without these problems having been resolved. A full-scale revert is 
probably too intrusive, but at minimum we need to turn off user-space 
access to this feature via this simple patch. Until this issue is 
resolved properly the new PID namespace code needs to be turned off. 
Letting this into 2.6.24 would be a disaster.

Signed-off-by: Ingo Molnar [EMAIL PROTECTED]
---
 kernel/fork.c |    9 +++++++++
 1 file changed, 9 insertions(+)

Index: v/kernel/fork.c
===================================================================
--- v.orig/kernel/fork.c
+++ v/kernel/fork.c
@@ -1420,6 +1420,15 @@ long do_fork(unsigned long clone_flags,
 	int trace = 0;
 	long nr;
 
+	/*
+	 * PID namespaces are broken at the moment: they do not allow
+	 * certain PID based syscalls (such as futexes) to be used
+	 * across namespaces. This is broken and must not be allowed,
+	 * so we keep this feature turned off until it's properly fixed.
+	 */
+	if (clone_flags & CLONE_NEWPID)
+		return -EINVAL;
+
 	if (unlikely(current->ptrace)) {
 		trace = fork_traceflag (clone_flags);
 		if (trace)


Re: [patch] PID namespace design bug, workaround

2007-11-01 Thread Ulrich Drepper
Pavel Emelyanov wrote:
> The fix I mention is just returning -EINVAL in case user orders
> CLONE_NEWPID

That is the fix you were referring to?  I was hoping you have a sketch
for a real solution.  If nobody can think of a way to fix this, PID
namespaces are IMO not something which should go in at all.

- --
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖


Re: [patch] PID namespace design bug, workaround

2007-11-01 Thread Ulrich Drepper
Ingo Molnar wrote:

> but this problem is still present in the code, and it has been recently
> committed into mainline via:
>
>   commit 30e49c263e36341b60b735cbef5ca37912549264
>   Author: Pavel Emelyanov [EMAIL PROTECTED]
>   Date:   Thu Oct 18 23:40:10 2007 -0700
>
>   pid namespaces: allow cloning of new namespace
>
> without these problems having been resolved. A full-scale revert is
> probably too intrusive, but at minimum we need to turn off user-space
> access to this feature via this simple patch. Until this issue is
> resolved properly the new PID namespace code needs to be turned off.
> Letting this into 2.6.24 would be a disaster.
>
> Signed-off-by: Ingo Molnar [EMAIL PROTECTED]

Acked-by: Ulrich Drepper [EMAIL PROTECTED]


- --
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖