Re: How to implement jail-aware SysV IPC (with my nasty patch)

2015-06-15 Thread Mateusz Guzik
On Mon, Jun 15, 2015 at 09:53:53AM +, Bjoern A. Zeeb wrote:
 Hi,
 
 removed hackers, added virtualization.
 
 
  On 12 Jun 2015, at 01:17 , kikuc...@uranus.dti.ne.jp wrote:
  
  Hello,
  
  I’m (still) trying to figure out how jail-aware SysV IPC mechanism should 
  be.
 
 The best way probably is to finally get the “common” VIMAGE framework into 
 HEAD to allow easy virtualisation of other services.  That work has been 
 sitting in perforce for a few years and simply needs updating for sysctls I 
 think.
 
 Then use that to virtualise things and have a vipc like we have vnets.  The 
 good news is that you have identified most places and have the cleanup 
 functions already so it’d be a matter of transforming your changes (assuming 
 they are correct and working fine; haven’t actually read the patch in 
 detail;-)  to the different infrastructure.  And that’s the easiest part.
 
 

I have not looked at vimage too closely; maybe it is indeed the right way to
go. I would definitely be interested in seeing it cleaned up and in
action.

In the meantime, as I tried to explain in the previous thread, a
jail-aware sysvshm poses several questions which need to be
answered/taken care of before it can hit the tree. I doubt any
reasonable implementation can magically avoid the problems they pose, and I
definitely want an analysis of how the proposed implementation behaves
(or how it prevents a given scenario from occurring).

Fundamentally the basic question is how does the implementation cope
with processes having sysvshm mappings obtained from 2 different jails
(provided they use different sysvshms).

Preferably the whole business would be /prevented/. Prevention mechanism
would have to deal with shared address spaces (rfork(2) + RFMEM),
threads and pre-existing mappings.

The patch posted here just puts permission checks in several places
while leaving the namespace shared, which I find to be a user-visible
hack with no good justification. There is also no analysis of how this
behaves when presented with the aforementioned scenario. Even if it turns
out the result is harmless with the resulting code, this leaves us with a
very error-prone scheme.

There is no technical problem adding a pointer to struct prison and
dereferencing it instead of current global vars. Adding proper sysctls
dumping the content for given jail is trivial and so is providing
resource limits when creating a first-level jail with a separate
sysvshm. Something which cannot be as easily achieved with the patch in
question.
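
To make the per-prison pointer idea concrete, here is a minimal userland sketch. It is only an illustration under stated assumptions: `struct shm_ns`, the `pr_shm` field, `prison_shm_init()` and `shm_alloc_segment()` are hypothetical names, and `struct prison` is reduced to a stand-in for the real kernel structure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-jail shm state; stands in for today's global variables. */
struct shm_ns {
	int	nsegs;		/* segments currently allocated */
	int	max_segs;	/* per-jail resource limit */
};

/* Simplified stand-in for the kernel's struct prison. */
struct prison {
	struct shm_ns	*pr_shm;	/* hypothetical new field */
};

/* Give a jail its own namespace with its own limit. */
static int
prison_shm_init(struct prison *pr, int max_segs)
{
	pr->pr_shm = calloc(1, sizeof(*pr->pr_shm));
	if (pr->pr_shm == NULL)
		return (-1);
	pr->pr_shm->max_segs = max_segs;
	return (0);
}

/* Allocation consults the caller's jail rather than a global counter. */
static int
shm_alloc_segment(struct prison *pr)
{
	struct shm_ns *ns = pr->pr_shm;

	if (ns->nsegs >= ns->max_segs)
		return (-1);		/* limit of this jail reached */
	return (ns->nsegs++);		/* id is only unique within the jail */
}
```

With this layout two jails can hand out the same segment id independently, and each one hits its own limit without affecting the other.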

Possible later switch to vimage would be transparent to users.

-- 
Mateusz Guzik mjguzik gmail.com
___
freebsd-virtualization@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-virtualization
To unsubscribe, send any mail to 
freebsd-virtualization-unsubscr...@freebsd.org

Re: How to implement jail-aware SysV IPC (with my nasty patch)

2015-06-15 Thread Bjoern A. Zeeb

 On 15 Jun 2015, at 17:10 , kikuc...@uranus.dti.ne.jp wrote:
 
 On Mon, 15 Jun 2015 09:53:53 +, Bjoern A. Zeeb 
 bzeeb-li...@lists.zabbadoz.net wrote:
 Hi,
 
 removed hackers, added virtualization.
 
 
 On 12 Jun 2015, at 01:17 , kikuc...@uranus.dti.ne.jp wrote:
 
 Hello,
 
 I’m (still) trying to figure out how jail-aware SysV IPC mechanism should 
 be.
 
 The best way probably is to finally get the “common” VIMAGE framework into 
 HEAD to allow easy virtualisation of other services.  That work has been 
 sitting in perforce for a few years and simply needs updating for sysctls I 
 think.
 
 Then use that to virtualise things and have a vipc like we have vnets.  The 
 good news is that you have identified most places and have the cleanup 
 functions already so it’d be a matter of transforming your changes (assuming 
 they are correct and working fine; haven’t actually read the patch in 
 detail;-)  to the different infrastructure.  And that’s the easiest part.
 
 
 Bjoern
 
 Hi Bjoern,
 Thank you for your reply.
 
 The common VIMAGE framework sounds good, I really want it.
 
 I want to know what the IPC system would look like to userland once it is virtualized,
 and what happens if a vnet-like vipc is implemented.
 
 For example, jails 1, 2, 3 join vipc group A, and jails 4, 5, 6 join vipc group 
 B?
 Hmm, it looks good.


That’s not exactly how it works currently, and I think the mixing of options 
will be harder and something we’ll have to figure out more carefully.
You would be able to say jail 1 has a vipc, and jails 2 and 3 are “child jails” 
and inherit it (similarly for 4 + 5, 6), so it’s nested but not side-by-side.
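
A rough sketch of what "nested but not side-by-side" could mean in code, assuming a hypothetical `pr_vipc` field where NULL means "inherit from the parent jail". `struct vipc` and `prison_find_vipc()` are invented names for illustration; only the parent link mirrors how jails are chained today.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical IPC instance, analogous to a vnet. */
struct vipc {
	int	id;
};

/* Simplified stand-in for struct prison. */
struct prison {
	struct prison	*pr_parent;	/* enclosing jail, NULL at the top */
	struct vipc	*pr_vipc;	/* hypothetical; NULL means inherit */
};

/* Walk up the jail tree until a jail with its own vipc is found. */
static struct vipc *
prison_find_vipc(struct prison *pr)
{
	for (; pr != NULL; pr = pr->pr_parent)
		if (pr->pr_vipc != NULL)
			return (pr->pr_vipc);
	return (NULL);
}
```

In this model jails 2 and 3 can only share the vipc of an ancestor such as jail 1; two unrelated sibling jails cannot be grouped into one vipc without changing how jails are managed.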

If we want more of the “mixing” and independence, we’ll have to re-think the 
way we “manage” jails.

Bjoern

Re: How to implement jail-aware SysV IPC (with my nasty patch)

2015-06-15 Thread kikuchan
On Mon, 15 Jun 2015 12:49:16 +0200, Mateusz Guzik mjgu...@gmail.com wrote:
 On Mon, Jun 15, 2015 at 09:53:53AM +, Bjoern A. Zeeb wrote:
 Hi,
 
 removed hackers, added virtualization.
 
 
  On 12 Jun 2015, at 01:17 , kikuc...@uranus.dti.ne.jp wrote:
  
  Hello,
  
  I’m (still) trying to figure out how jail-aware SysV IPC mechanism should 
  be.
 
 The best way probably is to finally get the “common” VIMAGE framework into 
 HEAD to allow easy virtualisation of other services.  That work has been 
 sitting in perforce for a few years and simply needs updating for sysctls I 
 think.
 
 Then use that to virtualise things and have a vipc like we have vnets.  The 
 good news is that you have identified most places and have the cleanup 
 functions already so it’d be a matter of transforming your changes (assuming 
 they are correct and working fine; haven’t actually read the patch in 
 detail;-)  to the different infrastructure.  And that’s the easiest part.
 
 
 
 I have not looked at vimage too closely; maybe it is indeed the right way to
 go. I would definitely be interested in seeing it cleaned up and in
 action.
 
 In the meantime, as I tried to explain in the previous thread, a
 jail-aware sysvshm poses several questions which need to be
 answered/taken care of before it can hit the tree. I doubt any
 reasonable implementation can magically avoid the problems they pose, and I
 definitely want an analysis of how the proposed implementation behaves
 (or how it prevents a given scenario from occurring).
 
 Fundamentally the basic question is how does the implementation cope
 with processes having sysvshm mappings obtained from 2 different jails
 (provided they use different sysvshms).
 
 Preferably the whole business would be /prevented/. Prevention mechanism
 would have to deal with shared address spaces (rfork(2) + RFMEM),
 threads and pre-existing mappings.
 
 The patch posted here just puts permission checks in several places
 while leaving the namespace shared, which I find to be a user-visible
 hack with no good justification. There is also no analysis of how this
 behaves when presented with the aforementioned scenario. Even if it turns
 out the result is harmless with the resulting code, this leaves us with a
 very error-prone scheme.
 
 There is no technical problem adding a pointer to struct prison and
 dereferencing it instead of current global vars. Adding proper sysctls
 dumping the content for given jail is trivial and so is providing
 resource limits when creating a first-level jail with a separate
 sysvshm. Something which cannot be as easily achieved with the patch in
 question.
 
 Possible later switch to vimage would be transparent to users.

Dear Mateusz,

I'm sorry if I'm annoying you, but I really want to solve these problems.

Could you try the latest patch, please?
I justified the user visibility, made it friendly to hierarchical jails, and used EINVAL 
instead of EACCES to avoid leaking information.
https://bz-attachments.freebsd.org/attachment.cgi?id=157661 (typo fixed)
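
To illustrate the EINVAL choice, here is a small userland sketch, not the code from the patch. The types and the names `prison_sees()` and `shm_lookup()` are hypothetical, and the rule that a jail can see objects owned by its descendants is an assumption made for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Simplified stand-in for struct prison. */
struct prison {
	struct prison	*pr_parent;
};

/* Hypothetical segment slot tagged with the jail that created it. */
struct shm_seg {
	int		 in_use;
	struct prison	*owner;
};

/* Assumed rule: a jail sees objects owned by itself or its descendants. */
static int
prison_sees(const struct prison *viewer, const struct prison *owner)
{
	for (; owner != NULL; owner = owner->pr_parent)
		if (owner == viewer)
			return (1);
	return (0);
}

/* Returns 0 if the caller may use shmid, otherwise an errno value. */
static int
shm_lookup(const struct shm_seg *segs, int nsegs, int shmid,
    const struct prison *caller)
{
	if (shmid < 0 || shmid >= nsegs || !segs[shmid].in_use)
		return (EINVAL);	/* no such segment */
	if (!prison_sees(caller, segs[shmid].owner))
		return (EINVAL);	/* not EACCES: hide that it exists */
	return (0);
}
```

Because a foreign segment and a nonexistent one produce the same error, a process in one jail cannot probe which ids are in use in another.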


While trying to port/write real namespace separation, I realized that my 
method is a bit better.
Let me explain (again) why I chose this method for SysV IPC; could you 
tell me how it should be done, please?

struct shmmap_state {
	vm_offset_t va;
	int shmid;
};

In sysv_shm.c, struct shmmap_state exists per process as p->p_vmspace->vm_shm 
and is a lookup table for va -> shm object lookup.
Each shmmap_state entry holds a reference (here, shmid) to the shm object for 
a later detach, and the entries are simply copied on fork.
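
As a simplified userland model of that table (not the kernel code): the helper names and the small fixed `SHMSEG` value are made up for the example, and -1 is assumed to mark a free slot.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define	SHMSEG	4		/* illustrative per-process attach limit */

typedef uintptr_t vm_offset_t;	/* userland stand-in for the kernel type */

struct shmmap_state {
	vm_offset_t va;
	int shmid;		/* -1 marks a free slot (assumed) */
};

/* Allocate an empty per-process table. */
static struct shmmap_state *
shmmap_new(void)
{
	struct shmmap_state *t = malloc(SHMSEG * sizeof(*t));

	if (t == NULL)
		return (NULL);
	for (int i = 0; i < SHMSEG; i++) {
		t[i].va = 0;
		t[i].shmid = -1;
	}
	return (t);
}

/* Attach: remember which shmid is mapped at va. */
static int
shmmap_attach(struct shmmap_state *t, vm_offset_t va, int shmid)
{
	for (int i = 0; i < SHMSEG; i++) {
		if (t[i].shmid == -1) {
			t[i].va = va;
			t[i].shmid = shmid;
			return (0);
		}
	}
	return (-1);		/* table full */
}

/* Detach: translate va back to its shmid and free the slot. */
static int
shmmap_detach(struct shmmap_state *t, vm_offset_t va)
{
	for (int i = 0; i < SHMSEG; i++) {
		if (t[i].shmid != -1 && t[i].va == va) {
			int shmid = t[i].shmid;

			t[i].shmid = -1;
			return (shmid);
		}
	}
	return (-1);		/* nothing attached at va */
}

/* Fork: the child gets a plain copy of the parent's entries. */
static struct shmmap_state *
shmmap_fork(const struct shmmap_state *parent)
{
	struct shmmap_state *child = malloc(SHMSEG * sizeof(*child));

	if (child != NULL)
		memcpy(child, parent, SHMSEG * sizeof(*child));
	return (child);
}
```

The point of the model is that the copied entry carries nothing but the shmid, so a detach in the child can only find the right object if that shmid still identifies it uniquely.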

If you split the namespace (including the shmid space) completely, shmid would no 
longer be a unique identifier for an IPC object in the kernel.
To make it unique, adding a reference to the prison into shmmap_state like this:

struct shmmap_state {
	vm_offset_t va;
	struct prison *prison;
	int shmid;
};

would be a bad idea, because after a