On Mon, 15 Jun 2015 12:49:16 +0200, Mateusz Guzik mjgu...@gmail.com wrote:
On Mon, Jun 15, 2015 at 09:53:53AM +, Bjoern A. Zeeb wrote:
Hi,
removed hackers, added virtualization.
On 12 Jun 2015, at 01:17 , kikuc...@uranus.dti.ne.jp wrote:
Hello,
I’m (still) trying to figure out how jail-aware SysV IPC mechanism should
be.
The best way probably is to finally get the “common” VIMAGE framework into
HEAD to allow easy virtualisation of other services. That work has been
sitting in perforce for a few years and simply needs updating for sysctls I
think.
Then use that to virtualise things and have a vipc like we have vnets. The
good news is that you have identified most places and have the cleanup
functions already so it’d be a matter of transforming your changes (assuming
they are correct and working fine; haven’t actually read the patch in
detail;-) to the different infrastructure. And that’s the easiest part.
I have not looked at vimage too closely, maybe indeed it's the right to
go. Would definitely be interested in seeing it cleaned up and in
action.
In the meantime, as I tried to explain in the previous thread, a
jail-aware sysvshm poses several questions which need to be
answered/taken care of before it can hit the tree. I doubt any
reasonable implementation can magically avoid problems they pose and I
definitely want to get an analysis how proposed implementation behaves
(or how it prevents given scenario from occuring).
Fundamentally the basic question is how does the implementation cope
with processes having sysvshm mappings obtained from 2 different jails
(provided they use different sysvshms).
Preferably the whole business would be /prevented/. Prevention mechanism
would have to deal with shared address spaces (rfork(2) + RFMEM),
threads and pre-existing mappings.
The patch posted here just puts permission checks in several places,
while leaving the namespace shared, which I find to be a user-visible
hack with no good justification. There is also no analysis how this
behaves when presented with aforementioned scenario. Even if it turns
out the resut is harmless with resulting code, this leaves us with a
very error-prone scheme.
There is no technical problem adding a pointer to struct prison and
dereferencing it instead of current global vars. Adding proper sysctls
dumping the content for given jail is trivial and so is providing
resource limits when creating a first-level jail with a separate
sysvshm. Something which cannot be as easily achieved with the patch in
question.
Possible later switch to vimage would be transparent to users.
Dear Mateusz,
I'm sorry if I'm annoying you, but I really want to solve this problems.
Fundamentally the basic question is how does the implementation cope
with processes having sysvshm mappings obtained from 2 different jails
(provided they use different sysvshms).
Preferably the whole business would be /prevented/. Prevention mechanism
would have to deal with shared address spaces (rfork(2) + RFMEM),
threads and pre-existing mappings.
The patch posted here just puts permission checks in several places,
while leaving the namespace shared, which I find to be a user-visible
hack with no good justification. There is also no analysis how this
behaves when presented with aforementioned scenario. Even if it turns
out the resut is harmless with resulting code, this leaves us with a
very error-prone scheme.
There is no technical problem adding a pointer to struct prison and
dereferencing it instead of current global vars. Adding proper sysctls
dumping the content for given jail is trivial and so is providing
resource limits when creating a first-level jail with a separate
sysvshm. Something which cannot be as easily achieved with the patch in
question.
Could you try the latest patch, please?
I justify user-visibility, make it hierarchical jail friendly, and use EINVAL
instead of EACCES to conceal information leak.
https://bz-attachments.freebsd.org/attachment.cgi?id=157661 (typo fixed)
I realized my method is a bit better, when I'm trying to port/write the real
namespace separation.
Let me explain (again) why I choose this method for sysv ipc, and could you
tell me how it should be, please?
struct shmmap_state {
vm_offset_t va;
int shmid;
};
In sysv_shm.c, struct shmmap_state, exist per process as p-p_vmspace-vm_shm,
is a lookup-table for va - shm object lookup.
The shmmap_state entry holds a reference (here, shmid) to shm object for
further detach, and entries are simply copied on fork.
If you split namespace (includes shmid space) completely, shmid would be no
longer a unique identifier for IPC object in kernel.
To make it unique, adding a reference to prison into shmmap_state like this;
struct shmmap_state {
vm_offset_t va;
struct prison *prison;
int shmid;
};
would be bad idea, because after a