Re: [RFC/PATCH 2/5] revoke: core code

2007-07-11 Thread Pekka J Enberg
Hi,

On Wed, 11 Jul 2007, Al Viro wrote:
> The fundamental issue here is that even if you do find struct file,
> you can't blindly rip its ->f_mapping since it can be in the middle
> of ->read(), ->write(), pageout, etc.  And even if you do manage
> that, you still have the ability to do fchmod() later.

Then we would need to change the VFS and relevant parts so that we can 
take down ->f_mapping. I don't see how we could do that without affecting 
current hotpaths. Hmm. I suppose what we really need to do is cannibalize 
the actual inode (remove from inode cache, detach from dentry and take 
down the mapping) so that we don't have to touch existing struct file 
pointers at all.

Pekka
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC/PATCH 2/5] revoke: core code

2007-07-11 Thread Al Viro
On Wed, Jul 11, 2007 at 01:01:07PM +0300, Pekka J Enberg wrote:
> On Wed, 11 Jul 2007, Al Viro wrote:
> > BTW, read() or write() in progress might get rather unhappy if your
> > live replacement of ->f_mapping races with them...
> 
> For writes, we (1) never start any new operations after we've cleaned up 
> the file descriptor tables so (2) after we're done with do_fsync() we 
> never touch ->f_mapping again.

Er, no.  do_fsync() won't hit the sys_write() that is yet to enter
->write().  And you can't get rid of new callers _anyway_ (see
previous mail).
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC/PATCH 2/5] revoke: core code

2007-07-11 Thread Al Viro
On Wed, Jul 11, 2007 at 12:50:48PM +0300, Pekka J Enberg wrote:
> Hi Al,
> 
> On Wed, 11 Jul 2007, Al Viro wrote:
> > Better: I have the only opened descriptor for foo.  I send it to myself
> > as described above.  I close it.  revoke() is called, finds no opened
> > instances of foo in any descriptor tables and cheerfully does nothing.
> > I call recvmsg() and I have completely undamaged opened file back.
> 
> Uhm, nice. So, revoke() needs a proper inode -> struct files mapping 
> somewhere. Can we add a list of files to struct inode? Are there other 
> cases where a file can point to an inode but the file is not attached to 
> any file descriptor?

Umm...  Any number, really - it might be in the middle of syscall
while another task sharing descriptor table has closed the descriptor.
Then there's quota, then there's process accounting, then there's
execve() in progress, then there's knfsd working with that struct
file, etc.

The fundamental issue here is that even if you do find struct file,
you can't blindly rip its ->f_mapping since it can be in the middle
of ->read(), ->write(), pageout, etc.  And even if you do manage
that, you still have the ability to do fchmod() later.

I don't see how the ability to find all instances in SCM_RIGHTS
datagrams (for example) will help you with the race I've described
first.  Original state: task B has the only reference to file.
revoke() is called, passes task A.  B sends datagram to A and closes
file.  A receives datagram.  Now the only reference is in A's table
and you've already passed that.

So you can't avoid processes keeping pointers to struct file.
If you could find all struct file over given inode (which, I suspect,
will lead to interesting locking), you could call something on
that struct file, but you'd have zero exclusion with processes
calling methods on it.
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC/PATCH 2/5] revoke: core code

2007-07-11 Thread Pekka J Enberg
On Wed, 11 Jul 2007, Al Viro wrote:
> BTW, read() or write() in progress might get rather unhappy if your
> live replacement of ->f_mapping races with them...

For writes, we (1) never start any new operations after we've cleaned up 
the file descriptor tables so (2) after we're done with do_fsync() we 
never touch ->f_mapping again.

But for reads, I think there's a problem if we're in 
do_generic_mapping_read() doing invalidate_inode_pages2() is not enough 
because we're hanging on to the real mapping. Hmm.

Pekka
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC/PATCH 2/5] revoke: core code

2007-07-11 Thread Pekka J Enberg
Hi Al,

On Wed, 11 Jul 2007, Al Viro wrote:
> Better: I have the only opened descriptor for foo.  I send it to myself
> as described above.  I close it.  revoke() is called, finds no opened
> instances of foo in any descriptor tables and cheerfully does nothing.
> I call recvmsg() and I have completely undamaged opened file back.

Uhm, nice. So, revoke() needs a proper inode -> struct files mapping 
somewhere. Can we add a list of files to struct inode? Are there other 
cases where a file can point to an inode but the file is not attached to 
any file descriptor?

Pekka
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC/PATCH 2/5] revoke: core code

2007-07-11 Thread Al Viro
On Wed, Jul 11, 2007 at 10:37:33AM +0100, Al Viro wrote:
> On Wed, Jul 11, 2007 at 12:01:06PM +0300, Pekka J Enberg wrote:
> > From: Pekka Enberg <[EMAIL PROTECTED]>
> > 
> > The revokeat(2) and frevoke(2) system calls invalidate open file descriptors
> > and shared mappings of an inode.  After an successful revocation, operations
> > on file descriptors fail with the EBADF or ENXIO error code for regular and
> > device files, respectively.  Attempting to read from or write to a revoked
> > mapping causes SIGBUS.
> > 
> > The actual operation is done in two passes:
> > 
> >  1. Revoke all file descriptors that point to the given inode. We do
> > this under tasklist_lock so that after this pass, we don't need
> > to worry about racing with close(2) or dup(2).
> 
> How does that deal with the following:
> 
> task A gets its descriptor table cleansed
> task B sends a descriptor to task A via SCM_RIGHTS datagram
> task B gets its descriptor table cleansed
> task A receives the datagram and gets the descriptor installed
> task A does any kind of IO on that descriptor
> ->f_mapping gets replaced in the most inconvenient time.
> 
> Come to think of that, what do you do if I create a socketpair,
> stuff the descriptor into SCM_RIGHTS datagram and send it to
> myself?  Then receive it a day after you've called revoke() and
> voila - I've got an almost undamaged struct file back...  At
> the very least, it allows me to fchmod().  Or fchdir() if that
> had been a directory...
> 
> BTW, read() or write() in progress might get rather unhappy if your
> live replacement of ->f_mapping races with them...

Better: I have the only opened descriptor for foo.  I send it to myself
as described above.  I close it.  revoke() is called, finds no opened
instances of foo in any descriptor tables and cheerfully does nothing.
I call recvmsg() and I have completely undamaged opened file back.
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC/PATCH 2/5] revoke: core code

2007-07-11 Thread Al Viro
On Wed, Jul 11, 2007 at 12:01:06PM +0300, Pekka J Enberg wrote:
> From: Pekka Enberg <[EMAIL PROTECTED]>
> 
> The revokeat(2) and frevoke(2) system calls invalidate open file descriptors
> and shared mappings of an inode.  After an successful revocation, operations
> on file descriptors fail with the EBADF or ENXIO error code for regular and
> device files, respectively.  Attempting to read from or write to a revoked
> mapping causes SIGBUS.
> 
> The actual operation is done in two passes:
> 
>  1. Revoke all file descriptors that point to the given inode. We do
> this under tasklist_lock so that after this pass, we don't need
> to worry about racing with close(2) or dup(2).

How does that deal with the following:

task A gets its descriptor table cleansed
task B sends a descriptor to task A via SCM_RIGHTS datagram
task B gets its descriptor table cleansed
task A receives the datagram and gets the descriptor installed
task A does any kind of IO on that descriptor
->f_mapping gets replaced in the most inconvenient time.

Come to think of that, what do you do if I create a socketpair,
stuff the descriptor into SCM_RIGHTS datagram and send it to
myself?  Then receive it a day after you've called revoke() and
voila - I've got an almost undamaged struct file back...  At
the very least, it allows me to fchmod().  Or fchdir() if that
had been a directory...

BTW, read() or write() in progress might get rather unhappy if your
live replacement of ->f_mapping races with them...
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html