Re: [RFC/PATCH 2/5] revoke: core code
Hi, On Wed, 11 Jul 2007, Al Viro wrote: > The fundamental issue here is that even if you do find struct file, > you can't blindly rip its ->f_mapping since it can be in the middle > of ->read(), ->write(), pageout, etc. And even if you do manage > that, you still have the ability to do fchmod() later. Then we would need to change the VFS and relevant parts so that we can take down ->f_mapping. I don't see how we could do that without affecting current hotpaths. Hmm. I suppose what we really need to do is cannibalize the actual inode (remove from inode cache, detach from dentry and take down the mapping) so that we don't have to touch existing struct file pointers at all. Pekka - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC/PATCH 2/5] revoke: core code
On Wed, Jul 11, 2007 at 01:01:07PM +0300, Pekka J Enberg wrote: > On Wed, 11 Jul 2007, Al Viro wrote: > > BTW, read() or write() in progress might get rather unhappy if your > > live replacement of ->f_mapping races with them... > > For writes, we (1) never start any new operations after we've cleaned up > the file descriptor tables so (2) after we're done with do_fsync() we > never touch ->f_mapping again. Er, no. do_fsync() won't hit the sys_write() that is yet to enter ->write(). And you can't get rid of new callers _anyway_ (see previous mail). - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC/PATCH 2/5] revoke: core code
On Wed, Jul 11, 2007 at 12:50:48PM +0300, Pekka J Enberg wrote: > Hi Al, > > On Wed, 11 Jul 2007, Al Viro wrote: > > Better: I have the only opened descriptor for foo. I send it to myself > > as described above. I close it. revoke() is called, finds no opened > > instances of foo in any descriptor tables and cheerfully does nothing. > > I call recvmsg() and I have completely undamaged opened file back. > > Uhm, nice. So, revoke() needs a proper inode -> struct files mapping > somewhere. Can we add a list of files to struct inode? Are there other > cases where a file can point to an inode but the file is not attached to > any file descriptor? Umm... Any number, really - it might be in the middle of syscall while another task sharing descriptor table has closed the descriptor. Then there's quota, then there's process accounting, then there's execve() in progress, then there's knfsd working with that struct file, etc. The fundamental issue here is that even if you do find struct file, you can't blindly rip its ->f_mapping since it can be in the middle of ->read(), ->write(), pageout, etc. And even if you do manage that, you still have the ability to do fchmod() later. I don't see how the ability to find all instances in SCM_RIGHTS datagrams (for example) will help you with the race I've described first. Original state: task B has the only reference to file. revoke() is called, passes task A. B sends datagram to A and closes file. A receives datagram. Now the only reference is in A's table and you've already passed that. So you can't avoid processes keeping pointers to struct file. If you could find all struct file over given inode (which, I suspect, will lead to interesting locking), you could call something on that struct file, but you'd have zero exclusion with processes calling methods on it. - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC/PATCH 2/5] revoke: core code
On Wed, 11 Jul 2007, Al Viro wrote: > BTW, read() or write() in progress might get rather unhappy if your > live replacement of ->f_mapping races with them... For writes, we (1) never start any new operations after we've cleaned up the file descriptor tables so (2) after we're done with do_fsync() we never touch ->f_mapping again. But for reads, I think there's a problem if we're in do_generic_mapping_read() doing invalidate_inode_pages2() is not enough because we're hanging on to the real mapping. Hmm. Pekka - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC/PATCH 2/5] revoke: core code
Hi Al, On Wed, 11 Jul 2007, Al Viro wrote: > Better: I have the only opened descriptor for foo. I send it to myself > as described above. I close it. revoke() is called, finds no opened > instances of foo in any descriptor tables and cheerfully does nothing. > I call recvmsg() and I have completely undamaged opened file back. Uhm, nice. So, revoke() needs a proper inode -> struct files mapping somewhere. Can we add a list of files to struct inode? Are there other cases where a file can point to an inode but the file is not attached to any file descriptor? Pekka - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC/PATCH 2/5] revoke: core code
On Wed, Jul 11, 2007 at 10:37:33AM +0100, Al Viro wrote: > On Wed, Jul 11, 2007 at 12:01:06PM +0300, Pekka J Enberg wrote: > > From: Pekka Enberg <[EMAIL PROTECTED]> > > > > The revokeat(2) and frevoke(2) system calls invalidate open file descriptors > > and shared mappings of an inode. After an successful revocation, operations > > on file descriptors fail with the EBADF or ENXIO error code for regular and > > device files, respectively. Attempting to read from or write to a revoked > > mapping causes SIGBUS. > > > > The actual operation is done in two passes: > > > > 1. Revoke all file descriptors that point to the given inode. We do > > this under tasklist_lock so that after this pass, we don't need > > to worry about racing with close(2) or dup(2). > > How does that deal with the following: > > task A gets its descriptor table cleansed > task B sends a descriptor to task A via SCM_RIGHTS datagram > task B gets its descriptor table cleansed > task A receives the datagram and gets the descriptor installed > task A does any kind of IO on that descriptor > ->f_mapping gets replaced in the most inconvenient time. > > Come to think of that, what do you do if I create a socketpair, > stuff the descriptor into SCM_RIGHTS datagram and send it to > myself? Then receive it a day after you've called revoke() and > voila - I've got an almost undamaged struct file back... At > the very least, it allows me to fchmod(). Or fchdir() if that > had been a directory... > > BTW, read() or write() in progress might get rather unhappy if your > live replacement of ->f_mapping races with them... Better: I have the only opened descriptor for foo. I send it to myself as described above. I close it. revoke() is called, finds no opened instances of foo in any descriptor tables and cheerfully does nothing. I call recvmsg() and I have completely undamaged opened file back. - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC/PATCH 2/5] revoke: core code
On Wed, Jul 11, 2007 at 12:01:06PM +0300, Pekka J Enberg wrote: > From: Pekka Enberg <[EMAIL PROTECTED]> > > The revokeat(2) and frevoke(2) system calls invalidate open file descriptors > and shared mappings of an inode. After an successful revocation, operations > on file descriptors fail with the EBADF or ENXIO error code for regular and > device files, respectively. Attempting to read from or write to a revoked > mapping causes SIGBUS. > > The actual operation is done in two passes: > > 1. Revoke all file descriptors that point to the given inode. We do > this under tasklist_lock so that after this pass, we don't need > to worry about racing with close(2) or dup(2). How does that deal with the following: task A gets its descriptor table cleansed task B sends a descriptor to task A via SCM_RIGHTS datagram task B gets its descriptor table cleansed task A receives the datagram and gets the descriptor installed task A does any kind of IO on that descriptor ->f_mapping gets replaced in the most inconvenient time. Come to think of that, what do you do if I create a socketpair, stuff the descriptor into SCM_RIGHTS datagram and send it to myself? Then receive it a day after you've called revoke() and voila - I've got an almost undamaged struct file back... At the very least, it allows me to fchmod(). Or fchdir() if that had been a directory... BTW, read() or write() in progress might get rather unhappy if your live replacement of ->f_mapping races with them... - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html