On Mon, Dec 8, 2008 at 5:20 PM, Daniel Phillips <[EMAIL PROTECTED]> wrote:
> On Monday 08 December 2008 13:02, Mike Snitzer wrote:
>> 2008/12/8 Daniel Phillips <[EMAIL PROTECTED]>:
>> > This updated patch implements an instantiate variant that takes care of
>> > the orphan dirent problem (unlinked while open) by implementing a
>> > variant of d_instantiate that unhashes the orphan and returns a clone of
>> > the open dirent in the rare case that somebody creates an entry of the
>> > same name before the orphan closes:
>>
>> Not to hijack this thread with a general tux3 design question related
>> to orphaned inodes but:
>>
>> In reviewing http://userweb.kernel.org/~hirofumi/tux3/doc/design.html
>> I saw that forward logging should enable:
>> "logging orphan inodes that are unlinked while open, so they can be
>> deleted on replay after a crash."
>>
>> and
>>
>> "One traditional nasty case that becomes really nice with logical
>> forward logging is truncate of a gigantic file. We just need to commit
>> a logical update like ['resize', inum, 0] then the inode data truncate
>> can proceed as convenient. Another is orphan inode handling where an
>> open file has been completely unlinked, in which case we log the
>> logical change ['free', inum] then proceed with the actual delete when
>> the file is closed or when the log is replayed after a surprise
>> reboot."
>>
>> So putting my distributed filesystem hat on: One unfortunate aspect of
>> ext3 is that orphaned inode processing after a crash blindly deletes
>> all inodes with n_link==0. This is a problem if a remote client
>> application still has the orphaned inode open but the filesystem was
>> unmounted (either forcibly in the case of a Linux crash; or cleanly if
>> write access to the fs was revoked on a given server, e.g. filesystem
>> ownership migrated to another server). It is a problem because the
>> new owning server will re-mount the fs and the conventional orphaned
>> inode processing will cleanup the orphaned inodes out from underneath
>> the remote client application; whereby breaking the application.
>>
>> So my question is, how might tux3 be trained to _not_ cleanup orphaned
>> inodes on re-mount like conventional Linux filesystems? Could a
>> re-mount filter be added that would trap and then somehow reschedule
>> tux3's deferred delete of orphan inodes? This would leave a window of
>> time for an exposed hook to be called (by an upper layer) to
>> reconstitute a reference on each orphaned inode that is still open.
>
> Something like the NFS silly rename problem. There, the client avoids
> closing a file by renaming it instead, which creates a cleanup problem.
> Something more elegant ought to be possible.
>
> If the dirent is gone, leaving an orphaned inode, and the filesystem
> has been convinced not to delete the orphan on restart, how would you
> re-open the file? Open by inode number from within kernel?

Well, in a distributed filesystem the server side may not even have the
notion of open or closed; the client is concerned with such details.
But yes, some mechanism to read the orphaned inode off the disk into
memory. E.g. iget5_locked() (Linux gives you enough rope to defeat
n_link==0) combined with a call to read_inode() (ext3_read_inode()
became ext3_iget()). Unfortunately, reading orphaned inodes with ext3
requires clearing the EXT3_ORPHAN_FS flag in the super_block's
s_mount_state.

It is all quite ugly; and maybe a corner case that tux3 doesn't
need/want to worry about?

Mike
_______________________________________________
Tux3 mailing list
[email protected]
http://mailman.tux3.org/cgi-bin/mailman/listinfo/tux3
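
For illustration only, the "re-mount filter" idea raised in the quoted
question can be sketched in userspace C. This is not tux3 code: the
record layout, replay_orphans(), reap_deferred(), the orphan_filter_fn
hook and MAX_ORPHANS are all hypothetical names invented for this
sketch. It only models the logical ['free', inum] records from the
design notes, and assumes an upper layer can say which orphans are
still held by a remote client.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ORPHANS 16  /* hypothetical fixed capacity for this sketch */

/* Hypothetical logical log records, modelled on ['resize', inum, 0]
 * and ['free', inum] from the quoted design notes. */
enum log_op { LOG_RESIZE, LOG_FREE };

struct log_record {
    enum log_op op;
    uint64_t inum;
    uint64_t arg;   /* new size for LOG_RESIZE, unused for LOG_FREE */
};

/* Hypothetical hook supplied by an upper layer: returns true if the
 * orphan must survive replay because somebody still has it open. */
typedef bool (*orphan_filter_fn)(uint64_t inum, void *ctx);

struct replay_state {
    uint64_t deleted[MAX_ORPHANS];   /* orphans actually deleted */
    size_t ndeleted;
    uint64_t deferred[MAX_ORPHANS];  /* orphans whose delete was rescheduled */
    size_t ndeferred;
};

/* Replay the 'free' records. Orphans the filter claims are deferred,
 * the rest are deleted. Returns 0, or -1 if the sketch runs out of room. */
int replay_orphans(const struct log_record *log, size_t n,
                   orphan_filter_fn filter, void *ctx,
                   struct replay_state *st)
{
    assert(st != NULL && (log != NULL || n == 0));
    st->ndeleted = 0;
    st->ndeferred = 0;
    for (size_t i = 0; i < n; i++) {
        if (log[i].op != LOG_FREE)
            continue;               /* other record types are ignored here */
        if (st->ndeleted + st->ndeferred == MAX_ORPHANS)
            return -1;
        if (filter && filter(log[i].inum, ctx))
            st->deferred[st->ndeferred++] = log[i].inum;
        else
            st->deleted[st->ndeleted++] = log[i].inum;
    }
    return 0;
}

/* After the grace window: delete every deferred orphan that is no
 * longer claimed. Returns the number of orphans reaped. */
size_t reap_deferred(struct replay_state *st,
                     orphan_filter_fn still_held, void *ctx)
{
    size_t kept = 0, reaped = 0;
    for (size_t i = 0; i < st->ndeferred; i++) {
        uint64_t inum = st->deferred[i];
        if (still_held && still_held(inum, ctx)) {
            st->deferred[kept++] = inum;
        } else {
            st->deleted[st->ndeleted++] = inum;
            reaped++;
        }
    }
    st->ndeferred = kept;
    return reaped;
}

/* Example filter: a plain list of inode numbers that remote clients
 * report as still open (again, purely hypothetical). */
struct open_set {
    const uint64_t *inums;
    size_t count;
};

bool held_by_remote_client(uint64_t inum, void *ctx)
{
    const struct open_set *set = ctx;
    for (size_t i = 0; i < set->count; i++)
        if (set->inums[i] == inum)
            return true;
    return false;
}
```

A caller would run replay_orphans() at re-mount with the filter
installed, give the upper layer its window to re-establish references,
and then call reap_deferred() for whatever is left unclaimed. How a
real filesystem would persist the deferred set across a second crash
is left open here.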
