RE: [nfsv4] RE: Finding hardlinks
> From: [EMAIL PROTECTED] on behalf of Nicolas Williams
> Sent: Fri 1/5/2007 18:40
> To: Halevy, Benny
> Cc: Trond Myklebust; Jan Harkes; Miklos Szeredi; nfsv4@ietf.org;
> linux-kernel@vger.kernel.org; Mikulas Patocka; linux-fsdevel@vger.kernel.org;
> Jeff Layton; Arjan van de Ven
> Subject: Re: [nfsv4] RE: Finding hardlinks
>
> On Thu, Jan 04, 2007 at 12:04:14PM +0200, Benny Halevy wrote:
> > I agree that the way the client implements its cache is out of the protocol
> > scope. But how do you interpret "correct behavior" in section 4.2.1?
> > "Clients MUST use filehandle comparisons only to improve performance, not
> > for correct behavior. All clients need to be prepared for situations in
> > which it cannot be determined whether two filehandles denote the same
> > object and in such cases, avoid making invalid assumptions which might
> > cause incorrect behavior."
> > Don't you consider data corruption due to cache inconsistency an incorrect
> > behavior?
>
> If a file with multiple hardlinks appears to have multiple distinct
> filehandles then a client like Trond's will treat it as multiple
> distinct files (with the same hardlink count, and you won't be able to
> find the other links to them -- oh well). Can this cause data
> corruption? Yes, but only if there are applications that rely on the
> different file names referencing the same file, and backup apps on the
> client won't get the hardlinks right either.

Well, this is why the hard links were made, no? FWIW, I believe that
rename of an open file might also produce this problem.

> What I don't understand is why getting the fileid is so hard -- always
> GETATTR when you GETFH and you'll be fine. I'm guessing that's not as
> difficult as it is to maintain a hash table of fileids.

The problem with NFS is that fileid isn't enough, because the client
doesn't know about removes by other clients until it uses the stale
filehandle. Also, quite a few file systems do not keep fileids unique
(which is what triggered this thread).

> Nico
> --

- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [ANNOUNCE] RAIF: Redundant Array of Independent Filesystems
> We are in the process of porting RAIF to 2.6.19 right now. Should be done
> in early January. The trick is that we are trying to keep the same source
> good for a wide range of kernel versions. In fact, not too long ago we
> even were able to compile it for 2.4.24!
>
> Nikolai.

We now have RAIF for the 2.6.19 kernel available at:

ftp://ftp.fsl.cs.sunysb.edu/pub/raif/raif-1.1.tar.gz

This version is more stable, but there are surely some remaining bugs, and
we very much appreciate your feedback. Thank you.

Chaitanya, on behalf of the RAIF team.
Filesystems and Storage Laboratory
Stony Brook University
Re: Finding hardlinks
Miklos Szeredi <[EMAIL PROTECTED]> wrote:
>> > Well, sort of. Samefile without keeping fds open doesn't have any
>> > protection against the tree changing underneath between first
>> > registering a file and later opening it. The inode number is more
>>
>> You only need to keep one-file-per-hardlink-group open during final
>> verification, checking that inode hashing produced reasonable results.
>
> What final verification? I wasn't just talking about 'tar' but all
> cases where st_ino might be used to check the identity of two files at
> possibly different points in time.
>
> Time A: remember identity of file X
> Time B: check if identity of file Y matches that of file X
>
> With samefile() if you open X at A, and keep it open till B, you can
> accumulate large numbers of open files and the application can fail.
>
> If you don't keep an open file, just remember the path, then renaming
> X will foil the later identity check. Changing the file at this path
> between A and B can even give you a false positive. This applies to
> 'tar' as well as the other uses.

If you open Y, this open file descriptor will guarantee that no distinct
file will have the same inode number, while all hardlinked files must have
the same inode number (AFAIK). Now you check this against the list of
hardlink candidates using the stored inode number. If the inode number has
changed, this will result in a false negative. If you removed X, recreated
it with the same inode number and linked that to Y, you'll get a false
positive (which could be identified by the [mc]time changes).

Samefile without keeping the files open will result in the same false
positives as open+fstat+stat, while samefile with keeping the files open
will occasionally overflow the file table. Therefore I think it's not
worthwhile introducing samefile as long as the inode number is unique for
open files.

OTOH you'll want to keep the inode number as stable as possible, since
it's the only sane way to find sets of hardlinked files, and some
important programs may depend on it.
--
I thank GMX for sabotaging the use of my addresses by means of lies
spread via SPF. http://david.woodhou.se/why-not-spf.html
RFC: Stable inodes for inode-less filesystems (was: Finding hardlinks)
Pavel Machek <[EMAIL PROTECTED]> wrote:
>> Another idea is to export the filesystem internal ID as an arbitrary
>> length cookie through the extended attribute interface. That could be
>> stored/compared by the filesystem quite efficiently.
>
> How will that work for FAT?
> Or maybe we can relax that "inode may not change over rename" and
> "zero length files need unique inode numbers"...

I didn't look into the code, and I'm not experienced in writing (Linux)
filesystems, but I have an idea I'd like to share. Maybe it's not that
bad ... (I'm going to talk about inode numbers, since having constant
inodes is desired and the extended attribute would only be an aid if the
inode is too small.)

IIRC, no cluster is reserved for empty files on FAT; if I'm wrong, it'll
be much easier: you would just use the cluster number (less than 32 bits).

The basic idea is to use different inode ranges for non-empty and empty
files. This will result in the inode possibly changing after close()* or
on rename(empty1, empty2). OTOH it will keep a stable inode for non-empty
files and for newly written files** if they aren't stat()ed before writing
the first byte. I'm not sure if it's better than changing inodes after
$randomtime, but I just made a quick strace of gtar, rsync and cp -a; they
don't look at the destination inode before it would change (or at all).
(If this idea is applied to iso9660, the hard problem will be finding the
number of hardlinked files for one location.)

Changing the inode# on the last close* can be done by purging the cache
if the file is empty XOR the file has an inode# from the empty range.
(That should be the same operation as done by unlink()?) A new open(),
stat() or readdir() should create the correct kind of inode#.

*) You can optionally just wait for the inode to expire, but you need to
keep the associated reservation(s) until then. I don't expect any visible
effect from doing this, but see ** in the next paragraph on how to
minimize the effect.
The reserved directory entry (see far below in this text) is 32 bytes,
but the fragmentation may be bad.

**) which are empty on open() and therefore don't yet have a stable
inode#. Those inode numbers will appear to be stable because nobody saw
them change. It's safe to change the inode# because by reserving disk
space, we got a unique inode#. I hope the kernel side allows this ...

For non-empty files, you can use the cluster number (start address): it's
unique, and won't change on rename. It will, however, change on emptying
or writing to an empty file. If you write to an empty file, no special
handling is required; all reservations are implicit*. If you empty a
file, you'll have to keep the first cluster reserved** until it's closed,
otherwise you'd risk an inode collision.

*) Since the inode# doesn't change yet, you'll still have to treat it
like an empty file while unlinking or renaming.

**) It's OK to reuse it if it's in the middle of a file, so you may
optionally keep a list of these clusters and not start files there
instead of reserving the space. OTOH, it's more work.

Empty files will be a PITA with 32-bit inodes, since a full-sized FAT32
can have about 2^38 empty files*. (The extended attribute would work as
described below.) You can, however, generate inode numbers for empty
files, risking collisions. This requires all generated inode numbers to
be above 0x4000 (or above the number of clusters on disk).

*) 8 TB divided by 32 B per directory entry

With 64-bit values, you can generate a unique inode for empty files using

  cluster#-of-dir | 0x8000 | index_in_dir << 32

The downside is, it will change on cross-directory renames and may change
on in-directory renames. If this happens to an open file, you'll need to
make sure the old inode# is not reused by reserving that directory entry,
since the inode# can't change for open files.
Extra operations on final close:

  if "empty" inode:
      if !empty:
          unreserve_directory_entry(inode & 0x7fff, inode >> 32)
          uncache inode (will change inode#)
          stop
      if unreserve_directory_entry(inode & 0x7fff, inode >> 32):
          uncache inode
  if "non-empty" inode:
      if empty:
          free start cluster
          uncache inode

Extra operations on unlink/rename:

  if "empty" inode:
      if can_use_current_inode#_for_dest:
          do it
          unreserve_directory_entry(inode & 0x7fff, inode >> 32)
          // because of "mv a/empty b/empty; mv b/empty a/empty"
      else if is_open_file:
          // the current inode won't fit the new location:
          reserve_directory_entry(old_inode & 0x7fff, inode >> 32)

Extra operations on truncate:

  if "non-empty" inode && empty_after_truncate:
      exempt start cluster from being freed, or put it on a list of
      non-startclusters

Extra operation on extend:

  if "empty" inode && nobody did e.g. stat() after opening this file:
      silently change inode, nobody will notice. Racy? Possible?

Required data in filehandle: location of directory entry (d.e. contains
inode information) (this shouldn't be new
Re: Is a NULL check missing in nfs_lookup?
In message <[EMAIL PROTECTED]>, Matthew Wilcox writes:
> On Fri, Jan 05, 2007 at 02:10:06PM -0500, Erez Zadok wrote:
> > Ah, ok. So why not put an ASSERT in there, or at least a comment, to
> > make the code clearer. As it stands, anyone looking at the code in
> > the future can easily rediscover this "bug" that dereferences a null
> > ptr.
>
> Because anyone poking in the VFS should take the time to understand
> how it works? Adding crap like BUG_ON(!nd) is pointless -- you don't
> get a clearer backtrace from that than you do from a null pointer.

Then stick a /* comment */ in there telling anyone who looks at the code
what the author's intentions were, esp. for such an obvious
null/uninitialized pointer deref. Source code should be as clear as
possible, regardless of the expertise of its author or any reader. If you
keep this code as is, the bug will get re-discovered and re-reported time
and again, wasting everyone's time.

Erez.
Re: Is a NULL check missing in nfs_lookup?
On Fri, Jan 05, 2007 at 12:23:04PM -0700, Matthew Wilcox wrote:
> On Fri, Jan 05, 2007 at 02:10:06PM -0500, Erez Zadok wrote:
> > Ah, ok. So why not put an ASSERT in there, or at least a comment, to
> > make the code clearer. As it stands, anyone looking at the code in
> > the future can easily rediscover this "bug" that dereferences a null
> > ptr.
>
> Because anyone poking in the VFS should take the time to understand
> how it works? Adding crap like BUG_ON(!nd) is pointless -- you don't
> get a clearer backtrace from that than you do from a null pointer.

Except that a BUG_ON makes the author's intent clear and a NULL pointer
deref doesn't. Sure, anyone poking around in the VFS should understand
how it works. That doesn't mean whoever poked there before can't leave
trail marks.

Cheers,
Muli
Re: Is a NULL check missing in nfs_lookup?
On Fri, Jan 05, 2007 at 02:10:06PM -0500, Erez Zadok wrote:
> Ah, ok. So why not put an ASSERT in there, or at least a comment, to
> make the code clearer. As it stands, anyone looking at the code in the
> future can easily rediscover this "bug" that dereferences a null ptr.

Because anyone poking in the VFS should take the time to understand how
it works? Adding crap like BUG_ON(!nd) is pointless -- you don't get a
clearer backtrace from that than you do from a null pointer.
Re: Is a NULL check missing in nfs_lookup?
In message <[EMAIL PROTECTED]>, Trond Myklebust writes:
> On Thu, 2007-01-04 at 19:00 -0500, Chaitanya Patti wrote:
> >
> > Hello everyone,
> >
> > In the function nfs_lookup in nfs/dir.c, the following line (line #
> > 926):
> >
> > error = nfs_reval_fsid(nd->mnt, dir, &fhandle, &fattr);
> >
> > uses `nd' without having checked if it is NULL.
> >
> > Is this correct?
>
> It is quite intentional and correct. Calling ->lookup() without correct
> intent information is a bug.

Ah, ok. So why not put an ASSERT in there, or at least a comment, to make
the code clearer? As it stands, anyone looking at the code in the future
can easily rediscover this "bug" that dereferences a null ptr.

Thanks,
Erez.
RE: [nfsv4] RE: Finding hardlinks
For now, I'm not going to address the controversial issues here, mainly
because I haven't decided how I feel about them yet: whether allowing
multiple filehandles per object is a good or even reasonably acceptable
idea, and what the fact that RFC3530 talks about it implies about what
clients should do about the issue.

One thing that I hope is not controversial is that the v4.1 spec should
either get rid of this or make it clear and implementable. I expect
plenty of controversy about which of those to choose, but hope that there
isn't any about the proposition that we have to choose one of those two.

> SECINFO information is, for instance, given
> out on a per-filehandle basis, does that mean that the server will have
> different security policies?

Well yes, RFC3530 does say "The new SECINFO operation will allow the
client to determine, on a per filehandle basis", but I think that just
has to be considered an error rather than indicating that if you have two
different filehandles for the same object, they can have different
security policies. SECINFO in RFC3530 takes a directory fh and a name, so
if there are multiple filehandles for the object with that name, there is
no way for SECINFO to associate different policies with different
filehandles. All it has is the name to go by. I think this should be
corrected to "on a per-object basis" in the new spec no matter what we do
on other issues.

I think the principle here has to be that if we do allow multiple fh's to
map to the same object, we require that they designate the same object,
and thus it is not allowed for the server to act as if you have multiple
different objects with different characteristics.

Similarly as to:

> In some places, people haven't even started
> to think about the consequences:
>
> If GETATTR directed to the two filehandles does not return the
> fileid attribute for both of the handles, then it cannot be
> determined whether the two objects are the same. Therefore,
> operations which depend on that knowledge (e.g., client side data
> caching) cannot be done reliably.

I think they (and maybe "they" includes me, I haven't checked the history
here) started to think about them, but went in a bad direction. The
implication here that you can have a different set of attributes
supported for the same object based on which filehandle is used to access
the attributes is totally bogus. The definition of supp_attr says "The
bit vector which would retrieve all mandatory and recommended attributes
that are supported for this object. The scope of this attribute applies
to all objects with a matching fsid." So having the same object have
different attributes supported based on the filehandle used, or even two
objects in the same fs having different attributes supported -- in
particular having fileid supported for one and not the other -- just
isn't valid.

> The fact is that RFC3530 contains masses of rope with which
> to allow server and client vendors to hang themselves.

If that means simply making poor choices, then OK. But if there are other
cases where you feel that the specification of a feature is simply
incoherent and the consequences not really thought out, then I think we
need to discuss them and not propagate that state of affairs to v4.1.

-----Original Message-----
From: Trond Myklebust [mailto:[EMAIL PROTECTED]]
Sent: Friday, January 05, 2007 5:29 AM
To: Benny Halevy
Cc: Jan Harkes; Miklos Szeredi; nfsv4@ietf.org;
linux-kernel@vger.kernel.org; Mikulas Patocka;
linux-fsdevel@vger.kernel.org; Jeff Layton; Arjan van de Ven
Subject: Re: [nfsv4] RE: Finding hardlinks

On Fri, 2007-01-05 at 10:28 +0200, Benny Halevy wrote:
> Trond Myklebust wrote:
> > Exactly where do you see us violating the close-to-open cache
> > consistency guarantees?
>
> I haven't seen that. What I did see is cache inconsistency when opening
> the same file with different file descriptors when the filehandle
> changes.
> My testing shows that at least fsync and close fail with EIO when the
> filehandle changed while there was dirty data in the cache and that's
> good. Still, not sharing the cache while the file is opened (even on
> different file descriptors by the same process) seems impractical.

Tough. I'm not going to commit to adding support for multiple
filehandles. The fact is that RFC3530 contains masses of rope with which
to allow server and client vendors to hang themselves. The fact that the
protocol claims support for servers that use multiple filehandles per
inode does not mean it is necessarily a good idea. It adds unnecessary
code complexity, it screws with server scalability (extra GETATTR calls
just in order to probe existing filehandles), and it is insufficiently
well documented in the RFC: SECINFO information is, for instance, given
out on a per-filehandle basis, does that mean that the server will have
different security policies? In some places, people haven't even started
to think
Re: Finding hardlinks
On Fri, Jan 05, 2007 at 09:43:22AM +0100, Miklos Szeredi wrote:
> > > > > High probability is all you have. Cosmic radiation hitting your
> > > > > computer will more likely cause problems than colliding 64bit
> > > > > inode numbers ;)
> > > >
> > > > Some of us have machines designed to cope with cosmic rays, and
> > > > would be unimpressed with a decrease in reliability.
> > >
> > > With the suggested samefile() interface you'd get a failure with
> > > just about 100% reliability for any application which needs to
> > > compare more than a few files. The fact is open files are _very_
> > > expensive, no wonder they are limited in various ways.
> > >
> > > What should 'tar' do when it runs out of open files, while searching
> > > for hardlinks? Should it just give up? Then the samefile()
> > > interface would be _less_ reliable than the st_ino one by a
> > > significant margin.
> >
> > You need at most two simultaneously open files for examining any
> > number of hardlinks. So yes, you can make it reliable.
>
> Well, sort of. Samefile without keeping fds open doesn't have any
> protection against the tree changing underneath between first
> registering a file and later opening it. The inode number is more
> useful in this respect. In fact inode number + generation number will
> give you a unique identifier in time as well, which is a _lot_ more
> useful to determine if the file you are checking is actually the same
> as one that you've come across previously.

Samefile with keeping fds open doesn't buy you much anyway. What exactly
would be the value of a directory tree seen by operating only on fds
(even for directories) when some rogue process is renaming, moving,
updating stuff underneath? One ends up with a tree which misses a lot of
files and hardly bears any resemblance to the actual tree at any point in
time, and I'm not even talking about file data. It is futile to try to
get a consistent tree view on a live filesystem, with or without using
fds. It just doesn't work without fundamental support for some kind of
"freezing" or time-travel inside the kernel. Snapshots at the block
device level are problematic too.

> So instead of samefile() I'd still suggest an extended attribute
> interface which exports the file's unique (in space and time)
> identifier as an opaque cookie.

But then you're just _shifting_ the problem instead of fixing it:
st_ino/st_mtime (st_ctime?) are designed for this purpose. If the
filesystem doesn't support it properly: live with the consequences, which
are mostly minor. Notable exceptions are of course backup tools, but
backups _must_ be verified anyway, so you'll discover it soon. (BTW,
that's what I noticed after restoring a system from a CD (iso9660 with
RR): all hardlinks were gone.)

--
Frank
Re: Is a NULL check missing in nfs_lookup?
On Fri, Jan 05, 2007 at 12:11:17PM -0500, Shaya Potter wrote:
> ok, I guess what I don't understand is when are
>
> dentry->d_sb->s_root and nd->mnt->mnt_root not equivalent.
>
> I guess it would be "when crossing a mountpoint on the server", so I
> look at nfs_follow_mountpoint() and basically see that you create a new
> mountpoint and a new nd for that. And since superblock objects are only
> per "real" file system, not per mount point, they will be different.
>
> I guess the question is, why shouldn't a dentry object know what
> vfsmount it belongs to? Can it belong to multiple vfsmount objects?

Yes. A dentry belongs to a superblock. A vfsmount refers to a subtree of
some superblock's dentry tree. There can be any number of vfsmounts
referring to subtrees of the same fs. Some might refer to the entire
tree, some to arbitrary subtrees of it (possibly as small as a single
file). There can be any number of vfsmounts referring to any given
subtree.

Think what happens when you create a binding. Or when you clone an entire
namespace. (Pieces of) the same filesystem might be mounted in a lot of
places. A vfsmount describes a part of the unified tree delimited by
mountpoints; if the same fs (or its parts) is seen under several
mountpoints, you get vfsmounts that refer to the same fs instance, i.e.
the same superblock and dentry tree.
Re: Is a NULL check missing in nfs_lookup?
On Fri, Jan 05, 2007 at 12:11:17PM -0500, Shaya Potter wrote:
> I guess the question is, why shouldn't a dentry object know what
> vfsmount it belongs to? Can it belong to multiple vfsmount objects?
> (Again, probably showing my ignorance here).

If you have the same tree mounted in two places, they'll have different
vfsmounts.
Re: [nfsv4] RE: Finding hardlinks
On Thu, Jan 04, 2007 at 12:04:14PM +0200, Benny Halevy wrote:
> I agree that the way the client implements its cache is out of the
> protocol scope. But how do you interpret "correct behavior" in section
> 4.2.1?
> "Clients MUST use filehandle comparisons only to improve performance,
> not for correct behavior. All clients need to be prepared for
> situations in which it cannot be determined whether two filehandles
> denote the same object and in such cases, avoid making invalid
> assumptions which might cause incorrect behavior."
> Don't you consider data corruption due to cache inconsistency an
> incorrect behavior?

If a file with multiple hardlinks appears to have multiple distinct
filehandles then a client like Trond's will treat it as multiple distinct
files (with the same hardlink count, and you won't be able to find the
other links to them -- oh well). Can this cause data corruption? Yes, but
only if there are applications that rely on the different file names
referencing the same file, and backup apps on the client won't get the
hardlinks right either.

What I don't understand is why getting the fileid is so hard -- always
GETATTR when you GETFH and you'll be fine. I'm guessing that's not as
difficult as it is to maintain a hash table of fileids.

Nico
--
Re: Is a NULL check missing in nfs_lookup?
On Fri, 2007-01-05 at 17:01 +0100, Trond Myklebust wrote:
> On Fri, 2007-01-05 at 10:00 -0500, Shaya Potter wrote:
> > I'd agree with you (and even told the person the problem up front)
> > except it's not oopsing on a lack of intent information, it's oopsing
> > because nd is null and therefore cannot access nd->mnt.
> >
> > i.e. Let's say I couldn't reconstruct nd perfectly (due to not
> > knowing the vfsmnt information), I could possibly construct a fake nd
> > with the proper intent information (i.e. very likely no intent
> > information to be passed) and it would still oops.
>
> No. The nameidata structure is _mandatory_, as is the intent info. In
> order to ensure POSIX correctness, the NFS client sometimes needs to
> create submounts (when crossing a mountpoint on the server). If you
> don't provide a proper nameidata structure, then that is impossible.

ok, I guess what I don't understand is when are dentry->d_sb->s_root and
nd->mnt->mnt_root not equivalent.

I guess it would be "when crossing a mountpoint on the server", so I
looked at nfs_follow_mountpoint() and basically see that you create a new
mountpoint and a new nd for that. And since superblock objects are only
per "real" file system, not per mount point, they will be different.

I guess the question is, why shouldn't a dentry object know what vfsmount
it belongs to? Can it belong to multiple vfsmount objects? (Again,
probably showing my ignorance here.)

Basically, the questions I'm trying to figure out are: a) why do intents
and nd have to be so tightly wound together? b) why can't a dentry object
know about its vfsmount?
Re: [nfsv4] RE: Finding hardlinks
On Fri, 2007-01-05 at 10:40 -0600, Nicolas Williams wrote:
> What I don't understand is why getting the fileid is so hard -- always
> GETATTR when you GETFH and you'll be fine. I'm guessing that's not as
> difficult as it is to maintain a hash table of fileids.

You've been sleeping in class. We always try to get the fileid together
with the GETFH. The irritating bit is having to redo a GETATTR using the
old filehandle in order to figure out if the 2 filehandles refer to the
same file. Unlike filehandles, fileids can be reused. Then there is the
point of dealing with the fact that servers can (and do!) actually lie to
you.

Trond
Re: Is a NULL check missing in nfs_lookup?
On Fri, 2007-01-05 at 10:00 -0500, Shaya Potter wrote:
> I'd agree with you (and even told the person the problem up front)
> except it's not oopsing on a lack of intent information, it's oopsing
> because nd is null and therefore cannot access nd->mnt.
>
> i.e. Let's say I couldn't reconstruct nd perfectly (due to not knowing
> the vfsmnt information), I could possibly construct a fake nd with the
> proper intent information (i.e. very likely no intent information to be
> passed) and it would still oops.

No. The nameidata structure is _mandatory_, as is the intent info. In
order to ensure POSIX correctness, the NFS client sometimes needs to
create submounts (when crossing a mountpoint on the server). If you don't
provide a proper nameidata structure, then that is impossible.

Trond
Re: Finding hardlinks
> > And does it matter? If you rename a file, tar might skip it
> > regardless of hardlink detection (if readdir races with rename, you
> > can read none of the names of a file, one, or both --- all these are
> > possible).
> >
> > If you have "dir1/a" hardlinked to "dir1/b" and while tar runs you
> > delete both "a" and "b" and create totally new files "dir2/c" linked
> > to "dir2/d", tar might hardlink both "c" and "d" to "a" and "b".
> >
> > No one guarantees you a sane result of tar or cp -a while changing
> > the tree. I don't see how is_samefile() could make it worse.
>
> There are several cases where changing the tree doesn't affect the
> correctness of the tar or cp -a result. In some of these cases using
> samefile() instead of st_ino _will_ result in a corrupted result.

Also note that using st_ino in combination with samefile() doesn't make
the result much better; it eliminates false positives, but cannot fix
false negatives.

Miklos
Re: [UPDATED PATCH] fix memory corruption from misinterpreted bad_inode_ops return values
[EMAIL PROTECTED] wrote:
> Hmm, the problem with this is it bloats bad_inode.o with lots of empty
> functions that return -EIO. Even though we're not interested in the
> parameters, GCC doesn't know this, and doesn't fold the functions into
> only the couple of definitions that return different types.

After posting this, I saw the discussion on the LKML. I was only about 12
hours late, as usual...

Phillip
Re: Finding hardlinks
> And does it matter? If you rename a file, tar might skip it regardless
> of hardlink detection (if readdir races with rename, you can read none
> of the names of a file, one, or both --- all these are possible).
>
> If you have "dir1/a" hardlinked to "dir1/b" and while tar runs you
> delete both "a" and "b" and create totally new files "dir2/c" linked to
> "dir2/d", tar might hardlink both "c" and "d" to "a" and "b".
>
> No one guarantees you a sane result of tar or cp -a while changing the
> tree. I don't see how is_samefile() could make it worse.

There are several cases where changing the tree doesn't affect the
correctness of the tar or cp -a result. In some of these cases using
samefile() instead of st_ino _will_ result in a corrupted result.

Generally samefile() is _weaker_ than the st_ino interface in comparing
the identity of two files without using massive amounts of memory. You're
searching for a better solution, not one that is broken in a different
way, aren't you?

Miklos
Re: Is a NULL check missing in nfs_lookup?
On Fri, 2007-01-05 at 13:22 +0100, Trond Myklebust wrote:
> On Thu, 2007-01-04 at 19:00 -0500, Chaitanya Patti wrote:
> >
> > Hello everyone,
> >
> > In the function nfs_lookup in nfs/dir.c, the following line (line # 926):
> >
> > error = nfs_reval_fsid(nd->mnt, dir, &fhandle, &fattr);
> >
> > uses `nd' without having checked if it is NULL.
> >
> > Is this correct?
>
> It is quite intentional and correct. Calling ->lookup() without correct
> intent information is a bug.

I'd agree with you (and I even told the person about the problem up front), except it's not oopsing on a lack of intent information; it's oopsing because nd is NULL and therefore cannot access nd->mnt. I.e. even if I couldn't reconstruct nd perfectly (due to not knowing the vfsmnt information), I could possibly construct a fake nd with the proper intent information (i.e. very likely no intent information to be passed) and it would still oops.

So my question is: is it incorrect to change nfs_reval_fsid() from

  static inline int nfs_reval_fsid(struct vfsmount *mnt...)

which calls __nfs_revalidate_inode(server, mnt->mnt_root->d_inode); and is called as

  error = nfs_reval_fsid(nd->mnt...)

by nfs_lookup(), to

  static inline int nfs_reval_fsid(struct dentry *dentry...)

which calls __nfs_revalidate_inode(server, dentry->d_inode); and is called as

  error = nfs_reval_fsid(dentry->d_sb->s_root...)

by nfs_lookup()?

Now, it could be me missing the boat here; I wouldn't be surprised.

Thanks.
Re: [UPDATED PATCH] fix memory corruption from misinterpreted bad_inode_ops return values
Eric Sandeen wrote:
> but Al felt that it was probably better to create an EIO-returner for
> each actual op signature. Since so few ops share a signature, I just
> went ahead & created an EIO function for each individual file & inode
> op that returns a value.

Hmm, the problem with this is it bloats bad_inode.o with lots of empty functions that return -EIO. Even though we're not interested in the parameters, GCC doesn't know this, and doesn't fold the functions into only the couple of definitions that return different types.

Text size of the original bad_inode.o:

  Idx Name   Size  VMA  LMA  File off  Algn
    0 .text  006c            0034      2**2

== 108 bytes.

Size with the patch applied:

  Idx Name   Size  VMA  LMA  File off  Algn
    0 .text  016b            0034      2**2

== 363 bytes, or over three times larger!

> I originally had coded up the fix by creating a return_EIO_ macro
> for each return type,

This adds two extra functions (returners for ssize_t and long), which gives an increase in size of only 12 bytes:

  Idx Name   Size  VMA  LMA  File off  Algn
    0 .text  0078            0034      2**2

== 120 bytes. Isn't this better?

Thanks
Phillip
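The per-return-type macro approach Phillip describes can be illustrated in plain userspace C. This is a sketch of the technique only — one stub generated per distinct return type — and the function names below are invented for illustration, not the actual kernel symbols:

```c
#include <errno.h>
#include <sys/types.h>

/* Generate one -EIO-returning stub per return type, instead of one
 * per operation.  Each expansion is a single tiny function body, so
 * the object file carries only as many bodies as there are distinct
 * return types in the op tables. */
#define EIO_RETURNER(type, name) \
    type name(void) { return -EIO; }

EIO_RETURNER(int,     bad_inode_ret_int)
EIO_RETURNER(long,    bad_inode_ret_long)
EIO_RETURNER(ssize_t, bad_inode_ret_ssize_t)
```

In the kernel patch being discussed, macros like this are what keep bad_inode.o small: only the couple of distinct return types need real function bodies, rather than one stub per file/inode operation.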
Re: Finding hardlinks
> > > Well, sort of. Samefile without keeping fds open doesn't have any
> > > protection against the tree changing underneath between first
> > > registering a file and later opening it. The inode number is more
> >
> > You only need to keep one-file-per-hardlink-group open during final
> > verification, checking that inode hashing produced reasonable results.
>
> What final verification? I wasn't just talking about 'tar' but all
> cases where st_ino might be used to check the identity of two files at
> possibly different points in time.
>
> Time A: remember identity of file X
> Time B: check if identity of file Y matches that of file X
>
> With samefile() if you open X at A, and keep it open till B, you can
> accumulate large numbers of open files and the application can fail.
> If you don't keep an open file, just remember the path, then renaming
> X will foil the later identity check. Changing the file at this path
> between A and B can even give you a false positive. This applies to
> 'tar' as well as the other uses.

And does it matter? If you rename a file, tar might skip it regardless of hardlink detection (if readdir races with rename, you can read none of the names of the file, one, or both --- all of these are possible).

If you have "dir1/a" hardlinked to "dir1/b" and while tar runs you delete both "a" and "b" and create totally new files "dir2/c" linked to "dir2/d", tar might hardlink both "c" and "d" to "a" and "b".

No one guarantees you a sane result of tar or cp -a while changing the tree. I don't see how is_samefile() could make it worse.

Mikulas
Re: Finding hardlinks
> > Well, sort of. Samefile without keeping fds open doesn't have any
> > protection against the tree changing underneath between first
> > registering a file and later opening it. The inode number is more
>
> You only need to keep one-file-per-hardlink-group open during final
> verification, checking that inode hashing produced reasonable results.

What final verification? I wasn't just talking about 'tar' but all cases where st_ino might be used to check the identity of two files at possibly different points in time.

Time A: remember identity of file X
Time B: check if identity of file Y matches that of file X

With samefile() if you open X at A, and keep it open till B, you can accumulate large numbers of open files and the application can fail. If you don't keep an open file, just remember the path, then renaming X will foil the later identity check. Changing the file at this path between A and B can even give you a false positive. This applies to 'tar' as well as the other uses.

Miklos
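Since no samefile() syscall exists, what userspace can actually do today with two *open* descriptors is fstat() both and compare (st_dev, st_ino) — which also demonstrates the "keep the fds open to pin identity" cost the thread is worried about. A sketch, with illustrative /tmp file names:

```c
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/stat.h>

/* Poor man's samefile() on open descriptors: same device and same
 * inode number means the same file, and the open fds keep the inodes
 * pinned for the duration of the comparison. */
int fd_samefile(int fd1, int fd2)
{
    struct stat a, b;

    if (fstat(fd1, &a) != 0 || fstat(fd2, &b) != 0)
        return -1;
    return a.st_dev == b.st_dev && a.st_ino == b.st_ino;
}

/* Demo: two links to one inode still compare equal after a rename,
 * because the identity is held by the fds, not the paths. */
int samefile_demo(void)
{
    int fd1, fd2, same;

    if (system("cd /tmp && : > sf_a && ln -f sf_a sf_b") != 0)
        return -1;
    fd1 = open("/tmp/sf_a", O_RDONLY);
    fd2 = open("/tmp/sf_b", O_RDONLY);
    if (fd1 < 0 || fd2 < 0)
        return -1;
    /* Rename one path away: the open fds still pin the same inode. */
    rename("/tmp/sf_a", "/tmp/sf_c");
    same = fd_samefile(fd1, fd2);
    close(fd1);
    close(fd2);
    return same;
}
```

This makes Miklos's objection concrete: to get this protection at Time B for a file registered at Time A, an fd must stay open the whole time, one per tracked file.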
Re: Finding hardlinks
Hi!

> > > > Some of us have machines designed to cope with cosmic rays, and
> > > > would be unimpressed with a decrease in reliability.
> > >
> > > With the suggested samefile() interface you'd get a failure with
> > > just about 100% reliability for any application which needs to
> > > compare more than a few files. The fact is open files are _very_
> > > expensive, no wonder they are limited in various ways.
> > >
> > > What should 'tar' do when it runs out of open files, while searching
> > > for hardlinks? Should it just give up? Then the samefile() interface
> > > would be _less_ reliable than the st_ino one by a significant margin.
> >
> > You need at most two simultaneously open files for examining any
> > number of hardlinks. So yes, you can make it reliable.
>
> Well, sort of. Samefile without keeping fds open doesn't have any
> protection against the tree changing underneath between first
> registering a file and later opening it. The inode number is more

You only need to keep one-file-per-hardlink-group open during final verification, checking that inode hashing produced reasonable results.

Pavel
--
Thanks for all the (sleeping) penguins.
Re: Is a NULL check missing in nfs_lookup?
On Thu, 2007-01-04 at 19:00 -0500, Chaitanya Patti wrote:
>
> Hello everyone,
>
> In the function nfs_lookup in nfs/dir.c, the following line (line # 926):
>
> error = nfs_reval_fsid(nd->mnt, dir, &fhandle, &fattr);
>
> uses `nd' without having checked if it is NULL.
>
> Is this correct?

It is quite intentional and correct. Calling ->lookup() without correct intent information is a bug.

Trond
Re: [nfsv4] RE: Finding hardlinks
On Fri, 2007-01-05 at 10:28 +0200, Benny Halevy wrote:
> Trond Myklebust wrote:
> > Exactly where do you see us violating the close-to-open cache
> > consistency guarantees?
>
> I haven't seen that. What I did see is cache inconsistency when the same
> file is opened with different file descriptors and the filehandle
> changes. My testing shows that at least fsync and close fail with EIO
> when the filehandle changed while there was dirty data in the cache, and
> that's good. Still, not sharing the cache while the file is open (even
> on different file descriptors by the same process) seems impractical.

Tough. I'm not going to commit to adding support for multiple filehandles.

The fact is that RFC 3530 contains masses of rope with which to allow server and client vendors to hang themselves. The fact that the protocol claims support for servers that use multiple filehandles per inode does not mean it is necessarily a good idea. It adds unnecessary code complexity, it screws with server scalability (extra GETATTR calls just in order to probe existing filehandles), and it is insufficiently well documented in the RFC: SECINFO information is, for instance, given out on a per-filehandle basis; does that mean that the server will have different security policies? In some places, people haven't even started to think about the consequences:

   If GETATTR directed to the two filehandles does not return the
   fileid attribute for both of the handles, then it cannot be
   determined whether the two objects are the same. Therefore,
   operations which depend on that knowledge (e.g., client side data
   caching) cannot be done reliably.

This implies the combination is legal, but offers no indication as to how you would match OPEN/CLOSE requests via different paths. AFAICS you would have to do non-cached I/O with no share modes (i.e. NFSv3-style "special" stateids). There is no way in hell we will ever support non-cached I/O in NFS other than the special case of O_DIRECT.
...and no, I'm certainly not interested in "fixing" the RFC on this point in any way other than getting this crap dropped from the spec. I see no use for it at all.

Trond
Re: [PATCH] GFS2: Fix incorrect return code from gfs2_lookupi
Hi,

On Thu, 2007-01-04 at 21:32 -0600, Russell Cattelan wrote:
> This fixes a bug found by the fsfuzzer tool.
> http://projects.info-pull.com/mokb/MOKB-15-11-2006.html
>
> A NULL was not an acceptable error condition expected
> by any of the gfs2_lookupi callers.

Now applied to the GFS2 git tree. Thanks,

Steve.
Re: Finding hardlinks
> > > > High probability is all you have. Cosmic radiation hitting your
> > > > computer will more likely cause problems than colliding 64bit
> > > > inode numbers ;)
> > >
> > > Some of us have machines designed to cope with cosmic rays, and
> > > would be unimpressed with a decrease in reliability.
> >
> > With the suggested samefile() interface you'd get a failure with just
> > about 100% reliability for any application which needs to compare
> > more than a few files. The fact is open files are _very_ expensive,
> > no wonder they are limited in various ways.
> >
> > What should 'tar' do when it runs out of open files, while searching
> > for hardlinks? Should it just give up? Then the samefile() interface
> > would be _less_ reliable than the st_ino one by a significant margin.
>
> You need at most two simultaneously open files for examining any
> number of hardlinks. So yes, you can make it reliable.

Well, sort of. Samefile without keeping fds open doesn't have any protection against the tree changing underneath between first registering a file and later opening it. The inode number is more useful in this respect. In fact inode number + generation number will give you a unique identifier in time as well, which is a _lot_ more useful for determining whether the file you are checking is actually the same as one that you've come across previously.

So instead of samefile() I'd still suggest an extended attribute interface which exports the file's unique (in space and time) identifier as an opaque cookie.

For filesystems like FAT you can basically only guarantee that two files are the same as long as those files are in the icache, no matter if you use samefile() or inode numbers. Userspace _can_ make the inodes stay in the cache by keeping the files open, which works for samefile as well as checking by inode number.
Miklos
Re: [PATCHSET 1][PATCH 0/6] Filesystem AIO read/write
On Fri, Jan 05 2007, Suparna Bhattacharya wrote: > On Fri, Jan 05, 2007 at 08:02:33AM +0100, Jens Axboe wrote: > > On Fri, Jan 05 2007, Suparna Bhattacharya wrote: > > > On Thu, Jan 04, 2007 at 09:02:42AM -0800, Andrew Morton wrote: > > > > On Thu, 4 Jan 2007 10:26:21 +0530 > > > > Suparna Bhattacharya <[EMAIL PROTECTED]> wrote: > > > > > > > > > On Wed, Jan 03, 2007 at 02:15:56PM -0800, Andrew Morton wrote: > > > > > > On Thu, 28 Dec 2006 13:53:08 +0530 > > > > > > Suparna Bhattacharya <[EMAIL PROTECTED]> wrote: > > > > > > > > > > > > > This patchset implements changes to make filesystem AIO read > > > > > > > and write asynchronous for the non O_DIRECT case. > > > > > > > > > > > > Unfortunately the unplugging changes in Jen's block tree have > > > > > > trashed these > > > > > > patches to a degree that I'm not confident in my repair attempts. > > > > > > So I'll > > > > > > drop the fasio patches from -mm. > > > > > > > > > > I took a quick look and the conflicts seem pretty minor to me, the > > > > > unplugging > > > > > changes mostly touch nearby code. > > > > > > > > Well... the conflicts (both mechanical and conceptual) are such that a > > > > round of retesting is needed. > > > > > > > > > Please let know how you want this fixed up. > > > > > > > > > > >From what I can tell the comments in the unplug patches seem to say > > > > > >that > > > > > it needs more work and testing, so perhaps a separate fixup patch may > > > > > be > > > > > a better idea rather than make the fsaio patchset dependent on this. > > > > > > > > Patches against next -mm would be appreciated, please. Sorry about > > > > that. > > > > > > > > I _assume_ Jens is targetting 2.6.21? > > > > > > When is the next -mm likely to be out ? > > > > > > I was considering regenerating the blk unplug patches against the > > > fsaio changes instead of the other way around, if Jens were willing to > > > accept that. But if the next -mm is just around the corner then its > > > not an issue. 
> > I don't really care much, but I work against mainline and anything but
> > occasional one-off generations of a patch against a different base is
> > not very likely.
> >
> > The -mm order should just reflect the merge order of the patches, what
> > is the fsaio target?
>
> 2.6.21 was what I had in mind, to enable the glibc folks to proceed with
> conversion to native AIO.
>
> Regenerating my patches against the unplug stuff is not a problem, I
> only worry about being queued up behind something that may take longer
> to stabilize and is likely to change ... If that is not the case, I
> don't mind.

Same here, hence the suggestion to base them in merging order. If your target is 2.6.21, then I think fsaio should be first. While I think the plug changes are safe and as such mergeable, we still need to see lots of results and do more testing.

--
Jens Axboe
Re: [nfsv4] RE: Finding hardlinks
Trond Myklebust wrote:
> On Thu, 2007-01-04 at 12:04 +0200, Benny Halevy wrote:
>> I agree that the way the client implements its cache is out of the
>> protocol scope. But how do you interpret "correct behavior" in section
>> 4.2.1? "Clients MUST use filehandle comparisons only to improve
>> performance, not for correct behavior. All clients need to be prepared
>> for situations in which it cannot be determined whether two filehandles
>> denote the same object and in such cases, avoid making invalid
>> assumptions which might cause incorrect behavior."
>> Don't you consider data corruption due to cache inconsistency an
>> incorrect behavior?
>
> Exactly where do you see us violating the close-to-open cache
> consistency guarantees?

I haven't seen that. What I did see is cache inconsistency when the same file is opened with different file descriptors and the filehandle changes. My testing shows that at least fsync and close fail with EIO when the filehandle changed while there was dirty data in the cache, and that's good. Still, not sharing the cache while the file is open (even on different file descriptors by the same process) seems impractical.

Benny
Re: [PATCHSET 1][PATCH 0/6] Filesystem AIO read/write
On Fri, Jan 05, 2007 at 08:02:33AM +0100, Jens Axboe wrote: > On Fri, Jan 05 2007, Suparna Bhattacharya wrote: > > On Thu, Jan 04, 2007 at 09:02:42AM -0800, Andrew Morton wrote: > > > On Thu, 4 Jan 2007 10:26:21 +0530 > > > Suparna Bhattacharya <[EMAIL PROTECTED]> wrote: > > > > > > > On Wed, Jan 03, 2007 at 02:15:56PM -0800, Andrew Morton wrote: > > > > > On Thu, 28 Dec 2006 13:53:08 +0530 > > > > > Suparna Bhattacharya <[EMAIL PROTECTED]> wrote: > > > > > > > > > > > This patchset implements changes to make filesystem AIO read > > > > > > and write asynchronous for the non O_DIRECT case. > > > > > > > > > > Unfortunately the unplugging changes in Jen's block tree have trashed > > > > > these > > > > > patches to a degree that I'm not confident in my repair attempts. So > > > > > I'll > > > > > drop the fasio patches from -mm. > > > > > > > > I took a quick look and the conflicts seem pretty minor to me, the > > > > unplugging > > > > changes mostly touch nearby code. > > > > > > Well... the conflicts (both mechanical and conceptual) are such that a > > > round of retesting is needed. > > > > > > > Please let know how you want this fixed up. > > > > > > > > >From what I can tell the comments in the unplug patches seem to say > > > > >that > > > > it needs more work and testing, so perhaps a separate fixup patch may be > > > > a better idea rather than make the fsaio patchset dependent on this. > > > > > > Patches against next -mm would be appreciated, please. Sorry about that. > > > > > > I _assume_ Jens is targetting 2.6.21? > > > > When is the next -mm likely to be out ? > > > > I was considering regenerating the blk unplug patches against the > > fsaio changes instead of the other way around, if Jens were willing to > > accept that. But if the next -mm is just around the corner then its > > not an issue. 
> > I don't really care much, but I work against mainline and anything but > occasional one-off generations of a patch against a different base is > not very likely. > > The -mm order should just reflect the merge order of the patches, what > is the fsaio target? 2.6.21 was what I had in mind, to enable the glibc folks to proceed with conversion to native AIO. Regenerating my patches against the unplug stuff is not a problem, I only worry about being queued up behind something that may take longer to stabilize and is likely to change ... If that is not the case, I don't mind. Regards Suparna > > -- > Jens Axboe > > -- > To unsubscribe, send a message with 'unsubscribe linux-aio' in > the body to [EMAIL PROTECTED] For more info on Linux AIO, > see: http://www.kvack.org/aio/ > Don't email: mailto:"[EMAIL PROTECTED]">[EMAIL PROTECTED] -- Suparna Bhattacharya ([EMAIL PROTECTED]) Linux Technology Center IBM Software Lab, India - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html