Re: [PATCH 26/35] Unionfs: Privileged operations workqueue
On Dec 7 2006 21:17, Josef Sipek wrote: +void __unionfs_mknod(void *data) +{ + struct sioq_args *args = data; + struct mknod_args *m = args-mknod; ... | vfs_mknod(m-parent, m-dentry, m-mode, m-dev); If I make the *args = data line const, then gcc (4.1) yells about modifying a const variable 3 lines down.. args-err = vfs_mknod(m-parent, m-dentry, m-mode, m-dev); Sure, I could cast, but that seems like adding cruft for no good reason. No I despise casts more than missing consts. Why would gcc throw a warning? Let's take this super simple program No, this program doesn't tickle the problem.. Try to compile this one: The members of m (i.e. m-*) are not modified as for as __unionfs_mknod goes. vfs_mknod may only modify the members of m-parent (i.e. m-parent-*) struct mknod_args { int mode; int dev; }; void __mknod(const void *data) { const struct mknod_args *args = data; args-mode = 0; } int main(void) { const struct mknod_args *m; __mknod(m); return 0; } $ gcc -Wall -c test.c test.c: In function âmknodâtest.c:10: error: assignment of read-only location Josef Jeff Sipek. -- Reality is merely an illusion, albeit a very persistent one. - Albert Einstein -`J' --
Re: [PATCH 15/35] Unionfs: Common file operations
On Dec 7 2006 23:16, Josef Sipek wrote: I think there was an ioctl for files to find out where a particular file lives on disk. That's the UNIONFS_IOCTL_QUERYFILE case. No I meant something that works on all filesystems, something generic, not unionfs-based. -`J' -- - To unsubscribe from this list: send the line unsubscribe linux-fsdevel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 35/35] Unionfs: Extended Attributes support
On Dec 8 2006 00:35, Josef Sipek wrote: --- a/fs/unionfs/copyup.c +++ b/fs/unionfs/copyup.c @@ -18,6 +18,75 @@ #include union.h +#ifdef CONFIG_UNION_FS_XATTR ^^ this, do you?. Beware, copyup.c gets compiled all the time even when you don't have xattrs enabled. Oops, I thought this was xattr.c. Sorry for the noise. -`J' -- - To unsubscribe from this list: send the line unsubscribe linux-fsdevel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [NFS] [PATCH 10/10] gfs2: nfs lock support for gfs2
On Thu, Dec 07, 2006 at 09:30:43AM -0600, David Teigland wrote: Some posix locks would be trivial to cancel and others would be hard. If gfs_controld has not yet read the op from the kernel's send_list, then we just remove the op and it never goes out. After gfs_controld has taken it and sent it, then it's had its effect and, as you reminded me, is unreversible without introducing new complexity (like the provisional locks which sound unpleasant). Something like that may be necessary in the end anyway. The background: NFSv4 doesn't have lock grant callbacks. It only has a single lock call. You can set a blocking bit in that call, but all this does is tell the server I intend continue polling for this lock, so if it becomes available, could you hold it a little while to give me a chance at it before anyone else? That's just a hint, not a requirement. But if we don't take the hint, locking between clients with different polling strategies, or between remote clients and local users, may be dramatically unfair. The client also isn't required to poll for the lock again--it may stop without notice at any time. So we'd like to have a way to hold a lock on behalf of a client, without losing the ability to cancel the lock if the client goes away. Hence the provisional locks. For now, maybe we could settle for a solution that at least handles exclusive whole-file locking (for which an unlock is an adequate cancel), and leaves open the possibility of a more complete solution later. --b. - To unsubscribe from this list: send the line unsubscribe linux-fsdevel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/3] ensure unique i_ino in filesystems without permanent inode numbers (libfs superblock cleanup)
On Fri, Dec 08, 2006 at 08:08:03AM -0500, Jeff Layton wrote: Josef Sipek wrote: - ret = simple_fill_super(sb, IPATHFS_MAGIC, files); + ret = simple_fill_super(sb, IPATHFS_MAGIC, files, 1); I don't know...the magic looking 1 and 0 (later in the patch) seem a bit arbitrary. Maybe a #define is in order? Yeah, I'm not fond of that, though the comments on simple_fill_super should explain it. Basically, I need simple_fill_super to operate in two different modes, and I was using the extra flag to key this. I'm not clear on what sort of #define would make sense here. Can you suggest something? First I was thinking about defining 2 constats, but maybe the better thing to do would be to 1) rename simple_fill_super to __simple_fill_super 2) #define simple_fill_super_foo(...) to __siple_fill_super(., 0) 3) #define simple_fill_super_bar(...) to __siple_fill_super(., 1) (Or equivalent thing using inline functions.) I can't really think of any good name for #define'd flag. Beware, I'm pretty much just thinking out loud.. :) @@ -399,7 +407,10 @@ int simple_fill_super(struct super_block inode-i_blocks = 0; inode-i_atime = inode-i_mtime = inode-i_ctime = CURRENT_TIME; I'd indent CURRENT_TIME a bit. I wasn't planning on touching those parts of the code that don't need to be changed, since formatting deltas can make it harder to see the actual changes in the patch. That should probably be addressed in a follow-on patch if you think it needs to be changed. Oh, sorry, that wasn't your code. You're right about it not being the the right thing to fix in your patch. Actually, it looks like another problem created by line-wrapping. Josef Jeff Sipek. -- NT is to UNIX what a doughnut is to a particle accelerator. - To unsubscribe from this list: send the line unsubscribe linux-fsdevel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 26/35] Unionfs: Privileged operations workqueue
On Dec 8 2006 11:00, Josef Sipek wrote: +void __unionfs_mkdir(void *data) +{ + struct sioq_args *args = data; + struct mkdir_args *m = args-mkdir; + + args-err = vfs_mkdir(m-parent, m-dentry, m-mode); + complete(args-comp); +} The members of m (i.e. m-*) are not modified as for as __unionfs_mknod goes. vfs_mknod may only modify the members of m-parent (i.e. m-parent-*) Yes they are. The return value is passed through a member of m. Where - where do you see that m-parent, m-dentry or m-mode are modified? (The original submission is above.) -`J' -- - To unsubscribe from this list: send the line unsubscribe linux-fsdevel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Status of buffered write path (deadlock fixes)
On Fri, Dec 08, 2006 at 02:28:10PM +1100, Nick Piggin wrote: In generic_file_buffered_write() we now do: status = a_ops-commit_write(file, page, offset,offset+copied); Which tells the file system to commit only the amount of data that filemap_copy_from_user() was able to pull in, despite our zeroing of the newly allocated buffers in the copied != bytes case. Shouldn't we be doing: status = a_ops-commit_write(file, page, offset,offset+bytes); instead, thus preserving ordered writeout (for ext3, ocfs2, etc) for those parts of the page which are properly allocated and zero'd but haven't been copied into yet? I think that in the case of a crash just after the transaction is closed in -commit_write(), we might lose those guarantees, exposing stale data on disk. No, because we might be talking about buffers that haven't been newly allocated, but are not uptodate (so the pagecache can contain junk). We can't zero these guys and do the commit_write, because that exposes transient zeroes to a concurrent reader. Ahh ok - zeroing would populate with incorrect data and we can't write those buffers because we'd write junk over good data. And of course, the way it works right now will break ordered write mode. Sigh. What we *could* do, is to do the full length commit_write for uptodate pages, even if the copy is short. But we still need to do a zero-length commit_write if the page is not uptodate (this would reduce the number of new cases that need to be considered). Does a short commit_write cause a problem for filesystems? They can still do any and all operations they would have with a full-length one. If they've done allocation, yes. You're telling the file system to stop early in the page, even though there may be BH_New buffers further on which should be processed (for things like ordered data mode). Hmm, I think we should just just change functions like walk_page_buffers() in fs/ext3/inode.c and fs/ocfs2/aops.c to look for BH_New buffers outside the range specified (they walk the entire buffer list anyway). If it finds one that's buffer_new() it passes it to the journal unconditionally. You'd also have to revert the change you did in fs/ext3/inode.c to at least always make the call to walk_page_buffers(). I really don't like that we're hiding a detail of this interaction so deep within the file system commit_write() callback. I suppose we can just do our best to document it. But maybe it would be better to eliminate that case. OK. How about a zero-length commit_write? In that case again, they should be able to remain unchanged *except* that they are not to extend i_size or mark the page uptodate. If we make the change I described above (looking for BH_New buffers outside the range passed), then zero length or partial shouldn't matter, but zero length instead of partial would be nicer imho just for the sake of reducing the total number of cases down to the entire range or zero length. For some reason, I'm not seeing where BH_New is being cleared in case with no errors or faults. Hopefully I'm wrong, but if I'm not I believe we need to clear the flag somewhere (perhaps in block_commit_write()?). Hmm, it is a bit inconsistent. It seems to be anywhere from prepare_write to block_write_full_page. Where do filesystems need the bit? It would be nice to clear it in commit_write if possible. Worst case we'll need a new bit. -commit_write() would probably do fine. Currently, block_prepare_write() uses it to know which buffers were newly allocated (the file system specific get_block_t sets the bit after allocation). I think we could safely move the clearing of that bit to block_commit_write(), thus still allowing us to detect and zero those blocks in generic_file_buffered_write() Thanks, --Mark -- Mark Fasheh Senior Software Developer, Oracle [EMAIL PROTECTED] - To unsubscribe from this list: send the line unsubscribe linux-fsdevel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC][PATCH] Secure Deletion and Trash-Bin Support for Ext4
They are defined but unused in 2.6.19, right? I can't see anywhere in the 2.6.19 ext2/3/4/reiser trees that actually those flags, including setting and retrieving them from disk. JFS i can see sets, clears and retreives them, but not the fielsystems you mention. Though I might just be blind. ;) Well, at least Ext2/3/4 and JFS do store these attributes and retrieve them from the disk. However, not a single one of these file systems supports the related functionality (secure deletion or a trash-bin). Clearly, this is confusing for the users. If all we need to add to XFS is support for those flags, then XFS support would be trivial to add. Oh, damn. I take that back. We're almost out of flag space in the on disk inode - these two flags would use the last 2 flag bits so this may require an on disk inode format change in XFS. This will be a little more complex than I first thought, but not impossible as we already support two on-disk inode format versions. If you do not see any obvious use for these two bits in the near future and ready to sacrifice them (or at least one of them) for secure deletion and trash-bin attributes we would be more than happy to add the actual functionality to XFS as well. I'd certainly like to see new functionality like this added at the VFS rather than in any particular filesystem. Don't know about anyone else, though I guess, the right thing to do would be to move the common trash-bin (tb.c and tb.h) code into the /fs and /include/linux directories. And call them trashbin.[ch] ;) In fact, we have already done this :-) Also, VFS would require just a couple of trivial changes to support something like '-o trashbin' mount-time option for all file systems. In addition, individual file systems may support per-file attributes for this (e.g., ext2/3/4). Sounds like a good idea to me. Thank you! So we will submit the corresponding patches soon. We will start from the trashbin.c and patches for the individual file systems. A little later we will submit a VFS patch to add support for the '-o trashbin' mount option. Nikolai Joukov -- Filesystems and Storage Laboratory Stony Brook University - To unsubscribe from this list: send the line unsubscribe linux-fsdevel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [NFS] [PATCH 1/10] lockd: add new export operation for nfsv4/lockd locking
J. Bruce Fields wrote: From: Marc Eshel [EMAIL PROTECTED] There is currently a filesystem -lock() method, but it is defined only by a few filesystems that are not exported via nfsd. So none of the lock routines that are used by lockd or nfsv4 bother to call those methods. Filesystems such as cluster filesystems would like to do their own locking and also would like to be exportable via NFS. So we add a new lock() export operation, and new routines vfs_lock_file, vfs_test_lock, and vfs_cancel_lock, which call the new export operation, falling back on the appropriate local operation if the export operation is unavailable. These new functions are intended to be used by lockd and nfsd; lockd and nfsd changes to take advantage of them are made by later patches. Starting to read these patches .. I see these new APIs are placed in lockd directory. Was there any discussion before to make these generic vfs layer functions ? I'm just curious. -- Wendy - To unsubscribe from this list: send the line unsubscribe linux-fsdevel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html