Re: [PATCH 26/35] Unionfs: Privileged operations workqueue

2006-12-08 Thread Jan Engelhardt

On Dec 7 2006 21:17, Josef Sipek wrote:
   +void __unionfs_mknod(void *data)
   +{
   +  struct sioq_args *args = data;
   +  struct mknod_args *m = args-mknod;

...
| vfs_mknod(m-parent, m-dentry, m-mode, m-dev);

 If I make the *args = data line const, then gcc (4.1) yells about modifying
 a const variable 3 lines down..
 
 args-err = vfs_mknod(m-parent, m-dentry, m-mode, m-dev);
 
 Sure, I could cast, but that seems like adding cruft for no good reason.
 
 No I despise casts more than missing consts. Why would gcc throw a warning?
 Let's take this super simple program

No, this program doesn't tickle the problem.. Try to compile this one:

The members of m (i.e. m-*) are not modified as for as __unionfs_mknod goes.
vfs_mknod may only modify the members of m-parent (i.e. m-parent-*)



struct mknod_args {
   int mode;
   int dev;
};

void  __mknod(const void *data)
{
   const struct mknod_args *args = data;
   args-mode = 0;
}

int main(void) {
   const struct mknod_args *m;
   __mknod(m);
   return 0;
}


$ gcc -Wall -c test.c
test.c: In function âmknodâtest.c:10: error: assignment of read-only location


Josef Jeff Sipek.

-- 
Reality is merely an illusion, albeit a very persistent one.
   - Albert Einstein


-`J'
-- 

Re: [PATCH 15/35] Unionfs: Common file operations

2006-12-08 Thread Jan Engelhardt

On Dec 7 2006 23:16, Josef Sipek wrote:
 I think there was an ioctl for files to find out where a particular
 file lives on disk.

That's the UNIONFS_IOCTL_QUERYFILE case.

No I meant something that works on all filesystems, something generic, not
unionfs-based.


-`J'
-- 
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 35/35] Unionfs: Extended Attributes support

2006-12-08 Thread Jan Engelhardt


On Dec 8 2006 00:35, Josef Sipek wrote:
 
 --- a/fs/unionfs/copyup.c
 +++ b/fs/unionfs/copyup.c
 @@ -18,6 +18,75 @@
  
  #include union.h
  
 +#ifdef CONFIG_UNION_FS_XATTR
 
 ^^ this, do you?.
 
Beware, copyup.c gets compiled all the time even when you don't have xattrs
enabled.

Oops, I thought this was xattr.c. Sorry for the noise.

-`J'
-- 
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [NFS] [PATCH 10/10] gfs2: nfs lock support for gfs2

2006-12-08 Thread J. Bruce Fields
On Thu, Dec 07, 2006 at 09:30:43AM -0600, David Teigland wrote:
 Some posix locks would be trivial to cancel and others would be hard.  If
 gfs_controld has not yet read the op from the kernel's send_list, then we
 just remove the op and it never goes out.  After gfs_controld has taken
 it and sent it, then it's had its effect and, as you reminded me, is
 unreversible without introducing new complexity (like the provisional
 locks which sound unpleasant).

Something like that may be necessary in the end anyway.  The background:

NFSv4 doesn't have lock grant callbacks.  It only has a single lock
call.  You can set a blocking bit in that call, but all this does is
tell the server I intend continue polling for this lock, so if it
becomes available, could you hold it a little while to give me a chance
at it before anyone else?  That's just a hint, not a requirement.  But
if we don't take the hint, locking between clients with different
polling strategies, or between remote clients and local users, may be
dramatically unfair.

The client also isn't required to poll for the lock again--it may stop
without notice at any time.  So we'd like to have a way to hold a lock
on behalf of a client, without losing the ability to cancel the lock if
the client goes away.  Hence the provisional locks.

For now, maybe we could settle for a solution that at least handles
exclusive whole-file locking (for which an unlock is an adequate
cancel), and leaves open the possibility of a more complete solution
later.

--b.
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/3] ensure unique i_ino in filesystems without permanent inode numbers (libfs superblock cleanup)

2006-12-08 Thread Josef Sipek
On Fri, Dec 08, 2006 at 08:08:03AM -0500, Jeff Layton wrote:
 Josef Sipek wrote:
  -  ret = simple_fill_super(sb, IPATHFS_MAGIC, files);
  +  ret = simple_fill_super(sb, IPATHFS_MAGIC, files, 1);
 
  I don't know...the magic looking 1 and 0 (later in the patch) seem a bit
  arbitrary. Maybe a #define is in order?
 
 Yeah, I'm not fond of that, though the comments on simple_fill_super should
 explain it. Basically, I need simple_fill_super to operate in two different
 modes, and I was using the extra flag to key this. I'm not clear on what
 sort of #define would make sense here. Can you suggest something?
 
First I was thinking about defining 2 constats, but maybe the better thing
to do would be to

1) rename simple_fill_super to __simple_fill_super
2) #define simple_fill_super_foo(...) to __siple_fill_super(., 0)
3) #define simple_fill_super_bar(...) to __siple_fill_super(., 1)

(Or equivalent thing using inline functions.)
 
I can't really think of any good name for #define'd flag.

Beware, I'm pretty much just thinking out loud.. :)
 
  @@ -399,7 +407,10 @@ int simple_fill_super(struct super_block
 inode-i_blocks = 0;
 inode-i_atime = inode-i_mtime = inode-i_ctime =
 CURRENT_TIME;
 
  I'd indent CURRENT_TIME a bit.
 
 I wasn't planning on touching those parts of the code that don't need to be
 changed, since formatting deltas can make it harder to see the actual
 changes in the patch. That should probably be addressed in a follow-on
 patch if you think it needs to be changed.

Oh, sorry, that wasn't your code. You're right about it not being the the
right thing to fix in your patch.

Actually, it looks like another problem created by line-wrapping.

Josef Jeff Sipek.

-- 
NT is to UNIX what a doughnut is to a particle accelerator.
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 26/35] Unionfs: Privileged operations workqueue

2006-12-08 Thread Jan Engelhardt

On Dec 8 2006 11:00, Josef Sipek wrote:

+void __unionfs_mkdir(void *data)
+{
+   struct sioq_args *args = data;
+   struct mkdir_args *m = args-mkdir;
+
+   args-err = vfs_mkdir(m-parent, m-dentry, m-mode);
+   complete(args-comp);
+}

 The members of m (i.e. m-*) are not modified as for as __unionfs_mknod goes.
 vfs_mknod may only modify the members of m-parent (i.e. m-parent-*)
 
Yes they are. The return value is passed through a member of m.

Where - where do you see that m-parent, m-dentry or m-mode are modified?
(The original submission is above.)


-`J'
-- 
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Status of buffered write path (deadlock fixes)

2006-12-08 Thread Mark Fasheh
On Fri, Dec 08, 2006 at 02:28:10PM +1100, Nick Piggin wrote:
 In generic_file_buffered_write() we now do:
 
  status = a_ops-commit_write(file, page, offset,offset+copied);
 
 Which tells the file system to commit only the amount of data that
 filemap_copy_from_user() was able to pull in, despite our zeroing of 
 the newly allocated buffers in the copied != bytes case. Shouldn't we be
 doing:
 
 status = a_ops-commit_write(file, page, offset,offset+bytes);
 
 instead, thus preserving ordered writeout (for ext3, ocfs2, etc) for those
 parts of the page which are properly allocated and zero'd but haven't been
 copied into yet? I think that in the case of a crash just after the
 transaction is closed in -commit_write(), we might lose those guarantees,
 exposing stale data on disk.
 
 No, because we might be talking about buffers that haven't been newly
 allocated, but are not uptodate (so the pagecache can contain junk).
 
 We can't zero these guys and do the commit_write, because that exposes
 transient zeroes to a concurrent reader.

Ahh ok - zeroing would populate with incorrect data and we can't write those
buffers because we'd write junk over good data.

And of course, the way it works right now will break ordered write mode.

Sigh.

 
 What we *could* do, is to do the full length commit_write for uptodate
 pages, even if the copy is short. But we still need to do a zero-length
 commit_write if the page is not uptodate (this would reduce the number
 of new cases that need to be considered).

 Does a short commit_write cause a problem for filesystems? They can
 still do any and all operations they would have with a full-length one.

If they've done allocation, yes. You're telling the file system to stop
early in the page, even though there may be BH_New buffers further on which
should be processed (for things like ordered data mode).

Hmm, I think we should just just change functions like walk_page_buffers()
in fs/ext3/inode.c and fs/ocfs2/aops.c to look for BH_New buffers outside
the range specified (they walk the entire buffer list anyway). If it finds
one that's buffer_new() it passes it to the journal unconditionally. You'd
also have to revert the change you did in fs/ext3/inode.c to at least always
make the call to walk_page_buffers().

I really don't like that we're hiding a detail of this interaction so deep
within the file system commit_write() callback. I suppose we can just do our
best to document it.


 But maybe it would be better to eliminate that case. OK.
 How about a zero-length commit_write? In that case again, they should
 be able to remain unchanged *except* that they are not to extend i_size
 or mark the page uptodate.

If we make the change I described above (looking for BH_New buffers outside
the range passed), then zero length or partial shouldn't matter, but zero
length instead of partial would be nicer imho just for the sake of reducing
the total number of cases down to the entire range or zero length.


 For some reason, I'm not seeing where BH_New is being cleared in case with
 no errors or faults. Hopefully I'm wrong, but if I'm not I believe we need
 to clear the flag somewhere (perhaps in block_commit_write()?).
 
 Hmm, it is a bit inconsistent. It seems to be anywhere from prepare_write
 to block_write_full_page.
 
 Where do filesystems need the bit? It would be nice to clear it in
 commit_write if possible. Worst case we'll need a new bit.

-commit_write() would probably do fine. Currently, block_prepare_write()
uses it to know which buffers were newly allocated (the file system specific
get_block_t sets the bit after allocation). I think we could safely move
the clearing of that bit to block_commit_write(), thus still allowing us to
detect and zero those blocks in generic_file_buffered_write()

Thanks,
--Mark

--
Mark Fasheh
Senior Software Developer, Oracle
[EMAIL PROTECTED]
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC][PATCH] Secure Deletion and Trash-Bin Support for Ext4

2006-12-08 Thread Nikolai Joukov
 They are defined but unused in 2.6.19, right? I can't see anywhere
 in the 2.6.19 ext2/3/4/reiser trees that actually those flags,
 including setting and retrieving them from disk. JFS i can see
 sets, clears and retreives them, but not the fielsystems you
 mention. Though I might just be blind. ;)

Well, at least Ext2/3/4 and JFS do store these attributes and retrieve
them from the disk.  However, not a single one of these file systems
supports the related functionality (secure deletion or a trash-bin).
Clearly, this is confusing for the users.

 If all we need to add to XFS is support for those flags, then XFS
 support would be trivial to add.

 Oh, damn. I take that back. We're almost out of flag space in the on
 disk inode - these two flags would use the last 2 flag bits so this
 may require an on disk inode format change in XFS. This will be
 a little more complex than I first thought, but not impossible
 as we already support two on-disk inode format versions.

If you do not see any obvious use for these two bits in the near future
and ready to sacrifice them (or at least one of them) for secure deletion
and trash-bin attributes we would be more than happy to add the actual
functionality to XFS as well.

 I'd certainly like to see new functionality like this added at the
 VFS rather than in any particular filesystem. Don't know about
 anyone else, though

  I guess, the right thing to do would be to move the common trash-bin
  (tb.c and tb.h) code into the /fs and /include/linux directories.

 And call them trashbin.[ch] ;)

In fact, we have already done this :-)

  Also, VFS would require just a couple of trivial changes to support
  something like '-o trashbin' mount-time option for all file systems.
  In addition, individual file systems may support per-file attributes for
  this (e.g., ext2/3/4).

 Sounds like a good idea to me.

Thank you!  So we will submit the corresponding patches soon.  We will
start from the trashbin.c and patches for the individual file systems.  A
little later we will submit a VFS patch to add support for the
'-o trashbin' mount option.

Nikolai Joukov
--
Filesystems and Storage Laboratory
Stony Brook University
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [NFS] [PATCH 1/10] lockd: add new export operation for nfsv4/lockd locking

2006-12-08 Thread Wendy Cheng

J. Bruce Fields wrote:


From: Marc Eshel [EMAIL PROTECTED]

There is currently a filesystem -lock() method, but it is defined only by
a few filesystems that are not exported via nfsd.  So none of the lock
routines that are used by lockd or nfsv4 bother to call those methods.

Filesystems such as cluster filesystems would like to do their own locking
and also would like to be exportable via NFS.

So we add a new lock() export operation, and new routines vfs_lock_file,
vfs_test_lock, and vfs_cancel_lock, which call the new export operation,
falling back on the appropriate local operation if the export operation is
unavailable.

These new functions are intended to be used by lockd and nfsd; lockd and
nfsd changes to take advantage of them are made by later patches.

 

Starting to read these patches .. I see these new APIs are placed in 
lockd directory. Was there any discussion before to make these generic 
vfs layer functions ? I'm just curious.


-- Wendy
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html