Status of buffered write path (deadlock fixes)

2006-12-04 Thread Nick Piggin

Hi,

I'd like to try to state where we are WRT the buffered write patches,
and ask for comments. Sorry for the wide cc list, but this is an
important issue which hasn't had enough review.

Well the next -mm will include everything we've done so far. I won't
repost patches unless someone would like to comment on a specific one.

I think the core generic_file_buffered_write is fairly robust, after
fixing the efault and zerolength iov problems picked up in testing
(thanks, very helpful!).

So now I *believe* we have an approach that solves the deadlock and
doesn't expose transient or stale data, transient zeroes, or anything
like that.

Error handling is getting close, but there may be cases that nobody
has picked up, and I've noticed a couple which I'll explain below.

I think we do the right thing WRT pagecache error handling: a
!uptodate page remains !uptodate, an uptodate page can handle the
write being done in several parts. Comments in the patches attempt
to explain how this works. I think it is pretty straightforward.

But WRT block allocation in the case of errors, it needs more review.

Block allocation:
- prepare_write can allocate blocks
- prepare_write doesn't need to initialize the pagecache on top of
  these blocks where it is within the range specified in prepare_write
  (because the copy_from_user will initialise it correctly)
- In the case of a !uptodate page, unless the page is brought uptodate
  (ie the copy_from_user completely succeeds) and marked dirty, then
  a read that sneaks in after we unlock the page (to retry the write)
  will try to bring it uptodate by pulling in the uninitialised blocks.

Problem 1:
I think that allocating blocks outside i_size is OK WRT uninitialised
data, because we update i_size only after a successful copy. However,
I don't think we trim these blocks off (eg. perhaps the "prepare_write
may have instantiated a few blocks" path should be the normal error
path for both the copy_from_user and the commit_write error cases as
well?)

We allocate blocks within holes, but these don't need to be trimmed: it
is enough to just zero out any new buffers. It might be nicer if we had
some kind of way to punch a hole, but it is a rare corner case.
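
To make the "zero out any new buffers" idea concrete, here is a toy user-space model, not kernel code. It assumes a page split into fixed-size blocks, each carrying a flag that says it was freshly allocated, and a copy that stopped short at copied_to instead of reaching to. The names toy_page and toy_zero_new_buffers, and the block sizes, are made up for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define BLOCKS_PER_PAGE 4
#define BLOCK_SIZE 1024

/* Toy stand-in for a page with one buffer per block (hypothetical). */
struct toy_page {
	unsigned char data[BLOCKS_PER_PAGE * BLOCK_SIZE];
	bool is_new[BLOCKS_PER_PAGE];	/* block was freshly allocated */
};

/*
 * A write was meant to cover bytes up to 'to' but the copy stopped at
 * 'copied_to'. For every new block overlapping [copied_to, to), zero
 * the overlapping bytes so stale on-disk contents cannot show through,
 * then clear the "new" flag. Blocks that are not new are left alone.
 */
static void toy_zero_new_buffers(struct toy_page *pg, size_t copied_to, size_t to)
{
	size_t b;

	for (b = 0; b < BLOCKS_PER_PAGE; b++) {
		size_t bstart = b * BLOCK_SIZE;
		size_t bend = bstart + BLOCK_SIZE;
		size_t zstart = bstart > copied_to ? bstart : copied_to;
		size_t zend = bend < to ? bend : to;

		if (!pg->is_new[b] || zstart >= zend)
			continue;
		memset(pg->data + zstart, 0, zend - zstart);
		pg->is_new[b] = false;
	}
}
```

With block 1 marked new and a copy that stopped at byte 1100 of an intended 3000, only bytes 1100..2047 are zeroed; data in blocks that already existed is untouched.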

Problem 2:
nobh error handling[*]. We have just a single buffer that is used for
each block in the prepare_write path, so the "zero new buffers" trick
doesn't work.

I think one solution to this could be to allocate all buffers for the
page like normal, and then strip them off when commit_write succeeds?
This would allow the zero_new_buffers path to work properly.

[*] Actually I think there is a problem with the mainline nobh error
handling in that a whole page of blocks will get zeroed on failure,
even valid data that isn't being touched by the write.

Finally, filesystems. Only OGAWA Hirofumi and Mark Fasheh have given much
feedback so far. I've tried to grok ext2/3 and think they'll work OK, and
have at least *looked* at all the rest. However in the worst case, there
might be many subtle and different problems :( Filesystem developers need
to review this, please. I don't want to cc every filesystem dev list, but
if anybody thinks it would be helpful to forward this then please do.

Well, that's about where it's at. Block allocation problems 1 and 2
shouldn't be too hard to fix, but I would like confirmation / suggestions.

Thanks,
Nick

--
SUSE Labs, Novell Inc.



-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: NFSv4/pNFS possible POSIX I/O API standards

2006-12-04 Thread Trond Myklebust
On Mon, 2006-12-04 at 18:59 -0600, Rob Ross wrote:
> Hi all,
> 
> I don't think that the group intended that there be an opendirplus(); 
> rather readdirplus() would simply be called instead of the usual 
> readdir(). We should clarify that.
> 
> Regarding Peter Staubach's comments about no one ever using the 
> readdirplus() call; well, if people weren't performing this workload in 
> the first place, we wouldn't *need* this sort of call! This call is 
> specifically targeted at improving "ls -l" performance on large 
> directories, and Sage has pointed out quite nicely how that might work.

...and we have pointed out how nicely this ignores the realities of
current caching models. There is no need for a readdirplus() system
call. There may be a need for a caching barrier, but AFAICS that is all.

Trond



Re: NFSv4/pNFS possible POSIX I/O API standards

2006-12-04 Thread Gary Grider

At 05:59 PM 12/4/2006, Rob Ross wrote:

> Hi all,
>
> I don't think that the group intended that there be an
> opendirplus(); rather readdirplus() would simply be called instead
> of the usual readdir(). We should clarify that.
>
> Regarding Peter Staubach's comments about no one ever using the
> readdirplus() call; well, if people weren't performing this workload
> in the first place, we wouldn't *need* this sort of call! This call
> is specifically targeted at improving "ls -l" performance on large
> directories, and Sage has pointed out quite nicely how that might work.
>
> In our case (PVFS), we would essentially perform three phases of
> communication with the file system for a readdirplus that was
> obtaining full statistics: first grabbing the directory entries,
> then obtaining metadata from servers on all objects in bulk, then
> gathering file sizes in bulk. The reduction in control message
> traffic is enormous, and the concurrency is much greater than in a
> readdir()+stat()s workload. We'd never perform this sort of
> optimization optimistically, as the cost of guessing wrong is just
> too high. We would want to see the call as a proper VFS operation
> that we could act upon.
>
> The entire readdirplus() operation wasn't intended to be atomic, and
> in fact the returned structure has space for an error associated
> with the stat() on a particular entry, to allow for implementations
> that stat() subsequently and get an error because the object was
> removed between when the entry was read out of the directory and
> when the stat was performed. I think this fits well with what
> Andreas and others are thinking. We should clarify the description
> appropriately.
>
> I don't think that we have a readdirpluslite() variation documented
> yet? Gary? It would make a lot of sense. Except that it should
> probably have a better name...


Correct, we do not have that documented.  I suppose we could just
have a mask like statlite and keep it to one call perhaps.


> Regarding Andreas's note that he would prefer the statlite() flags
> to mean "valid", that makes good sense to me (and would obviously
> apply to the so-far even more hypothetical readdirpluslite()). I
> don't think there's a lot of value in returning possibly-inaccurate values?


The one use that some users talk about is that just knowing the file is
growing is important and useful to them; knowing exactly to the byte how
much growth seems less important to them until they close.  On these big
parallel apps, so many things can happen that can just hang.  They often
use the presence of checkpoint files and how big they are to gauge
progress of the application.  Of course there are other ways this can be
accomplished, but they do this sort of thing a lot.  That is the main
case I have heard that might benefit from "possibly-inaccurate" values.
Of course it assumes that the inaccuracy is just old information and not
bogus information.

Thanks, we will put out a complete version of what we have in a document
to the Open Group site in a week or two so all the pages in their
current state are available.  We could then begin some iteration on all
these comments we have gotten from the various communities.


Thanks
Gary



> Thanks everyone,
>
> Rob
>
> Trond Myklebust wrote:
>
> On Mon, 2006-12-04 at 00:32 -0700, Andreas Dilger wrote:
> I'm wondering if a corresponding opendirplus() (or similar) would
> also be appropriate to inform the kernel/filesystem that
> readdirplus() will follow, and stat information should be
> gathered/buffered.  Or do most implementations wait for the first
> readdir() before doing any actual work anyway?
>
> I'm not sure what some filesystems might do here.  I suppose NFS has weak
> enough cache semantics that it _might_ return stale cached data from the
> client in order to fill the readdirplus() data, but it is just as likely
> that it ships the whole thing to the server and returns everything in
> one shot.  That would imply everything would be at least as up-to-date
> as the opendir().
>
> Whether or not the posix committee decides on readdirplus, I propose
> that we implement this sort of thing in the kernel via a readdir
> equivalent to posix_fadvise(). That can give exactly the barrier
> semantics that they are asking for, and only costs 1 extra syscall as
> opposed to 2 (opendirplus() and readdirplus()).
>
> Cheers
>   Trond







Re: NFSv4/pNFS possible POSIX I/O API standards

2006-12-04 Thread Rob Ross

Hi all,

I don't think that the group intended that there be an opendirplus(); 
rather readdirplus() would simply be called instead of the usual 
readdir(). We should clarify that.


Regarding Peter Staubach's comments about no one ever using the 
readdirplus() call; well, if people weren't performing this workload in 
the first place, we wouldn't *need* this sort of call! This call is 
specifically targeted at improving "ls -l" performance on large 
directories, and Sage has pointed out quite nicely how that might work.


In our case (PVFS), we would essentially perform three phases of 
communication with the file system for a readdirplus that was obtaining 
full statistics: first grabbing the directory entries, then obtaining 
metadata from servers on all objects in bulk, then gathering file sizes 
in bulk. The reduction in control message traffic is enormous, and the 
concurrency is much greater than in a readdir()+stat()s workload. We'd 
never perform this sort of optimization optimistically, as the cost of 
guessing wrong is just too high. We would want to see the call as a 
proper VFS operation that we could act upon.


The entire readdirplus() operation wasn't intended to be atomic, and in 
fact the returned structure has space for an error associated with the 
stat() on a particular entry, to allow for implementations that stat() 
subsequently and get an error because the object was removed between 
when the entry was read out of the directory and when the stat was 
performed. I think this fits well with what Andreas and others are 
thinking. We should clarify the description appropriately.
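
As a rough user-space sketch of the non-atomic behaviour described above: the function below reads directory entries and stats each one separately, recording a per-entry error if the object disappeared in between. struct entry_plus and emulated_readdirplus() are hypothetical names, not the proposed interface. The real proposal would let the file system batch this work, which this plain readdir()+fstatat() loop does not.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <dirent.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical result record: one directory entry plus its stat outcome. */
struct entry_plus {
	char name[256];
	struct stat st;		/* valid only when err == 0 */
	int err;		/* errno from the per-entry stat, 0 on success */
};

/*
 * Read up to 'max' entries of 'path' and stat each one.
 * Returns the number of entries filled in, or -1 if the directory
 * cannot be opened. A failed stat does not abort the scan; the
 * error is stored with the entry instead.
 */
static int emulated_readdirplus(const char *path, struct entry_plus *out, int max)
{
	DIR *dir = opendir(path);
	struct dirent *de;
	int n = 0;

	if (dir == NULL)
		return -1;
	while (n < max && (de = readdir(dir)) != NULL) {
		snprintf(out[n].name, sizeof(out[n].name), "%s", de->d_name);
		/* The entry may vanish between readdir() and fstatat(). */
		if (fstatat(dirfd(dir), de->d_name, &out[n].st,
			    AT_SYMLINK_NOFOLLOW) == 0)
			out[n].err = 0;
		else
			out[n].err = errno;
		n++;
	}
	closedir(dir);
	return n;
}
```

A caller would check err on each returned entry before trusting st, which mirrors the per-entry error slot mentioned in the paragraph above.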


I don't think that we have a readdirpluslite() variation documented yet? 
Gary? It would make a lot of sense. Except that it should probably have 
a better name...


Regarding Andreas's note that he would prefer the statlite() flags to 
mean "valid", that makes good sense to me (and would obviously apply to 
the so-far even more hypothetical readdirpluslite()). I don't think 
there's a lot of value in returning possibly-inaccurate values?


Thanks everyone,

Rob

Trond Myklebust wrote:

On Mon, 2006-12-04 at 00:32 -0700, Andreas Dilger wrote:
I'm wondering if a corresponding opendirplus() (or similar) would also be 
appropriate to inform the kernel/filesystem that readdirplus() will 
follow, and stat information should be gathered/buffered.  Or do most 
implementations wait for the first readdir() before doing any actual work 
anyway?

I'm not sure what some filesystems might do here.  I suppose NFS has weak
enough cache semantics that it _might_ return stale cached data from the
client in order to fill the readdirplus() data, but it is just as likely
that it ships the whole thing to the server and returns everything in
one shot.  That would imply everything would be at least as up-to-date
as the opendir().


Whether or not the posix committee decides on readdirplus, I propose
that we implement this sort of thing in the kernel via a readdir
equivalent to posix_fadvise(). That can give exactly the barrier
semantics that they are asking for, and only costs 1 extra syscall as
opposed to 2 (opendirplus() and readdirplus()).

Cheers
  Trond




Re: Relative atime (was Re: What's in ocfs2.git)

2006-12-04 Thread Valerie Henson
On Mon, Dec 04, 2006 at 04:36:20PM -0800, Valerie Henson wrote:
> On Mon, Dec 04, 2006 at 04:10:07PM -0800, Mark Fasheh wrote:
> > Hi Steve,
> > 
> > On Mon, Dec 04, 2006 at 10:54:53AM +, Steven Whitehouse wrote:
> > > > In the future, I'd like to see a "relative atime" mode, which functions
> > > > in the manner described by Valerie Henson at:
> > > > 
> > > > http://lkml.org/lkml/2006/8/25/380
> > > > 
> > > I'd like to second that. [adding Val Henson to the "to"] What (if
> > > anything) remains to be done before the relative atime patch is ready to
> > > go upstream? I'm happy to help out here if required,
> > Last time I looked at them, things seemed to be in pretty good shape - it
> > wasn't a very large patch series.

And the userland part.

-VAL

Add the "relatime" (relative atime) option support to mount.  Relative
atime only updates the atime if the previous atime is older than the
mtime or ctime.  Like noatime, but useful for applications like mutt
that need to know when a file has been read since it was last
modified.

Signed-off-by: Valerie Henson <[EMAIL PROTECTED]>

---
 mount/mount.8   |7 +++
 mount/mount.c   |6 ++
 mount/mount_constants.h |4 
 3 files changed, 17 insertions(+)
--- util-linux-2.13-pre7.orig/mount/mount.8
+++ util-linux-2.13-pre7/mount/mount.8
@@ -586,6 +586,13 @@ access on the news spool to speed up new
 .B nodiratime
 Do not update directory inode access times on this filesystem.
 .TP
+.B relatime
+Update inode access times relative to modify or change time.  Access
+time is only updated if the previous access time was earlier than the
+current modify or change time. (Similar to noatime, but doesn't break
+mutt or other applications that need to know if a file has been read
+since the last time it was modified.)
+.TP
 .B noauto
 Can only be mounted explicitly (i.e., the
 .B \-a
--- util-linux-2.13-pre7.orig/mount/mount.c
+++ util-linux-2.13-pre7/mount/mount.c
@@ -164,6 +164,12 @@ static const struct opt_map opt_map[] =
   { "diratime",0, 1, MS_NODIRATIME },  /* Update dir access times */
   { "nodiratime", 0, 0, MS_NODIRATIME },/* Do not update dir access times */
 #endif
+#ifdef MS_RELATIME
+  { "relatime", 0, 0, MS_RELATIME },   /* Update access times relative to
+                                          mtime/ctime */
+  { "norelatime", 0, 1, MS_RELATIME }, /* Update access time without regard
+                                          to mtime/ctime */
+#endif
   { NULL,  0, 0, 0 }
 };

--- util-linux-2.13-pre7.orig/mount/mount_constants.h
+++ util-linux-2.13-pre7/mount/mount_constants.h
@@ -57,6 +57,10 @@ if we have a stack or plain mount - moun
 #ifndef MS_VERBOSE
 #define MS_VERBOSE 0x8000  /* 32768 */
 #endif
+#ifndef MS_RELATIME
+#define MS_RELATIME   0x200000 /* 200000: Update access times relative
+                                  to mtime/ctime */
+#endif
 /*
  * Magic mount flag number. Had to be or-ed to the flag values.
  */
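
For readers unfamiliar with how mount's option table is used, here is a simplified sketch of table-driven option parsing. It assumes a three-field entry (name, invert, mask), whereas the real opt_map in the patch above has four fields. opt_entry, apply_option() and EX_MS_RELATIME are hypothetical names; the mask value is taken from the (1<<21) used on the kernel side of this patch series.

```c
#include <stddef.h>
#include <string.h>

#define EX_MS_RELATIME (1UL << 21)	/* illustrative copy of the flag bit */

/* Simplified, hypothetical version of an option table entry. */
struct opt_entry {
	const char *name;	/* option string as typed by the user */
	int inv;		/* nonzero: option clears the bit instead of setting it */
	unsigned long mask;	/* flag bit(s) affected */
};

static const struct opt_entry opt_table[] = {
	{ "relatime",   0, EX_MS_RELATIME },
	{ "norelatime", 1, EX_MS_RELATIME },
	{ NULL, 0, 0 }
};

/* Apply one option to *flags; returns 0 if recognised, -1 otherwise. */
static int apply_option(const char *opt, unsigned long *flags)
{
	const struct opt_entry *e;

	for (e = opt_table; e->name != NULL; e++) {
		if (strcmp(opt, e->name) != 0)
			continue;
		if (e->inv)
			*flags &= ~e->mask;
		else
			*flags |= e->mask;
		return 0;
	}
	return -1;
}
```

The "inv" column is what lets a single mask serve both "relatime" and "norelatime".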


Re: NFSv4/pNFS possible POSIX I/O API standards

2006-12-04 Thread Rob Ross

Hi,

I agree that it is not feasible to add new system calls every time 
somebody has a problem, and we don't take adding system calls lightly. 
However, in this case we're talking about an entire *community* of 
people (high-end computing), not just one or two people. Of course it 
may still be the case that that community is not important enough to 
justify the addition of system calls; that's obviously not my call to make!


I'm sure that you meant more than just to rename openg() to lookup(), 
but I don't understand what you are proposing. We still need a second 
call to take the results of the lookup (by whatever name) and convert 
that into a file descriptor. That's all the openfh() (previously named 
sutoc()) is for.


I think the subject line might be a little misleading; we're not just 
talking about NFS here. There are a number of different file systems 
that might benefit from these enhancements (e.g. GPFS, Lustre, PVFS, 
PanFS, etc.).


Finally, your comment on making filesystem developers miserable is sort 
of a point of philosophical debate for me. I personally find myself 
miserable trying to extract performance given the very small amount of 
information passing through the existing POSIX calls. The additional 
information passing through these new calls will make it much easier to 
obtain performance without correctly guessing what the user might 
actually be up to. While they do mean more work in the short term, they 
should also mean a more straight-forward path to performance for 
cluster/parallel file systems.


Thanks for the input. Does this help explain why we don't think we can 
just work under the existing calls?


Rob

Latchesar Ionkov wrote:
> Hi,
>
> One general remark: I don't think it is feasible to add new system
> calls every time somebody has a problem. Usually there are (may be not
> that good) solutions that don't require big changes and work well
> enough. "Let's change the interface and make the life of many
> filesystem developers miserable, because they have to worry about
> 3-4-5 more operations" is not the easiest solution in the long run.
>
> On 12/1/06, Rob Ross <[EMAIL PROTECTED]> wrote:
> > Hi all,
> >
> > The use model for openg() and openfh() (renamed sutoc()) is n processes
> > spread across a large cluster simultaneously opening a file. The
> > challenge is to avoid to the greatest extent possible incurring O(n) FS
> > interactions. To do that we need to allow actions of one process to be
> > reused by other processes on other OS instances.
> >
> > The openg() call allows one process to perform name resolution, which is
> > often the most expensive part of this use model. Because permission
>
> If the name resolution is the most expensive part, why not implement
> just the name lookup part and call it "lookup" instead of "openg". Or
> even better, make NFS to resolve multiple names with a single request.
> If the NFS server caches the last few name lookups, the responses from
> the other nodes will be fast, and you will get your file descriptor
> with two instead of the proposed one request. The performance could be
> just good enough without introducing any new functions and file
> handles.
>
> Thanks,
>    Lucho




Relative atime (was Re: What's in ocfs2.git)

2006-12-04 Thread Valerie Henson
On Mon, Dec 04, 2006 at 04:10:07PM -0800, Mark Fasheh wrote:
> Hi Steve,
> 
> On Mon, Dec 04, 2006 at 10:54:53AM +, Steven Whitehouse wrote:
> > > In the future, I'd like to see a "relative atime" mode, which functions
> > > in the manner described by Valerie Henson at:
> > > 
> > > http://lkml.org/lkml/2006/8/25/380
> > > 
> > I'd like to second that. [adding Val Henson to the "to"] What (if
> > anything) remains to be done before the relative atime patch is ready to
> > go upstream? I'm happy to help out here if required,
> Last time I looked at them, things seemed to be in pretty good shape - it
> wasn't a very large patch series.

Yep, the relative atime patch is tiny and pretty much done - just
needs some soak time in -mm and a little more review (cc'd Viro and
fsdevel).  Kernel patch against 2.6.18-rc4 appended, patch to mount
following. (Note that my web server suffered a RAID failure and my
patches page is unavailable till the restore finishes.)

-VAL

Add "relatime" (relative atime) support.  Relative atime only updates
the atime if the previous atime is older than the mtime or ctime.
Like noatime, but useful for applications like mutt that need to know
when a file has been read since it was last modified.
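
The rule can be sketched in user space as follows. This is only an illustration, assuming plain struct timespec values rather than inode fields; ts_compare() and should_update_atime() are hypothetical helpers, not functions from the patch.

```c
#include <stdbool.h>
#include <time.h>

/* Returns <0, 0 or >0 when a is earlier than, equal to or later than b. */
static int ts_compare(const struct timespec *a, const struct timespec *b)
{
	if (a->tv_sec != b->tv_sec)
		return a->tv_sec < b->tv_sec ? -1 : 1;
	if (a->tv_nsec != b->tv_nsec)
		return a->tv_nsec < b->tv_nsec ? -1 : 1;
	return 0;
}

/*
 * Should a read update the access time?
 * Without relatime the answer is always yes. With relatime, only when
 * the previous atime is strictly earlier than mtime or ctime.
 */
static bool should_update_atime(bool relatime,
				const struct timespec *at,
				const struct timespec *mt,
				const struct timespec *ct)
{
	if (!relatime)
		return true;
	return ts_compare(at, mt) < 0 || ts_compare(at, ct) < 0;
}
```

Note that an atime equal to the mtime and ctime does not trigger an update under this rule, since the comparison is strict.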

Signed-off-by: Valerie Henson <[EMAIL PROTECTED]>

---
 fs/inode.c|   11 ++-
 fs/namespace.c|5 -
 include/linux/fs.h|1 +
 include/linux/mount.h |1 +
 4 files changed, 16 insertions(+), 2 deletions(-)

--- linux-2.6.18-rc4-relatime.orig/fs/inode.c
+++ linux-2.6.18-rc4-relatime/fs/inode.c
@@ -1200,7 +1200,16 @@ void touch_atime(struct vfsmount *mnt, s
 		return;

 	now = current_fs_time(inode->i_sb);
-	if (!timespec_equal(&inode->i_atime, &now)) {
+	if (timespec_equal(&inode->i_atime, &now))
+		return;
+	/*
+	 * With relative atime, only update atime if the previous
+	 * atime is earlier than either the ctime or mtime.
+	 */
+	if (!mnt ||
+	    !(mnt->mnt_flags & MNT_RELATIME) ||
+	    (timespec_compare(&inode->i_atime, &inode->i_mtime) < 0) ||
+	    (timespec_compare(&inode->i_atime, &inode->i_ctime) < 0)) {
 		inode->i_atime = now;
 		mark_inode_dirty_sync(inode);
 	}
--- linux-2.6.18-rc4-relatime.orig/fs/namespace.c
+++ linux-2.6.18-rc4-relatime/fs/namespace.c
@@ -376,6 +376,7 @@ static int show_vfsmnt(struct seq_file *
 		{ MNT_NOEXEC, ",noexec" },
 		{ MNT_NOATIME, ",noatime" },
 		{ MNT_NODIRATIME, ",nodiratime" },
+		{ MNT_RELATIME, ",relatime" },
 		{ 0, NULL }
 	};
 	struct proc_fs_info *fs_infop;
@@ -1413,9 +1414,11 @@ long do_mount(char *dev_name, char *dir_
 		mnt_flags |= MNT_NOATIME;
 	if (flags & MS_NODIRATIME)
 		mnt_flags |= MNT_NODIRATIME;
+	if (flags & MS_RELATIME)
+		mnt_flags |= MNT_RELATIME;

 	flags &= ~(MS_NOSUID | MS_NOEXEC | MS_NODEV | MS_ACTIVE |
-		   MS_NOATIME | MS_NODIRATIME);
+		   MS_NOATIME | MS_NODIRATIME | MS_RELATIME);

 	/* ... and get the mountpoint */
 	retval = path_lookup(dir_name, LOOKUP_FOLLOW, &nd);
--- linux-2.6.18-rc4-relatime.orig/include/linux/fs.h
+++ linux-2.6.18-rc4-relatime/include/linux/fs.h
@@ -119,6 +119,7 @@ extern int dir_notify_enable;
 #define MS_PRIVATE (1<<18) /* change to private */
 #define MS_SLAVE   (1<<19) /* change to slave */
 #define MS_SHARED  (1<<20) /* change to shared */
+#define MS_RELATIME (1<<21) /* Update atime relative to mtime/ctime. */
 #define MS_ACTIVE  (1<<30)
 #define MS_NOUSER  (1<<31)

--- linux-2.6.18-rc4-relatime.orig/include/linux/mount.h
+++ linux-2.6.18-rc4-relatime/include/linux/mount.h
@@ -27,6 +27,7 @@ struct namespace;
 #define MNT_NOEXEC 0x04
 #define MNT_NOATIME 0x08
 #define MNT_NODIRATIME 0x10
+#define MNT_RELATIME   0x20

 #define MNT_SHRINKABLE 0x100


Re: [RFC][PATCH] Secure Deletion and Trash-Bin Support for Ext4

2006-12-04 Thread David Chinner
On Mon, Dec 04, 2006 at 01:33:55PM -0500, Nikolai Joukov wrote:
> As we promised on the linux-ext4 list on October 31, here is the patch
> that adds secure deletion via a trash-bin functionality for ext4.  It is a
> compromise solution that combines secure deletion with the trash-bin support
> (the latter had been requested by even more people than the former :-).

Given that almost all of the code for this uses vfs interfaces and
only a couple of simple filesystem hooks, is there any reason for
this being ext4 specific?  i.e. why aren't you hooking the vfs layer
so we get a single undelete/secure delete implementation for all
filesystems?

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group


Re: [PATCH] prune_icache_sb

2006-12-04 Thread Wendy Cheng

Russell Cattelan wrote:
> Wendy Cheng wrote:
> > Linux kernel, particularly the VFS layer, is starting to show signs
> > of inadequacy as the software components built upon it keep growing.
> > I have doubts that it can keep up and handle this complexity with a
> > development policy like you just described (filesystem is a dumb
> > layer ?). Aren't these DIO_xxx_LOCKING flags inside
> > __blockdev_direct_IO() a perfect example why trying to do too many
> > things inside vfs layer for so many filesystems is a bad idea ? By
> > the way, since we're on this subject, could we discuss a little bit
> > about vfs rename call (or I can start another new discussion thread) ?
> >
> > Note that linux do_rename() starts with the usual lookup logic,
> > followed by "lock_rename", then a final round of dentry lookup, and
> > finally comes to filesystem's i_op->rename call. Since lock_rename()
> > only calls for vfs layer locks that are local to this particular
> > machine, for a cluster filesystem, there exists a huge window between
> > the final lookup and filesystem's i_op->rename calls such that the
> > file could get deleted from another node before fs can do anything
> > about it. Is it possible that we could get a new function pointer
> > (lock_rename) in inode_operations structure so a cluster filesystem
> > can do proper locking ?
>
> It looks like the ocfs2 guys have the similar problem?
>
> http://ftp.kernel.org/pub/linux/kernel/people/mfasheh/ocfs2/ocfs2_git_patches/ocfs2-upstream-linus-20060924/0009-PATCH-Allow-file-systems-to-manually-d_move-inside-of-rename.txt






Thanks for the pointer. Same as ocfs2, under current VFS code, both 
GFS1/2 also need FS_ODD_RENAME flag for the rename problem - got an ugly 
~200 line draft patch ready for GFS1 (and am looking into GFS2 at this 
moment). The issue here is, for GFS, if vfs lock_rename() can call us, 
this complication can be greatly reduced. Will start another thread to 
see whether the wish can be granted.


-- Wendy



[RFC][PATCH] Secure Deletion and Trash-Bin Support for Ext4

2006-12-04 Thread Nikolai Joukov
As we promised on the linux-ext4 list on October 31, here is the patch
that adds secure deletion via a trash-bin functionality for ext4.  It is a
compromise solution that combines secure deletion with the trash-bin support
(the latter had been requested by even more people than the former :-).

The patch moves the files marked with the -s (secure deletion) or the -u
(undelete) ext2/3/4 attributes to a .trash/<uid>/ directory.  It does so
reliably because it does it in the kernel upon every unlink operation.
User-mode tools can miss some unlinks, which is unacceptable for secure
deletion.  The per-user subdirectories allow us to solve the problem
with permissions.  The .trash directory is owned by root and has
drwxr-xr-x permissions.  The per-uid subdirectories are owned and only
accessible by their corresponding owners (drwx------) so users cannot
read each other's files in the trash.  Alternative solutions would
require changing a lot of ext3 code.

Right now we do not store the whole path to the file moved into the
trash-bin.  We keep the filenames or append them with numbers in case of
collisions.  In the future we may change it to store the whole path
information if necessary.
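
As a small sketch of the naming just described: the helper below builds a trash path from a uid and a file name, and appends a number when a collision forces a retry. The exact layout (".trash/<uid>/<name>" with a ".<n>" suffix) and the name trash_path() are assumptions for illustration; the message does not specify the format the patch uses.

```c
#include <stdio.h>
#include <sys/types.h>

/*
 * Hypothetical naming scheme:
 *   attempt 0  -> ".trash/<uid>/<name>"
 *   attempt n  -> ".trash/<uid>/<name>.<n>"   (after a collision)
 * Returns what snprintf() returns: the length that was needed.
 */
static int trash_path(char *buf, size_t len, uid_t uid,
		      const char *name, unsigned int attempt)
{
	if (attempt == 0)
		return snprintf(buf, len, ".trash/%lu/%s",
				(unsigned long)uid, name);
	return snprintf(buf, len, ".trash/%lu/%s.%u",
			(unsigned long)uid, name, attempt);
}
```

A caller would try attempt 0 first and increment the counter until the name is free. Storing the full original path, as the message mentions, would need a different scheme.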

Later (e.g., when the system is idle) a user-mode daemon can just scan
.trash's subdirectories, overwrite the files marked with -s, and unlink
the files.  The purging and data overwriting is performed entirely in user
space.  It may or may not be necessary to add a mechanism for the file
system to initiate the user-mode deletion once the space becomes scarce.
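
A minimal sketch of what the user-mode purge step could look like for one file, assuming a single overwrite pass with a fixed byte followed by fsync() and unlink(). overwrite_and_unlink() is a hypothetical name and this is not the authors' daemon. A real tool would likely use several patterns, and an in-place overwrite is not guaranteed to reach the old data on every file system or storage device.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Overwrite 'path' once with 'pattern', flush it to disk, then unlink it.
 * Returns 0 on success, -1 on any failure.
 */
static int overwrite_and_unlink(const char *path, unsigned char pattern)
{
	char block[4096];
	struct stat st;
	off_t left;
	int fd = open(path, O_WRONLY);

	if (fd < 0)
		return -1;
	if (fstat(fd, &st) < 0) {
		close(fd);
		return -1;
	}
	memset(block, pattern, sizeof(block));
	for (left = st.st_size; left > 0; ) {
		size_t want = left < (off_t)sizeof(block) ?
			      (size_t)left : sizeof(block);
		ssize_t done = write(fd, block, want);

		if (done <= 0) {	/* treat short progress as failure */
			close(fd);
			return -1;
		}
		left -= done;
	}
	if (fsync(fd) < 0) {
		close(fd);
		return -1;
	}
	close(fd);
	return unlink(path);
}
```

Keeping this logic in user space is what allows the number of passes, the patterns and the scheduling to be changed without touching the kernel patch.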

The benefits of this secure deletion approach are obvious:
1) small kernel patch;
2) two solutions (trash-bin and secure deletion) in one;
3) the user-mode part can be made arbitrarily complex and overwrite with
   many patterns, many times, and at configurable times; and
4) most of the code can be reused for other file systems in the future.

We will really appreciate any comments, help, and feedback.

Thank you,
Nikolai Joukov, Harry Papaxenopoulos, and Erez Zadok.
Filesystems and Storage Laboratory,
Stony Brook University

Signed-off-by: Nikolai Joukov <[EMAIL PROTECTED]>
Signed-off-by: Harry Papaxenopoulos <[EMAIL PROTECTED]>
Signed-off-by: Erez Zadok <[EMAIL PROTECTED]>
---
 fs/ext4/Makefile|1 +
 fs/ext4/namei.c |   40 ++-
 fs/ext4/super.c |6 +
 fs/ext4/tb.c|  237 +++
 fs/ext4/tb.h|   18 +++
 fs/Kconfig  |9 +
 6 files changed, 309 insertions(+), 2 deletions(-)

diff -Naur 2.6.19/fs/ext4/Makefile 2.6.19tb/fs/ext4/Makefile
--- 2.6.19/fs/ext4/Makefile 2006-12-02 19:49:53.0 -0500
+++ 2.6.19tb/fs/ext4/Makefile   2006-12-02 19:46:13.0 -0500
@@ -10,3 +10,4 @@
 ext4dev-$(CONFIG_EXT4DEV_FS_XATTR) += xattr.o xattr_user.o xattr_trusted.o
 ext4dev-$(CONFIG_EXT4DEV_FS_POSIX_ACL) += acl.o
 ext4dev-$(CONFIG_EXT4DEV_FS_SECURITY)  += xattr_security.o
+ext4dev-$(CONFIG_EXT4DEV_FS_TRASH_BIN) += tb.o
diff -Naur 2.6.19/fs/ext4/namei.c 2.6.19tb/fs/ext4/namei.c
--- 2.6.19/fs/ext4/namei.c  2006-12-02 19:49:49.0 -0500
+++ 2.6.19tb/fs/ext4/namei.c2006-12-04 13:03:35.0 -0500
@@ -41,6 +41,7 @@
 #include "namei.h"
 #include "xattr.h"
 #include "acl.h"
+#include "tb.h"

 /*
  * define how far ahead to read directories while searching them.
@@ -2067,6 +2068,10 @@
 	struct inode * inode;
 	struct buffer_head * bh;
 	struct ext4_dir_entry_2 * de;
+	int trashed = 0;
+#ifdef CONFIG_EXT4DEV_FS_TRASH_BIN
+	struct dentry *user_dentry = NULL;
+#endif
 	handle_t *handle;

 	/* Initialize quotas before so that eventual writes go
@@ -2096,13 +2101,40 @@
 			      inode->i_ino, inode->i_nlink);
 		inode->i_nlink = 1;
 	}
-	retval = ext4_delete_entry(handle, dir, de, bh);
+#ifdef CONFIG_EXT4DEV_FS_TRASH_BIN
+	if (EXT4_I(dentry->d_inode)->i_flags &
+	    (EXT4_UNRM_FL | EXT4_SECRM_FL)) {
+
+		/*
+		 * We put this code here to optimize the common case. Since
+		 * lookups are expensive, we try to reserve from making any,
+		 * unless one of the trash-bin flags are set. The cleanest
+		 * way though is to probably move this code outside the
+		 * above if statement.
+		 */
+		user_dentry = ext4_get_user_dentry(dir, 1);
+		if (IS_ERR(user_dentry)) {
+			retval = PTR_ERR(user_dentry);
+			user_dentry = NULL;
+			goto end_unlink;
+		}
+
+		if (inode->i_nlink == 1 && user_dentry->d_inode &&
+		    user_dentry->d_inode->i_ino != dir->i_ino) {
+			retval = ext4_trash_entry(dir, dentry);
+			trashed = 1;
+		}
+	}
+#endif
+	if (!trashed)
+		retval = ext4_delete_entry(h

Re: NFSv4/pNFS possible POSIX I/O API standards

2006-12-04 Thread Peter Staubach

Sage Weil wrote:

On Fri, 1 Dec 2006, Trond Myklebust wrote:

I'm quite happy with a proposal for a statlite(). I'm objecting to
readdirplus() because I can't see that it offers you anything useful.
You haven't provided an example of an application which would clearly
benefit from a readdirplus() interface instead of readdir()+statlite()
and possibly some tools for managing cache consistency.


Okay, now I think I understand where you're coming from.

The difference between readdirplus() and readdir()+statlite() is that 
(depending on the mask you specify) statlite() either provides the 
"right" answer (ala stat()), or anything that is vaguely "recent."  
readdirplus() would provide size/mtime from sometime _after_ the 
initial opendir() call, establishing a useful ordering.  So without 
readdirplus(), you either get readdir()+stat() and the performance 
problems I mentioned before, or readdir()+statlite() where "recent" 
may not be good enough.


Instead of my previous example of proccess #1 waiting for process #2 
to finish and then checking the results with stat(), imagine instead 
that #1 is waiting for 100,000 other processes to finish, and then 
wants to check the results (size/mtime) of all of them.  
readdir()+statlite() won't work, and readdir()+stat() may be 
pathologically slow.


Also, it's a tiring and trivial example, but even the 'ls -al' 
scenario isn't ideally addressed by readdir()+statlite(), since 
statlite() might return size/mtime from before 'ls -al' was executed 
by the user.  One can easily imagine modifying a file on one host, 
then doing 'ls -al' on another host and not seeing the effects.  If 
'ls -al' can use readdirplus(), its overall application semantics can 
be preserved without hammering large directories in a distributed 
filesystem.
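
As a rough illustration of the readdir()+stat() pattern discussed above, the following userspace sketch walks a directory and issues one fstatat() per entry. The function name scan_dir_with_stat and its out-parameter are hypothetical; only the POSIX calls (opendir, readdir, dirfd, fstatat) are real. On a distributed filesystem each fstatat() may become a separate round trip, which is the cost a combined readdirplus() would aim to avoid.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Hypothetical helper: stat every entry of one directory individually.
 * Returns the number of entries stat'ed, or -1 if the directory cannot
 * be opened.  If newest_mtime is non-NULL and at least one entry was
 * stat'ed, it receives the most recent mtime seen.
 */
long scan_dir_with_stat(const char *path, time_t *newest_mtime)
{
	DIR *dir;
	struct dirent *de;
	long count = 0;

	assert(path != NULL);
	dir = opendir(path);
	if (!dir)
		return -1;

	while ((de = readdir(dir)) != NULL) {
		struct stat st;

		if (!strcmp(de->d_name, ".") || !strcmp(de->d_name, ".."))
			continue;
		/* One extra system call (and possibly one RPC) per entry. */
		if (fstatat(dirfd(dir), de->d_name, &st, AT_SYMLINK_NOFOLLOW) != 0)
			continue;	/* entry vanished or is unreadable: skip it */
		if (newest_mtime && (count == 0 || st.st_mtime > *newest_mtime))
			*newest_mtime = st.st_mtime;
		count++;
	}
	closedir(dir);
	return count;
}
```

Calling scan_dir_with_stat(".", &t) on a directory with 100,000 entries makes roughly 100,000 fstatat() calls, which is where the "pathologically slow" concern comes from.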




I think that there are several points which are missing here.

First, readdirplus(), without any sort of caching, is going to be _very_
expensive, performance-wise, for _any_ size directory.  You can see this
by instrumenting any NFS server which already supports the NFSv3 READDIRPLUS
semantics.

Second, the NFS client side readdirplus() implementation is going to be
_very_ expensive as well.  The NFS client does write-behind and all this
data _must_ be flushed to the server _before_ the over the wire READDIRPLUS
can be issued.  This means that the client will have to step through every
inode which is associated with the directory inode being readdirplus()'d
and ensure that all modified data has been successfully written out.  This
part of the operation, for a sufficiently large directory and a sufficiently
large page cache, could take significant time in itself.

These overheads may make this new operation expensive enough that no
applications will end up using it.


I agree that an interface which allows a userland process to offer hints to
the kernel as to what kind of cache consistency it requires for file
metadata would be useful. We already have stuff like posix_fadvise() etc
for file data, and perhaps it might be worth looking into how you could
devise something similar for metadata.
If what you really want is for applications to be able to manage network
filesystem cache consistency, then why not provide those tools instead?
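
For reference, this is roughly what the existing file-data hint looks like from userspace. It is a minimal sketch: the wrapper name advise_willneed is hypothetical, while posix_fadvise() and POSIX_FADV_WILLNEED are the real POSIX interface. No metadata equivalent exists; the sketch only shows the shape of call that such an interface might mirror.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical wrapper: tell the kernel the whole file will be read soon.
 * Returns 0 on success, -1 if the file cannot be opened, or the positive
 * error number returned by posix_fadvise() (it does not set errno).
 */
int advise_willneed(const char *path)
{
	int fd, ret;

	assert(path != NULL);
	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;

	/* offset 0, len 0 means "from the start to the end of the file" */
	ret = posix_fadvise(fd, 0, 0, POSIX_FADV_WILLNEED);
	close(fd);
	return ret;
}
```

The call is purely advisory: the kernel may ignore it, and the application's results are the same either way.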


True, something to manage the attribute cache consistency for 
statlite() results would also address the issue by letting an 
application declare how weak its results are allowed to be.  That 
seems a bit more awkward, though, and would only affect 
statlite()--the only call that allows weak consistency in the first 
place.  In contrast, readdirplus maps nicely onto what filesystems 
like NFS are already doing over the wire. 


Speaking of applications, how many applications are there in the world,
or even being contemplated, which are interested in a directory of
files and whether or not this set of files has changed from the previous
snapshot of the set of files?  Most applications deal with one or two
files on such a basis, not multitudes.  In fact, having worked with
file systems and NFS in particular for more than 20 years now, I have
yet to hear of one.  This is a lot of work and complexity for very
little gain, I think.

Is this not a problem which would be better solved at the application level?
Or perhaps finer granularity than "noac" for the NFS attribute caching?

   Thanx...

  ps
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] prune_icache_sb

2006-12-04 Thread Russell Cattelan

Wendy Cheng wrote:


Andrew Morton wrote:


On Thu, 30 Nov 2006 11:05:32 -0500
Wendy Cheng <[EMAIL PROTECTED]> wrote:

 



The idea is, instead of unconditionally dropping every buffer 
associated with the particular mount point (that defeats the purpose 
of page caching), base kernel exports the "drop_pagecache_sb()" call 
that allows page cache to be trimmed. More importantly, it is 
changed to offer the choice of not randomly purging any buffer but 
the ones that seem to be unused (i_state is NULL and i_count is 
zero). This will encourage filesystem(s) to proactively respond to 
VM memory shortage if they so choose.
  



argh.
 

I read this as "It is ok to give system admin(s) commands (that this 
"drop_pagecache_sb() call" is all about) to drop page cache. It is, 
however, not ok to give filesystem developer(s) this very same 
function to trim their own page cache if the filesystems choose to do 
so" ?



In Linux a filesystem is a dumb layer which sits between the VFS and the
I/O layer and provides dumb services such as reading/writing inodes,
reading/writing directory entries, mapping pagecache offsets to disk
blocks, etc.  (This model is to varying degrees incorrect for every
post-ext2 filesystem, but that's the way it is).
 

Linux kernel, particularly the VFS layer, is starting to show signs of 
inadequacy as the software components built upon it keep growing. I 
have doubts that it can keep up and handle this complexity with a 
development policy like you just described (filesystem is a dumb layer 
?). Aren't these DIO_xxx_LOCKING flags inside __blockdev_direct_IO() a 
perfect example why trying to do too many things inside vfs layer for 
so many filesystems is a bad idea ? By the way, since we're on this 
subject, could we discuss a little bit about vfs rename call (or I can 
start another new discussion thread) ?


Note that linux do_rename() starts with the usual lookup logic, 
followed by "lock_rename", then a final round of dentry lookup, and 
finally comes to filesystem's i_op->rename call. Since lock_rename() 
only calls for vfs layer locks that are local to this particular 
machine, for a cluster filesystem, there exists a huge window between 
the final lookup and filesystem's i_op->rename calls such that the 
file could get deleted from another node before fs can do anything 
about it. Is it possible that we could get a new function pointer 
(lock_rename) in inode_operations structure so a cluster filesystem 
can do proper locking ?


It looks like the ocfs2 guys have a similar problem?

http://ftp.kernel.org/pub/linux/kernel/people/mfasheh/ocfs2/ocfs2_git_patches/ocfs2-upstream-linus-20060924/0009-PATCH-Allow-file-systems-to-manually-d_move-inside-of-rename.txt

Does this change help fix gfs lock ordering problem as well?


-Russell Cattelan
[EMAIL PROTECTED]


Re: NFSv4/pNFS possible POSIX I/O API standards

2006-12-04 Thread Trond Myklebust
On Mon, 2006-12-04 at 00:32 -0700, Andreas Dilger wrote:
> > I'm wondering if a corresponding opendirplus() (or similar) would also be 
> > appropriate to inform the kernel/filesystem that readdirplus() will 
> > follow, and stat information should be gathered/buffered.  Or do most 
> > implementations wait for the first readdir() before doing any actual work 
> > anyway?
> 
> I'm not sure what some filesystems might do here.  I suppose NFS has weak
> enough cache semantics that it _might_ return stale cached data from the
> client in order to fill the readdirplus() data, but it is just as likely
> that it ships the whole thing to the server and returns everything in
> one shot.  That would imply everything would be at least as up-to-date
> as the opendir().

Whether or not the posix committee decides on readdirplus, I propose
that we implement this sort of thing in the kernel via a readdir
equivalent to posix_fadvise(). That can give exactly the barrier
semantics that they are asking for, and only costs 1 extra syscall as
opposed to 2 (opendirplus() and readdirplus()).

Cheers
  Trond



[PATCH 02/35] fsstack: Remove unneeded wrapper

2006-12-04 Thread Josef 'Jeff' Sipek
From: Andrew Morton <[EMAIL PROTECTED]>

Remove unneeded wrapper.

Cc: Josef "Jeff" Sipek <[EMAIL PROTECTED]>
Cc: Michael Halcrow <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
---
 fs/stack.c   |   10 --
 include/linux/fs_stack.h |   12 ++--
 2 files changed, 6 insertions(+), 16 deletions(-)

diff --git a/fs/stack.c b/fs/stack.c
index 5f6f12d..03987f2 100644
--- a/fs/stack.c
+++ b/fs/stack.c
@@ -11,13 +11,13 @@ void fsstack_copy_inode_size(struct inod
i_size_write(dst, i_size_read((struct inode *)src));
dst->i_blocks = src->i_blocks;
 }
+EXPORT_SYMBOL_GPL(fsstack_copy_inode_size);
 
 /* copy all attributes; get_nlinks is optional way to override the i_nlink
  * copying
  */
-void __fsstack_copy_attr_all(struct inode *dest,
-const struct inode *src,
-int (*get_nlinks)(struct inode *))
+void fsstack_copy_attr_all(struct inode *dest, const struct inode *src,
+   int (*get_nlinks)(struct inode *))
 {
if (!get_nlinks)
dest->i_nlink = src->i_nlink;
@@ -34,6 +34,4 @@ void __fsstack_copy_attr_all(struct inod
dest->i_blkbits = src->i_blkbits;
dest->i_flags = src->i_flags;
 }
-
-EXPORT_SYMBOL_GPL(fsstack_copy_inode_size);
-EXPORT_SYMBOL_GPL(__fsstack_copy_attr_all);
+EXPORT_SYMBOL_GPL(fsstack_copy_attr_all);
diff --git a/include/linux/fs_stack.h b/include/linux/fs_stack.h
index 56b3e09..bb516ce 100644
--- a/include/linux/fs_stack.h
+++ b/include/linux/fs_stack.h
@@ -8,9 +8,8 @@
 #include 
 
 /* externs for fs/stack.c */
-extern void __fsstack_copy_attr_all(struct inode *dest,
-   const struct inode *src,
-   int (*get_nlinks)(struct inode *));
+extern void fsstack_copy_attr_all(struct inode *dest, const struct inode *src,
+   int (*get_nlinks)(struct inode *));
 
 extern void fsstack_copy_inode_size(struct inode *dst, const struct inode *src);
 
@@ -29,11 +28,4 @@ static inline void fsstack_copy_attr_tim
dest->i_ctime = src->i_ctime;
 }
 
-static inline void fsstack_copy_attr_all(struct inode *dest,
-const struct inode *src)
-{
-   __fsstack_copy_attr_all(dest, src, NULL);
-}
-
 #endif /* _LINUX_FS_STACK_H */
-
-- 
1.4.3.3
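
The optional-callback convention that fsstack_copy_attr_all() exposes after this patch can be sketched in userspace as follows. This is an illustration only: struct mock_inode, mock_copy_attr_all and fixed_two_links are hypothetical stand-ins, and the else branch is an assumption, since the hunk above shows only the NULL case.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the few struct inode fields used here. */
struct mock_inode {
	unsigned int i_nlink;
	unsigned int i_blkbits;
	unsigned int i_flags;
};

/*
 * Same shape as fsstack_copy_attr_all(): a NULL get_nlinks means
 * "copy i_nlink straight from src"; otherwise the callback decides.
 * (The callback branch is assumed, not taken from the patch.)
 */
void mock_copy_attr_all(struct mock_inode *dest, const struct mock_inode *src,
			int (*get_nlinks)(struct mock_inode *))
{
	assert(dest != NULL && src != NULL);

	if (!get_nlinks)
		dest->i_nlink = src->i_nlink;
	else
		dest->i_nlink = (unsigned int)get_nlinks(dest);

	dest->i_blkbits = src->i_blkbits;
	dest->i_flags = src->i_flags;
}

/* Example override: a stacked filesystem that always reports two links. */
int fixed_two_links(struct mock_inode *inode)
{
	(void)inode;
	return 2;
}
```

With the wrapper gone, callers that previously used the two-argument form pass NULL explicitly as the third argument.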



[PATCH 10/35] fsstack: Make fsstack_copy_attr_all copy inode size

2006-12-04 Thread Josef 'Jeff' Sipek
From: Josef "Jeff" Sipek <[EMAIL PROTECTED]>

fsstack_copy_attr_all should copy the inode size in addition to all the
other attributes.

Signed-off-by: Josef "Jeff" Sipek <[EMAIL PROTECTED]>
---
 fs/stack.c |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/fs/stack.c b/fs/stack.c
index 8ffb880..5ddbc34 100644
--- a/fs/stack.c
+++ b/fs/stack.c
@@ -34,5 +34,7 @@ void fsstack_copy_attr_all(struct inode
dest->i_ctime = src->i_ctime;
dest->i_blkbits = src->i_blkbits;
dest->i_flags = src->i_flags;
+
+   fsstack_copy_inode_size(dest, src);
 }
 EXPORT_SYMBOL_GPL(fsstack_copy_attr_all);
-- 
1.4.3.3



[PATCH 09/35] fs/stack.c should #include

2006-12-04 Thread Josef 'Jeff' Sipek
From: Adrian Bunk <[EMAIL PROTECTED]>

Every file should #include the headers containing the prototypes for
its global functions.

Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>
Acked-by: Josef Sipek <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
---
 fs/stack.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/fs/stack.c b/fs/stack.c
index 03987f2..8ffb880 100644
--- a/fs/stack.c
+++ b/fs/stack.c
@@ -1,5 +1,6 @@
 #include 
 #include 
+#include 
 
 /* does _NOT_ require i_mutex to be held.
  *
-- 
1.4.3.3



[PATCH 29/35] Unionfs: Superblock operations

2006-12-04 Thread Josef 'Jeff' Sipek
From: Josef "Jeff" Sipek <[EMAIL PROTECTED]>

This patch contains the superblock operations for Unionfs.

Signed-off-by: Josef "Jeff" Sipek <[EMAIL PROTECTED]>
Signed-off-by: David Quigley <[EMAIL PROTECTED]>
Signed-off-by: Erez Zadok <[EMAIL PROTECTED]>
---
 fs/unionfs/super.c |  342 
 1 files changed, 342 insertions(+), 0 deletions(-)

diff --git a/fs/unionfs/super.c b/fs/unionfs/super.c
new file mode 100644
index 000..9920079
--- /dev/null
+++ b/fs/unionfs/super.c
@@ -0,0 +1,342 @@
+/*
+ * Copyright (c) 2003-2006 Erez Zadok
+ * Copyright (c) 2003-2006 Charles P. Wright
+ * Copyright (c) 2005-2006 Josef 'Jeff' Sipek
+ * Copyright (c) 2005-2006 Junjiro Okajima
+ * Copyright (c) 2005  Arun M. Krishnakumar
+ * Copyright (c) 2004-2006 David P. Quigley
+ * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
+ * Copyright (c) 2003  Puja Gupta
+ * Copyright (c) 2003  Harikesavan Krishnan
+ * Copyright (c) 2003-2006 Stony Brook University
+ * Copyright (c) 2003-2006 The Research Foundation of State University of New York
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include "union.h"
+
+/* The inode cache is used with alloc_inode for both our inode info and the
+ * vfs inode.  */
+static struct kmem_cache *unionfs_inode_cachep;
+
+static void unionfs_read_inode(struct inode *inode)
+{
+   static struct address_space_operations unionfs_empty_aops;
+   int size;
+   struct unionfs_inode_info *info = UNIONFS_I(inode);
+
+   if (!info) {
+   printk(KERN_ERR "No kernel memory when allocating inode "
+   "private data!\n");
+   BUG();
+   }
+
+   memset(info, 0, offsetof(struct unionfs_inode_info, vfs_inode));
+   info->bstart = -1;
+   info->bend = -1;
+   atomic_set(&info->generation,
+  atomic_read(&UNIONFS_SB(inode->i_sb)->generation));
+   spin_lock_init(&info->rdlock);
+   info->rdcount = 1;
+   info->hashsize = -1;
+   INIT_LIST_HEAD(&info->readdircache);
+
+   size = sbmax(inode->i_sb) * sizeof(struct inode *);
+   info->lower_inodes = kzalloc(size, GFP_KERNEL);
+   if (!info->lower_inodes) {
+   printk(KERN_ERR "No kernel memory when allocating lower-"
+   "pointer array!\n");
+   BUG();
+   }
+
+   inode->i_version++;
+   inode->i_op = &unionfs_main_iops;
+   inode->i_fop = &unionfs_main_fops;
+
+   /* I don't think ->a_ops is ever allowed to be NULL */
+   inode->i_mapping->a_ops = &unionfs_empty_aops;
+}
+
+static void unionfs_put_inode(struct inode *inode)
+{
+   /*
+* This is really funky stuff:
+* Basically, if i_count == 1, iput will then decrement it and this
+* inode will be destroyed.  It is currently holding a reference to the
+* hidden inode.  Therefore, it needs to release that reference by
+* calling iput on the hidden inode.  iput() _will_ do it for us (by
+* calling our clear_inode), but _only_ if i_nlink == 0.  The problem
+* is, NFS keeps i_nlink == 1 for silly_rename'd files.  So we must force
+* our i_nlink to 0 here to trick iput() into calling our clear_inode.
+*/
+
+   if (atomic_read(&inode->i_count) == 1)
+   inode->i_nlink = 0;
+}
+
+/*
+ * we now define delete_inode, because there are two VFS paths that may
+ * destroy an inode: one of them calls clear inode before doing everything
+ * else that's needed, and the other is fine.  This way we truncate the inode
+ * size (and its pages) and then clear our own inode, which will do an iput
+ * on our and the lower inode.
+ */
+static void unionfs_delete_inode(struct inode *inode)
+{
+   inode->i_size = 0;  /* every f/s seems to do that */
+
+   clear_inode(inode);
+}
+
+/* final actions when unmounting a file system */
+static void unionfs_put_super(struct super_block *sb)
+{
+   int bindex, bstart, bend;
+   struct unionfs_sb_info *spd;
+
+   spd = UNIONFS_SB(sb);
+   if (!spd)
+   return;
+   
+   bstart = sbstart(sb);
+   bend = sbend(sb);
+
+   /* Make sure we have no leaks of branchget/branchput. */
+   for (bindex = bstart; bindex <= bend; bindex++)
+   BUG_ON(branch_count(sb, bindex) != 0);
+
+   kfree(spd->data);
+   kfree(spd);
+   sb->s_fs_info = NULL;
+}
+
+/* Since people use this to answer the "How big of a file can I write?"
+ * question, we report the size of the highest priority branch as the size of
+ * the union. */
+static int unionfs_statfs(struct dentry *dentry, struct kstatfs *buf)
+{
+   int err = 0;
+   struct super_block *sb, *hidden_sb;
+
+   sb = dentry->d_sb;
+
+   hidden_sb = unionfs_lower_supe

[PATCH 24/35] Unionfs: Readdir state

2006-12-04 Thread Josef 'Jeff' Sipek
From: Josef "Jeff" Sipek <[EMAIL PROTECTED]>

This file contains the routines for maintaining readdir state.

Signed-off-by: Josef "Jeff" Sipek <[EMAIL PROTECTED]>
Signed-off-by: David Quigley <[EMAIL PROTECTED]>
Signed-off-by: Erez Zadok <[EMAIL PROTECTED]>
---
 fs/unionfs/rdstate.c |  282 ++
 1 files changed, 282 insertions(+), 0 deletions(-)

diff --git a/fs/unionfs/rdstate.c b/fs/unionfs/rdstate.c
new file mode 100644
index 000..5a19157
--- /dev/null
+++ b/fs/unionfs/rdstate.c
@@ -0,0 +1,282 @@
+/*
+ * Copyright (c) 2003-2006 Erez Zadok
+ * Copyright (c) 2003-2006 Charles P. Wright
+ * Copyright (c) 2005-2006 Josef 'Jeff' Sipek
+ * Copyright (c) 2005-2006 Junjiro Okajima
+ * Copyright (c) 2005  Arun M. Krishnakumar
+ * Copyright (c) 2004-2006 David P. Quigley
+ * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
+ * Copyright (c) 2003  Puja Gupta
+ * Copyright (c) 2003  Harikesavan Krishnan
+ * Copyright (c) 2003-2006 Stony Brook University
+ * Copyright (c) 2003-2006 The Research Foundation of State University of New York
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include "union.h"
+
+/* This file contains the routines for maintaining readdir state. */
+/* There are two structures here, rdstate which is a hash table
+ * of the second structure which is a filldir_node. */
+
+/* This is a struct kmem_cache for filldir nodes, because we allocate a lot
+ * of them and they shouldn't waste memory.  If the node has a small name
+ * (as defined by the dentry structure), then we use an inline name to
+ * preserve kmalloc space. */
+static struct kmem_cache *unionfs_filldir_cachep;
+int init_filldir_cache(void)
+{
+   unionfs_filldir_cachep =
+   kmem_cache_create("unionfs_filldir", sizeof(struct filldir_node), 0,
+ SLAB_RECLAIM_ACCOUNT, NULL, NULL);
+
+   return (unionfs_filldir_cachep ? 0 : -ENOMEM);
+}
+
+void destroy_filldir_cache(void)
+{
+   if (unionfs_filldir_cachep)
+   kmem_cache_destroy(unionfs_filldir_cachep);
+}
+
+/* This is a tuning parameter that tells us roughly how big to make the
+ * hash table in directory entries per page.  This isn't perfect, but
+ * at least we get a hash table size that shouldn't be too overloaded.
+ * The following averages are based on my home directory.
+ * 14.44693  Overall
+ * 12.29   Single Page Directories
+ * 117.93  Multi-page directories
+ */
+#define DENTPAGE 4096
+#define DENTPERONEPAGE 12
+#define DENTPERPAGE 118
+#define MINHASHSIZE 1
+static int guesstimate_hash_size(struct inode *inode)
+{
+   struct inode *hidden_inode;
+   int bindex;
+   int hashsize = MINHASHSIZE;
+
+   if (UNIONFS_I(inode)->hashsize > 0)
+   return UNIONFS_I(inode)->hashsize;
+
+   for (bindex = ibstart(inode); bindex <= ibend(inode); bindex++) {
+   if (!(hidden_inode = unionfs_lower_inode_idx(inode, bindex)))
+   continue;
+
+   if (hidden_inode->i_size == DENTPAGE)
+   hashsize += DENTPERONEPAGE;
+   else
+   hashsize += (hidden_inode->i_size / DENTPAGE) * DENTPERPAGE;
+   }
+
+   return hashsize;
+}
+
+int init_rdstate(struct file *file)
+{
+   BUG_ON(sizeof(loff_t) != (sizeof(unsigned int) + sizeof(unsigned int)));
+   BUG_ON(UNIONFS_F(file)->rdstate != NULL);
+
+   UNIONFS_F(file)->rdstate = alloc_rdstate(file->f_dentry->d_inode,
+fbstart(file));
+   
+   return (UNIONFS_F(file)->rdstate ? 0 : -ENOMEM);
+}
+
+struct unionfs_dir_state *find_rdstate(struct inode *inode, loff_t fpos)
+{
+   struct unionfs_dir_state *rdstate = NULL;
+   struct list_head *pos;
+
+   spin_lock(&UNIONFS_I(inode)->rdlock);
+   list_for_each(pos, &UNIONFS_I(inode)->readdircache) {
+   struct unionfs_dir_state *r =
+   list_entry(pos, struct unionfs_dir_state, cache);
+   if (fpos == rdstate2offset(r)) {
+   UNIONFS_I(inode)->rdcount--;
+   list_del(&r->cache);
+   rdstate = r;
+   break;
+   }
+   }
+   spin_unlock(&UNIONFS_I(inode)->rdlock);
+   return rdstate;
+}
+
+struct unionfs_dir_state *alloc_rdstate(struct inode *inode, int bindex)
+{
+   int i = 0;
+   int hashsize;
+   int mallocsize = sizeof(struct unionfs_dir_state);
+   struct unionfs_dir_state *rdstate;
+
+   hashsize = guesstimate_hash_size(inode);
+   mallocsize += hashsize * sizeof(struct list_head);
+   /* Round it up to the next highest power of two. */
+   mallocsize--;
+   mallocsize |= mallocsize >> 1;
+   mallocsize |= mallocsize >> 2;
+   ma
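
The sizing logic above can be tried out in userspace with the sketch below. The constants are copied from the patch; guess_hash_size and round_up_pow2 are hypothetical names, the lower-directory sizes are passed in as a plain array instead of being read from inodes, and the shifts beyond ">> 2" are an assumption (the usual bit-smearing idiom for 32-bit values), since the patch text is cut off at that point.

```c
#include <assert.h>
#include <stddef.h>

#define DENTPAGE	4096
#define DENTPERONEPAGE	12
#define DENTPERPAGE	118
#define MINHASHSIZE	1

/* Estimate hash buckets from the byte sizes of the lower directories. */
int guess_hash_size(const long long *dir_sizes, size_t n)
{
	int hashsize = MINHASHSIZE;
	size_t i;

	for (i = 0; i < n; i++) {
		if (dir_sizes[i] == DENTPAGE)
			hashsize += DENTPERONEPAGE;
		else
			hashsize += (int)(dir_sizes[i] / DENTPAGE) * DENTPERPAGE;
	}
	return hashsize;
}

/*
 * Round v up to a power of two (v itself if it already is one).
 * Valid for 1 <= v <= 2^31.
 */
unsigned int round_up_pow2(unsigned int v)
{
	assert(v != 0);
	v--;		/* so that exact powers of two map to themselves */
	v |= v >> 1;	/* smear the highest set bit into every lower bit */
	v |= v >> 2;
	v |= v >> 4;
	v |= v >> 8;
	v |= v >> 16;
	return v + 1;
}
```

Note that a directory smaller than one page contributes nothing beyond MINHASHSIZE, because the integer division yields zero.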

[PATCH 04/35] fsstack: Fix up eCryptfs compilation

2006-12-04 Thread Josef 'Jeff' Sipek
From: Josef "Jeff" Sipek <[EMAIL PROTECTED]>

The fsstack tidy patch broke eCryptfs. This patch makes eCryptfs compile
again.

Signed-off-by: Josef "Jeff" Sipek <[EMAIL PROTECTED]>
---
 fs/ecryptfs/inode.c |6 +++---
 fs/ecryptfs/main.c  |2 +-
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/ecryptfs/inode.c b/fs/ecryptfs/inode.c
index 3e2a786..d798d9f 100644
--- a/fs/ecryptfs/inode.c
+++ b/fs/ecryptfs/inode.c
@@ -589,9 +589,9 @@ ecryptfs_rename(struct inode *old_dir, s
lower_new_dir_dentry->d_inode, lower_new_dentry);
if (rc)
goto out_lock;
-   fsstack_copy_attr_all(new_dir, lower_new_dir_dentry->d_inode);
+   fsstack_copy_attr_all(new_dir, lower_new_dir_dentry->d_inode, NULL);
if (new_dir != old_dir)
-   fsstack_copy_attr_all(old_dir, lower_old_dir_dentry->d_inode);
+   fsstack_copy_attr_all(old_dir, lower_old_dir_dentry->d_inode, NULL);
 out_lock:
unlock_rename(lower_old_dir_dentry, lower_new_dir_dentry);
dput(lower_new_dentry->d_parent);
@@ -878,7 +878,7 @@ static int ecryptfs_setattr(struct dentr
}
rc = notify_change(lower_dentry, ia);
 out:
-   fsstack_copy_attr_all(inode, lower_inode);
+   fsstack_copy_attr_all(inode, lower_inode, NULL);
return rc;
 }
 
diff --git a/fs/ecryptfs/main.c b/fs/ecryptfs/main.c
index 5982931..a4aee57 100644
--- a/fs/ecryptfs/main.c
+++ b/fs/ecryptfs/main.c
@@ -113,7 +113,7 @@ int ecryptfs_interpose(struct dentry *lo
d_add(dentry, inode);
else
d_instantiate(dentry, inode);
-   fsstack_copy_attr_all(inode, lower_inode);
+   fsstack_copy_attr_all(inode, lower_inode, NULL);
/* This size will be overwritten for real files w/ headers and
 * other metadata */
fsstack_copy_inode_size(inode, lower_inode);
-- 
1.4.3.3



[PATCH 25/35] Unionfs: Rename

2006-12-04 Thread Josef 'Jeff' Sipek
From: Josef "Jeff" Sipek <[EMAIL PROTECTED]>

This patch provides rename functionality for Unionfs.

Signed-off-by: Josef "Jeff" Sipek <[EMAIL PROTECTED]>
Signed-off-by: David Quigley <[EMAIL PROTECTED]>
Signed-off-by: Erez Zadok <[EMAIL PROTECTED]>
---
 fs/unionfs/rename.c |  442 +++
 1 files changed, 442 insertions(+), 0 deletions(-)

diff --git a/fs/unionfs/rename.c b/fs/unionfs/rename.c
new file mode 100644
index 000..f0c3461
--- /dev/null
+++ b/fs/unionfs/rename.c
@@ -0,0 +1,442 @@
+/*
+ * Copyright (c) 2003-2006 Erez Zadok
+ * Copyright (c) 2003-2006 Charles P. Wright
+ * Copyright (c) 2005-2006 Josef 'Jeff' Sipek
+ * Copyright (c) 2005-2006 Junjiro Okajima
+ * Copyright (c) 2005  Arun M. Krishnakumar
+ * Copyright (c) 2004-2006 David P. Quigley
+ * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
+ * Copyright (c) 2003  Puja Gupta
+ * Copyright (c) 2003  Harikesavan Krishnan
+ * Copyright (c) 2003-2006 Stony Brook University
+ * Copyright (c) 2003-2006 The Research Foundation of State University of New York
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include "union.h"
+
+static int do_rename(struct inode *old_dir, struct dentry *old_dentry,
+struct inode *new_dir, struct dentry *new_dentry,
+int bindex, struct dentry **wh_old)
+{
+   int err = 0;
+   struct dentry *hidden_old_dentry;
+   struct dentry *hidden_new_dentry;
+   struct dentry *hidden_old_dir_dentry;
+   struct dentry *hidden_new_dir_dentry;
+   struct dentry *hidden_wh_dentry;
+   struct dentry *hidden_wh_dir_dentry;
+   char *wh_name = NULL;
+
+   hidden_new_dentry = unionfs_lower_dentry_idx(new_dentry, bindex);
+   hidden_old_dentry = unionfs_lower_dentry_idx(old_dentry, bindex);
+
+   if (!hidden_new_dentry) {
+   hidden_new_dentry =
+   create_parents(new_dentry->d_parent->d_inode, new_dentry, bindex);
+   if (IS_ERR(hidden_new_dentry)) {
+   printk(KERN_DEBUG "error creating directory tree for"
+ " rename, bindex = %d, err = %ld\n",
+ bindex, PTR_ERR(hidden_new_dentry));
+   err = PTR_ERR(hidden_new_dentry);
+   goto out;
+   }
+   }
+
+   wh_name = alloc_whname(new_dentry->d_name.name, new_dentry->d_name.len);
+   if (IS_ERR(wh_name)) {
+   err = PTR_ERR(wh_name);
+   goto out;
+   }
+
+   hidden_wh_dentry = lookup_one_len(wh_name, hidden_new_dentry->d_parent,
+   new_dentry->d_name.len + UNIONFS_WHLEN);
+   if (IS_ERR(hidden_wh_dentry)) {
+   err = PTR_ERR(hidden_wh_dentry);
+   goto out;
+   }
+
+   if (hidden_wh_dentry->d_inode) {
+   /* get rid of the whiteout that is existing */
+   if (hidden_new_dentry->d_inode) {
+   printk(KERN_WARNING "Both a whiteout and a dentry"
+   " exist when doing a rename!\n");
+   err = -EIO;
+
+   dput(hidden_wh_dentry);
+   goto out;
+   }
+
+   hidden_wh_dir_dentry = lock_parent(hidden_wh_dentry);
+   if (!(err = is_robranch_super(old_dentry->d_sb, bindex)))
+   err = vfs_unlink(hidden_wh_dir_dentry->d_inode,
+  hidden_wh_dentry);
+
+   dput(hidden_wh_dentry);
+   unlock_dir(hidden_wh_dir_dentry);
+   if (err)
+   goto out;
+   } else
+   dput(hidden_wh_dentry);
+
+   dget(hidden_old_dentry);
+   hidden_old_dir_dentry = dget_parent(hidden_old_dentry);
+   hidden_new_dir_dentry = dget_parent(hidden_new_dentry);
+
+   lock_rename(hidden_old_dir_dentry, hidden_new_dir_dentry);
+
+   err = is_robranch_super(old_dentry->d_sb, bindex);
+   if (err)
+   goto out_unlock;
+
+   /* ready to whiteout for old_dentry. caller will create the actual
+* whiteout, and must dput(*wh_old) */
+   if (wh_old) {
+   char *whname;
+   whname = alloc_whname(old_dentry->d_name.name,
+ old_dentry->d_name.len);
+   err = PTR_ERR(whname);
+   if (IS_ERR(whname))
+   goto out_unlock;
+   *wh_old = lookup_one_len(whname, hidden_old_dir_dentry,
+old_dentry->d_name.len + UNIONFS_WHLEN);
+   kfree(whname);
+   err = PTR_ERR(*wh_old);
+   if (IS_ERR(*wh_old)) {
+  

[PATCH 27/35] Unionfs: Handling of stale inodes

2006-12-04 Thread Josef 'Jeff' Sipek
From: Josef "Jeff" Sipek <[EMAIL PROTECTED]>

Provides nicer handling of stale inodes.

Signed-off-by: Josef "Jeff" Sipek <[EMAIL PROTECTED]>
Signed-off-by: David Quigley <[EMAIL PROTECTED]>
Signed-off-by: Erez Zadok <[EMAIL PROTECTED]>
---
 fs/unionfs/stale_inode.c |  114 ++
 1 files changed, 114 insertions(+), 0 deletions(-)

diff --git a/fs/unionfs/stale_inode.c b/fs/unionfs/stale_inode.c
new file mode 100644
index 000..4255dfd
--- /dev/null
+++ b/fs/unionfs/stale_inode.c
@@ -0,0 +1,114 @@
+/*
+ *  Adapted from linux/fs/bad_inode.c
+ *
+ *  Copyright (C) 1997, Stephen Tweedie
+ *
+ *  Provide stub functions for "stale" inodes, a bit friendlier than the
+ *  -EIO that bad_inode.c does.
+ */
+
+#include 
+
+#include 
+#include 
+#include 
+
+static struct address_space_operations unionfs_stale_aops;
+
+/* declarations for "sparse" */
+extern struct inode_operations stale_inode_ops;
+
+/*
+ * The follow_link operation is special: it must behave as a no-op
+ * so that a stale root inode can at least be unmounted. To do this
+ * we must dput() the base and return the dentry with a dget().
+ */
+static void *stale_follow_link(struct dentry *dent, struct nameidata *nd)
+{
+   return ERR_PTR(vfs_follow_link(nd, ERR_PTR(-ESTALE)));
+}
+
+static int return_ESTALE(void)
+{
+   return -ESTALE;
+}
+
+#define ESTALE_ERROR ((void *) (return_ESTALE))
+
+static struct file_operations stale_file_ops = {
+   .llseek = ESTALE_ERROR,
+   .read = ESTALE_ERROR,
+   .write = ESTALE_ERROR,
+   .readdir = ESTALE_ERROR,
+   .poll = ESTALE_ERROR,
+   .ioctl = ESTALE_ERROR,
+   .mmap = ESTALE_ERROR,
+   .open = ESTALE_ERROR,
+   .flush = ESTALE_ERROR,
+   .release = ESTALE_ERROR,
+   .fsync = ESTALE_ERROR,
+   .fasync = ESTALE_ERROR,
+   .lock = ESTALE_ERROR,
+};
+
+struct inode_operations stale_inode_ops = {
+   .create = ESTALE_ERROR,
+   .lookup = ESTALE_ERROR,
+   .link = ESTALE_ERROR,
+   .unlink = ESTALE_ERROR,
+   .symlink = ESTALE_ERROR,
+   .mkdir = ESTALE_ERROR,
+   .rmdir = ESTALE_ERROR,
+   .mknod = ESTALE_ERROR,
+   .rename = ESTALE_ERROR,
+   .readlink = ESTALE_ERROR,
+   .follow_link = stale_follow_link,
+   .truncate = ESTALE_ERROR,
+   .permission = ESTALE_ERROR,
+};
+
+/*
+ * When a filesystem is unable to read an inode due to an I/O error in
+ * its read_inode() function, it can call make_stale_inode() to return a
+ * set of stubs which will return ESTALE errors as required.
+ *
+ * We only need to do limited initialisation: all other fields are
+ * preinitialised to zero automatically.
+ */
+
+/**
+ * make_stale_inode - mark an inode stale due to an I/O error
+ * @inode: Inode to mark stale
+ *
+ * When an inode cannot be read due to a media or remote network
+ * failure this function makes the inode "stale" and causes I/O operations
+ * on it to fail from this point on.
+ */
+
+void make_stale_inode(struct inode *inode)
+{
+   inode->i_mode = S_IFREG;
+   inode->i_atime = inode->i_mtime = inode->i_ctime = CURRENT_TIME;
+   inode->i_op = &stale_inode_ops;
+   inode->i_fop = &stale_file_ops;
+   inode->i_mapping->a_ops = &unionfs_stale_aops;
+}
+
+/*
+ * This tests whether an inode has been flagged as stale. The test uses
+ * &stale_inode_ops to cover the case of invalidated inodes as well as
+ * those created by make_stale_inode() above.
+ */
+
+/**
+ * is_stale_inode - is an inode errored
+ * @inode: inode to test
+ *
+ * Returns true if the inode in question has been marked as stale.
+ */
+
+int is_stale_inode(struct inode *inode)
+{
+   return (inode->i_op == &stale_inode_ops);
+}
+
-- 
1.4.3.3
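
The stub-table technique used by this patch can be sketched in userspace as below. The types demo_ops and demo_inode and the functions make_stale and is_stale are hypothetical simplifications: the kernel code casts one function to void * to fill slots with differing signatures, whereas this sketch gives every slot the same signature so that no cast is needed.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, much-reduced operations table. */
struct demo_ops {
	int (*read)(void);
	int (*write)(void);
};

struct demo_inode {
	const struct demo_ops *i_op;
};

/* Every operation on a stale inode fails the same way. */
static int return_estale(void)
{
	return -ESTALE;
}

static const struct demo_ops stale_ops = {
	.read = return_estale,
	.write = return_estale,
};

/* Point the inode at the stub table; later calls all return -ESTALE. */
void make_stale(struct demo_inode *inode)
{
	assert(inode != NULL);
	inode->i_op = &stale_ops;
}

/* Staleness is detected by identity of the table, not by a flag. */
int is_stale(const struct demo_inode *inode)
{
	assert(inode != NULL);
	return inode->i_op == &stale_ops;
}
```

Comparing the table pointer means that no extra state bit is needed in the inode, at the cost of requiring a single shared stub table.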



[PATCH 11/35] fsstack: Fix up ecryptfs's fsstack usage

2006-12-04 Thread Josef 'Jeff' Sipek
From: Josef "Jeff" Sipek <[EMAIL PROTECTED]>

Fix up a stray ecryptfs_copy_attr_all call and remove prototypes for
ecryptfs_copy_* as they no longer exist.

Signed-off-by: Josef "Jeff" Sipek <[EMAIL PROTECTED]>
---
 fs/ecryptfs/dentry.c  |2 +-
 fs/ecryptfs/ecryptfs_kernel.h |4 +---
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/fs/ecryptfs/dentry.c b/fs/ecryptfs/dentry.c
index 52d1e36..b0352d8 100644
--- a/fs/ecryptfs/dentry.c
+++ b/fs/ecryptfs/dentry.c
@@ -61,7 +61,7 @@ static int ecryptfs_d_revalidate(struct
struct inode *lower_inode =
ecryptfs_inode_to_lower(dentry->d_inode);
 
-   ecryptfs_copy_attr_all(dentry->d_inode, lower_inode);
+   fsstack_copy_attr_all(dentry->d_inode, lower_inode, NULL);
}
 out:
return rc;
diff --git a/fs/ecryptfs/ecryptfs_kernel.h b/fs/ecryptfs/ecryptfs_kernel.h
index 870a65b..afb64bd 100644
--- a/fs/ecryptfs/ecryptfs_kernel.h
+++ b/fs/ecryptfs/ecryptfs_kernel.h
@@ -28,6 +28,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -413,9 +414,6 @@ int ecryptfs_encode_filename(struct ecry
 const char *name, int length,
 char **encoded_name);
 struct dentry *ecryptfs_lower_dentry(struct dentry *this_dentry);
-void ecryptfs_copy_attr_atime(struct inode *dest, const struct inode *src);
-void ecryptfs_copy_attr_all(struct inode *dest, const struct inode *src);
-void ecryptfs_copy_inode_size(struct inode *dst, const struct inode *src);
 void ecryptfs_dump_hex(char *data, int bytes);
 int virt_to_scatterlist(const void *addr, int size, struct scatterlist *sg,
int sg_size);
-- 
1.4.3.3
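
Editorial sketch for the NULL third argument in the call above. The patch does
not show what that argument means; the model below assumes it is an optional
callback that computes the link count, with NULL meaning "copy the lower
inode's count". This is a userspace illustration under that assumption, and
struct mini_inode, copy_attr_all() and nlinks_two() are hypothetical names,
not the fsstack API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, cut-down inode holding only a few attributes. */
struct mini_inode {
	unsigned int mode;
	unsigned int nlink;
	unsigned int uid;
	unsigned int gid;
};

/* Optional hook that computes the link count for the upper inode. */
typedef unsigned int (*get_nlinks_fn)(struct mini_inode *);

/* Example hook (hypothetical): always reports two links. */
unsigned int nlinks_two(struct mini_inode *inode)
{
	(void)inode;
	return 2;
}

/* Copy attributes from src to dest; assumed semantics of the third argument:
 * NULL copies src->nlink, otherwise the hook decides. */
void copy_attr_all(struct mini_inode *dest, const struct mini_inode *src,
		   get_nlinks_fn get_nlinks)
{
	assert(dest != NULL && src != NULL);
	dest->mode = src->mode;
	dest->uid = src->uid;
	dest->gid = src->gid;
	dest->nlink = get_nlinks ? get_nlinks(dest) : src->nlink;
}
```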



[PATCH 34/35] Unionfs: Kconfig and Makefile

2006-12-04 Thread Josef 'Jeff' Sipek
From: Josef "Jeff" Sipek <[EMAIL PROTECTED]>

This patch contains the changes to the fs Kconfig file, the Makefiles, and the
MAINTAINERS file for Unionfs.

Signed-off-by: Josef "Jeff" Sipek <[EMAIL PROTECTED]>
Signed-off-by: David Quigley <[EMAIL PROTECTED]>
Signed-off-by: Erez Zadok <[EMAIL PROTECTED]>
---
 MAINTAINERS |7 +++
 fs/Kconfig  |   10 ++
 fs/Makefile |1 +
 fs/unionfs/Makefile |5 +
 4 files changed, 23 insertions(+), 0 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 8385a69..7d9ebb0 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3051,6 +3051,13 @@ L:   linux-kernel@vger.kernel.org
 W: http://www.kernel.dk
 S: Maintained
 
+UNIONFS
+P: Josef "Jeff" Sipek
+M: [EMAIL PROTECTED]
+L: [EMAIL PROTECTED]
+W: http://www.unionfs.org
+S: Maintained
+
 USB ACM DRIVER
 P: Oliver Neukum
 M: [EMAIL PROTECTED]
diff --git a/fs/Kconfig b/fs/Kconfig
index b3b5aa0..4b31ea4 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -1557,6 +1557,16 @@ config UFS_DEBUG
  Y here.  This will result in _many_ additional debugging messages to be
  written to the system log.
 
+config UNION_FS
+   tristate "Union file system (EXPERIMENTAL)"
+   depends on EXPERIMENTAL
+   help
+ Unionfs is a stackable unification file system, which appears to
+ merge the contents of several directories (branches), while keeping
+ their physical content separate.
+
+ See  for details
+
 endmenu
 
 menu "Network File Systems"
diff --git a/fs/Makefile b/fs/Makefile
index b9ffa63..76c6acc 100644
--- a/fs/Makefile
+++ b/fs/Makefile
@@ -115,3 +115,4 @@ obj-$(CONFIG_HPPFS) += hppfs/
 obj-$(CONFIG_DEBUG_FS) += debugfs/
 obj-$(CONFIG_OCFS2_FS) += ocfs2/
 obj-$(CONFIG_GFS2_FS)   += gfs2/
+obj-$(CONFIG_UNION_FS) += unionfs/
diff --git a/fs/unionfs/Makefile b/fs/unionfs/Makefile
new file mode 100644
index 000..25dd78f
--- /dev/null
+++ b/fs/unionfs/Makefile
@@ -0,0 +1,5 @@
+obj-$(CONFIG_UNION_FS) += unionfs.o
+
+unionfs-y := subr.o dentry.o file.o inode.o main.o super.o \
+   stale_inode.o branchman.o rdstate.o copyup.o dirhelper.o \
+   rename.o unlink.o lookup.o commonfops.o dirfops.o sioq.o
-- 
1.4.3.3



[PATCH 28/35] Unionfs: Miscellaneous helper functions

2006-12-04 Thread Josef 'Jeff' Sipek
From: Josef "Jeff" Sipek <[EMAIL PROTECTED]>

This patch contains miscellaneous helper functions used throughout Unionfs.

Signed-off-by: Josef "Jeff" Sipek <[EMAIL PROTECTED]>
Signed-off-by: David Quigley <[EMAIL PROTECTED]>
Signed-off-by: Erez Zadok <[EMAIL PROTECTED]>
---
 fs/unionfs/subr.c |  170 +
 1 files changed, 170 insertions(+), 0 deletions(-)

diff --git a/fs/unionfs/subr.c b/fs/unionfs/subr.c
new file mode 100644
index 000..f002f19
--- /dev/null
+++ b/fs/unionfs/subr.c
@@ -0,0 +1,170 @@
+/*
+ * Copyright (c) 2003-2006 Erez Zadok
+ * Copyright (c) 2003-2006 Charles P. Wright
+ * Copyright (c) 2005-2006 Josef 'Jeff' Sipek
+ * Copyright (c) 2005-2006 Junjiro Okajima
+ * Copyright (c) 2005  Arun M. Krishnakumar
+ * Copyright (c) 2004-2006 David P. Quigley
+ * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
+ * Copyright (c) 2003  Puja Gupta
+ * Copyright (c) 2003  Harikesavan Krishnan
+ * Copyright (c) 2003-2006 Stony Brook University
+ * Copyright (c) 2003-2006 The Research Foundation of State University of New York
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include "union.h"
+
+/* Pass an unionfs dentry and an index.  It will try to create a whiteout
+ * for the filename in dentry, and will try in branch 'index'.  On error,
+ * it will proceed to a branch to the left.
+ */
+int create_whiteout(struct dentry *dentry, int start)
+{
+   int bstart, bend, bindex;
+   struct dentry *hidden_dir_dentry;
+   struct dentry *hidden_dentry;
+   struct dentry *hidden_wh_dentry;
+   char *name = NULL;
+   int err = -EINVAL;
+
+   verify_locked(dentry);
+
+   bstart = dbstart(dentry);
+   bend = dbend(dentry);
+
+   /* create dentry's whiteout equivalent */
+   name = alloc_whname(dentry->d_name.name, dentry->d_name.len);
+   if (IS_ERR(name)) {
+   err = PTR_ERR(name);
+   goto out;
+   }
+
+   for (bindex = start; bindex >= 0; bindex--) {
+   hidden_dentry = unionfs_lower_dentry_idx(dentry, bindex);
+
+   if (!hidden_dentry) {
+   /* if hidden dentry is not present, create the entire
+* hidden dentry directory structure and go ahead.
+* Since we want to just create whiteout, we only want
+* the parent dentry, and hence get rid of this dentry.
+*/
+   hidden_dentry = create_parents(dentry->d_inode,
+  dentry, bindex);
+   if (!hidden_dentry || IS_ERR(hidden_dentry)) {
+   printk(KERN_DEBUG "create_parents failed for "
+   "bindex = %d\n", bindex);
+   continue;
+   }
+   }
+
+   hidden_wh_dentry = lookup_one_len(name, hidden_dentry->d_parent,
+   dentry->d_name.len + UNIONFS_WHLEN);
+   if (IS_ERR(hidden_wh_dentry))
+   continue;
+
+   /* The whiteout already exists. This used to be impossible, but
+* now is possible because of opaqueness. */
+   if (hidden_wh_dentry->d_inode) {
+   dput(hidden_wh_dentry);
+   err = 0;
+   goto out;
+   }
+
+   hidden_dir_dentry = lock_parent(hidden_wh_dentry);
+   if (!(err = is_robranch_super(dentry->d_sb, bindex))) {
+   err = vfs_create(hidden_dir_dentry->d_inode,
+  hidden_wh_dentry,
+  ~current->fs->umask & S_IRWXUGO, NULL);
+
+   }
+   unlock_dir(hidden_dir_dentry);
+   dput(hidden_wh_dentry);
+
+   if (!err || !IS_COPYUP_ERR(err))
+   break;
+   }
+
+   /* set dbopaque  so that lookup will not proceed after this branch */
+   if (!err)
+   set_dbopaque(dentry, bindex);
+
+out:
+   kfree(name);
+   return err;
+}
+
+/* This is a helper function for rename, which ends up with hosed over dentries
+ * when it needs to revert. */
+int unionfs_refresh_hidden_dentry(struct dentry *dentry, int bindex)
+{
+   struct dentry *hidden_dentry;
+   struct dentry *hidden_parent;
+   int err = 0;
+
+   verify_locked(dentry);
+
+   lock_dentry(dentry->d_parent);
+   hidden_parent = unionfs_lower_dentry_idx(dentry->d_parent, bindex);
+   unlock_dentry(dentry->d_parent);
+
+   BUG_ON(!S_ISDIR(hidden_parent->d_inode->i_mode));
+
+   hidden_dentry = lookup_one_len(dentry->d_name.name, hidden_pare
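
Editorial sketch for create_whiteout() above: the loop starts at branch
'start' and walks left (towards index 0), stopping on success or on any error
that is not a copyup-type error. The userspace model below mirrors only that
control flow. It assumes that a read-only branch is the copyup-type failure;
the patch excerpt does not define IS_COPYUP_ERR, so is_copyup_err(),
try_create(), enum branch_state and create_leftward() are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical per-branch state used to simulate the outcome of a create. */
enum branch_state { BRANCH_RW, BRANCH_RO, BRANCH_IOERR };

/* Simulated create: 0 on success, negative errno on failure. */
static int try_create(enum branch_state state)
{
	switch (state) {
	case BRANCH_RW:
		return 0;
	case BRANCH_RO:
		return -EROFS;
	default:
		return -EIO;
	}
}

/* Assumption: only read-only failures justify trying the next branch. */
static int is_copyup_err(int err)
{
	return err == -EROFS;
}

/* Try branch 'start', then each branch to its left. On success store the
 * branch index in *used and return 0; otherwise return the last error. */
int create_leftward(const enum branch_state *branches, int start, int *used)
{
	int err = -EINVAL;
	int bindex;

	assert(branches != NULL && used != NULL);
	for (bindex = start; bindex >= 0; bindex--) {
		err = try_create(branches[bindex]);
		if (!err || !is_copyup_err(err))
			break;
	}
	if (!err)
		*used = bindex;
	return err;
}
```

A hard error such as -EIO ends the walk immediately, while a run of read-only
branches is skipped until a writable one is found or branch 0 has been tried.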

[PATCH 14/35] Unionfs: Branch management functionality

2006-12-04 Thread Josef 'Jeff' Sipek
From: Josef "Jeff" Sipek <[EMAIL PROTECTED]>

This patch contains the ioctls to increase the union generation and to query
which branch a file exists on.

Signed-off-by: Josef "Jeff" Sipek <[EMAIL PROTECTED]>
Signed-off-by: David Quigley <[EMAIL PROTECTED]>
Signed-off-by: Erez Zadok <[EMAIL PROTECTED]>
---
 fs/unionfs/branchman.c |   81 
 1 files changed, 81 insertions(+), 0 deletions(-)

diff --git a/fs/unionfs/branchman.c b/fs/unionfs/branchman.c
new file mode 100644
index 000..168c5d5
--- /dev/null
+++ b/fs/unionfs/branchman.c
@@ -0,0 +1,81 @@
+/*
+ * Copyright (c) 2003-2006 Erez Zadok
+ * Copyright (c) 2003-2006 Charles P. Wright
+ * Copyright (c) 2005-2006 Josef 'Jeff' Sipek
+ * Copyright (c) 2005-2006 Junjiro Okajima
+ * Copyright (c) 2005  Arun M. Krishnakumar
+ * Copyright (c) 2004-2006 David P. Quigley
+ * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
+ * Copyright (c) 2003  Puja Gupta
+ * Copyright (c) 2003  Harikesavan Krishnan
+ * Copyright (c) 2003-2006 Stony Brook University
+ * Copyright (c) 2003-2006 The Research Foundation of State University of New York
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include "union.h"
+
+/* increase the superblock generation count; effectively invalidating every
+ * upper inode, dentry and file object */
+int unionfs_ioctl_incgen(struct file *file, unsigned int cmd, unsigned long arg)
+{
+   struct super_block *sb;
+   int gen;
+
+   sb = file->f_dentry->d_sb;
+
+   unionfs_write_lock(sb);
+
+   gen = atomic_inc_return(&UNIONFS_SB(sb)->generation);
+
+   atomic_set(&UNIONFS_D(sb->s_root)->generation, gen);
+   atomic_set(&UNIONFS_I(sb->s_root->d_inode)->generation, gen);
+
+   unionfs_write_unlock(sb);
+
+   return gen;
+}
+
+/* return to userspace the branch indices containing the file in question
+ *
+ * We use fd_set and therefore we are limited to the number of the branches
+ * to FD_SETSIZE, which is currently 1024 - plenty for most people
+ */
+int unionfs_ioctl_queryfile(struct file *file, unsigned int cmd,
+   unsigned long arg)
+{
+   int err = 0;
+   fd_set branchlist;
+
+   int bstart = 0, bend = 0, bindex = 0;
+   struct dentry *dentry, *hidden_dentry;
+
+   dentry = file->f_dentry;
+   lock_dentry(dentry);
+   if ((err = unionfs_partial_lookup(dentry)))
+   goto out;
+   bstart = dbstart(dentry);
+   bend = dbend(dentry);
+
+   FD_ZERO(&branchlist);
+
+   for (bindex = bstart; bindex <= bend; bindex++) {
+   hidden_dentry = unionfs_lower_dentry_idx(dentry, bindex);
+   if (!hidden_dentry)
+   continue;
+   if (hidden_dentry->d_inode)
+   FD_SET(bindex, &branchlist);
+   }
+
+   err = copy_to_user((void __user *)arg, &branchlist, sizeof(fd_set));
+   if (err)
+   err = -EFAULT;
+
+out:
+   unlock_dentry(dentry);
+   return err < 0 ? err : bend;
+}
+
-- 
1.4.3.3



[PATCH 31/35] Unionfs: Internal include file

2006-12-04 Thread Josef 'Jeff' Sipek
From: Josef "Jeff" Sipek <[EMAIL PROTECTED]>

This patch contains an internal Unionfs include file. The include file is
specific to kernel code only, and therefore is separate from
include/linux/unionfs.h.

Signed-off-by: Josef "Jeff" Sipek <[EMAIL PROTECTED]>
Signed-off-by: David Quigley <[EMAIL PROTECTED]>
Signed-off-by: Erez Zadok <[EMAIL PROTECTED]>
---
 fs/unionfs/union.h |  479 
 1 files changed, 479 insertions(+), 0 deletions(-)

diff --git a/fs/unionfs/union.h b/fs/unionfs/union.h
new file mode 100644
index 000..ff0b814
--- /dev/null
+++ b/fs/unionfs/union.h
@@ -0,0 +1,479 @@
+/*
+ * Copyright (c) 2003-2006 Erez Zadok
+ * Copyright (c) 2003-2006 Charles P. Wright
+ * Copyright (c) 2005-2006 Josef Sipek
+ * Copyright (c) 2005  Arun M. Krishnakumar
+ * Copyright (c) 2004-2006 David P. Quigley
+ * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
+ * Copyright (c) 2003  Puja Gupta
+ * Copyright (c) 2003  Harikesavan Krishnan
+ * Copyright (c) 2003-2006 Stony Brook University
+ * Copyright (c) 2003-2006 The Research Foundation of State University of New York
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#ifndef _UNION_H_
+#define _UNION_H_
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+/* the file system name */
+#define UNIONFS_NAME "unionfs"
+
+/* unionfs file systems superblock magic */
+#define UNIONFS_SUPER_MAGIC 0xf15f083d
+
+/* unionfs root inode number */
+#define UNIONFS_ROOT_INO 1
+
+/* Mount time flags */
+#define MOUNT_FLAG(sb) (UNIONFS_SB(sb)->mount_flag)
+
+/* number of characters while generating unique temporary file names */
+#define UNIONFS_TMPNAM_LEN  12
+
+/* number of times we try to get a unique temporary file name */
+#define GET_TMPNAM_MAX_RETRY   5
+
+/* Operations vectors defined in specific files. */
+extern struct file_operations unionfs_main_fops;
+extern struct file_operations unionfs_dir_fops;
+extern struct inode_operations unionfs_main_iops;
+extern struct inode_operations unionfs_dir_iops;
+extern struct inode_operations unionfs_symlink_iops;
+extern struct super_operations unionfs_sops;
+extern struct dentry_operations unionfs_dops;
+
+/* How long should an entry be allowed to persist */
+#define RDCACHE_JIFFIES 5*HZ
+
+/* file private data. */
+struct unionfs_file_info {
+   int bstart;
+   int bend;
+   atomic_t generation;
+
+   struct unionfs_dir_state *rdstate;
+   struct file **lower_files;
+};
+
+/* unionfs inode data in memory */
+struct unionfs_inode_info {
+   int bstart;
+   int bend;
+   atomic_t generation;
+   int stale;
+   /* Stuff for readdir over NFS. */
+   spinlock_t rdlock;
+   struct list_head readdircache;
+   int rdcount;
+   int hashsize;
+   int cookie;
+   
+   /* The hidden inodes */
+   struct inode **lower_inodes;
+   /* to keep track of reads/writes for unlinks before closes */
+   atomic_t totalopens;
+
+   struct inode vfs_inode;
+};
+
+/* unionfs dentry data in memory */
+struct unionfs_dentry_info {
+   /* The semaphore is used to lock the dentry as soon as we get into a
+* unionfs function from the VFS.  Our lock ordering is that children
+* go before their parents. */
+   struct semaphore sem;
+   int bstart;
+   int bend;
+   int bopaque;
+   int bcount;
+   atomic_t generation;
+   struct path *lower_paths;
+};
+
+/* These are the pointers to our various objects. */
+struct unionfs_data {
+   struct super_block *sb;
+   struct vfsmount *hidden_mnt;
+   atomic_t sbcount;
+   int branchperms;
+};
+
+/* unionfs super-block data in memory */
+struct unionfs_sb_info {
+   int bend;
+
+   atomic_t generation;
+   unsigned long mount_flag;
+   struct rw_semaphore rwsem;
+
+   struct unionfs_data *data;
+};
+
+/*
+ * structure for making the linked list of entries by readdir on left branch
+ * to compare with entries on right branch
+ */
+struct filldir_node {
+   struct list_head file_list; /* list for directory entries */
+   char *name; /* name entry */
+   int hash;   /* name hash */
+   int namelen;/* name len since name is not 0 terminated */
+   int bindex; /* we can check for duplicate whiteouts and files in the same branch in order to return -EIO. */
+   int whiteout;   /* is this a whiteout entry? */
+   char iname[DNAME_INLINE_LEN_MIN];   /* Inline name, so we don't need to separately kmalloc small ones */
+};
+
+/*
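
Editorial sketch for struct unionfs_inode_info above: the VFS inode is
embedded as the member vfs_inode, so the private data can be recovered from a
plain struct inode pointer by subtracting the member offset. The excerpt does
not show how UNIONFS_I() is defined; the userspace model below assumes the
usual container-of arithmetic, and struct fs_inode_info and fs_info_of() are
hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the VFS inode. */
struct inode {
	unsigned long i_ino;
};

/* Filesystem-private data with the generic inode embedded inside it. */
struct fs_inode_info {
	int bstart;
	int bend;
	struct inode vfs_inode;
};

/* Recover the enclosing structure from a pointer to its embedded inode. */
struct fs_inode_info *fs_info_of(struct inode *inode)
{
	assert(inode != NULL);
	return (struct fs_inode_info *)
		((char *)inode - offsetof(struct fs_inode_info, vfs_inode));
}
```

Embedding rather than pointing means one allocation covers both objects and
the lookup needs no extra pointer field in the inode.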

Unionfs: Stackable namespace unification filesystem

2006-12-04 Thread Josef 'Jeff' Sipek
The following patches are in a git repo at:

git://git.kernel.org/pub/scm/linux/kernel/git/jsipek/unionfs.git

(master.kernel.org:/pub/scm/linux/kernel/git/jsipek/unionfs.git)

The repository contains the following 35 commits (also available as patches
in replies to this email).

Commits 1..9 (already in -mm):
These patches are already in Andrew Morton's -mm tree.

fsstack: Introduce fsstack_copy_{attr,inode}_*
fsstack: Remove unneeded wrapper
eCryptfs: Use fsstack's generic copy inode attr functions
fsstack: Fix up eCryptfs compilation
struct path: Rename Reiserfs's struct path
struct path: Rename DM's struct path
struct path: Move struct path from fs/namei.c into include/linux
struct path: make eCryptfs a user of struct path
fs/stack.c should #include 

Commits 10..11 (additional fixes to the above)
These patches are not yet in -mm, and they fix two things the above
patches missed.

fsstack: Make fsstack_copy_attr_all copy inode size
fsstack: Fix up ecryptfs's fsstack usage

Commits 12..35 (Unionfs):
These patches make up a stripped-down version of Unionfs. They depend
on fsstack (see above patches).

Unionfs: Documentation
lookup_one_len_nd - lookup_one_len with nameidata argument
Unionfs: Branch management functionality
Unionfs: Common file operations
Unionfs: Copyup Functionality
Unionfs: Dentry operations
Unionfs: File operations
Unionfs: Directory file operations
Unionfs: Directory manipulation helper functions
Unionfs: Inode operations
Unionfs: Lookup helper functions
Unionfs: Main module functions
Unionfs: Readdir state
Unionfs: Rename
Unionfs: Privileged operations workqueue
Unionfs: Handling of stale inodes
Unionfs: Miscellaneous helper functions
Unionfs: Superblock operations
Unionfs: Helper macros/inlines
Unionfs: Internal include file
Unionfs: Include file
Unionfs: Unlink
Unionfs: Kconfig and Makefile
Unionfs: Extended Attributes support


As always, comments are welcome.

Thanks,

Josef "Jeff" Sipek.
[EMAIL PROTECTED]


[PATCH 03/35] eCryptfs: Use fsstack's generic copy inode attr functions

2006-12-04 Thread Josef 'Jeff' Sipek
From: Josef "Jeff" Sipek <[EMAIL PROTECTED]>

Replace eCryptfs specific code & calls with the more generic fsstack
equivalents and remove the eCryptfs specific functions.

Signed-off-by: Josef "Jeff" Sipek <[EMAIL PROTECTED]>
Cc: Michael Halcrow <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
---
 fs/ecryptfs/file.c  |3 +-
 fs/ecryptfs/inode.c |   75 +--
 fs/ecryptfs/main.c  |5 ++-
 3 files changed, 24 insertions(+), 59 deletions(-)

diff --git a/fs/ecryptfs/file.c b/fs/ecryptfs/file.c
index a92ef05..a961a0c 100644
--- a/fs/ecryptfs/file.c
+++ b/fs/ecryptfs/file.c
@@ -30,6 +30,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "ecryptfs_kernel.h"
 
 /**
@@ -192,7 +193,7 @@ retry:
goto retry;
file->f_pos = lower_file->f_pos;
if (rc >= 0)
-   ecryptfs_copy_attr_atime(inode, lower_file->f_dentry->d_inode);
+   fsstack_copy_attr_atime(inode, lower_file->f_dentry->d_inode);
return rc;
 }
 
diff --git a/fs/ecryptfs/inode.c b/fs/ecryptfs/inode.c
index dfcc684..3e2a786 100644
--- a/fs/ecryptfs/inode.c
+++ b/fs/ecryptfs/inode.c
@@ -30,6 +30,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "ecryptfs_kernel.h"
 
 static struct dentry *lock_parent(struct dentry *dentry)
@@ -53,48 +54,6 @@ static void unlock_dir(struct dentry *di
dput(dir);
 }
 
-void ecryptfs_copy_inode_size(struct inode *dst, const struct inode *src)
-{
-   i_size_write(dst, i_size_read((struct inode *)src));
-   dst->i_blocks = src->i_blocks;
-}
-
-void ecryptfs_copy_attr_atime(struct inode *dest, const struct inode *src)
-{
-   dest->i_atime = src->i_atime;
-}
-
-static void ecryptfs_copy_attr_times(struct inode *dest,
-const struct inode *src)
-{
-   dest->i_atime = src->i_atime;
-   dest->i_mtime = src->i_mtime;
-   dest->i_ctime = src->i_ctime;
-}
-
-static void ecryptfs_copy_attr_timesizes(struct inode *dest,
-const struct inode *src)
-{
-   dest->i_atime = src->i_atime;
-   dest->i_mtime = src->i_mtime;
-   dest->i_ctime = src->i_ctime;
-   ecryptfs_copy_inode_size(dest, src);
-}
-
-void ecryptfs_copy_attr_all(struct inode *dest, const struct inode *src)
-{
-   dest->i_mode = src->i_mode;
-   dest->i_nlink = src->i_nlink;
-   dest->i_uid = src->i_uid;
-   dest->i_gid = src->i_gid;
-   dest->i_rdev = src->i_rdev;
-   dest->i_atime = src->i_atime;
-   dest->i_mtime = src->i_mtime;
-   dest->i_ctime = src->i_ctime;
-   dest->i_blkbits = src->i_blkbits;
-   dest->i_flags = src->i_flags;
-}
-
 /**
  * ecryptfs_create_underlying_file
  * @lower_dir_inode: inode of the parent in the lower fs of the new file
@@ -171,8 +130,8 @@ ecryptfs_do_create(struct inode *directo
ecryptfs_printk(KERN_ERR, "Failure in ecryptfs_interpose\n");
goto out_lock;
}
-   ecryptfs_copy_attr_timesizes(directory_inode,
-lower_dir_dentry->d_inode);
+   fsstack_copy_attr_times(directory_inode, lower_dir_dentry->d_inode);
+   fsstack_copy_inode_size(directory_inode, lower_dir_dentry->d_inode);
 out_lock:
unlock_dir(lower_dir_dentry);
 out:
@@ -365,7 +324,7 @@ static struct dentry *ecryptfs_lookup(st
"d_name.name = [%s]\n", lower_dentry,
lower_dentry->d_name.name);
lower_inode = lower_dentry->d_inode;
-   ecryptfs_copy_attr_atime(dir, lower_dir_dentry->d_inode);
+   fsstack_copy_attr_atime(dir, lower_dir_dentry->d_inode);
BUG_ON(!atomic_read(&lower_dentry->d_count));
ecryptfs_set_dentry_private(dentry,
kmem_cache_alloc(ecryptfs_dentry_info_cache,
@@ -462,7 +421,8 @@ static int ecryptfs_link(struct dentry *
rc = ecryptfs_interpose(lower_new_dentry, new_dentry, dir->i_sb, 0);
if (rc)
goto out_lock;
-   ecryptfs_copy_attr_timesizes(dir, lower_new_dentry->d_inode);
+   fsstack_copy_attr_times(dir, lower_new_dentry->d_inode);
+   fsstack_copy_inode_size(dir, lower_new_dentry->d_inode);
old_dentry->d_inode->i_nlink =
ecryptfs_inode_to_lower(old_dentry->d_inode)->i_nlink;
i_size_write(new_dentry->d_inode, file_size_save);
@@ -488,7 +448,7 @@ static int ecryptfs_unlink(struct inode
printk(KERN_ERR "Error in vfs_unlink; rc = [%d]\n", rc);
goto out_unlock;
}
-   ecryptfs_copy_attr_times(dir, lower_dir_inode);
+   fsstack_copy_attr_times(dir, lower_dir_inode);
dentry->d_inode->i_nlink =
ecryptfs_inode_to_lower(dentry->d_inode)->i_nlink;
dentry->d_inode->i_ctime = dir->i_ctime;
@@ -527,7 +487,8 @@ static int ecryptfs_symlink(struct inode
rc = ecryptfs_interpose(lower_dentry, dentry, dir->i_sb, 0);
if (rc)

[PATCH 33/35] Unionfs: Unlink

2006-12-04 Thread Josef 'Jeff' Sipek
From: Josef "Jeff" Sipek <[EMAIL PROTECTED]>

This patch provides unlink functionality for Unionfs.

Signed-off-by: Josef "Jeff" Sipek <[EMAIL PROTECTED]>
Signed-off-by: David Quigley <[EMAIL PROTECTED]>
Signed-off-by: Erez Zadok <[EMAIL PROTECTED]>
---
 fs/unionfs/unlink.c |  162 +++
 1 files changed, 162 insertions(+), 0 deletions(-)

diff --git a/fs/unionfs/unlink.c b/fs/unionfs/unlink.c
new file mode 100644
index 000..844835f
--- /dev/null
+++ b/fs/unionfs/unlink.c
@@ -0,0 +1,162 @@
+/*
+ * Copyright (c) 2003-2006 Erez Zadok
+ * Copyright (c) 2003-2006 Charles P. Wright
+ * Copyright (c) 2005-2006 Josef 'Jeff' Sipek
+ * Copyright (c) 2005-2006 Junjiro Okajima
+ * Copyright (c) 2005  Arun M. Krishnakumar
+ * Copyright (c) 2004-2006 David P. Quigley
+ * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
+ * Copyright (c) 2003  Puja Gupta
+ * Copyright (c) 2003  Harikesavan Krishnan
+ * Copyright (c) 2003-2006 Stony Brook University
+ * Copyright (c) 2003-2006 The Research Foundation of State University of New York
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include "union.h"
+
+/* unlink a file by creating a whiteout */
+static int unionfs_unlink_whiteout(struct inode *dir, struct dentry *dentry)
+{
+   struct dentry *hidden_dentry;
+   struct dentry *hidden_dir_dentry;
+   int bindex;
+   int err = 0;
+
+   if ((err = unionfs_partial_lookup(dentry)))
+   goto out;
+
+   bindex = dbstart(dentry);
+
+   hidden_dentry = unionfs_lower_dentry_idx(dentry, bindex);
+   if (!hidden_dentry)
+   goto out;
+
+   hidden_dir_dentry = lock_parent(hidden_dentry);
+
+   /* avoid destroying the hidden inode if the file is in use */
+   dget(hidden_dentry);
+   if (!(err = is_robranch_super(dentry->d_sb, bindex)))
+   err = vfs_unlink(hidden_dir_dentry->d_inode, hidden_dentry);
+   dput(hidden_dentry);
+   fsstack_copy_attr_times(dir, hidden_dir_dentry->d_inode);
+   unlock_dir(hidden_dir_dentry);
+
+   if (err && !IS_COPYUP_ERR(err))
+   goto out;
+
+   if (err) {
+   if (dbstart(dentry) == 0)
+   goto out;
+
+   err = create_whiteout(dentry, dbstart(dentry) - 1);
+   } else if (dbopaque(dentry) != -1)
+   /* There is a hidden lower-priority file with the same name. */
+   err = create_whiteout(dentry, dbopaque(dentry));
+   else
+   err = create_whiteout(dentry, dbstart(dentry));
+
+out:
+   if (!err)
+   dentry->d_inode->i_nlink--;
+
+   /* We don't want to leave negative leftover dentries for revalidate. */
+   if (!err && (dbopaque(dentry) != -1))
+   update_bstart(dentry);
+
+   return err;
+}
+
+int unionfs_unlink(struct inode *dir, struct dentry *dentry)
+{
+   int err = 0;
+
+   lock_dentry(dentry);
+
+   err = unionfs_unlink_whiteout(dir, dentry);
+   /* call d_drop so the system "forgets" about us */
+   if (!err)
+   d_drop(dentry);
+
+   unlock_dentry(dentry);
+   return err;
+}
+
+static int unionfs_rmdir_first(struct inode *dir, struct dentry *dentry,
+  struct unionfs_dir_state *namelist)
+{
+   int err;
+   struct dentry *hidden_dentry;
+   struct dentry *hidden_dir_dentry = NULL;
+
+   /* Here we need to remove whiteout entries. */
+   err = delete_whiteouts(dentry, dbstart(dentry), namelist);
+   if (err)
+   goto out;
+
+   hidden_dentry = unionfs_lower_dentry(dentry);
+
+   hidden_dir_dentry = lock_parent(hidden_dentry);
+
+   /* avoid destroying the hidden inode if the file is in use */
+   dget(hidden_dentry);
+   if (!(err = is_robranch(dentry)))
+   err = vfs_rmdir(hidden_dir_dentry->d_inode, hidden_dentry);
+   dput(hidden_dentry);
+
+   fsstack_copy_attr_times(dir, hidden_dir_dentry->d_inode);
+   /* propagate number of hard-links */
+   dentry->d_inode->i_nlink = unionfs_get_nlinks(dentry->d_inode);
+
+out:
+   if (hidden_dir_dentry)
+   unlock_dir(hidden_dir_dentry);
+   return err;
+}
+
+int unionfs_rmdir(struct inode *dir, struct dentry *dentry)
+{
+   int err = 0;
+   struct unionfs_dir_state *namelist = NULL;
+
+   lock_dentry(dentry);
+
+   /* check if this unionfs directory is empty or not */
+   err = check_empty(dentry, &namelist);
+   if (err)
+   goto out;
+
+   err = unionfs_rmdir_first(dir, dentry, namelist);
+   /* create whiteout */
+   if (!err)
+   err = create_whiteout(dentry, dbstart(dentry));
+   else {
+   int new_err;
+
+   if (dbstart(dentry) == 0)
+   
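
Editorial sketch for unionfs_unlink_whiteout() above: after the lower unlink,
the function picks the branch that receives the whiteout. Restated as a pure
function, the choice looks roughly like the following. This is a userspace
restatement of the branch selection only; pick_whiteout_branch() is a
hypothetical name, and -1 is used here to mean "no branch available", where
the patch instead returns the original error.

```c
#include <assert.h>

/*
 * copyup_err: non-zero if the lower unlink failed with a copyup-type error
 * bstart:     leftmost branch that holds the file
 * bopaque:    opaque branch index, or -1 if there is none
 *
 * Returns the branch index for the whiteout, or -1 if none can be used.
 */
int pick_whiteout_branch(int copyup_err, int bstart, int bopaque)
{
	assert(bstart >= 0);
	if (copyup_err) {
		/* Cannot go further left than branch 0. */
		if (bstart == 0)
			return -1;
		return bstart - 1;
	}
	if (bopaque != -1)
		return bopaque;
	return bstart;
}
```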

[PATCH 07/35] struct path: Move struct path from fs/namei.c into include/linux

2006-12-04 Thread Josef 'Jeff' Sipek
From: Josef "Jeff" Sipek <[EMAIL PROTECTED]>

Moved struct path from fs/namei.c to include/linux/namei.h. This allows many
places in the VFS, as well as any stackable filesystem, to easily keep track
of dentry-vfsmount pairs.

Signed-off-by: Josef "Jeff" Sipek <[EMAIL PROTECTED]>
---
 fs/namei.c|5 -
 include/linux/namei.h |5 +
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 28d49b3..8a7b923 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -570,11 +570,6 @@ fail:
return PTR_ERR(link);
 }
 
-struct path {
-   struct vfsmount *mnt;
-   struct dentry *dentry;
-};
-
 static inline void dput_path(struct path *path, struct nameidata *nd)
 {
dput(path->dentry);
diff --git a/include/linux/namei.h b/include/linux/namei.h
index f5f1960..d39a5a6 100644
--- a/include/linux/namei.h
+++ b/include/linux/namei.h
@@ -29,6 +29,11 @@ struct nameidata {
} intent;
 };
 
+struct path {
+   struct vfsmount *mnt;
+   struct dentry *dentry;
+};
+
 /*
  * Type of the last component on LOOKUP_PARENT
  */
-- 
1.4.3.3



[PATCH 15/35] Unionfs: Common file operations

2006-12-04 Thread Josef 'Jeff' Sipek
From: Josef "Jeff" Sipek <[EMAIL PROTECTED]>

This patch contains file-related helper functions used throughout the rest of
the code.

Signed-off-by: Josef "Jeff" Sipek <[EMAIL PROTECTED]>
Signed-off-by: David Quigley <[EMAIL PROTECTED]>
Signed-off-by: Erez Zadok <[EMAIL PROTECTED]>
---
 fs/unionfs/commonfops.c |  587 +++
 1 files changed, 587 insertions(+), 0 deletions(-)

diff --git a/fs/unionfs/commonfops.c b/fs/unionfs/commonfops.c
new file mode 100644
index 000..92f9bbc
--- /dev/null
+++ b/fs/unionfs/commonfops.c
@@ -0,0 +1,587 @@
+/*
+ * Copyright (c) 2003-2006 Erez Zadok
+ * Copyright (c) 2003-2006 Charles P. Wright
+ * Copyright (c) 2005-2006 Josef 'Jeff' Sipek
+ * Copyright (c) 2005-2006 Junjiro Okajima
+ * Copyright (c) 2005  Arun M. Krishnakumar
+ * Copyright (c) 2004-2006 David P. Quigley
+ * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
+ * Copyright (c) 2003  Puja Gupta
+ * Copyright (c) 2003  Harikesavan Krishnan
+ * Copyright (c) 2003-2006 Stony Brook University
+ * Copyright (c) 2003-2006 The Research Foundation of State University of New York
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include "union.h"
+
+/* 1) Copyup the file
+ * 2) Rename the file to '.unionfs' - obviously
+ * stolen from NFS's silly rename
+ */
+static int copyup_deleted_file(struct file *file, struct dentry *dentry,
+  int bstart, int bindex)
+{
+   static unsigned int counter;
+   const int i_inosize = sizeof(dentry->d_inode->i_ino) * 2;
+   const int countersize = sizeof(counter) * 2;
+   const int nlen = sizeof(".unionfs") + i_inosize + countersize - 1;
+   char name[nlen + 1];
+
+   int err;
+   struct dentry *tmp_dentry = NULL;
+   struct dentry *hidden_dentry = NULL;
+   struct dentry *hidden_dir_dentry = NULL;
+
+   hidden_dentry = unionfs_lower_dentry_idx(dentry, bstart);
+
+   sprintf(name, ".unionfs%*.*lx",
+   i_inosize, i_inosize, hidden_dentry->d_inode->i_ino);
+
+   tmp_dentry = NULL;
+   do {
+   char *suffix = name + nlen - countersize;
+
+   dput(tmp_dentry);
+   counter++;
+   sprintf(suffix, "%*.*x", countersize, countersize, counter);
+
+   printk(KERN_DEBUG "unionfs: trying to rename %s to %s\n",
+   dentry->d_name.name, name);
+
+   tmp_dentry = lookup_one_len(name, hidden_dentry->d_parent,
+   UNIONFS_TMPNAM_LEN);
+   if (IS_ERR(tmp_dentry)) {
+   err = PTR_ERR(tmp_dentry);
+   goto out;
+   }
+   } while (tmp_dentry->d_inode != NULL);  /* need negative dentry */
+
+   err = copyup_named_file(dentry->d_parent->d_inode, file, name, bstart,
+   bindex, file->f_dentry->d_inode->i_size);
+   if (err)
+   goto out;
+
+   /* bring it to the same state as an unlinked file */
+   hidden_dentry = unionfs_lower_dentry_idx(dentry, dbstart(dentry));
+   hidden_dir_dentry = lock_parent(hidden_dentry);
+   err = vfs_unlink(hidden_dir_dentry->d_inode, hidden_dentry);
+   unlock_dir(hidden_dir_dentry);
+
+out:
+   return err;
+}
+
+/* put all references held by upper struct file and free lower file pointer
+ * array */
+static void cleanup_file(struct file *file)
+{
+   int bindex, bstart, bend;
+   struct file **lf;
+
+   lf = UNIONFS_F(file)->lower_files;
+   bstart = fbstart(file);
+   bend = fbend(file);
+
+   for (bindex = bstart; bindex <= bend; bindex++) {
+   if (unionfs_lower_file_idx(file, bindex)) {
+   branchput(file->f_dentry->d_sb, bindex);
+   fput(unionfs_lower_file_idx(file, bindex));
+   }
+   }
+
+   UNIONFS_F(file)->lower_files = NULL;
+   kfree(lf);
+}
+
+/* open all lower files for a given file */
+static int open_all_files(struct file *file)
+{
+   int bindex, bstart, bend, err = 0;
+   struct file *hidden_file;
+   struct dentry *hidden_dentry;
+   struct dentry *dentry = file->f_dentry;
+   struct super_block *sb = dentry->d_sb;
+
+   bstart = dbstart(dentry);
+   bend = dbend(dentry);
+
+   for (bindex = bstart; bindex <= bend; bindex++) {
+   hidden_dentry = unionfs_lower_dentry_idx(dentry, bindex);
+   if (!hidden_dentry)
+   continue;
+
+   dget(hidden_dentry);
+   mntget(unionfs_lower_mnt_idx(dentry, bindex));
+   branchget(sb, bindex);
+
+   hidden_file = dentry_open(hidden_dentry,
+   unionfs_lower_mnt_idx(dentry, bindex),
+  

[PATCH 05/35] struct path: Rename Reiserfs's struct path

2006-12-04 Thread Josef 'Jeff' Sipek
From: Josef "Jeff" Sipek <[EMAIL PROTECTED]>

Rename Reiserfs's struct path to struct treepath.

Cc: [EMAIL PROTECTED]
Signed-off-by: Josef "Jeff" Sipek <[EMAIL PROTECTED]>
---
 fs/reiserfs/bitmap.c  |2 +-
 fs/reiserfs/fix_node.c|6 ++--
 fs/reiserfs/inode.c   |   12 +-
 fs/reiserfs/namei.c   |6 ++--
 fs/reiserfs/stree.c   |   42 +++---
 fs/reiserfs/tail_conversion.c |4 +-
 include/linux/reiserfs_fs.h   |   44 
 7 files changed, 58 insertions(+), 58 deletions(-)

diff --git a/fs/reiserfs/bitmap.c b/fs/reiserfs/bitmap.c
index e3d466a..b286ccb 100644
--- a/fs/reiserfs/bitmap.c
+++ b/fs/reiserfs/bitmap.c
@@ -708,7 +708,7 @@ static void oid_groups(reiserfs_blocknr_
  */
 static int get_left_neighbor(reiserfs_blocknr_hint_t * hint)
 {
-   struct path *path;
+   struct treepath *path;
struct buffer_head *bh;
struct item_head *ih;
int pos_in_item;
diff --git a/fs/reiserfs/fix_node.c b/fs/reiserfs/fix_node.c
index 6d0e554..0ee35c6 100644
--- a/fs/reiserfs/fix_node.c
+++ b/fs/reiserfs/fix_node.c
@@ -957,7 +957,7 @@ static int get_far_parent(struct tree_ba
 {
struct buffer_head *p_s_parent;
INITIALIZE_PATH(s_path_to_neighbor_father);
-   struct path *p_s_path = p_s_tb->tb_path;
+   struct treepath *p_s_path = p_s_tb->tb_path;
struct cpu_key s_lr_father_key;
int n_counter,
n_position = INT_MAX,
@@ -1074,7 +1074,7 @@ static int get_far_parent(struct tree_ba
  */
 static int get_parents(struct tree_balance *p_s_tb, int n_h)
 {
-   struct path *p_s_path = p_s_tb->tb_path;
+   struct treepath *p_s_path = p_s_tb->tb_path;
int n_position,
n_ret_value,
n_path_offset = PATH_H_PATH_OFFSET(p_s_tb->tb_path, n_h);
@@ -1885,7 +1885,7 @@ static int check_balance(int mode,
 static int get_direct_parent(struct tree_balance *p_s_tb, int n_h)
 {
struct buffer_head *p_s_bh;
-   struct path *p_s_path = p_s_tb->tb_path;
+   struct treepath *p_s_path = p_s_tb->tb_path;
int n_position,
n_path_offset = PATH_H_PATH_OFFSET(p_s_tb->tb_path, n_h);
 
diff --git a/fs/reiserfs/inode.c b/fs/reiserfs/inode.c
index 9c69bca..eda099e 100644
--- a/fs/reiserfs/inode.c
+++ b/fs/reiserfs/inode.c
@@ -207,7 +207,7 @@ static int file_capable(struct inode *in
 }
 
 /*static*/ int restart_transaction(struct reiserfs_transaction_handle *th,
-  struct inode *inode, struct path *path)
+  struct inode *inode, struct treepath *path)
 {
struct super_block *s = th->t_super;
int len = th->t_blocks_allocated;
@@ -569,7 +569,7 @@ static inline int _allocate_block(struct
  long block,
  struct inode *inode,
  b_blocknr_t * allocated_block_nr,
- struct path *path, int flags)
+ struct treepath *path, int flags)
 {
BUG_ON(!th->t_trans_id);
 
@@ -1109,7 +1109,7 @@ static inline ulong to_fake_used_blocks(
 //
 
 // called by read_locked_inode
-static void init_inode(struct inode *inode, struct path *path)
+static void init_inode(struct inode *inode, struct treepath *path)
 {
struct buffer_head *bh;
struct item_head *ih;
@@ -1286,7 +1286,7 @@ static void inode2sd_v1(void *sd, struct
 /* NOTE, you must prepare the buffer head before sending it here,
 ** and then log it after the call
 */
-static void update_stat_data(struct path *path, struct inode *inode,
+static void update_stat_data(struct treepath *path, struct inode *inode,
 loff_t size)
 {
struct buffer_head *bh;
@@ -1655,7 +1655,7 @@ int reiserfs_write_inode(struct inode *i
containing "." and ".." entries */
 static int reiserfs_new_directory(struct reiserfs_transaction_handle *th,
  struct inode *inode,
- struct item_head *ih, struct path *path,
+ struct item_head *ih, struct treepath *path,
  struct inode *dir)
 {
struct super_block *sb = th->t_super;
@@ -1714,7 +1714,7 @@ static int reiserfs_new_directory(struct
containing the body of symlink */
 static int reiserfs_new_symlink(struct reiserfs_transaction_handle *th, struct inode *inode,   /* Inode of symlink */
struct item_head *ih,
-   struct path *path, const char *symname,
+   struct treepath *path, const char *symname,
int item_len)
 {
struct super_block *sb = th->t_super;
diff --git a/fs/reiserfs/namei.c b/fs/reiserfs/namei.c
index abde1ed..23f5cd5 100644
--- a/fs/reiserfs/namei.c
+++ b/fs/reiserfs/namei.c
@@ -5

[PATCH 32/35] Unionfs: Include file

2006-12-04 Thread Josef 'Jeff' Sipek
From: Josef "Jeff" Sipek <[EMAIL PROTECTED]>

Global include file - can be included from userspace by utilities.

Signed-off-by: Josef "Jeff" Sipek <[EMAIL PROTECTED]>
Signed-off-by: David Quigley <[EMAIL PROTECTED]>
Signed-off-by: Erez Zadok <[EMAIL PROTECTED]>
---
 include/linux/union_fs.h |   20 ++++++++++++++++++++
 1 files changed, 20 insertions(+), 0 deletions(-)

diff --git a/include/linux/union_fs.h b/include/linux/union_fs.h
new file mode 100644
index 0000000..e76d3cf
--- /dev/null
+++ b/include/linux/union_fs.h
@@ -0,0 +1,20 @@
+#ifndef _LINUX_UNION_FS_H
+#define _LINUX_UNION_FS_H
+
+#define UNIONFS_VERSION  "2.0"
+/*
+ * DEFINITIONS FOR USER AND KERNEL CODE:
+ * (Note: ioctl numbers 1--9 are reserved for fistgen, the rest
+ *  are auto-generated based on the user's .fist file.)
+ */
+# define UNIONFS_IOCTL_INCGEN  _IOR(0x15, 11, int)
+# define UNIONFS_IOCTL_QUERYFILE   _IOR(0x15, 15, int)
+
+/* We don't support normal remount, but unionctl uses it. */
+# define UNIONFS_REMOUNT_MAGIC 0x4a5a4380
+
+/* should be at least LAST_USED_UNIONFS_PERMISSION<<1 */
+#define MAY_NFSRO  16
+
+#endif /* _LINUX_UNION_FS_H */
+
-- 
1.4.3.3

-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
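
A note on the ioctl numbers in the header above: the two commands are built
with _IOR(), which packs a type ("magic") byte, a command number and the
argument size into one value. The following is a minimal userspace sketch of
how those fields can be read back, assuming the usual encoding where the low
8 bits hold the command number and the next 8 bits hold the type. The
sketch_* helpers are hypothetical names used only for illustration; they are
not part of the patch.

```c
#include <assert.h>
#include <sys/ioctl.h>

/* Copied from include/linux/union_fs.h in the patch above. */
#define UNIONFS_IOCTL_INCGEN    _IOR(0x15, 11, int)
#define UNIONFS_IOCTL_QUERYFILE _IOR(0x15, 15, int)

/* Hypothetical helper: command number, assumed to live in bits 0-7. */
static unsigned int sketch_ioctl_nr(unsigned long cmd)
{
    return (unsigned int)(cmd & 0xff);
}

/* Hypothetical helper: type ("magic") byte, assumed to live in bits 8-15. */
static unsigned int sketch_ioctl_type(unsigned long cmd)
{
    return (unsigned int)((cmd >> 8) & 0xff);
}

/* The header comment says numbers 1--9 are reserved for fistgen. */
static int sketch_nr_reserved_for_fistgen(unsigned int nr)
{
    return nr >= 1 && nr <= 9;
}

/* Sanity check: neither unionfs command falls into the reserved range. */
static void sketch_check_unionfs_ioctls(void)
{
    assert(!sketch_nr_reserved_for_fistgen(sketch_ioctl_nr(UNIONFS_IOCTL_INCGEN)));
    assert(!sketch_nr_reserved_for_fistgen(sketch_ioctl_nr(UNIONFS_IOCTL_QUERYFILE)));
}
```

Calling sketch_check_unionfs_ioctls() from a small test program is enough to
confirm that 11 and 15 sit outside the fistgen range.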


[PATCH 21/35] Unionfs: Inode operations

2006-12-04 Thread Josef 'Jeff' Sipek
From: Josef "Jeff" Sipek <[EMAIL PROTECTED]>

This patch provides the inode operations for Unionfs.

Signed-off-by: Josef "Jeff" Sipek <[EMAIL PROTECTED]>
Signed-off-by: David Quigley <[EMAIL PROTECTED]>
Signed-off-by: Erez Zadok <[EMAIL PROTECTED]>
---
 fs/unionfs/inode.c |  926 
 1 files changed, 926 insertions(+), 0 deletions(-)

diff --git a/fs/unionfs/inode.c b/fs/unionfs/inode.c
new file mode 100644
index 0000000..c7806e9
--- /dev/null
+++ b/fs/unionfs/inode.c
@@ -0,0 +1,926 @@
+/*
+ * Copyright (c) 2003-2006 Erez Zadok
+ * Copyright (c) 2003-2006 Charles P. Wright
+ * Copyright (c) 2005-2006 Josef 'Jeff' Sipek
+ * Copyright (c) 2005-2006 Junjiro Okajima
+ * Copyright (c) 2005  Arun M. Krishnakumar
+ * Copyright (c) 2004-2006 David P. Quigley
+ * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
+ * Copyright (c) 2003  Puja Gupta
+ * Copyright (c) 2003  Harikesavan Krishnan
+ * Copyright (c) 2003-2006 Stony Brook University
+ * Copyright (c) 2003-2006 The Research Foundation of State University of New York
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include "union.h"
+
+/* declarations added for "sparse" */
+extern struct dentry *unionfs_lookup(struct inode *, struct dentry *,
+struct nameidata *);
+extern int unionfs_readlink(struct dentry *dentry, char __user * buf,
+   int bufsiz);
+extern void unionfs_put_link(struct dentry *dentry, struct nameidata *nd,
+void *cookie);
+
+static int unionfs_create(struct inode *parent, struct dentry *dentry,
+ int mode, struct nameidata *nd)
+{
+   int err = 0;
+   struct dentry *hidden_dentry = NULL;
+   struct dentry *whiteout_dentry = NULL;
+   struct dentry *new_hidden_dentry;
+   struct dentry *hidden_parent_dentry = NULL;
+   int bindex = 0, bstart;
+   char *name = NULL;
+
+   lock_dentry(dentry);
+
+   /* We start out in the leftmost branch. */
+   bstart = dbstart(dentry);
+   hidden_dentry = unionfs_lower_dentry(dentry);
+
+   /* check if whiteout exists in this branch, i.e. lookup .wh.foo first */
+   name = alloc_whname(dentry->d_name.name, dentry->d_name.len);
+   if (IS_ERR(name)) {
+   err = PTR_ERR(name);
+   goto out;
+   }
+
+   whiteout_dentry =
+   lookup_one_len(name, hidden_dentry->d_parent,
+  dentry->d_name.len + UNIONFS_WHLEN);
+   if (IS_ERR(whiteout_dentry)) {
+   err = PTR_ERR(whiteout_dentry);
+   whiteout_dentry = NULL;
+   goto out;
+   }
+
+   if (whiteout_dentry->d_inode) {
+   /* .wh.foo has been found. */
+   /* First truncate it and then rename it to foo (hence having
+* the same overall effect as a normal create).
+*/
+   struct dentry *hidden_dir_dentry;
+   struct iattr newattrs;
+
+   mutex_lock(&whiteout_dentry->d_inode->i_mutex);
+   newattrs.ia_valid = ATTR_CTIME | ATTR_MODE | ATTR_ATIME
+   | ATTR_MTIME | ATTR_UID | ATTR_GID | ATTR_FORCE
+   | ATTR_KILL_SUID | ATTR_KILL_SGID;
+
+   newattrs.ia_mode = mode & ~current->fs->umask;
+   newattrs.ia_uid = current->fsuid;
+   newattrs.ia_gid = current->fsgid;
+
+   if (whiteout_dentry->d_inode->i_size != 0) {
+   newattrs.ia_valid |= ATTR_SIZE;
+   newattrs.ia_size = 0;
+   }
+
+   err = notify_change(whiteout_dentry, &newattrs);
+
+   mutex_unlock(&whiteout_dentry->d_inode->i_mutex);
+
+   if (err)
+   printk(KERN_WARNING "unionfs: %s:%d: notify_change "
+   "failed: %d, ignoring..\n",
+   __FILE__, __LINE__, err);
+
+   new_hidden_dentry = unionfs_lower_dentry(dentry);
+   dget(new_hidden_dentry);
+
+   hidden_dir_dentry = dget_parent(whiteout_dentry);
+   lock_rename(hidden_dir_dentry, hidden_dir_dentry);
+
+   if (!(err = is_robranch_super(dentry->d_sb, bstart))) {
+   err =
+   vfs_rename(hidden_dir_dentry->d_inode,
+  whiteout_dentry,
+  hidden_dir_dentry->d_inode,
+  new_hidden_dentry);
+   }
+   if (!err) {
+   fsstack_copy_attr_times(parent,
+   new_hidden_dentry->d_parent->d_inode);
+   fsstack_copy_inode_size(parent,
+ 

[PATCH 01/35] fsstack: Introduce fsstack_copy_{attr,inode}_*

2006-12-04 Thread Josef 'Jeff' Sipek
From: Josef "Jeff" Sipek <[EMAIL PROTECTED]>

Introduce several fsstack_copy_* functions which allow stackable filesystems
(such as eCryptfs and Unionfs) to easily copy over (currently only) inode
attributes.  This prevents code duplication and allows for code reuse.

Signed-off-by: Josef "Jeff" Sipek <[EMAIL PROTECTED]>
Cc: Michael Halcrow <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
---
 fs/Makefile              |    3 ++-
 fs/stack.c               |   39 +++++++++++++++++++++++++++++++++++++++
 include/linux/fs_stack.h |   39 +++++++++++++++++++++++++++++++++++++++
 3 files changed, 80 insertions(+), 1 deletions(-)

diff --git a/fs/Makefile b/fs/Makefile
index 9a5ce93..b9ffa63 100644
--- a/fs/Makefile
+++ b/fs/Makefile
@@ -10,7 +10,8 @@ obj-y :=  open.o read_write.o file_table.
ioctl.o readdir.o select.o fifo.o locks.o dcache.o inode.o \
attr.o bad_inode.o file.o filesystems.o namespace.o aio.o \
seq_file.o xattr.o libfs.o fs-writeback.o \
-   pnode.o drop_caches.o splice.o sync.o utimes.o
+   pnode.o drop_caches.o splice.o sync.o utimes.o \
+   stack.o
 
 ifeq ($(CONFIG_BLOCK),y)
 obj-y +=   buffer.o bio.o block_dev.o direct-io.o mpage.o ioprio.o
diff --git a/fs/stack.c b/fs/stack.c
new file mode 100644
index 0000000..5f6f12d
--- /dev/null
+++ b/fs/stack.c
@@ -0,0 +1,39 @@
+#include 
+#include 
+
+/* does _NOT_ require i_mutex to be held.
+ *
+ * This function cannot be inlined since i_size_{read,write} is rather
+ * heavy-weight on 32-bit systems
+ */
+void fsstack_copy_inode_size(struct inode *dst, const struct inode *src)
+{
+   i_size_write(dst, i_size_read((struct inode *)src));
+   dst->i_blocks = src->i_blocks;
+}
+
+/* copy all attributes; get_nlinks is optional way to override the i_nlink
+ * copying
+ */
+void __fsstack_copy_attr_all(struct inode *dest,
+const struct inode *src,
+int (*get_nlinks)(struct inode *))
+{
+   if (!get_nlinks)
+   dest->i_nlink = src->i_nlink;
+   else
+   dest->i_nlink = (*get_nlinks)(dest);
+
+   dest->i_mode = src->i_mode;
+   dest->i_uid = src->i_uid;
+   dest->i_gid = src->i_gid;
+   dest->i_rdev = src->i_rdev;
+   dest->i_atime = src->i_atime;
+   dest->i_mtime = src->i_mtime;
+   dest->i_ctime = src->i_ctime;
+   dest->i_blkbits = src->i_blkbits;
+   dest->i_flags = src->i_flags;
+}
+
+EXPORT_SYMBOL_GPL(fsstack_copy_inode_size);
+EXPORT_SYMBOL_GPL(__fsstack_copy_attr_all);
diff --git a/include/linux/fs_stack.h b/include/linux/fs_stack.h
new file mode 100644
index 0000000..56b3e09
--- /dev/null
+++ b/include/linux/fs_stack.h
@@ -0,0 +1,39 @@
+#ifndef _LINUX_FS_STACK_H
+#define _LINUX_FS_STACK_H
+
+/* This file defines generic functions used primarily by stackable
+ * filesystems; none of these functions require i_mutex to be held.
+ */
+
+#include 
+
+/* externs for fs/stack.c */
+extern void __fsstack_copy_attr_all(struct inode *dest,
+   const struct inode *src,
+   int (*get_nlinks)(struct inode *));
+
+extern void fsstack_copy_inode_size(struct inode *dst, const struct inode *src);
+
+/* inlines */
+static inline void fsstack_copy_attr_atime(struct inode *dest,
+  const struct inode *src)
+{
+   dest->i_atime = src->i_atime;
+}
+
+static inline void fsstack_copy_attr_times(struct inode *dest,
+  const struct inode *src)
+{
+   dest->i_atime = src->i_atime;
+   dest->i_mtime = src->i_mtime;
+   dest->i_ctime = src->i_ctime;
+}
+
+static inline void fsstack_copy_attr_all(struct inode *dest,
+const struct inode *src)
+{
+   __fsstack_copy_attr_all(dest, src, NULL);
+}
+
+#endif /* _LINUX_FS_STACK_H */
+
-- 
1.4.3.3
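
For readers who want to see the copy helpers in isolation, here is a minimal
userspace sketch of the same pattern. It assumes a heavily reduced stand-in
for struct inode (struct sketch_inode) and hypothetical sketch_* names; only
the shape of the logic follows the patch, in particular the optional
get_nlinks callback that overrides plain i_nlink copying.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily reduced stand-in for the kernel's struct inode. */
struct sketch_inode {
    unsigned int i_nlink;
    unsigned int i_mode;
    long i_atime, i_mtime, i_ctime;
    long long i_size;
    unsigned long i_blocks;
};

/* Copy the three timestamps, as fsstack_copy_attr_times() does. */
static void sketch_copy_attr_times(struct sketch_inode *dest,
                                   const struct sketch_inode *src)
{
    dest->i_atime = src->i_atime;
    dest->i_mtime = src->i_mtime;
    dest->i_ctime = src->i_ctime;
}

/* Copy size and block count, as fsstack_copy_inode_size() does. */
static void sketch_copy_inode_size(struct sketch_inode *dst,
                                   const struct sketch_inode *src)
{
    dst->i_size = src->i_size;
    dst->i_blocks = src->i_blocks;
}

/* Copy attributes; a non-NULL get_nlinks overrides the i_nlink copy and is
 * called on the destination inode, mirroring __fsstack_copy_attr_all(). */
static void sketch_copy_attr_all(struct sketch_inode *dest,
                                 const struct sketch_inode *src,
                                 unsigned int (*get_nlinks)(struct sketch_inode *))
{
    assert(dest != NULL && src != NULL);

    if (!get_nlinks)
        dest->i_nlink = src->i_nlink;
    else
        dest->i_nlink = get_nlinks(dest);

    dest->i_mode = src->i_mode;
    sketch_copy_attr_times(dest, src);
}

/* Example override (an assumption for illustration): always report 2 links. */
static unsigned int sketch_fixed_nlinks(struct sketch_inode *upper)
{
    (void)upper;
    return 2;
}
```

Passing NULL as the callback gives the behaviour of the fsstack_copy_attr_all()
inline; passing sketch_fixed_nlinks shows how a stacked filesystem could
compute its own link count instead.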



[PATCH 26/35] Unionfs: Privileged operations workqueue

2006-12-04 Thread Josef 'Jeff' Sipek
From: Josef "Jeff" Sipek <[EMAIL PROTECTED]>

Workqueue & helper functions used to perform privileged operations on
behalf of the user process.

Signed-off-by: Josef "Jeff" Sipek <[EMAIL PROTECTED]>
Signed-off-by: David Quigley <[EMAIL PROTECTED]>
Signed-off-by: Erez Zadok <[EMAIL PROTECTED]>
---
 fs/unionfs/sioq.c |  115 +
 fs/unionfs/sioq.h |   79 
 2 files changed, 194 insertions(+), 0 deletions(-)

diff --git a/fs/unionfs/sioq.c b/fs/unionfs/sioq.c
new file mode 100644
index 0000000..187ad87
--- /dev/null
+++ b/fs/unionfs/sioq.c
@@ -0,0 +1,115 @@
+/*
+ * Copyright (c) 2003-2006 Erez Zadok
+ * Copyright (c) 2003-2006 Charles P. Wright
+ * Copyright (c) 2005-2006 Josef 'Jeff' Sipek
+ * Copyright (c) 2005-2006 Junjiro Okajima
+ * Copyright (c) 2005  Arun M. Krishnakumar
+ * Copyright (c) 2004-2006 David P. Quigley
+ * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
+ * Copyright (c) 2003  Puja Gupta
+ * Copyright (c) 2003  Harikesavan Krishnan
+ * Copyright (c) 2003-2006 Stony Brook University
+ * Copyright (c) 2003-2006 The Research Foundation of State University of New York
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include "union.h"
+
+struct workqueue_struct *sioq;
+
+int __init init_sioq(void)
+{
+   int err;
+
+   sioq = create_workqueue("unionfs_siod");
+   if (sioq)
+       return 0;
+
+   /* create_workqueue() returns NULL, not an ERR_PTR, on failure */
+   err = -ENOMEM;
+   printk(KERN_ERR "create_workqueue failed %d\n", err);
+   return err;
+}
+
+void fin_sioq(void)
+{
+   if (sioq)
+   destroy_workqueue(sioq);
+}
+
+void run_sioq(void (*func)(void *arg), struct sioq_args *args)
+{
+   DECLARE_WORK(wk, func, &args->comp);
+
+   init_completion(&args->comp);
+   while (!queue_work(sioq, &wk)) {
+   /* TODO: do accounting if needed */
+   schedule();
+   }
+   wait_for_completion(&args->comp);
+}
+
+void __unionfs_create(void *data)
+{
+   struct sioq_args *args = data;
+   struct create_args *c = &args->create;
+
+   args->err = vfs_create(c->parent, c->dentry, c->mode, c->nd);
+   complete(&args->comp);
+}
+
+void __unionfs_mkdir(void *data)
+{
+   struct sioq_args *args = data;
+   struct mkdir_args *m = &args->mkdir;
+
+   args->err = vfs_mkdir(m->parent, m->dentry, m->mode);
+   complete(&args->comp);
+}
+
+void __unionfs_mknod(void *data)
+{
+   struct sioq_args *args = data;
+   struct mknod_args *m = &args->mknod;
+
+   args->err = vfs_mknod(m->parent, m->dentry, m->mode, m->dev);
+   complete(&args->comp);
+}
+void __unionfs_symlink(void *data)
+{
+   struct sioq_args *args = data;
+   struct symlink_args *s = &args->symlink;
+
+   args->err = vfs_symlink(s->parent, s->dentry, s->symbuf, s->mode);
+   complete(&args->comp);
+}
+
+void __unionfs_unlink(void *data)
+{
+   struct sioq_args *args = data;
+   struct unlink_args *u = &args->unlink;
+
+   args->err = vfs_unlink(u->parent, u->dentry);
+   complete(&args->comp);
+}
+
+void __delete_whiteouts(void *data) {
+   struct sioq_args *args = data;
+   struct deletewh_args *d = &args->deletewh;
+
+   args->err = do_delete_whiteouts(d->dentry, d->bindex, d->namelist);
+   complete(&args->comp);
+}
+
+void __is_opaque_dir(void *data)
+{
+   struct sioq_args *args = data;
+
+   args->ret = lookup_one_len(UNIONFS_DIR_OPAQUE, args->isopaque.dentry,
+   sizeof(UNIONFS_DIR_OPAQUE) - 1);
+   complete(&args->comp);
+}
+
diff --git a/fs/unionfs/sioq.h b/fs/unionfs/sioq.h
new file mode 100644
index 0000000..628d214
--- /dev/null
+++ b/fs/unionfs/sioq.h
@@ -0,0 +1,79 @@
+#ifndef _SIOQ_H
+#define _SIOQ_H
+
+struct deletewh_args {
+   struct unionfs_dir_state *namelist;
+   struct dentry *dentry;
+   int bindex;
+};
+
+struct isopaque_args {
+   struct dentry *dentry;
+};
+
+struct create_args {
+   struct inode *parent;
+   struct dentry *dentry;
+   umode_t mode;
+   struct nameidata *nd;
+};
+
+struct mkdir_args {
+   struct inode *parent;
+   struct dentry *dentry;
+   umode_t mode;
+};
+
+struct mknod_args {
+   struct inode *parent;
+   struct dentry *dentry;
+   umode_t mode;
+   dev_t dev;
+};
+
+struct symlink_args {
+   struct inode *parent;
+   struct dentry *dentry;
+   char *symbuf;
+   umode_t mode;
+};
+
+struct unlink_args {
+   struct inode *parent;
+   struct dentry *dentry;
+};
+
+
+struct sioq_args {
+
+   struct completion comp;
+   int err;
+   void *ret;
+
+   union {
+   struct deletewh_args deletewh;
+   struct isopaque_args isopaque;
+   struct c

[PATCH 23/35] Unionfs: Main module functions

2006-12-04 Thread Josef 'Jeff' Sipek
From: Josef "Jeff" Sipek <[EMAIL PROTECTED]>

Module init & cleanup code, as well as interposition functions.

Signed-off-by: Josef "Jeff" Sipek <[EMAIL PROTECTED]>
Signed-off-by: David Quigley <[EMAIL PROTECTED]>
Signed-off-by: Erez Zadok <[EMAIL PROTECTED]>
---
 fs/unionfs/main.c |  685 +
 1 files changed, 685 insertions(+), 0 deletions(-)

diff --git a/fs/unionfs/main.c b/fs/unionfs/main.c
new file mode 100644
index 0000000..34fb5f8
--- /dev/null
+++ b/fs/unionfs/main.c
@@ -0,0 +1,685 @@
+/*
+ * Copyright (c) 2003-2006 Erez Zadok
+ * Copyright (c) 2003-2006 Charles P. Wright
+ * Copyright (c) 2005-2006 Josef 'Jeff' Sipek
+ * Copyright (c) 2005-2006 Junjiro Okajima
+ * Copyright (c) 2005  Arun M. Krishnakumar
+ * Copyright (c) 2004-2006 David P. Quigley
+ * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
+ * Copyright (c) 2003  Puja Gupta
+ * Copyright (c) 2003  Harikesavan Krishnan
+ * Copyright (c) 2003-2006 Stony Brook University
+ * Copyright (c) 2003-2006 The Research Foundation of State University of New York
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include "union.h"
+#include 
+#include 
+
+/* sb we pass is unionfs's super_block */
+int unionfs_interpose(struct dentry *dentry, struct super_block *sb, int flag)
+{
+   struct inode *hidden_inode;
+   struct dentry *hidden_dentry;
+   int err = 0;
+   struct inode *inode;
+   int is_negative_dentry = 1;
+   int bindex, bstart, bend;
+
+   verify_locked(dentry);
+
+   bstart = dbstart(dentry);
+   bend = dbend(dentry);
+
+   /* Make sure that we didn't get a negative dentry. */
+   for (bindex = bstart; bindex <= bend; bindex++) {
+   if (unionfs_lower_dentry_idx(dentry, bindex) &&
+   unionfs_lower_dentry_idx(dentry, bindex)->d_inode) {
+   is_negative_dentry = 0;
+   break;
+   }
+   }
+   BUG_ON(is_negative_dentry);
+
+   /* We allocate our new inode below, by calling iget.
+* iget will call our read_inode which will initialize some
+* of the new inode's fields
+*/
+
+   /* On revalidate we've already got our own inode and just need
+* to fix it up. */
+   if (flag == INTERPOSE_REVAL) {
+   inode = dentry->d_inode;
+   UNIONFS_I(inode)->bstart = -1;
+   UNIONFS_I(inode)->bend = -1;
+   atomic_set(&UNIONFS_I(inode)->generation,
+  atomic_read(&UNIONFS_SB(sb)->generation));
+
+   UNIONFS_I(inode)->lower_inodes =
+   kcalloc(sbmax(sb), sizeof(struct inode *), GFP_KERNEL);
+   if (!UNIONFS_I(inode)->lower_inodes) {
+   err = -ENOMEM;
+   goto out;
+   }
+   mutex_lock(&inode->i_mutex);
+   } else {
+   ino_t ino;
+   /* get unique inode number for unionfs */
+   ino = iunique(sb, UNIONFS_ROOT_INO);
+
+   inode = iget(sb, ino);
+   if (!inode) {
+   err = -EACCES;  /* should be impossible??? */
+   goto out;
+   }
+
+   mutex_lock(&inode->i_mutex);
+   if (atomic_read(&inode->i_count) > 1)
+   goto skip;
+   }
+
+   for (bindex = bstart; bindex <= bend; bindex++) {
+   hidden_dentry = unionfs_lower_dentry_idx(dentry, bindex);
+   if (!hidden_dentry) {
+   unionfs_set_lower_inode_idx(inode, bindex, NULL);
+   continue;
+   }
+
+   /* Initialize the hidden inode to the new hidden inode. */
+   if (!hidden_dentry->d_inode)
+   continue;
+
+   unionfs_set_lower_inode_idx(inode, bindex,
+   igrab(hidden_dentry->d_inode));
+   }
+
+   ibstart(inode) = dbstart(dentry);
+   ibend(inode) = dbend(dentry);
+
+   /* Use attributes from the first branch. */
+   hidden_inode = unionfs_lower_inode(inode);
+
+   /* Use different set of inode ops for symlinks & directories */
+   if (S_ISLNK(hidden_inode->i_mode))
+   inode->i_op = &unionfs_symlink_iops;
+   else if (S_ISDIR(hidden_inode->i_mode))
+   inode->i_op = &unionfs_dir_iops;
+
+   /* Use different set of file ops for directories */
+   if (S_ISDIR(hidden_inode->i_mode))
+   inode->i_fop = &unionfs_dir_fops;
+
+   /* properly initialize special inodes */
+   if (S_ISBLK(hidden_inode->i_mode) || S_ISCHR(hidden_inode->i_mode) ||
+   S_ISFIFO(hidden_inode->i_mode) || S_ISSOCK(hidden_inode->i_mode))
+   init_special_inode(ino

[PATCH 18/35] Unionfs: File operations

2006-12-04 Thread Josef 'Jeff' Sipek
From: Josef "Jeff" Sipek <[EMAIL PROTECTED]>

This patch provides the file operations for Unionfs.

Signed-off-by: Josef "Jeff" Sipek <[EMAIL PROTECTED]>
Signed-off-by: David Quigley <[EMAIL PROTECTED]>
Signed-off-by: Erez Zadok <[EMAIL PROTECTED]>
---
 fs/unionfs/file.c |  258 +
 1 files changed, 258 insertions(+), 0 deletions(-)

diff --git a/fs/unionfs/file.c b/fs/unionfs/file.c
new file mode 100644
index 0000000..3dc9f8f
--- /dev/null
+++ b/fs/unionfs/file.c
@@ -0,0 +1,258 @@
+/*
+ * Copyright (c) 2003-2006 Erez Zadok
+ * Copyright (c) 2003-2006 Charles P. Wright
+ * Copyright (c) 2005-2006 Josef 'Jeff' Sipek
+ * Copyright (c) 2005-2006 Junjiro Okajima
+ * Copyright (c) 2005  Arun M. Krishnakumar
+ * Copyright (c) 2004-2006 David P. Quigley
+ * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
+ * Copyright (c) 2003  Puja Gupta
+ * Copyright (c) 2003  Harikesavan Krishnan
+ * Copyright (c) 2003-2006 Stony Brook University
+ * Copyright (c) 2003-2006 The Research Foundation of State University of New York
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include "union.h"
+
+/* declarations for sparse */
+extern ssize_t unionfs_read(struct file *, char __user *, size_t, loff_t *);
+extern ssize_t unionfs_write(struct file *, const char __user *, size_t,
+loff_t *);
+
+/*******************
+ * File Operations *
+ *******************/
+
+static loff_t unionfs_llseek(struct file *file, loff_t offset, int origin)
+{
+   loff_t err;
+   struct file *hidden_file = NULL;
+
+   if ((err = unionfs_file_revalidate(file, 0)))
+   goto out;
+
+   hidden_file = unionfs_lower_file(file);
+   /* always set hidden position to this one */
+   hidden_file->f_pos = file->f_pos;
+
+   memcpy(&hidden_file->f_ra, &file->f_ra, sizeof(struct file_ra_state));
+
+   if (hidden_file->f_op && hidden_file->f_op->llseek)
+   err = hidden_file->f_op->llseek(hidden_file, offset, origin);
+   else
+   err = generic_file_llseek(hidden_file, offset, origin);
+
+   if (err < 0)
+   goto out;
+   if (err != file->f_pos) {
+   file->f_pos = err;
+   file->f_version++;
+   }
+out:
+   return err;
+}
+
+ssize_t unionfs_read(struct file * file, char __user * buf, size_t count,
+loff_t * ppos)
+{
+   struct file *hidden_file;
+   loff_t pos = *ppos;
+   int err;
+
+   if ((err = unionfs_file_revalidate(file, 0)))
+   goto out;
+
+   err = -EINVAL;
+   hidden_file = unionfs_lower_file(file);
+   if (!hidden_file->f_op || !hidden_file->f_op->read)
+   goto out;
+
+   err = hidden_file->f_op->read(hidden_file, buf, count, &pos);
+   *ppos = pos;
+
+out:
+   return err;
+}
+
+ssize_t __unionfs_write(struct file * file, const char __user * buf,
+   size_t count, loff_t * ppos)
+{
+   int err = -EINVAL;
+   struct file *hidden_file = NULL;
+   struct inode *inode;
+   struct inode *hidden_inode;
+   loff_t pos = *ppos;
+   int bstart, bend;
+
+   inode = file->f_dentry->d_inode;
+
+   bstart = fbstart(file);
+   bend = fbend(file);
+
+   BUG_ON(bstart == -1);
+
+   hidden_file = unionfs_lower_file(file);
+   hidden_inode = hidden_file->f_dentry->d_inode;
+
+   if (!hidden_file->f_op || !hidden_file->f_op->write)
+   goto out;
+
+   /* adjust for append -- seek to the end of the file */
+   if (file->f_flags & O_APPEND)
+   pos = inode->i_size;
+
+   err = hidden_file->f_op->write(hidden_file, buf, count, &pos);
+
+   /*
+* copy ctime and mtime from lower layer attributes
+* atime is unchanged for both layers
+*/
+   if (err >= 0)
+   fsstack_copy_attr_times(inode, hidden_inode);
+
+   *ppos = pos;
+
+   /* update this inode's size */
+   if (pos > inode->i_size)
+   inode->i_size = pos;
+out:
+   return err;
+}
+
+ssize_t unionfs_write(struct file * file, const char __user * buf, size_t count,
+ loff_t * ppos)
+{
+   int err = 0;
+
+   if ((err = unionfs_file_revalidate(file, 1)))
+   goto out;
+
+   err = __unionfs_write(file, buf, count, ppos);
+
+out:
+   return err;
+}
+
+static int unionfs_file_readdir(struct file *file, void *dirent,
+   filldir_t filldir)
+{
+   return -ENOTDIR;
+}
+
+static unsigned int unionfs_poll(struct file *file, poll_table * wait)
+{
+   unsigned int mask = DEFAULT_POLLMASK;
+   struct file *hidden_file = NULL;
+
+   if (unionfs_file_revalidate(file, 0)) {
+   /* We should pretend an error happened.

[PATCH 08/35] struct path: make eCryptfs a user of struct path

2006-12-04 Thread Josef 'Jeff' Sipek
From: Josef "Jeff" Sipek <[EMAIL PROTECTED]>

Convert eCryptfs dentry-vfsmount pairs in dentry private data to struct
path.

Cc: Michael Halcrow <[EMAIL PROTECTED]>
Signed-off-by: Josef "Jeff" Sipek <[EMAIL PROTECTED]>
---
 fs/ecryptfs/ecryptfs_kernel.h |   12 ++--
 1 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/fs/ecryptfs/ecryptfs_kernel.h b/fs/ecryptfs/ecryptfs_kernel.h
index f992533..870a65b 100644
--- a/fs/ecryptfs/ecryptfs_kernel.h
+++ b/fs/ecryptfs/ecryptfs_kernel.h
@@ -28,6 +28,7 @@
 
 #include 
 #include 
+#include 
 #include 
 
 /* Version verification for shared data structures w/ userspace */
@@ -227,8 +228,7 @@ struct ecryptfs_inode_info {
 /* dentry private data. Each dentry must keep track of a lower
  * vfsmount too. */
 struct ecryptfs_dentry_info {
-   struct dentry *wdi_dentry;
-   struct vfsmount *lower_mnt;
+   struct path lower_path;
struct ecryptfs_crypt_stat *crypt_stat;
 };
 
@@ -355,26 +355,26 @@ ecryptfs_set_dentry_private(struct dentr
 static inline struct dentry *
 ecryptfs_dentry_to_lower(struct dentry *dentry)
 {
-   return ((struct ecryptfs_dentry_info *)dentry->d_fsdata)->wdi_dentry;
+   return ((struct ecryptfs_dentry_info *)dentry->d_fsdata)->lower_path.dentry;
 }
 
 static inline void
 ecryptfs_set_dentry_lower(struct dentry *dentry, struct dentry *lower_dentry)
 {
-   ((struct ecryptfs_dentry_info *)dentry->d_fsdata)->wdi_dentry =
+   ((struct ecryptfs_dentry_info *)dentry->d_fsdata)->lower_path.dentry =
lower_dentry;
 }
 
 static inline struct vfsmount *
 ecryptfs_dentry_to_lower_mnt(struct dentry *dentry)
 {
-   return ((struct ecryptfs_dentry_info *)dentry->d_fsdata)->lower_mnt;
+   return ((struct ecryptfs_dentry_info *)dentry->d_fsdata)->lower_path.mnt;
 }
 
 static inline void
 ecryptfs_set_dentry_lower_mnt(struct dentry *dentry, struct vfsmount *lower_mnt)
 {
-   ((struct ecryptfs_dentry_info *)dentry->d_fsdata)->lower_mnt =
+   ((struct ecryptfs_dentry_info *)dentry->d_fsdata)->lower_path.mnt =
lower_mnt;
 }
 
-- 
1.4.3.3
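
The conversion above can be illustrated outside the kernel. The following is
a minimal sketch, assuming hypothetical stand-in types (sketch_dentry,
sketch_vfsmount) and assuming that struct path is simply a vfsmount/dentry
pair, as the accessors in the patch suggest. It shows that the four accessor
functions keep their signatures while the two separate pointers collapse into
one embedded member.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types. */
struct sketch_dentry { const char *name; void *d_fsdata; };
struct sketch_vfsmount { const char *devname; };

/* Assumed layout: a (vfsmount, dentry) pair, like struct path. */
struct sketch_path {
    struct sketch_vfsmount *mnt;
    struct sketch_dentry *dentry;
};

/* Per-dentry private data after the conversion: one pair, not two fields. */
struct sketch_dentry_info {
    struct sketch_path lower_path;
};

static struct sketch_dentry_info *sketch_info(struct sketch_dentry *dentry)
{
    assert(dentry != NULL && dentry->d_fsdata != NULL);
    return (struct sketch_dentry_info *)dentry->d_fsdata;
}

static struct sketch_dentry *sketch_dentry_to_lower(struct sketch_dentry *dentry)
{
    return sketch_info(dentry)->lower_path.dentry;
}

static void sketch_set_dentry_lower(struct sketch_dentry *dentry,
                                    struct sketch_dentry *lower_dentry)
{
    sketch_info(dentry)->lower_path.dentry = lower_dentry;
}

static struct sketch_vfsmount *sketch_dentry_to_lower_mnt(struct sketch_dentry *dentry)
{
    return sketch_info(dentry)->lower_path.mnt;
}

static void sketch_set_dentry_lower_mnt(struct sketch_dentry *dentry,
                                        struct sketch_vfsmount *lower_mnt)
{
    sketch_info(dentry)->lower_path.mnt = lower_mnt;
}
```

Because callers only go through the accessors, the field change stays local
to the header, which is presumably why the patch touches a single file.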



[PATCH 12/35] Unionfs: Documentation

2006-12-04 Thread Josef 'Jeff' Sipek
From: Josef "Jeff" Sipek <[EMAIL PROTECTED]>

This patch contains documentation for Unionfs. You will find several files
outlining basic unification concepts and rename semantics.

Signed-off-by: Josef "Jeff" Sipek <[EMAIL PROTECTED]>
Signed-off-by: David Quigley <[EMAIL PROTECTED]>
Signed-off-by: Erez Zadok <[EMAIL PROTECTED]>
---
 Documentation/filesystems/00-INDEX |2 +
 Documentation/filesystems/unionfs/00-INDEX |8 +++
 Documentation/filesystems/unionfs/concepts.txt |   68 
 Documentation/filesystems/unionfs/rename.txt   |   31 +++
 Documentation/filesystems/unionfs/usage.txt|   41 ++
 5 files changed, 150 insertions(+), 0 deletions(-)

diff --git a/Documentation/filesystems/00-INDEX b/Documentation/filesystems/00-INDEX
index 4dc28cc..c737e3e 100644
--- a/Documentation/filesystems/00-INDEX
+++ b/Documentation/filesystems/00-INDEX
@@ -82,6 +82,8 @@ udf.txt
- info and mount options for the UDF filesystem.
 ufs.txt
- info on the ufs filesystem.
+unionfs/
+   - info on the unionfs filesystem
 v9fs.txt
- v9fs is a Unix implementation of the Plan 9 9p remote fs protocol.
 vfat.txt
diff --git a/Documentation/filesystems/unionfs/00-INDEX b/Documentation/filesystems/unionfs/00-INDEX
new file mode 100644
index 0000000..fa87f83
--- /dev/null
+++ b/Documentation/filesystems/unionfs/00-INDEX
@@ -0,0 +1,8 @@
+00-INDEX
+   - this file.
+concepts.txt
+   - A brief introduction of concepts
+rename.txt
+   - Information regarding rename operations
+usage.txt
+   - Usage & known limitations
diff --git a/Documentation/filesystems/unionfs/concepts.txt b/Documentation/filesystems/unionfs/concepts.txt
new file mode 100644
index 0000000..0b9bcc9
--- /dev/null
+++ b/Documentation/filesystems/unionfs/concepts.txt
@@ -0,0 +1,68 @@
+This file describes the concepts needed by a namespace unification file system.
+
+Branch Priority:
+================
+
+Each branch is assigned a unique priority - starting from 0 (highest priority).
+No two branches can have the same priority.
+
+
+Branch Mode:
+============
+
+Each branch is assigned a mode - read-write or read-only. This allows
+directories on media mounted read-write to be used in a read-only manner.
+
+
+Whiteouts:
+==========
+
+A whiteout removes a file name from the namespace. Whiteouts are needed when
+one attempts to remove a file on a read-only branch.
+
+Suppose we have a two-branch union, where branch 0 is read-write and branch 1
+is read-only, with a file 'foo' on branch 1:
+
+./b0/
+./b1/
+./b1/foo
+
+The unified view would simply be:
+
+./union/
+./union/foo
+
+Since 'foo' is stored on a read-only branch, it cannot be removed. A whiteout
+is used to remove the name 'foo' from the unified namespace. Again, since
+branch 1 is read-only, the whiteout cannot be created there. So, we try a
+higher priority (numerically lower) branch and create the whiteout there.
+
+./b0/
+./b0/.wh.foo
+./b1/
+./b1/foo
+
+Later, when Unionfs traverses branches (due to lookup or readdir), it eliminates
+'foo' from the namespace (as well as the whiteout itself).
+
+
+Duplicate Elimination:
+======================
+
+It is possible for files on different branches to have the same name. Unionfs
+then has to select which instance of the file to show to the user. Given the
+fact that each branch has a priority associated with it, the simplest
+solution is to take the instance from the highest priority (lowest numerical
+value) and "hide" the others.
+
+
+Copyup:
+=======
+
+When a change is made to a file's data or meta-data, the change has to be
+stored somewhere. The best way is to create a copy of the original file on
+a branch that is writable, and then redirect the write through to this
+copy. The copy must be made on a higher priority branch so that lookup &
+readdir return this newer "version" of the file rather than the original
+(see duplicate elimination).
+
diff --git a/Documentation/filesystems/unionfs/rename.txt b/Documentation/filesystems/unionfs/rename.txt
new file mode 100644
index 000..e20bb82
--- /dev/null
+++ b/Documentation/filesystems/unionfs/rename.txt
@@ -0,0 +1,31 @@
+Rename is a complex beast. The following table shows which rename(2) operations
+should succeed and which should fail.
+
+o: success
+E: error (either unionfs or vfs)
+X: EXDEV
+
+none = file does not exist
+file = file is a file
+dir  = file is an empty directory
+child= file is a non-empty directory
+wh   = file is a directory containing only whiteouts; this makes it logically
+   empty
+
+         none    file    dir     child   wh
+file     o       o       E       E       E
+dir      o       E       o       E       o
+child    X       E       X       E       X
+wh       o       E       o       E       o
+
+
+Renaming directories:
+=====================
+
+Whenever an empty (either physically or 

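The whiteout and duplicate-elimination rules from concepts.txt above can be sketched in user space. This is only an illustration, not Unionfs code: the names union_view, struct branch and MAX_NAMES are hypothetical, and the ".wh." prefix is assumed from the ./b0/.wh.foo example. Branches are walked from priority 0 downwards; a whiteout hides the name in its own branch and in every lower-priority branch, and the first instance of a name wins.

```c
#include <string.h>

#define WH_PREFIX ".wh."   /* assumed whiteout prefix, as in ./b0/.wh.foo */
#define WH_LEN    4
#define MAX_NAMES 64       /* hypothetical limit for this sketch */

struct branch {
	const char **names;    /* directory entries present in this branch */
	int count;
};

static int is_whiteout(const char *name)
{
	return strlen(name) > WH_LEN && strncmp(name, WH_PREFIX, WH_LEN) == 0;
}

/* Return 1 if name is already recorded in seen[0..n). */
static int seen_before(const char **seen, int n, const char *name)
{
	for (int i = 0; i < n; i++)
		if (strcmp(seen[i], name) == 0)
			return 1;
	return 0;
}

/*
 * Collect the names visible in the union into out[], walking branches
 * from highest priority (index 0) to lowest. Returns the number of
 * visible names, or -1 if a table in this sketch overflows.
 */
static int union_view(const struct branch *br, int nbranches,
		      const char **out, int max_out)
{
	const char *seen[MAX_NAMES];
	int nseen = 0, nout = 0;

	for (int b = 0; b < nbranches; b++) {
		/* Pass 1: whiteouts hide the name here and in lower branches. */
		for (int i = 0; i < br[b].count; i++) {
			const char *name = br[b].names[i];

			if (!is_whiteout(name))
				continue;
			name += WH_LEN;
			if (seen_before(seen, nseen, name))
				continue;
			if (nseen == MAX_NAMES)
				return -1;
			seen[nseen++] = name;
		}
		/* Pass 2: the first (highest priority) instance of a name wins. */
		for (int i = 0; i < br[b].count; i++) {
			const char *name = br[b].names[i];

			if (is_whiteout(name) || seen_before(seen, nseen, name))
				continue;
			if (nseen == MAX_NAMES || nout == max_out)
				return -1;
			seen[nseen++] = name;
			out[nout++] = name;
		}
	}
	return nout;
}
```

With b0 = { ".wh.foo", "bar" } and b1 = { "foo", "bar", "baz" }, the sketch reports "bar" (from b0) and "baz"; "foo" is whited out and the second "bar" is hidden as a duplicate.
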
[PATCH 30/35] Unionfs: Helper macros/inlines

2006-12-04 Thread Josef 'Jeff' Sipek
From: Josef "Jeff" Sipek <[EMAIL PROTECTED]>

This patch contains many macros and inline functions used throughout Unionfs.

Signed-off-by: Josef "Jeff" Sipek <[EMAIL PROTECTED]>
Signed-off-by: David Quigley <[EMAIL PROTECTED]>
Signed-off-by: Erez Zadok <[EMAIL PROTECTED]>
---
 fs/unionfs/fanout.h |  177 +++
 1 files changed, 177 insertions(+), 0 deletions(-)

diff --git a/fs/unionfs/fanout.h b/fs/unionfs/fanout.h
new file mode 100644
index 000..309c881
--- /dev/null
+++ b/fs/unionfs/fanout.h
@@ -0,0 +1,177 @@
+/*
+ * Copyright (c) 2003-2006 Erez Zadok
+ * Copyright (c) 2003-2006 Charles P. Wright
+ * Copyright (c) 2005-2006 Josef Sipek
+ * Copyright (c) 2005  Arun M. Krishnakumar
+ * Copyright (c) 2004-2006 David P. Quigley
+ * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
+ * Copyright (c) 2003  Puja Gupta
+ * Copyright (c) 2003  Harikesavan Krishnan
+ * Copyright (c) 2003-2006 Stony Brook University
+ * Copyright (c) 2003-2006 The Research Foundation of State University of New York
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#ifndef _FANOUT_H_
+#define _FANOUT_H_
+
+/* Inode to private data */
+static inline struct unionfs_inode_info *UNIONFS_I(const struct inode *inode)
+{
+   return container_of(inode, struct unionfs_inode_info, vfs_inode);
+}
+
+#define ibstart(ino) (UNIONFS_I(ino)->bstart)
+#define ibend(ino) (UNIONFS_I(ino)->bend)
+
+/* Superblock to private data */
+#define UNIONFS_SB(super) ((struct unionfs_sb_info *)(super)->s_fs_info)
+#define sbstart(sb) 0
+#define sbend(sb) (UNIONFS_SB(sb)->bend)
+#define sbmax(sb) (UNIONFS_SB(sb)->bend + 1)
+
+/* File to private Data */
+#define UNIONFS_F(file) ((struct unionfs_file_info *)((file)->private_data))
+#define fbstart(file) (UNIONFS_F(file)->bstart)
+#define fbend(file) (UNIONFS_F(file)->bend)
+
+/* File to lower file. */
+static inline struct file *unionfs_lower_file(struct file *f)
+{
+   return UNIONFS_F(f)->lower_files[fbstart(f)];
+}
+
+static inline struct file *unionfs_lower_file_idx(const struct file *f, int index)
+{
+   return UNIONFS_F(f)->lower_files[index];
+}
+
+static inline void unionfs_set_lower_file_idx(struct file *f, int index, struct file *val)
+{
+   UNIONFS_F(f)->lower_files[index] = val;
+}
+
+static inline void unionfs_set_lower_file(struct file *f, struct file *val)
+{
+   UNIONFS_F(f)->lower_files[fbstart(f)] = val;
+}
+
+/* Inode to lower inode. */
+static inline struct inode *unionfs_lower_inode(const struct inode *i)
+{
+   return UNIONFS_I(i)->lower_inodes[ibstart(i)];
+}
+
+static inline struct inode *unionfs_lower_inode_idx(const struct inode *i, int index)
+{
+   return UNIONFS_I(i)->lower_inodes[index];
+}
+
+static inline void unionfs_set_lower_inode_idx(struct inode *i, int index,
+  struct inode *val)
+{
+   UNIONFS_I(i)->lower_inodes[index] = val;
+}
+
+static inline void unionfs_set_lower_inode(struct inode *i, struct inode *val)
+{
+   UNIONFS_I(i)->lower_inodes[ibstart(i)] = val;
+}
+
+/* Superblock to lower superblock. */
+static inline struct super_block *unionfs_lower_super(const struct super_block *sb)
+{
+   return UNIONFS_SB(sb)->data[sbstart(sb)].sb;
+}
+
+static inline struct super_block *unionfs_lower_super_idx(const struct super_block *sb, int index)
+{
+   return UNIONFS_SB(sb)->data[index].sb;
+}
+
+static inline void unionfs_set_lower_super_idx(struct super_block *sb, int index,
+  struct super_block *val)
+{
+   UNIONFS_SB(sb)->data[index].sb = val;
+}
+
+static inline void unionfs_set_lower_super(struct super_block *sb, struct super_block *val)
+{
+   UNIONFS_SB(sb)->data[sbstart(sb)].sb = val;
+}
+
+/* Branch count macros. */
+static inline int branch_count(struct super_block *sb, int index)
+{
+   return atomic_read(&UNIONFS_SB(sb)->data[index].sbcount);
+}
+
+static inline void set_branch_count(struct super_block *sb, int index, int val)
+{
+   atomic_set(&UNIONFS_SB(sb)->data[index].sbcount, val);
+}
+
+static inline void branchget(struct super_block *sb, int index)
+{
+   atomic_inc(&UNIONFS_SB(sb)->data[index].sbcount);
+}
+
+static inline void branchput(struct super_block *sb, int index)
+{
+   atomic_dec(&UNIONFS_SB(sb)->data[index].sbcount);
+}
+
+/* Dentry macros */
+static inline struct unionfs_dentry_info *UNIONFS_D(const struct dentry *dent)
+{
+   return dent->d_fsdata;
+}
+
+#define dbstart(dent) (UNIONFS_D(dent)->bstart)
+#define set_dbstart(dent, val) do { UNIONFS_D(dent)->bstart = val; } while(0)
+#define dbend(dent) (UNIONFS_D(dent)->bend)
+#define set_dbend(dent, val) do { UNIONFS_D(dent)->bend = val; } while(0)
+#define dbopaque(dent) (UNIONFS_D(dent)->bopaque)
+#define set_dbopaque(dent, val) do { UNIONFS_D(dent

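UNIONFS_I() above relies on the inode being embedded inside the filesystem's private structure, so the private data can be recovered with pointer arithmetic instead of a separate pointer. A minimal user-space sketch of that pattern follows. The struct layouts here are heavily reduced stand-ins (only bstart, bend and vfs_inode appear in the patch; everything else is an assumption), and container_of is re-implemented with offsetof rather than taken from the kernel headers.

```c
#include <stddef.h>

/* User-space stand-in for the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, heavily reduced version of the VFS inode. */
struct inode {
	unsigned long i_ino;
};

/* Reduced private data: the VFS inode is embedded, not pointed to. */
struct unionfs_inode_info {
	int bstart;
	int bend;
	struct inode vfs_inode;
};

/* Inode to private data: step back from the member to its container. */
static struct unionfs_inode_info *UNIONFS_I(const struct inode *inode)
{
	return container_of(inode, struct unionfs_inode_info, vfs_inode);
}
```

Because the conversion is plain arithmetic on the member offset, it is only valid for inodes that really are embedded in a struct unionfs_inode_info; handing it any other inode pointer yields a bogus address.
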
[PATCH 13/35] lookup_one_len_nd - lookup_one_len with nameidata argument

2006-12-04 Thread Josef 'Jeff' Sipek
From: Josef "Jeff" Sipek <[EMAIL PROTECTED]>

This patch renames lookup_one_len to lookup_one_len_nd, and adds a nameidata
argument. An inline function, lookup_one_len (which calls lookup_one_len_nd
with nd == NULL) preserves original behavior.

The following Unionfs patches depend on this one.

Signed-off-by: Josef "Jeff" Sipek <[EMAIL PROTECTED]>
---
 fs/namei.c|8 
 include/linux/namei.h |   10 +-
 2 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 8a7b923..76de391 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1290,8 +1290,8 @@ static struct dentry *lookup_hash(struct
return __lookup_hash(&nd->last, nd->dentry, nd);
 }
 
-/* SMP-safe */
-struct dentry * lookup_one_len(const char * name, struct dentry * base, int len)
+struct dentry * lookup_one_len_nd(const char *name, struct dentry * base,
+ int len, struct nameidata *nd)
 {
unsigned long hash;
struct qstr this;
@@ -1311,7 +1311,7 @@ struct dentry * lookup_one_len(const cha
}
this.hash = end_name_hash(hash);
 
-   return __lookup_hash(&this, base, NULL);
+   return __lookup_hash(&this, base, nd);
 access:
return ERR_PTR(-EACCES);
 }
@@ -2756,7 +2756,7 @@ EXPORT_SYMBOL(follow_up);
 EXPORT_SYMBOL(get_write_access); /* binfmt_aout */
 EXPORT_SYMBOL(getname);
 EXPORT_SYMBOL(lock_rename);
-EXPORT_SYMBOL(lookup_one_len);
+EXPORT_SYMBOL(lookup_one_len_nd);
 EXPORT_SYMBOL(page_follow_link_light);
 EXPORT_SYMBOL(page_put_link);
 EXPORT_SYMBOL(page_readlink);
diff --git a/include/linux/namei.h b/include/linux/namei.h
index d39a5a6..8551806 100644
--- a/include/linux/namei.h
+++ b/include/linux/namei.h
@@ -81,7 +81,15 @@ extern struct file *lookup_instantiate_f
 extern struct file *nameidata_to_filp(struct nameidata *nd, int flags);
 extern void release_open_intent(struct nameidata *);
 
-extern struct dentry * lookup_one_len(const char *, struct dentry *, int);
+extern struct dentry * lookup_one_len_nd(const char *,
+   struct dentry *, int, struct nameidata *);
+
+/* SMP-safe */
+static inline struct dentry *lookup_one_len(const char *name,
+   struct dentry *dir, int len)
+{
+   return lookup_one_len_nd(name, dir, len, NULL);
+}
 
 extern int follow_down(struct vfsmount **, struct dentry **);
 extern int follow_up(struct vfsmount **, struct dentry **);
-- 
1.4.3.3


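The approach in the patch above (extend a function with an extra argument, then keep the old name as an inline wrapper that passes NULL) can be illustrated outside the kernel. The sketch below uses made-up names: name_hash_ctx, name_hash and struct lookup_ctx are hypothetical and only mirror the shape of lookup_one_len_nd, lookup_one_len and struct nameidata.

```c
#include <stddef.h>

/* Hypothetical stand-in for the optional context (struct nameidata). */
struct lookup_ctx {
	int lookups;
};

/* Extended function: same job as before, plus an optional context. */
static int name_hash_ctx(const char *name, int len, struct lookup_ctx *ctx)
{
	unsigned int hash = 0;

	for (int i = 0; i < len; i++)
		hash = hash * 31 + (unsigned char)name[i];
	if (ctx)
		ctx->lookups++;    /* only touched when a context is supplied */
	return (int)(hash & 0x7fffffff);
}

/* Old entry point, kept as an inline wrapper that passes no context. */
static inline int name_hash(const char *name, int len)
{
	return name_hash_ctx(name, len, NULL);
}
```

Existing callers keep compiling and see the same result, while new callers can pass a context. The cost is that the exported symbol changes name, which is why the patch also switches EXPORT_SYMBOL to lookup_one_len_nd.
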

[PATCH 19/35] Unionfs: Directory file operations

2006-12-04 Thread Josef 'Jeff' Sipek
From: Josef "Jeff" Sipek <[EMAIL PROTECTED]>

This patch provides directory file operations.

Signed-off-by: Josef "Jeff" Sipek <[EMAIL PROTECTED]>
Signed-off-by: David Quigley <[EMAIL PROTECTED]>
Signed-off-by: Erez Zadok <[EMAIL PROTECTED]>
---
 fs/unionfs/dirfops.c |  264 ++
 1 files changed, 264 insertions(+), 0 deletions(-)

diff --git a/fs/unionfs/dirfops.c b/fs/unionfs/dirfops.c
new file mode 100644
index 000..ab389eb
--- /dev/null
+++ b/fs/unionfs/dirfops.c
@@ -0,0 +1,264 @@
+/*
+ * Copyright (c) 2003-2006 Erez Zadok
+ * Copyright (c) 2003-2006 Charles P. Wright
+ * Copyright (c) 2005-2006 Josef 'Jeff' Sipek
+ * Copyright (c) 2005-2006 Junjiro Okajima
+ * Copyright (c) 2005  Arun M. Krishnakumar
+ * Copyright (c) 2004-2006 David P. Quigley
+ * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
+ * Copyright (c) 2003  Puja Gupta
+ * Copyright (c) 2003  Harikesavan Krishnan
+ * Copyright (c) 2003-2006 Stony Brook University
+ * Copyright (c) 2003-2006 The Research Foundation of State University of New York
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include "union.h"
+
+/* Make sure our rdstate is playing by the rules. */
+static void verify_rdstate_offset(struct unionfs_dir_state *rdstate)
+{
+   BUG_ON(rdstate->offset >= DIREOF);
+   BUG_ON(rdstate->cookie >= MAXRDCOOKIE);
+}
+
+struct unionfs_getdents_callback {
+   struct unionfs_dir_state *rdstate;
+   void *dirent;
+   int entries_written;
+   int filldir_called;
+   int filldir_error;
+   filldir_t filldir;
+   struct super_block *sb;
+};
+
+/* based on generic filldir in fs/readdir.c */
+static int unionfs_filldir(void *dirent, const char *name, int namelen,
+  loff_t offset, u64 ino, unsigned int d_type)
+{
+   struct unionfs_getdents_callback *buf = dirent;
+   struct filldir_node *found = NULL;
+   int err = 0;
+   int is_wh_entry = 0;
+
+   buf->filldir_called++;
+
+   if ((namelen > UNIONFS_WHLEN) &&
+   !strncmp(name, UNIONFS_WHPFX, UNIONFS_WHLEN)) {
+   name += UNIONFS_WHLEN;
+   namelen -= UNIONFS_WHLEN;
+   is_wh_entry = 1;
+   }
+
+   found = find_filldir_node(buf->rdstate, name, namelen);
+
+   if (found)
+   goto out;
+
+   /* if 'name' isn't a whiteout, filldir it. */
+   if (!is_wh_entry) {
+   off_t pos = rdstate2offset(buf->rdstate);
+   u64 unionfs_ino = ino;
+
+   if (!err) {
+   err = buf->filldir(buf->dirent, name, namelen, pos,
+  unionfs_ino, d_type);
+   buf->rdstate->offset++;
+   verify_rdstate_offset(buf->rdstate);
+   }
+   }
+   /* If we did fill it, stuff it in our hash, otherwise return an error */
+   if (err) {
+   buf->filldir_error = err;
+   goto out;
+   }
+   buf->entries_written++;
+   if ((err = add_filldir_node(buf->rdstate, name, namelen,
+   buf->rdstate->bindex, is_wh_entry)))
+   buf->filldir_error = err;
+
+out:
+   return err;
+}
+
+static int unionfs_readdir(struct file *file, void *dirent, filldir_t filldir)
+{
+   int err = 0;
+   struct file *hidden_file = NULL;
+   struct inode *inode = NULL;
+   struct unionfs_getdents_callback buf;
+   struct unionfs_dir_state *uds;
+   int bend;
+   loff_t offset;
+
+   if ((err = unionfs_file_revalidate(file, 0)))
+   goto out;
+
+   inode = file->f_dentry->d_inode;
+
+   uds = UNIONFS_F(file)->rdstate;
+   if (!uds) {
+   if (file->f_pos == DIREOF) {
+   goto out;
+   } else if (file->f_pos > 0) {
+   uds = find_rdstate(inode, file->f_pos);
+   if (!uds) {
+   err = -ESTALE;
+   goto out;
+   }
+   UNIONFS_F(file)->rdstate = uds;
+   } else {
+   init_rdstate(file);
+   uds = UNIONFS_F(file)->rdstate;
+   }
+   }
+   bend = fbend(file);
+
+   while (uds->bindex <= bend) {
+   hidden_file = unionfs_lower_file_idx(file, uds->bindex);
+   if (!hidden_file) {
+   uds->bindex++;
+   uds->dirpos = 0;
+   continue;
+   }
+
+   /* prepare callback buffer */
+   buf.filldir_called = 0;
+   buf.filldir_error = 0;
+   buf.entries_written = 0;
+   buf.dirent = dirent;
+   buf.filldir 

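The first step of unionfs_filldir() above, recognising a whiteout entry and stripping its prefix before the name is looked up in the hash, can be sketched on its own. The values of UNIONFS_WHPFX and UNIONFS_WHLEN are not shown in this patch; ".wh." and 4 are assumed here from the .wh.foo example in the documentation, and strip_whiteout is a hypothetical helper name.

```c
#include <string.h>

#define WHPFX ".wh."                /* assumed value of UNIONFS_WHPFX */
#define WHLEN (sizeof(WHPFX) - 1)   /* assumed value of UNIONFS_WHLEN */

/*
 * If (*name, *namelen) names a whiteout entry, advance past the prefix
 * and return 1. Otherwise leave both untouched and return 0.
 * A bare ".wh." with nothing after it is not treated as a whiteout,
 * matching the strict "namelen > UNIONFS_WHLEN" test in the patch.
 */
static int strip_whiteout(const char **name, size_t *namelen)
{
	if (*namelen > WHLEN && strncmp(*name, WHPFX, WHLEN) == 0) {
		*name += WHLEN;
		*namelen -= WHLEN;
		return 1;
	}
	return 0;
}
```

After stripping, a whiteout and the file it hides share the same key, so a single hash lookup can tell whether the name was already seen in a higher-priority branch.
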
[PATCH 22/35] Unionfs: Lookup helper functions

2006-12-04 Thread Josef 'Jeff' Sipek
From: Josef "Jeff" Sipek <[EMAIL PROTECTED]>

This patch provides helper functions for the lookup operations in Unionfs.

Signed-off-by: Josef "Jeff" Sipek <[EMAIL PROTECTED]>
Signed-off-by: David Quigley <[EMAIL PROTECTED]>
Signed-off-by: Erez Zadok <[EMAIL PROTECTED]>
---
 fs/unionfs/lookup.c |  506 +++
 1 files changed, 506 insertions(+), 0 deletions(-)

diff --git a/fs/unionfs/lookup.c b/fs/unionfs/lookup.c
new file mode 100644
index 000..7376e69
--- /dev/null
+++ b/fs/unionfs/lookup.c
@@ -0,0 +1,506 @@
+/*
+ * Copyright (c) 2003-2006 Erez Zadok
+ * Copyright (c) 2003-2006 Charles P. Wright
+ * Copyright (c) 2005-2006 Josef 'Jeff' Sipek
+ * Copyright (c) 2005-2006 Junjiro Okajima
+ * Copyright (c) 2005  Arun M. Krishnakumar
+ * Copyright (c) 2004-2006 David P. Quigley
+ * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
+ * Copyright (c) 2003  Puja Gupta
+ * Copyright (c) 2003  Harikesavan Krishnan
+ * Copyright (c) 2003-2006 Stony Brook University
+ * Copyright (c) 2003-2006 The Research Foundation of State University of New York
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include "union.h"
+
+static int is_opaque_dir(struct dentry *dentry, int bindex);
+static int is_validname(const char *name);
+
+struct dentry *unionfs_lookup_backend(struct dentry *dentry, struct nameidata *nd,
+ int lookupmode)
+{
+   int err = 0;
+   struct dentry *hidden_dentry = NULL;
+   struct dentry *wh_hidden_dentry = NULL;
+   struct dentry *hidden_dir_dentry = NULL;
+   struct dentry *parent_dentry = NULL;
+   int bindex, bstart, bend, bopaque;
+   int dentry_count = 0;   /* Number of positive dentries. */
+   int first_dentry_offset = -1;
+   struct dentry *first_hidden_dentry = NULL;
+   struct vfsmount *first_hidden_mnt = NULL;
+   int locked_parent = 0;
+   int locked_child = 0;
+
+   int opaque;
+   char *whname = NULL;
+   const char *name;
+   int namelen;
+
+   /* We should already have a lock on this dentry in the case of a
+* partial lookup, or a revalidation. Otherwise it is returned from
+* new_dentry_private_data already locked.  */
+   if (lookupmode == INTERPOSE_PARTIAL || lookupmode == INTERPOSE_REVAL ||
+   lookupmode == INTERPOSE_REVAL_NEG)
+   verify_locked(dentry);
+   else {
+   BUG_ON(UNIONFS_D(dentry) != NULL);
+   locked_child = 1;
+   }
+   if (lookupmode != INTERPOSE_PARTIAL)
+   if ((err = new_dentry_private_data(dentry)))
+   goto out;
+   /* must initialize dentry operations */
+   dentry->d_op = &unionfs_dops;
+
+   parent_dentry = dget_parent(dentry);
+   /* We never partial lookup the root directory. */
+   if (parent_dentry != dentry) {
+   lock_dentry(parent_dentry);
+   locked_parent = 1;
+   } else {
+   dput(parent_dentry);
+   parent_dentry = NULL;
+   goto out;
+   }
+
+   name = dentry->d_name.name;
+   namelen = dentry->d_name.len;
+
+   /* No dentries should get created for possible whiteout names. */
+   if (!is_validname(name)) {
+   err = -EPERM;
+   goto out_free;
+   }
+
+   /* Now start the actual lookup procedure. */
+   bstart = dbstart(parent_dentry);
+   bend = dbend(parent_dentry);
+   bopaque = dbopaque(parent_dentry);
+   BUG_ON(bstart < 0);
+
+   /* It would be ideal if we could convert partial lookups to only have
+* to do this work when they really need to.  It could probably improve
+* performance quite a bit, and maybe simplify the rest of the code. */
+   if (lookupmode == INTERPOSE_PARTIAL) {
+   bstart++;
+   if ((bopaque != -1) && (bopaque < bend))
+   bend = bopaque;
+   }
+
+   for (bindex = bstart; bindex <= bend; bindex++) {
+   hidden_dentry = unionfs_lower_dentry_idx(dentry, bindex);
+   if (lookupmode == INTERPOSE_PARTIAL && hidden_dentry)
+   continue;
+   BUG_ON(hidden_dentry != NULL);
+
+   hidden_dir_dentry = unionfs_lower_dentry_idx(parent_dentry, bindex);
+
+   /* if the parent hidden dentry does not exist skip this */
+   if (!(hidden_dir_dentry && hidden_dir_dentry->d_inode))
+   continue;
+
+   /* also skip it if the parent isn't a directory. */
+   if (!S_ISDIR(hidden_dir_dentry->d_inode->i_mode))
+   continue;
+
+   /* Reuse the whiteout name because its value doesn't change. */
+   if (!whname) {
+   

[PATCH 20/35] Unionfs: Directory manipulation helper functions

2006-12-04 Thread Josef 'Jeff' Sipek
From: Josef "Jeff" Sipek <[EMAIL PROTECTED]>

This patch contains directory manipulation helper functions.

Signed-off-by: Josef "Jeff" Sipek <[EMAIL PROTECTED]>
Signed-off-by: David Quigley <[EMAIL PROTECTED]>
Signed-off-by: Erez Zadok <[EMAIL PROTECTED]>
---
 fs/unionfs/dirhelper.c |  270 
 1 files changed, 270 insertions(+), 0 deletions(-)

diff --git a/fs/unionfs/dirhelper.c b/fs/unionfs/dirhelper.c
new file mode 100644
index 000..714af2d
--- /dev/null
+++ b/fs/unionfs/dirhelper.c
@@ -0,0 +1,270 @@
+/*
+ * Copyright (c) 2003-2006 Erez Zadok
+ * Copyright (c) 2003-2006 Charles P. Wright
+ * Copyright (c) 2005-2006 Josef 'Jeff' Sipek
+ * Copyright (c) 2005-2006 Junjiro Okajima
+ * Copyright (c) 2005  Arun M. Krishnakumar
+ * Copyright (c) 2004-2006 David P. Quigley
+ * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
+ * Copyright (c) 2003  Puja Gupta
+ * Copyright (c) 2003  Harikesavan Krishnan
+ * Copyright (c) 2003-2006 Stony Brook University
+ * Copyright (c) 2003-2006 The Research Foundation of State University of New York
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include "union.h"
+
+/* Delete all of the whiteouts in a given directory for rmdir.
+ *
+ * hidden directory inode should be locked
+ */
+int do_delete_whiteouts(struct dentry *dentry, int bindex,
+struct unionfs_dir_state *namelist)
+{
+   int err = 0;
+   struct dentry *hidden_dir_dentry = NULL;
+   struct dentry *hidden_dentry;
+   char *name = NULL, *p;
+   struct inode *hidden_dir;
+
+   int i;
+   struct list_head *pos;
+   struct filldir_node *cursor;
+
+   /* Find out hidden parent dentry */
+   hidden_dir_dentry = unionfs_lower_dentry_idx(dentry, bindex);
+   BUG_ON(!S_ISDIR(hidden_dir_dentry->d_inode->i_mode));
+   hidden_dir = hidden_dir_dentry->d_inode;
+   BUG_ON(!S_ISDIR(hidden_dir->i_mode));
+
+   err = -ENOMEM;
+   name = __getname();
+   if (!name)
+   goto out;
+   strcpy(name, UNIONFS_WHPFX);
+   p = name + UNIONFS_WHLEN;
+
+   err = 0;
+   for (i = 0; !err && i < namelist->size; i++) {
+   list_for_each(pos, &namelist->list[i]) {
+   cursor =
+   list_entry(pos, struct filldir_node, file_list);
+   /* Only operate on whiteouts in this branch. */
+   if (cursor->bindex != bindex)
+   continue;
+   if (!cursor->whiteout)
+   continue;
+
+   strcpy(p, cursor->name);
+   hidden_dentry =
+   lookup_one_len(name, hidden_dir_dentry,
+  cursor->namelen + UNIONFS_WHLEN);
+   if (IS_ERR(hidden_dentry)) {
+   err = PTR_ERR(hidden_dentry);
+   break;
+   }
+   if (hidden_dentry->d_inode)
+   err = vfs_unlink(hidden_dir, hidden_dentry);
+   dput(hidden_dentry);
+   if (err)
+   break;
+   }
+   }
+
+   __putname(name);
+
+   /* After all of the removals, we should copy the attributes once. */
+   fsstack_copy_attr_times(dentry->d_inode, hidden_dir_dentry->d_inode);
+
+out:
+   return err;
+}
+
+/* delete whiteouts in a dir (for rmdir operation) using sioq if necessary */
+int delete_whiteouts(struct dentry *dentry, int bindex,
+struct unionfs_dir_state *namelist)
+{
+   int err;
+   struct super_block *sb;
+   struct dentry *hidden_dir_dentry;
+   struct inode *hidden_dir;
+
+   struct sioq_args args;
+
+   sb = dentry->d_sb;
+   unionfs_read_lock(sb);
+
+   BUG_ON(!S_ISDIR(dentry->d_inode->i_mode));
+   BUG_ON(bindex < dbstart(dentry));
+   BUG_ON(bindex > dbend(dentry));
+   err = is_robranch_super(sb, bindex);
+   if (err)
+   goto out;
+
+   hidden_dir_dentry = unionfs_lower_dentry_idx(dentry, bindex);
+   BUG_ON(!S_ISDIR(hidden_dir_dentry->d_inode->i_mode));
+   hidden_dir = hidden_dir_dentry->d_inode;
+   BUG_ON(!S_ISDIR(hidden_dir->i_mode));
+
+   mutex_lock(&hidden_dir->i_mutex);
+   if (!permission(hidden_dir, MAY_WRITE | MAY_EXEC, NULL))
+   err = do_delete_whiteouts(dentry, bindex, namelist);
+   else {
+   args.deletewh.namelist = namelist;
+   args.deletewh.dentry = dentry;
+   args.deletewh.bindex = bindex;
+   run_sioq(__delete_whiteouts, &args);
+   err = args.err;
+   }
+   mutex_unlock(&hidd

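do_delete_whiteouts() above builds each whiteout name by copying the prefix into a buffer once and then appending the entry name after it. A bounds-checked user-space sketch of the same name construction is shown below. It uses snprintf instead of strcpy, which is a substitution of mine and not what the patch does; make_whiteout_name is a hypothetical helper, and ".wh." is the assumed value of UNIONFS_WHPFX.

```c
#include <stdio.h>
#include <string.h>

#define WHPFX ".wh."   /* assumed value of UNIONFS_WHPFX */

/*
 * Write ".wh.<name>" into buf.
 * Returns the length of the resulting string, or -1 if it does not fit.
 */
static int make_whiteout_name(char *buf, size_t bufsize, const char *name)
{
	int n = snprintf(buf, bufsize, "%s%s", WHPFX, name);

	if (n < 0 || (size_t)n >= bufsize)
		return -1;     /* output was truncated or an error occurred */
	return n;
}
```

The patch avoids the length check by using a buffer from __getname(), which is sized for a full path name; the explicit check here is only needed because the sketch accepts buffers of any size.
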
[PATCH 16/35] Unionfs: Copyup Functionality

2006-12-04 Thread Josef 'Jeff' Sipek
From: Josef "Jeff" Sipek <[EMAIL PROTECTED]>

This patch contains the functions used to perform copyup operations in unionfs.

Signed-off-by: Josef "Jeff" Sipek <[EMAIL PROTECTED]>
Signed-off-by: David Quigley <[EMAIL PROTECTED]>
Signed-off-by: Erez Zadok <[EMAIL PROTECTED]>
---
 fs/unionfs/copyup.c |  665 +++
 1 files changed, 665 insertions(+), 0 deletions(-)

diff --git a/fs/unionfs/copyup.c b/fs/unionfs/copyup.c
new file mode 100644
index 000..0557c02
--- /dev/null
+++ b/fs/unionfs/copyup.c
@@ -0,0 +1,665 @@
+/*
+ * Copyright (c) 2003-2006 Erez Zadok
+ * Copyright (c) 2003-2006 Charles P. Wright
+ * Copyright (c) 2005-2006 Josef 'Jeff' Sipek
+ * Copyright (c) 2005-2006 Junjiro Okajima
+ * Copyright (c) 2005  Arun M. Krishnakumar
+ * Copyright (c) 2004-2006 David P. Quigley
+ * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
+ * Copyright (c) 2003  Puja Gupta
+ * Copyright (c) 2003  Harikesavan Krishnan
+ * Copyright (c) 2003-2006 Stony Brook University
+ * Copyright (c) 2003-2006 The Research Foundation of State University of New York*
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include "union.h"
+
+/* Determine the mode based on the copyup flags, and the existing dentry. */
+static int copyup_permissions(struct super_block *sb,
+ struct dentry *old_hidden_dentry,
+ struct dentry *new_hidden_dentry)
+{
+   struct inode *i = old_hidden_dentry->d_inode;
+   struct iattr newattrs;
+   int err;
+
+   newattrs.ia_atime = i->i_atime;
+   newattrs.ia_mtime = i->i_mtime;
+   newattrs.ia_ctime = i->i_ctime;
+
+   newattrs.ia_gid = i->i_gid;
+   newattrs.ia_uid = i->i_uid;
+
+   newattrs.ia_mode = i->i_mode;
+
+   newattrs.ia_valid = ATTR_CTIME | ATTR_ATIME | ATTR_MTIME |
+   ATTR_ATIME_SET | ATTR_MTIME_SET | ATTR_FORCE |
+   ATTR_GID | ATTR_UID | ATTR_MODE;
+
+   err = notify_change(new_hidden_dentry, &newattrs);
+
+   return err;
+}
+
+int copyup_dentry(struct inode *dir, struct dentry *dentry,
+ int bstart, int new_bindex,
+ struct file **copyup_file, loff_t len)
+{
+   return copyup_named_dentry(dir, dentry, bstart, new_bindex,
+  dentry->d_name.name,
+  dentry->d_name.len, copyup_file, len);
+}
+
+/* create the new device/file/directory - use copyup_permission to copyup
+ * times, and mode
+ *
+ * if the object being copied up is a regular file, the file is only created,
+ * the contents have to be copied up separately
+ */
+static inline int __copyup_ndentry(struct dentry *old_hidden_dentry,
+  struct dentry *new_hidden_dentry,
+  struct dentry *new_hidden_parent_dentry,
+  char *symbuf)
+{
+   int err = 0;
+   umode_t old_mode = old_hidden_dentry->d_inode->i_mode;
+   struct sioq_args args;
+
+   if (S_ISDIR(old_mode)) {
+   args.mkdir.parent = new_hidden_parent_dentry->d_inode;
+   args.mkdir.dentry = new_hidden_dentry;
+   args.mkdir.mode = old_mode;
+
+   run_sioq(__unionfs_mkdir, &args);
+   err = args.err;
+   } else if (S_ISLNK(old_mode)) {
+   args.symlink.parent = new_hidden_parent_dentry->d_inode;
+   args.symlink.dentry = new_hidden_dentry;
+   args.symlink.symbuf = symbuf;
+   args.symlink.mode = old_mode;
+
+   run_sioq(__unionfs_symlink, &args);
+   err = args.err;
+   } else if (S_ISBLK(old_mode)
+  || S_ISCHR(old_mode)
+  || S_ISFIFO(old_mode)
+  || S_ISSOCK(old_mode)) {
+   args.mknod.parent = new_hidden_parent_dentry->d_inode;
+   args.mknod.dentry = new_hidden_dentry;
+   args.mknod.mode = old_mode;
+   args.mknod.dev = old_hidden_dentry->d_inode->i_rdev;
+
+   run_sioq(__unionfs_mknod, &args);
+   err = args.err;
+   } else if (S_ISREG(old_mode)) {
+   args.create.parent = new_hidden_parent_dentry->d_inode;
+   args.create.dentry = new_hidden_dentry;
+   args.create.mode = old_mode;
+   args.create.nd = NULL;
+
+   run_sioq(__unionfs_create, &args);
+   err = args.err;
+   } else {
+   printk(KERN_ERR "Unknown inode type %d\n",
+   old_mode);
+   BUG();
+   }
+
+   return err;
+}
+
+static inline int __copyup_reg_data(struct dentry *dentry,
+   struct dentry *new_hidden_dentry,
+   int n

[PATCH 35/35] Unionfs: Extended Attributes support

2006-12-04 Thread Josef 'Jeff' Sipek
From: Josef "Jeff" Sipek <[EMAIL PROTECTED]>

Extended attribute support.

Signed-off-by: Josef "Jeff" Sipek <[EMAIL PROTECTED]>
Signed-off-by: David Quigley <[EMAIL PROTECTED]>
Signed-off-by: Erez Zadok <[EMAIL PROTECTED]>
---
 fs/Kconfig  |9 
 fs/unionfs/Makefile |2 +
 fs/unionfs/copyup.c |   75 +++
 fs/unionfs/inode.c  |   12 +
 fs/unionfs/union.h  |   15 ++
 fs/unionfs/xattr.c  |  123 +++
 6 files changed, 236 insertions(+), 0 deletions(-)

diff --git a/fs/Kconfig b/fs/Kconfig
index 4b31ea4..b8b8e45 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -1567,6 +1567,15 @@ config UNION_FS
 
  See  for details
 
+config UNION_FS_XATTR
+   bool "Unionfs extended attributes"
+   depends on UNION_FS
+   help
+ Extended attributes are name:value pairs associated with inodes by
+ the kernel or by users (see the attr(5) manual page).
+
+ If unsure, say N.
+
 endmenu
 
 menu "Network File Systems"
diff --git a/fs/unionfs/Makefile b/fs/unionfs/Makefile
index 25dd78f..7c98325 100644
--- a/fs/unionfs/Makefile
+++ b/fs/unionfs/Makefile
@@ -3,3 +3,5 @@ obj-$(CONFIG_UNION_FS) += unionfs.o
 unionfs-y := subr.o dentry.o file.o inode.o main.o super.o \
stale_inode.o branchman.o rdstate.o copyup.o dirhelper.o \
rename.o unlink.o lookup.o commonfops.o dirfops.o sioq.o
+
+unionfs-$(CONFIG_UNION_FS_XATTR) += xattr.o
diff --git a/fs/unionfs/copyup.c b/fs/unionfs/copyup.c
index 0557c02..ec1c649 100644
--- a/fs/unionfs/copyup.c
+++ b/fs/unionfs/copyup.c
@@ -18,6 +18,75 @@
 
 #include "union.h"
 
+#ifdef CONFIG_UNION_FS_XATTR
+/* copyup all extended attrs for a given dentry */
+static int copyup_xattrs(struct dentry *old_hidden_dentry,
+struct dentry *new_hidden_dentry)
+{
+   int err = 0;
+   ssize_t list_size = -1;
+   char *name_list = NULL;
+   char *attr_value = NULL;
+   char *name_list_orig = NULL;
+
+   list_size = vfs_listxattr(old_hidden_dentry, NULL, 0);
+
+   if (list_size <= 0) {
+   err = list_size;
+   goto out;
+   }
+
+   name_list = xattr_alloc(list_size + 1, XATTR_LIST_MAX);
+   if (!name_list || IS_ERR(name_list)) {
+   err = PTR_ERR(name_list);
+   goto out;
+   }
+   list_size = vfs_listxattr(old_hidden_dentry, name_list, list_size);
+   attr_value = xattr_alloc(XATTR_SIZE_MAX, XATTR_SIZE_MAX);
+   if (!attr_value || IS_ERR(attr_value)) {
+   err = PTR_ERR(attr_value);
+   goto out;
+   }
+   name_list_orig = name_list;
+   while (*name_list) {
+   ssize_t size;
+
+   //We need to lock here since vfs_getxattr doesn't lock for us.
+   mutex_lock(&old_hidden_dentry->d_inode->i_mutex);
+   size = vfs_getxattr(old_hidden_dentry, name_list,
+   attr_value, XATTR_SIZE_MAX);
+   mutex_unlock(&old_hidden_dentry->d_inode->i_mutex);
+   if (size < 0) {
+   err = size;
+   goto out;
+   }
+
+   if (size > XATTR_SIZE_MAX) {
+   err = -E2BIG;
+   goto out;
+   }
+   //We don't need to lock here since vfs_setxattr does it for us.
+   err = vfs_setxattr(new_hidden_dentry, name_list, attr_value,
+  size, 0);
+
+   if (err < 0)
+   goto out;
+   name_list += strlen(name_list) + 1;
+   }
+  out:
+   name_list = name_list_orig;
+
+   if (name_list)
+   xattr_free(name_list, list_size + 1);
+   if (attr_value)
+   xattr_free(attr_value, XATTR_SIZE_MAX);
+   /* It is no big deal if this fails, we just roll with the punches. */
+   if (err == -ENOTSUPP || err == -EOPNOTSUPP)
+   err = 0;
+   return err;
+}
+#endif /* CONFIG_UNION_FS_XATTR */
+
 /* Determine the mode based on the copyup flags, and the existing dentry. */
 static int copyup_permissions(struct super_block *sb,
  struct dentry *old_hidden_dentry,
@@ -343,6 +412,12 @@ int copyup_named_dentry(struct inode *di
if ((err = copyup_permissions(sb, old_hidden_dentry, new_hidden_dentry)))
goto out_unlink;
 
+#ifdef CONFIG_UNION_FS_XATTR
+   /* Selinux uses extended attributes for permissions. */
+   if ((err = copyup_xattrs(old_hidden_dentry, new_hidden_dentry)))
+   goto out_unlink;
+#endif
+
/* do not allow files getting deleted to be reinterposed */
if (!d_deleted(dentry))
unionfs_reinterpose(dentry);
diff --git a/fs/unionfs/inode.c b/fs/unionfs/inode.c
index c7806e9..2e45fb0 100644
--- a/fs/unionfs/inode.c
+++ b/fs/unionfs/inode.c
@@ -917,10 +917,22 @@

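copyup_xattrs() above walks the buffer filled by vfs_listxattr(), which holds attribute names back to back, each terminated by a NUL byte, and steps from one name to the next with strlen(name) + 1. A user-space sketch of that walk follows. Unlike the patch, which stops when it reaches an empty name, this version is bounded by an explicit length; that choice, and the names for_each_xattr_name and count_name, are assumptions made for the illustration.

```c
#include <string.h>

/*
 * Call fn(name, arg) for every name in a buffer of NUL-terminated
 * names laid end to end, list_size bytes in total. Stops at the first
 * nonzero return from fn and passes it on. Returns -1 if the last
 * name is not NUL-terminated within the buffer, 0 otherwise.
 */
static int for_each_xattr_name(const char *list, size_t list_size,
			       int (*fn)(const char *name, void *arg),
			       void *arg)
{
	const char *p = list;
	const char *end = list + list_size;

	while (p < end) {
		const char *nul = memchr(p, '\0', (size_t)(end - p));
		int err;

		if (!nul)
			return -1;
		err = fn(p, arg);
		if (err)
			return err;
		p = nul + 1;   /* same step as name_list += strlen() + 1 */
	}
	return 0;
}

/* Example callback: count the names seen so far. */
static int count_name(const char *name, void *arg)
{
	(void)name;
	++*(int *)arg;
	return 0;
}
```

For a buffer holding "user.a", "user.bb" and "security.selinux", the callback runs three times. Bounding the walk by the length returned from the list call avoids depending on an extra terminating byte after the last name.
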
[PATCH 06/35] struct path: Rename DM's struct path

2006-12-04 Thread Josef 'Jeff' Sipek
From: Josef "Jeff" Sipek <[EMAIL PROTECTED]>

Rename DM's struct path to struct dm_path.

Cc: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Signed-off-by: Josef "Jeff" Sipek <[EMAIL PROTECTED]>
---
 drivers/md/dm-emc.c   |   10 +-
 drivers/md/dm-hw-handler.h|2 +-
 drivers/md/dm-mpath.c |6 +++---
 drivers/md/dm-mpath.h |4 ++--
 drivers/md/dm-path-selector.h |   12 ++--
 drivers/md/dm-round-robin.c   |   12 ++--
 6 files changed, 23 insertions(+), 23 deletions(-)

diff --git a/drivers/md/dm-emc.c b/drivers/md/dm-emc.c
index 2b2d45d..265c467 100644
--- a/drivers/md/dm-emc.c
+++ b/drivers/md/dm-emc.c
@@ -40,7 +40,7 @@ static inline void free_bio(struct bio *
 
 static int emc_endio(struct bio *bio, unsigned int bytes_done, int error)
 {
-   struct path *path = bio->bi_private;
+   struct dm_path *path = bio->bi_private;
 
if (bio->bi_size)
return 1;
@@ -61,7 +61,7 @@ static int emc_endio(struct bio *bio, un
return 0;
 }
 
-static struct bio *get_failover_bio(struct path *path, unsigned data_size)
+static struct bio *get_failover_bio(struct dm_path *path, unsigned data_size)
 {
struct bio *bio;
struct page *page;
@@ -96,7 +96,7 @@ static struct bio *get_failover_bio(stru
 }
 
 static struct request *get_failover_req(struct emc_handler *h,
-   struct bio *bio, struct path *path)
+   struct bio *bio, struct dm_path *path)
 {
struct request *rq;
struct block_device *bdev = bio->bi_bdev;
@@ -133,7 +133,7 @@ static struct request *get_failover_req(
 }
 
 static struct request *emc_trespass_get(struct emc_handler *h,
-   struct path *path)
+   struct dm_path *path)
 {
struct bio *bio;
struct request *rq;
@@ -191,7 +191,7 @@ static struct request *emc_trespass_get(
 }
 
 static void emc_pg_init(struct hw_handler *hwh, unsigned bypassed,
-   struct path *path)
+   struct dm_path *path)
 {
struct request *rq;
struct request_queue *q = bdev_get_queue(path->dev->bdev);
diff --git a/drivers/md/dm-hw-handler.h b/drivers/md/dm-hw-handler.h
index 15f5629..32eff28 100644
--- a/drivers/md/dm-hw-handler.h
+++ b/drivers/md/dm-hw-handler.h
@@ -32,7 +32,7 @@ struct hw_handler_type {
void (*destroy) (struct hw_handler *hwh);
 
void (*pg_init) (struct hw_handler *hwh, unsigned bypassed,
-struct path *path);
+struct dm_path *path);
unsigned (*error) (struct hw_handler *hwh, struct bio *bio);
int (*status) (struct hw_handler *hwh, status_type_t type,
   char *result, unsigned int maxlen);
diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
index d754e0b..b5348b1 100644
--- a/drivers/md/dm-mpath.c
+++ b/drivers/md/dm-mpath.c
@@ -31,7 +31,7 @@ struct pgpath {
struct priority_group *pg;  /* Owning PG */
unsigned fail_count;/* Cumulative failure count */
 
-   struct path path;
+   struct dm_path path;
 };
 
 #define path_to_pgpath(__pgp) container_of((__pgp), struct pgpath, path)
@@ -229,7 +229,7 @@ static void __switch_pg(struct multipath
 
 static int __choose_path_in_pg(struct multipath *m, struct priority_group *pg)
 {
-   struct path *path;
+   struct dm_path *path;
 
path = pg->ps.type->select_path(&pg->ps, &m->repeat_count);
if (!path)
@@ -955,7 +955,7 @@ static int bypass_pg_num(struct multipat
 /*
  * pg_init must call this when it has completed its initialisation
  */
-void dm_pg_init_complete(struct path *path, unsigned err_flags)
+void dm_pg_init_complete(struct dm_path *path, unsigned err_flags)
 {
struct pgpath *pgpath = path_to_pgpath(path);
struct priority_group *pg = pgpath->pg;
diff --git a/drivers/md/dm-mpath.h b/drivers/md/dm-mpath.h
index 8a4bf2b..b9cdcbb 100644
--- a/drivers/md/dm-mpath.h
+++ b/drivers/md/dm-mpath.h
@@ -11,7 +11,7 @@
 
 struct dm_dev;
 
-struct path {
+struct dm_path {
struct dm_dev *dev; /* Read-only */
unsigned is_active; /* Read-only */
 
@@ -20,6 +20,6 @@ struct path {
 };
 
 /* Callback for hwh_pg_init_fn to use when complete */
-void dm_pg_init_complete(struct path *path, unsigned err_flags);
+void dm_pg_init_complete(struct dm_path *path, unsigned err_flags);
 
 #endif
diff --git a/drivers/md/dm-path-selector.h b/drivers/md/dm-path-selector.h
index 732d06a..27357b8 100644
--- a/drivers/md/dm-path-selector.h
+++ b/drivers/md/dm-path-selector.h
@@ -44,7 +44,7 @@ struct path_selector_type {
 * Add an opaque path object, along with some selector specific
 * path args (eg, path priority).
 */
-   int (*add_path) (struct path_selector *ps, struct path *path,
+   int (*add_path) (struct path_selector *ps, struct dm_path *p

[PATCH 17/35] Unionfs: Dentry operations

2006-12-04 Thread Josef 'Jeff' Sipek
From: Josef "Jeff" Sipek <[EMAIL PROTECTED]>

This patch contains the dentry operations for Unionfs.

Signed-off-by: Josef "Jeff" Sipek <[EMAIL PROTECTED]>
Signed-off-by: David Quigley <[EMAIL PROTECTED]>
Signed-off-by: Erez Zadok <[EMAIL PROTECTED]>
---
 fs/unionfs/dentry.c |  236 +++
 1 files changed, 236 insertions(+), 0 deletions(-)

diff --git a/fs/unionfs/dentry.c b/fs/unionfs/dentry.c
new file mode 100644
index 000..4e2ffa1
--- /dev/null
+++ b/fs/unionfs/dentry.c
@@ -0,0 +1,236 @@
+/*
+ * Copyright (c) 2003-2006 Erez Zadok
+ * Copyright (c) 2003-2006 Charles P. Wright
+ * Copyright (c) 2005-2006 Josef 'Jeff' Sipek
+ * Copyright (c) 2005-2006 Junjiro Okajima
+ * Copyright (c) 2005  Arun M. Krishnakumar
+ * Copyright (c) 2004-2006 David P. Quigley
+ * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
+ * Copyright (c) 2003  Puja Gupta
+ * Copyright (c) 2003  Harikesavan Krishnan
+ * Copyright (c) 2003-2006 Stony Brook University
+ * Copyright (c) 2003-2006 The Research Foundation of State University of New York
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include "union.h"
+
+/* declarations added for "sparse" */
+extern int unionfs_d_revalidate_wrap(struct dentry *dentry,
+struct nameidata *nd);
+extern void unionfs_d_release(struct dentry *dentry);
+extern void unionfs_d_iput(struct dentry *dentry, struct inode *inode);
+
+/*
+ * returns 1 if valid, 0 otherwise.
+ */
+int unionfs_d_revalidate(struct dentry *dentry, struct nameidata *nd)
+{
+   int valid = 1;  /* default is valid (1); invalid is 0. */
+   struct dentry *hidden_dentry;
+   int bindex, bstart, bend;
+   int sbgen, dgen;
+   int positive = 0;
+   int locked = 0;
+   int restart = 0;
+   int interpose_flag;
+
+   struct nameidata lowernd; /* TODO: be gentler to the stack */
+
+   if (nd)
+   memcpy(&lowernd, nd, sizeof(struct nameidata));
+   else
+   memset(&lowernd, 0, sizeof(struct nameidata));
+
+restart:
+   verify_locked(dentry);
+
+   /* if the dentry is unhashed, do NOT revalidate */
+   if (d_deleted(dentry)) {
+   printk(KERN_DEBUG "unhashed dentry being revalidated: %.*s\n",
+  dentry->d_name.len, dentry->d_name.name);
+   goto out;
+   }
+
+   BUG_ON(dbstart(dentry) == -1);
+   if (dentry->d_inode)
+   positive = 1;
+   dgen = atomic_read(&UNIONFS_D(dentry)->generation);
+   sbgen = atomic_read(&UNIONFS_SB(dentry->d_sb)->generation);
+   /* If we are working on an unconnected dentry, then there is no
+* revalidation to be done, because this file does not exist within the
+* namespace, and Unionfs operates on the namespace, not data.
+*/
+   if (sbgen != dgen) {
+   struct dentry *result;
+   int pdgen;
+
+   unionfs_read_lock(dentry->d_sb);
+   locked = 1;
+
+   /* The root entry should always be valid */
+   BUG_ON(IS_ROOT(dentry));
+
+   /* We can't work correctly if our parent isn't valid. */
+   pdgen = atomic_read(&UNIONFS_D(dentry->d_parent)->generation);
+   if (!restart && (pdgen != sbgen)) {
+   unionfs_read_unlock(dentry->d_sb);
+   locked = 0;
+   /* We must be locked before our parent. */
+   if (!
+   (dentry->d_parent->d_op->
+d_revalidate(dentry->d_parent, nd))) {
+   valid = 0;
+   goto out;
+   }
+   restart = 1;
+   goto restart;
+   }
+   BUG_ON(pdgen != sbgen);
+
+   /* Free the pointers for our inodes and this dentry. */
+   bstart = dbstart(dentry);
+   bend = dbend(dentry);
+   if (bstart >= 0) {
+   struct dentry *hidden_dentry;
+   for (bindex = bstart; bindex <= bend; bindex++) {
+   hidden_dentry =
+   unionfs_lower_dentry_idx(dentry, bindex);
+   dput(hidden_dentry);
+   }
+   }
+   set_dbstart(dentry, -1);
+   set_dbend(dentry, -1);
+
+   interpose_flag = INTERPOSE_REVAL_NEG;
+   if (positive) {
+   interpose_flag = INTERPOSE_REVAL;
+   mutex_lock(&dentry->d_inode->i_mutex);
+   bstart = ibstart(dentry->d_inode);
+   bend = ibend(dentry->d_inode);
+