[PATCH 0/7] fix setuid/setgid clearing in networked filesystems (take 5)

2007-09-04 Thread Jeff Layton
When an unprivileged process attempts to modify a file that has the
setuid or setgid bits set, the VFS will attempt to clear these bits. The
VFS will set the ATTR_KILL_SUID or ATTR_KILL_SGID bits in the ia_valid
mask, and then call notify_change to clear these bits and set the mode
accordingly.

With a networked filesystem (NFS and CIFS in particular but likely
others), the client machine or process may not have credentials that
allow for setting the mode. In some situations, this can lead to file
corruption, an operation failing outright because the setattr fails, or
to races that lead to a mode change being reverted.

In this situation, we'd like to just leave the handling of this to the
server and ignore these bits. The problem is that by the time the
setattr op is called, the VFS has already reinterpreted the ATTR_KILL_*
bits into a mode change and the setattr operation has no way to
know its intent.

The following patchset fixes this by making notify_change no longer
clear the ATTR_KILL_SUID and ATTR_KILL_SGID bits in the ia_valid before
handing it off to the setattr inode op. setattr can then check for the
presence of these bits, and if they're set it can assume that the mode
change was only for the purposes of clearing these bits.

This means that we now have an implicit assumption that notify_change is
never called with ATTR_MODE and either ATTR_KILL_S*ID bit set. Nothing
currently enforces that, so the first patch also adds a BUG_ON() if that
occurs.

The next two patches fix NFS and CIFS to take advantage of this new
scheme to ignore doing the mode change when these flags are set. The
last four patches fix up callers of notify_change to make sure that
they don't trip the new BUG() call.

This patchset should apply cleanly to 2.6.23-rc4-mm1. This is
basically the same patchset as take 4 with a few extra patches to
fix up the callers of notify_change, and some minor parenthetical
cleanups.

Signed-off-by: Jeff Layton [EMAIL PROTECTED]

-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/7] VFS: make notify_change pass ATTR_KILL_S*ID to setattr operations

2007-09-04 Thread Jeff Layton
Make notify_change not clear the ATTR_KILL_S*ID bits in the ia_vaid that
gets passed to the setattr inode operation. This allows the filesystems
to reinterpret whether this mode change is simply intended to clear the
setuid/setgid bits.

This means that notify_change should never be called with both ATTR_MODE
and either of the ATTR_KILL_S*ID bits set, since the filesystem would
have no way to know what part of the mode change was intentional. If
it is called this way, consider it a BUG().

Signed-off-by: Jeff Layton [EMAIL PROTECTED]
---
 fs/attr.c |   23 +--
 1 files changed, 13 insertions(+), 10 deletions(-)

diff --git a/fs/attr.c b/fs/attr.c
index ae58bd3..6b3b07e 100644
--- a/fs/attr.c
+++ b/fs/attr.c
@@ -103,12 +103,11 @@ EXPORT_SYMBOL(inode_setattr);
 int notify_change(struct dentry * dentry, struct iattr * attr)
 {
struct inode *inode = dentry-d_inode;
-   mode_t mode;
+   mode_t mode = inode-i_mode;
int error;
struct timespec now;
unsigned int ia_valid = attr-ia_valid;
 
-   mode = inode-i_mode;
now = current_fs_time(inode-i_sb);
 
attr-ia_ctime = now;
@@ -125,18 +124,22 @@ int notify_change(struct dentry * dentry, struct iattr * 
attr)
if (error)
return error;
}
+
+   /*
+* It's not valid to pass an iattr with both ATTR_MODE and
+* ATTR_KILL_S*ID set.
+*/
+   if ((ia_valid  (ATTR_KILL_SUID|ATTR_KILL_SGID)) 
+   (ia_valid  ATTR_MODE))
+   BUG();
+
if (ia_valid  ATTR_KILL_SUID) {
-   attr-ia_valid = ~ATTR_KILL_SUID;
if (mode  S_ISUID) {
-   if (!(ia_valid  ATTR_MODE)) {
-   ia_valid = attr-ia_valid |= ATTR_MODE;
-   attr-ia_mode = inode-i_mode;
-   }
-   attr-ia_mode = ~S_ISUID;
+   ia_valid = attr-ia_valid |= ATTR_MODE;
+   attr-ia_mode = (inode-i_mode  ~S_ISUID);
}
}
if (ia_valid  ATTR_KILL_SGID) {
-   attr-ia_valid = ~ ATTR_KILL_SGID;
if ((mode  (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP)) {
if (!(ia_valid  ATTR_MODE)) {
ia_valid = attr-ia_valid |= ATTR_MODE;
@@ -145,7 +148,7 @@ int notify_change(struct dentry * dentry, struct iattr * 
attr)
attr-ia_mode = ~S_ISGID;
}
}
-   if (!attr-ia_valid)
+   if (!(attr-ia_valid  ~(ATTR_KILL_SUID | ATTR_KILL_SGID)))
return 0;
 
if (ia_valid  ATTR_SIZE)
-- 
1.5.2.1

-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/7] NFS: if ATTR_KILL_S*ID bits are set, then skip mode change

2007-09-04 Thread Jeff Layton
If the ATTR_KILL_S*ID bits are set then any mode change is only for
clearing the setuid/setgid bits. For NFS skip the mode change and
let the server handle it.

Signed-off-by: Jeff Layton [EMAIL PROTECTED]
---
 fs/nfs/inode.c |4 
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index 45633f9..441bd8b 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -327,6 +327,10 @@ nfs_setattr(struct dentry *dentry, struct iattr *attr)
 
nfs_inc_stats(inode, NFSIOS_VFSSETATTR);
 
+   /* skip mode change if it's just for clearing setuid/setgid */
+   if (attr-ia_valid  (ATTR_KILL_SUID | ATTR_KILL_SGID))
+   attr-ia_valid = ~ATTR_MODE;
+
if (attr-ia_valid  ATTR_SIZE) {
if (!S_ISREG(inode-i_mode) || attr-ia_size == 
i_size_read(inode))
attr-ia_valid = ~ATTR_SIZE;
-- 
1.5.2.1

-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/7] CIFS: ignore mode change if it's just for clearing setuid/setgid bits

2007-09-04 Thread Jeff Layton
If the ATTR_KILL_S*ID bits are set then any mode change is only for
clearing the setuid/setgid bits. For NFS, skip the mode change and
let the server handle it.

Signed-off-by: Jeff Layton [EMAIL PROTECTED]
---
 fs/cifs/inode.c |5 +
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/fs/cifs/inode.c b/fs/cifs/inode.c
index 66436f5..8fa3d63 100644
--- a/fs/cifs/inode.c
+++ b/fs/cifs/inode.c
@@ -1547,6 +1547,11 @@ int cifs_setattr(struct dentry *direntry, struct iattr 
*attrs)
}
 
time_buf.Attributes = 0;
+
+   /* skip mode change if it's just for clearing setuid/setgid */
+   if (attrs-ia_valid  (ATTR_KILL_SUID|ATTR_KILL_SGID))
+   attrs-ia_valid = ~ATTR_MODE;
+
if (attrs-ia_valid  ATTR_MODE) {
cFYI(1, (Mode changed to 0x%x, attrs-ia_mode));
mode = attrs-ia_mode;
-- 
1.5.2.1

-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 4/7] ecryptfs: allow lower fs to interpret ATTR_KILL_S*ID

2007-09-04 Thread Jeff Layton
Make sure ecryptfs doesn't trip the BUG() in notify_change. This also
allows the lower filesystem to interpret these bits in their own way.

Signed-off-by: Jeff Layton [EMAIL PROTECTED]
---
 fs/ecryptfs/inode.c |8 
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/fs/ecryptfs/inode.c b/fs/ecryptfs/inode.c
index 131954b..dac4199 100644
--- a/fs/ecryptfs/inode.c
+++ b/fs/ecryptfs/inode.c
@@ -959,6 +959,14 @@ static int ecryptfs_setattr(struct dentry *dentry, struct 
iattr *ia)
if (rc  0)
goto out;
}
+
+   /*
+* mode change is for clearing setuid/setgid bits. Allow lower fs
+* to interpret this in its own way.
+*/
+   if (ia-ia_valid  (ATTR_KILL_SUID | ATTR_KILL_SGID))
+   ia-ia_valid = ~ATTR_MODE;
+
rc = notify_change(lower_dentry, ia);
 out:
fsstack_copy_attr_all(inode, lower_inode, NULL);
-- 
1.5.2.1

-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 5/7] knfsd: only set ATTR_KILL_S*ID if ATTR_MODE isn't being explicitly set

2007-09-04 Thread Jeff Layton
It's theoretically possible for a single SETATTR call to come in that
sets the mode and the uid/gid. In that case, assume the mode is
correct and don't set the ATTR_KILL_S*ID bits. Doing so would trip the
BUG() in notify_change.

Signed-off-by: Jeff Layton [EMAIL PROTECTED]
---
 fs/nfsd/vfs.c |   17 +++--
 1 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 70f2c86..3b5b8cf 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -369,14 +369,19 @@ nfsd_setattr(struct svc_rqst *rqstp, struct svc_fh *fhp, 
struct iattr *iap,
if (iap-ia_valid  ATTR_MODE) {
iap-ia_mode = S_IALLUGO;
imode = iap-ia_mode |= (imode  ~S_IALLUGO);
+   if ((iap-ia_valid  ATTR_UID)  iap-ia_uid != inode-i_uid)
+   iap-ia_valid |= ATTR_KILL_PRIV;
+   } else {
+   /*
+* Revoke setuid/setgid bit on chown/chgrp, unless the mode
+* is being explicitly set
+*/
+   if ((iap-ia_valid  ATTR_UID)  iap-ia_uid != inode-i_uid)
+   iap-ia_valid |= ATTR_KILL_SUID | ATTR_KILL_PRIV;
+   if ((iap-ia_valid  ATTR_GID)  iap-ia_gid != inode-i_gid)
+   iap-ia_valid |= ATTR_KILL_SGID;
}
 
-   /* Revoke setuid/setgid bit on chown/chgrp */
-   if ((iap-ia_valid  ATTR_UID)  iap-ia_uid != inode-i_uid)
-   iap-ia_valid |= ATTR_KILL_SUID | ATTR_KILL_PRIV;
-   if ((iap-ia_valid  ATTR_GID)  iap-ia_gid != inode-i_gid)
-   iap-ia_valid |= ATTR_KILL_SGID;
-
/* Change the attributes. */
 
iap-ia_valid |= ATTR_CTIME;
-- 
1.5.2.1

-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 6/7] reiserfs: turn off ATTR_KILL_S*ID at beginning of reiserfs_setattr

2007-09-04 Thread Jeff Layton
reiserfs_setattr can call notify_change recursively using the same
iattr struct. This could cause it to trip the BUG() in notify_change.
Fix reiserfs to clear those bits near the beginning of the function.

Signed-off-by: Jeff Layton [EMAIL PROTECTED]
---
 fs/reiserfs/inode.c |6 +-
 1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/fs/reiserfs/inode.c b/fs/reiserfs/inode.c
index 9ea1200..0804289 100644
--- a/fs/reiserfs/inode.c
+++ b/fs/reiserfs/inode.c
@@ -3061,7 +3061,11 @@ int reiserfs_setattr(struct dentry *dentry, struct iattr 
*attr)
 {
struct inode *inode = dentry-d_inode;
int error;
-   unsigned int ia_valid = attr-ia_valid;
+   unsigned int ia_valid;
+
+   /* must be turned off for recursive notify_change calls */
+   ia_valid = attr-ia_valid = ~(ATTR_KILL_SUID|ATTR_KILL_SGID);
+
reiserfs_write_lock(inode-i_sb);
if (attr-ia_valid  ATTR_SIZE) {
/* version 2 items will be caught by the s_maxbytes check
-- 
1.5.2.1

-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 7/7] unionfs: fix unionfs_create and unionfs_setattr to handle ATTR_KILL_S*ID

2007-09-04 Thread Jeff Layton
Don't allow either function to trip the BUG() in notify_change. For
unionfs_setattr, clear ATTR_MODE if the either ATTR_KILL_S*ID is set.

unionfs_create is setting the mode explicitly already.  Don't set
ATTR_KILL_S*ID. Just fix up the mode to have the same effect. Also, move
locking the i_mutex to lower in the function. It's not needed until it
checks the i_size.

(Jeff Sipek indicated that he was planning to change some of this code,
so this patch may need changes if it goes in after his patchset)

Signed-off-by: Jeff Layton [EMAIL PROTECTED]
---
 fs/unionfs/inode.c |   10 ++
 1 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/fs/unionfs/inode.c b/fs/unionfs/inode.c
index 59bb418..f2e7f25 100644
--- a/fs/unionfs/inode.c
+++ b/fs/unionfs/inode.c
@@ -78,15 +78,14 @@ static int unionfs_create(struct inode *parent, struct 
dentry *dentry,
struct dentry *lower_dir_dentry;
struct iattr newattrs;
 
-   mutex_lock(wh_dentry-d_inode-i_mutex);
newattrs.ia_valid = ATTR_CTIME | ATTR_MODE | ATTR_ATIME
-   | ATTR_MTIME | ATTR_UID | ATTR_GID | ATTR_FORCE
-   | ATTR_KILL_SUID | ATTR_KILL_SGID;
+   | ATTR_MTIME | ATTR_UID | ATTR_GID | ATTR_FORCE;
 
-   newattrs.ia_mode = mode  ~current-fs-umask;
+   newattrs.ia_mode = mode  ~(current-fs-umask|S_ISUID|S_ISGID);
newattrs.ia_uid = current-fsuid;
newattrs.ia_gid = current-fsgid;
 
+   mutex_lock(wh_dentry-d_inode-i_mutex);
if (wh_dentry-d_inode-i_size != 0) {
newattrs.ia_valid |= ATTR_SIZE;
newattrs.ia_size = 0;
@@ -1032,6 +1031,9 @@ static int unionfs_setattr(struct dentry *dentry, struct 
iattr *ia)
bend = dbend(dentry);
inode = dentry-d_inode;
 
+   if (ia-ia_valid  (ATTR_KILL_SUID | ATTR_KILL_SGID))
+   ia-ia_valid = ~ATTR_MODE;
+
for (bindex = bstart; (bindex = bend) || (bindex == bstart);
 bindex++) {
lower_dentry = unionfs_lower_dentry_idx(dentry, bindex);
-- 
1.5.2.1

-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] System calls for online defrag

2007-09-04 Thread Andreas Dilger
On Sep 03, 2007  20:03 +0200, Jan Kara wrote:
   I've finally got to writing up some proposal how could look system calls
 allowing for online filesystem defragmentation and generally moving file
 blocks around for improving performance. Comments are welcome.
 
 int sys_movedata(int datafd, int spacefd, loff_t from, size_t len)
The call takes blocks used to carry data starting at offset @from of length
 @len in @spacefd and places them instead of corresponding blocks in @datafd.

Calling these @spacefd and @datafd is a bit confusing.  How about @srcfd
and @tgtfd instead?  For defragmentation, are you planning to have @datafd
be the real inode and @spacefd be the temporary inode with defragged data,
or the reverse?  It isn't really clear.

 Data is copied from @datafd to newly spliced data blocks. If @spacefd contains
 a hole in the specified interval, a hole is created also in @datafd in the
 corresponding place. A data block from @spacefd and also replace a hole in
 @datafd - zeros are copied to such data block. @from and @len should be
 multiples of filesystem block size (otherwise EINVAL is returned). Data blocks
 from @datafd in the interval are released, a hole is created in @spacefd.

This is mostly clear except the last sentence.  I would think that the data
blocks in @datafd are kept, getting a copy of the data, while those in
@spacefd are released?

   Another possibility would be to just replace data blocks without any copying
 of data (that would have to be done by the caller to before calling
 sys_movedata()). The problem here is how to avoid data loss if someone writes
 to the file after userspace has copied the data and before sys_movedata() is
 called.

Isn't that true in any case?

 ssize_t sys_allocate(int fd, int mode, loff_t goal, ssize_t len)
   Allocate new space to file @fd at offset defined by file position.  Both 
 file
 offset and @len should be a multiple of filesystem block size. The whole
 interval must not contain any allocated blocks. If the interval extends past
 EOF, the file size is changed accordingly.  @mode defines a way the filesystem
 will search for blocks. @mode is a bitwise OR of the following flags:
   ALLOC_FIXED_START - allocation must start at @goal; if not specified, @goal
 is just a hint where to start an allocation
   ALLOC_FIXED_LEN - allocate exactly space for @len; if not specified, upto
 @len bytes may be allocated.
   ALLOC_CONTINGUOUS - allocation must be one continguous run of blocks

How is this much different than sys_fallocate()?

 int sys_get_free_blocks(const char *fs, loff_t start, loff_t end, int count,
   struct alloc_extent *space)

One alternate possibility is to call the proposed FIEMAP on the block device,
to return lists of free/used extents?  We have a version of that patch for
ext4 and integration into filefrag, so it would be nice to avoid making up
yet another API/tool if that one is sufficient.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html