Re: [patch 01/26] mount options: add documentation

2008-01-25 Thread David Chinner

 In message [EMAIL PROTECTED], Miklos Szeredi writes:
  From: Miklos Szeredi [EMAIL PROTECTED]
  
  This series addresses the problem of showing mount options in
  /proc/mounts.
[...]
  The following filesystems still need fixing: CIFS, NFS, XFS, Unionfs,
  Reiser4.  For CIFS, NFS and XFS I wasn't able to understand how some
  of the options are used.  The last two are not yet in mainline, so I
  leave fixing those to their respective maintainers out of pure
  laziness.

XFS has already been updated. The fix is in the XFs git tree that
Andrew picks up for -mm releases and will be merged in the 2.6.25-rc1
window. Commit is here:

http://oss.sgi.com/cgi-bin/gitweb.cgi?p=xfs/xfs-2.6.git;a=commit;h=8c33fb6ca99aa17373bd3d5a507ac0eaefb7abb4

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 10/26] mount options: fix devpts

2008-01-25 Thread Miklos Szeredi
  Also add minor fix: when parsing the mode option, mask with
  S_IALLUGO instead of ~S_IFMT, which could leave unsed bits in the
  mask.
 
 umode_t is 16 bits, so it doesn't.  The change is still good, of course.

We still use 16 bit types?  Strange ;)

 
  +   if (config.mode != DEVPTS_DEFAULT_MODE)
  +   seq_printf(seq, ,mode=%03o, config.mode);
 
 I would rather this be unconditional, than that it be conditional on 
 something other than the user having specified it in the first place.

Yeah, it's a matter of taste.  I'll update the patch.

Actually, a lot of filesystems share the options 'uid=X', 'gid=X',
'mode=X' (or 'umask=X').  This could be handled by the VFS, saving
some code, and making things more consistent.  One day maybe...

Thanks,
Miklos
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 09/26] mount options: fix capifs

2008-01-25 Thread Karsten Keil
On Thu, Jan 24, 2008 at 08:33:50PM +0100, Miklos Szeredi wrote:
 From: Miklos Szeredi [EMAIL PROTECTED]
 
 Add a .show_options super operation to capifs.
 
 Use generic_show_options() and save the complete option string in
 capifs_remount().
 
 Signed-off-by: Miklos Szeredi [EMAIL PROTECTED]
Acked-by: Karsten Keil [EMAIL PROTECTED]
 ---
 
 Index: linux/drivers/isdn/capi/capifs.c
 ===
 --- linux.orig/drivers/isdn/capi/capifs.c 2007-10-09 22:31:38.0 
 +0200
 +++ linux/drivers/isdn/capi/capifs.c  2008-01-24 11:37:42.0 +0100
 @@ -52,6 +52,7 @@ static int capifs_remount(struct super_b
   gid_t gid = 0;
   umode_t mode = 0600;
   char *this_char;
 + char *new_opt = kstrdup(data, GFP_KERNEL);
  
   this_char = NULL;
   while ((this_char = strsep(data, ,)) != NULL) {
 @@ -72,11 +73,16 @@ static int capifs_remount(struct super_b
   return -EINVAL;
   }
   }
 +
 + kfree(s-s_options);
 + s-s_options = new_opt;
 +
   config.setuid  = setuid;
   config.setgid  = setgid;
   config.uid = uid;
   config.gid = gid;
   config.mode= mode;
 +
   return 0;
  }
  
 @@ -84,6 +90,7 @@ static struct super_operations capifs_so
  {
   .statfs = simple_statfs,
   .remount_fs = capifs_remount,
 + .show_options   = generic_show_options,
  };
  
  
 
 --

-- 
Karsten Keil
SuSE Labs
ISDN and VOIP development
SUSE LINUX Products GmbH, Maxfeldstr.5 90409 Nuernberg, GF: Markus Rex, HRB 
16746 (AG Nuernberg)
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 25/26] mount options: fix udf

2008-01-25 Thread Cyrill Gorcunov
On Jan 25, 2008 12:29 PM, Miklos Szeredi [EMAIL PROTECTED] wrote:
  | +   /* is this correct? */
  | +   if (sbi-s_anchor[2] != 0)
  | +   seq_printf(seq, ,anchor=%u, sbi-s_anchor[2]);
 
  you know, I would prefer to use form UDF_SB_ANCHOR(sb)[2]
  in sake of style unification but we should wait for Jan's
  decision (i'm not the expert in this area ;)

 I think UDF_SB_ANCHOR macro was removed by some patch in -mm.

 I'm more interested if the second element of the s_anchor array really
 does always have the value of the 'anchor=N' mount option.  I haven't
 been able to verify that fully.  Do you have some insight into that?

 Thanks,
 Miklos


Miklos,

I'll check this today evening (a bit busy now).

- Cyrill -
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC] Add vfsmount to vfs helper functions.

2008-01-25 Thread Kentaro Takeda
In the LSM ml, we are discussing about
how to know requested pathnames within LSM modules.

Currently, VFS helper functions don't pass struct vfsmount parameter.
Therefore, we cannot calculate requested pathnames within LSM modules
because LSM hooks can't know struct vfsmount parameter that corresponds with
struct dentry passed to VFS helper functions.

AppArmor is proposing a patch that appends struct vfsmount parameters to
VFS helper functions so that LSM modules (SELinux, AppArmor, TOMOYO) can
calculate requested pathnames.

The changes in include/linux/fs.h are shown below.
What do you think about these changes?

- Start of changes -
--- fs.h.orig
+++ fs.h
@@ -1070,13 +1070,13 @@
  */
 extern int vfs_permission(struct nameidata *, int);
 extern int vfs_create(struct inode *, struct dentry *, int, struct nameidata 
*);
-extern int vfs_mkdir(struct inode *, struct dentry *, int);
-extern int vfs_mknod(struct inode *, struct dentry *, int, dev_t);
-extern int vfs_symlink(struct inode *, struct dentry *, const char *, int);
-extern int vfs_link(struct dentry *, struct inode *, struct dentry *);
-extern int vfs_rmdir(struct inode *, struct dentry *);
-extern int vfs_unlink(struct inode *, struct dentry *);
-extern int vfs_rename(struct inode *, struct dentry *, struct inode *, struct 
dentry *);
+extern int vfs_mkdir(struct inode *, struct dentry *, struct vfsmount *, int);
+extern int vfs_mknod(struct inode *, struct dentry *, struct vfsmount *, int, 
dev_t);
+extern int vfs_symlink(struct inode *, struct dentry *, struct vfsmount *, 
const char *, int);
+extern int vfs_link(struct dentry *, struct vfsmount *, struct inode *, struct 
dentry *, struct vfsmount *);
+extern int vfs_rmdir(struct inode *, struct dentry *, struct vfsmount *);
+extern int vfs_unlink(struct inode *, struct dentry *, struct vfsmount *);
+extern int vfs_rename(struct inode *, struct dentry *, struct vfsmount *, 
struct inode *, struct dentry *, struct vfsmount *);
 
 /*
  * VFS dentry helper functions.
@@ -1538,8 +1538,8 @@
 
 /* fs/open.c */
 
-extern int do_truncate(struct dentry *, loff_t start, unsigned int time_attrs,
-  struct file *filp);
+extern int do_truncate(struct dentry *, struct vfsmount *, loff_t start,
+  unsigned int time_attrs, struct file *filp);
 extern long do_sys_open(int dfd, const char __user *filename, int flags,
int mode);
 extern struct file * dentry_open(struct dentry *, struct vfsmount *, int);
@@ -1695,7 +1695,7 @@
 #ifdef CONFIG_BLOCK
 extern sector_t bmap(struct inode *, sector_t);
 #endif
-extern int notify_change(struct dentry *, struct iattr *);
+extern int notify_change(struct dentry *, struct vfsmount *, struct iattr *);
 extern int permission(struct inode *, int, struct nameidata *);
 extern int generic_permission(struct inode *, int,
int (*check_acl)(struct inode *, int));
@@ -1757,9 +1757,9 @@
 extern void clear_inode(struct inode *);
 extern void destroy_inode(struct inode *);
 extern struct inode *new_inode(struct super_block *);
-extern int __remove_suid(struct dentry *, int);
+extern int __remove_suid(struct path *, int);
 extern int should_remove_suid(struct dentry *);
-extern int remove_suid(struct dentry *);
+extern int remove_suid(struct path *);
 
 extern void __insert_inode_hash(struct inode *, unsigned long hashval);
 extern void remove_inode_hash(struct inode *);
- End of changes -

-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 01/26] mount options: add documentation

2008-01-25 Thread Miklos Szeredi
 Where did you check for the existence of a -show_options method for
 unionfs?  Unionfs does implement -show_options and supports all of the
 mount/remount options.  See:
 
 http://git.kernel.org/?p=linux/kernel/git/ezk/unionfs.git;a=blob;f=fs/unionfs/super.c;h=986c980261a5b171147d66ac05bf08423e2fd6b6;hb=HEAD#l963
 
 The unionfs -remount code supports branch-management options which can
 add/del/change a branch, but we don't show those directly in -show_options;
 it makes more sense to show the final (and thus most current) branch
 configuration.
 
 Could you update your records please?

Sure.  Sorry about that, I did actually look at unionfs, and it was
just an administration error and bad memory (in my head).

 BTW, I should be able to use your save_mount_options().

It is probably better not to use save_mount_options().  Especially,
since unionfs implemets a remount, that changes the tree only
partially AFAICS.

Miklos
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC] ext3 freeze feature

2008-01-25 Thread Takashi Sato
Hi,

Currently, ext3 doesn't have the freeze feature which suspends write
requests.  So, we cannot get a backup which keeps the filesystem's
consistency with the storage device's features (snapshot, replication)
while it is mounted.
In many case, a commercial filesystems (e.g. VxFS) has the freeze
feature and it would be used to get the consistent backup.

So I am planning on implementing the ioctl of the freeze feature for ext3.
I think we can get the consistent backup with the following steps.
1. Freeze the filesystem with ioctl.
2. Separate the replication volume or get the snapshot
   with the storage device's feature.
3. Unfreeze the filesystem with ioctl.
4. Get the backup from the separated replication volume
   or the snapshot.

The usage of the ioctl is as below.
 int ioctl(int fd, int cmd, long *timeval)
 fd: The file descriptor of the mountpoint.
 cmd: EXT3_IOC_FREEZE for the freeze or EXT3_IOC_THAW for the unfreeze.
 timeval: The timeout value expressed in seconds.
  If it's 0, the timeout isn't set.
 Return value: 0 if the operation succeeds. Otherwise, -1.

I have made sure that write requests were suspended with the experimental
patch for this feature and attached it in this mail.

The points of the implementation are followings.
- Add calls of the freeze function (freeze_bdev) and
  the unfreeze function (thaw_bdev) in ext3_ioctl().

- ext3_freeze_timeout() which calls the unfreeze function (thaw_bdev)
  is registered to the delayed work queue to unfreeze the filesystem
  automatically after the lapse of the specified time.

Any comments are very welcome.

Signed-off-by: Takashi Sato [EMAIL PROTECTED]
---
diff -uprN -X linux-2.6.24-rc8/Documentation/dontdiff 
linux-2.6.24-rc8/fs/ext3/ioctl.c linux-2.6.24-rc8-freeze/fs/ext3/ioctl.c
--- linux-2.6.24-rc8/fs/ext3/ioctl.c2008-01-16 13:22:48.0 +0900
+++ linux-2.6.24-rc8-freeze/fs/ext3/ioctl.c 2008-01-22 18:20:33.0 
+0900
@@ -254,6 +254,42 @@ flags_err:
return err;
}
 
+   case EXT3_IOC_FREEZE: {
+   long timeout_sec;
+   long timeout_msec;
+   if (!capable(CAP_SYS_ADMIN))
+   return -EPERM;
+   if (inode-i_sb-s_frozen != SB_UNFROZEN)
+   return -EINVAL;
+   /* arg(sec) to tick value */
+   get_user(timeout_sec, (long __user *) arg);
+   timeout_msec = timeout_sec * 1000;
+   if (timeout_msec  0)
+   return -EINVAL;
+
+   /* Freeze */
+   freeze_bdev(inode-i_sb-s_bdev);
+
+   /* set up unfreeze timer */
+   if (timeout_msec  0)
+   ext3_add_freeze_timeout(EXT3_SB(inode-i_sb),
+   timeout_msec);
+   return 0;
+   }
+   case EXT3_IOC_THAW: {
+   if (!capable(CAP_SYS_ADMIN))
+   return -EPERM;
+   if (inode-i_sb-s_frozen == SB_UNFROZEN)
+   return -EINVAL;
+
+   /* delete unfreeze timer */
+   ext3_del_freeze_timeout(EXT3_SB(inode-i_sb));
+
+   /* Unfreeze */
+   thaw_bdev(inode-i_sb-s_bdev, inode-i_sb);
+
+   return 0;
+   }
 
default:
return -ENOTTY;
diff -uprN -X linux-2.6.24-rc8/Documentation/dontdiff 
linux-2.6.24-rc8/fs/ext3/super.c linux-2.6.24-rc8-freeze/fs/ext3/super.c
--- linux-2.6.24-rc8/fs/ext3/super.c2008-01-16 13:22:48.0 +0900
+++ linux-2.6.24-rc8-freeze/fs/ext3/super.c 2008-01-22 18:20:33.0 
+0900
@@ -63,6 +63,7 @@ static int ext3_statfs (struct dentry * 
 static void ext3_unlockfs(struct super_block *sb);
 static void ext3_write_super (struct super_block * sb);
 static void ext3_write_super_lockfs(struct super_block *sb);
+static void ext3_freeze_timeout(struct work_struct *work);
 
 /*
  * Wrappers for journal_start/end.
@@ -323,6 +324,44 @@ void ext3_update_dynamic_rev(struct supe
 }
 
 /*
+ * ext3_add_freeze_timeout - Add timeout for ext3 freeze.
+ *
+ * @sbi: ext3 super block
+ * @timeout_msec   : timeout period
+ *
+ * Add the delayed work for ext3 freeze timeout
+ * to the delayed work queue.
+ */
+void ext3_add_freeze_timeout(struct ext3_sb_info *sbi,
+   long timeout_msec)
+{
+   s64 timeout_jiffies = msecs_to_jiffies(timeout_msec);
+
+   /*
+* setup freeze timeout function
+*/
+   INIT_DELAYED_WORK(sbi-s_freeze_timeout, ext3_freeze_timeout);
+
+   /* set delayed work queue */
+   cancel_delayed_work(sbi-s_freeze_timeout);
+   schedule_delayed_work(sbi-s_freeze_timeout, timeout_jiffies);
+}
+
+/*
+ * ext3_del_freeze_timeout - Delete timeout for ext3 freeze.
+ *
+ * @sbi: ext3 super block
+ *
+ * Delete the delayed work for ext3 freeze timeout
+ * from the delayed work queue.
+ */
+void ext3_del_freeze_timeout(struct 

Re: [patch 25/26] mount options: fix udf

2008-01-25 Thread Miklos Szeredi
 | +   /* is this correct? */
 | +   if (sbi-s_anchor[2] != 0)
 | +   seq_printf(seq, ,anchor=%u, sbi-s_anchor[2]);
 
 you know, I would prefer to use form UDF_SB_ANCHOR(sb)[2]
 in sake of style unification but we should wait for Jan's
 decision (i'm not the expert in this area ;)

I think UDF_SB_ANCHOR macro was removed by some patch in -mm.

I'm more interested if the second element of the s_anchor array really
does always have the value of the 'anchor=N' mount option.  I haven't
been able to verify that fully.  Do you have some insight into that?

Thanks,
Miklos
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 21/26] mount options: partially fix nfs

2008-01-25 Thread Miklos Szeredi
 Miklos Szeredi wrote:
  From: Miklos Szeredi [EMAIL PROTECTED]
  
  Add posix, bsize=, namelen= options to /proc/mounts for nfs
  filesystems.
  
  Document several other options that are still missing.
 
 NFS lists only some options in /proc/mounts on purpose: only the 
 essential options are mentioned there to keep clutter down.  The three 
 you've added here are for all intents and purposes deprecated, which is 
 why they are not supported.
 
 NFS lists a more complete set of mount options for a mount point in 
 /proc/self/mountstats.  See nfs_show_stats().
 
 Since your cover letter does not explain why you are changing this code, 
 can you refer me to a description of why you are doing this?

Descritption is in the 01/26 patch.

 More below.
 
  Signed-off-by: Miklos Szeredi [EMAIL PROTECTED]
  ---
  
  Index: linux/fs/nfs/super.c
  ===
  --- linux.orig/fs/nfs/super.c   2008-01-19 11:56:34.0 +0100
  +++ linux/fs/nfs/super.c2008-01-21 20:41:30.0 +0100
  @@ -449,6 +449,7 @@ static void nfs_show_mount_options(struc
  } nfs_info[] = {
  { NFS_MOUNT_SOFT, ,soft, ,hard },
  { NFS_MOUNT_INTR, ,intr, ,nointr },
  +   { NFS_MOUNT_POSIX, ,posix,  },
  { NFS_MOUNT_NOCTO, ,nocto,  },
  { NFS_MOUNT_NOAC, ,noac,  },
  { NFS_MOUNT_NONLM, ,nolock,  },
  @@ -459,10 +460,17 @@ static void nfs_show_mount_options(struc
  };
  const struct proc_nfs_info *nfs_infop;
  struct nfs_client *clp = nfss-nfs_client;
  +   unsigned int default_namelen =
  +   clp-rpc_ops-version == 4 ? NFS4_MAXNAMLEN :
  +   clp-rpc_ops-version == 3 ? NFS3_MAXNAMLEN : NFS2_MAXNAMLEN;
   
  seq_printf(m, ,vers=%d, clp-rpc_ops-version);
  seq_printf(m, ,rsize=%d, nfss-rsize);
  seq_printf(m, ,wsize=%d, nfss-wsize);
  +   if (nfss-bsize != 0)
  +   seq_printf(m, ,bsize=%d, nfss-bsize);
  +   if (nfss-namelen != default_namelen)
  +   seq_printf(m, ,namelen=%d, nfss-namelen);
  if (nfss-acregmin != 3*HZ || showdefaults)
  seq_printf(m, ,acregmin=%d, nfss-acregmin/HZ);
  if (nfss-acregmax != 60*HZ || showdefaults)
  @@ -482,6 +490,18 @@ static void nfs_show_mount_options(struc
  seq_printf(m, ,timeo=%lu, 10U * nfss-client-cl_timeout-to_initval 
  / HZ);
  seq_printf(m, ,retrans=%u, nfss-client-cl_timeout-to_retries);
  seq_printf(m, ,sec=%s, 
  nfs_pseudoflavour_to_name(nfss-client-cl_auth-au_flavor));
  +
  +   /*
  +* Missing options:
  +* port=
 
 Probably should be supported.
 
  +* addr=
 
 This one is already supported; see nfs_show_options().

Right, thanks.

 
  +* clientaddr=
 
 This one isn't, and should be... would be useful for tracking down 
 certain NFSv4 problems.
 
  +* mounthost=
  +* mountaddr=
   +   * mountport=
   +   * mountvers=
   +   * mountproto=
 
 And these mount* options are for the kernel's new mount protocol client. 
   They aren't really useful for understanding steady-state NFS client 
 behavior, they only effect mount-time behavior.

All mount options should be shown, which are needed to reconstruct a
previous mount.

For example, if you copy options out from /proc/mount, umount the
filesystem, and then create a new mount with the copied options, you
should get the same mount.

So not only those options are interesting which are useful for
understanding steady state behavior.

The only options, which should not be shown, are those which have a
permanent effect at mount time, like journal creation, etc.  And those
which are meaningless across different mounts, like communication file
descriptors.

Thanks,
Miklos
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 21/26] mount options: partially fix nfs

2008-01-25 Thread Miklos Szeredi
 On Thu, 2008-01-24 at 20:34 +0100, Miklos Szeredi wrote:
  plain text document attachment (nfs_opts.patch)
  From: Miklos Szeredi [EMAIL PROTECTED]
  
  Add posix, bsize=, namelen= options to /proc/mounts for nfs
  filesystems.
  
  Document several other options that are still missing.
  
  Signed-off-by: Miklos Szeredi [EMAIL PROTECTED]
  ---
  
  Index: linux/fs/nfs/super.c
  ===
  --- linux.orig/fs/nfs/super.c   2008-01-19 11:56:34.0 +0100
  +++ linux/fs/nfs/super.c2008-01-21 20:41:30.0 +0100
  @@ -449,6 +449,7 @@ static void nfs_show_mount_options(struc
  } nfs_info[] = {
  { NFS_MOUNT_SOFT, ,soft, ,hard },
  { NFS_MOUNT_INTR, ,intr, ,nointr },
  +   { NFS_MOUNT_POSIX, ,posix,  },
  { NFS_MOUNT_NOCTO, ,nocto,  },
  { NFS_MOUNT_NOAC, ,noac,  },
  { NFS_MOUNT_NONLM, ,nolock,  },
  @@ -459,10 +460,17 @@ static void nfs_show_mount_options(struc
  };
  const struct proc_nfs_info *nfs_infop;
  struct nfs_client *clp = nfss-nfs_client;
  +   unsigned int default_namelen =
  +   clp-rpc_ops-version == 4 ? NFS4_MAXNAMLEN :
  +   clp-rpc_ops-version == 3 ? NFS3_MAXNAMLEN : NFS2_MAXNAMLEN;
  seq_printf(m, ,vers=%d, clp-rpc_ops-version);
  seq_printf(m, ,rsize=%d, nfss-rsize);
  seq_printf(m, ,wsize=%d, nfss-wsize);
  +   if (nfss-bsize != 0)
  +   seq_printf(m, ,bsize=%d, nfss-bsize);
  +   if (nfss-namelen != default_namelen)
  +   seq_printf(m, ,namelen=%d, nfss-namelen);
 
 You really just want to look at the value of nfss-namelen. It should
 always be set.

OK, I usually add the condition for (value != default_value) to avoid
unnecessary clutter.  But sure, there's no problem with showing the
option unconditionally.

 
  if (nfss-acregmin != 3*HZ || showdefaults)
  seq_printf(m, ,acregmin=%d, nfss-acregmin/HZ);
  if (nfss-acregmax != 60*HZ || showdefaults)
  @@ -482,6 +490,18 @@ static void nfs_show_mount_options(struc
  seq_printf(m, ,timeo=%lu, 10U * nfss-client-cl_timeout-to_initval 
  / HZ);
  seq_printf(m, ,retrans=%u, nfss-client-cl_timeout-to_retries);
  seq_printf(m, ,sec=%s, 
  nfs_pseudoflavour_to_name(nfss-client-cl_auth-au_flavor));
  +
  +   /*
  +* Missing options:
  +* port=
  +* mountport=
  +* mountvers=
  +* mountproto=
  +* addr=
  +* clientaddr=
  +* mounthost=
  +* mountaddr=
  +*/
 
 The new text mount interface actually does allow us to store these
 values if we really do need to. That should be a separate patch,
 however.

OK.

Thanks,
Miklos
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 26/26] mount options: fix usbfs

2008-01-25 Thread Miklos Szeredi
  From: Miklos Szeredi [EMAIL PROTECTED]
  
  Add a .show_options super operation to usbfs.
  
  Signed-off-by: Miklos Szeredi [EMAIL PROTECTED]
 
 Looks good to me.  Do you want to take this through your tree, as it is
 dependant on other changes, or do you want me to take this through the
 USB tree?  Whatever is easier for you is fine for me.

Please take it, it should be independent of the other changes.

Thanks,
Miklos
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] ext3 freeze feature

2008-01-25 Thread Pekka Enberg
Hi,

 diff -uprN -X linux-2.6.24-rc8/Documentation/dontdiff 
 linux-2.6.24-rc8/include/linux/ext3_fs_sb.h 
 linux-2.6.24-rc8-freeze/include/linux/ext3_fs_sb.h
 --- linux-2.6.24-rc8/include/linux/ext3_fs_sb.h 2008-01-16 13:22:48.0 
 +0900
 +++ linux-2.6.24-rc8-freeze/include/linux/ext3_fs_sb.h  2008-01-22 
 18:20:33.0 +0900
 @@ -81,6 +81,8 @@ struct ext3_sb_info {
 char *s_qf_names[MAXQUOTAS];/* Names of quota files with 
 journalled quota */
 int s_jquota_fmt;   /* Format of quota to use */
  #endif
 +   /* Delayed work for freeze */
 +   struct delayed_work s_freeze_timeout;

Why not put this in struct super_block? Then you don't need this

 +/**
 + * get_super_block - get super_block
 + * @s_fs_info  : filesystem dependent information
 + *   (super_block.s_fs_info)
 + *
 + * Get super_block which holds s_fs_info from super_blocks.
 + * get_super_block() returns a pointer of super block or
 + * %NULL if it have failed.
 + */
 +struct super_block *get_super_block(void *s_fs_info)
 +{

And these can be put to generic code:

  /*
 + * ext3_add_freeze_timeout - Add timeout for ext3 freeze.
 + *
 + * @sbi: ext3 super block
 + * @timeout_msec   : timeout period
 + *
 + * Add the delayed work for ext3 freeze timeout
 + * to the delayed work queue.
 + */
 +void ext3_add_freeze_timeout(struct ext3_sb_info *sbi,
 +   long timeout_msec)
 +{
 +   s64 timeout_jiffies = msecs_to_jiffies(timeout_msec);
 +
 +   /*
 +* setup freeze timeout function
 +*/
 +   INIT_DELAYED_WORK(sbi-s_freeze_timeout, ext3_freeze_timeout);
 +
 +   /* set delayed work queue */
 +   cancel_delayed_work(sbi-s_freeze_timeout);
 +   schedule_delayed_work(sbi-s_freeze_timeout, timeout_jiffies);
 +}
 +
 +/*
 + * ext3_del_freeze_timeout - Delete timeout for ext3 freeze.
 + *
 + * @sbi: ext3 super block
 + *
 + * Delete the delayed work for ext3 freeze timeout
 + * from the delayed work queue.
 + */
 +void ext3_del_freeze_timeout(struct ext3_sb_info *sbi)
 +{
 +   if (delayed_work_pending(sbi-s_freeze_timeout))
 +   cancel_delayed_work(sbi-s_freeze_timeout);
 +}

 +/*
 + * ext3_freeze_timeout - Thaw the filesystem.
 + *
 + * @work   : work queue (delayed_work.work)
 + *
 + * Called by the delayed work when elapsing the timeout period.
 + * Thaw the filesystem.
 + */
 +static void ext3_freeze_timeout(struct work_struct *work)
 +{
 +   struct ext3_sb_info *sbi = container_of(work,
 +   struct ext3_sb_info,
 +   s_freeze_timeout.work);
 +   struct super_block *sb = get_super_block(sbi);
 +
 +   BUG_ON(sb == NULL);
 +
 +   if (sb-s_frozen != SB_UNFROZEN)
 +   thaw_bdev(sb-s_bdev, sb);
 +}
 +

I am also wondering whether we should have system call(s) for these:

On Jan 25, 2008 12:59 PM, Takashi Sato [EMAIL PROTECTED] wrote:
 +   case EXT3_IOC_FREEZE: {

 +   case EXT3_IOC_THAW: {

And just convert XFS to use them too?

Pekka
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] ext3 freeze feature

2008-01-25 Thread Takashi Sato

Hi,


I am also wondering whether we should have system call(s) for these:

On Jan 25, 2008 12:59 PM, Takashi Sato [EMAIL PROTECTED] wrote:

+   case EXT3_IOC_FREEZE: {



+   case EXT3_IOC_THAW: {


And just convert XFS to use them too?


I think it is reasonable to implement it as the generic system call, as you 
said.
Does XFS folks think so?

Cheers, Takashi
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] ext3 freeze feature

2008-01-25 Thread Dmitri Monakhov
On 19:59 Fri 25 Jan , Takashi Sato wrote:
 Hi,
 
 Currently, ext3 doesn't have the freeze feature which suspends write
 requests.  So, we cannot get a backup which keeps the filesystem's
 consistency with the storage device's features (snapshot, replication)
 while it is mounted.
 In many case, a commercial filesystems (e.g. VxFS) has the freeze
 feature and it would be used to get the consistent backup.
First of all Linux already have at least one open-source(dm-snap),
and several commercial snapshot solutions. In fact dm-snaps it
not perfect:
a) bit map loading is not supported (this is useful for freezing 
   only used blocks) which causing significant  slowdown even for new writes
b) non patched dm-snap code has significant performance slowdown for all 
   rewrite requests. 
c) IMHO memory footprint is too big.

BUT, it works well for most file-systems.
 
 So I am planning on implementing the ioctl of the freeze feature for ext3.
 I think we can get the consistent backup with the following steps.
 1. Freeze the filesystem with ioctl.
So you plan to do it from userspace.. well good luck with it :)

 2. Separate the replication volume or get the snapshot
with the storage device's feature.
 3. Unfreeze the filesystem with ioctl.

You have to realize what delay between 1-3 stages have to be minimal.
for example dm-snap perform it only for explicit journal flushing.
From my experience if delay is more than 4-5 seconds whole system becomes
unstable.
BTW: you have to always remember that while locking ext3 via freeze_bdev
 sb-ext3_write_super_lockfs() will be called wich implemented as simple
journal lock. This means what some bio-s still may reach original device
even after file system was locked (i've observed this in real life 
situation).

 4. Get the backup from the separated replication volume
or the snapshot.
 
 The usage of the ioctl is as below.
  int ioctl(int fd, int cmd, long *timeval)
  fd: The file descriptor of the mountpoint.
  cmd: EXT3_IOC_FREEZE for the freeze or EXT3_IOC_THAW for the unfreeze.
  timeval: The timeout value expressed in seconds.
   If it's 0, the timeout isn't set.
  Return value: 0 if the operation succeeds. Otherwise, -1.
 
 I have made sure that write requests were suspended with the experimental
 patch for this feature and attached it in this mail.
 
 The points of the implementation are followings.
 - Add calls of the freeze function (freeze_bdev) and
   the unfreeze function (thaw_bdev) in ext3_ioctl().
 
 - ext3_freeze_timeout() which calls the unfreeze function (thaw_bdev)
   is registered to the delayed work queue to unfreeze the filesystem
   automatically after the lapse of the specified time.
 
 Any comments are very welcome.
 
 Signed-off-by: Takashi Sato [EMAIL PROTECTED]
 ---
 diff -uprN -X linux-2.6.24-rc8/Documentation/dontdiff 
 linux-2.6.24-rc8/fs/ext3/ioctl.c linux-2.6.24-rc8-freeze/fs/ext3/ioctl.c
 --- linux-2.6.24-rc8/fs/ext3/ioctl.c  2008-01-16 13:22:48.0 +0900
 +++ linux-2.6.24-rc8-freeze/fs/ext3/ioctl.c   2008-01-22 18:20:33.0 
 +0900
 @@ -254,6 +254,42 @@ flags_err:
   return err;
   }
  
 + case EXT3_IOC_FREEZE: {
 + long timeout_sec;
 + long timeout_msec;
 + if (!capable(CAP_SYS_ADMIN))
 + return -EPERM;
 + if (inode-i_sb-s_frozen != SB_UNFROZEN)

 + return -EINVAL
WOW timeout extending is not supported !?
So you wanna say what caller have to set timer to the maximal possible
timeout from the very beginning.
IMHO it is better to use heart-beat timer approach, for example:
each second caller extend it's timeout for two seconds. in this approach
even after caller was killed by any reason, it's timeout will be expired in
two seconds.
 
if (inode-i_sb-s_frozen == SB_FROZEN)
/* extending timeout */
.. 


 + /* arg(sec) to tick value */
 + get_user(timeout_sec, (long __user *) arg);
 + timeout_msec = timeout_sec * 1000;
 + if (timeout_msec  0)
 + return -EINVAL;
 +
 + /* Freeze */
 + freeze_bdev(inode-i_sb-s_bdev);
 +
 + /* set up unfreeze timer */
 + if (timeout_msec  0)
 + ext3_add_freeze_timeout(EXT3_SB(inode-i_sb),
 + timeout_msec);
 + return 0;
 + }
 + case EXT3_IOC_THAW: {
 + if (!capable(CAP_SYS_ADMIN))
 + return -EPERM;
 + if (inode-i_sb-s_frozen == SB_UNFROZEN)
 + return -EINVAL;
 +
 + /* delete unfreeze timer */
 + ext3_del_freeze_timeout(EXT3_SB(inode-i_sb));
 +
 + /* Unfreeze */
 + thaw_bdev(inode-i_sb-s_bdev, inode-i_sb);
 +
 + return 0;
 + }
  
   default:
   return -ENOTTY;
 diff -uprN -X 

Re: lockdep warning with LTP dio test (v2.6.24-rc6-125-g5356f66)

2008-01-25 Thread Chris Mason
On Friday 25 January 2008, Jan Kara wrote:

  If ext3's DIO code only touches transactions in get_block, then it can
  violate data=ordered rules.  Basically the transaction that allocates
  the blocks might commit before the DIO code gets around to writing them.
 
  A crash in the wrong place will expose stale data on disk.

   Hmm, I've looked at it and I don't think so - look at the rationale in
 the patch below... That patch should fix the lock-inversion problem (at
 least I see no lockdep warnings on my test machine).


Ah ok, when I was looking at this I was allowing holes to get filled without 
falling back to buffered.  But, with the orphan inode entry protecting things 
I see how you're safe with this patch.

-chris
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 25/26] mount options: fix udf

2008-01-25 Thread Jan Kara
On Fri 25-01-08 16:56:13, Miklos Szeredi wrote:
Please use just 'int' for 'remount' option. We are slowly trying to
  get rid of these strange things in UDF code so adding new ones isn't
  desirable.
 
 What's wrong with bool?
 
 I'm not advocating mass replacements, but all new code _should_ use
 it, because it's a very useful and good type.
 
 We are just not very much used to it yet, but don't tell me it's
 confusing to see a type like this ;)
  It's not so confusing but one really isn't used to it ;) But OK, I don't
mind that much...
Honza
-- 
Jan Kara [EMAIL PROTECTED]
SUSE Labs, CR
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 0/9] bfs: assorted cleanups

2008-01-25 Thread Dmitri Vorobiev
Hi Adrian,

Please drop the previous version of the BFS patches I sent
to the trivial patches yesterday. I am now sending a new
version, where part #5 has been redone addressing the
comments I received.

Please note that the driver maintainer (I'm Cc:in him now)
has approved the changes in a private email to me:

 If you don't mind --- please proceed the way you were,
 but you may add whatever is the appropriate keyword --- 
 Approved-by: or CC'd to: or whatever they do nowadays
 to indicate that the official maintainer has seen and not
 objected to the patches.

 Kind regards
 Tigran

Thanks,

Dmitri
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 4/9] bfs: coding style cleanup in fs/bfs/dir.c

2008-01-25 Thread Dmitri Vorobiev
Clean up errors found by checkpatch.pl.

Before the patch:

$ ./scripts/checkpatch.pl --file  fs/bfs/dir.c | grep total
total: 7 errors, 1 warnings, 370 lines checked

After the patch:

$ ./scripts/checkpatch.pl --file  fs/bfs/dir.c | grep total
total: 0 errors, 1 warnings, 370 lines checked

No functional changes introduced.

This patch was compile-tested by building the BFS driver both
as a module and as a part of the kernel proper.

Signed-off-by: Dmitri Vorobiev [EMAIL PROTECTED]
Acked-by: Tigran Aivazian [EMAIL PROTECTED]
---
 fs/bfs/dir.c |   14 +++---
 1 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/fs/bfs/dir.c b/fs/bfs/dir.c
index 1fd056d..5462a5b 100644
--- a/fs/bfs/dir.c
+++ b/fs/bfs/dir.c
@@ -72,7 +72,7 @@ static int bfs_readdir(struct file *f, void *dirent, 
filldir_t filldir)
}
 
unlock_kernel();
-   return 0;   
+   return 0;
 }
 
 const struct file_operations bfs_dir_operations = {
@@ -117,7 +117,7 @@ static int bfs_create(struct inode *dir, struct dentry 
*dentry, int mode,
BFS_I(inode)-i_sblock = 0;
BFS_I(inode)-i_eblock = 0;
insert_inode_hash(inode);
-mark_inode_dirty(inode);
+   mark_inode_dirty(inode);
dump_imap(create, s);
 
err = bfs_add_entry(dir, dentry-d_name.name, dentry-d_name.len,
@@ -228,8 +228,8 @@ static int bfs_rename(struct inode *old_dir, struct dentry 
*old_dentry,
return -EINVAL;
 
lock_kernel();
-   old_bh = bfs_find_entry(old_dir, 
-   old_dentry-d_name.name, 
+   old_bh = bfs_find_entry(old_dir,
+   old_dentry-d_name.name,
old_dentry-d_name.len, old_de);
 
if (!old_bh || (le16_to_cpu(old_de-ino) != old_inode-i_ino))
@@ -237,8 +237,8 @@ static int bfs_rename(struct inode *old_dir, struct dentry 
*old_dentry,
 
error = -EPERM;
new_inode = new_dentry-d_inode;
-   new_bh = bfs_find_entry(new_dir, 
-   new_dentry-d_name.name, 
+   new_bh = bfs_find_entry(new_dir,
+   new_dentry-d_name.name,
new_dentry-d_name.len, new_de);
 
if (new_bh  !new_inode) {
@@ -246,7 +246,7 @@ static int bfs_rename(struct inode *old_dir, struct dentry 
*old_dentry,
new_bh = NULL;
}
if (!new_bh) {
-   error = bfs_add_entry(new_dir, 
+   error = bfs_add_entry(new_dir,
new_dentry-d_name.name,
new_dentry-d_name.len,
old_inode-i_ino);
-- 
1.5.3

-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] Parallelize IO for e2fsck

2008-01-25 Thread Bryan Henderson
 Incidentally, some context for the AIX approach to the OOM problem: a 
 process may exclude itself from OOM vulnerability altogether.  It 
places 
 itself in early allocation mode, which means at the time it creates 
 virtual memory, it reserves enough backing store for the worst case. 
The 
 memory manager does not send such a process the SIGDANGER signal or 
 terminate it when it runs out of paging space.  Before c. 2000, this 
was 
 the only mode.  Now the default is late allocation mode, which is 
similar 
 to Linux.

This is an interesting approach. It feels like some programs might be 
interested in choosing this mode instead of risking OOM. 

It's the way virtual memory always worked when it was first invented.  The 
system not only reserved space to back every page of virtual memory; it 
assigned the particular blocks for it.  Late allocation was a later 
innovation, and I believe its main goal was to make it possible to use the 
cheaper disk drives for paging instead of drums.  Late allocation gives 
you better locality on disk, so the seeking doesn't eat you alive (drums 
don't seek).  Even then, I assume (but am not sure) that the system at 
least reserved the space in an account somewhere so at pageout time there 
was guaranteed to be a place to which to page out.  Overcommitting page 
space to save on disk space was a later idea.

I was surprised to see AIX do late allocation by default, because IBM's 
traditional style is bulletproof systems.  A system where a process can be 
killed at unpredictable times because of resource demands of unrelated 
processes doesn't really fit that style.

It's really a fairly unusual application that benefits from late 
allocation: one that creates a lot more virtual memory than it ever 
touches.  For example, a sparse array.  Or am I missing something?

--
Bryan Henderson IBM Almaden Research Center
San Jose CA Filesystems

-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] ext3 freeze feature

2008-01-25 Thread David Chinner
On Sat, Jan 26, 2008 at 04:35:26PM +1100, David Chinner wrote:
 On Fri, Jan 25, 2008 at 07:59:38PM +0900, Takashi Sato wrote:
  The points of the implementation are followings.
  - Add calls of the freeze function (freeze_bdev) and
the unfreeze function (thaw_bdev) in ext3_ioctl().
  
  - ext3_freeze_timeout() which calls the unfreeze function (thaw_bdev)
is registered to the delayed work queue to unfreeze the filesystem
automatically after the lapse of the specified time.
 
 Seems like pointless complexity to me - what happens if a
 timeout occurs while the filsystem is still freezing?
 
 It's not uncommon for a freeze to take minutes if memory
 is full of dirty data that needs to be flushed out, esp. if
 dm-snap is doing COWs for every write issued

Sorry, ignore this bit - I just realised the timer is set
up after the freeze has occurred

Still, that makes it potentially dangerous to whatever is being
done while the filesystem is frozen

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] ext3 freeze feature

2008-01-25 Thread David Chinner
On Fri, Jan 25, 2008 at 07:59:38PM +0900, Takashi Sato wrote:
 The points of the implementation are followings.
 - Add calls of the freeze function (freeze_bdev) and
   the unfreeze function (thaw_bdev) in ext3_ioctl().
 
 - ext3_freeze_timeout() which calls the unfreeze function (thaw_bdev)
   is registered to the delayed work queue to unfreeze the filesystem
   automatically after the lapse of the specified time.

Seems like pointless complexity to me - what happens if a
timeout occurs while the filsystem is still freezing?

It's not uncommon for a freeze to take minutes if memory
is full of dirty data that needs to be flushed out, esp. if
dm-snap is doing COWs for every write issued

 + case EXT3_IOC_FREEZE: {

 + if (inode-i_sb-s_frozen != SB_UNFROZEN)
 + return -EINVAL;

 + freeze_bdev(inode-i_sb-s_bdev);

 + case EXT3_IOC_THAW: {
 + if (!capable(CAP_SYS_ADMIN))
 + return -EPERM;
 + if (inode-i_sb-s_frozen == SB_UNFROZEN)
 + return -EINVAL;
.
 + /* Unfreeze */
 + thaw_bdev(inode-i_sb-s_bdev, inode-i_sb);

That's inherently unsafe - you can have multiple unfreezes
running in parallel which seriously screws with the bdev semaphore
count that is used to lock the device due to doing multiple up()s
for every down.

Your timeout thingy guarantee that at some point you will get
multiple up()s occuring due to the timer firing racing with
a thaw ioctl. 

If this interface is to be more widely exported, then it needs
a complete revamp of the bdev is locked while it is frozen so
that there is no chance of a double up() ever occuring on the
bd_mount_sem due to racing thaws.

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] ext3: per-process soft-syncing data=ordered mode

2008-01-25 Thread Al Boldi
Chris Snook wrote:
 Al Boldi wrote:
  Greetings!
 
  data=ordered mode has proven reliable over the years, and it does this
  by ordering filedata flushes before metadata flushes.  But this
  sometimes causes contention in the order of a 10x slowdown for certain
  apps, either due to the misuse of fsync or due to inherent behaviour
  like db's, as well as inherent starvation issues exposed by the
  data=ordered mode.
 
  data=writeback mode alleviates data=order mode slowdowns, but only works
  per-mount and is too dangerous to run as a default mode.
 
  This RFC proposes to introduce a tunable which allows to disable fsync
  and changes ordered into writeback writeout on a per-process basis like
  this:
 
echo 1  /proc/`pidof process`/softsync
 
 
  Your comments are much welcome!

 This is basically a kernel workaround for stupid app behavior.

Exactly right to some extent, but don't forget the underlying data=ordered 
starvation problem, which looks like a genuinely deep problem maybe related 
to blockIO.

 It
 wouldn't be the first time we've provided such an option, but we shouldn't
 do it without a very good justification.  At the very least, we need a
 test case that demonstrates the problem

See the 'konqueror deadlocks in 2.6.22' thread.

 and benchmark results that prove that this approach actually fixes it.

8M-record insert into indexed db-table:
 ordered  writeback
sqlite3:  75m22s8m45s
mysql4 :  23m35s5m29s

 I suspect we can find a cleaner fix for the problem.

I hope so, but even with a fix available addressing the data=ordered 
starvation issue, this tunable could remain useful for those apps that 
misbehave.


Thanks!

--
Al

-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] ext3: per-process soft-syncing data=ordered mode

2008-01-25 Thread Al Boldi
Diego Calleja wrote:
 El Thu, 24 Jan 2008 23:36:00 +0300, Al Boldi [EMAIL PROTECTED] escribió:
  Greetings!
 
  data=ordered mode has proven reliable over the years, and it does this
  by ordering filedata flushes before metadata flushes.  But this
  sometimes causes contention in the order of a 10x slowdown for certain
  apps, either due to the misuse of fsync or due to inherent behaviour
  like db's, as well as inherent starvation issues exposed by the
  data=ordered mode.

 There's a related bug in bugzilla:
 http://bugzilla.kernel.org/show_bug.cgi?id=9546

 The diagnostic from Jan Kara is different though, but I think it may be
 the same problem...

 One process does data-intensive load. Thus in the ordered mode the
 transaction is tiny but has tons of data buffers attached. If commit
 happens, it takes a long time to sync all the data before the commit
 can proceed... In the writeback mode, we don't wait for data buffers, in
 the journal mode amount of data to be written is really limited by the
 maximum size of a transaction and so we write by much smaller chunks
 and better latency is thus ensured.


 I'm hitting this bug too...it's surprising that there's not many people
 reporting more bugs about this, because it's really annoying.


 There's a patch by Jan Kara (that I'm including here because bugzilla
 didn't include it and took me a while to find it) which I don't know if
 it's supposed to fix the problem , but it'd be interesting to try:


Thanks a lot, but it doesn't fix it.

--
Al

-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] ext3 freeze feature

2008-01-25 Thread David Chinner
On Fri, Jan 25, 2008 at 09:42:30PM +0900, Takashi Sato wrote:
 I am also wondering whether we should have system call(s) for these:
 
 On Jan 25, 2008 12:59 PM, Takashi Sato [EMAIL PROTECTED] wrote:
 +   case EXT3_IOC_FREEZE: {
 
 +   case EXT3_IOC_THAW: {
 
 And just convert XFS to use them too?
 
 I think it is reasonable to implement it as the generic system call, as you
 said.  Does XFS folks think so?

Sure.

Note that we can't immediately remove the XFS ioctls otherwise
we'd break userspace utilities that use them

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [UNIONFS] 00/29 Unionfs and related patches pre-merge review (v2)

2008-01-25 Thread Erez Zadok
In message [EMAIL PROTECTED], Al Viro writes:
 After grep for locking-related things:
 
   * lock_parent(): who said that you won't get dentry moved
 before managing to grab i_mutex on parent?  While we are at it,
 who said that you won't get dentry moved between fetching d_parent
 and doing dget()?  In that case parent could've been _freed_ before
 you get to dget().

OK, so looks like I should use dget_parent() in my lock_parent(), as I've
done elsewhere.  I'll also take a look at all instances in which I get
dentry-d_parent and see if a d_lock is needed there.

   * in create_parents():
 +   struct inode *inode = lower_dentry-d_inode;
 +   /*
 +* If we get here, it means that we created a new
 +* dentry+inode, but copying permissions failed.
 +* Therefore, we should delete this inode and dput
 +* the dentry so as not to leave cruft behind.
 +*/
 +   if (lower_dentry-d_op  lower_dentry-d_op-d_iput)
 +   lower_dentry-d_op-d_iput(lower_dentry,
 +  inode);
 +   else
 +   iput(inode);
 +   lower_dentry-d_inode = NULL;
 +   dput(lower_dentry);
 +   lower_dentry = ERR_PTR(err);
 +   goto out;
 Really?  So what happens if it had become positive after your test and
 somebody had looked it up in lower layer and just now happens to be
 in the middle of operations on it?  Will be thucking frilled by that...

Good catch.  That -d_iput call was an old fix to a bug that has since been
fixed more cleanly and generically in our copyup_permission routine and our
unionfs_d_iput.  I've removed the above -d_iput if and tested to verify
that it's indeed unnecessary.

   * __unionfs_rename():
 +   lock_rename(lower_old_dir_dentry, lower_new_dir_dentry);
 +   err = vfs_rename(lower_old_dir_dentry-d_inode, lower_old_dentry,
 +lower_new_dir_dentry-d_inode, lower_new_dentry);
 +   unlock_rename(lower_old_dir_dentry, lower_new_dir_dentry);
 
 Uh-huh...  To start with, what guarantees that your lower_old_dentry
 is still a child of your lower_old_dir_dentry?

We dget/dget_parent the old/new dentry and parents a few lines above
(actually, it looked like I forgot to dget(lower_new_dentry) -- fixed).
This is a generic stackable f/s issue: ecryptfs does the same stuff before
calling vfs_rename() on the lower objects.

 What's more, you are
 not checking the result of lock_rename(), i.e. asking for serious trouble.

OK.  I'm now checking for the return from lock_rename for ancestor/rename
rules.  I'm CC'ing Mike Halcrow so he can do the same for ecryptfs.

   * revalidation stuff: err...  how the devil can it work for
 directories, when there's nothing to prevent changes in underlying
 layers between -d_revalidate() and operation itself?  For the upper
 layer (unionfs itself) everything's more or less fine, but the rest
 of that...

In a stacked f/s, we keep references to the lower dentries/inodes, so they
can't disappear on us (that happens in our interpose function, called from
our -lookup).  On entry to every f/s method in unionfs, we first perform
lightweight revalidation of our dentry against the lower ones: we check if
m/ctime changed (users modifying lower files) or if the generation# b/t our
super and the our dentries have changed (branch-management took place); if
needed, then we perform a full revalidation of all lower objects (while
holding a lock on the branch configuration).  If we have to do a full reval
upon entry to our -op, and the reval failed, then we return an appropriate
error; o/w we proceed.  (In certain cases, the VFS re-issues a lookup if the
f/s says that it's dentry is invalid.)

Without changes to the VFS, I don't see how else I can ensure cache
coherency cleanly, while allowing users to modify lower files; this feature
is very useful to some unionfs users, who depend on it (so even if I could
lock out the lower directories from being modified, there will be users
who'd still want to be able to modify lower files).

BTW, my sense of the relationship b/t upper and lower objects and their
validity in a stackable f/s, is that it's similar to the relationship b/t
the NFS client and server -- the client can't be sure that a file on the
server doesn't change b/t -revalidate and -op (hence nfs's reliance on dir
mtime checks).

Perhaps this general topic is a good one to discuss at more length at LSF?
Suggestions are welcome.

Thanks,
Erez.
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 4/4] Unionfs: lock_rename related locking fixes

2008-01-25 Thread Erez Zadok
CC: Mike Halcrow [EMAIL PROTECTED]

Signed-off-by: Erez Zadok [EMAIL PROTECTED]
---
 fs/unionfs/rename.c |   16 +++-
 1 files changed, 15 insertions(+), 1 deletions(-)

diff --git a/fs/unionfs/rename.c b/fs/unionfs/rename.c
index 9306a2b..5ab13f9 100644
--- a/fs/unionfs/rename.c
+++ b/fs/unionfs/rename.c
@@ -29,6 +29,7 @@ static int __unionfs_rename(struct inode *old_dir, struct 
dentry *old_dentry,
struct dentry *lower_new_dir_dentry;
struct dentry *lower_wh_dentry;
struct dentry *lower_wh_dir_dentry;
+   struct dentry *trap;
char *wh_name = NULL;
 
lower_new_dentry = unionfs_lower_dentry_idx(new_dentry, bindex);
@@ -95,6 +96,7 @@ static int __unionfs_rename(struct inode *old_dir, struct 
dentry *old_dentry,
goto out;
 
dget(lower_old_dentry);
+   dget(lower_new_dentry);
lower_old_dir_dentry = dget_parent(lower_old_dentry);
lower_new_dir_dentry = dget_parent(lower_new_dentry);
 
@@ -122,9 +124,20 @@ static int __unionfs_rename(struct inode *old_dir, struct 
dentry *old_dentry,
 
/* see Documentation/filesystems/unionfs/issues.txt */
lockdep_off();
-   lock_rename(lower_old_dir_dentry, lower_new_dir_dentry);
+   trap = lock_rename(lower_old_dir_dentry, lower_new_dir_dentry);
+   /* source should not be ancenstor of target */
+   if (trap == lower_old_dentry) {
+   err = -EINVAL;
+   goto out_err_unlock;
+   }
+   /* target should not be ancenstor of source */
+   if (trap == lower_new_dentry) {
+   err = -ENOTEMPTY;
+   goto out_err_unlock;
+   }
err = vfs_rename(lower_old_dir_dentry-d_inode, lower_old_dentry,
 lower_new_dir_dentry-d_inode, lower_new_dentry);
+out_err_unlock:
unlock_rename(lower_old_dir_dentry, lower_new_dir_dentry);
lockdep_on();
 
@@ -132,6 +145,7 @@ out_dput:
dput(lower_old_dir_dentry);
dput(lower_new_dir_dentry);
dput(lower_old_dentry);
+   dput(lower_new_dentry);
 
 out:
if (!err) {
-- 
1.5.2.2

-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/4] Unionfs: d_parent related locking fixes

2008-01-25 Thread Erez Zadok
Signed-off-by: Erez Zadok [EMAIL PROTECTED]
---
 fs/unionfs/copyup.c |3 +--
 fs/unionfs/union.h  |4 ++--
 2 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/fs/unionfs/copyup.c b/fs/unionfs/copyup.c
index 8663224..9beac01 100644
--- a/fs/unionfs/copyup.c
+++ b/fs/unionfs/copyup.c
@@ -716,8 +716,7 @@ struct dentry *create_parents(struct inode *dir, struct 
dentry *dentry,
child_dentry = parent_dentry;
 
/* find the parent directory dentry in unionfs */
-   parent_dentry = child_dentry-d_parent;
-   dget(parent_dentry);
+   parent_dentry = dget_parent(child_dentry);
 
/* find out the lower_parent_dentry in the given branch */
lower_parent_dentry =
diff --git a/fs/unionfs/union.h b/fs/unionfs/union.h
index d324f83..4b4d6c9 100644
--- a/fs/unionfs/union.h
+++ b/fs/unionfs/union.h
@@ -487,13 +487,13 @@ extern int parse_branch_mode(const char *name, int 
*perms);
 /* locking helpers */
 static inline struct dentry *lock_parent(struct dentry *dentry)
 {
-   struct dentry *dir = dget(dentry-d_parent);
+   struct dentry *dir = dget_parent(dentry);
mutex_lock_nested(dir-d_inode-i_mutex, I_MUTEX_PARENT);
return dir;
 }
 static inline struct dentry *lock_parent_wh(struct dentry *dentry)
 {
-   struct dentry *dir = dget(dentry-d_parent);
+   struct dentry *dir = dget_parent(dentry);
 
mutex_lock_nested(dir-d_inode-i_mutex, UNIONFS_DMUTEX_WHITEOUT);
return dir;
-- 
1.5.2.2

-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/4] Unionfs: use first writable branch (fix/cleanup)

2008-01-25 Thread Erez Zadok
Cleanup code in -create, -symlink, and -mknod: refactor common code into
helper functions.  Also, this allows writing to multiple branches again,
which was broken by an earlier patch.

Signed-off-by: Erez Zadok [EMAIL PROTECTED]
---
 fs/unionfs/inode.c |  395 +---
 1 files changed, 156 insertions(+), 239 deletions(-)

diff --git a/fs/unionfs/inode.c b/fs/unionfs/inode.c
index e15ddb9..0b92da2 100644
--- a/fs/unionfs/inode.c
+++ b/fs/unionfs/inode.c
@@ -18,14 +18,159 @@
 
 #include union.h
 
+/*
+ * Helper function when creating new objects (create, symlink, and mknod).
+ * Checks to see if there's a whiteout in @lower_dentry's parent directory,
+ * whose name is taken from @dentry.  Then tries to remove that whiteout, if
+ * found.
+ *
+ * Return 0 if no whiteout was found, or if one was found and successfully
+ * removed (a zero tells the caller that @lower_dentry belongs to a good
+ * branch to create the new object in).  Return -ERRNO if an error occurred
+ * during whiteout lookup or in trying to unlink the whiteout.
+ */
+static int check_for_whiteout(struct dentry *dentry,
+ struct dentry *lower_dentry)
+{
+   int err = 0;
+   struct dentry *wh_dentry = NULL;
+   struct dentry *lower_dir_dentry;
+   char *name = NULL;
+
+   /*
+* check if whiteout exists in this branch, i.e. lookup .wh.foo
+* first.
+*/
+   name = alloc_whname(dentry-d_name.name, dentry-d_name.len);
+   if (unlikely(IS_ERR(name))) {
+   err = PTR_ERR(name);
+   goto out;
+   }
+
+   wh_dentry = lookup_one_len(name, lower_dentry-d_parent,
+  dentry-d_name.len + UNIONFS_WHLEN);
+   if (IS_ERR(wh_dentry)) {
+   err = PTR_ERR(wh_dentry);
+   wh_dentry = NULL;
+   goto out;
+   }
+
+   if (!wh_dentry-d_inode) /* no whiteout exists */
+   goto out;
+
+   /* .wh.foo has been found, so let's unlink it */
+   lower_dir_dentry = lock_parent_wh(wh_dentry);
+   /* see Documentation/filesystems/unionfs/issues.txt */
+   lockdep_off();
+   err = vfs_unlink(lower_dir_dentry-d_inode, wh_dentry);
+   lockdep_on();
+   unlock_dir(lower_dir_dentry);
+
+   /*
+* Whiteouts are special files and should be deleted no matter what
+* (as if they never existed), in order to allow this create
+* operation to succeed.  This is especially important in sticky
+* directories: a whiteout may have been created by one user, but
+* the newly created file may be created by another user.
+* Therefore, in order to maintain Unix semantics, if the vfs_unlink
+* above failed, then we have to try to directly unlink the
+* whiteout.  Note: in the ODF version of unionfs, whiteout are
+* handled much more cleanly.
+*/
+   if (err == -EPERM) {
+   struct inode *inode = lower_dir_dentry-d_inode;
+   err = inode-i_op-unlink(inode, wh_dentry);
+   }
+   if (err)
+   printk(KERN_ERR unionfs: could not 
+  unlink whiteout, err = %d\n, err);
+
+out:
+   dput(wh_dentry);
+   kfree(name);
+   return err;
+}
+
+/*
+ * Find a writeable branch to create new object in.  Checks all writeble
+ * branches of the parent inode, from istart to iend order; if none are
+ * suitable, also tries branch 0 (which may require a copyup).
+ *
+ * Return a lower_dentry we can use to create object in, or ERR_PTR.
+ */
+static struct dentry *find_writeable_branch(struct inode *parent,
+   struct dentry *dentry)
+{
+   int err = -EINVAL;
+   int bindex, istart, iend;
+   struct dentry *lower_dentry = NULL;
+
+   istart = ibstart(parent);
+   iend = ibend(parent);
+   if (istart  0)
+   goto out;
+
+begin:
+   for (bindex = istart; bindex = iend; bindex++) {
+   /* skip non-writeable branches */
+   err = is_robranch_super(dentry-d_sb, bindex);
+   if (err) {
+   err = -EROFS;
+   continue;
+   }
+   lower_dentry = unionfs_lower_dentry_idx(dentry, bindex);
+   if (!lower_dentry)
+   continue;
+   /*
+* check for whiteouts in writeable branch, and remove them
+* if necessary.
+*/
+   err = check_for_whiteout(dentry, lower_dentry);
+   if (err)
+   continue;
+   }
+   /*
+* If istart wasn't already branch 0, and we got any error, then try
+* branch 0 (which may require copyup)
+*/
+   if (err  istart  0) {
+   istart = iend = 0;
+   goto begin;
+   }
+
+   /*
+* If we tried even branch 0, 

Re: [RFC] ext3: per-process soft-syncing data=ordered mode

2008-01-25 Thread Al Boldi
[EMAIL PROTECTED] wrote:
 On Thu, 24 Jan 2008 23:36:00 +0300, Al Boldi said:
  This RFC proposes to introduce a tunable which allows to disable fsync
  and changes ordered into writeback writeout on a per-process basis like
  this:
:
:
 But if you want to give them enough rope to shoot themselves in the foot
 with, I'd suggest abusing LD_PRELOAD to replace the fsync() glibc code
 instead.  No need to clutter the kernel with rope that can be (and has
 been) done in userspace.

Ok that's possible, but as you cannot use LD_PRELOAD to deal with changing 
ordered into writeback mode, we might as well allow them to disable fsync 
here, because it is in the same use-case.


Thanks!

--
Al

-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/4] Unionfs: remove unnecessary call to d_iput

2008-01-25 Thread Erez Zadok
This old code was to fix a bug which has long since been fixed in our
copyup_permission and unionfs_d_iput.

Signed-off-by: Erez Zadok [EMAIL PROTECTED]
---
 fs/unionfs/copyup.c |   13 -
 1 files changed, 0 insertions(+), 13 deletions(-)

diff --git a/fs/unionfs/copyup.c b/fs/unionfs/copyup.c
index 16b2c7c..8663224 100644
--- a/fs/unionfs/copyup.c
+++ b/fs/unionfs/copyup.c
@@ -807,19 +807,6 @@ begin:
 lower_dentry);
unlock_dir(lower_parent_dentry);
if (err) {
-   struct inode *inode = lower_dentry-d_inode;
-   /*
-* If we get here, it means that we created a new
-* dentry+inode, but copying permissions failed.
-* Therefore, we should delete this inode and dput
-* the dentry so as not to leave cruft behind.
-*/
-   if (lower_dentry-d_op  lower_dentry-d_op-d_iput)
-   lower_dentry-d_op-d_iput(lower_dentry,
-  inode);
-   else
-   iput(inode);
-   lower_dentry-d_inode = NULL;
dput(lower_dentry);
lower_dentry = ERR_PTR(err);
goto out;
-- 
1.5.2.2

-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[GIT PULL -mm] 0/4 Unionfs updates/fixes/cleanups

2008-01-25 Thread Erez Zadok

The following is a series of patchsets related to Unionfs.  This is the
fifth set of patchsets resulting from an lkml review of the entire unionfs
code base, in preparation for a merge into mainline.  The most significant
changes here are a few locking related fixes, and a correction to broken
logic which didn't allow writing to the first available writable branch.

These patches were tested (where appropriate) on 2.6.24, MM, as well as the
backports to 2.6.{23,22,21,20,19,18,9} on ext2/3/4, xfs, reiserfs, nfs2/3/4,
jffs2, ramfs, tmpfs, cramfs, and squashfs (where available).  Also tested
with LTP-full and with a continuous parallel kernel compile (while forcing
cache flushing, manipulating lower branches, etc.).  See
http://unionfs.filesystems.org/ to download back-ported unionfs code.

Please pull from the 'master' branch of
git://git.kernel.org/pub/scm/linux/kernel/git/ezk/unionfs.git

to receive the following:

Erez Zadok (4):
  Unionfs: use first writable branch (fix/cleanup)
  Unionfs: remove unnecessary call to d_iput
  Unionfs: d_parent related locking fixes
  Unionfs: lock_rename related locking fixes

 copyup.c |   16 --
 inode.c  |  395 ---
 rename.c |   16 ++
 union.h  |4 
 4 files changed, 174 insertions(+), 257 deletions(-)

---
Erez Zadok
[EMAIL PROTECTED]
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 3/9] bfs: coding style cleanup in fs/bfs/bfs.h

2008-01-25 Thread Dmitri Vorobiev
Clean up errors found by checkpatch.pl.

Before the patch:

$ ./scripts/checkpatch.pl --file  fs/bfs/bfs.h | grep total
total: 2 errors, 0 warnings, 57 lines checked

After the patch:

$ ./scripts/checkpatch.pl --file  fs/bfs/bfs.h | grep total
total: 0 errors, 0 warnings, 57 lines checked

No functional changes introduced.

This patch was compile-tested by building the BFS driver both
as a module and as a part of the kernel proper.

Signed-off-by: Dmitri Vorobiev [EMAIL PROTECTED]
Acked-by: Tigran Aivazian [EMAIL PROTECTED]
---
 fs/bfs/bfs.h |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/bfs/bfs.h b/fs/bfs/bfs.h
index ac7a8b1..090b96e 100644
--- a/fs/bfs/bfs.h
+++ b/fs/bfs/bfs.h
@@ -16,8 +16,8 @@ struct bfs_sb_info {
unsigned long si_freei;
unsigned long si_lf_eblk;
unsigned long si_lasti;
-   unsigned long * si_imap;
-   struct buffer_head * si_sbh;/* buffer header w/superblock */
+   unsigned long *si_imap;
+   struct buffer_head *si_sbh; /* buffer header w/superblock */
 };
 
 /*
-- 
1.5.3

-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 6/9] bfs: coding style cleanup in fs/bfs/file.c

2008-01-25 Thread Dmitri Vorobiev
Clean up errors found by checkpatch.pl.

Before the patch:

$ ./scripts/checkpatch.pl --file  fs/bfs/file.c | grep total
total: 6 errors, 0 warnings, 191 lines checked

After the patch:

$ ./scripts/checkpatch.pl --file  fs/bfs/file.c | grep total
total: 0 errors, 0 warnings, 191 lines checked

No functional changes introduced.

This patch was compile-tested by building the BFS driver both
as a module and as a part of the kernel proper.

Signed-off-by: Dmitri Vorobiev [EMAIL PROTECTED]
Acked-by: Tigran Aivazian [EMAIL PROTECTED]
---
 fs/bfs/file.c |   12 ++--
 1 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/fs/bfs/file.c b/fs/bfs/file.c
index b11e63e..f32b2a2 100644
--- a/fs/bfs/file.c
+++ b/fs/bfs/file.c
@@ -55,7 +55,7 @@ static int bfs_move_blocks(struct super_block *sb, unsigned 
long start,
 
dprintf(%08lx-%08lx-%08lx\n, start, end, where);
for (i = start; i = end; i++)
-   if(bfs_move_block(i, where + i, sb)) {
+   if (bfs_move_block(i, where + i, sb)) {
dprintf(failed to move block %08lx - %08lx\n, i,
where + i);
return -EIO;
@@ -77,7 +77,7 @@ static int bfs_get_block(struct inode *inode, sector_t block,
if (!create) {
if (phys = bi-i_eblock) {
dprintf(c=%d, b=%08lx, phys=%09lx (granted)\n,
-create, (unsigned long)block, phys);
+   create, (unsigned long)block, phys);
map_bh(bh_result, sb, phys);
}
return 0;
@@ -88,7 +88,7 @@ static int bfs_get_block(struct inode *inode, sector_t block,
 * range of blocks allocated for this file, we can grant it.
 */
if (bi-i_sblock  (phys = bi-i_eblock)) {
-   dprintf(c=%d, b=%08lx, phys=%08lx (interim block granted)\n, 
+   dprintf(c=%d, b=%08lx, phys=%08lx (interim block granted)\n,
create, (unsigned long)block, phys);
map_bh(bh_result, sb, phys);
return 0;
@@ -107,7 +107,7 @@ static int bfs_get_block(struct inode *inode, sector_t 
block,
 * anywhere.
 */
if (bi-i_eblock == info-si_lf_eblk) {
-   dprintf(c=%d, b=%08lx, phys=%08lx (simple extension)\n, 
+   dprintf(c=%d, b=%08lx, phys=%08lx (simple extension)\n,
create, (unsigned long)block, phys);
map_bh(bh_result, sb, phys);
info-si_freeb -= phys - bi-i_eblock;
@@ -126,7 +126,7 @@ static int bfs_get_block(struct inode *inode, sector_t 
block,
}
 
if (bi-i_sblock) {
-   err = bfs_move_blocks(inode-i_sb, bi-i_sblock, 
+   err = bfs_move_blocks(inode-i_sb, bi-i_sblock,
bi-i_eblock, phys);
if (err) {
dprintf(failed to move ino=%08lx - fs corruption\n,
@@ -137,7 +137,7 @@ static int bfs_get_block(struct inode *inode, sector_t 
block,
err = 0;
 
dprintf(c=%d, b=%08lx, phys=%08lx (moved)\n,
-create, (unsigned long)block, phys);
+   create, (unsigned long)block, phys);
bi-i_sblock = phys;
phys += block;
info-si_lf_eblk = bi-i_eblock = phys;
-- 
1.5.3

-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 5/9] bfs: move function prototype to the proper header file

2008-01-25 Thread Dmitri Vorobiev
The dump_imap() routine is defined in bs/bfs/inode.c and used both in
the same file and in fs/bfs/dir.c. This patch adds an extern function
declaration to the private bfs.h header file.

The effect is that one warning issued by checkpatch.pl is gone.

Before the patch:

$ ./scripts/checkpatch.pl --file  fs/bfs/dir.c | grep total
total: 0 errors, 1 warnings, 370 lines checked

After the patch:

$ ./scripts/checkpatch.pl --file  fs/bfs/dir.c | grep total
total: 0 errors, 0 warnings, 368 lines checked

This patch was compile-tested by building the BFS driver both
as a module and as a part of the kernel proper.

Signed-off-by: Dmitri Vorobiev [EMAIL PROTECTED]
Acked-by: Tigran Aivazian [EMAIL PROTECTED]
---
 fs/bfs/bfs.h   |3 +++
 fs/bfs/dir.c   |2 --
 fs/bfs/inode.c |2 --
 3 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/fs/bfs/bfs.h b/fs/bfs/bfs.h
index 090b96e..ecc74bb 100644
--- a/fs/bfs/bfs.h
+++ b/fs/bfs/bfs.h
@@ -54,4 +54,7 @@ extern const struct address_space_operations bfs_aops;
 extern const struct inode_operations bfs_dir_inops;
 extern const struct file_operations bfs_dir_operations;
 
+/* inode.c */
+void dump_imap(const char *, struct super_block *);
+
 #endif /* _FS_BFS_BFS_H */
diff --git a/fs/bfs/dir.c b/fs/bfs/dir.c
index 5462a5b..2964505 100644
--- a/fs/bfs/dir.c
+++ b/fs/bfs/dir.c
@@ -81,8 +81,6 @@ const struct file_operations bfs_dir_operations = {
.fsync  = file_fsync,
 };
 
-extern void dump_imap(const char *, struct super_block *);
-
 static int bfs_create(struct inode *dir, struct dentry *dentry, int mode,
struct nameidata *nd)
 {
diff --git a/fs/bfs/inode.c b/fs/bfs/inode.c
index 5191990..91d5686 100644
--- a/fs/bfs/inode.c
+++ b/fs/bfs/inode.c
@@ -30,8 +30,6 @@ MODULE_LICENSE(GPL);
 #define dprintf(x...)
 #endif
 
-void dump_imap(const char *prefix, struct super_block *s);
-
 static void bfs_read_inode(struct inode *inode)
 {
unsigned long ino = inode-i_ino;
-- 
1.5.3

-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] ext3: per-process soft-syncing data=ordered mode

2008-01-25 Thread david

On Thu, 24 Jan 2008, Andreas Dilger wrote:


On Jan 24, 2008  23:36 +0300, Al Boldi wrote:

data=ordered mode has proven reliable over the years, and it does this by
ordering filedata flushes before metadata flushes.  But this sometimes
causes contention in the order of a 10x slowdown for certain apps, either
due to the misuse of fsync or due to inherent behaviour like db's, as well
as inherent starvation issues exposed by the data=ordered mode.

data=writeback mode alleviates data=order mode slowdowns, but only works
per-mount and is too dangerous to run as a default mode.

This RFC proposes to introduce a tunable which allows to disable fsync and
changes ordered into writeback writeout on a per-process basis like this:

  echo 1  /proc/`pidof process`/softsync


If fsync performance is an issue for you, run the filesystem in data=journal
mode, put the journal on a separate disk and make it big enough that you
don't block on it to flush the data to the filesystem (but not so big that
it is consuming all of your RAM).


my understanding is that the journal is limited to 128M or so. This 
prevents you from making it big enough to avoid all problems.


David Lang


That keeps your data guarantees without hurting performance.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] ext3: per-process soft-syncing data=ordered mode

2008-01-25 Thread Andreas Dilger
On Jan 24, 2008  23:36 +0300, Al Boldi wrote:
 data=ordered mode has proven reliable over the years, and it does this by 
 ordering filedata flushes before metadata flushes.  But this sometimes 
 causes contention in the order of a 10x slowdown for certain apps, either 
 due to the misuse of fsync or due to inherent behaviour like db's, as well 
 as inherent starvation issues exposed by the data=ordered mode.
 
 data=writeback mode alleviates data=order mode slowdowns, but only works 
 per-mount and is too dangerous to run as a default mode.
 
 This RFC proposes to introduce a tunable which allows to disable fsync and 
 changes ordered into writeback writeout on a per-process basis like this:
 
   echo 1  /proc/`pidof process`/softsync

If fsync performance is an issue for you, run the filesystem in data=journal
mode, put the journal on a separate disk and make it big enough that you
don't block on it to flush the data to the filesystem (but not so big that
it is consuming all of your RAM).

That keeps your data guarantees without hurting performance.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.

-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] Parallelize IO for e2fsck

2008-01-25 Thread Andreas Dilger
On Jan 24, 2008  17:25 -0700, Zan Lynx wrote:
 Have y'all been following the /dev/mem_notify patches?
 http://article.gmane.org/gmane.linux.kernel/628653

Having the notification be via poll() is a very restrictive processing
model.  Having the notification be via a signal means that any kind of
process (and not just those that are event loop driven) can register
a callback at some arbitrary point in the code and be notified.  I
don't object to the poll() interface, but it would be good to have a
signal mechanism also.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.

-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] ext3 freeze feature

2008-01-25 Thread Theodore Tso
On Fri, Jan 25, 2008 at 10:34:25AM -0600, Eric Sandeen wrote:
  But it was this concern which is why ext3 never exported freeze
  functionality to userspace, even though other commercial filesystems
  do support this.  It wasn't that it wasn't considered, but the concern
  about whether or not it was sufficiently safe to make available.
 
 What's the safety concern; that the admin will forget to unfreeze?

That the admin would manage to deadlock him/herself and wedge up the
whole system...

 I'm also not sure I see the point of the timeout in the original patch;
 either you are done snapshotting and ready to unfreeze, or you're not;
 1, or 2, or 3 seconds doesn't really matter.  When you're done, you're
 done, and you can only unfreeze then.  Shouldn't this be done
 programmatically, and not with some pre-determined timeout?

This is only a guess, but I suspect it was a fail-safe in case the
admin did manage to deadlock him/herself.  

I would think a better approach would be to make the filesystem
unfreeze if the file descriptor that was used to freeze the filesystem
is closed, and then have explicit deadlock detection that kills the
process doing the freeze, at which point the filesystem unlocks and
the system can recover.

- Ted
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] ext3 freeze feature

2008-01-25 Thread Eric Sandeen
Theodore Tso wrote:
 The other approach would be to say, oh well, the freeze ioctl is
 inherently dangerous, and root is allowed to himself in the foot, so
 who cares.  :-)

I tend to agree.  Either you need your fs frozen, or not, and if you do,
be prepared for the consequences.

 But it was this concern which is why ext3 never exported freeze
 functionality to userspace, even though other commercial filesystems
 do support this.  It wasn't that it wasn't considered, but the concern
 about whether or not it was sufficiently safe to make available.

What's the safety concern; that the admin will forget to unfreeze?

 And I do agree that we probably should just implement this in
 filesystem independent way, in which case all of the filesystems that
 support this already have super_operations functions
 write_super_lockfs() and unlockfs().

That's what I was thinking; can't the path to freeze_bdev just be
elevated out of dm-ioctl.c to fs/ioctl.c and exposed, such that any
filesystem which implements .write_super_lockfs can be frozen?  This is
essentially what the xfs_freeze userspace does via
xfs_ioctl/XFS_IOC_FREEZE - which, AFAIK, isn't used much now that the
lvm hooks are in place.

I'm also not sure I see the point of the timeout in the original patch;
either you are done snapshotting and ready to unfreeze, or you're not;
1, or 2, or 3 seconds doesn't really matter.  When you're done, you're
done, and you can only unfreeze then.  Shouldn't this be done
programmatically, and not with some pre-determined timeout?

-Eric
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 18/26] mount options: fix isofs

2008-01-25 Thread Jan Kara
 From: Miklos Szeredi [EMAIL PROTECTED]
 
 Add a .show_options super operation to isofs.
 
 Use generic_show_options() and save the complete option string in
 isofs_fill_super().
 
 Signed-off-by: Miklos Szeredi [EMAIL PROTECTED]
 ---
 
 Index: linux/fs/isofs/inode.c
 ===
 --- linux.orig/fs/isofs/inode.c   2008-01-17 19:00:55.0 +0100
 +++ linux/fs/isofs/inode.c2008-01-23 22:07:51.0 +0100
 @@ -110,6 +110,7 @@ static const struct super_operations iso
   .put_super  = isofs_put_super,
   .statfs = isofs_statfs,
   .remount_fs = isofs_remount,
 + .show_options   = generic_show_options,
  };
  
  
 @@ -554,6 +555,8 @@ static int isofs_fill_super(struct super
   int table, error = -EINVAL;
   unsigned int vol_desc_start;
  
 + save_mount_options(s, data);
 +
   sbi = kzalloc(sizeof(*sbi), GFP_KERNEL);
   if (!sbi)
   return -ENOMEM;
  Looks, fine.
  Acked-by: Jan Kara [EMAIL PROTECTED]


Honza
-- 
Jan Kara [EMAIL PROTECTED]
SuSE CR Labs
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] ext3 freeze feature

2008-01-25 Thread Theodore Tso
On Fri, Jan 25, 2008 at 03:18:51PM +0300, Dmitri Monakhov wrote:
 First of all Linux already have at least one open-source(dm-snap),
 and several commercial snapshot solutions. 

Yes, but it requires that the filesystem be stored under LVM.  Unlike
what EVMS v1 allowed us to do, we can't currently take a snapshot of a
bare block device.  This patch could potentially be useful for systems
which aren't using LVM, however

 You have to realize what delay between 1-3 stages have to be minimal.
 for example dm-snap perform it only for explicit journal flushing.
 From my experience if delay is more than 4-5 seconds whole system becomes
 unstable.

That's the problem.  You can't afford to freeze for very long.

What you *could* do is to start putting processes to sleep if they
attempt to write to the frozen filesystem, and then detect the
deadlock case where the process holding the file descriptor used to
freeze the filesystem gets frozen because it attempted to write to the
filesystem --- at which point it gets some kind of signal (which
defaults to killing the process), and the filesystem is unfrozen and
as part of the unfreeze you wake up all of the processes that were put
to sleep for touching the frozen filesystem.

The other approach would be to say, oh well, the freeze ioctl is
inherently dangerous, and root is allowed to himself in the foot, so
who cares.  :-)

But it was this concern which is why ext3 never exported freeze
functionality to userspace, even though other commercial filesystems
do support this.  It wasn't that it wasn't considered, but the concern
about whether or not it was sufficiently safe to make available.

And I do agree that we probably should just implement this in
filesystem independent way, in which case all of the filesystems that
support this already have super_operations functions
write_super_lockfs() and unlockfs().

So if this is done using a new system call, there should be no
filesystem-specific changes needed, and all filesystems which support
those super_operations method functions would be able to provide this
functionality to the new system call.

 - Ted

P.S.  Oh yeah, it should be noted that freezing at the filesystem
layer does *not* guarantee that changes to the block device aren't
happening via mmap()'ed files.  The LVM needs to freeze writes the
block device level if it wants to guarantee a completely stable
snapshot image.  So the proposed patch doens't quite give you those
guarantees, if that was the intended goal.
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] Parallelize IO for e2fsck

2008-01-25 Thread Bodo Eggert
On Fri, 25 Jan 2008, Bryan Henderson wrote:

  AIX basically did this with SIGDANGER (the signal is ignored by
  default), except there wasn't the ability for the process to tell the
  kernel at what level of memory pressure before it should start getting
  notified, and there was no way for the kernel to tell how bad the
  memory pressure actually was.  On the other hand, it was a relatively
  simple design.
 
 AIX does provide a system call to find out how much paging backing store 
 space is available and the thresholds set by the system administrator. 
 Running out of paging space is the only memory pressure AIX is concerned 
 about.  While I think having processes make memory usage decisions based 
 on that is a shoddy way to manage system resources, that's what it is 
 intended for.

If you start partitioning the system into virtual servers (or something
similar), being close to swapping may be somebody else's problem.
(They shouldn't have exceeded their guaranteed memory limit).


 Incidentally, some context for the AIX approach to the OOM problem: a 
 process may exclude itself from OOM vulnerability altogether.  It places 
 itself in early allocation mode, which means at the time it creates 
 virtual memory, it reserves enough backing store for the worst case.  The 
 memory manager does not send such a process the SIGDANGER signal or 
 terminate it when it runs out of paging space.  Before c. 2000, this was 
 the only mode.  Now the default is late allocation mode, which is similar 
 to Linux.

This is an interesting approach. It feels like some programs might be 
interested in choosing this mode instead of risking OOM. 
-- 
The programmer's National Anthem is ''
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 7/9] bfs: coding style cleanup in include/linux/bfs_fs.h

2008-01-25 Thread Dmitri Vorobiev
Clean up errors found by checkpatch.pl.

Before the patch:

$ ./scripts/checkpatch.pl --file  include/linux/bfs_fs.h | grep total
total: 5 errors, 3 warnings, 80 lines checked

After the patch:

$ ./scripts/checkpatch.pl --file  include/linux/bfs_fs.h | grep total
total: 0 errors, 0 warnings, 83 lines checked

No functional changes introduced.

This patch was compile-tested by building the BFS driver both
as a module and as a part of the kernel proper.

Signed-off-by: Dmitri Vorobiev [EMAIL PROTECTED]
Acked-by: Tigran Aivazian [EMAIL PROTECTED]
---
 include/linux/bfs_fs.h |   17 ++---
 1 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/include/linux/bfs_fs.h b/include/linux/bfs_fs.h
index 8ed6dfd..d7b11a6 100644
--- a/include/linux/bfs_fs.h
+++ b/include/linux/bfs_fs.h
@@ -36,7 +36,7 @@ struct bfs_inode {
__u32 i_padding[4];
 };
 
-#define BFS_NAMELEN14  
+#define BFS_NAMELEN14
 #define BFS_DIRENT_SIZE16
 #define BFS_DIRS_PER_BLOCK 32
 
@@ -61,20 +61,23 @@ struct bfs_super_block {
 
 
 #define BFS_OFF2INO(offset) \
-offset) - BFS_BSIZE) / sizeof(struct bfs_inode)) + BFS_ROOT_INO)
+   offset) - BFS_BSIZE) / sizeof(struct bfs_inode)) + BFS_ROOT_INO)
 
 #define BFS_INO2OFF(ino) \
((__u32)(((ino) - BFS_ROOT_INO) * sizeof(struct bfs_inode)) + BFS_BSIZE)
 #define BFS_NZFILESIZE(ip) \
-((le32_to_cpu((ip)-i_eoffset) + 1) -  le32_to_cpu((ip)-i_sblock) * 
BFS_BSIZE)
+   ((le32_to_cpu((ip)-i_eoffset) + 1) - \
+le32_to_cpu((ip)-i_sblock) * BFS_BSIZE)
 
 #define BFS_FILESIZE(ip) \
-((ip)-i_sblock == 0 ? 0 : BFS_NZFILESIZE(ip))
+   ((ip)-i_sblock == 0 ? 0 : BFS_NZFILESIZE(ip))
 
 #define BFS_FILEBLOCKS(ip) \
-((ip)-i_sblock == 0 ? 0 : (le32_to_cpu((ip)-i_eblock) + 1) -  
le32_to_cpu((ip)-i_sblock))
+   ((ip)-i_sblock == 0 ? 0 : (le32_to_cpu((ip)-i_eblock) + 1) - \
+le32_to_cpu((ip)-i_sblock))
 #define BFS_UNCLEAN(bfs_sb, sb)\
-   ((le32_to_cpu(bfs_sb-s_from) != -1)  (le32_to_cpu(bfs_sb-s_to) != 
-1)  !(sb-s_flags  MS_RDONLY))
-
+   ((le32_to_cpu(bfs_sb-s_from) != -1)  \
+   (le32_to_cpu(bfs_sb-s_to) != -1)  \
+   !(sb-s_flags  MS_RDONLY))
 
 #endif /* _LINUX_BFS_FS_H */
-- 
1.5.3

-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 1/9] bfs: remove a useless variable

2008-01-25 Thread Dmitri Vorobiev
In the bfs_fill_super() routine, the sblock variable is declared
and assigned a value. However, this value is never used. This trivial
patch removes this useless variable.

This was compile-tested by building the bfs driver both as a module
and as a part of the kernel proper. Both build finished successfully.

Run time test was performed using a BFS image and verifying that the
filesystem remained functional after the change.

Signed-off-by: Dmitri Vorobiev [EMAIL PROTECTED]
Acked-by: Tigran Aivazian [EMAIL PROTECTED]
---
 fs/bfs/inode.c |3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/fs/bfs/inode.c b/fs/bfs/inode.c
index a64a71d..2284657 100644
--- a/fs/bfs/inode.c
+++ b/fs/bfs/inode.c
@@ -368,7 +368,7 @@ static int bfs_fill_super(struct super_block *s, void 
*data, int silent)
struct bfs_inode *di;
int block = (i - BFS_ROOT_INO) / BFS_INODES_PER_BLOCK + 1;
int off = (i - BFS_ROOT_INO) % BFS_INODES_PER_BLOCK;
-   unsigned long sblock, eblock;
+   unsigned long eblock;
 
if (!off) {
brelse(bh);
@@ -387,7 +387,6 @@ static int bfs_fill_super(struct super_block *s, void 
*data, int silent)
set_bit(i, info-si_imap);
info-si_freeb -= BFS_FILEBLOCKS(di);
 
-   sblock =  le32_to_cpu(di-i_sblock);
eblock =  le32_to_cpu(di-i_eblock);
if (eblock  info-si_lf_eblk)
info-si_lf_eblk = eblock;
-- 
1.5.3

-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 25/26] mount options: fix udf

2008-01-25 Thread Cyrill Gorcunov
[Miklos Szeredi - Fri, Jan 25, 2008 at 10:29:21AM +0100]
|  | + /* is this correct? */
|  | + if (sbi-s_anchor[2] != 0)
|  | + seq_printf(seq, ,anchor=%u, sbi-s_anchor[2]);
|  
|  you know, I would prefer to use form UDF_SB_ANCHOR(sb)[2]
|  in sake of style unification but we should wait for Jan's
|  decision (i'm not the expert in this area ;)
| 
| I think UDF_SB_ANCHOR macro was removed by some patch in -mm.
| 
| I'm more interested if the second element of the s_anchor array really
| does always have the value of the 'anchor=N' mount option.  I haven't
| been able to verify that fully.  Do you have some insight into that?
| 
| Thanks,
| Miklos
| 

Hello Miklos,

well, actually - no. anchor entities can be set to 0 if we have been failed
to read them in udf_find_anchor(). So it seems you've to use some
additional flag to store it.

Btw, Miklos the patch is over -mm tree?

- Cyrill -
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] Parallelize IO for e2fsck

2008-01-25 Thread Bryan Henderson
 AIX basically did this with SIGDANGER (the signal is ignored by
 default), except there wasn't the ability for the process to tell the
 kernel at what level of memory pressure before it should start getting
 notified, and there was no way for the kernel to tell how bad the
 memory pressure actually was.  On the other hand, it was a relatively
 simple design.

AIX does provide a system call to find out how much paging backing store 
space is available and the thresholds set by the system administrator. 
Running out of paging space is the only memory pressure AIX is concerned 
about.  While I think having processes make memory usage decisions based 
on that is a shoddy way to manage system resources, that's what it is 
intended for.

Incidentally, some context for the AIX approach to the OOM problem: a 
process may exclude itself from OOM vulnerability altogether.  It places 
itself in early allocation mode, which means at the time it creates 
virtual memory, it reserves enough backing store for the worst case.  The 
memory manager does not send such a process the SIGDANGER signal or 
terminate it when it runs out of paging space.  Before c. 2000, this was 
the only mode.  Now the default is late allocation mode, which is similar 
to Linux.

--
Bryan Henderson IBM Almaden Research Center
San Jose CA Filesystems

-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 12/26] mount options: fix ext4

2008-01-25 Thread Mingming Cao
On Thu, 2008-01-24 at 20:33 +0100, Miklos Szeredi wrote:
 plain text document attachment (ext4_opts.patch)
 From: Miklos Szeredi [EMAIL PROTECTED]
 
 Add stripe= option to /proc/mounts for ext4 filesystems.
 
 Signed-off-by: Miklos Szeredi [EMAIL PROTECTED]
 ---
 
 Index: linux/fs/ext4/super.c
 ===
 --- linux.orig/fs/ext4/super.c2008-01-23 12:57:07.0 +0100
 +++ linux/fs/ext4/super.c 2008-01-23 21:43:51.0 +0100
 @@ -742,7 +742,8 @@ static int ext4_show_options(struct seq_
   seq_puts(seq, ,nomballoc);
   if (!test_opt(sb, DELALLOC))
   seq_puts(seq, ,nodelalloc);
 -
 + if (sbi-s_stripe)
 + seq_printf(seq, ,stripe=%lu, sbi-s_stripe);
 
   /*
* journal mode get enabled in different ways
 

Added to ext4 patch queue. Thanks!
http://repo.or.cz/w/ext4-patch-queue.git

Mingming


-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 11/26] mount options: fix ext2

2008-01-25 Thread Jan Kara
 From: Miklos Szeredi [EMAIL PROTECTED]
 
 Add noreservation option to /proc/mounts for ext2 filesystems.
 
 Signed-off-by: Miklos Szeredi [EMAIL PROTECTED]
 ---
 
 Index: linux/fs/ext2/super.c
 ===
 --- linux.orig/fs/ext2/super.c2008-01-17 19:00:55.0 +0100
 +++ linux/fs/ext2/super.c 2008-01-23 21:38:08.0 +0100
 @@ -285,6 +285,9 @@ static int ext2_show_options(struct seq_
   seq_puts(seq, ,xip);
  #endif
  
 + if (!test_opt(sb, RESERVATION))
 + seq_puts(seq, ,noreservation);
 +
   return 0;
  }
  Looks fine. Acked-by: Jan Kara [EMAIL PROTECTED]

Honza
-- 
Jan Kara [EMAIL PROTECTED]
SuSE CR Labs
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 12/26] mount options: fix ext4

2008-01-25 Thread Jan Kara
 From: Miklos Szeredi [EMAIL PROTECTED]
 
 Add stripe= option to /proc/mounts for ext4 filesystems.
 
 Signed-off-by: Miklos Szeredi [EMAIL PROTECTED]
 ---
 
 Index: linux/fs/ext4/super.c
 ===
 --- linux.orig/fs/ext4/super.c2008-01-23 12:57:07.0 +0100
 +++ linux/fs/ext4/super.c 2008-01-23 21:43:51.0 +0100
 @@ -742,7 +742,8 @@ static int ext4_show_options(struct seq_
   seq_puts(seq, ,nomballoc);
   if (!test_opt(sb, DELALLOC))
   seq_puts(seq, ,nodelalloc);
 -
 + if (sbi-s_stripe)
 + seq_printf(seq, ,stripe=%lu, sbi-s_stripe);
  I think this should go via ext4 patch queue... Besides that the patch
is fine. Mingming, will you pickup the patch?

Honza
-- 
Jan Kara [EMAIL PROTECTED]
SuSE CR Labs
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: lockdep warning with LTP dio test (v2.6.24-rc6-125-g5356f66)

2008-01-25 Thread Jan Kara
On Mon 14-01-08 13:14:54, Chris Mason wrote:
 On Mon, 14 Jan 2008 18:06:09 +0100
 Jan Kara [EMAIL PROTECTED] wrote:
  On Wed 02-01-08 12:42:19, Zach Brown wrote:
   Erez Zadok wrote:
Setting: ltp-full-20071031, dio01 test on ext3 with Linus's
latest tree. Kernel w/ SMP, preemption, and lockdep configured.
   
   This is a real lock ordering problem.  Thanks for reporting it.
   
   The updating of atime inside sys_mmap() orders the mmap_sem in the
   vfs outside of the journal handle in ext3's inode dirtying:
   
 
 [ lock inversion traces ]
  
   Two fixes come to mind:
   
   1) use something like Peter's -mmap_prepare() to update atime
   before acquiring the mmap_sem.
   ( http://lkml.org/lkml/2007/11/11/97 ).  I don't know if this would
   leave more paths which do a journal_start() while holding the
   mmap_sem.
   
   2) rework ext3's dio to only hold the jbd handle in
   ext3_get_block(). Chris has a patch for this kicking around
   somewhere but I'm told it has problems exposing old blocks in
   ordered data mode.
   
   Does anyone have preferences?  I could go either way.  I certainly
   don't like the idea of journal handles being held across the
   entirety of fs/direct-io.c.  It's yet another case of O_DIRECT
   differing wildly from the buffered path :(.
I've looked more into it and I think that 2) is the only way to go
  since transaction start ranks below page lock (standard buffered
  write path) and page lock ranks below mmap_sem. So we have at least
  one more dependency mmap_sem must go before transaction start...
 
 Just to clarify a little bit:
 
 If ext3's DIO code only touches transactions in get_block, then it can
 violate data=ordered rules.  Basically the transaction that allocates
 the blocks might commit before the DIO code gets around to writing them.
 
 A crash in the wrong place will expose stale data on disk.
  Hmm, I've looked at it and I don't think so - look at the rationale in
the patch below... That patch should fix the lock-inversion problem (at
least I see no lockdep warnings on my test machine).

Honza
-- 
Jan Kara [EMAIL PROTECTED]
SUSE Labs, CR
---

We cannot start transaction in ext3_direct_IO() and just let it last during the
whole write because dio_get_page() acquires mmap_sem which ranks above
transaction start (e.g. because we have dependency chain
mmap_sem-PageLock-journal_start, or because we update atime while holding
mmap_sem) and thus deadlocks could happen. We solve the problem by starting
a transaction separately for each ext3_get_block() call.

We *could* have a problem that we allocate a block and before its data are
written out the machine crashes and thus we expose stale data. But that
does not happen because for hole-filling generic code falls back to buffered
writes and for file extension, we add inode to orphan list and thus in
case of crash, journal replay will truncate inode back to the original size.

Signed-off-by: Jan Kara [EMAIL PROTECTED]

diff --git a/fs/ext3/inode.c b/fs/ext3/inode.c
index 9b162cd..5ab7c57 100644
--- a/fs/ext3/inode.c
+++ b/fs/ext3/inode.c
@@ -941,55 +941,45 @@ out:
return err;
 }
 
-#define DIO_CREDITS (EXT3_RESERVE_TRANS_BLOCKS + 32)
+/* Maximum number of blocks we map for direct IO at once. */
+#define DIO_MAX_BLOCKS 4096
+/*
+ * Number of credits we need for writing DIO_MAX_BLOCKS:
+ * We need sb + group descriptor + bitmap + inode - 4
+ * For B blocks with A block pointers per block we need:
+ * 1 (triple ind.) + (B/A/A + 2) (doubly ind.) + (B/A + 2) (indirect).
+ * If we plug in 4096 for B and 256 for A (for 1KB block size), we get 25.
+ */
+#define DIO_CREDITS 25
 
 static int ext3_get_block(struct inode *inode, sector_t iblock,
struct buffer_head *bh_result, int create)
 {
handle_t *handle = ext3_journal_current_handle();
-   int ret = 0;
+   int ret = 0, started = 0;
unsigned max_blocks = bh_result-b_size  inode-i_blkbits;
 
-   if (!create)
-   goto get_block; /* A read */
-
-   if (max_blocks == 1)
-   goto get_block; /* A single block get */
-
-   if (handle-h_transaction-t_state == T_LOCKED) {
-   /*
-* Huge direct-io writes can hold off commits for long
-* periods of time.  Let this commit run.
-*/
-   ext3_journal_stop(handle);
-   handle = ext3_journal_start(inode, DIO_CREDITS);
-   if (IS_ERR(handle))
+   if (create  !handle) {/* Direct IO write... */
+   if (max_blocks  DIO_MAX_BLOCKS)
+   max_blocks = DIO_MAX_BLOCKS;
+   handle = ext3_journal_start(inode, DIO_CREDITS +
+   2 * EXT3_QUOTA_TRANS_BLOCKS(sb));
+   if (IS_ERR(handle)) {
ret = PTR_ERR(handle);
-   goto get_block;
-   }
-
-   

Re: [patch 25/26] mount options: fix udf

2008-01-25 Thread Jan Kara
On Fri 25-01-08 16:50:15, Miklos Szeredi wrote:
| + /* is this correct? */
| + if (sbi-s_anchor[2] != 0)
| + seq_printf(seq, ,anchor=%u, sbi-s_anchor[2]);

you know, I would prefer to use form UDF_SB_ANCHOR(sb)[2]
in sake of style unification but we should wait for Jan's
decision (i'm not the expert in this area ;)
   
   I think UDF_SB_ANCHOR macro was removed by some patch in -mm.
Yes, it's going to be removed so don't use it. Actually, basing this
  patch on top of -mm is a good idea because there are quite some changes
  in Andrew's queue.
  
   I'm more interested if the second element of the s_anchor array really
   does always have the value of the 'anchor=N' mount option.  I haven't
   been able to verify that fully.  Do you have some insight into that?
As Cyrill wrote, it could be zeroed out in case there is no anchor in
  the specified block. So I guess you have to store the passed value
  somewhere else..
 
 But in that case, would the value of the anchor= option matter?
  No, it would not.

 This is actually a somewhat philosophical question about what the
 mount options in /proc/mounts mean:
 
  1) Options _given_ by the user for the mount
  2) Options which are _effective_ for the mount
 
 If we take interpretation 2) and there was no anchor (whatever that
 means), then the anchor=N option wasn't effective, and not giving it
 would have had the same effect.
 
 This could be confusing to the user, though...
  Hmm, given that options are modified by remount for some filesystems,
it's probably the best to display the effective state. So your code should
display the right thing as it is.

Honza
-- 
Jan Kara [EMAIL PROTECTED]
SUSE Labs, CR
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 25/26] mount options: fix udf

2008-01-25 Thread Miklos Szeredi
   | +   /* is this correct? */
   | +   if (sbi-s_anchor[2] != 0)
   | +   seq_printf(seq, ,anchor=%u, sbi-s_anchor[2]);
   
   you know, I would prefer to use form UDF_SB_ANCHOR(sb)[2]
   in sake of style unification but we should wait for Jan's
   decision (i'm not the expert in this area ;)
  
  I think UDF_SB_ANCHOR macro was removed by some patch in -mm.
   Yes, it's going to be removed so don't use it. Actually, basing this
 patch on top of -mm is a good idea because there are quite some changes
 in Andrew's queue.
 
  I'm more interested if the second element of the s_anchor array really
  does always have the value of the 'anchor=N' mount option.  I haven't
  been able to verify that fully.  Do you have some insight into that?
   As Cyrill wrote, it could be zeroed out in case there is no anchor in
 the specified block. So I guess you have to store the passed value
 somewhere else..

But in that case, would the value of the anchor= option matter?

This is actually a somewhat philosophical question about what the
mount options in /proc/mounts mean:

 1) Options _given_ by the user for the mount
 2) Options which are _effective_ for the mount

If we take interpretation 2) and there was no anchor (whatever that
means), then the anchor=N option wasn't effective, and not giving it
would have had the same effect.

This could be confusing to the user, though...

Thanks,
Miklos
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 25/26] mount options: fix udf

2008-01-25 Thread Jan Kara
 From: Miklos Szeredi [EMAIL PROTECTED]
 
 Add a .show_options super operation to udf.
 
 Signed-off-by: Miklos Szeredi [EMAIL PROTECTED]
 ---
 
 Index: linux/fs/udf/super.c
 ===
 --- linux.orig/fs/udf/super.c 2008-01-24 13:48:37.0 +0100
 +++ linux/fs/udf/super.c  2008-01-24 15:58:21.0 +0100
 @@ -53,6 +53,8 @@
  #include linux/vfs.h
  #include linux/vmalloc.h
  #include linux/errno.h
 +#include linux/mount.h
 +#include linux/seq_file.h
  #include asm/byteorder.h
  
  #include linux/udf_fs.h
 @@ -71,6 +73,8 @@
  #define VDS_POS_TERMINATING_DESC 6
  #define VDS_POS_LENGTH   7
  
 +#define UDF_DEFAULT_BLOCKSIZE 2048
 +
  static char error_buf[1024];
  
  /* These are the meat - everything else is stuffing */
 @@ -95,6 +99,7 @@ static void udf_open_lvid(struct super_b
  static void udf_close_lvid(struct super_block *);
  static unsigned int udf_count_free(struct super_block *);
  static int udf_statfs(struct dentry *, struct kstatfs *);
 +static int udf_show_options(struct seq_file *, struct vfsmount *);
  
  struct logicalVolIntegrityDescImpUse *udf_sb_lvidiu(struct udf_sb_info *sbi)
  {
 @@ -181,6 +186,7 @@ static const struct super_operations udf
   .write_super= udf_write_super,
   .statfs = udf_statfs,
   .remount_fs = udf_remount_fs,
 + .show_options   = udf_show_options,
  };
  
  struct udf_options {
 @@ -247,6 +253,56 @@ static int udf_sb_alloc_partition_maps(s
   return 0;
  }
  
 +static int udf_show_options(struct seq_file *seq, struct vfsmount *mnt)
 +{
 + struct super_block *sb = mnt-mnt_sb;
 + struct udf_sb_info *sbi = UDF_SB(sb);
 +
 + if (!UDF_QUERY_FLAG(sb, UDF_FLAG_STRICT))
 + seq_puts(seq, ,nostrict);
 + if (sb-s_blocksize != UDF_DEFAULT_BLOCKSIZE)
 + seq_printf(seq, ,bs=%lu, sb-s_blocksize);
 + if (UDF_QUERY_FLAG(sb, UDF_FLAG_UNHIDE))
 + seq_puts(seq, ,unhide);
 + if (UDF_QUERY_FLAG(sb, UDF_FLAG_UNDELETE))
 + seq_puts(seq, ,undelete);
 + if (!UDF_QUERY_FLAG(sb, UDF_FLAG_USE_AD_IN_ICB))
 + seq_puts(seq, ,noadinicb);
 + if (UDF_QUERY_FLAG(sb, UDF_FLAG_USE_SHORT_AD))
 + seq_puts(seq, ,shortad);
 + if (UDF_QUERY_FLAG(sb, UDF_FLAG_UID_FORGET))
 + seq_puts(seq, ,uid=forget);
 + if (UDF_QUERY_FLAG(sb, UDF_FLAG_UID_IGNORE))
 + seq_puts(seq, ,uid=ignore);
 + if (UDF_QUERY_FLAG(sb, UDF_FLAG_GID_FORGET))
 + seq_puts(seq, ,gid=forget);
 + if (UDF_QUERY_FLAG(sb, UDF_FLAG_GID_IGNORE))
 + seq_puts(seq, ,gid=ignore);
 + if (UDF_QUERY_FLAG(sb, UDF_FLAG_UID_SET))
 + seq_printf(seq, ,uid=%u, sbi-s_uid);
 + if (UDF_QUERY_FLAG(sb, UDF_FLAG_GID_SET))
 + seq_printf(seq, ,gid=%u, sbi-s_gid);
 + if (sbi-s_umask != 0)
 + seq_printf(seq, ,umask=%o, sbi-s_umask);
 + if (UDF_QUERY_FLAG(sb, UDF_FLAG_SESSION_SET))
 + seq_printf(seq, ,session=%u, sbi-s_session);
 + if (UDF_QUERY_FLAG(sb, UDF_FLAG_LASTBLOCK_SET))
 + seq_printf(seq, ,lastblock=%u, sbi-s_last_block);
 + /* is this correct? */
 + if (sbi-s_anchor[2] != 0)
 + seq_printf(seq, ,anchor=%u, sbi-s_anchor[2]);
 + /*
 +  * volume, partition, fileset and rootdir seem to be ignored
 +  * currently
 +  */
 + if (UDF_QUERY_FLAG(sb, UDF_FLAG_UTF8))
 + seq_puts(seq, ,utf8);
 + if (UDF_QUERY_FLAG(sb, UDF_FLAG_NLS_MAP)  sbi-s_nls_map)
 + seq_printf(seq, ,iocharset=%s, sbi-s_nls_map-charset);
 +
 + return 0;
 +}
 +
  /*
   * udf_parse_options
   *
 @@ -339,13 +395,14 @@ static match_table_t tokens = {
   {Opt_err,   NULL}
  };
  
 -static int udf_parse_options(char *options, struct udf_options *uopt)
 +static int udf_parse_options(char *options, struct udf_options *uopt,
 +  bool remount)
  {
  Please use just 'int' for 'remount' option. We are slowly trying to
get rid of these strange things in UDF code so adding new ones isn't
desirable.

   char *p;
   int option;
  
   uopt-novrs = 0;
 - uopt-blocksize = 2048;
 + uopt-blocksize = UDF_DEFAULT_BLOCKSIZE;
   uopt-partition = 0x;
   uopt-session = 0x;
   uopt-lastblock = 0;
 @@ -415,11 +472,15 @@ static int udf_parse_options(char *optio
   if (match_int(args, option))
   return 0;
   uopt-session = option;
 + if (!remount)
 + uopt-flags |= (1  UDF_FLAG_SESSION_SET);
   break;
   case Opt_lastblock:
   if (match_int(args, option))
   return 0;
   uopt-lastblock = option;
 + if (!remount)
 + uopt-flags |= (1  UDF_FLAG_LASTBLOCK_SET);

Re: [patch 25/26] mount options: fix udf

2008-01-25 Thread Jan Kara
  | + /* is this correct? */
  | + if (sbi-s_anchor[2] != 0)
  | + seq_printf(seq, ,anchor=%u, sbi-s_anchor[2]);
  
  you know, I would prefer to use form UDF_SB_ANCHOR(sb)[2]
  in sake of style unification but we should wait for Jan's
  decision (i'm not the expert in this area ;)
 
 I think UDF_SB_ANCHOR macro was removed by some patch in -mm.
  Yes, it's going to be removed so don't use it. Actually, basing this
patch on top of -mm is a good idea because there are quite some changes
in Andrew's queue.

 I'm more interested if the second element of the s_anchor array really
 does always have the value of the 'anchor=N' mount option.  I haven't
 been able to verify that fully.  Do you have some insight into that?
  As Cyrill wrote, it could be zeroed out in case there is no anchor in
the specified block. So I guess you have to store the passed value
somewhere else..

Honza
-- 
Jan Kara [EMAIL PROTECTED]
SuSE CR Labs
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] ext3: per-process soft-syncing data=ordered mode

2008-01-25 Thread Jan Kara
 Greetings!
 
 data=ordered mode has proven reliable over the years, and it does this by 
 ordering filedata flushes before metadata flushes.  But this sometimes 
 causes contention in the order of a 10x slowdown for certain apps, either 
 due to the misuse of fsync or due to inherent behaviour like db's, as well 
 as inherent starvation issues exposed by the data=ordered mode.
 
 data=writeback mode alleviates data=order mode slowdowns, but only works 
 per-mount and is too dangerous to run as a default mode.
 
 This RFC proposes to introduce a tunable which allows to disable fsync and 
 changes ordered into writeback writeout on a per-process basis like this:
 
   echo 1  /proc/`pidof process`/softsync
  I guess disabling fsync() was already commented on enough. Regarding
switching to writeback mode on per-process basis - not easily possible
because sometimes data is not written out by the process which stored
them (think of mmaped file). And in case of DB, they use direct-io
anyway most of the time so they don't care about journaling mode anyway.
  But as Diego wrote, there is definitely some room for improvement in
current data=ordered mode so the difference shouldn't be as big in the
end.

Honza
-- 
Jan Kara [EMAIL PROTECTED]
SuSE CR Labs
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 25/26] mount options: fix udf

2008-01-25 Thread Cyrill Gorcunov
[Miklos Szeredi - Fri, Jan 25, 2008 at 04:50:15PM +0100]
|| + /* is this correct? */
|| + if (sbi-s_anchor[2] != 0)
|| + seq_printf(seq, ,anchor=%u, sbi-s_anchor[2]);
|
|you know, I would prefer to use form UDF_SB_ANCHOR(sb)[2]
|in sake of style unification but we should wait for Jan's
|decision (i'm not the expert in this area ;)
|   
|   I think UDF_SB_ANCHOR macro was removed by some patch in -mm.
|Yes, it's going to be removed so don't use it. Actually, basing this
|  patch on top of -mm is a good idea because there are quite some changes
|  in Andrew's queue.
|  
|   I'm more interested if the second element of the s_anchor array really
|   does always have the value of the 'anchor=N' mount option.  I haven't
|   been able to verify that fully.  Do you have some insight into that?
|As Cyrill wrote, it could be zeroed out in case there is no anchor in
|  the specified block. So I guess you have to store the passed value
|  somewhere else..
| 
| But in that case, would the value of the anchor= option matter?
| 
| This is actually a somewhat philosophical question about what the
| mount options in /proc/mounts mean:
| 
|  1) Options _given_ by the user for the mount
|  2) Options which are _effective_ for the mount
| 
| If we take interpretation 2) and there was no anchor (whatever that
| means), then the anchor=N option wasn't effective, and not giving it
| would have had the same effect.
| 
| This could be confusing to the user, though...
| 
| Thanks,
| Miklos
| 

I think _effective_ options is much more important - they could
show you that something bad happened (and if this zeroing of anchor
has been happened udf print debug message) Anyway, Miklos, I think
the options _given_ by a user does not mean anything in that case
because it just doesn't reveal what is being used in _real_.

- Cyrill -
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 25/26] mount options: fix udf

2008-01-25 Thread Miklos Szeredi
 On Fri 25-01-08 16:50:15, Miklos Szeredi wrote:
 | +   /* is this correct? */
 | +   if (sbi-s_anchor[2] != 0)
 | +   seq_printf(seq, ,anchor=%u, sbi-s_anchor[2]);
 
 you know, I would prefer to use form UDF_SB_ANCHOR(sb)[2]
 in sake of style unification but we should wait for Jan's
 decision (i'm not the expert in this area ;)

I think UDF_SB_ANCHOR macro was removed by some patch in -mm.
 Yes, it's going to be removed so don't use it. Actually, basing this
   patch on top of -mm is a good idea because there are quite some changes
   in Andrew's queue.
   
I'm more interested if the second element of the s_anchor array really
does always have the value of the 'anchor=N' mount option.  I haven't
been able to verify that fully.  Do you have some insight into that?
 As Cyrill wrote, it could be zeroed out in case there is no anchor in
   the specified block. So I guess you have to store the passed value
   somewhere else..
  
  But in that case, would the value of the anchor= option matter?
   No, it would not.
 
  This is actually a somewhat philosophical question about what the
  mount options in /proc/mounts mean:
  
   1) Options _given_ by the user for the mount
   2) Options which are _effective_ for the mount
  
  If we take interpretation 2) and there was no anchor (whatever that
  means), then the anchor=N option wasn't effective, and not giving it
  would have had the same effect.
  
  This could be confusing to the user, though...
   Hmm, given that options are modified by remount for some filesystems,
 it's probably the best to display the effective state. So your code should
 display the right thing as it is.

OK.  Cyrill, Jan, thanks for the reviews.

Miklos
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 21/26] mount options: partially fix nfs

2008-01-25 Thread Chuck Lever

On Jan 25, 2008, at 4:39 AM, Miklos Szeredi wrote:

Miklos Szeredi wrote:

From: Miklos Szeredi [EMAIL PROTECTED]

Add posix, bsize=, namelen= options to /proc/mounts for nfs
filesystems.

Document several other options that are still missing.


NFS lists only some options in /proc/mounts on purpose: only the
essential options are mentioned there to keep clutter down.  The  
three
you've added here are for all intents and purposes deprecated,  
which is

why they are not supported.

NFS lists a more complete set of mount options for a mount point in
/proc/self/mountstats.  See nfs_show_stats().

Since your cover letter does not explain why you are changing this  
code,

can you refer me to a description of why you are doing this?


Descritption is in the 01/26 patch.


More below.


Signed-off-by: Miklos Szeredi [EMAIL PROTECTED]
---

Index: linux/fs/nfs/super.c
===
--- linux.orig/fs/nfs/super.c   2008-01-19 11:56:34.0 +0100
+++ linux/fs/nfs/super.c2008-01-21 20:41:30.0 +0100
@@ -449,6 +449,7 @@ static void nfs_show_mount_options(struc
} nfs_info[] = {
{ NFS_MOUNT_SOFT, ,soft, ,hard },
{ NFS_MOUNT_INTR, ,intr, ,nointr },
+   { NFS_MOUNT_POSIX, ,posix,  },
{ NFS_MOUNT_NOCTO, ,nocto,  },
{ NFS_MOUNT_NOAC, ,noac,  },
{ NFS_MOUNT_NONLM, ,nolock,  },
@@ -459,10 +460,17 @@ static void nfs_show_mount_options(struc
};
const struct proc_nfs_info *nfs_infop;
struct nfs_client *clp = nfss-nfs_client;
+   unsigned int default_namelen =
+   clp-rpc_ops-version == 4 ? NFS4_MAXNAMLEN :
+   clp-rpc_ops-version == 3 ? NFS3_MAXNAMLEN : NFS2_MAXNAMLEN;

seq_printf(m, ,vers=%d, clp-rpc_ops-version);
seq_printf(m, ,rsize=%d, nfss-rsize);
seq_printf(m, ,wsize=%d, nfss-wsize);
+   if (nfss-bsize != 0)
+   seq_printf(m, ,bsize=%d, nfss-bsize);
+   if (nfss-namelen != default_namelen)
+   seq_printf(m, ,namelen=%d, nfss-namelen);
if (nfss-acregmin != 3*HZ || showdefaults)
seq_printf(m, ,acregmin=%d, nfss-acregmin/HZ);
if (nfss-acregmax != 60*HZ || showdefaults)
@@ -482,6 +490,18 @@ static void nfs_show_mount_options(struc
 	seq_printf(m, ,timeo=%lu, 10U * nfss-client-cl_timeout- 
to_initval / HZ);
 	seq_printf(m, ,retrans=%u, nfss-client-cl_timeout- 
to_retries);
 	seq_printf(m, ,sec=%s, nfs_pseudoflavour_to_name(nfss-client- 
cl_auth-au_flavor));

+
+   /*
+* Missing options:
+* port=


Probably should be supported.


+* addr=


This one is already supported; see nfs_show_options().


Right, thanks.




+* clientaddr=


This one isn't, and should be... would be useful for tracking down
certain NFSv4 problems.


+* mounthost=
+* mountaddr=
+* mountport=
+* mountvers=
+* mountproto=


And these mount* options are for the kernel's new mount protocol  
client.

  They aren't really useful for understanding steady-state NFS client
behavior, they only effect mount-time behavior.


All mount options should be shown, which are needed to reconstruct a
previous mount.


Ah, OK.

I'm happy to implement logic to display the all missing options.  I  
should have updated nfs_show_mount_options() when I wrote the NFS  
mount option parser.


Let me know your preference.


For example, if you copy options out from /proc/mount, umount the
filesystem, and then create a new mount with the copied options, you
should get the same mount.


For NFS, umount also needs to read some of the options in order to  
determine how mountd is to connect to the server for the unmount.   
(That's why we have addr= in the first place).


--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 12/26] mount options: fix ext4

2008-01-25 Thread Andreas Dilger
On Jan 24, 2008  20:33 +0100, Miklos Szeredi wrote:
 Add stripe= option to /proc/mounts for ext4 filesystems.
 
 Signed-off-by: Miklos Szeredi [EMAIL PROTECTED]

Acked-by: Andreas Dilger [EMAIL PROTECTED]

 Index: linux/fs/ext4/super.c
 ===
 --- linux.orig/fs/ext4/super.c2008-01-23 12:57:07.0 +0100
 +++ linux/fs/ext4/super.c 2008-01-23 21:43:51.0 +0100
 @@ -742,7 +742,8 @@ static int ext4_show_options(struct seq_
   seq_puts(seq, ,nomballoc);
   if (!test_opt(sb, DELALLOC))
   seq_puts(seq, ,nodelalloc);
 -
 + if (sbi-s_stripe)
 + seq_printf(seq, ,stripe=%lu, sbi-s_stripe);
  
   /*
* journal mode get enabled in different ways

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.

-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 8/9] bfs: remove multiple assignments

2008-01-25 Thread Dmitri Vorobiev
The checkpatch.pl reported several warnings about multiple variable
assignments in the BFS driver sources. This trivial patch fixes these
warnings by giving each assignment its own line.

No functional changes introduced.

This patch was compile-tested by building the BFS driver both
as a module and as a part of the kernel proper.

Signed-off-by: Dmitri Vorobiev [EMAIL PROTECTED]
Acked-by: Tigran Aivazian [EMAIL PROTECTED]
---
 fs/bfs/dir.c   |   13 +
 fs/bfs/file.c  |6 --
 fs/bfs/inode.c |7 +--
 3 files changed, 18 insertions(+), 8 deletions(-)

diff --git a/fs/bfs/dir.c b/fs/bfs/dir.c
index 2964505..94fed7a 100644
--- a/fs/bfs/dir.c
+++ b/fs/bfs/dir.c
@@ -104,7 +104,9 @@ static int bfs_create(struct inode *dir, struct dentry 
*dentry, int mode,
info-si_freei--;
inode-i_uid = current-fsuid;
inode-i_gid = (dir-i_mode  S_ISGID) ? dir-i_gid : current-fsgid;
-   inode-i_mtime = inode-i_atime = inode-i_ctime = CURRENT_TIME_SEC;
+   inode-i_mtime = CURRENT_TIME_SEC;
+   inode-i_atime = CURRENT_TIME_SEC;
+   inode-i_ctime = CURRENT_TIME_SEC;
inode-i_blocks = 0;
inode-i_op = bfs_file_inops;
inode-i_fop = bfs_file_operations;
@@ -200,7 +202,8 @@ static int bfs_unlink(struct inode *dir, struct dentry 
*dentry)
}
de-ino = 0;
mark_buffer_dirty(bh);
-   dir-i_ctime = dir-i_mtime = CURRENT_TIME_SEC;
+   dir-i_ctime = CURRENT_TIME_SEC;
+   dir-i_mtime = CURRENT_TIME_SEC;
mark_inode_dirty(dir);
inode-i_ctime = dir-i_ctime;
inode_dec_link_count(inode);
@@ -220,7 +223,8 @@ static int bfs_rename(struct inode *old_dir, struct dentry 
*old_dentry,
struct bfs_dirent *old_de, *new_de;
int error = -ENOENT;
 
-   old_bh = new_bh = NULL;
+   old_bh = NULL;
+   new_bh = NULL;
old_inode = old_dentry-d_inode;
if (S_ISDIR(old_inode-i_mode))
return -EINVAL;
@@ -252,7 +256,8 @@ static int bfs_rename(struct inode *old_dir, struct dentry 
*old_dentry,
goto end_rename;
}
old_de-ino = 0;
-   old_dir-i_ctime = old_dir-i_mtime = CURRENT_TIME_SEC;
+   old_dir-i_ctime = CURRENT_TIME_SEC;
+   old_dir-i_mtime = CURRENT_TIME_SEC;
mark_inode_dirty(old_dir);
if (new_inode) {
new_inode-i_ctime = CURRENT_TIME_SEC;
diff --git a/fs/bfs/file.c b/fs/bfs/file.c
index f32b2a2..ab2fa66 100644
--- a/fs/bfs/file.c
+++ b/fs/bfs/file.c
@@ -111,7 +111,8 @@ static int bfs_get_block(struct inode *inode, sector_t 
block,
create, (unsigned long)block, phys);
map_bh(bh_result, sb, phys);
info-si_freeb -= phys - bi-i_eblock;
-   info-si_lf_eblk = bi-i_eblock = phys;
+   info-si_lf_eblk = phys;
+   bi-i_eblock = phys;
mark_inode_dirty(inode);
mark_buffer_dirty(sbh);
err = 0;
@@ -140,7 +141,8 @@ static int bfs_get_block(struct inode *inode, sector_t 
block,
create, (unsigned long)block, phys);
bi-i_sblock = phys;
phys += block;
-   info-si_lf_eblk = bi-i_eblock = phys;
+   info-si_lf_eblk = phys;
+   bi-i_eblock = phys;
 
/*
 * This assumes nothing can write the inode back while we are here
diff --git a/fs/bfs/inode.c b/fs/bfs/inode.c
index 91d5686..7eefafb 100644
--- a/fs/bfs/inode.c
+++ b/fs/bfs/inode.c
@@ -157,7 +157,9 @@ static void bfs_delete_inode(struct inode *inode)
}
 
inode-i_size = 0;
-   inode-i_atime = inode-i_mtime = inode-i_ctime = CURRENT_TIME_SEC;
+   inode-i_atime = CURRENT_TIME_SEC;
+   inode-i_mtime = CURRENT_TIME_SEC;
+   inode-i_ctime = CURRENT_TIME_SEC;
lock_kernel();
mark_inode_dirty(inode);
 
@@ -213,7 +215,8 @@ static int bfs_statfs(struct dentry *dentry, struct kstatfs 
*buf)
buf-f_type = BFS_MAGIC;
buf-f_bsize = s-s_blocksize;
buf-f_blocks = info-si_blocks;
-   buf-f_bfree = buf-f_bavail = info-si_freeb;
+   buf-f_bfree = info-si_freeb;
+   buf-f_bavail = info-si_freeb;
buf-f_files = info-si_lasti + 1 - BFS_ROOT_INO;
buf-f_ffree = info-si_freei;
buf-f_fsid.val[0] = (u32)id;
-- 
1.5.3

-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 9/9] bfs: use the proper header file for inclusion

2008-01-25 Thread Dmitri Vorobiev
The checkpatch.pl reported the following warning:

$ ./scripts/checkpatch.pl --strict --file fs/bfs/inode.c
CHECK: Use #include linux/uaccess.h instead of asm/uaccess.h
+#include asm/uaccess.h

This patch fixes this warning.

No functional changes introduced.

This patch was compile-tested by building the BFS driver both
as a module and as a part of the kernel proper.

Signed-off-by: Dmitri Vorobiev [EMAIL PROTECTED]
Acked-by: Tigran Aivazian [EMAIL PROTECTED]
---
 fs/bfs/inode.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/bfs/inode.c b/fs/bfs/inode.c
index 7eefafb..00c54fa 100644
--- a/fs/bfs/inode.c
+++ b/fs/bfs/inode.c
@@ -15,7 +15,7 @@
 #include linux/smp_lock.h
 #include linux/buffer_head.h
 #include linux/vfs.h
-#include asm/uaccess.h
+#include linux/uaccess.h
 #include bfs.h
 
 MODULE_AUTHOR(Tigran Aivazian [EMAIL PROTECTED]);
-- 
1.5.3

-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 2/9] bfs: coding style cleanup in fs/bfs/inode.c

2008-01-25 Thread Dmitri Vorobiev
This patch cleans up errors found by checkpatch.pl.

Before the patch:

$ ./scripts/checkpatch.pl --file fs/bfs/inode.c  | grep total
total: 11 errors, 0 warnings, 445 lines checked

After the patch:

$ ./scripts/checkpatch.pl --file fs/bfs/inode.c  | grep total
total: 0 errors, 0 warnings, 446 lines checked

No functional changes introduced.

This patch was compile-tested by building the BFS driver both
as a module and as a part of the kernel proper.

Signed-off-by: Dmitri Vorobiev [EMAIL PROTECTED]
Acked-by: Tigran Aivazian [EMAIL PROTECTED]
---
 fs/bfs/inode.c |   23 ---
 1 files changed, 12 insertions(+), 11 deletions(-)

diff --git a/fs/bfs/inode.c b/fs/bfs/inode.c
index 2284657..5191990 100644
--- a/fs/bfs/inode.c
+++ b/fs/bfs/inode.c
@@ -90,12 +90,12 @@ static void bfs_read_inode(struct inode *inode)
 static int bfs_write_inode(struct inode *inode, int unused)
 {
unsigned int ino = (u16)inode-i_ino;
-unsigned long i_sblock;
+   unsigned long i_sblock;
struct bfs_inode *di;
struct buffer_head *bh;
int block, off;
 
-dprintf(ino=%08x\n, ino);
+   dprintf(ino=%08x\n, ino);
 
if ((ino  BFS_ROOT_INO) || (ino  BFS_SB(inode-i_sb)-si_lasti)) {
printf(Bad inode number %s:%08x\n, inode-i_sb-s_id, ino);
@@ -128,7 +128,7 @@ static int bfs_write_inode(struct inode *inode, int unused)
di-i_atime = cpu_to_le32(inode-i_atime.tv_sec);
di-i_mtime = cpu_to_le32(inode-i_mtime.tv_sec);
di-i_ctime = cpu_to_le32(inode-i_ctime.tv_sec);
-i_sblock = BFS_I(inode)-i_sblock;
+   i_sblock = BFS_I(inode)-i_sblock;
di-i_sblock = cpu_to_le32(i_sblock);
di-i_eblock = cpu_to_le32(BFS_I(inode)-i_eblock);
di-i_eoffset = cpu_to_le32(i_sblock * BFS_BSIZE + inode-i_size - 1);
@@ -157,7 +157,7 @@ static void bfs_delete_inode(struct inode *inode)
printf(invalid ino=%08lx\n, ino);
return;
}
-   
+
inode-i_size = 0;
inode-i_atime = inode-i_mtime = inode-i_ctime = CURRENT_TIME_SEC;
lock_kernel();
@@ -177,13 +177,13 @@ static void bfs_delete_inode(struct inode *inode)
mark_buffer_dirty(bh);
brelse(bh);
 
-if (bi-i_dsk_ino) {
+   if (bi-i_dsk_ino) {
if (bi-i_sblock)
info-si_freeb += bi-i_eblock + 1 - bi-i_sblock;
info-si_freei++;
clear_bit(ino, info-si_imap);
dump_imap(delete_inode, s);
-}
+   }
 
/*
 * If this was the last file, make the previous block
@@ -293,7 +293,8 @@ void dump_imap(const char *prefix, struct super_block *s)
if (!tmpbuf)
return;
for (i = BFS_SB(s)-si_lasti; i = 0; i--) {
-   if (i  PAGE_SIZE - 100) break;
+   if (i  PAGE_SIZE - 100)
+   break;
if (test_bit(i, BFS_SB(s)-si_imap))
strcat(tmpbuf, 1);
else
@@ -321,12 +322,12 @@ static int bfs_fill_super(struct super_block *s, void 
*data, int silent)
sb_set_blocksize(s, BFS_BSIZE);
 
bh = sb_bread(s, 0);
-   if(!bh)
+   if (!bh)
goto out;
bfs_sb = (struct bfs_super_block *)bh-b_data;
if (le32_to_cpu(bfs_sb-s_magic) != BFS_MAGIC) {
if (!silent)
-   printf(No BFS filesystem on %s (magic=%08x)\n, 
+   printf(No BFS filesystem on %s (magic=%08x)\n,
s-s_id,  le32_to_cpu(bfs_sb-s_magic));
goto out;
}
@@ -395,7 +396,7 @@ static int bfs_fill_super(struct super_block *s, void 
*data, int silent)
if (!(s-s_flags  MS_RDONLY)) {
mark_buffer_dirty(info-si_sbh);
s-s_dirt = 1;
-   } 
+   }
dump_imap(read_super, s);
return 0;
 
@@ -425,7 +426,7 @@ static int __init init_bfs_fs(void)
int err = init_inodecache();
if (err)
goto out1;
-err = register_filesystem(bfs_fs_type);
+   err = register_filesystem(bfs_fs_type);
if (err)
goto out;
return 0;
-- 
1.5.3

-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html