Re: [EXT4 set 3][PATCH 1/1] ext4 nanosecond timestamp

2007-07-13 Thread Kalpak Shah
On Fri, 2007-07-13 at 09:59 +0530, Aneesh Kumar K.V wrote:
 
> Kalpak Shah wrote:
> > On Tue, 2007-07-10 at 16:30 -0700, Andrew Morton wrote:
> > > On Sun, 01 Jul 2007 03:36:56 -0400
> > > Mingming Cao [EMAIL PROTECTED] wrote:
> > >
> > > > This patch is a spinoff of the old nanosecond patches.
> > >
> > > I don't know what the old nanosecond patches are.  A link to a suitable
> > > changelog for those patches would do in a pinch.  Preferable would be to
> > > write a proper changelog for this patch.
> >
> > The incremental patch contains a proper changelog describing the patch.
>
> Instead of putting incremental patches it would be nice if we can have
> replacement patches for the already existing patches with the comments
> addressed. For example, if we have a review comment on the patch message
> (commit log), then adding an incremental patch doesn't help.

I think that it would be easier to review just the changes that have
been made to the patches instead of having people go through the entire
patch again. I was hoping that someone with write access to ext4-git
would update the commit logs.

If replacement patches are preferred, then I will send them again.

Thanks,
Kalpak.

 
 
 -aneesh
 -
 To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
 the body of a message to [EMAIL PROTECTED]
 More majordomo info at  http://vger.kernel.org/majordomo-info.html



Re: [PATCH 2/6][TAKE7] fallocate() implementation in i386, x86_64 and powerpc

2007-07-13 Thread Christoph Hellwig
On Fri, Jul 13, 2007 at 06:17:55PM +0530, Amit K. Arora wrote:
>  /*
> + * sys_fallocate - preallocate blocks or free preallocated blocks
> + * @fd: the file descriptor
> + * @mode: mode specifies the behavior of allocation.
> + * @offset: The offset within file, from where allocation is being
> + *          requested. It should not have a negative value.
> + * @len: The amount of space in bytes to be allocated, from the offset.
> + *       This can not be zero or a negative value.

kerneldoc comments are for in-kernel APIs, which syscalls aren't.  I'd say
just remove this comment; the manpage is much better documentation anyway.

> + * TBD Generic fallocate to be added for file systems that do not
> + *     support fallocate.

Please remove the comment; adding a generic fallback in kernelspace is a
very dumb idea, as we already discussed a long time ago.

> --- linux-2.6.22.orig/include/linux/fs.h
> +++ linux-2.6.22/include/linux/fs.h
> @@ -266,6 +266,21 @@ extern int dir_notify_enable;
>  #define SYNC_FILE_RANGE_WRITE          2
>  #define SYNC_FILE_RANGE_WAIT_AFTER     4
>
> +/*
> + * sys_fallocate modes
> + * Currently sys_fallocate supports two modes:
> + * FALLOC_ALLOCATE :   This is the preallocate mode, using which an application
> + *                     may request reservation of space for a particular file.
> + *                     The file size will be changed if the allocation is
> + *                     beyond EOF.
> + * FALLOC_RESV_SPACE : This is same as the above mode, with only one difference
> + *                     that the file size will not be modified.
> + */
> +#define FALLOC_FL_KEEP_SIZE    0x01 /* default is extend/shrink size */
> +
> +#define FALLOC_ALLOCATE        0
> +#define FALLOC_RESV_SPACE      FALLOC_FL_KEEP_SIZE

Just remove FALLOC_ALLOCATE; 0 flags should be the default.  I'm also
not sure there is any point in having two namespaces now that we have a
flags-based ABI.

Also please don't add this to fs.h.  fs.h is a complete mess and the
falloc flags are a new user ABI.  Add a linux/falloc.h instead which can
be added to headers-y so the ABI constant can be exported to userspace.
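
For illustration only, a minimal sketch of what such a header might contain, assuming the single flag from the quoted patch is kept. The include guard name and the helper falloc_mode_is_valid() are hypothetical and are not part of the posted patch.

```c
/* Sketch of a possible include/linux/falloc.h (hypothetical contents). */
#ifndef _FALLOC_H_
#define _FALLOC_H_

/* A mode of 0 is the default: allocate and extend the file size if needed. */
#define FALLOC_FL_KEEP_SIZE	0x01	/* allocate, but leave the file size alone */

/* Hypothetical helper: nonzero if mode contains only known flag bits. */
static inline int falloc_mode_is_valid(int mode)
{
	return (mode & ~FALLOC_FL_KEEP_SIZE) == 0;
}

#endif /* _FALLOC_H_ */
```

With a flags-based ABI like this, a caller passes 0 for the default behaviour and ORs in FALLOC_FL_KEEP_SIZE when the file size must not change, so no separate FALLOC_ALLOCATE name is needed.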



[PATCH 6/6][TAKE7] ext4: change for better extent-to-group alignment

2007-07-13 Thread Amit K. Arora
From: Amit Arora [EMAIL PROTECTED]

Change on-disk format for extent to represent uninitialized/initialized extents

This change was suggested by Andreas Dilger. 
This patch changes the EXT_MAX_LEN value and extent code which marks/checks
uninitialized extents. With this change it will be possible to have
initialized extents with 2^15 blocks (earlier the max blocks we could have
was 2^15 - 1). This way we can have better extent-to-block alignment.
Now, maximum number of blocks we can have in an initialized extent is 2^15
and in an uninitialized extent is 2^15 - 1.

This patch takes care of Andreas's suggestion of using EXT_INIT_MAX_LEN
instead of 0x8000 at some places.
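
The encoding described above can be sketched in userspace C as follows. This is a simplified model, assuming host-endian 16-bit values (the on-disk field is little-endian and the kernel code converts it); the function names are made up for the example and are not the helpers used in the patch.

```c
#include <assert.h>
#include <stdint.h>

#define EXT_INIT_MAX_LEN   (1U << 15)              /* 32768 */
#define EXT_UNINIT_MAX_LEN (EXT_INIT_MAX_LEN - 1)  /* 32767 */

/* Raw ee_len values above 0x8000 denote an uninitialized extent. */
static int ee_len_is_uninitialized(uint16_t raw)
{
	return raw > EXT_INIT_MAX_LEN;
}

/* Number of blocks the extent covers, with the marker bit removed. */
static unsigned ee_len_actual(uint16_t raw)
{
	return raw <= EXT_INIT_MAX_LEN ? raw : raw - EXT_INIT_MAX_LEN;
}

/* Encode a block count; uninitialized extents get the MSB set. */
static uint16_t ee_len_encode(unsigned blocks, int uninitialized)
{
	assert(blocks >= 1);
	assert(blocks <= (uninitialized ? EXT_UNINIT_MAX_LEN : EXT_INIT_MAX_LEN));
	return (uint16_t)(uninitialized ? (blocks | EXT_INIT_MAX_LEN) : blocks);
}
```

The raw value 0x8000 is the special case: it would otherwise mean "uninitialized, zero blocks", which cannot exist, so it is read as an initialized extent of 32768 blocks.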

Signed-off-by: Amit Arora [EMAIL PROTECTED]

Index: linux-2.6.22/fs/ext4/extents.c
===
--- linux-2.6.22.orig/fs/ext4/extents.c
+++ linux-2.6.22/fs/ext4/extents.c
@@ -1106,7 +1106,7 @@ static int
 ext4_can_extents_be_merged(struct inode *inode, struct ext4_extent *ex1,
struct ext4_extent *ex2)
 {
-   unsigned short ext1_ee_len, ext2_ee_len;
+   unsigned short ext1_ee_len, ext2_ee_len, max_len;
 
/*
 * Make sure that either both extents are uninitialized, or
@@ -1115,6 +1115,11 @@ ext4_can_extents_be_merged(struct inode 
if (ext4_ext_is_uninitialized(ex1) ^ ext4_ext_is_uninitialized(ex2))
return 0;
 
+   if (ext4_ext_is_uninitialized(ex1))
+   max_len = EXT_UNINIT_MAX_LEN;
+   else
+   max_len = EXT_INIT_MAX_LEN;
+
ext1_ee_len = ext4_ext_get_actual_len(ex1);
ext2_ee_len = ext4_ext_get_actual_len(ex2);
 
@@ -1127,7 +1132,7 @@ ext4_can_extents_be_merged(struct inode 
 * as an RO_COMPAT feature, refuse to merge to extents if
 * this can result in the top bit of ee_len being set.
 */
-   if (ext1_ee_len + ext2_ee_len > EXT_MAX_LEN)
+   if (ext1_ee_len + ext2_ee_len > max_len)
return 0;
 #ifdef AGGRESSIVE_TEST
if (le16_to_cpu(ex1->ee_len) >= 4)
@@ -1814,7 +1819,11 @@ ext4_ext_rm_leaf(handle_t *handle, struc
 
ex->ee_block = cpu_to_le32(block);
ex->ee_len = cpu_to_le16(num);
-   if (uninitialized)
+   /*
+* Do not mark uninitialized if all the blocks in the
+* extent have been removed.
+*/
+   if (uninitialized && num)
ext4_ext_mark_uninitialized(ex);
 
err = ext4_ext_dirty(handle, inode, path + depth);
@@ -2307,6 +2316,19 @@ int ext4_ext_get_blocks(handle_t *handle
/* allocate new block */
goal = ext4_ext_find_goal(inode, path, iblock);
 
+   /*
+* See if request is beyond maximum number of blocks we can have in
+* a single extent. For an initialized extent this limit is
+* EXT_INIT_MAX_LEN and for an uninitialized extent this limit is
+* EXT_UNINIT_MAX_LEN.
+*/
+   if (max_blocks > EXT_INIT_MAX_LEN &&
+   create != EXT4_CREATE_UNINITIALIZED_EXT)
+   max_blocks = EXT_INIT_MAX_LEN;
+   else if (max_blocks > EXT_UNINIT_MAX_LEN &&
+create == EXT4_CREATE_UNINITIALIZED_EXT)
+   max_blocks = EXT_UNINIT_MAX_LEN;
+
/* Check if we can really insert (iblock)::(iblock+max_blocks) extent */
newex.ee_block = cpu_to_le32(iblock);
newex.ee_len = cpu_to_le16(max_blocks);
Index: linux-2.6.22/include/linux/ext4_fs_extents.h
===
--- linux-2.6.22.orig/include/linux/ext4_fs_extents.h
+++ linux-2.6.22/include/linux/ext4_fs_extents.h
@@ -141,7 +141,25 @@ typedef int (*ext_prepare_callback)(stru
 
 #define EXT_MAX_BLOCK  0xffffffff
 
-#define EXT_MAX_LEN    ((1UL << 15) - 1)
+/*
+ * EXT_INIT_MAX_LEN is the maximum number of blocks we can have in an
+ * initialized extent. This is 2^15 and not (2^16 - 1), since we use the
+ * MSB of ee_len field in the extent datastructure to signify if this
+ * particular extent is an initialized extent or an uninitialized (i.e.
+ * preallocated).
+ * EXT_UNINIT_MAX_LEN is the maximum number of blocks we can have in an
+ * uninitialized extent.
+ * If ee_len is <= 0x8000, it is an initialized extent. Otherwise, it is an
+ * uninitialized one. In other words, if MSB of ee_len is set, it is an
+ * uninitialized extent with only one special scenario when ee_len = 0x8000.
+ * In this case we can not have an uninitialized extent of zero length and
+ * thus we make it as a special case of initialized extent with 0x8000 length.
+ * This way we get better extent-to-group alignment for initialized extents.
+ * Hence, the maximum number of blocks we can have in an *initialized*
+ * extent is 2^15 (32768) and in an *uninitialized* extent is 2^15-1 (32767).
+ */
+#define EXT_INIT_MAX_LEN   (1UL << 15)
+#define EXT_UNINIT_MAX_LEN (EXT_INIT_MAX_LEN - 1)
 
 
 

[PATCH 5/6][TAKE7] ext4: write support for preallocated blocks

2007-07-13 Thread Amit K. Arora
From:  Amit Arora [EMAIL PROTECTED]

write support for preallocated blocks

This patch adds write support to the uninitialized extents that get
created when a preallocation is done using fallocate(). It takes care of
splitting the extents into multiple (up to three) extents and merging the
new split extents with neighbouring ones, if possible.
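
The splitting step can be illustrated with a small standalone model. This is only a sketch of the arithmetic, assuming logical block numbers as plain unsigned integers; struct piece and split_for_write() are hypothetical names, and the real patch additionally has to insert the pieces into the extent tree and journal the changes.

```c
#include <assert.h>

struct piece {
	unsigned start;		/* first logical block */
	unsigned len;		/* number of blocks */
	int uninitialized;	/* 1 if still preallocated only */
};

/*
 * Split the uninitialized extent [ee_block, ee_block + ee_len) around a
 * write of wr_len blocks starting at wr_block.
 * Returns the number of resulting pieces (1 to 3), stored in out[].
 */
static int split_for_write(unsigned ee_block, unsigned ee_len,
			   unsigned wr_block, unsigned wr_len,
			   struct piece out[3])
{
	unsigned ee_end = ee_block + ee_len;
	unsigned wr_end;
	int n = 0;

	assert(wr_len > 0);
	assert(wr_block >= ee_block && wr_block < ee_end);

	/* Clamp the write to this extent; anything beyond it is not ours. */
	wr_end = wr_block + wr_len < ee_end ? wr_block + wr_len : ee_end;

	if (wr_block > ee_block)	/* untouched head stays uninitialized */
		out[n++] = (struct piece){ ee_block, wr_block - ee_block, 1 };
	out[n++] = (struct piece){ wr_block, wr_end - wr_block, 0 };
	if (wr_end < ee_end)		/* untouched tail stays uninitialized */
		out[n++] = (struct piece){ wr_end, ee_end - wr_end, 1 };
	return n;
}
```

A write covering the whole extent yields one piece, a write at either end yields two, and a write in the middle yields three, which matches the three cases the patch distinguishes.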

Signed-off-by: Amit Arora [EMAIL PROTECTED]

Index: linux-2.6.22/fs/ext4/extents.c
===
--- linux-2.6.22.orig/fs/ext4/extents.c
+++ linux-2.6.22/fs/ext4/extents.c
@@ -1140,6 +1140,53 @@ ext4_can_extents_be_merged(struct inode 
 }
 
 /*
+ * This function tries to merge the ex extent to the next extent in the tree.
+ * It always tries to merge towards right. If you want to merge towards
+ * left, pass ex - 1 as argument instead of ex.
+ * Returns 0 if the extents (ex and ex+1) were _not_ merged and returns
+ * 1 if they got merged.
+ */
+int ext4_ext_try_to_merge(struct inode *inode,
+ struct ext4_ext_path *path,
+ struct ext4_extent *ex)
+{
+   struct ext4_extent_header *eh;
+   unsigned int depth, len;
+   int merge_done = 0;
+   int uninitialized = 0;
+
+   depth = ext_depth(inode);
+   BUG_ON(path[depth].p_hdr == NULL);
+   eh = path[depth].p_hdr;
+
+   while (ex < EXT_LAST_EXTENT(eh)) {
+   if (!ext4_can_extents_be_merged(inode, ex, ex + 1))
+   break;
+   /* merge with next extent! */
+   if (ext4_ext_is_uninitialized(ex))
+   uninitialized = 1;
+   ex->ee_len = cpu_to_le16(ext4_ext_get_actual_len(ex)
+   + ext4_ext_get_actual_len(ex + 1));
+   if (uninitialized)
+   ext4_ext_mark_uninitialized(ex);
+
+   if (ex + 1 < EXT_LAST_EXTENT(eh)) {
+   len = (EXT_LAST_EXTENT(eh) - ex - 1)
+   * sizeof(struct ext4_extent);
+   memmove(ex + 1, ex + 2, len);
+   }
+   eh->eh_entries = cpu_to_le16(le16_to_cpu(eh->eh_entries) - 1);
+   merge_done = 1;
+   WARN_ON(eh->eh_entries == 0);
+   if (!eh->eh_entries)
+   ext4_error(inode->i_sb, "ext4_ext_try_to_merge",
+  "inode#%lu, eh->eh_entries = 0!", inode->i_ino);
+   }
+
+   return merge_done;
+}
+
+/*
  * check if a portion of the newext extent overlaps with an
  * existing extent.
  *
@@ -1327,25 +1374,7 @@ has_space:
 
 merge:
/* try to merge extents to the right */
-   while (nearex < EXT_LAST_EXTENT(eh)) {
-   if (!ext4_can_extents_be_merged(inode, nearex, nearex + 1))
-   break;
-   /* merge with next extent! */
-   if (ext4_ext_is_uninitialized(nearex))
-   uninitialized = 1;
-   nearex->ee_len = cpu_to_le16(ext4_ext_get_actual_len(nearex)
-   + ext4_ext_get_actual_len(nearex + 1));
-   if (uninitialized)
-   ext4_ext_mark_uninitialized(nearex);
-
-   if (nearex + 1 < EXT_LAST_EXTENT(eh)) {
-   len = (EXT_LAST_EXTENT(eh) - nearex - 1)
-   * sizeof(struct ext4_extent);
-   memmove(nearex + 1, nearex + 2, len);
-   }
-   eh->eh_entries = cpu_to_le16(le16_to_cpu(eh->eh_entries)-1);
-   BUG_ON(eh->eh_entries == 0);
-   }
+   ext4_ext_try_to_merge(inode, path, nearex);
 
/* try to merge extents to the left */
 
@@ -2011,15 +2040,158 @@ void ext4_ext_release(struct super_block
 #endif
 }
 
+/*
+ * This function is called by ext4_ext_get_blocks() if someone tries to write
+ * to an uninitialized extent. It may result in splitting the uninitialized
+ * extent into multiple extents (up to three - one initialized and two
+ * uninitialized).
+ * There are three possibilities:
+ *   a> There is no split required: Entire extent should be initialized
+ *   b> Splits in two extents: Write is happening at either end of the extent
+ *   c> Splits in three extents: Someone is writing in middle of the extent
+ */
+int ext4_ext_convert_to_initialized(handle_t *handle, struct inode *inode,
+   struct ext4_ext_path *path,
+   ext4_fsblk_t iblock,
+   unsigned long max_blocks)
+{
+   struct ext4_extent *ex, newex;
+   struct ext4_extent *ex1 = NULL;
+   struct ext4_extent *ex2 = NULL;
+   struct ext4_extent *ex3 = NULL;
+   struct ext4_extent_header *eh;
+   unsigned int allocated, ee_block, ee_len, depth;
+   ext4_fsblk_t newblock;
+   int err = 0;
+   int ret = 0;
+
+   depth = ext_depth(inode);
+   eh = path[depth].p_hdr;
+   ex = 

[PATCH 4/6][TAKE7] ext4: fallocate support in ext4

2007-07-13 Thread Amit K. Arora
From: Amit Arora [EMAIL PROTECTED]

fallocate support in ext4

This patch implements ->fallocate() inode operation in ext4. With this
patch users of ext4 file systems will be able to use fallocate() system
call for persistent preallocation. Current implementation only supports
preallocation for regular files (directories not supported as of date)
with extent maps. This patch does not support block-mapped files currently.
Only FALLOC_ALLOCATE and FALLOC_RESV_SPACE modes are being supported as of
now.


Signed-off-by: Amit Arora [EMAIL PROTECTED]

Index: linux-2.6.22/fs/ext4/extents.c
===
--- linux-2.6.22.orig/fs/ext4/extents.c
+++ linux-2.6.22/fs/ext4/extents.c
@@ -282,7 +282,7 @@ static void ext4_ext_show_path(struct in
} else if (path->p_ext) {
ext_debug("  %d:%d:%llu ",
  le32_to_cpu(path->p_ext->ee_block),
- le16_to_cpu(path->p_ext->ee_len),
+ ext4_ext_get_actual_len(path->p_ext),
  ext_pblock(path->p_ext));
} else
ext_debug("  []");
@@ -305,7 +305,7 @@ static void ext4_ext_show_leaf(struct in
 
for (i = 0; i < le16_to_cpu(eh->eh_entries); i++, ex++) {
ext_debug("%d:%d:%llu ", le32_to_cpu(ex->ee_block),
- le16_to_cpu(ex->ee_len), ext_pblock(ex));
+ ext4_ext_get_actual_len(ex), ext_pblock(ex));
}
ext_debug("\n");
 }
@@ -425,7 +425,7 @@ ext4_ext_binsearch(struct inode *inode, 
ext_debug("  -> %d:%llu:%d ",
le32_to_cpu(path->p_ext->ee_block),
ext_pblock(path->p_ext),
-   le16_to_cpu(path->p_ext->ee_len));
+   ext4_ext_get_actual_len(path->p_ext));
 
 #ifdef CHECK_BINSEARCH
{
@@ -686,7 +686,7 @@ static int ext4_ext_split(handle_t *hand
ext_debug("move %d:%llu:%d in new leaf %llu\n",
le32_to_cpu(path[depth].p_ext->ee_block),
ext_pblock(path[depth].p_ext),
-   le16_to_cpu(path[depth].p_ext->ee_len),
+   ext4_ext_get_actual_len(path[depth].p_ext),
newblock);
/*memmove(ex++, path[depth].p_ext++,
sizeof(struct ext4_extent));
@@ -1106,7 +1106,19 @@ static int
 ext4_can_extents_be_merged(struct inode *inode, struct ext4_extent *ex1,
struct ext4_extent *ex2)
 {
-   if (le32_to_cpu(ex1->ee_block) + le16_to_cpu(ex1->ee_len) !=
+   unsigned short ext1_ee_len, ext2_ee_len;
+
+   /*
+* Make sure that either both extents are uninitialized, or
+* both are _not_.
+*/
+   if (ext4_ext_is_uninitialized(ex1) ^ ext4_ext_is_uninitialized(ex2))
+   return 0;
+
+   ext1_ee_len = ext4_ext_get_actual_len(ex1);
+   ext2_ee_len = ext4_ext_get_actual_len(ex2);
+
+   if (le32_to_cpu(ex1->ee_block) + ext1_ee_len !=
le32_to_cpu(ex2->ee_block))
return 0;
 
@@ -1115,14 +1127,14 @@ ext4_can_extents_be_merged(struct inode 
 * as an RO_COMPAT feature, refuse to merge to extents if
 * this can result in the top bit of ee_len being set.
 */
-   if (le16_to_cpu(ex1->ee_len) + le16_to_cpu(ex2->ee_len) > EXT_MAX_LEN)
+   if (ext1_ee_len + ext2_ee_len > EXT_MAX_LEN)
return 0;
 #ifdef AGGRESSIVE_TEST
if (le16_to_cpu(ex1->ee_len) >= 4)
return 0;
 #endif
 
-   if (ext_pblock(ex1) + le16_to_cpu(ex1->ee_len) == ext_pblock(ex2))
+   if (ext_pblock(ex1) + ext1_ee_len == ext_pblock(ex2))
return 1;
return 0;
 }
@@ -1144,7 +1156,7 @@ unsigned int ext4_ext_check_overlap(stru
unsigned int ret = 0;
 
b1 = le32_to_cpu(newext->ee_block);
-   len1 = le16_to_cpu(newext->ee_len);
+   len1 = ext4_ext_get_actual_len(newext);
depth = ext_depth(inode);
if (!path[depth].p_ext)
goto out;
@@ -1191,8 +1203,9 @@ int ext4_ext_insert_extent(handle_t *han
struct ext4_extent *nearex; /* nearest extent */
struct ext4_ext_path *npath = NULL;
int depth, len, err, next;
+   unsigned uninitialized = 0;
 
-   BUG_ON(newext->ee_len == 0);
+   BUG_ON(ext4_ext_get_actual_len(newext) == 0);
depth = ext_depth(inode);
ex = path[depth].p_ext;
BUG_ON(path[depth].p_hdr == NULL);
@@ -1200,14 +1213,24 @@ int ext4_ext_insert_extent(handle_t *han
/* try to insert block into found extent and return */
if (ex && ext4_can_extents_be_merged(inode, ex, newext)) {
ext_debug("append %d block to %d:%d (from %llu)\n",
-   le16_to_cpu(newext->ee_len),
+   

[PATCH 3/6][TAKE7] revalidate write permissions for fallocate

2007-07-13 Thread Amit K. Arora
From: David P. Quigley [EMAIL PROTECTED]

Revalidate the write permissions for fallocate(2), in case security policy has
changed since the files were opened.

Acked-by: James Morris [EMAIL PROTECTED]
Signed-off-by: David P. Quigley [EMAIL PROTECTED]

---
 fs/open.c |3 +++
 1 files changed, 3 insertions(+)

Index: linux-2.6.22/fs/open.c
===
--- linux-2.6.22.orig/fs/open.c
+++ linux-2.6.22/fs/open.c
@@ -407,6 +407,9 @@ asmlinkage long sys_fallocate(int fd, in
goto out;
if (!(file->f_mode & FMODE_WRITE))
goto out_fput;
+   ret = security_file_permission(file, MAY_WRITE);
+   if (ret)
+   goto out_fput;
 
inode = file->f_path.dentry->d_inode;
 


[PATCH 1/6][TAKE7] manpage for fallocate

2007-07-13 Thread Amit K. Arora
Following is the modified version of the manpage originally submitted by
David Chinner. Please use `nroff -man fallocate.2 | less` to view.

This includes changes suggested by Heikki Orsila and Barry Naujok.


.TH fallocate 2
.SH NAME
fallocate \- allocate or remove file space
.SH SYNOPSIS
.nf
.B #include <fcntl.h>
.PP
.BI "long fallocate(int " fd ", int " mode ", loff_t " offset ", loff_t " len);
.SH DESCRIPTION
The
.B fallocate
syscall allows a user to directly manipulate the allocated disk space
for the file referred to by
.I fd
for the byte range starting at
.I offset
and continuing for
.I len
bytes.
The
.I mode
parameter determines the operation to be performed on the given range.
Currently there are two modes:
.TP
.B FALLOC_ALLOCATE
allocates and initialises to zero the disk space within the given range.
After a successful call, subsequent writes are guaranteed not to fail because
of lack of disk space.  If the size of the file is less than
.IR offset + len ,
then the file is increased to this size; otherwise the file size is left
unchanged.
.B FALLOC_ALLOCATE
closely resembles
.BR posix_fallocate (3)
and is intended as a method of optimally implementing this function.
.B FALLOC_ALLOCATE
may allocate a larger range than was specified.
.TP
.B FALLOC_RESV_SPACE
provides the same functionality as
.B FALLOC_ALLOCATE
except it does not ever change the file size. This allows allocation
of zero blocks beyond the end of file and is useful for optimising
append workloads.
.SH RETURN VALUE
.B fallocate
returns zero on success, or an error number on failure.
Note that
.I errno
is not set.
.SH ERRORS
.TP
.B EBADF
.I fd
is not a valid file descriptor, or is not opened for writing.
.TP
.B EFBIG
.IR offset + len
exceeds the maximum file size.
.TP
.B EINVAL
.I offset
was less than 0, or
.I len
was less than or equal to 0.
.TP
.B ENODEV
.I fd
does not refer to a regular file or a directory.
.TP
.B ENOSPC
There is not enough space left on the device containing the file
referred to by
.IR fd .
.TP
.B ESPIPE
.I fd
refers to a pipe or FIFO.
.TP
.B ENOSYS
The filesystem underlying the file descriptor does not support this
operation.
.TP
.B EINTR
A signal was caught during execution.
.TP
.B EIO
An I/O error occurred while reading from or writing to a file system.
.TP
.B EOPNOTSUPP
The mode is not supported on the file descriptor.
.SH AVAILABILITY
The
.B fallocate
system call is available since 2.6.XX
.SH SEE ALSO
.BR syscall (2),
.BR posix_fadvise (2),
.BR ftruncate (2).
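
As an illustration of the file size rules in the DESCRIPTION section, the following sketch models only how the two modes affect the reported size. It performs no allocation and ignores EFBIG; falloc_new_size() is a hypothetical name used for this example, and the mode constants are copied from the patchset.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define FALLOC_FL_KEEP_SIZE 0x01
#define FALLOC_ALLOCATE     0
#define FALLOC_RESV_SPACE   FALLOC_FL_KEEP_SIZE

/*
 * Model of the documented size semantics only.
 * Returns 0 and stores the resulting file size in *new_size,
 * or returns an error number as the manpage describes.
 */
static int falloc_new_size(long long old_size, int mode,
			   long long offset, long long len,
			   long long *new_size)
{
	assert(new_size != NULL);

	if (offset < 0 || len <= 0)
		return EINVAL;
	if (mode != FALLOC_ALLOCATE && mode != FALLOC_RESV_SPACE)
		return EOPNOTSUPP;

	*new_size = old_size;
	/* Only FALLOC_ALLOCATE may grow the file, and it never shrinks it. */
	if (mode == FALLOC_ALLOCATE && old_size < offset + len)
		*new_size = offset + len;
	return 0;
}
```

For a 100-byte file, FALLOC_ALLOCATE with offset 0 and len 4096 reports a size of 4096 afterwards, while FALLOC_RESV_SPACE with the same arguments leaves the size at 100.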


Re: [EXT4 set 5][PATCH 1/1] expand inode i_extra_isize to support features in larger inode

2007-07-13 Thread Peter Zijlstra
On Fri, 2007-07-13 at 02:05 -0700, Andrew Morton wrote:

> Except lockdep doesn't know about journal_start(), which has ranking
> requirements similar to a semaphore.

Something like so?

Or can journal_stop() be done by a different task than the one that did
journal_start()? - in which case nothing much can be done :-/

This seems to boot... albeit I did not push it hard.

Signed-off-by: Peter Zijlstra [EMAIL PROTECTED]
---
 fs/jbd/transaction.c |    9 +++++++++
 include/linux/jbd.h  |    5 +++++
 2 files changed, 14 insertions(+)

Index: linux-2.6/fs/jbd/transaction.c
===
--- linux-2.6.orig/fs/jbd/transaction.c
+++ linux-2.6/fs/jbd/transaction.c
@@ -233,6 +233,8 @@ out:
return ret;
 }
 
+static struct lock_class_key jbd_handle_key;
+
 /* Allocate a new handle.  This should probably be in a slab... */
 static handle_t *new_handle(int nblocks)
 {
@@ -243,6 +245,8 @@ static handle_t *new_handle(int nblocks)
handle->h_buffer_credits = nblocks;
handle->h_ref = 1;
 
+   lockdep_init_map(&handle->h_lockdep_map, "jbd_handle", &jbd_handle_key, 0);
+
return handle;
 }
 
@@ -286,6 +290,9 @@ handle_t *journal_start(journal_t *journ
current->journal_info = NULL;
handle = ERR_PTR(err);
}
+
+   lock_acquire(&handle->h_lockdep_map, 0, 0, 0, 2, _THIS_IP_);
+
return handle;
 }
 
@@ -1411,6 +1418,8 @@ int journal_stop(handle_t *handle)
spin_unlock(&journal->j_state_lock);
}
 
+   lock_release(&handle->h_lockdep_map, 1, _THIS_IP_);
+
jbd_free_handle(handle);
return err;
 }
Index: linux-2.6/include/linux/jbd.h
===
--- linux-2.6.orig/include/linux/jbd.h
+++ linux-2.6/include/linux/jbd.h
@@ -30,6 +30,7 @@
 #include <linux/bit_spinlock.h>
 #include <linux/mutex.h>
 #include <linux/timer.h>
+#include <linux/lockdep.h>
 
 #include <asm/semaphore.h>
 #endif
@@ -405,6 +406,10 @@ struct handle_s
unsigned int    h_sync:         1;      /* sync-on-close */
unsigned int    h_jdata:        1;      /* force data journaling */
unsigned int    h_aborted:      1;      /* fatal error on handle */
+
+#ifdef CONFIG_LOCKDEP
+   struct lockdep_map  h_lockdep_map;
+#endif
 };
 
 




[PATCH 2/6][TAKE7] fallocate() implementation in i386, x86_64 and powerpc

2007-07-13 Thread Amit K. Arora
From: Amit Arora [EMAIL PROTECTED]

sys_fallocate() implementation on i386, x86_64 and powerpc

fallocate() is a new system call being proposed here which will allow
applications to preallocate space to any file(s) in a file system.
Each file system implementation that wants to use this feature will need
to support an inode operation called ->fallocate().
Applications can use this feature to avoid fragmentation to a certain
level and thus get faster access speed. With preallocation, applications
also get a guarantee of space for particular file(s) - even if later
the system becomes full.

Currently, glibc provides an interface called posix_fallocate() which
can be used for a similar purpose. Though this has the advantage of
working on all file systems, it is quite slow (since it writes zeroes to
each block that has to be preallocated). File systems can do this more
efficiently within the kernel by implementing the proposed fallocate()
system call. It is expected that posix_fallocate() will be modified to
call this new system call first and, in case the kernel/filesystem does
not implement it, fall back to the current implementation of writing
zeroes to the new blocks.
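
A rough userspace sketch of that "try the system call, then fall back" idea is shown below. It is not glibc's code: prealloc(), prealloc_by_writing() and touch_byte() are hypothetical names, SYS_fallocate is assumed to be provided by the installed kernel headers, a 64-bit platform is assumed so that the offsets can be passed through syscall() directly, and mode 0 stands in for FALLOC_ALLOCATE. The fallback needs a descriptor opened for both reading and writing.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <fcntl.h>		/* open() flags for callers */
#include <sys/stat.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Rewrite one byte in place (zero if past EOF) so its block is allocated. */
static int touch_byte(int fd, off_t pos)
{
	char byte = 0;
	ssize_t n;

	if (pread(fd, &byte, 1, pos) < 0)
		return errno;
	n = pwrite(fd, &byte, 1, pos);
	if (n != 1)
		return n < 0 ? errno : EIO;
	return 0;
}

/* Slow path: touch one byte per block, then the last byte of the range. */
static int prealloc_by_writing(int fd, off_t offset, off_t len)
{
	struct stat st;
	off_t pos, step;
	int err;

	if (fstat(fd, &st) < 0)
		return errno;
	step = st.st_blksize > 0 ? st.st_blksize : 512;

	for (pos = offset; pos < offset + len; pos += step) {
		err = touch_byte(fd, pos);
		if (err)
			return err;
	}
	/* Covers the final block and extends the file to offset + len. */
	return touch_byte(fd, offset + len - 1);
}

/* Fast path first; fall back only when the kernel or fs lacks support. */
static int prealloc(int fd, off_t offset, off_t len)
{
	if (offset < 0 || len <= 0)
		return EINVAL;
	if (syscall(SYS_fallocate, fd, 0, offset, len) == 0)
		return 0;
	if (errno != ENOSYS && errno != EOPNOTSUPP)
		return errno;
	return prealloc_by_writing(fd, offset, len);
}
```

The fallback reads each byte before writing it back so that existing data is preserved, but that read-modify-write is not atomic, so a concurrent writer to the same file could lose an update. That is one more reason to prefer the in-kernel path when it is available.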
ToDos:
1. Implementation on other architectures (other than i386, x86_64,
   and ppc). Patches for s390(x) and ia64 are already available from
   previous posts, but it was decided that they should be added later
   once fallocate is in the mainline. Hence not including those patches
   in this take.
2. A generic file system operation to handle fallocate
   (generic_fallocate), for filesystems that do _not_ have the fallocate
   inode operation implemented.
3. Changes to glibc,
   a) to support fallocate() system call
   b) to make posix_fallocate() and posix_fallocate64() call fallocate()


Signed-off-by: Amit Arora [EMAIL PROTECTED]

Index: linux-2.6.22/arch/i386/kernel/syscall_table.S
===
--- linux-2.6.22.orig/arch/i386/kernel/syscall_table.S
+++ linux-2.6.22/arch/i386/kernel/syscall_table.S
@@ -323,3 +323,4 @@ ENTRY(sys_call_table)
.long sys_signalfd
.long sys_timerfd
.long sys_eventfd
+   .long sys_fallocate
Index: linux-2.6.22/arch/powerpc/kernel/sys_ppc32.c
===
--- linux-2.6.22.orig/arch/powerpc/kernel/sys_ppc32.c
+++ linux-2.6.22/arch/powerpc/kernel/sys_ppc32.c
@@ -773,6 +773,13 @@ asmlinkage int compat_sys_truncate64(con
return sys_truncate(path, (high << 32) | low);
 }
 
+asmlinkage long compat_sys_fallocate(int fd, int mode, u32 offhi, u32 offlo,
+u32 lenhi, u32 lenlo)
+{
+   return sys_fallocate(fd, mode, ((loff_t)offhi << 32) | offlo,
+((loff_t)lenhi << 32) | lenlo);
+}
+
 asmlinkage int compat_sys_ftruncate64(unsigned int fd, u32 reg4, unsigned long high,
 unsigned long low)
 {
Index: linux-2.6.22/arch/x86_64/ia32/ia32entry.S
===
--- linux-2.6.22.orig/arch/x86_64/ia32/ia32entry.S
+++ linux-2.6.22/arch/x86_64/ia32/ia32entry.S
@@ -719,4 +719,5 @@ ia32_sys_call_table:
.quad compat_sys_signalfd
.quad compat_sys_timerfd
.quad sys_eventfd
+   .quad sys32_fallocate
 ia32_syscall_end:
Index: linux-2.6.22/fs/open.c
===
--- linux-2.6.22.orig/fs/open.c
+++ linux-2.6.22/fs/open.c
@@ -353,6 +353,92 @@ asmlinkage long sys_ftruncate64(unsigned
 #endif
 
 /*
+ * sys_fallocate - preallocate blocks or free preallocated blocks
+ * @fd: the file descriptor
+ * @mode: mode specifies the behavior of allocation.
+ * @offset: The offset within file, from where allocation is being
+ * requested. It should not have a negative value.
+ * @len: The amount of space in bytes to be allocated, from the offset.
+ *  This can not be zero or a negative value.
+ *
+ * This system call preallocates space for a file. The range of blocks
+ * allocated depends on the value of offset and len arguments provided
+ * by the user/application. With FALLOC_ALLOCATE or FALLOC_RESV_SPACE
+ * modes, if the system call succeeds, subsequent writes to the file in
+ * the given range (specified by offset & len) should not fail - even if
+ * the file system later becomes full. Hence the preallocation done is
+ * persistent (valid even after reopen of the file and remount/reboot).
+ *
+ * It is expected that the ->fallocate() inode operation implemented by
+ * the individual file systems will update the file size and/or
+ * ctime/mtime depending on the mode and also on the success of the
+ * operation.
+ *
+ * Note: In case the file system does not support preallocation,
+ * posix_fallocate() should fall back to the library implementation (i.e.
+ * allocating zero-filled new blocks to the file).
+ *
+ * Return Values
+ * 0   : On 

Re: [EXT4 set 7][PATCH 1/1]Remove 32000 subdirs limit.

2007-07-13 Thread Pekka Enberg

On 7/13/07, Kalpak Shah [EMAIL PROTECTED] wrote:

> > EXT4_DIR_LINK_MAX() is buggy: it evaluates its arg twice.
> >
> > #define EXT4_DIR_LINK_MAX(dir) (!is_dx(dir) && (dir)->i_nlink >= EXT4_LINK_MAX)

[snip]

> Sorry, I didn't understand what is the problem with this macro?


The expression represented by 'dir' is evaluated twice (think dir++
here). It's safer to make it a static inline function.

  Pekka
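
The double evaluation can be demonstrated with a small standalone program fragment. Everything here is a stand-in invented for the example: struct dir replaces struct inode, the indexed field plays the role of is_dx(), LINK_MAX is an arbitrary illustrative limit, and pick() exists only to count how often the argument expression runs.

```c
#include <assert.h>

#define LINK_MAX 65000		/* illustrative limit, stand-in for EXT4_LINK_MAX */

struct dir {			/* hypothetical stand-in for struct inode */
	int indexed;		/* plays the role of is_dx() */
	unsigned nlink;
};

static int evaluations;		/* how often the argument expression ran */

/* An argument expression with a visible side effect. */
static struct dir *pick(struct dir *d)
{
	evaluations++;
	return d;
}

/* Macro form: "d" appears twice, so its side effects can happen twice. */
#define DIR_LINK_MAX_MACRO(d) (!(d)->indexed && (d)->nlink >= LINK_MAX)

/* Function form: the argument is evaluated once, at the call site. */
static inline int dir_link_max(const struct dir *d)
{
	return !d->indexed && d->nlink >= LINK_MAX;
}
```

Calling DIR_LINK_MAX_MACRO(pick(&d)) on a non-indexed directory runs pick() twice, while dir_link_max(pick(&d)) runs it once. Because of the short-circuit &&, the macro evaluates its argument only once when the directory is indexed, which makes the problem easy to miss in testing.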


[PATCH 0/6][TAKE7] fallocate system call

2007-07-13 Thread Amit K. Arora
This is the latest fallocate patchset and is based on 2.6.22.

* Following are the changes from TAKE6:
1) We now just have two modes (and no deallocation modes).
2) Updated the man page
3) Added a new patch submitted by David P. Quigley (Patch 3/6).
4) Used EXT_INIT_MAX_LEN instead of 0x8000 in Patch 6/6.
5) Included below in the end is a small testcase to test fallocate.

* Following are the changes from TAKE5 to TAKE6:
1) Rebased to 2.6.22
2) Added compat wrapper for x86_64
3) Dropped s390 and ia64 patches, since the platform maintainers can
   add the support for fallocate once it is in mainline.
4) Added a change suggested by Andreas for better extent-to-group
   alignment in ext4 (Patch 6/6). Please refer to the following post:
http://www.mail-archive.com/[EMAIL PROTECTED]/msg02445.html
5) Renamed mode flags and values from FA_ to FALLOC_
6) Added manpage (updated version of the one initially submitted by
   David Chinner).
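
A compat wrapper of the kind mentioned in item 2 has to rebuild the 64-bit offset and len from the 32-bit halves that a 32-bit caller passes. The sketch below shows only that arithmetic in portable C; join_halves() and split_halves() are hypothetical names, and which half arrives first depends on the architecture's calling convention.

```c
#include <assert.h>
#include <stdint.h>

/* Join the two 32-bit halves passed by a 32-bit caller into one value. */
static int64_t join_halves(uint32_t hi, uint32_t lo)
{
	/* Widen before shifting: shifting a 32-bit value by 32 is undefined. */
	return (int64_t)(((uint64_t)hi << 32) | lo);
}

/* The caller's side: split a non-negative 64-bit value into halves. */
static void split_halves(int64_t v, uint32_t *hi, uint32_t *lo)
{
	assert(v >= 0);		/* offsets and lengths are never negative here */
	*hi = (uint32_t)((uint64_t)v >> 32);
	*lo = (uint32_t)v;
}
```

The cast before the shift is the important detail: without it the high half would be shifted within a 32-bit type and lost.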


Todos:
------
1. Implementation on other architectures (other than i386, x86_64,
   and ppc64). s390(x) and ia64 patches are ready and will be pushed
   by platform maintainers when fallocate is in mainline.
2. A generic file system operation to handle fallocate
   (generic_fallocate), for filesystems that do _not_ have the fallocate
   inode operation implemented.
3. Changes to glibc,
   a) to support fallocate() system call
   b) to make posix_fallocate() and posix_fallocate64() call fallocate()
4. Patch to e2fsprogs to recognize and display uninitialized extents.


The following patches follow:
Patch 1/6 : manpage for fallocate
Patch 2/6 : fallocate() implementation in i386, x86_64 and powerpc
Patch 3/6 : revalidate write permissions for fallocate
Patch 4/6 : ext4: fallocate support in ext4
Patch 5/6 : ext4: write support for preallocated blocks
Patch 6/6 : ext4: change for better extent-to-group alignment

Note: Attached below is a small testcase to test fallocate. The __NR_fallocate
will need to be changed depending on the system call number in the kernel (it
may get changed due to merge) and also depending on the architecture.

--
Regards,
Amit Arora



#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <errno.h>
#include <unistd.h>	/* syscall(), close(), lseek(), write() */

#include <linux/unistd.h>
#include <sys/vfs.h>
#include <sys/stat.h>

#define VERBOSE 0

#define __NR_fallocate	324

#define FALLOC_FL_KEEP_SIZE 0x01
#define FALLOC_ALLOCATE 0x0
#define FALLOC_RESV_SPACE   FALLOC_FL_KEEP_SIZE


int do_fallocate(int fd, int mode, loff_t offset, loff_t len)
{
  int ret;

  if (VERBOSE)
        printf("Trying to preallocate blocks (offset=%llu, len=%llu)\n",
                (unsigned long long)offset, (unsigned long long)len);
  ret = syscall(__NR_fallocate, fd, mode, offset, len);

  if (ret < 0) {
        printf("SYSCALL: received error %d, ret=%d\n", errno, ret);
        close(fd);
        return(1);
  }

  if (VERBOSE)
        printf("fallocate system call succeeded!  ret=%d\n", ret);

  return ret;
}

int test_fallocate(int fd, int mode, loff_t offset, loff_t len)
{
  int ret, blocks;
  struct stat statbuf1, statbuf2;

  fstat(fd, &statbuf1);

  ret = do_fallocate(fd, mode, offset, len);

  fstat(fd, &statbuf2);

  /* check file size after preallocation */
  if (mode == FALLOC_ALLOCATE) {
        if (!ret && statbuf1.st_size < (offset + len) &&
                statbuf2.st_size != (offset + len)) {
                printf("Error: fallocate succeeded, but the file size did not "
                        "change, where it should have!\n");
                ret = 1;
        }
  } else if (statbuf1.st_size != statbuf2.st_size) {
        printf("Error : File size changed, when it should not have!\n");
        ret = 1;
  }

  /* st_blocks counts 512-byte units */
  blocks = ((statbuf2.st_blocks - statbuf1.st_blocks) * 512) /
                statbuf2.st_blksize;

  /* Print report */
  printf("# FALLOCATE TEST REPORT #\n");
  printf("\tNew blocks preallocated = %d.\n", blocks);
  printf("\tNumber of bytes preallocated = %lld\n",
                (long long)blocks * statbuf2.st_blksize);
  printf("\tOld file size = %lld, New file size %lld.\n",
                (long long)statbuf1.st_size, (long long)statbuf2.st_size);
  printf("\tOld num blocks = %lld, New num blocks %lld.\n",
                (long long)(statbuf1.st_blocks * 512) / 1024,
                (long long)(statbuf2.st_blocks * 512) / 1024);

  return ret;
}


int do_write(int fd, loff_t offset, loff_t len)
{
  int ret;
  char *buf;

  buf = (char *)malloc(len);
  if (!buf) {
        printf("error: malloc failed.\n");
        return(-1);
  }

  if (VERBOSE)
        printf("Trying to write to file (offset=%llu, len=%llu)\n",
                (unsigned long long)offset, (unsigned long long)len);

  ret = lseek(fd, offset, SEEK_SET);
  if (ret != offset) {
        printf("lseek() failed error=%d, ret=%d\n", errno, ret);
        free(buf);
        close(fd);
        return(-1);
  }

  ret = write(fd, buf, len);
  if (ret != len) {
        printf("write() failed error=%d, ret=%d\n", errno, ret);
        free(buf);
        close(fd);
        return(-1);
  }

  if (VERBOSE)
        printf("Write succeeded! Written %llu bytes ret=%d\n",
                (unsigned long long)len, ret);

  free(buf);
  return ret;
}


int test_write(int fd, loff_t offset, loff_t len)
{
  int ret;

  ret = do_write(fd, offset, len);
  printf(# 

Re: [PATCH 3/6][TAKE7] revalidate write permissions for fallocate

2007-07-13 Thread Christoph Hellwig
On Fri, Jul 13, 2007 at 06:18:47PM +0530, Amit K. Arora wrote:
 From: David P. Quigley [EMAIL PROTECTED]
 
 Revalidate the write permissions for fallocate(2), in case security policy has
 changed since the files were opened.
 
 Acked-by: James Morris [EMAIL PROTECTED]
 Signed-off-by: David P. Quigley [EMAIL PROTECTED]

This should be merged into the main falloc patch.



Re: [PATCH 2/6][TAKE7] fallocate() implementation in i386, x86_64 and powerpc

2007-07-13 Thread Amit K. Arora
On Fri, Jul 13, 2007 at 02:21:19PM +0100, Christoph Hellwig wrote:
 On Fri, Jul 13, 2007 at 06:17:55PM +0530, Amit K. Arora wrote:
   /*
  + * sys_fallocate - preallocate blocks or free preallocated blocks
  + * @fd: the file descriptor
  + * @mode: mode specifies the behavior of allocation.
  + * @offset: The offset within file, from where allocation is being
  + * requested. It should not have a negative value.
  + * @len: The amount of space in bytes to be allocated, from the offset.
  + *  This can not be zero or a negative value.
 
 kerneldoc comments are for in-kernel APIs which syscalls aren't.  I'd say
 just remove this comment, the manpage is much better documentation anyway.

Ok. I will remove this entire comment.
 
  + * TBD Generic fallocate to be added for file systems that do not
  + *  support fallocate.
 
 Please remove the comment, adding a generic fallback in kernelspace is a
 very dumb idea as we already discussed long time ago.

  --- linux-2.6.22.orig/include/linux/fs.h
  +++ linux-2.6.22/include/linux/fs.h
  @@ -266,6 +266,21 @@ extern int dir_notify_enable;
   #define SYNC_FILE_RANGE_WRITE  2
   #define SYNC_FILE_RANGE_WAIT_AFTER 4
   
  +/*
  + * sys_fallocate modes
  + * Currently sys_fallocate supports two modes:
  + * FALLOC_ALLOCATE :   This is the preallocate mode, using which an application
  + * may request reservation of space for a particular file.
  + * The file size will be changed if the allocation is
  + * beyond EOF.
  + * FALLOC_RESV_SPACE : This is same as the above mode, with only one difference
  + * that the file size will not be modified.
  + */
  +#define FALLOC_FL_KEEP_SIZE 0x01 /* default is extend/shrink size */
  +
  +#define FALLOC_ALLOCATE 0
  +#define FALLOC_RESV_SPACE  FALLOC_FL_KEEP_SIZE
 
 Just remove FALLOC_ALLOCATE, 0 flags should be the default.  I'm also
 not sure there is any point in having two namespace now that we have a flags-
 based ABI.

Ok. Since we have only one flag (FALLOC_FL_KEEP_SIZE) and we do not want
to declare the default mode (FALLOC_ALLOCATE), we can _just_ have this
flag and remove the other mode too (FALLOC_RESV_SPACE).
Is this what you are suggesting ?

 Also please don't add this to fs.h.  fs.h is a complete mess and the
 falloc flags are a new user ABI.  Add a linux/falloc.h instead which can
 be added to headers-y so the ABI constant can be exported to userspace.

Should we need a header file just to declare one flag - i.e.
FALLOC_FL_KEEP_SIZE (since now there is no point of declaring the two
modes) ? If linux/fs.h is not a good place, will asm-generic/fcntl.h
be a sane place for this flag ?

Thanks!
--
Regards,
Amit Arora


Re: [PATCH 1/6][TAKE7] manpage for fallocate

2007-07-13 Thread David Chinner
On Fri, Jul 13, 2007 at 06:16:01PM +0530, Amit K. Arora wrote:
 Following is the modified version of the manpage originally submitted by
 David Chinner. Please use `nroff -man fallocate.2 | less` to view.
 
 This includes changes suggested by Heikki Orsila and Barry Naujok.

Can we get itemised change logs for all these patches from now on?

 .TH fallocate 2
 .SH NAME
 fallocate \- allocate or remove file space

If fallocate is just being used for allocating space this is wrong.
maybe - manipulate file space instead?

dd .TP
 .B FALLOC_RESV_SPACE
 provides the same functionality as
 .B FALLOC_ALLOCATE
 except it does not ever change the file size. This allows allocation
 of zero blocks beyond the end of file and is useful for optimising

of zeroed blocks

-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group


Re: [dm-devel] Re: [RFD] BIO_RW_BARRIER - what it means for devices, filesystems, and dm/md.

2007-07-13 Thread Ric Wheeler



Guy Watkins wrote:

} -Original Message-
} From: [EMAIL PROTECTED] [mailto:linux-raid-
} [EMAIL PROTECTED] On Behalf Of [EMAIL PROTECTED]
} Sent: Thursday, July 12, 2007 1:35 PM
} To: [EMAIL PROTECTED]
} Cc: Tejun Heo; [EMAIL PROTECTED]; Stefan Bader; Phillip Susi; device-mapper
} development; linux-fsdevel@vger.kernel.org; [EMAIL PROTECTED];
} [EMAIL PROTECTED]; Jens Axboe; David Chinner; Andreas Dilger
} Subject: Re: [dm-devel] Re: [RFD] BIO_RW_BARRIER - what it means for
} devices, filesystems, and dm/md.
} 
} On Wed, 11 Jul 2007 18:44:21 EDT, Ric Wheeler said:

}  [EMAIL PROTECTED] wrote:
}   On Tue, 10 Jul 2007 14:39:41 EDT, Ric Wheeler said:
}  
}   All of the high end arrays have non-volatile cache (read, on power
} loss, it is a
}   promise that it will get all of your data out to permanent storage).
} You don't
}   need to ask this kind of array to drain the cache. In fact, it might
} just ignore
}   you if you send it that kind of request ;-)
}  
}   OK, I'll bite - how does the kernel know whether the other end of that
}   fiberchannel cable is attached to a DMX-3 or to some no-name product
} that
}   may not have the same assurances?  Is there a I'm a high-end array
} bit
}   in the sense data that I'm unaware of?
}  
} 
}  There are ways to query devices (think of hdparm -I in S-ATA/P-ATA
} drives, SCSI
}  has similar queries) to see what kind of device you are talking to. I am
} not
}  sure it is worth the trouble to do any automatic detection/handling of
} this.
} 
}  In this specific case, it is more a case of when you attach a high end
} (or
}  mid-tier) device to a server, you should configure it without barriers
} for its
}  exported LUNs.
} 
} I don't have a problem with the sysadmin *telling* the system the other

} end of
} that fiber cable has characteristics X, Y and Z.  What worried me was
} that it
} looked like conflating device reported writeback cache with device
} actually
} has enough battery/hamster/whatever backup to flush everything on a power
} loss.
} (My back-of-envelope calculation shows for a worst-case of needing a 1ms
} seek
} for each 4K block, a 1G cache can take up to 4 1/2 minutes to sync.
} That's
} a lot of battery..)
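
The back-of-envelope figure quoted above can be reproduced with a few lines of C. This is only a sketch of the arithmetic, assuming a 1 GiB cache, 4 KiB blocks and one 1 ms seek per block; the function name is made up for illustration and does not come from any driver.

```c
#include <stdint.h>

/*
 * Worst-case time, in seconds, to flush a write-back cache when every
 * block costs one seek.  Hypothetical helper for the estimate only.
 */
double worst_case_flush_seconds(uint64_t cache_bytes,
                                uint64_t block_bytes,
                                double seek_ms)
{
    uint64_t blocks = cache_bytes / block_bytes;   /* blocks to write out */

    return (double)blocks * seek_ms / 1000.0;      /* ms -> seconds */
}
```

With worst_case_flush_seconds(1ULL << 30, 4096, 1.0) the cache holds 262144 blocks, so the flush takes roughly 262 seconds, a little under 4 1/2 minutes.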

Most hardware RAID devices I know of use the battery to save the cache while
the power is off.  When the power is restored it flushes the cache to disk.
If the power failure lasts longer than the batteries then the cache data is
lost, but the batteries last 24+ hours I believe.


Most mid-range and high end arrays actually use that battery to ensure that data 
is all written out to permanent media when the power is lost. I won't go into 
how that is done, but it clearly would not be a safe assumption to assume that 
your power outage is only going to be a certain length of time (and if not, you 
would lose data).




A big EMC array we had had enough battery power to power about 400 disks
while the 16 Gig of cache was flushed.  I think EMC told me the batteries
would last about 20 minutes.  I don't recall if the array was usable during
the 20 minutes.  We never tested a power failure.

Guy


I worked on the team that designed that big array.

At one point, we had an array on loan to a partner who tried to put it in a very 
small data center. A few weeks later, they brought in an electrician who needed 
to run more power into the center.  It was pretty funny - he tried to find a 
power button to turn it off and then just walked over and dropped power trying 
to get the Symm to turn off.  When that didn't work, he was really, really 
confused ;-)


ric


Re: [EXT4 set 7][PATCH 1/1]Remove 32000 subdirs limit.

2007-07-13 Thread Kalpak Shah
The updated patch is attached. comments inline...

On Tue, 2007-07-10 at 22:40 -0700, Andrew Morton wrote:
  If we exceed 65000 subdirectories in an htree directory it sets the
  inode link count to 1 and no longer counts subdirectories.  The
  directory link count is not actually used when determining if a
  directory is empty, as that only counts subdirectories and not regular
  files that might be in there. 
  
  A EXT4_FEATURE_RO_COMPAT_DIR_NLINK flag has been added and it is set if
  the subdir count for any directory crosses 65000.
  
 
 Would I be correct in assuming that a later fsck will clear
 EXT4_FEATURE_RO_COMPAT_DIR_NLINK if there are no longer any 65000 subdir
 directories?
 
 If so, that is worth a mention in the changelog, perhaps?

The changelog has been updated to include this.

   
  +static inline void ext4_inc_count(handle_t *handle, struct inode *inode)
  +{
  +   inc_nlink(inode);
  +   if (is_dx(inode) && inode->i_nlink > 1) {
  +       /* limit is 16-bit i_links_count */
  +       if (inode->i_nlink >= EXT4_LINK_MAX || inode->i_nlink == 2) {
  +           inode->i_nlink = 1;
  +           EXT4_SET_RO_COMPAT_FEATURE(inode->i_sb,
  +                   EXT4_FEATURE_RO_COMPAT_DIR_NLINK);
  +       }
  +   }
  +}
 
 Looks too big to be inlined.
 
 Why do we set EXT4_FEATURE_RO_COMPAT_DIR_NLINK if i_nlink==2?

I have added a comment for this. (since it indicates that nlinks==1
previously).

 
  +static inline void ext4_dec_count(handle_t *handle, struct inode *inode)
  +{
  +   drop_nlink(inode);
  +   if (S_ISDIR(inode->i_mode) && inode->i_nlink == 0)
  +       inc_nlink(inode);
  +}
 
 Probably too big to inline.

Removed the inline.

   
  -   if (inode->i_nlink >= EXT4_LINK_MAX)
  +   if (EXT4_DIR_LINK_MAX(inode))
  return -EMLINK;
 
 argh.  WHY_IS_EXT4_FULL_OF_UPPER_CASE_MACROS_WHICH_COULD_BE_IMPLEMENTED
 as_lower_case_inlines?  Sigh.  It's all the old-timers, I guess.
 
 EXT4_DIR_LINK_MAX() is buggy: it evaluates its arg twice.

#define EXT4_DIR_LINK_MAX(dir) (!is_dx(dir) && (dir)->i_nlink >= EXT4_LINK_MAX)

This just checks if directory has hash indexing in which case we need not worry 
about EXT4_LINK_MAX subdir limit. If directory is not hash indexed then we will 
need to enforce a max subdir limit. 

Sorry, I didn't understand what is the problem with this macro?

Thanks,
Kalpak.
This patch adds support to ext4 for allowing more than 65000
subdirectories. Currently the maximum number of subdirectories is capped
at 32000.

If we exceed 65000 subdirectories in an htree directory it sets the
inode link count to 1 and no longer counts subdirectories.  The
directory link count is not actually used when determining if a
directory is empty, as that only counts subdirectories and not regular
files that might be in there. 

A EXT4_FEATURE_RO_COMPAT_DIR_NLINK flag has been added and it is set if
the subdir count for any directory crosses 65000. A later fsck will clear
EXT4_FEATURE_RO_COMPAT_DIR_NLINK if there are no longer any directories
with more than 65000 subdirs.
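
The link-count behaviour described above can be modelled in userspace. The following is a minimal sketch, not the kernel code: struct dir_model, model_inc_count() and model_dec_count() are hypothetical names, and LINK_MAX is assumed to be 65000 as the description implies.

```c
#include <stdbool.h>

#define LINK_MAX 65000   /* assumed value of EXT4_LINK_MAX with this patch */

/* Hypothetical stand-in for the fields of a directory inode used here. */
struct dir_model {
    unsigned int nlink;
    bool indexed;        /* true for an htree (is_dx) directory */
    bool dir_nlink_flag; /* models EXT4_FEATURE_RO_COMPAT_DIR_NLINK */
};

/* Adding a subdirectory: stop counting once the limit is reached. */
void model_inc_count(struct dir_model *d)
{
    d->nlink++;
    if (d->indexed && d->nlink > 1) {
        /* nlink == 2 after the increment means it was pinned at 1 before */
        if (d->nlink >= LINK_MAX || d->nlink == 2) {
            d->nlink = 1;
            d->dir_nlink_flag = true;
        }
    }
}

/* Removing a subdirectory: a count pinned at 1 stays at 1. */
void model_dec_count(struct dir_model *d)
{
    d->nlink--;
    if (d->nlink == 0)
        d->nlink = 1;
}
```

Starting from nlink = LINK_MAX - 1 in an indexed directory, one more model_inc_count() pins the count at 1 and sets the flag; further increments and decrements leave it at 1.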

Signed-off-by: Andreas Dilger [EMAIL PROTECTED]
Signed-off-by: Kalpak Shah [EMAIL PROTECTED]


---
 fs/ext4/namei.c |   52 +++-
 include/linux/ext4_fs.h |4 ++-
 2 files changed, 41 insertions(+), 15 deletions(-)

Index: linux-2.6.22/fs/ext4/namei.c
===
--- linux-2.6.22.orig/fs/ext4/namei.c
+++ linux-2.6.22/fs/ext4/namei.c
@@ -1617,6 +1617,35 @@ static int ext4_delete_entry (handle_t *
 	return -ENOENT;
 }
 
+/*
+ * DIR_NLINK feature is set if 1) nlinks > EXT4_LINK_MAX or 2) nlinks == 2,
+ * since this indicates that nlinks count was previously 1.
+ */
+static void ext4_inc_count(handle_t *handle, struct inode *inode)
+{
+	inc_nlink(inode);
+	if (is_dx(inode) && inode->i_nlink > 1) {
+		/* limit is 16-bit i_links_count */
+		if (inode->i_nlink >= EXT4_LINK_MAX || inode->i_nlink == 2) {
+			inode->i_nlink = 1;
+			EXT4_SET_RO_COMPAT_FEATURE(inode->i_sb,
+					EXT4_FEATURE_RO_COMPAT_DIR_NLINK);
+		}
+	}
+}
+
+/*
+ * If a directory had nlink == 1, then we should let it be 1. This indicates
+ * directory has EXT4_LINK_MAX subdirs.
+ */
+static void ext4_dec_count(handle_t *handle, struct inode *inode)
+{
+	drop_nlink(inode);
+	if (S_ISDIR(inode->i_mode) && inode->i_nlink == 0)
+		inc_nlink(inode);
+}
+
+
 static int ext4_add_nondir(handle_t *handle,
 		struct dentry *dentry, struct inode *inode)
 {
@@ -1713,7 +1742,7 @@ static int ext4_mkdir(struct inode * dir
 	struct ext4_dir_entry_2 * de;
 	int err, retries = 0;
 
-	if (dir->i_nlink >= EXT4_LINK_MAX)
+	if (EXT4_DIR_LINK_MAX(dir))
 		return -EMLINK;
 
 retry:
@@ -1736,7 +1765,7 @@ retry:
 	inode->i_size = EXT4_I(inode)->i_disksize = inode->i_sb->s_blocksize;
 	dir_block = ext4_bread (handle, inode, 0, 1, &err);
 	if (!dir_block) {
-		drop_nlink(inode); /* is this nlink == 0? */
+		ext4_dec_count(handle, inode); /* is this nlink == 0? */

Re: [EXT4 set 5][PATCH 1/1] expand inode i_extra_isize to support features in larger inode

2007-07-13 Thread Andrew Morton
On Tue, 10 Jul 2007 16:32:47 -0700 Andrew Morton [EMAIL PROTECTED] wrote:

  +   brelse(bh);
  +   up_write(&EXT4_I(inode)->xattr_sem);
  +   return error;
  +}
  +
 
 We're doing GFP_KERNEL memory allocations while holding xattr_sem.  This
 can cause the VM to reenter the filesystem, perhaps taking i_mutex and/or
 i_truncate_sem and/or journal_start() (I forget whether this still
 happens).  Have we checked whether this can occur and if so, whether we are
 OK from a lock ranking POV?  Bear in mind that journalled-data mode is more
 complex in this regard.

I notice that everyone carefully avoided addressing this ;)

Oh well, hopefully people are testing with lockdep enabled.  As long
as the fs is put under extreme memory pressure, most bugs should be reported.

Except lockdep doesn't know about journal_start(), which has ranking
requirements similar to a semaphore.  Nor does it know about lock_page().
We already have hard-to-hit but deadlockable bugs in this area.




Re: [PATCH 2/6][TAKE7] fallocate() implementation in i386, x86_64 and powerpc

2007-07-13 Thread Christoph Hellwig
On Fri, Jul 13, 2007 at 07:48:58PM +0530, Amit K. Arora wrote:
 Ok. Since we have only one flag (FALLOC_FL_KEEP_SIZE) and we do not want
 to declare the default mode (FALLOC_ALLOCATE), we can _just_ have this
 flag and remove the other mode too (FALLOC_RESV_SPACE).
 Is this what you are suggesting ?

Yes.

 Should we need a header file just to declare one flag - i.e.
 FALLOC_FL_KEEP_SIZE (since now there is no point of declaring the two
 modes) ? If linux/fs.h is not a good place, will asm-generic/fcntl.h
 be a sane place for this flag ?

It might sound a little silly but is the cleanest thing we could do by
far.  And I suspect there will be more flags soon..



Re: [PATCH 1/6][TAKE7] manpage for fallocate

2007-07-13 Thread Amit K. Arora
On Sat, Jul 14, 2007 at 12:06:51AM +1000, David Chinner wrote:
 On Fri, Jul 13, 2007 at 06:16:01PM +0530, Amit K. Arora wrote:
  Following is the modified version of the manpage originally submitted by
  David Chinner. Please use `nroff -man fallocate.2 | less` to view.
  
  This includes changes suggested by Heikki Orsila and Barry Naujok.
 
 Can we get itemised change logs for all these patches from now on?

Sure.
 
  .TH fallocate 2
  .SH NAME
  fallocate \- allocate or remove file space
 
 If fallocate is just being used for allocating space this is wrong.
 maybe - manipulate file space instead?

Yes, it needs to be changed.
 
 dd .TP
  .B FALLOC_RESV_SPACE
  provides the same functionality as
  .B FALLOC_ALLOCATE
  except it does not ever change the file size. This allows allocation
  of zero blocks beyond the end of file and is useful for optimising
 
 of zeroed blocks

Ok.

--
Regards,
Amit Arora

 -- 
 Dave Chinner
 Principal Engineer
 SGI Australian Software Group


Re: [PATCH 3/6][TAKE7] revalidate write permissions for fallocate

2007-07-13 Thread Amit K. Arora
On Fri, Jul 13, 2007 at 02:21:37PM +0100, Christoph Hellwig wrote:
 On Fri, Jul 13, 2007 at 06:18:47PM +0530, Amit K. Arora wrote:
  From: David P. Quigley [EMAIL PROTECTED]
  
  Revalidate the write permissions for fallocate(2), in case security policy 
  has
  changed since the files were opened.
  
  Acked-by: James Morris [EMAIL PROTECTED]
  Signed-off-by: David P. Quigley [EMAIL PROTECTED]
 
 This should be merged into the main falloc patch.

Ok. Will merge it...

--
Regards,
Amit Arora


Re: [EXT4 set 5][PATCH 1/1] expand inode i_extra_isize to support features in larger inode

2007-07-13 Thread Andreas Dilger
On Jul 13, 2007  15:33 +0200, Peter Zijlstra wrote:
 On Fri, 2007-07-13 at 02:05 -0700, Andrew Morton wrote:
 Or can journal_stop() be done by a different task than the one that did
 journal_start()? - in which case nothing much can be done :-/

The call to journal_stop() has to be in the same process, since the
journal handle is also held in current->journal_info so the handle
does not need to be passed as an argument all over the VFS.

 This seems to boot... albeit I did not push it hard.

Can you please also make a patch for jbd2.


Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.



Re: [EXT4 set 7][PATCH 1/1]Remove 32000 subdirs limit.

2007-07-13 Thread Andrew Morton
On Fri, 13 Jul 2007 16:00:48 +0530 Kalpak Shah [EMAIL PROTECTED] wrote:


   - if (inode->i_nlink >= EXT4_LINK_MAX)
   + if (EXT4_DIR_LINK_MAX(inode))
 return -EMLINK;
  
  argh.  WHY_IS_EXT4_FULL_OF_UPPER_CASE_MACROS_WHICH_COULD_BE_IMPLEMENTED
  as_lower_case_inlines?  Sigh.  It's all the old-timers, I guess.
  
  EXT4_DIR_LINK_MAX() is buggy: it evaluates its arg twice.
 
 #define EXT4_DIR_LINK_MAX(dir) (!is_dx(dir) && (dir)->i_nlink >= EXT4_LINK_MAX)
 
 This just checks if directory has hash indexing in which case we need not 
 worry about EXT4_LINK_MAX subdir limit. If directory is not hash indexed then 
 we will need to enforce a max subdir limit. 
 
 Sorry, I didn't understand what is the problem with this macro?

Macros should never evaluate their argument more than once, because if they
do they will misbehave when someone passes them an
expression-with-side-effects:

struct inode *p = q;

EXT4_DIR_LINK_MAX(p++);

one expects `p' to have the value q+1 here.  But it might be q+2.

and

EXT4_DIR_LINK_MAX(some_function());

might cause some_function() to be called twice.


This is one of the many problems which gets fixed when we write code in C
rather than in cpp.
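
The double evaluation can be demonstrated in plain userspace C. This is a sketch under stated assumptions: struct dir_model, LINK_MAX, lookup() and the two counting helpers are hypothetical stand-ins, and the macro only mimics the shape of EXT4_DIR_LINK_MAX(), it is not the kernel definition.

```c
#include <stdbool.h>

#define LINK_MAX 65000            /* stand-in for EXT4_LINK_MAX */

struct dir_model {                /* hypothetical stand-in for struct inode */
    unsigned int nlink;
    bool indexed;                 /* plays the role of is_dx() */
};

/* Macro form: `dir` appears twice, so its argument can be evaluated twice. */
#define DIR_LINK_MAX_MACRO(dir) \
    (!(dir)->indexed && (dir)->nlink >= LINK_MAX)

/* Inline form: the argument is evaluated exactly once, at the call site. */
static inline bool dir_link_max(const struct dir_model *dir)
{
    return !dir->indexed && dir->nlink >= LINK_MAX;
}

static int calls;                 /* counts how often lookup() ran */

/* An argument expression with a side effect. */
static struct dir_model *lookup(struct dir_model *d)
{
    calls++;
    return d;
}

/* How many times lookup() runs for one use of the macro. */
int macro_evaluations(void)
{
    struct dir_model d = { .nlink = 2, .indexed = false };

    calls = 0;
    (void)DIR_LINK_MAX_MACRO(lookup(&d));
    return calls;
}

/* How many times lookup() runs for one call of the inline function. */
int inline_evaluations(void)
{
    struct dir_model d = { .nlink = 2, .indexed = false };

    calls = 0;
    (void)dir_link_max(lookup(&d));
    return calls;
}
```

For a non-indexed directory the macro runs lookup() twice and the inline function runs it once. For an indexed directory the && short-circuits and the macro evaluates its argument only once, which is why the count "might be" q+2 rather than always.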


[PATCH 0/5][TAKE8] fallocate system call

2007-07-13 Thread Amit K. Arora
This is the latest fallocate patchset and is based on 2.6.22.

* Following are the changes from TAKE7:
1) Updated the man page.
2) Merged revalidate write permissions patch with the main falloc patch.
3) Added linux/falloc.h and moved FALLOC_FL_KEEP_SIZE flag to it.
   Also removed the two modes (FALLOC_ALLOCATE and FALLOC_RESV_SPACE).
4) Removed comment above sys_fallocate definition.
5) Updated the testcase below to use FALLOC_FL_KEEP_SIZE flag instead
   of previous two modes.

* Following are the changes from TAKE6:
1) We now just have two modes (and no deallocation modes).
2) Updated the man page
3) Added a new patch submitted by David P. Quigley  (Patch 3/6).
4) Used EXT_INIT_MAX_LEN instead of 0x8000 in Patch 6/6.
5) Included below in the end is a small testcase to test fallocate.


* Following are the changes from TAKE5 to TAKE6:
1) Rebased to 2.6.22
2) Added compat wrapper for x86_64
3) Dropped s390 and ia64 patches, since the platform maintainers can
   add the support for fallocate once it is in mainline.
4) Added a change suggested by Andreas for better extent-to-group
   alignment in ext4 (Patch 6/6). Please refer following post:
http://www.mail-archive.com/[EMAIL PROTECTED]/msg02445.html
5) Renamed mode flags and values from FA_ to FALLOC_
6) Added manpage (updated version of the one initially submitted by
   David Chinner).


Todos:
-
1) Implementation on other architectures (other than i386, x86_64,
   and ppc64). s390(x) and ia64 patches are ready and will be pushed
   by platform maintainers when the fallocate is in mainline.
2) A generic file system operation to handle fallocate
   (generic_fallocate), for filesystems that do _not_ have the fallocate
   inode operation implemented.
3) Changes to glibc,
   a) to support fallocate() system call
   b) to make posix_fallocate() and posix_fallocate64() call fallocate()
4) Patch to e2fsprogs to recognize and display uninitialized extents.


Following patches follow:
Patch 1/5 : manpage for fallocate
Patch 2/5 : fallocate() implementation in i386, x86_64 and powerpc
Patch 3/5 : ext4: fallocate support in ext4
Patch 4/5 : ext4: write support for preallocated blocks
Patch 5/5 : ext4: change for better extent-to-group alignment

**
Attached below is a small testcase to test fallocate. The __NR_fallocate will
need to be changed depending on the system call number in the kernel (it may
get changed due to merge) and also depending on the architecture.

--
Regards,
Amit Arora



#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <errno.h>
#include <unistd.h>   /* syscall(), close(), lseek(), write() */

#include <linux/unistd.h>
#include <sys/vfs.h>
#include <sys/stat.h>

#define VERBOSE 0

#define __NR_fallocate 324

#define FALLOC_FL_KEEP_SIZE 0x01

int do_fallocate(int fd, int mode, loff_t offset, loff_t len)
{
  int ret;

  if (VERBOSE)
    printf("Trying to preallocate blocks (offset=%llu, len=%llu)\n",
           offset, len);
  ret = syscall(__NR_fallocate, fd, mode, offset, len);

  if (ret < 0) {
    printf("SYSCALL: received error %d, ret=%d\n", errno, ret);
    close(fd);
    return(1);
  }

  if (VERBOSE)
    printf("fallocate system call succeeded !  ret=%d\n", ret);

  return ret;
}

int test_fallocate(int fd, int mode, loff_t offset, loff_t len)
{
  int ret, blocks;
  struct stat statbuf1, statbuf2;

  fstat(fd, &statbuf1);

  ret = do_fallocate(fd, mode, offset, len);

  fstat(fd, &statbuf2);

  /* check file size after preallocation */
  if (!mode) {
    if (!ret && statbuf1.st_size < (offset + len) &&
        statbuf2.st_size != (offset + len)) {
      printf("Error: fallocate succeeded, but the file size did not "
             "change, where it should have!\n");
      ret = 1;
    }
  } else if (statbuf1.st_size != statbuf2.st_size) {
    printf("Error : File size changed, when it should not have!\n");
    ret = 1;
  }

  blocks = ((statbuf2.st_blocks - statbuf1.st_blocks) * 512) /
           statbuf2.st_blksize;

  /* Print report; the casts keep the format specifiers and arguments in step */
  printf("# FALLOCATE TEST REPORT #\n");
  printf("\tNew blocks preallocated = %d.\n", blocks);
  printf("\tNumber of bytes preallocated = %ld\n",
         (long)(blocks * statbuf2.st_blksize));
  printf("\tOld file size = %lld, New file size %lld.\n",
         (long long)statbuf1.st_size, (long long)statbuf2.st_size);
  printf("\tOld num blocks = %lld, New num blocks %lld.\n",
         (long long)(statbuf1.st_blocks * 512)/1024,
         (long long)(statbuf2.st_blocks * 512)/1024);

  return ret;
}


int do_write(int fd, loff_t offset, loff_t len)
{
  int ret;
  char *buf;

  buf = (char *)malloc(len);
  if (!buf) {
    printf("error: malloc failed.\n");
    return(-1);
  }

  if (VERBOSE)
    printf("Trying to write to file (offset=%llu, len=%llu)\n",
           offset, len);

  ret = lseek(fd, offset, SEEK_SET);
  if (ret != offset) {
    printf("lseek() failed error=%d, ret=%d\n", errno, ret);
    close(fd);
    return(-1);
  }

  ret = write(fd, buf, len);
  if (ret != len) {
    printf("write() failed error=%d, ret=%d\n", errno, ret);

[PATCH 1/5][TAKE8] manpage for fallocate

2007-07-13 Thread Amit K. Arora
Following is the modified version of the manpage originally submitted by
David Chinner. Please use `nroff -man fallocate.2 | less` to view.

Following changed from TAKE7:
* Removed FALLOC_ALLOCATE and FALLOC_RESV_SPACE modes.
* Described only single flag for mode, i.e. FALLOC_FL_KEEP_SIZE.
* s/zero blocks/zeroed blocks/ as suggested by Dave.
* Included linux/falloc.h instead of fcntl.h.

Following changed from TAKE6 to TAKE7:
Included changes suggested by Heikki Orsila and Barry Naujok.


.TH fallocate 2
.SH NAME
fallocate \- manipulate file space
.SH SYNOPSIS
.nf
.B #include <linux/falloc.h>
.PP
.BI "long fallocate(int " fd ", int " mode ", loff_t " offset ", loff_t " len ");"
.SH DESCRIPTION
The
.B fallocate
syscall allows a user to directly manipulate the allocated disk space
for the file referred to by
.I fd
for the byte range starting at
.I offset
and continuing for
.I len
bytes.
The
.I mode
parameter determines the operation to be performed on the given range.
Currently there is only one flag supported for the mode argument.
.TP
.B FALLOC_FL_KEEP_SIZE
allocates and initialises to zero the disk space within the given range.
After a successful call, subsequent writes are guaranteed not to fail because
of lack of disk space.  Even if the size of the file is less than
.IR offset + len ,
the file size is not changed. This allows allocation of zeroed blocks beyond
the end of file and is useful for optimising append workloads.
.PP
If
.B FALLOC_FL_KEEP_SIZE
flag is not specified in the mode argument, the default behavior of this system
call is almost the same as when this flag is passed. The only difference is that
on success, the file size will be changed if the
.IR offset + len
is greater than the file size. This default behavior closely resembles
.BR posix_fallocate (3)
and is intended as a method of optimally implementing this function.
.PP
.B fallocate
may allocate a larger range than was specified.
.SH RETURN VALUE
.B fallocate
returns zero on success, or an error number on failure.
Note that
.I errno
is not set.
.SH ERRORS
.TP
.B EBADF
.I fd
is not a valid file descriptor, or is not opened for writing.
.TP
.B EFBIG
.IR offset + len
exceeds the maximum file size.
.TP
.B EINVAL
.I offset
was less than 0, or
.I len
was less than or equal to 0.
.TP
.B ENODEV
.I fd
does not refer to a regular file or a directory.
.TP
.B ENOSPC
There is not enough space left on the device containing the file
referred to by
.IR fd .
.TP
.B ESPIPE
.I fd
refers to a pipe or FIFO.
.TP
.B ENOSYS
The filesystem underlying the file descriptor does not support this
operation.
.TP
.B EINTR
A signal was caught during execution.
.TP
.B EIO
An I/O error occurred while reading from or writing to a file system.
.TP
.B EOPNOTSUPP
The mode is not supported on the file descriptor.
.SH AVAILABILITY
The
.B fallocate
system call is available since 2.6.XX
.SH SEE ALSO
.BR posix_fallocate (3),
.BR posix_fadvise (3),
.BR ftruncate (3).
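
A short usage sketch of the FALLOC_FL_KEEP_SIZE behaviour described in the manpage follows. It issues the system call directly through syscall(2), as the test case in this thread does, so failures show up as -1 with errno set rather than in the return convention proposed above. The helper names reserve_keep_size() and size_after_reserve() are made up for illustration, and the sketch assumes a 64-bit build where an off_t fits in one syscall argument and where SYS_fallocate is defined by the installed headers.

```c
#include <errno.h>
#include <linux/falloc.h>   /* FALLOC_FL_KEEP_SIZE */
#include <stdio.h>
#include <sys/stat.h>
#include <sys/syscall.h>    /* SYS_fallocate */
#include <sys/types.h>
#include <unistd.h>         /* syscall() */

/*
 * Reserve len bytes starting at offset without changing the file size.
 * Returns 0 on success or a negative errno value (hypothetical helper).
 */
int reserve_keep_size(int fd, off_t offset, off_t len)
{
    if (syscall(SYS_fallocate, fd, FALLOC_FL_KEEP_SIZE, offset, len) == -1)
        return -errno;
    return 0;
}

/*
 * Reserve len bytes in an empty temporary file and report the file size
 * afterwards.  Returns -1 if the temporary file cannot be created.
 */
long long size_after_reserve(off_t len)
{
    FILE *tmp = tmpfile();
    struct stat st;
    long long size = -1;

    if (!tmp)
        return -1;
    /* The file system may not support the call; the size is unchanged either way. */
    (void)reserve_keep_size(fileno(tmp), 0, len);
    if (fstat(fileno(tmp), &st) == 0)
        size = (long long)st.st_size;
    fclose(tmp);
    return size;
}
```

Because the flag keeps the size, size_after_reserve() reports a size of zero for a fresh file even when the reservation succeeded; the allocated space shows up in st_blocks instead.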


[PATCH 2/5][TAKE8] fallocate() implementation in i386, x86_64 and powerpc

2007-07-13 Thread Amit K. Arora
From: Amit Arora [EMAIL PROTECTED]

sys_fallocate() implementation on i386, x86_64 and powerpc

fallocate() is a new system call being proposed here which will allow
applications to preallocate space to any file(s) in a file system.
Each file system implementation that wants to use this feature will need
to support an inode operation called ->fallocate().
Applications can use this feature to avoid fragmentation to a certain
level and thus get faster access speed. With preallocation, applications
also get a guarantee of space for particular file(s), even if the
system later becomes full.

Currently, glibc provides an interface called posix_fallocate() which
can be used for a similar purpose. Though this has the advantage of working
on all file systems, it is quite slow (since it writes zeroes to
each block that has to be preallocated). Without a doubt, file systems
can do this more efficiently within the kernel, by implementing
the proposed fallocate() system call. It is expected that
posix_fallocate() will be modified to call this new system call first
and, in case the kernel/filesystem does not implement it, fall
back to the current implementation of writing zeroes to the new blocks.
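
The "try the system call first, then fall back" behaviour expected of posix_fallocate() could look roughly like the sketch below. It is not glibc's implementation: ensure_space(), zero_fill_past_eof() and size_after_ensure() are hypothetical names, the fallback only zero-fills the part of the range beyond the current end of file (holes inside the existing file are left alone), and a 64-bit build with SYS_fallocate defined is assumed.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/syscall.h>    /* SYS_fallocate */
#include <sys/types.h>
#include <unistd.h>         /* syscall(), pwrite() */

/* Slow path: extend the file by writing zeroes past the current EOF. */
static int zero_fill_past_eof(int fd, off_t offset, off_t len)
{
    char zeroes[4096];
    struct stat st;
    off_t pos, end = offset + len;

    if (fstat(fd, &st) == -1)
        return -errno;
    memset(zeroes, 0, sizeof(zeroes));
    /* Never overwrite existing data: start at EOF or offset, whichever is later. */
    pos = st.st_size > offset ? st.st_size : offset;
    while (pos < end) {
        size_t chunk = (size_t)(end - pos) < sizeof(zeroes)
                           ? (size_t)(end - pos) : sizeof(zeroes);
        ssize_t n = pwrite(fd, zeroes, chunk, pos);

        if (n == -1)
            return -errno;
        pos += n;
    }
    return 0;
}

/* Fast path first; fall back only when the call itself is unavailable. */
int ensure_space(int fd, off_t offset, off_t len)
{
    if (syscall(SYS_fallocate, fd, 0, offset, len) == 0)
        return 0;
    if (errno != ENOSYS && errno != EOPNOTSUPP)
        return -errno;
    return zero_fill_past_eof(fd, offset, len);
}

/* Usage: grow an empty temporary file and report its size, or -1 on error. */
long long size_after_ensure(off_t len)
{
    FILE *tmp = tmpfile();
    struct stat st;
    long long size = -1;

    if (!tmp)
        return -1;
    if (ensure_space(fileno(tmp), 0, len) == 0 &&
        fstat(fileno(tmp), &st) == 0)
        size = (long long)st.st_size;
    fclose(tmp);
    return size;
}
```

Either path leaves an empty file with the requested size, so size_after_ensure(8192) is expected to report 8192 whether or not the underlying file system implements the new call.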
ToDos:
1. Implementation on other architectures (other than i386, x86_64,
   and ppc). Patches for s390(x) and ia64 are already available from
   previous posts, but it was decided that they should be added later
   once fallocate is in the mainline. Hence not including those patches
   in this take.
2. Changes to glibc,
   a) to support fallocate() system call
   b) to make posix_fallocate() and posix_fallocate64() call fallocate()

CHANGELOG:
-
Following changed from TAKE7:
1. Added linux/falloc.h and moved FALLOC_FL_KEEP_SIZE flag to it.
2. Removed the two modes (FALLOC_ALLOCATE and FALLOC_RESV_SPACE).
3. Merged revalidate write permissions patch from David P. Quigley
   to this patch.
4. Deleted comment above sys_fallocate definition, as suggested by Christoph.


Signed-off-by: Amit Arora [EMAIL PROTECTED]

Index: linux-2.6.22/arch/i386/kernel/syscall_table.S
===
--- linux-2.6.22.orig/arch/i386/kernel/syscall_table.S
+++ linux-2.6.22/arch/i386/kernel/syscall_table.S
@@ -323,3 +323,4 @@ ENTRY(sys_call_table)
.long sys_signalfd
.long sys_timerfd
.long sys_eventfd
+   .long sys_fallocate
Index: linux-2.6.22/arch/powerpc/kernel/sys_ppc32.c
===
--- linux-2.6.22.orig/arch/powerpc/kernel/sys_ppc32.c
+++ linux-2.6.22/arch/powerpc/kernel/sys_ppc32.c
@@ -773,6 +773,13 @@ asmlinkage int compat_sys_truncate64(con
    return sys_truncate(path, (high << 32) | low);
 }
 
+asmlinkage long compat_sys_fallocate(int fd, int mode, u32 offhi, u32 offlo,
+                                     u32 lenhi, u32 lenlo)
+{
+   return sys_fallocate(fd, mode, ((loff_t)offhi << 32) | offlo,
+                        ((loff_t)lenhi << 32) | lenlo);
+}
+
 asmlinkage int compat_sys_ftruncate64(unsigned int fd, u32 reg4, unsigned long 
high,
 unsigned long low)
 {
Index: linux-2.6.22/arch/x86_64/ia32/ia32entry.S
===
--- linux-2.6.22.orig/arch/x86_64/ia32/ia32entry.S
+++ linux-2.6.22/arch/x86_64/ia32/ia32entry.S
@@ -719,4 +719,5 @@ ia32_sys_call_table:
.quad compat_sys_signalfd
.quad compat_sys_timerfd
.quad sys_eventfd
+   .quad sys32_fallocate
 ia32_syscall_end:
Index: linux-2.6.22/fs/open.c
===
--- linux-2.6.22.orig/fs/open.c
+++ linux-2.6.22/fs/open.c
@@ -26,6 +26,7 @@
 #include <linux/syscalls.h>
 #include <linux/rcupdate.h>
 #include <linux/audit.h>
+#include <linux/falloc.h>
 
 int vfs_statfs(struct dentry *dentry, struct kstatfs *buf)
 {
@@ -352,6 +353,64 @@ asmlinkage long sys_ftruncate64(unsigned
 }
 #endif
 
+asmlinkage long sys_fallocate(int fd, int mode, loff_t offset, loff_t len)
+{
+   struct file *file;
+   struct inode *inode;
+   long ret = -EINVAL;
+
+   if (offset < 0 || len <= 0)
+       goto out;
+
+   /* Return error if mode is not supported */
+   ret = -EOPNOTSUPP;
+   if (mode && !(mode & FALLOC_FL_KEEP_SIZE))
+       goto out;
+
+   ret = -EBADF;
+   file = fget(fd);
+   if (!file)
+       goto out;
+   if (!(file->f_mode & FMODE_WRITE))
+       goto out_fput;
+   /*
+    * Revalidate the write permissions, in case security policy has
+    * changed since the files were opened.
+    */
+   ret = security_file_permission(file, MAY_WRITE);
+   if (ret)
+       goto out_fput;
+
+   inode = file->f_path.dentry->d_inode;
+
+   ret = -ESPIPE;
+   if (S_ISFIFO(inode->i_mode))
+       goto out_fput;
+
+   ret = -ENODEV;
+   /*
+    * Let individual file system 

[PATCH 3/5][TAKE8] ext4: fallocate support in ext4

2007-07-13 Thread Amit K. Arora
From: Amit Arora [EMAIL PROTECTED]

fallocate support in ext4

This patch implements the ->fallocate() inode operation in ext4. With this
patch, users of ext4 file systems will be able to use the fallocate() system
call for persistent preallocation. The current implementation only supports
preallocation for regular files (directories are not supported as of date)
with extent maps. This patch does not currently support block-mapped files.
Only FALLOC_ALLOCATE and FALLOC_RESV_SPACE modes are being supported as of
now.

CHANGELOG:
-
Following changed from TAKE7:
1. Removed usage of FALLOC_ALLOCATE and FALLOC_RESV_SPACE modes and
   used FALLOC_FL_KEEP_SIZE mode flag instead.
2. Included the new header file linux/falloc.h, which defines the above flag.


Signed-off-by: Amit Arora [EMAIL PROTECTED]

Index: linux-2.6.22/fs/ext4/extents.c
===
--- linux-2.6.22.orig/fs/ext4/extents.c
+++ linux-2.6.22/fs/ext4/extents.c
@@ -39,6 +39,7 @@
 #include <linux/quotaops.h>
 #include <linux/string.h>
 #include <linux/slab.h>
+#include <linux/falloc.h>
 #include <linux/ext4_fs_extents.h>
 #include <asm/uaccess.h>
 
@@ -282,7 +283,7 @@ static void ext4_ext_show_path(struct in
} else if (path->p_ext) {
ext_debug("  %d:%d:%llu ",
  le32_to_cpu(path->p_ext->ee_block),
- le16_to_cpu(path->p_ext->ee_len),
+ ext4_ext_get_actual_len(path->p_ext),
  ext_pblock(path->p_ext));
} else
ext_debug("  []");
@@ -305,7 +306,7 @@ static void ext4_ext_show_leaf(struct in
 
for (i = 0; i < le16_to_cpu(eh->eh_entries); i++, ex++) {
ext_debug("%d:%d:%llu ", le32_to_cpu(ex->ee_block),
- le16_to_cpu(ex->ee_len), ext_pblock(ex));
+ ext4_ext_get_actual_len(ex), ext_pblock(ex));
}
ext_debug("\n");
 }
@@ -425,7 +426,7 @@ ext4_ext_binsearch(struct inode *inode, 
ext_debug("  -> %d:%llu:%d ",
le32_to_cpu(path->p_ext->ee_block),
ext_pblock(path->p_ext),
-   le16_to_cpu(path->p_ext->ee_len));
+   ext4_ext_get_actual_len(path->p_ext));
 
 #ifdef CHECK_BINSEARCH
{
@@ -686,7 +687,7 @@ static int ext4_ext_split(handle_t *hand
ext_debug("move %d:%llu:%d in new leaf %llu\n",
le32_to_cpu(path[depth].p_ext->ee_block),
ext_pblock(path[depth].p_ext),
-   le16_to_cpu(path[depth].p_ext->ee_len),
+   ext4_ext_get_actual_len(path[depth].p_ext),
newblock);
/*memmove(ex++, path[depth].p_ext++,
sizeof(struct ext4_extent));
@@ -1106,7 +1107,19 @@ static int
 ext4_can_extents_be_merged(struct inode *inode, struct ext4_extent *ex1,
struct ext4_extent *ex2)
 {
-   if (le32_to_cpu(ex1->ee_block) + le16_to_cpu(ex1->ee_len) !=
+   unsigned short ext1_ee_len, ext2_ee_len;
+
+   /*
+* Make sure that either both extents are uninitialized, or
+* both are _not_.
+*/
+   if (ext4_ext_is_uninitialized(ex1) ^ ext4_ext_is_uninitialized(ex2))
+   return 0;
+
+   ext1_ee_len = ext4_ext_get_actual_len(ex1);
+   ext2_ee_len = ext4_ext_get_actual_len(ex2);
+
+   if (le32_to_cpu(ex1->ee_block) + ext1_ee_len !=
le32_to_cpu(ex2->ee_block))
return 0;
 
@@ -1115,14 +1128,14 @@ ext4_can_extents_be_merged(struct inode 
 * as an RO_COMPAT feature, refuse to merge to extents if
 * this can result in the top bit of ee_len being set.
 */
-   if (le16_to_cpu(ex1->ee_len) + le16_to_cpu(ex2->ee_len) > EXT_MAX_LEN)
+   if (ext1_ee_len + ext2_ee_len > EXT_MAX_LEN)
return 0;
 #ifdef AGGRESSIVE_TEST
if (le16_to_cpu(ex1->ee_len) >= 4)
return 0;
 #endif
 
-   if (ext_pblock(ex1) + le16_to_cpu(ex1->ee_len) == ext_pblock(ex2))
+   if (ext_pblock(ex1) + ext1_ee_len == ext_pblock(ex2))
return 1;
return 0;
 }
@@ -1144,7 +1157,7 @@ unsigned int ext4_ext_check_overlap(stru
unsigned int ret = 0;
 
b1 = le32_to_cpu(newext->ee_block);
-   len1 = le16_to_cpu(newext->ee_len);
+   len1 = ext4_ext_get_actual_len(newext);
depth = ext_depth(inode);
if (!path[depth].p_ext)
goto out;
@@ -1191,8 +1204,9 @@ int ext4_ext_insert_extent(handle_t *han
struct ext4_extent *nearex; /* nearest extent */
struct ext4_ext_path *npath = NULL;
int depth, len, err, next;
+   unsigned uninitialized = 0;
 
-   BUG_ON(newext->ee_len == 0);
+   BUG_ON(ext4_ext_get_actual_len(newext) == 0);
depth = ext_depth(inode);

[PATCH 4/5][TAKE8] ext4: write support for preallocated blocks

2007-07-13 Thread Amit K. Arora
From:  Amit Arora [EMAIL PROTECTED]

write support for preallocated blocks

This patch adds write support to the uninitialized extents that get
created when a preallocation is done using fallocate(). It takes care of
splitting the extents into multiple (up to three) extents and merging the
new split extents with neighbouring ones, if possible.
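
To make the "up to three extents" case concrete, here is a minimal user-space sketch of the split arithmetic only. It is not the patch's ext4_ext_convert_to_initialized(); the struct and function names are hypothetical, block numbers are plain unsigned ints, and the write is assumed to start inside the uninitialized extent (any part past the extent's end is clipped).

```c
#include <assert.h>

/* Hypothetical description of one resulting piece. */
struct sketch_piece {
	unsigned int start;	/* first logical block */
	unsigned int len;	/* number of blocks */
	int uninitialized;	/* 1 = still preallocated, 0 = written */
};

/*
 * Split the uninitialized extent [ext_start, ext_start + ext_len) around a
 * write covering [wr_start, wr_start + wr_len). Fills out[] (room for
 * three entries) and returns the number of pieces, 1 to 3.
 */
static int sketch_split_extent(unsigned int ext_start, unsigned int ext_len,
			       unsigned int wr_start, unsigned int wr_len,
			       struct sketch_piece out[3])
{
	unsigned int ext_end = ext_start + ext_len;
	unsigned int wr_end;
	int n = 0;

	assert(ext_len > 0 && wr_len > 0);
	assert(wr_start >= ext_start && wr_start < ext_end);

	/* Clip the write to the end of the extent. */
	wr_end = wr_start + wr_len;
	if (wr_end > ext_end)
		wr_end = ext_end;

	/* Blocks before the write stay uninitialized. */
	if (wr_start > ext_start) {
		out[n].start = ext_start;
		out[n].len = wr_start - ext_start;
		out[n].uninitialized = 1;
		n++;
	}

	/* The written range becomes an initialized extent. */
	out[n].start = wr_start;
	out[n].len = wr_end - wr_start;
	out[n].uninitialized = 0;
	n++;

	/* Blocks after the write stay uninitialized. */
	if (wr_end < ext_end) {
		out[n].start = wr_end;
		out[n].len = ext_end - wr_end;
		out[n].uninitialized = 1;
		n++;
	}

	return n;
}
```

A write covering the whole extent yields one piece, a write at either end yields two, and a write in the middle yields three. Merging the pieces with neighbouring extents is a separate step not modelled here.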

CHANGELOG:
-
This patch did not change from TAKE7 (besides offsets ;).


Signed-off-by: Amit Arora [EMAIL PROTECTED]

Index: linux-2.6.22/fs/ext4/extents.c
===
--- linux-2.6.22.orig/fs/ext4/extents.c
+++ linux-2.6.22/fs/ext4/extents.c
@@ -1141,6 +1141,53 @@ ext4_can_extents_be_merged(struct inode 
 }
 
 /*
+ * This function tries to merge the ex extent to the next extent in the tree.
+ * It always tries to merge towards right. If you want to merge towards
+ * left, pass ex - 1 as argument instead of ex.
+ * Returns 0 if the extents (ex and ex+1) were _not_ merged and returns
+ * 1 if they got merged.
+ */
+int ext4_ext_try_to_merge(struct inode *inode,
+                          struct ext4_ext_path *path,
+                          struct ext4_extent *ex)
+{
+   struct ext4_extent_header *eh;
+   unsigned int depth, len;
+   int merge_done = 0;
+   int uninitialized = 0;
+
+   depth = ext_depth(inode);
+   BUG_ON(path[depth].p_hdr == NULL);
+   eh = path[depth].p_hdr;
+
+   while (ex < EXT_LAST_EXTENT(eh)) {
+       if (!ext4_can_extents_be_merged(inode, ex, ex + 1))
+           break;
+       /* merge with next extent! */
+       if (ext4_ext_is_uninitialized(ex))
+           uninitialized = 1;
+       ex->ee_len = cpu_to_le16(ext4_ext_get_actual_len(ex)
+               + ext4_ext_get_actual_len(ex + 1));
+       if (uninitialized)
+           ext4_ext_mark_uninitialized(ex);
+
+       if (ex + 1 < EXT_LAST_EXTENT(eh)) {
+           len = (EXT_LAST_EXTENT(eh) - ex - 1)
+               * sizeof(struct ext4_extent);
+           memmove(ex + 1, ex + 2, len);
+       }
+       eh->eh_entries = cpu_to_le16(le16_to_cpu(eh->eh_entries) - 1);
+       merge_done = 1;
+       WARN_ON(eh->eh_entries == 0);
+       if (!eh->eh_entries)
+           ext4_error(inode->i_sb, "ext4_ext_try_to_merge",
+              "inode#%lu, eh->eh_entries = 0!", inode->i_ino);
+   }
+
+   return merge_done;
+}
+
+/*
  * check if a portion of the newext extent overlaps with an
  * existing extent.
  *
@@ -1328,25 +1375,7 @@ has_space:
 
 merge:
/* try to merge extents to the right */
-   while (nearex < EXT_LAST_EXTENT(eh)) {
-       if (!ext4_can_extents_be_merged(inode, nearex, nearex + 1))
-           break;
-       /* merge with next extent! */
-       if (ext4_ext_is_uninitialized(nearex))
-           uninitialized = 1;
-       nearex->ee_len = cpu_to_le16(ext4_ext_get_actual_len(nearex)
-               + ext4_ext_get_actual_len(nearex + 1));
-       if (uninitialized)
-           ext4_ext_mark_uninitialized(nearex);
-
-       if (nearex + 1 < EXT_LAST_EXTENT(eh)) {
-           len = (EXT_LAST_EXTENT(eh) - nearex - 1)
-               * sizeof(struct ext4_extent);
-           memmove(nearex + 1, nearex + 2, len);
-       }
-       eh->eh_entries = cpu_to_le16(le16_to_cpu(eh->eh_entries)-1);
-       BUG_ON(eh->eh_entries == 0);
-   }
+   ext4_ext_try_to_merge(inode, path, nearex);
 
/* try to merge extents to the left */
 
@@ -2012,15 +2041,158 @@ void ext4_ext_release(struct super_block
 #endif
 }
 
+/*
+ * This function is called by ext4_ext_get_blocks() if someone tries to write
+ * to an uninitialized extent. It may result in splitting the uninitialized
+ * extent into multiple extents (upto three - one initialized and two
+ * uninitialized).
+ * There are three possibilities:
+ *   a> There is no split required: Entire extent should be initialized
+ *   b> Splits in two extents: Write is happening at either end of the extent
+ *   c> Splits in three extents: Somone is writing in middle of the extent
+ */
+int ext4_ext_convert_to_initialized(handle_t *handle, struct inode *inode,
+   struct ext4_ext_path *path,
+   ext4_fsblk_t iblock,
+   unsigned long max_blocks)
+{
+   struct ext4_extent *ex, newex;
+   struct ext4_extent *ex1 = NULL;
+   struct ext4_extent *ex2 = NULL;
+   struct ext4_extent *ex3 = NULL;
+   struct ext4_extent_header *eh;
+   unsigned int allocated, ee_block, ee_len, depth;
+   ext4_fsblk_t newblock;
+   int err = 0;
+   int ret = 0;
+
+ 

[PATCH 5/5][TAKE8] ext4: change for better extent-to-group alignment

2007-07-13 Thread Amit K. Arora
From: Amit Arora [EMAIL PROTECTED]

Change on-disk format for extent to represent uninitialized/initialized extents

This change was suggested by Andreas Dilger. 
This patch changes the EXT_MAX_LEN value and extent code which marks/checks
uninitialized extents. With this change it will be possible to have
initialized extents with 2^15 blocks (earlier the max blocks we could have
was 2^15 - 1). This way we can have better extent-to-block alignment.
Now, maximum number of blocks we can have in an initialized extent is 2^15
and in an uninitialized extent is 2^15 - 1.
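
The length encoding described above can be illustrated with a small user-space sketch. It follows the description in this message (the top bit of the 16-bit length marks an uninitialized extent, except that the value 0x8000 alone means an initialized extent of 2^15 blocks). The sketch_* names are hypothetical, and endianness conversion (le16_to_cpu and friends) is left out.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_INIT_MAX_LEN	(1U << 15)			/* 32768 */
#define SKETCH_UNINIT_MAX_LEN	(SKETCH_INIT_MAX_LEN - 1)	/* 32767 */

/* An on-disk length above 0x8000 denotes an uninitialized extent. */
static int sketch_is_uninitialized(uint16_t ee_len)
{
	return ee_len > SKETCH_INIT_MAX_LEN;
}

/* Number of blocks covered, with the uninitialized marker removed. */
static unsigned int sketch_actual_len(uint16_t ee_len)
{
	return ee_len <= SKETCH_INIT_MAX_LEN ?
		ee_len : ee_len - SKETCH_INIT_MAX_LEN;
}

/* Encode an uninitialized extent of len blocks (1 to 32767). */
static uint16_t sketch_mark_uninitialized(uint16_t len)
{
	/* A zero-length uninitialized extent cannot be represented. */
	assert(len >= 1 && len <= SKETCH_UNINIT_MAX_LEN);
	return (uint16_t)(len | SKETCH_INIT_MAX_LEN);
}
```

Because a zero-length uninitialized extent is meaningless, the bit pattern 0x8000 is free to be reused for an initialized extent of exactly 32768 blocks, which is what gives initialized extents the extra block.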


CHANGELOG:
-
This patch did not change from TAKE7 (besides offsets ;).

Following changed from TAKE6 to TAKE7:
1. Taken care of Andreas's suggestion of using EXT_INIT_MAX_LEN instead of
   0x8000 at some places.

Signed-off-by: Amit Arora [EMAIL PROTECTED]

Index: linux-2.6.22/fs/ext4/extents.c
===
--- linux-2.6.22.orig/fs/ext4/extents.c
+++ linux-2.6.22/fs/ext4/extents.c
@@ -1107,7 +1107,7 @@ static int
 ext4_can_extents_be_merged(struct inode *inode, struct ext4_extent *ex1,
struct ext4_extent *ex2)
 {
-   unsigned short ext1_ee_len, ext2_ee_len;
+   unsigned short ext1_ee_len, ext2_ee_len, max_len;
 
/*
 * Make sure that either both extents are uninitialized, or
@@ -1116,6 +1116,11 @@ ext4_can_extents_be_merged(struct inode 
if (ext4_ext_is_uninitialized(ex1) ^ ext4_ext_is_uninitialized(ex2))
return 0;
 
+   if (ext4_ext_is_uninitialized(ex1))
+   max_len = EXT_UNINIT_MAX_LEN;
+   else
+   max_len = EXT_INIT_MAX_LEN;
+
ext1_ee_len = ext4_ext_get_actual_len(ex1);
ext2_ee_len = ext4_ext_get_actual_len(ex2);
 
@@ -1128,7 +1133,7 @@ ext4_can_extents_be_merged(struct inode 
 * as an RO_COMPAT feature, refuse to merge to extents if
 * this can result in the top bit of ee_len being set.
 */
-   if (ext1_ee_len + ext2_ee_len > EXT_MAX_LEN)
+   if (ext1_ee_len + ext2_ee_len > max_len)
return 0;
 #ifdef AGGRESSIVE_TEST
if (le16_to_cpu(ex1->ee_len) >= 4)
@@ -1815,7 +1820,11 @@ ext4_ext_rm_leaf(handle_t *handle, struc
 
ex->ee_block = cpu_to_le32(block);
ex->ee_len = cpu_to_le16(num);
-   if (uninitialized)
+   /*
+    * Do not mark uninitialized if all the blocks in the
+    * extent have been removed.
+    */
+   if (uninitialized && num)
ext4_ext_mark_uninitialized(ex);
 
err = ext4_ext_dirty(handle, inode, path + depth);
@@ -2308,6 +2317,19 @@ int ext4_ext_get_blocks(handle_t *handle
/* allocate new block */
goal = ext4_ext_find_goal(inode, path, iblock);
 
+   /*
+* See if request is beyond maximum number of blocks we can have in
+* a single extent. For an initialized extent this limit is
+* EXT_INIT_MAX_LEN and for an uninitialized extent this limit is
+* EXT_UNINIT_MAX_LEN.
+*/
+   if (max_blocks > EXT_INIT_MAX_LEN &&
+       create != EXT4_CREATE_UNINITIALIZED_EXT)
+       max_blocks = EXT_INIT_MAX_LEN;
+   else if (max_blocks > EXT_UNINIT_MAX_LEN &&
+            create == EXT4_CREATE_UNINITIALIZED_EXT)
+       max_blocks = EXT_UNINIT_MAX_LEN;
+
/* Check if we can really insert (iblock)::(iblock+max_blocks) extent */
newex.ee_block = cpu_to_le32(iblock);
newex.ee_len = cpu_to_le16(max_blocks);
Index: linux-2.6.22/include/linux/ext4_fs_extents.h
===
--- linux-2.6.22.orig/include/linux/ext4_fs_extents.h
+++ linux-2.6.22/include/linux/ext4_fs_extents.h
@@ -141,7 +141,25 @@ typedef int (*ext_prepare_callback)(stru
 
 #define EXT_MAX_BLOCK  0x
 
-#define EXT_MAX_LEN    ((1UL << 15) - 1)
+/*
+ * EXT_INIT_MAX_LEN is the maximum number of blocks we can have in an
+ * initialized extent. This is 2^15 and not (2^16 - 1), since we use the
+ * MSB of ee_len field in the extent datastructure to signify if this
+ * particular extent is an initialized extent or an uninitialized (i.e.
+ * preallocated).
+ * EXT_UNINIT_MAX_LEN is the maximum number of blocks we can have in an
+ * uninitialized extent.
+ * If ee_len is <= 0x8000, it is an initialized extent. Otherwise, it is an
+ * uninitialized one. In other words, if MSB of ee_len is set, it is an
+ * uninitialized extent with only one special scenario when ee_len = 0x8000.
+ * In this case we can not have an uninitialized extent of zero length and
+ * thus we make it as a special case of initialized extent with 0x8000 length.
+ * This way we get better extent-to-group alignment for initialized extents.
+ * Hence, the maximum number of blocks we can have in an *initialized*
+ * extent is 2^15 (32768) and in an *uninitialized* extent is 2^15-1 

Re: [EXT4 set 5][PATCH 1/1] expand inode i_extra_isize to support features in larger inode

2007-07-13 Thread Andrew Morton
On Fri, 13 Jul 2007 15:33:41 +0200
Peter Zijlstra [EMAIL PROTECTED] wrote:

> On Fri, 2007-07-13 at 02:05 -0700, Andrew Morton wrote:
> 
> > Except lockdep doesn't know about journal_start(), which has ranking
> > requirements similar to a semaphore.  
> 
> Something like so?

Looks OK.

> Or can journal_stop() be done by a different task than the one that did
> journal_start()? - in which case nothing much can be done :-/

Yeah, journal_start() and journal_stop() are well-behaved.
> 
> This seems to boot... albeit I did not push it hard.

I fear the consequences of this change :(

Oh well, please keep it alive, maybe beat on it a bit, resend it
later on?


Re: [EXT4 set 3][PATCH 1/1] ext4 nanosecond timestamp

2007-07-13 Thread Mingming Cao
On Fri, 2007-07-13 at 12:35 +0530, Kalpak Shah wrote:
 On Fri, 2007-07-13 at 09:59 +0530, Aneesh Kumar K.V wrote:
  
  Kalpak Shah wrote:
   On Tue, 2007-07-10 at 16:30 -0700, Andrew Morton wrote:
   On Sun, 01 Jul 2007 03:36:56 -0400
   Mingming Cao [EMAIL PROTECTED] wrote:
  
   This patch is a spinoff of the old nanosecond patches.
   I don't know what the old nanosecond patches are.  A link to a suitable
   changlog for those patches would do in a pinch.  Preferable would be to
   write a proper changelog for this patch.
   
   The incremental patch contains a proper changelog describing the patch.
   
  
  
  Instead of  putting incremental patches it would be nice if we can have 
  replacement patches.
  for the already existing patches with the comments addressed. For example 
  if we have a 
  review comment on the patch message ( commit log ) then adding an 
  incremental patch doesn't help.
 
 I think that it would be easier to review just the changes that have
 been made to the patches instead of having people go through the entire
 patch again. I was hoping that someone with write access to ext4-git
 would update the commit logs.
 
 If replacement patches are preferred, then I will send them again.
 

No need; I have already folded your fix patch into the parent patches, so the
updated ext4-patch-queue contains the updated nanosecond patch.

 Thanks,
 Kalpak.
 
  
  
  -aneesh
 



Re: [EXT4 set 5][PATCH 1/1] expand inode i_extra_isize to support features in larger inode

2007-07-13 Thread Zach Brown
> I fear the consequences of this change :(

I love it.  In the past I've lost time by working with patches which
didn't quite realize that ext3 holds a transaction open during
->direct_IO.

> Oh well, please keep it alive, maybe beat on it a bit, resend it
> later on?

I can test the patch to make sure that it catches mistakes I've made in
the past.  Peter, do you have any interest in seeing how far we can get
at tracking lock_page()?  I'm not holding my breath, but any little bit
would probably help.

- z


lease and lock patches

2007-07-13 Thread J. Bruce Fields
Please pull from the 'for-linus' branch at

  git://linux-nfs.org/~bfields/linux.git for-linus

for a series of patches which add a setlease() file method.  The
longer-term goal is to allow cluster and network filesystems to give out
consistent leases when possible, in particular to allow nfsd to give out
delegations on cluster filesystems.  For now, though, we're using this
just to disallow leases selectively on certain filesystems (nfs and gfs2
for now) where they don't make sense.

Also includes some minor locks.c cleanup.

J. Bruce Fields (9):
  locks: convert an -EINVAL return to a BUG
  locks: clean up lease_alloc()
  locks: share more common lease code
  locks: rename lease functions to reflect locks.c conventions
  locks: provide a file lease method enabling cluster-coherent leases
  locks: export setlease to filesystems
  nfs: disable leases over NFS
  locks: make posix_test_lock() interface more consistent
  locks: fix vfs_test_lock() comment

Marc Eshel (1):
  gfs2: stop giving out non-cluster-coherent leases

david m. richter (1):
  leases: minor break_lease() comment clarification

 fs/gfs2/ops_file.c  |   24 +++
 fs/locks.c  |  112 ++
 fs/nfs/file.c   |   16 +++-
 fs/nfsd/nfs4state.c |   10 ++--
 include/linux/fs.h  |4 +-
 5 files changed, 105 insertions(+), 61 deletions(-)


[PATCH] isofs: mounting to regular file may succeed

2007-07-13 Thread Kirill Kuvaldin
It turned out that mounting a corrupted ISO image to a regular file may
succeed, e.g. if an image was prepared as follows:

$ dd if=correct.iso of=bad.iso bs=4k count=8

We then can mount it to a regular file:

# mount -o loop -t iso9660 bad.iso /tmp/file

But mounting it to a directory fails with -ENOTDIR, simply because 
the root directory inode doesn't have S_IFDIR set and the condition
in graft_tree() is met:

if (S_ISDIR(nd->dentry->d_inode->i_mode) !=
      S_ISDIR(mnt->mnt_root->d_inode->i_mode))
        return -ENOTDIR;

This is because the root directory inode was read from an incorrect
block. It's supposed to be read from sbi->s_firstdatazone, which is
an absolute value and gets messed up in the case of an incorrect image.

In order to somehow circumvent this we have to check that the root
directory inode is actually a directory after all.


Signed-off-by: Kirill Kuvaldin [EMAIL PROTECTED]

diff --git a/fs/isofs/inode.c b/fs/isofs/inode.c
index 5c3eecf..ce5062a 100644
--- a/fs/isofs/inode.c
+++ b/fs/isofs/inode.c
@@ -840,6 +840,15 @@ root_found:
goto out_no_root;
if (!inode->i_op)
goto out_bad_root;
+
+   /* Make sure the root inode is a directory */
+   if (!S_ISDIR(inode->i_mode)) {
+       printk(KERN_WARNING
+           "isofs_fill_super: root inode is not a directory. "
+           "Corrupted media?\n");
+       goto out_iput;
+   }
+
/* get the root dentry */
s->s_root = d_alloc_root(inode);
if (!(s->s_root))


Re: [EXT4 set 5][PATCH 1/1] expand inode i_extra_isize to support features in larger inode

2007-07-13 Thread Andreas Dilger
On Jul 13, 2007  02:05 -0700, Andrew Morton wrote:
> On Tue, 10 Jul 2007 16:32:47 -0700 Andrew Morton [EMAIL PROTECTED] wrote:
> 
> > > + brelse(bh);
> > > + up_write(&EXT4_I(inode)->xattr_sem);
> > > + return error;
> > > +}
> > > +
> > 
> > We're doing GFP_KERNEL memory allocations while holding xattr_sem.  This
> > can cause the VM to reenter the filesystem, perhaps taking i_mutex and/or
> > i_truncate_sem and/or journal_start() (I forget whether this still
> > happens).  Have we checked whether this can occur and if so, whether we are
> > OK from a lock ranking POV?  Bear in mind that journalled-data mode is more
> > complex in this regard.
> 
> I notice that everyone carefully avoided addressing this ;)
> 
> Oh well, hopefully people are testing with lockdep enabled.  As long
> as the fs is put under extreme memory pressure, most bugs should be reported.

I have no objection to changing these to GFP_NOFS or GFP_ATOMIC, because
the number of times this function is called is really quite small (only
for existing inodes when the size of the fixed fields in the inode is
increasing) and the buffers are freed immediately so this won't put any
undue strain on the atomic memory pools.

That said, there is also a GFP_KERNEL allocation in ext3_xattr_block_set()
under xattr_sem, so the same problem would exist there.

I also just noticed that buffer and b_entry_name are leaked in
ext4_expand_extra_isize() if the while loop is run more than one time
(again a relatively rare event).

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
