Re: [RFC] ext3 freeze feature ver 0.2

2008-02-26 Thread Eric Sandeen
Takashi Sato wrote:

 o Elevate XFS ioctl numbers (XFS_IOC_FREEZE and XFS_IOC_THAW) to the VFS
   As Andreas Dilger and Christoph Hellwig advised me, I have elevated
   them to include/linux/fs.h as below.
 #define FIFREEZE_IOWR('X', 119, int)
    #define FITHAW  _IOWR('X', 120, int)
   The ioctl numbers used by XFS applications don't need to be changed.
   But my following ioctl for the freeze needs the parameter
   as the timeout period.  So if XFS applications don't want the timeout
   feature as the current implementation, the parameter needs to be
   changed 1 (level?) into 0.

So, existing xfs applications calling the xfs ioctl now will behave
differently, right?  We can only keep the same ioctl number if the
calling semantics are the same.  Keeping the same number but changing
the semantics is harmful, IMHO

-Eric
-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: What's cooking in e2fsprogs.git (topics)

2008-02-25 Thread Eric Sandeen
Theodore Tso wrote:
 On Sun, Feb 24, 2008 at 09:20:50PM -0700, Andreas Dilger wrote:
 On Feb 22, 2008  19:15 -0500, Theodore Ts'o wrote:
 So before the recent patch were we actually creating long symlinks in
 extents format?  Or were we just setting the flag but still treating
 them as a block number?  If it was the latter, I guess we can put in
 code into e2fsck to detect that case, and convert it back to a
 singleton block number.  
 Eric informed me that the long symlinks were actually stored in extent
 mapped blocks.  That is not harmful, because it can only be a single
 block and it will always fit into the inode.  The other thing to note
 is that extent mapping is REQUIRED for  32-bit blocknumbers, so we
 may as well fix e2fsprogs to allow these symlinks to be handled normally.
 
 Well, at least some kernel versions (as of sometime just before
 2.6.25, iirc) were storing the long symlink as a single block in
 i_block[0], despite EXTENTS_FL being set.  Valerie noticed this, and I
 confirmed it, as it caused the mainline e2fsck extents support to core
 dump.  Basically, what this means is that e2fsprogs can't trust
 EXTENTS_FL for long symlinks.

Are you sure?  This was her patch comment, from
[PATCH] ext4: Don't set EXTENTS_FL flag for fast symlinks:

 For fast symbolic links, the file content is stored in the i_block[]
 array, which is not compatible with the new file extents format.
 e2fsck reports error on such files because EXTENTS_FL is set.
 Don't set the EXTENTS_FL flag when creating fast symlinks.

so this was for *fast* symlinks, stored internally on top of the i_block
array, right?  In that case trying to read it as extents would certainly
cause problems.  But, for *long* symlinks I think they truly were stored
in extent format, and as you say...

 But you do raise a good point that we need to support using the
 extents format in order to support blocks  2**32, so we can't just
 arbitrary convert all symlinks to the old-style direct block maps.

... so I think we really *should* be unconditionally storing *long*
symlinks in extent format, on ext4... right?

-Eric
-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] e2fsprogs: error checking in blkid/devname.c

2008-02-22 Thread Eric Sandeen
Theodore Tso wrote:
 This looks good, but I assume that the bug was caused by some race
 condition where if you try to call dm_task_get_info() while some other
 process is creating or removing a snapshot, dm_task_get_info() is
 returning some kind of EAGAIN, or some other Try again; we're busy
 error, right?
 
 If that is the case, can you try to find out what error is being
 returned?  It may be the right thing to do is to check to see if we
 are getting a resource is locked; try again in a sec error message,
 and retry the dm_task_get_info(), instead of just returning a failure.

well, dm_task_get_info just returns either 0 or 1; unless there is some
other contextual piece of information to use, I don't know if we can
differentiate between error types.  I'll ask agk...

 Thanks!!
 
   - Ted

-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] e2fsprogs: error checking in blkid/devname.c

2008-02-22 Thread Eric Sandeen
Theodore Tso wrote:
 On Fri, Feb 22, 2008 at 09:02:56AM -0600, Eric Sandeen wrote:
 Theodore Tso wrote:
 This looks good, but I assume that the bug was caused by some race
 condition where if you try to call dm_task_get_info() while some other
 process is creating or removing a snapshot, dm_task_get_info() is
 returning some kind of EAGAIN, or some other Try again; we're busy
 error, right?

 If that is the case, can you try to find out what error is being
 returned?  It may be the right thing to do is to check to see if we
 are getting a resource is locked; try again in a sec error message,
 and retry the dm_task_get_info(), instead of just returning a failure.
 well, dm_task_get_info just returns either 0 or 1; unless there is some
 other contextual piece of information to use, I don't know if we can
 differentiate between error types.  I'll ask agk...
 
 Maybe the right thing is to try 3 times before giving up, maybe with a
 nanosleep in between, or some such?  Hopefully agk can give us some
 hints about what's the right way to handle errors from all of the
 dm_task* calls.

From a quick chat with agk, it sounds like outright failure is
appropriate.  Sounds like most of the calls fail for reasons like ENOMEM
(but it might be nice if it returned that, eh?)

-Eric
-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] e2fsprogs: error checking in blkid/devname.c

2008-02-22 Thread Eric Sandeen
Theodore Tso wrote:
 On Fri, Feb 22, 2008 at 10:16:53AM -0600, Eric Sandeen wrote:
 From a quick chat with agk, it sounds like outright failure is
 appropriate.  Sounds like most of the calls fail for reasons like ENOMEM
 (but it might be nice if it returned that, eh?)
 
 So the question then is why is it that Phillip was able to seeing
 failures when he was creating and deleting snapshots?
 
 I don't mind having blkid return a failure, but it may not fix
 Phillip's scenario which he listed in BZ #433857; yeah, he won't have
 a core dump, which is good, but it might mean that some or all of the
 dm volumes disappear from the blkid results.

Maybe a device-mapper bug is in order :)

-Eric
-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: e2fsck dies with error this should never happen!!!

2008-02-22 Thread Eric Sandeen
Bas van Schaik wrote:

 In a few hours I will hit the road towards France for a holiday, from
 which I will return on the 3rd of March. I would be _really_ grateful if
 you could assist me solving this problem.

Providing the compressed e2image would let ted or others look into the
problem.

Thanks,
-Eric
-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: e2fsck dies with error this should never happen!!!

2008-02-22 Thread Eric Sandeen
Bas van Schaik wrote:
 Bas van Schaik wrote:

 In a few hours I will hit the road towards France for a holiday, from
 which I will return on the 3rd of March. I would be _really_ grateful if
 you could assist me solving this problem.
 Providing the compressed e2image would let ted or others look into the
 problem.

 
 Should that be a raw image or a 'normal' one?

Raw, I think; that way you can compress it as well.

-Eric
-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: What's cooking in e2fsprogs.git (topics)

2008-02-21 Thread Eric Sandeen
Theodore Tso wrote:
 On Wed, Feb 20, 2008 at 12:46:57PM -0600, Eric Sandeen wrote:
 Theodore Tso wrote:
 The big news here is that extents branch is sufficiently finished that
 I've promoted it to the next branch.
 Hey, that's great news.  :)

 Out of curiosity; what is the plan for behavior when finding symlinks
 with the extents flag set?  Last I checked e2fsprogs-interim, they got
 clobbered.  Is that on the TODO before extents support goes live?
 
 Right now on e2fpsrogs 'next' the extents flag being set on symlinks,
 block/char devices (in general inodes for which
 ext2fs_has_valid_blocks returns NULL) are ignored.  I need to make
 sure this does the right thing with long symlinks --- and I'd argue
 that given that long symlinks can only *ever* be one block long, it's
 pointless to use the extents format, since it's just too complicated,
 and I *think* that's what the kernel code is doing, but I need to
 check to be sure.

After the patches I sent a couple days ago, which Aneesh then collapsed
into the ext4_new_inode logic, that should be the case (for both types
of symlinks)

 Eventually I'll add code to mainline to clear EXTENTS_FL from inodes
 that shouldn't have it, but the kernel patches need to hit mainline first.

Ok, but my concern is what happens to those long symlinks when they
really truly are in extents format.  One option is to say hey it was
ext4DEV, deal with it and nuke the symlink, but if possible, converting
back to the proper format would be nice.

 ...

 So one question which Eric you'll want to consider is at what point
 you want to switch the FC users from e2fsprogs-interim to the
 bleeding-edge e2fsprogs mainline code, since eventually this is the
 branch that will have blk64_t support.

Well, honestly I haven't even put -interim into rawhide yet, although
I'm fairly close to that (or -next, whichever makes the most sense)

In my testing a few days ago, e2fsck from -interim blew away the long
symlinks, and I didn't want to foist that upon the folks using ext4dev
today...

 One related question is whether we want to try to get support for full
 64-bit physical block numbers into ext4.  I think there were some
 rough draft patches floating about, but IIRC they didn't
 simultaneously support the 48-bit extent format.  The e2fsprogs
 mainline implementation of extents makes it fairly easy to support new
 extents formats --- it's minimal code changes in a single file,
 lib/ext2fs/extents.c, and made sure all of the infrastructure was
 present to make this easy --- but I believe that trying to support
 multiple formats in the kernel given the current ext4 code would be
 fairly invasive.  Given the timeline, I'm assuming that our current
 path is that we aren't planning on pushing full 64-bit physical block
 support into the extents format for what we hope will make it into the
 next round of enterprise kernels, but I thought I should throw out the
 question so we make the decision one way or the other, explicitly.

I too had assumed that 48 bits would be it for now; it should be
sufficient for a good while.  I guess what I'd like to see if a usable
ext4 out there in the near future, with stuff added on later as
necessary; delalloc, flex_bg (if that doesn't make 2.6.25...) etc.

Oh, speaking of all this - what do you think the criteria are for
dropping the dev from ext4dev?  How do we decide that it's cooked
enough?  :)

Thanks,
-Eric
-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] e2fsprogs: error checking in blkid/devname.c

2008-02-21 Thread Eric Sandeen
This is for RH Bugzilla #433857: 
rpc.mountd segfaults due to uninitialized value in e2fsprogs devname.c

https://bugzilla.redhat.com/show_bug.cgi?id=433857

which did some very helpful analysis  provided a patch.

This patch is based on that, but checks all the devicemapper calls,
and does some goto error handling / unwrapping, in the same style as
the device-mapper lib code itself.

Compile-tested only, but seems fine to me.

Thanks,

-Eric


Index: e2fsprogs-1.40.5/lib/blkid/devname.c
===
--- e2fsprogs-1.40.5.orig/lib/blkid/devname.c
+++ e2fsprogs-1.40.5/lib/blkid/devname.c
@@ -171,37 +171,42 @@ static int dm_device_has_dep(const dev_t
struct dm_deps *deps;
struct dm_info info;
unsigned int i;
+   int ret = 0;
 
task = dm_task_create(DM_DEVICE_DEPS);
if (!task)
-   return 0;
+   goto out;
 
-   dm_task_set_name(task, name);
-   dm_task_run(task);
-   dm_task_get_info(task, info);
+   if (!dm_task_set_name(task, name))
+   goto out;
 
-   if (!info.exists) {
-   dm_task_destroy(task);
-   return 0;
-   }
+   if (!dm_task_run(task))
+   goto out;
+
+   if (!dm_task_get_info(task, info))
+   goto out;
+
+   if  (!info.exists)
+   goto out;
 
deps = dm_task_get_deps(task);
-   if (!deps || deps-count == 0) {
-   dm_task_destroy(task);
-   return 0;
-   }
+   if (!deps || deps-count == 0)
+   goto out;
 
for (i = 0; i  deps-count; i++) {
dev_t dep_dev = deps-device[i];
 
if (dev == dep_dev) {
-   dm_task_destroy(task);
-   return 1;
+   ret = 1;
+   goto out;
}
}
 
-   dm_task_destroy(task);
-   return 0;
+out:
+   if (task)
+   dm_task_destroy(task);
+
+   return ret;
 }
 
 static int dm_device_is_leaf(const dev_t dev)
@@ -214,15 +219,16 @@ static int dm_device_is_leaf(const dev_t
dm_log_init(dm_quiet_log);
task = dm_task_create(DM_DEVICE_LIST);
if (!task)
-   return 1;
+   goto out;
+
dm_log_init(0);
 
-   dm_task_run(task);
+   if (!dm_task_run(task))
+   goto out;
+
names = dm_task_get_names(task);
-   if (!names || !names-dev) {
-   dm_task_destroy(task);
-   return 1;
-   }
+   if (!names || !names-dev)
+   goto out;
 
n = 0;
do {
@@ -234,7 +240,9 @@ static int dm_device_is_leaf(const dev_t
next = names-next;
} while (next);
 
-   dm_task_destroy(task);
+out:
+   if (task)
+   dm_task_destroy(task);
 
return ret;
 }
@@ -247,20 +255,25 @@ static dev_t dm_get_devno(const char *na
 
task = dm_task_create(DM_DEVICE_INFO);
if (!task)
-   return ret;
+   goto out;
 
-   dm_task_set_name(task, name);
-   dm_task_run(task);
-   dm_task_get_info(task, info);
+   if (!dm_task_set_name(task, name))
+   goto out;
 
-   if (!info.exists) {
-   dm_task_destroy(task);
-   return ret;
-   }
+   if (!dm_task_run(task))
+   goto out;
+
+   if (!dm_task_get_info(task, info))
+   goto out;
+
+   if (!info.exists)
+   goto out;
 
ret = makedev(info.major, info.minor);
 
-   dm_task_destroy(task);
+out:
+   if (task)
+   dm_task_destroy(task);

return ret;
 }
@@ -275,15 +288,15 @@ static void dm_probe_all(blkid_cache cac
dm_log_init(dm_quiet_log);
task = dm_task_create(DM_DEVICE_LIST);
if (!task)
-   return;
+   goto out;
dm_log_init(0);
 
-   dm_task_run(task);
+   if (!dm_task_run(task))
+   goto out;
+
names = dm_task_get_names(task);
-   if (!names || !names-dev) {
-   dm_task_destroy(task);
-   return;
-   }
+   if (!names || !names-dev)
+   goto out;
 
n = 0;
do {
@@ -311,7 +324,9 @@ try_next:
next = names-next;
} while (next);
 
-   dm_task_destroy(task);
+out:
+   if (task)
+   dm_task_destroy(task);
 }
 #endif /* HAVE_DEVMAPPER */
 

-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] ext4: ext4_find_next_zero_bit needs an aligned address on some arch

2008-02-20 Thread Eric Sandeen
Aneesh Kumar K.V wrote:
 ext4_find_next_zero_bit and ext4_find_next_bit needs a long aligned
 address on x8_64. Add mb_find_next_zero_bit and mb_find_next_bit
 and use them in the mballoc.
 
 Fix: https://bugzilla.redhat.com/show_bug.cgi?id=433286
 
 Eric Sandeen debugged the problem and suggested the fix.

Also, Ted  Mingming: we probably should get this into 2.6.25; at least
w/ the way the Fedora kernel is configured, ext4 is pretty much DOA w/o
this change.  I'm not sure why it started showing up now, but it is, in
a big way. :)

Thanks,
-Eric
-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] ext4: ext4_find_next_zero_bit needs an aligned address on some arch

2008-02-20 Thread Eric Sandeen
Aneesh Kumar K.V wrote:
 ext4_find_next_zero_bit and ext4_find_next_bit needs a long aligned
 address on x8_64. Add mb_find_next_zero_bit and mb_find_next_bit
 and use them in the mballoc.
 
 Fix: https://bugzilla.redhat.com/show_bug.cgi?id=433286
 
 Eric Sandeen debugged the problem and suggested the fix.

Thanks for fixing this up, Aneesh.

Thanks for getting rid of the magic looks-like-a-function-but-isn't
#define, too.  :)

The testcase I reproduced this with (basically just rebuilding a kernel
src.rpm on ext4) seems to pass with change.

As we had it, the direct call of find_next_zero_bit seemed to almost do
the right thing, except it scanned more bits than we asked for, up to 64
bits' worth, and so a) wandered off our page and b) returned an answer
that was  the size we asked it to scan in some cases.

(It's unfortunate that the alignment code for this got taken out in the
first place, only to put it back now, but I guess I didn't speak up
then, either...)

Really, I think the core bitmap functions could use some fixing, or at
least some warnings if it's given an address it can't cope with properly.

But for now

Acked-by: Eric Sandeen [EMAIL PROTECTED]

Thanks,
-Eric

 Signed-off-by: Aneesh Kumar K.V [EMAIL PROTECTED]
 CC:Eric Sandeen [EMAIL PROTECTED]
 ---
  fs/ext4/mballoc.c |   62 ++--
  1 files changed, 40 insertions(+), 22 deletions(-)
 
 diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
 index 89772b9..ccddd21 100644
 --- a/fs/ext4/mballoc.c
 +++ b/fs/ext4/mballoc.c
 @@ -627,21 +627,19 @@ static ext4_fsblk_t ext4_grp_offs_to_block(struct 
 super_block *sb,
   return block;
  }
  
 +static inline void *mb_correct_addr_and_bit(int *bit, void *addr)
 +{
  #if BITS_PER_LONG == 64
 -#define mb_correct_addr_and_bit(bit, addr)   \
 -{\
 - bit += ((unsigned long) addr  7UL)  3;   \
 - addr = (void *) ((unsigned long) addr  ~7UL);  \
 -}
 + *bit += ((unsigned long) addr  7UL)  3;
 + addr = (void *) ((unsigned long) addr  ~7UL);
  #elif BITS_PER_LONG == 32
 -#define mb_correct_addr_and_bit(bit, addr)   \
 -{\
 - bit += ((unsigned long) addr  3UL)  3;   \
 - addr = (void *) ((unsigned long) addr  ~3UL);  \
 -}
 + *bit += ((unsigned long) addr  3UL)  3;
 + addr = (void *) ((unsigned long) addr  ~3UL);
  #else
  #error how many bits you are?!
  #endif
 + return addr;
 +}
  
  static inline int mb_test_bit(int bit, void *addr)
  {
 @@ -649,34 +647,54 @@ static inline int mb_test_bit(int bit, void *addr)
* ext4_test_bit on architecture like powerpc
* needs unsigned long aligned address
*/
 - mb_correct_addr_and_bit(bit, addr);
 + addr = mb_correct_addr_and_bit(bit, addr);
   return ext4_test_bit(bit, addr);
  }
  
  static inline void mb_set_bit(int bit, void *addr)
  {
 - mb_correct_addr_and_bit(bit, addr);
 + addr = mb_correct_addr_and_bit(bit, addr);
   ext4_set_bit(bit, addr);
  }
  
  static inline void mb_set_bit_atomic(spinlock_t *lock, int bit, void *addr)
  {
 - mb_correct_addr_and_bit(bit, addr);
 + addr = mb_correct_addr_and_bit(bit, addr);
   ext4_set_bit_atomic(lock, bit, addr);
  }
  
  static inline void mb_clear_bit(int bit, void *addr)
  {
 - mb_correct_addr_and_bit(bit, addr);
 + addr = mb_correct_addr_and_bit(bit, addr);
   ext4_clear_bit(bit, addr);
  }
  
  static inline void mb_clear_bit_atomic(spinlock_t *lock, int bit, void *addr)
  {
 - mb_correct_addr_and_bit(bit, addr);
 + addr = mb_correct_addr_and_bit(bit, addr);
   ext4_clear_bit_atomic(lock, bit, addr);
  }
  
 +static inline int mb_find_next_zero_bit(void *addr, int max, int start)
 +{
 + int fix = 0;
 + addr = mb_correct_addr_and_bit(fix, addr);
 + max += fix;
 + start += fix;
 +
 + return ext4_find_next_zero_bit(addr, max, start) - fix;
 +}
 +
 +static inline int mb_find_next_bit(void *addr, int max, int start)
 +{
 + int fix = 0;
 + addr = mb_correct_addr_and_bit(fix, addr);
 + max += fix;
 + start += fix;
 +
 + return ext4_find_next_bit(addr, max, start) - fix;
 +}
 +
  static void *mb_find_buddy(struct ext4_buddy *e4b, int order, int *max)
  {
   char *bb;
 @@ -946,12 +964,12 @@ static void ext4_mb_generate_buddy(struct super_block 
 *sb,
  
   /* initialize buddy from bitmap which is aggregation
* of on-disk bitmap and preallocations */
 - i = ext4_find_next_zero_bit(bitmap, max, 0);
 + i = mb_find_next_zero_bit(bitmap, max, 0);
   grp-bb_first_free = i;
   while (i  max) {
   fragments++;
   first = i;
 - i = ext4_find_next_bit(bitmap, max, i);
 + i = mb_find_next_bit(bitmap, max, i);
   len = i - first;
   free += len;
   if (len  1)
 @@ -959,7 +977,7 @@ static void

Re: What's cooking in e2fsprogs.git (topics)

2008-02-20 Thread Eric Sandeen
Theodore Tso wrote:
 The big news here is that extents branch is sufficiently finished that
 I've promoted it to the next branch.

Hey, that's great news.  :)

Out of curiosity; what is the plan for behavior when finding symlinks
with the extents flag set?  Last I checked e2fsprogs-interim, they got
clobbered.  Is that on the TODO before extents support goes live?

Thanks,
-Eric
-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [E2FSPROGS, RFC] New mke2fs types parsing

2008-02-20 Thread Eric Sandeen
Theodore Ts'o wrote:
 The following patch is a work in progress, but I'm sending it out so
 folks can take a look at it and comment on the general approach.
 
 What this does is change how mke2fs -T works so it can take a comma
 separated list, so you can do things like this:
 
 mke2fs -T ext4,small,news

Is there some hierarchy of what these options are and how they fit
together?  i.e. small  news might go together (in bizarro world...) but
small,large wouldn't make sense - nor would -T ext3,ext4.  Or, if
somebody stores mail  news on the same fs, nad they say -T mail,news
but the mail  news types have conflicting directives...

how will you define what an acceptable composite of subtypes will be?

Thanks,

-Eric

-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [E2FSPROGS, RFC] New mke2fs types parsing

2008-02-20 Thread Eric Sandeen
Theodore Tso wrote:
 On Wed, Feb 20, 2008 at 12:51:21PM -0600, Eric Sandeen wrote:
 Theodore Ts'o wrote:
 The following patch is a work in progress, but I'm sending it out so
 folks can take a look at it and comment on the general approach.

 What this does is change how mke2fs -T works so it can take a comma
 separated list, so you can do things like this:

   mke2fs -T ext4,small,news
 Is there some hierarchy of what these options are and how they fit
 together?  i.e. small  news might go together (in bizarro world...) but
 small,large wouldn't make sense - nor would -T ext3,ext4.  Or, if
 somebody stores mail  news on the same fs, nad they say -T mail,news
 but the mail  news types have conflicting directives...

 how will you define what an acceptable composite of subtypes will be?
 
 There are only three things which mke2fs will do, in my design:

I apologize if I was too quick to ask a dumb question rather than
atually read through the patch, but thank you for the text explanation :)

-Eric
-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH e2fsprogs] - detect LVM2 PVs in libblkid

2008-02-19 Thread Eric Sandeen
Theodore Tso wrote:

 The patch works for me, but it would seem the right thing would be
 print the UUID without the hyphens, i.e.:
 
 guOQGdcOE3IafCm0190XkPZTy5fCEanQ
 
 instead of
 
   guOQGd-cOE3-IafC-m019-0XkP-ZTy5-fCEanQ

But it already does print it without the hyphens.

Or did you mean to say print the UUID *with* the hyphens?

-Eric
-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] clear extents flag on inodes created in ext4_mknod

2008-02-19 Thread Eric Sandeen
Aneesh Kumar K.V wrote:
 On Tue, Feb 19, 2008 at 10:39:52AM -0600, Eric Sandeen wrote:
 Eric Sandeen wrote:
 e2fsck doesn't expect to find char, block, fifo, or socket
 files with the extent flag set, so clear that in ext4_mknod.

 Signed-off-by: Eric Sandeen [EMAIL PROTECTED]
 ---

 Index: linux-2.6.24/fs/ext4/namei.c
 ===
 --- linux-2.6.24.orig/fs/ext4/namei.c
 +++ linux-2.6.24/fs/ext4/namei.c
 @@ -1766,6 +1766,7 @@ retry:
  #ifdef CONFIG_EXT4DEV_FS_XATTR
 inode-i_op = ext4_special_inode_operations;
  #endif
 +   EXT4_I(inode)-i_flags = ~EXT4_EXTENTS_FL;
 err = ext4_add_nondir(handle, dentry, inode);
 }
 ext4_journal_stop(handle);
 now that I think about it; perhaps it would be better to put this logic
 into ext4_new_inode, rather than setting it by default and clearing it
 here... that way new_inode() has all the logic about whether or not a
 particular type of file is in extents format.

 
 How about enabling it only for directory and regular files rather than
 enabling it globally and then disabling the flag for  symlink and device
 files ?

I think that makes sense.

-Eric
-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] clear extents flag on inodes created in ext4_mknod

2008-02-19 Thread Eric Sandeen
Aneesh Kumar K.V wrote:

 How about enabling it only for directory and regular files rather than
 enabling it globally and then disabling the flag for  symlink and device
 files ?

 
 how about something like below. There are two reason for not inheriting
 the i_flag from directory.

Looks about right to me... see below

 
 diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
 index d2c2e55..f430939 100644
 --- a/fs/ext4/ialloc.c
 +++ b/fs/ext4/ialloc.c
 @@ -794,7 +794,12 @@ got:
   ei-i_dir_start_lookup = 0;
   ei-i_disksize = 0;
  
 - ei-i_flags = EXT4_I(dir)-i_flags  ~EXT4_INDEX_FL;
 + /*
 +  * Don't inherit extent flag from directory. We set extent flag on
 +  * newly created directory and file only if -o extent mount option is
 +  * specfied
 nitpick:   typo

 +  */
 + ei-i_flags = EXT4_I(dir)-i_flags  ~ (EXT4_INDEX_FL|EXT4_EXTENTS_FL);
   if (S_ISLNK(mode))
   ei-i_flags = ~(EXT4_IMMUTABLE_FL|EXT4_APPEND_FL);
   /* dirsync only applies to directories */
 @@ -836,8 +841,10 @@ got:
   ext4_std_error(sb, err);
   goto fail_free_drop;
   }
 - if (test_opt(sb, EXTENTS)  !S_ISLNK(mode)) {
 - EXT4_I(inode)-i_flags |= EXT4_EXTENTS_FL;
 + if (test_opt(sb, EXTENTS)) {
 + /* set extent flag only for diretory and file */
 + if (S_ISDIR(mode) || S_ISREG(mode))
 + EXT4_I(inode)-i_flags |= EXT4_EXTENTS_FL;
   ext4_ext_tree_init(handle, inode);
   err = ext4_update_incompat_feature(handle, sb,
   EXT4_FEATURE_INCOMPAT_EXTENTS);

Hm you only set the extents flag for dir  reg, but you call the next
two functions for all files?  I think the next 2 lines need to be part
of the conditional, no? (and then there can be just one big test,
indentation can come out a bit...?)

But, the spirit of the patch looks right.

-Eric



 diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
 index 23902ba..da942bc 100644
 --- a/fs/ext4/namei.c
 +++ b/fs/ext4/namei.c
 @@ -1771,7 +1771,6 @@ retry:
  #ifdef CONFIG_EXT4DEV_FS_XATTR
   inode-i_op = ext4_special_inode_operations;
  #endif
 - EXT4_I(inode)-i_flags = ~EXT4_EXTENTS_FL;
   err = ext4_add_nondir(handle, dentry, inode);
   }
   ext4_journal_stop(handle);

-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] clear extents flag on inodes created in ext4_mknod

2008-02-19 Thread Eric Sandeen
Eric Sandeen wrote:
 e2fsck doesn't expect to find char, block, fifo, or socket
 files with the extent flag set, so clear that in ext4_mknod.
 
 Signed-off-by: Eric Sandeen [EMAIL PROTECTED]
 ---
 
 Index: linux-2.6.24/fs/ext4/namei.c
 ===
 --- linux-2.6.24.orig/fs/ext4/namei.c
 +++ linux-2.6.24/fs/ext4/namei.c
 @@ -1766,6 +1766,7 @@ retry:
  #ifdef CONFIG_EXT4DEV_FS_XATTR
   inode-i_op = ext4_special_inode_operations;
  #endif
 + EXT4_I(inode)-i_flags = ~EXT4_EXTENTS_FL;
   err = ext4_add_nondir(handle, dentry, inode);
   }
   ext4_journal_stop(handle);

now that I think about it; perhaps it would be better to put this logic
into ext4_new_inode, rather than setting it by default and clearing it
here... that way new_inode() has all the logic about whether or not a
particular type of file is in extents format.

Think it's worth changing?

-Eric

-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] ext4: set EXT4_EXTENTS_FL only for directory and regular files

2008-02-19 Thread Eric Sandeen
Aneesh Kumar K.V wrote:
 Also don't inherit EXT4_EXTENTS_FL from parent directory.
 If we have a directory with extent flag set and later mount the file
 system with -o noextents, the files created in that directory will also
 have extent flag set but we would not have called ext4_ext_tree_init for
 them. This will cause error later when we are verifying the extent header
 
 Also we don't want to set extent flag for symlinks, char, block, fifo
 or socket

Minor typo in comments, diretory, but otherwise:

Acked-by: Eric Sandeen [EMAIL PROTECTED]

 Signed-off-by: Aneesh Kumar K.V [EMAIL PROTECTED]
 ---
  fs/ext4/ialloc.c |   22 +++---
  fs/ext4/namei.c  |1 -
  2 files changed, 15 insertions(+), 8 deletions(-)
 
 diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
 index 028e601..78d1094 100644
 --- a/fs/ext4/ialloc.c
 +++ b/fs/ext4/ialloc.c
 @@ -794,7 +794,12 @@ got:
   ei-i_dir_start_lookup = 0;
   ei-i_disksize = 0;
  
 - ei-i_flags = EXT4_I(dir)-i_flags  ~EXT4_INDEX_FL;
 + /*
 +  * Don't inherit extent flag from directory. We set extent flag on
 +  * newly created directory and file only if -o extent mount option is
 +  * specified
 +  */
 + ei-i_flags = EXT4_I(dir)-i_flags  ~(EXT4_INDEX_FL|EXT4_EXTENTS_FL);
   if (S_ISLNK(mode))
   ei-i_flags = ~(EXT4_IMMUTABLE_FL|EXT4_APPEND_FL);
   /* dirsync only applies to directories */
 @@ -837,12 +842,15 @@ got:
   goto fail_free_drop;
   }
   if (test_opt(sb, EXTENTS)) {
 - EXT4_I(inode)-i_flags |= EXT4_EXTENTS_FL;
 - ext4_ext_tree_init(handle, inode);
 - err = ext4_update_incompat_feature(handle, sb,
 - EXT4_FEATURE_INCOMPAT_EXTENTS);
 - if (err)
 - goto fail;
 + /* set extent flag only for diretory and file */
 + if (S_ISDIR(mode) || S_ISREG(mode)) {
 + EXT4_I(inode)-i_flags |= EXT4_EXTENTS_FL;
 + ext4_ext_tree_init(handle, inode);
 + err = ext4_update_incompat_feature(handle, sb,
 + EXT4_FEATURE_INCOMPAT_EXTENTS);
 + if (err)
 + goto fail;
 + }
   }
  
   ext4_debug(allocating inode %lu\n, inode-i_ino);
 diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
 index 39d4af4..da942bc 100644
 --- a/fs/ext4/namei.c
 +++ b/fs/ext4/namei.c
 @@ -2225,7 +2225,6 @@ retry:
   inode-i_op = ext4_fast_symlink_inode_operations;
   memcpy((char*)EXT4_I(inode)-i_data,symname,l);
   inode-i_size = l-1;
 - EXT4_I(inode)-i_flags = ~EXT4_EXTENTS_FL;
   }
   EXT4_I(inode)-i_disksize = inode-i_size;
   err = ext4_add_nondir(handle, dentry, inode);

-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH e2fsprogs] print mostly-printable xattr strings in debugfs

2008-02-18 Thread Eric Sandeen
Eric Sandeen wrote:
 Taking a cue from getfattr... if a string is mostly
 printable characters, go ahead  print as a string,
 and escape what's left over.

Ted, ping on this?  Since we use selinux a lot, it'd be nice to have
something readable in debugfs output.

If there's anything you don't like about the patch I'll be happy to fix
it up.

Thanks,
-Eric

 so we get:
 
 Extended attributes stored in inode body: 
   selinux = system_u:object_r:root_t:s0\000 (28)
 
 instead of:
 
 Extended attributes stored in inode body: 
   selinux = 73 79 73 74 65 6d 5f 75 3a 6f 62 6a 65 63 74 5f 72 3a 72 6f 6f 
 74 5f 74 3a 73 30 00  (28)
 
 (selinux includes the trailing null in len so it
 never prints as a string today)
 
 Signed-off-by: Eric Sandeen [EMAIL PROTECTED]
 ---
  
 Index: e2fsprogs-1.40.5/debugfs/debugfs.c
 ===
 --- e2fsprogs-1.40.5.orig/debugfs/debugfs.c
 +++ e2fsprogs-1.40.5/debugfs/debugfs.c
 @@ -434,19 +434,21 @@ static int list_blocks_proc(ext2_filsys 
  
  static void dump_xattr_string(FILE *out, const char *str, int len)
  {
 - int printable = 1;
 + int printable = 0;
   int i;
   
 - /* check is string printable? */
 + /* check: is string printable enough? */
   for (i = 0; i  len; i++)
 - if (!isprint(str[i])) {
 - printable = 0;
 - break;
 - }
 + if (isprint(str[i]))
 + printable++;
 +
 + if (printable = len*7/8)
 + printable = 0;
  
   for (i = 0; i  len; i++)
   if (printable)
 - fprintf(out, %c, (unsigned char)str[i]);
 + fprintf(out, isprint(str[i]) ? %c : \\%03o,
 + (unsigned char)str[i]);
   else
   fprintf(out, %02x , (unsigned char)str[i]);
  }
 
 -
 To unsubscribe from this list: send the line unsubscribe linux-ext4 in
 the body of a message to [EMAIL PROTECTED]
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH][7/28] e2fsprogs-extents.patch

2008-02-18 Thread Eric Sandeen
Eric Sandeen wrote:
 So this trips up on things like sockets, fifos, and block  char nodes.

Just to demonstrate; doing this on ext4:

mknod mnt/block b 1 1
mknod mnt/char c 1 1
mknod mnt/fifo p
mksock mnt/sock

mkdir -p
mnt/verylongdir12345678901234567890/verylongdir12345678901234567890/verylongdir12345678901234567890
ln -s
mnt/verylongdir12345678901234567890/verylongdir12345678901234567890/verylongdir12345678901234567890
mnt/longlink

yields an unhappy fsck w/ e2fsprogs-interim:

e2fsck 1.40.6 (09-Feb-2008)
Pass 1: Checking inodes, blocks, and sizes
Inode 12 has EXTENT_FL set, but is not in extents format
Fix? no

Inode 13 has EXTENT_FL set, but is not in extents format
Fix? no

Inode 14 has EXTENT_FL set, but is not in extents format
Fix? no

Inode 15 has EXTENT_FL set, but is not in extents format
Fix? no

Inode 17 has EXTENT_FL set, but is not in extents format
Fix? no

Pass 2: Checking directory structure
Inode 12 (/block) is an illegal block device.
Clear? no

Inode 13 (/char) is an illegal character device.
Clear? no

Inode 14 (/fifo) is an illegal FIFO.
Clear? no

Inode 15 (/sock) is an illegal socket.
Clear? no

Symlink /longlink (inode #17) is invalid.
Clear? no

Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information

-Eric
-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH][7/28] e2fsprogs-extents.patch

2008-02-18 Thread Eric Sandeen
Andreas Dilger wrote:
 Support for checking 32-bit extents format inodes and the INCOMPAT_EXTENTS
 feature.
 
 Clear the high 16 bits of extents and index entries, since the
 extents patches did not do this explicitly.  Some parts of this
 code need fixing for checking  32-bit block filesystems (when
 INCOMPAT_64BIT support is added), marked FIXME: 48-bit support.
 
 Verify extent headers in blocks, logical ordering of extents,
 logical ordering of indexes.
 
 Add explicit checking of {d,t,}indirect and index blocks to detect
 corruption instead of implicitly doing this by checking the referred
 blocks and only block-at-a-time correctness.  This avoids incorrectly
 invoking the very lengthy duplicate blocks pass for bad indirect/index
 blocks.  We may want to tune the threshold for how many errors make
 a bad indirect/index block.
 
 Add ability to split or remove extents in order to allow extent
 reallocation during the duplicate blocks pass.
 
...


 @@ -904,21 +910,75 @@ void e2fsck_pass1(e2fsck_t ctx)
   ctx-fs_sockets_count++;
   } else
   mark_inode_bad(ctx, ino);
 - if (inode-i_block[EXT2_IND_BLOCK])
 - ctx-fs_ind_count++;
 - if (inode-i_block[EXT2_DIND_BLOCK])
 - ctx-fs_dind_count++;
 - if (inode-i_block[EXT2_TIND_BLOCK])
 - ctx-fs_tind_count++;
 - if (inode-i_block[EXT2_IND_BLOCK] ||
 - inode-i_block[EXT2_DIND_BLOCK] ||
 - inode-i_block[EXT2_TIND_BLOCK] ||
 - inode-i_file_acl) {
 - inodes_to_process[process_inode_count].ino = ino;
 - inodes_to_process[process_inode_count].inode = *inode;
 - process_inode_count++;
 - } else
 - check_blocks(ctx, pctx, block_buf);
 +
 + eh = (struct ext3_extent_header *)inode-i_block;
 + if ((inode-i_flags  EXT4_EXTENTS_FL)) {
 + if ((LINUX_S_ISREG(inode-i_mode) ||
 +  LINUX_S_ISDIR(inode-i_mode)) 

So this trips up on things like sockets, fifos, and block  char nodes.

Also this is unhappy:

 @@ -137,7 +141,7 @@ int e2fsck_pass1_check_device_inode(ext2
* If the index flag is set, then this is a bogus
* device/fifo/socket
*/
 - if (inode-i_flags  EXT2_INDEX_FL)
 + if (inode-i_flags  (EXT2_INDEX_FL | EXT4_EXTENTS_FL))
   return 0;

Do we really care if these have the extents flag set?  IOW should we
make sure the kernel doesn't set the flag, or should we make e2fsck not
care...

There are enough checks in e2fsck to show the intent was that these
files should not have the extents flag set, but I'm not sure why it
matters enough that the kernel needs to run around being sure to clear
it

Or... (rambling on now) it seems odd to me that zero-length files have
the extents flag set at all; should we only set extents when we actually
get a block allocated to the file?  That would also take care of this
from the kernel side I think.

-Eric
-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH][7/28] e2fsprogs-extents.patch

2008-02-18 Thread Eric Sandeen
Theodore Tso wrote:
 On Mon, Feb 18, 2008 at 11:56:53AM -0600, Eric Sandeen wrote:
 So this trips up on things like sockets, fifos, and block  char nodes.

 Also this is unhappy:

 @@ -137,7 +141,7 @@ int e2fsck_pass1_check_device_inode(ext2
  * If the index flag is set, then this is a bogus
  * device/fifo/socket
  */
 -   if (inode-i_flags  EXT2_INDEX_FL)
 +   if (inode-i_flags  (EXT2_INDEX_FL | EXT4_EXTENTS_FL))
 return 0;
 Do we really care if these have the extents flag set?  IOW should we
 make sure the kernel doesn't set the flag, or should we make e2fsck not
 care...
 
 Sigh
 
 I think we need to get kernel patches into mainline ASAP not to set
 the EXTENTS_FL 

You mean on devices/fifos/sockets ?  Ok.

But today, with 2.6.25-rc1 and e2fsprogs-interim, long (non-fast)
symlinks get clobbered by e2fsck, because:

Pass 1: Checking inodes, blocks, and sizes
Inode 12 has EXTENT_FL set, but is not in extents format
Fix? yes

Inode 12 has illegal block(s).  Clear? yes

Illegal block #0 (127754) in inode 12.  CLEARED.
Inode 12 is too big.  Truncate? yes

Block #1 (4) causes symlink to be too big.  CLEARED.
Block #4 (1) causes symlink to be too big.  CLEARED.
Block #5 (4772) causes symlink to be too big.  CLEARED.
Inode 12, i_blocks is 2, should be 0.  Fix? yes

Pass 2: Checking directory structure
Symlink /longlink (inode #12) is invalid.
Clear? yes

Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Block bitmap differences:  -4772
Fix? yes

Free blocks count wrong for group #0 (3420, counted=3421).
Fix? yes

Free blocks count wrong (26192, counted=26193).
Fix? yes

and *poof* it's gone.  That one concerns me more...  This *should* be in
extents format, right, even though it's limited to one block...

 and at least
 for now, e2fsck needs to accept (and not complain or core dump) if
 EXTENTS_FL is set for files where ext2fs_inode_has_valid_blocks()
 returns false

well, if any filetypes are not supposed to have the extents flag set,
and they're zero-length, I'd say go ahead  clear it, and even complain
if you like - it's the design intent after all -  I wouldn't worry about
the noise at this stage.  FWIW, I haven't seen a core dump.  :)

-Eric
-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] clear extents flag on inodes created in ext4_mknod

2008-02-18 Thread Eric Sandeen
e2fsck doesn't expect to find char, block, fifo, or socket
files with the extent flag set, so clear that in ext4_mknod.

Signed-off-by: Eric Sandeen [EMAIL PROTECTED]
---

Index: linux-2.6.24/fs/ext4/namei.c
===
--- linux-2.6.24.orig/fs/ext4/namei.c
+++ linux-2.6.24/fs/ext4/namei.c
@@ -1766,6 +1766,7 @@ retry:
 #ifdef CONFIG_EXT4DEV_FS_XATTR
inode-i_op = ext4_special_inode_operations;
 #endif
+   EXT4_I(inode)-i_flags = ~EXT4_EXTENTS_FL;
err = ext4_add_nondir(handle, dentry, inode);
}
ext4_journal_stop(handle);


-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] don't set extents flag for _any_ symlinks

2008-02-18 Thread Eric Sandeen
As symlinks are limited to a single block anyway, and e2fsck
doesn't expect to find it set, don't set the extents flag on 
any type of symlinks at all: fast/in-inode, or the 
external-block flavor.

There are a lot of filesystems out there by now w/ exent-style 
symlink blocks though, so e2fsck should probably be able to 
repair that at some point...

Signed-off-by: Eric Sandeen [EMAIL PROTECTED]

---

Index: linux-2.6.24/fs/ext4/namei.c
===
--- linux-2.6.24.orig/fs/ext4/namei.c
+++ linux-2.6.24/fs/ext4/namei.c
@@ -2223,7 +2226,6 @@ retry:
inode-i_op = ext4_fast_symlink_inode_operations;
memcpy((char*)EXT4_I(inode)-i_data,symname,l);
inode-i_size = l-1;
-   EXT4_I(inode)-i_flags = ~EXT4_EXTENTS_FL;
}
EXT4_I(inode)-i_disksize = inode-i_size;
err = ext4_add_nondir(handle, dentry, inode);

Index: linux-2.6.24/fs/ext4/ialloc.c
===
--- linux-2.6.24.orig/fs/ext4/ialloc.c
+++ linux-2.6.24/fs/ext4/ialloc.c
@@ -744,7 +744,7 @@ got:
ext4_std_error(sb, err);
goto fail_free_drop;
}
-   if (test_opt(sb, EXTENTS)) {
+   if (test_opt(sb, EXTENTS)  !S_ISLNK(mode)) {
EXT4_I(inode)-i_flags |= EXT4_EXTENTS_FL;
ext4_ext_tree_init(handle, inode);
err = ext4_update_incompat_feature(handle, sb,

-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RESEND] ext4: Fix kernel BUG at fs/ext4/mballoc.c:910!

2008-02-15 Thread Eric Sandeen
Valerie Clement wrote:
 Eric Sandeen wrote:
 Valerie Clement wrote:
 Fix kernel BUG at fs/ext4/mballoc.c:910!

 From: Valerie Clement [EMAIL PROTECTED]

 With the flex_bg feature enabled, a large file creation oopses the
 kernel.
 Valerie, what's the specific testcase for this?

 Thanks,
 -Eric
 I just did a simple dd,
   dd if=/dev/zero of=/mnt/test/foo bs=1M count=8k
 on an ext4 filesystem with the uninit_groups, lazy_bg and flex_bg
 features enabled and the default mount options.

Ok, thanks Valerie.  It's just always nice to know exactly how to
reproduce a problem that a patch fixes, IMHO.

-Eric
-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: What's cooking in e2fsprogs.git (topics)

2008-02-10 Thread Eric Sandeen
Theodore Tso wrote:
 I'ts been a while since I've pushed one of these out.  This updates pu
 against the latest next branch, so we can make sure all of the
 changes that had been folded into maint is included on the pu
 branch.  It also includes more work on the tt/extents patches.
 Tune2fs and e2image are now functional with extents, and e2fsck has
 more extents checking added.  Still a bit more work to be done there,
 and I also have to add code to deal with symlinks that have the
 EXTENTS_FL flag set.
 
 I've dropped the js/flex-bg branch since it had conflicts with the
 on-disk bg flags, and it had conflicts with the exsting patches that
 caused me concern.  Jose, once you regenerate the patch and deal with
 comments, I'll add it back in.
 
 My TODO list:
 
finish e2fsck extents support
more flexible mke2fs.conf handling
EXT4 -- allow mke2fs 8TB without -f

ext3 should be fine as well, right?  And I think Josef sent a patch...

-Eric
-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] allocate struct ext4_allocation_context from a kmem cache to save stack space

2008-02-08 Thread Eric Sandeen
Mingming Cao wrote:

 printk could be removed...so as long as it builds fine. I had looked at
 the history yesterday and find this fix
 http://lists.openwall.net/linux-ext4/2007/10/10/2
 so I was under impression that the ifdefs was added to fix compile
 issue. I did not look more closely. Maybe that's not a issue any more.

I guess I should look into it.  For now let's just drop the #ifdef
removal, it's not related anyway.

Would you like me to send a fresh patch?

Thanks,
-Eric
-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: BUG_ON at mballoc.c:3752

2008-02-08 Thread Eric Sandeen
Eric Sesterhenn wrote:
 * Eric Sandeen ([EMAIL PROTECTED]) wrote:
 Eric Sesterhenn wrote:
 Eric ,
 can you run the test with below patch and see if this makes any
 difference ?. I know we are not fixing any bugs in the below patch.
 ok, i checked out the old version again and applied both patches,
 the BUG is gone (no surprise)
 In the case where it would have hit the BUG i now get the following message:

 [  740.400288] Aborting journal on device loop0.
 No message before that about *why* it aborted?
 
 assumed the stuff was from the previous runs
 

 [  735.424574] EXT4-fs error (device loop0): ext4_mb_generate_buddy:
 EXT4-fs: group 0: 6590 blocks in bitmap, 6600 in gd
 [  740.400288] Aborting journal on device loop0.

There it is, thanks.  Just making sure :)

-Eric
-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


call for brave ext4 testers in F9 Alpha, with caveats

2008-02-08 Thread Eric Sandeen
(I sent this to fedora-devel, but of course anyone here who would like
to give it a whirl should also do so!  So far there haven't been any bad
reports.  :)  See the wiki link at the bottom for a couple of updates,
caveats  gotchas.)

..

Hi Gang -

The ext4 filesystem is a feature slated for F9, and we (rh, ibm, bull,
clusterfs, and others) are working feverishly to get it ready for a
prime-time F9 fs option.

I'd like to get a bit more exposure if possible, so if you're feeling
like living on the cutting edge of filesystems, read on:

Rawhide and the F9 alpha can install onto an ext4dev filesystem root;
first you need to tell (a.k.a. lie to) the installer with
iamanext4developer on the boot commandline.  This is akin to the jfs
or reiserfs options for those filesystems, but a higher hurdle (more
characters to type!).

Then you can do custom partitioning, and select ext4dev for any
filesystem (except /boot - no grub support (yet)).

The install should proceed Just Fine(tm).  If not, let me know.

And now for the (known) caveats:

1) Due to bug 429857: Root inode of ext4dev root filesystem does not get
selinux label - booting with selinux enabled  enforcing will probably
fail.  Boot with enforcing=0, and use restorecon or chcon on / to
(hopefully) properly update the root inode's selinux attrs.  I have a
fix for this bug, so soon, kernel updates will resolve this and allow it
to be properly set (and retained).  Please do test w/ selinux enabled
though, as the new larger inodes and in-inode xattrs could use airtime.

2) There is no readily-available e2fsprogs which can repair your shiny
new ext4dev filesystem.  If something goes badly I'll help out because
we need to know what went wrong, but so far there is no released
upstream e2fsprogs which can handle the new ext4 features.  So please
consider anything you put on ext4dev for now to be disposable, just to
be on the safe side.  extents-capable e2fsprogs should be available Real
Soon Now.

3) misc stuff - I've not yet tested ext4 over an encrypted block device,
or even over an lvm volume.  There may be some stack issues on x86 boxes
still, I'm working on slimming that down.  I hope that more real-world
use will shake out any remaining problems.

... and I suppose I should put this into a wiki page:
http://fedoraproject.org/wiki/FedoraExt4

We can keep it updated with any further issues or resolutions.

Thanks!
-Eric

-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH, E2FSPROGS] blkid: Automatically chose between ext4 and ext4dev as appropriate

2008-02-08 Thread Eric Sandeen
Theodore Ts'o wrote:
 I plan to release this in an upcoming 1.40.6 release, along with code
 that allows tune2fs -E test_fs to work on ext4 filesystems.  The idea
 is that if I can get this into community distro's, it will make it
 easier for them to experiment with ext4/ext4dev, with the appropriate
 forward compatibility when sometime (hopefully 2.6.26 or 2.6.27?) we
 rename ext4dev to ext4.
 
 Any comments before I push this change out to the maint branch?

Seems logically correct to me... although it also seems like this has
gotten fairly complex by now, what with blkid rummaging around in
modules.dep, /proc files, etc...

-Eric
-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: BUG_ON at mballoc.c:3752

2008-02-08 Thread Eric Sandeen
Eric Sesterhenn wrote:
 Eric ,
 can you run the test with below patch and see if this makes any
 difference ?. I know we are not fixing any bugs in the below patch.
 
 ok, i checked out the old version again and applied both patches,
 the BUG is gone (no surprise)
 In the case where it would have hit the BUG i now get the following message:
 
 [  740.400288] Aborting journal on device loop0.

No message before that about *why* it aborted?

 [  740.405032] ext4_abort called.
 [  740.405097] EXT4-fs error (device loop0): ext4_journal_start_sb: Detected 
 aborted journal
 [  740.405178] Remounting filesystem read-only
 [  740.410974] EXT4-fs error (device loop0) in ext4_ordered_write_end: IO 
 failure
 [  740.414300] EXT4-fs error (device loop0) in ext4_reserve_inode_write: 
 Journal has aborted
 [  740.414422] pa cba56990: logic 16, phys. 1953, len 16
 [  740.414447] EXT4-fs error (device loop0): ext4_mb_release_inode_pa: free 
 4, pa_free 3
 [  740.414548] EXT4-fs error (device loop0) in ext4_mb_free_blocks: Journal 
 has aborted
 
 
 Greetings, Eric

-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] allocate struct ext4_allocation_context from a kmem cache to save stack space

2008-02-07 Thread Eric Sandeen
Mingming Cao wrote:

 Do you intend to remove the #ifdef CONFIG_PROC_FS, or it's a accident? I
 think we need keep that to allow ext4 build without procfs configured.
 
 Other than this, the patch looks fine to me.:)

oh, it kind of snuck in.  It actually should still build, as
remove_proc_entry is a no-op function w/o the config option.

Feel free to leave it in if you like though.

-Eric
-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] allocate struct ext4_allocation_context from a kmem cache to save stack space

2008-02-07 Thread Eric Sandeen
Mingming Cao wrote:
 On Thu, 2008-02-07 at 19:06 -0600, Eric Sandeen wrote:
 Mingming Cao wrote:

 Do you intend to remove the #ifdef CONFIG_PROC_FS, or it's a accident? I
 think we need keep that to allow ext4 build without procfs configured.

 Other than this, the patch looks fine to me.:)
 oh, it kind of snuck in.  It actually should still build, as
 remove_proc_entry is a no-op function w/o the config option.
 
 Oh, I mean the proc_mkdir(EXT4_ROOT, proc_root_fs) will complain w/o
 CONFIG_PROC_FS configured.
 
 Mingming
 

it'll build:

static inline struct proc_dir_entry *proc_mkdir(const char *name,
struct proc_dir_entry *parent) {return NULL;}

yes, it'll issue a printk though.  *shrug*

I like fewer #ifdefs better than more, but doesn't matter much to me.

-Eric
-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: BUG_ON at mballoc.c:3752

2008-02-07 Thread Eric Sandeen
Aneesh Kumar K.V wrote:
 On Thu, Feb 07, 2008 at 05:30:48PM -0800, Mingming Cao wrote:
 On Thu, 2008-02-07 at 18:25 +0530, Aneesh Kumar K.V wrote:

 ext4: Don't panic in case of corrupt bitmap

 From: Aneesh Kumar K.V [EMAIL PROTECTED]

 Multiblock allocator was calling BUG_ON in many case if the free and used
 blocks count obtained looking at the bitmap is different from what
 the allocator internally accounted for. Use ext4_error in such case
 and don't panic the system.

 There seems a lot of BUG_ON() and BUG() in mballoc code, other than this
 case. Should it always panic the whole system in those cases? Perhaps
 replacing with ext4_error() or some cases just WARN_ON is enough.

 
 I had looked at the BUG_ON in mballoc code and found them very useful
 while stabilizing the mballoc code. It helped to catch wrong usage of
 functions. Most of the BUG_ON are there to make sure we call the API
 with the lock held or the API should not return value greater than 'x'
 Should not call the function with a particular argument as NULL ...etc
 kind of thing. So i would suggest to keep them as such.

Yep - as long as the BUG_ONs are for programming errors, and not things
like memory allocation failures or corrupted disks, I think it's ok.
Not that I've audited them all.  :)

I asked about this early on, and got that answer... I'm reasonably
satisfied with it.

-Eric
-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] allocate struct ext4_allocation_context from a kmem cache to save stack space

2008-02-06 Thread Eric Sandeen
struct ext4_allocation_context is rather large, and this bloats
the stack of many functions which use it.  Allocating it from
a named slab cache will alleviate this.

For example, with this change (on top of the noinline patch sent earlier):

-ext4_mb_new_blocks 200
+ext4_mb_new_blocks  40

-ext4_mb_free_blocks344
+ext4_mb_free_blocks168

-ext4_mb_release_inode_pa   216
+ext4_mb_release_inode_pa40

-ext4_mb_release_group_pa   192
+ext4_mb_release_group_pa24

Most of these stack-allocated structs are actually used only for
mballoc history; and in those cases often a smaller struct would do.
So changing that may be another way around it, at least for those
functions, if preferred.  For now, in those cases where the ac
is only for history, an allocation failure simply skips the history
recording, and does not cause any other failures.

Comments?

Signed-off-by: Eric Sandeen [EMAIL PROTECTED]
---

Index: linux-2.6.24-rc6-mm1/fs/ext4/mballoc.c
===
--- linux-2.6.24-rc6-mm1.orig/fs/ext4/mballoc.c
+++ linux-2.6.24-rc6-mm1/fs/ext4/mballoc.c
@@ -419,6 +419,7 @@
 #define MB_DEFAULT_GROUP_PREALLOC  512
 
 static struct kmem_cache *ext4_pspace_cachep;
+static struct kmem_cache *ext4_ac_cachep;
 
 #ifdef EXT4_BB_MAX_BLOCKS
 #undef EXT4_BB_MAX_BLOCKS
@@ -2958,11 +2959,18 @@ int __init init_ext4_mballoc(void)
if (ext4_pspace_cachep == NULL)
return -ENOMEM;
 
-#ifdef CONFIG_PROC_FS
+   ext4_ac_cachep =
+   kmem_cache_create(ext4_alloc_context,
+sizeof(struct ext4_allocation_context),
+0, SLAB_RECLAIM_ACCOUNT, NULL);
+   if (ext4_ac_cachep == NULL) {
+   kmem_cache_destroy(ext4_pspace_cachep);
+   return -ENOMEM;
+   }
+
proc_root_ext4 = proc_mkdir(EXT4_ROOT, proc_root_fs);
if (proc_root_ext4 == NULL)
printk(KERN_ERR EXT4-fs: Unable to create %s\n, EXT4_ROOT);
-#endif
 
return 0;
 }
@@ -2971,9 +2979,8 @@ void exit_ext4_mballoc(void)
 {
/* XXX: synchronize_rcu(); */
kmem_cache_destroy(ext4_pspace_cachep);
-#ifdef CONFIG_PROC_FS
+   kmem_cache_destroy(ext4_ac_cachep);
remove_proc_entry(EXT4_ROOT, proc_root_fs);
-#endif
 }
 
 
@@ -3698,7 +3705,7 @@ static noinline int ext4_mb_release_inod
struct buffer_head *bitmap_bh,
struct ext4_prealloc_space *pa)
 {
-   struct ext4_allocation_context ac;
+   struct ext4_allocation_context *ac;
struct super_block *sb = e4b-bd_sb;
struct ext4_sb_info *sbi = EXT4_SB(sb);
unsigned long end;
@@ -3714,9 +3721,13 @@ static noinline int ext4_mb_release_inod
BUG_ON(group != e4b-bd_group  pa-pa_len != 0);
end = bit + pa-pa_len;
 
-   ac.ac_sb = sb;
-   ac.ac_inode = pa-pa_inode;
-   ac.ac_op = EXT4_MB_HISTORY_DISCARD;
+   ac = kmem_cache_alloc(ext4_ac_cachep, GFP_NOFS);
+
+   if (ac) {
+   ac-ac_sb = sb;
+   ac-ac_inode = pa-pa_inode;
+   ac-ac_op = EXT4_MB_HISTORY_DISCARD;
+   }
 
while (bit  end) {
bit = ext4_find_next_zero_bit(bitmap_bh-b_data, end, bit);
@@ -3732,11 +3743,13 @@ static noinline int ext4_mb_release_inod
(unsigned) group);
free += next - bit;
 
-   ac.ac_b_ex.fe_group = group;
-   ac.ac_b_ex.fe_start = bit;
-   ac.ac_b_ex.fe_len = next - bit;
-   ac.ac_b_ex.fe_logical = 0;
-   ext4_mb_store_history(ac);
+   if (ac) {
+   ac-ac_b_ex.fe_group = group;
+   ac-ac_b_ex.fe_start = bit;
+   ac-ac_b_ex.fe_len = next - bit;
+   ac-ac_b_ex.fe_logical = 0;
+   ext4_mb_store_history(ac);
+   }
 
mb_free_blocks(pa-pa_inode, e4b, bit, next - bit);
bit = next + 1;
@@ -3750,6 +3763,8 @@ static noinline int ext4_mb_release_inod
}
BUG_ON(free != pa-pa_free);
atomic_add(free, sbi-s_mb_discarded);
+   if (ac)
+   kmem_cache_free(ext4_ac_cachep, ac);
 
return err;
 }
@@ -3757,12 +3772,15 @@ static noinline int ext4_mb_release_inod
 static noinline int ext4_mb_release_group_pa(struct ext4_buddy *e4b,
struct ext4_prealloc_space *pa)
 {
-   struct ext4_allocation_context ac;
+   struct ext4_allocation_context *ac;
struct super_block *sb = e4b-bd_sb;
ext4_group_t group;
ext4_grpblk_t bit;
 
-   ac.ac_op = EXT4_MB_HISTORY_DISCARD;
+   ac = kmem_cache_alloc(ext4_ac_cachep, GFP_NOFS);
+
+   if (ac)
+   ac-ac_op = EXT4_MB_HISTORY_DISCARD;
 
BUG_ON(pa-pa_deleted == 0

[PATCH] reduce mballoc stack usage with noinline

2008-02-01 Thread Eric Sandeen
mballoc.c is a whole lot of static functions, which gcc seems to
really like to inline.

With the changes below, on x86, I can at least get from:

432 ext4_mb_new_blocks
240 ext4_mb_free_blocks
208 ext4_mb_discard_group_preallocations
188 ext4_mb_seq_groups_show
164 ext4_mb_init_cache
152 ext4_mb_release_inode_pa
136 ext4_mb_seq_history_show
...

to

220 ext4_mb_free_blocks
188 ext4_mb_seq_groups_show
176 ext4_mb_regular_allocator
164 ext4_mb_init_cache
156 ext4_mb_new_blocks
152 ext4_mb_release_inode_pa
136 ext4_mb_seq_history_show
124 ext4_mb_release_group_pa
...

which still has some big functions in there, but not 432 bytes!

I wonder if a zone would make sense for struct ext4_allocation_context,
it's 108 bytes by itself.

I haven't honestly done any investigation of any performance
impact of the noinlines; FWIW xfs just took control of inlining,
and marked almost every non-trivial function as noinline, (although
the xfs situation was a little more dire) and I don't think they had
any noticeable performance impact.

Signed-off-by: Eric Sandeen [EMAIL PROTECTED]

---

Index: linux-2.6.24-git9/fs/ext4/mballoc.c
===
--- linux-2.6.24-git9.orig/fs/ext4/mballoc.c
+++ linux-2.6.24-git9/fs/ext4/mballoc.c
@@ -1146,7 +1146,7 @@ out:
return err;
 }
 
-static int ext4_mb_load_buddy(struct super_block *sb, ext4_group_t group,
+static noinline int ext4_mb_load_buddy(struct super_block *sb, ext4_group_t 
group,
struct ext4_buddy *e4b)
 {
struct ext4_sb_info *sbi = EXT4_SB(sb);
@@ -1926,7 +1926,7 @@ static int ext4_mb_good_group(struct ext
return 0;
 }
 
-static int ext4_mb_regular_allocator(struct ext4_allocation_context *ac)
+static noinline int ext4_mb_regular_allocator(struct ext4_allocation_context 
*ac)
 {
ext4_group_t group;
ext4_group_t i;
@@ -2433,7 +2433,7 @@ static void ext4_mb_history_init(struct 
/* if we can't allocate history, then we simple won't use it */
 }
 
-static void ext4_mb_store_history(struct ext4_allocation_context *ac)
+static noinline void ext4_mb_store_history(struct ext4_allocation_context *ac)
 {
struct ext4_sb_info *sbi = EXT4_SB(ac-ac_sb);
struct ext4_mb_history h;
@@ -2769,7 +2769,7 @@ int ext4_mb_release(struct super_block *
return 0;
 }
 
-static void ext4_mb_free_committed_blocks(struct super_block *sb)
+static noinline void ext4_mb_free_committed_blocks(struct super_block *sb)
 {
struct ext4_sb_info *sbi = EXT4_SB(sb);
int err;
@@ -2982,7 +2982,7 @@ void exit_ext4_mballoc(void)
  * Check quota and mark choosed space (ac-ac_b_ex) non-free in bitmaps
  * Returns 0 if success or error code
  */
-static int ext4_mb_mark_diskspace_used(struct ext4_allocation_context *ac,
+static noinline int ext4_mb_mark_diskspace_used(struct ext4_allocation_context 
*ac,
handle_t *handle)
 {
struct buffer_head *bitmap_bh = NULL;
@@ -3099,7 +3099,7 @@ static void ext4_mb_normalize_group_requ
  * Normalization means making request better in terms of
  * size and alignment
  */
-static void ext4_mb_normalize_request(struct ext4_allocation_context *ac,
+static noinline void ext4_mb_normalize_request(struct ext4_allocation_context 
*ac,
struct ext4_allocation_request *ar)
 {
int bsbits, max;
@@ -3368,7 +3368,7 @@ static void ext4_mb_use_group_pa(struct 
 /*
  * search goal blocks in preallocated space
  */
-static int ext4_mb_use_preallocated(struct ext4_allocation_context *ac)
+static noinline int ext4_mb_use_preallocated(struct ext4_allocation_context 
*ac)
 {
struct ext4_inode_info *ei = EXT4_I(ac-ac_inode);
struct ext4_locality_group *lg;
@@ -3535,7 +3535,7 @@ static void ext4_mb_put_pa(struct ext4_a
 /*
  * creates new preallocated space for given inode
  */
-static int ext4_mb_new_inode_pa(struct ext4_allocation_context *ac)
+static noinline int ext4_mb_new_inode_pa(struct ext4_allocation_context *ac)
 {
struct super_block *sb = ac-ac_sb;
struct ext4_prealloc_space *pa;
@@ -3622,7 +3622,7 @@ static int ext4_mb_new_inode_pa(struct e
 /*
  * creates new preallocated space for locality group inodes belongs to
  */
-static int ext4_mb_new_group_pa(struct ext4_allocation_context *ac)
+static noinline int ext4_mb_new_group_pa(struct ext4_allocation_context *ac)
 {
struct super_block *sb = ac-ac_sb;
struct ext4_locality_group *lg;
@@ -3695,7 +3695,7 @@ static int ext4_mb_new_preallocation(str
  * the caller MUST hold group/inode locks.
  * TODO: optimize the case when there are no in-core structures yet
  */
-static int ext4_mb_release_inode_pa(struct ext4_buddy *e4b,
+static noinline int ext4_mb_release_inode_pa(struct ext4_buddy *e4b,
struct buffer_head *bitmap_bh,
struct ext4_prealloc_space *pa)
 {
@@ -3755,7 +3755,7 @@ static int

[PATCH] make mballoc history a config option

2008-02-01 Thread Eric Sandeen
mballoc history is likely a great debugging tool, but it seems a
bit heavyweight.  If I make it a config option and turn it off,
things lighten up considerably, from:

220 ext4_mb_free_blocks
188 ext4_mb_seq_groups_show
176 ext4_mb_regular_allocator
164 ext4_mb_init_cache
156 ext4_mb_new_blocks
152 ext4_mb_release_inode_pa
136 ext4_mb_seq_history_show
124 ext4_mb_release_group_pa
124 ext4_mb_init
...

to:

176 ext4_mb_regular_allocator
164 ext4_mb_init_cache
156 ext4_mb_new_blocks
124 ext4_mb_init
112 ext4_mb_free_blocks
...
44 ext4_mb_release_inode_pa
...
16 ext4_mb_release_group_pa

It's a bit shocking how much this matters to the size of
ext4_mb_release_inode_pa etc; I've not yet found the 
reason why.

Signed-off-by: Eric Sandeen [EMAIL PROTECTED]

---

Index: linux-2.6.24-git9/fs/ext4/mballoc.c
===
--- linux-2.6.24-git9.orig/fs/ext4/mballoc.c
+++ linux-2.6.24-git9/fs/ext4/mballoc.c
@@ -367,10 +367,9 @@
 #endif
 
 /*
- * with EXT4_MB_HISTORY mballoc stores last N allocations in memory
- * and you can monitor it in /proc/fs/ext4/dev/mb_history
+ * with CONFIG_EXT4DEV_FS_MBHISTORY mballoc stores last N allocations in
+ * memory and you can monitor it in /proc/fs/ext4/dev/mb_history
  */
-#define EXT4_MB_HISTORY
 #define EXT4_MB_HISTORY_ALLOC  1   /* allocation */
 #define EXT4_MB_HISTORY_PREALLOC   2   /* preallocated blocks used */
 #define EXT4_MB_HISTORY_DISCARD4   /* preallocation 
discarded */
@@ -562,7 +561,7 @@ struct ext4_buddy {
 #define EXT4_MB_BITMAP(e4b)((e4b)-bd_bitmap)
 #define EXT4_MB_BUDDY(e4b) ((e4b)-bd_buddy)
 
-#ifndef EXT4_MB_HISTORY
+#ifndef CONFIG_EXT4DEV_FS_MBHISTORY
 static inline void ext4_mb_store_history(struct ext4_allocation_context *ac)
 {
return;
@@ -2093,7 +2092,7 @@ out:
return err;
 }
 
-#ifdef EXT4_MB_HISTORY
+#ifdef CONFIG_EXT4DEV_FS_MBHISTORY
 struct ext4_mb_proc_session {
struct ext4_mb_history *history;
struct super_block *sb;
Index: linux-2.6.24-rc6-mm1/fs/Kconfig
===
--- linux-2.6.24-rc6-mm1.orig/fs/Kconfig
+++ linux-2.6.24-rc6-mm1/fs/Kconfig
@@ -202,6 +202,13 @@ config EXT4DEV_FS_SECURITY
  If you are not using a security module that requires using
  extended attributes for file security labels, say N.
 
+config EXT4DEV_FS_MBHISTORY
+   bool Ext4dev mballoc allocator history
+   help
+ Enabling this option make it possible to monitor mballoc
+ allocator history via /proc/fs/ext4/dev/mb_history.
+ Disabling this option will save memory and stack space.
+
 config JBD
tristate
help

-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] make mballoc history a config option

2008-02-01 Thread Eric Sandeen
Eric Sandeen wrote:

 It's a bit shocking how much this matters to the size of
 ext4_mb_release_inode_pa etc; I've not yet found the 
 reason why.

Oh, well, duh - its' the big 108 byte allocation context on the
stack and if it's not used it gets optimized away.

I think it's worth looking at allocating that one.

-Eric
-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Bugme-new] [Bug 9855] New: ext3 ACL corruption

2008-01-31 Thread Eric Sandeen
Andreas Dilger wrote:
 On Jan 30, 2008  14:49 -0800, Andrew Morton wrote:
 Problem Description:

 Inode size:   256 
 
 This is a bit interesting, since it isn't very common to use large inodes.
 I suspect this relates to the problem.

I think it is somewhat common on samba servers, though.

And it's the new default in the latest e2fsprogs... maybe something will
shake out in the F9 development cycle.

 These are production Samba servers making fairly extensive use of file and
 directory ACLs. Thus far, I've only noticed the corruptions when it came 
 time
 to upgrade to a new kernel and reboot (and the boot scripts then run fsck).
 Note that I've never noticed any issues at runtime because of this - only 
 when
 I later realised that ACLs had been removed from random files and/or
 directories.

 I think I will implement some scripts to unmount and run fsck nightly from
 cron, so I can at least detect the corruption a little earlier. If there is
 some more helpful debugging output I can provide, please let me know.
 
 There is just such a script in the thread forced fsck (again?).  Since you
 are using LVs for the filesystem.

Which is on the ext3-users list btw...

 If you are able to reproduce this, could you please dump the inode and EA
 block before fixing the problem.

Do you need instructions on doing that?

-Eric
-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RFC: file magic for ext4

2008-01-30 Thread Eric Sandeen
This seems to work fine; any comments, either about
the logic, or the text descriptions, or the various
options I've shown/not shown?

To test, copy your magic file somewhere local, 
replace the ext2/3 section with the following,
and do something like:

# file -C -m magic; file -m ./magic -s /dev/sda1

-Eric

# ext2/ext3 filesystems - Andreas Dilger [EMAIL PROTECTED]
0x438   leshort 0xEF53  Linux
0x44c  lelong  x   rev %d
0x43e  leshort x   \b.%d
# No journal?  ext2
0x45c  lelong  ^0x004  ext2 filesystem data
0x43a leshort ^0x001  (mounted or unclean)
# Has a journal?  ext3 or ext4
0x45c  lelong  0x004
#  and small INCOMPAT?
0x460 lelong  0x040
#   and small RO_COMPAT?
0x464 lelong 0x008  ext3 filesystem data
#   else large RO_COMPAT?
0x464 lelong 0x007  ext4 filesystem data
#  else large INCOMPAT?
0x460 lelong  0x03f  ext4 filesystem data
# General flags for any ext* fs
0x460  lelong  0x004  (needs journal recovery)
0x43a  leshort 0x002  (errors)
# INCOMPAT flags
0x460  lelong  0x001  (compressed)
#0x460 lelong  0x002  (filetype)
#0x460 lelong  0x010  (meta bg)
0x460  lelong  0x040  (extents)
0x460  lelong  0x080  (64bit)
#0x460 lelong  0x100  (mmp)
#0x460 lelong  0x200  (flex bg)
# RO_INCOMPAT flags
#0x464 lelong  0x001  (sparse super)
0x464  lelong  0x002  (large files)
0x464  lelong  0x008  (huge files)
#0x464 lelong  0x010  (gdt checksum)
#0x464 lelong  0x020  (many subdirs)
#0x463 lelong  0x040  (extra isize)


-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: e2fsck ext4 with extents possible?

2008-01-30 Thread Eric Sandeen
supersud501 wrote:
 hello,
 
 i'm testing the ext4-filesystem for months now (no data loss experienced 
 ), but i noticed e2fsck not being able to check a ext4-filesystem with 
 extents (e2fsck: Filesystem has unsupported feature(s)).
 
 i just pulled linus' kernel tree (with the newly added patches to ext4) 
 and e2fsprogs git tree (from git.kernel.org), but still e2fsprogs is not 
 able to check my ext4-fs.
 
 so i wonder whether there's somewhere a patch for e2fsprogs flying 
 around i've missed or whether checking ext4 with extents is simply not 
 supported yet? if so, how's development going on with this?

No released e2fsprogs supports ext4 filesystems with new on-disk
features yet; that's partly what the dev on the end of ext4dev.ko is
for.  Ted's working on it.

-Eric
-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH e2fsprogs] - detect LVM2 PVs in libblkid

2008-01-30 Thread Eric Sandeen
The anaconda folks are now using blkid instead of hand-rolled
tests for filesystem type at install time, but they had one
more request:

Bugzilla Bug 409321: RFE: information on blkdevs formatted as PVs
https://bugzilla.redhat.com/show_bug.cgi?id=409321

The attached patch does the right thing for me on my sample
set of exactly 1 PV...

Any issues with reporting back something which is not actually
a filesystem (lvm2pv) ?

[EMAIL PROTECTED] misc/blkid -c /dev/null /dev/sda2
/dev/sda2: UUID=guOQGdcOE3IafCm0190XkPZTy5fCEanQ TYPE=lvm2pv 
[EMAIL PROTECTED] pvs -o pv_name,pv_uuid
  PV PV UUID   
  /dev/sda2  guOQGd-cOE3-IafC-m019-0XkP-ZTy5-fCEanQ

Bits liberally stolen from lvm2 userspace.

Signed-off-by: Eric Sandeen [EMAIL PROTECTED]

---

Index: e2fsprogs-1.40.5/lib/blkid/probe.h
===
--- e2fsprogs-1.40.5.orig/lib/blkid/probe.h
+++ e2fsprogs-1.40.5/lib/blkid/probe.h
@@ -531,6 +531,20 @@ struct hfs_mdb {
 __u16embed_blockcount;
 } __attribute__((packed));
 
+/* this is lvm's label_header  pv_header combined. */
+
+#define LVM2_ID_LEN 32
+
+struct lvm2_pv_label_header {
+   /* label_header */
+   __u8id[8];  /* LABELONE */
+   __u64   sector_xl;  /* Sector number of this label */
+   __u32   crc_xl; /* From next field to end of sector */
+   __u32   offset_xl;  /* Offset from start of struct to contents */
+   __u8type[8];/* LVM2 001 */
+   /* pv_header */
+   __u8pv_uuid[LVM2_ID_LEN];
+} __attribute__ ((packed));
 
 /*
  * Byte swap functions
Index: e2fsprogs-1.40.5/lib/blkid/probe.c
===
--- e2fsprogs-1.40.5.orig/lib/blkid/probe.c
+++ e2fsprogs-1.40.5/lib/blkid/probe.c
@@ -895,6 +895,64 @@ static int probe_hfsplus(struct blkid_pr
return 1;
 }
 
+#define LVM2_LABEL_SIZE 512
+static int lvm2_calc_crc(const void *buf, uint size)
+{
+   static const uint crctab[] = {
+   0x, 0x1db71064, 0x3b6e20c8, 0x26d930ac,
+   0x76dc4190, 0x6b6b51f4, 0x4db26158, 0x5005713c,
+   0xedb88320, 0xf00f9344, 0xd6d6a3e8, 0xcb61b38c,
+   0x9b64c2b0, 0x86d3d2d4, 0xa00ae278, 0xbdbdf21c
+   };
+   uint i, crc = 0xf597a6cf;
+   const __u8 *data = (const __u8 *) buf;
+
+   for (i = 0; i  size; i++) {
+   crc ^= *data++;
+   crc = (crc  4) ^ crctab[crc  0xf];
+   crc = (crc  4) ^ crctab[crc  0xf];
+   }
+   return crc;
+}
+
+static int probe_lvm2(struct blkid_probe *probe,
+   struct blkid_magic *id __BLKID_ATTR((unused)),
+   unsigned char *buf)
+{
+   int sector = (id-bim_kboff)  1;;
+   struct lvm2_pv_label_header *label;
+   label = (struct lvm2_pv_label_header *)buf;
+
+   /* buf is at 0k or 1k offset; find label inside */
+   if (memcmp(buf, LABELONE, 8) == 0) {
+   label = (struct lvm2_pv_label_header *)buf;
+   } else if (memcmp(buf + 512, LABELONE, 8) == 0) {
+   label = (struct lvm2_pv_label_header *)(buf + 512);
+   sector++;
+   } else {
+   return 1;
+   }
+
+   if (blkid_le64(label-sector_xl) != sector) {
+   DBG(DEBUG_PROBE,
+   printf(LVM2: label for sector %d found at sector %d\n,
+  blkid_le64(label-sector_xl), sector));
+   return 1;
+   }
+
+   if (lvm2_calc_crc(label-offset_xl, LVM2_LABEL_SIZE -
+   ((void *)label-offset_xl - (void *)label)) !=
+   blkid_le32(label-crc_xl)) {
+   DBG(DEBUG_PROBE,
+   printf(LVM2: label checksum incorrect at sector %d\n,
+  sector));
+   return 1;
+   }
+
+   blkid_set_tag(probe-dev, UUID, label-pv_uuid, LVM2_ID_LEN);
+
+   return 0;
+}
 /*
  * BLKID_BLK_OFFS is at least as large as the highest bim_kboff defined
  * in the type_array table below + bim_kbalign.
@@ -988,6 +1046,10 @@ static struct blkid_magic type_array[] =
   { crypt_LUKS, 0,0,  6, LUKS\xba\xbe, probe_luks },
   { squashfs, 0,  0,  4, sqsh, 0 },
   { squashfs, 0,  0,  4, hsqs, 0 },
+  { lvm2pv,   0,  0x218,  8, LVM2 001, probe_lvm2 },
+  { lvm2pv,   0,  0x018,  8, LVM2 001, probe_lvm2 },
+  { lvm2pv,   1,  0x018,  8, LVM2 001, probe_lvm2 },
+  { lvm2pv,   1,  0x218,  8, LVM2 001, probe_lvm2 },
   {   NULL, 0,  0,  0, NULL,   NULL }
 };
 

-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] allow in-inode EAs on ext4 root inode

2008-01-29 Thread Eric Sandeen
The ext3 root inode was treated specially with respect 
to in-inode extended attributes, for reasons detailed 
in the removed comment below.  The first mkfs-created
inodes would not get extra_i_size or the EXT3_STATE_XATTR 
flag set in ext3_read_inode, which disallowed reading or 
setting in-inode EAs on the root.

However, in ext4, ext4_mark_inode_dirty calls 
ext4_expand_extra_isize for all inodes; once this is done 
EAs may be placed in the root ext4 inode body.

But for reasons above, it won't be found after a reboot.

testcase:

setfattr -n user.name -v value mntpt/
setfattr -n user.name2 -v value2 mntpt/
umount mntpt/; remount mntpt/
getfattr -d mntpt/

name2/value2 has gone missing; debugfs shows it in the
inode body, but it is not found there by getattr.

The following fixes it up; newer mkfs appears to properly
zero the inodes, so this workaround isn't needed for ext4.

Signed-off-by: Eric Sandeen [EMAIL PROTECTED]

---

Index: linux-2.6.24-rc6-mm1/fs/ext4/inode.c
===
--- linux-2.6.24-rc6-mm1.orig/fs/ext4/inode.c
+++ linux-2.6.24-rc6-mm1/fs/ext4/inode.c
@@ -2750,13 +2750,7 @@ void ext4_read_inode(struct inode * inod
ei-i_data[block] = raw_inode-i_block[block];
INIT_LIST_HEAD(ei-i_orphan);
 
-   if (inode-i_ino = EXT4_FIRST_INO(inode-i_sb) + 1 
-   EXT4_INODE_SIZE(inode-i_sb)  EXT4_GOOD_OLD_INODE_SIZE) {
-   /*
-* When mke2fs creates big inodes it does not zero out
-* the unused bytes above EXT4_GOOD_OLD_INODE_SIZE,
-* so ignore those first few inodes.
-*/
+   if (EXT4_INODE_SIZE(inode-i_sb)  EXT4_GOOD_OLD_INODE_SIZE) {
ei-i_extra_isize = le16_to_cpu(raw_inode-i_extra_isize);
if (EXT4_GOOD_OLD_INODE_SIZE + ei-i_extra_isize 
EXT4_INODE_SIZE(inode-i_sb)) {

-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH e2fsprogs] print mostly-printable xattr strings in debugfs

2008-01-29 Thread Eric Sandeen
Taking a cue from getfattr... if a string is mostly
printable characters, go ahead  print as a string,
and escape what's left over.

so we get:

Extended attributes stored in inode body: 
  selinux = system_u:object_r:root_t:s0\000 (28)

instead of:

Extended attributes stored in inode body: 
  selinux = 73 79 73 74 65 6d 5f 75 3a 6f 62 6a 65 63 74 5f 72 3a 72 6f 6f 74 
5f 74 3a 73 30 00  (28)

(selinux includes the trailing null in len so it
never prints as a string today)

Signed-off-by: Eric Sandeen [EMAIL PROTECTED]
---
 
Index: e2fsprogs-1.40.5/debugfs/debugfs.c
===
--- e2fsprogs-1.40.5.orig/debugfs/debugfs.c
+++ e2fsprogs-1.40.5/debugfs/debugfs.c
@@ -434,19 +434,21 @@ static int list_blocks_proc(ext2_filsys 
 
 static void dump_xattr_string(FILE *out, const char *str, int len)
 {
-   int printable = 1;
+   int printable = 0;
int i;

-   /* check is string printable? */
+   /* check: is string printable enough? */
for (i = 0; i  len; i++)
-   if (!isprint(str[i])) {
-   printable = 0;
-   break;
-   }
+   if (isprint(str[i]))
+   printable++;
+
+   if (printable = len*7/8)
+   printable = 0;
 
for (i = 0; i  len; i++)
if (printable)
-   fprintf(out, %c, (unsigned char)str[i]);
+   fprintf(out, isprint(str[i]) ? %c : \\%03o,
+   (unsigned char)str[i]);
else
fprintf(out, %02x , (unsigned char)str[i]);
 }

-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Integrating patches in SLES10 e2fsprogs

2008-01-28 Thread Eric Sandeen
Theodore Tso wrote:
 On Mon, Jan 28, 2008 at 04:26:53PM +0100, Matthias Koenig wrote:
 Patch6: e2fsprogs-mdraid.patch

 This apparently adds a new environment variable,
 BLKID_SKIP_CHECK_MDRAID, which forces blkid to not detect mdraid
 devices.  I'm not sure why.
 Workaround for people having stale RAID signature on their disk:
 https://bugzilla.novell.com/show_bug.cgi?id=100530
 
 Hmm... there's got to be a better way around this.

Won't help existing block devices, but it'd be nice to have a common
library which could be called @ mkfs time to wipe out all known
signatures...

mkfs.xfs tries to do this, but it'd be silly to duplicate in every mkfs.

 On my one of these days list is to get another cheap/used laptop so
 I can try out the latest Fedora Core Rawhide without having to fire up
 a huge (noisy) x86_64 box

Just partition... ;)

-Eric
-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Integrating patches in SLES10 e2fsprogs

2008-01-27 Thread Eric Sandeen
Theodore Tso wrote:

 Patch10:close.patch
 
 I don't understand what this patch is trying to do.

* Tue Nov 15 2005 [EMAIL PROTECTED]
- added close.patch provided by Ted Tso (IBM) to fix bug #132708

*grin* Maybe obsolete by now?  haven't looked closely.

 Patch12:e2fsprogs-mkinstalldirs.patch
 
 Why?
 

Probably same as why we have something similar; for one reason or other
need to rerun autoconf, and e2fsprogs isn't compatible with latest
autoconf.  (This is a patch I inherited, and haven't yet investigated
all the details)

 Patch22:e2fsprogs-1.40.4-uuidd_pid_path.patch
 
 The problem with this patch is that /var/run is cleared via rm -rf, so
 it is highly problamtic to put the scratch directory for uuidd in
 /var/run.

Hm, I had similar issues with uuidd too - common theme here?

 
 Patch32:libcom_err-no-e2fsck.static.patch
 
 This patch does two completely unrelated things.  One is to disable
 the libcom_err regression test suite (probably because some of the
 other changes made) and the other is to disable building the
 e2fsck.static file.  Why these two are bundled into a single patch I'm
 not sure.

And I have a patch to do the latter as well.  Interesting how we've
arrived at similar needed changes, independently.  :)

and Patch99:e2fsprogs-no_cmd_hiding.patch

honestly I like that; I should whip up a nice patch to emulate kbuild,
with V=1 or something, unless there is some other easy way to show full
build commands already?

-Eric
-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH e2fsprogs] UPDATED: ignore safe flag differences when fsck compares superblocks

2008-01-26 Thread Eric Sandeen
Theodore Tso wrote:
 This is what I've checked in.  See the comment about why we can't
 ignore a difference for the extended attribute feature.

Ok.  Unfortunately, that one really hurts on fedora, or any distro which
writes xattrs during install.

The installer mkfs's, mounts, installs with selinux, reboots = force
recheck of the entire filesystem, because the kernel wrote extended
attributes for selinux, but only to the primary superblock.

Solely having this set in primary but not in the backups should hardly
be a trigger for a full filesystem check, should it?

-Eric

  - Ted
 
 commit a8cde73acbf6e0f9c0a3601e4f5fac2b01a27bd2
 Author: Theodore Ts'o [EMAIL PROTECTED]
 Date:   Sat Jan 26 23:17:50 2008 -0500
 
 Ignore safe flag differences when e2fsck compares superblocks
-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH e2fsprogs] UPDATED: ignore safe flag differences when fsck compares superblocks

2008-01-25 Thread Eric Sandeen
(updated for thinko: when proper flag *is* set on both primary  backup)

Recent e2fsprogs (1.40.3 and higher) fsck compares primary superblock to 
backups, and if things differ, it forces a full check.  However, the
kernel has a penchant for updating flags the first time a feature is
used - attributes, large files, etc.

However, it only updates these on the primary sb.  This then causes
the new e2fsck behavior to trigger a full check.  I think these flags
can be safely ignored on this check; having them set on the primary
but not the backups doesn't indicate corruption; if they're wrongly
set on the primary, really no damage is done, and if the backup is
used, but it doesn't have the flags set when it should, I'm pretty sure
e2fsck can cope with that.

I'll admit the patch below is not glamorous.  Any comments, either
on the style(sic) or the intent of the patch?

Thanks,

-Eric

Signed-off-by: Eric Sandeen [EMAIL PROTECTED]

Index: e2fsprogs-1.40.4/e2fsck/super.c
===
--- e2fsprogs-1.40.4.orig/e2fsck/super.c
+++ e2fsprogs-1.40.4/e2fsck/super.c
@@ -814,10 +814,32 @@ int check_backup_super_block(e2fsck_t ct
continue;
}
 
-#define SUPER_DIFFERENT(x) (fs-super-x != tfs-super-x)
-   if (SUPER_DIFFERENT(s_feature_compat) ||
-   SUPER_DIFFERENT(s_feature_incompat) ||
-   SUPER_DIFFERENT(s_feature_ro_compat) ||
+   /*
+* A few flags are set on the fly by the kernel, but
+* only in the primary superblock.  They are safe
+* to copy even if they differ.
+*/ 
+
+#define FEATURE_COMPAT_IGNORE  (EXT2_FEATURE_COMPAT_EXT_ATTR)
+#define FEATURE_RO_COMPAT_IGNORE   (EXT2_FEATURE_RO_COMPAT_LARGE_FILE| \
+EXT4_FEATURE_RO_COMPAT_DIR_NLINK)
+#define FEATURE_INCOMPAT_IGNORE(EXT3_FEATURE_INCOMPAT_EXTENTS)
+
+#define SUPER_COMPAT_DIFFERENT(x)  \
+   (( fs-super-x  ~FEATURE_COMPAT_IGNORE) !=\
+(tfs-super-x  ~FEATURE_COMPAT_IGNORE))
+#define SUPER_INCOMPAT_DIFFERENT(x)\
+   (( fs-super-x  ~FEATURE_INCOMPAT_IGNORE) !=  \
+(tfs-super-x  ~FEATURE_INCOMPAT_IGNORE))
+#define SUPER_RO_COMPAT_DIFFERENT(x)   \
+   (( fs-super-x  ~FEATURE_RO_COMPAT_IGNORE) != \
+(tfs-super-x  ~FEATURE_RO_COMPAT_IGNORE))
+#define SUPER_DIFFERENT(x) \
+   (fs-super-x != tfs-super-x)
+
+   if (SUPER_COMPAT_DIFFERENT(s_feature_compat) ||
+   SUPER_INCOMPAT_DIFFERENT(s_feature_incompat) ||
+   SUPER_RO_COMPAT_DIFFERENT(s_feature_ro_compat) ||
SUPER_DIFFERENT(s_blocks_count) ||
SUPER_DIFFERENT(s_inodes_count) ||
memcmp(fs-super-s_uuid, tfs-super-s_uuid,


-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] ext3 freeze feature

2008-01-25 Thread Eric Sandeen
Theodore Tso wrote:
 The other approach would be to say, oh well, the freeze ioctl is
 inherently dangerous, and root is allowed to himself in the foot, so
 who cares.  :-)

I tend to agree.  Either you need your fs frozen, or not, and if you do,
be prepared for the consequences.

 But it was this concern which is why ext3 never exported freeze
 functionality to userspace, even though other commercial filesystems
 do support this.  It wasn't that it wasn't considered, but the concern
 about whether or not it was sufficiently safe to make available.

What's the safety concern; that the admin will forget to unfreeze?

 And I do agree that we probably should just implement this in
 filesystem independent way, in which case all of the filesystems that
 support this already have super_operations functions
 write_super_lockfs() and unlockfs().

That's what I was thinking; can't the path to freeze_bdev just be
elevated out of dm-ioctl.c to fs/ioctl.c and exposed, such that any
filesystem which implements .write_super_lockfs can be frozen?  This is
essentially what the xfs_freeze userspace does via
xfs_ioctl/XFS_IOC_FREEZE - which, AFAIK, isn't used much now that the
lvm hooks are in place.

I'm also not sure I see the point of the timeout in the original patch;
either you are done snapshotting and ready to unfreeze, or you're not;
1, or 2, or 3 seconds doesn't really matter.  When you're done, you're
done, and you can only unfreeze then.  Shouldn't this be done
programmatically, and not with some pre-determined timeout?

-Eric
-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH e2fsprogs] ignore safe flag differences when fsck compares superblocks

2008-01-22 Thread Eric Sandeen
Recent e2fsprogs (1.40.3 and higher) fsck compares primary superblock to 
backups, and if things differ, it forces a full check.  However, the
kernel has a penchant for updating flags the first time a feature is
used - attributes, large files, etc.

However, it only updates these on the primary sb.  This then causes
the new e2fsck behavior to trigger a full check.  I think these flags
can be safely ignored on this check; having them set on the primary
but not the backups doesn't indicate corruption; if they're wrongly
set on the primary, really no damage is done, and if the backup is
used, but it doesn't have the flags set when it should, I'm pretty sure
e2fsck can cope with that.

I'll admit the patch below is not glamorous.  Any comments, either
on the style(sic) or the intent of the patch?

Thanks,

-Eric

Signed-off-by: Eric Sandeen [EMAIL PROTECTED]

---

Index: e2fsprogs-1.40.4/e2fsck/super.c
===
--- e2fsprogs-1.40.4.orig/e2fsck/super.c
+++ e2fsprogs-1.40.4/e2fsck/super.c
@@ -814,10 +814,29 @@ int check_backup_super_block(e2fsck_t ct
continue;
}
 
-#define SUPER_DIFFERENT(x) (fs-super-x != tfs-super-x)
-   if (SUPER_DIFFERENT(s_feature_compat) ||
-   SUPER_DIFFERENT(s_feature_incompat) ||
-   SUPER_DIFFERENT(s_feature_ro_compat) ||
+   /*
+* A few flags are set on the fly by the kernel, but
+* only in the primary superblock.  They are safe
+* to copy even if they differ.
+*/ 
+
+#define FEATURE_COMPAT_IGNORE  (EXT2_FEATURE_COMPAT_EXT_ATTR)
+#define FEATURE_RO_COMPAT_IGNORE   (EXT2_FEATURE_RO_COMPAT_LARGE_FILE | \
+EXT4_FEATURE_RO_COMPAT_DIR_NLINK)
+#define FEATURE_INCOMPAT_IGNORE(EXT3_FEATURE_INCOMPAT_EXTENTS)
+
+#define SUPER_COMPAT_DIFFERENT(x)  \
+   ((fs-super-x  ~FEATURE_COMPAT_IGNORE) != tfs-super-x)
+#define SUPER_INCOMPAT_DIFFERENT(x)\
+   ((fs-super-x  ~FEATURE_INCOMPAT_IGNORE) != tfs-super-x)
+#define SUPER_RO_COMPAT_DIFFERENT(x)   \
+   ((fs-super-x  ~FEATURE_RO_COMPAT_IGNORE) != tfs-super-x)
+#define SUPER_DIFFERENT(x) \
+   (fs-super-x != tfs-super-x)
+
+   if (SUPER_COMPAT_DIFFERENT(s_feature_compat) ||
+   SUPER_INCOMPAT_DIFFERENT(s_feature_incompat) ||
+   SUPER_RO_COMPAT_DIFFERENT(s_feature_ro_compat) ||
SUPER_DIFFERENT(s_blocks_count) ||
SUPER_DIFFERENT(s_inodes_count) ||
memcmp(fs-super-s_uuid, tfs-super-s_uuid,



-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH e2fsprogs] teach libblkid about ext4(dev)

2008-01-22 Thread Eric Sandeen
Returns ext4dev for now; will need to change to ext4 at the appropriate
time I guess.

Signed-off-by: Eric Sandeen [EMAIL PROTECTED]

--- 

Index: e2fsprogs-1.40.4/lib/blkid/probe.c

===
--- e2fsprogs-1.40.4.orig/lib/blkid/probe.c
+++ e2fsprogs-1.40.4/lib/blkid/probe.c
@@ -148,6 +148,38 @@ static void get_ext2_info(blkid_dev dev,
set_uuid(dev, es-s_uuid, 0);
 }
 
+static int probe_ext4(struct blkid_probe *probe,
+ struct blkid_magic *id __BLKID_ATTR((unused)),
+ unsigned char *buf)
+{
+   struct ext2_super_block *es;
+   es = (struct ext2_super_block *)buf;
+
+   /* Distinguish between jbd and ext2/3/4 fs */
+   if (blkid_le32(es-s_feature_incompat) 
+   EXT3_FEATURE_INCOMPAT_JOURNAL_DEV)
+   return -BLKID_ERR_PARAM;
+
+   /* Distinguish between ext3/4 and ext2 */
+   if (!(blkid_le32(es-s_feature_compat) 
+ EXT3_FEATURE_COMPAT_HAS_JOURNAL))
+   return -BLKID_ERR_PARAM;
+
+   /* Distinguish between ext4 and ext3 */
+   if (!(blkid_le32(es-s_feature_ro_compat) 
+ EXT4_FEATURES_RO_COMPAT) 
+   !(blkid_le32(es-s_feature_incompat) 
+ EXT4_FEATURES_INCOMPAT))
+   return -BLKID_ERR_PARAM;
+
+   get_ext2_info(probe-dev, buf);
+
+   if ((es-s_feature_compat  EXT3_FEATURE_COMPAT_HAS_JOURNAL) 
+   !uuid_is_null(es-s_journal_uuid))
+   set_uuid(probe-dev, es-s_journal_uuid, EXT_JOURNAL);
+
+   return 0;
+}
 static int probe_ext3(struct blkid_probe *probe, 
  struct blkid_magic *id __BLKID_ATTR((unused)), 
  unsigned char *buf)
@@ -833,6 +865,7 @@ static struct blkid_magic type_array[] =
   { oracleasm, 0,32,  8, ORCLDISK, probe_oracleasm },
   { ntfs, 0,  3,  8, NTFS, probe_ntfs },
   { jbd,  1,   0x38,  2, \123\357, probe_jbd },
+  { ext4dev,  1,   0x38,  2, \123\357, probe_ext4 },
   { ext3, 1,   0x38,  2, \123\357, probe_ext3 },
   { ext2, 1,   0x38,  2, \123\357, probe_ext2 },
   { reiserfs, 8,   0x34,  8, ReIsErFs, probe_reiserfs 
},
Index: e2fsprogs-1.40.4/lib/blkid/probe.h
===
--- e2fsprogs-1.40.4.orig/lib/blkid/probe.h
+++ e2fsprogs-1.40.4/lib/blkid/probe.h
@@ -88,6 +88,26 @@ struct ext2_super_block {
 #define EXT3_FEATURE_INCOMPAT_RECOVER  0x0004
 #define EXT3_FEATURE_INCOMPAT_JOURNAL_DEV  0x0008
 
+#define EXT4_FEATURE_RO_COMPAT_HUGE_FILE   0x0008
+#define EXT4_FEATURE_RO_COMPAT_GDT_CSUM0x0010
+#define EXT4_FEATURE_RO_COMPAT_DIR_NLINK   0x0020
+#define EXT4_FEATURE_RO_COMPAT_EXTRA_ISIZE 0x0040
+
+#define EXT4_FEATURE_INCOMPAT_EXTENTS  0x0040 /* extents support */
+#define EXT4_FEATURE_INCOMPAT_64BIT0x0080
+#define EXT4_FEATURE_INCOMPAT_MMP  0x0100
+#define EXT4_FEATURE_INCOMPAT_FLEX_BG  0x0200
+
+#define EXT4_FEATURES_RO_COMPAT(EXT4_FEATURE_RO_COMPAT_HUGE_FILE| \
+EXT4_FEATURE_RO_COMPAT_GDT_CSUM| \
+EXT4_FEATURE_RO_COMPAT_DIR_NLINK| \
+EXT4_FEATURE_RO_COMPAT_EXTRA_ISIZE)
+
+#define EXT4_FEATURES_INCOMPAT (EXT4_FEATURE_INCOMPAT_EXTENTS| \
+EXT4_FEATURE_INCOMPAT_64BIT| \
+EXT4_FEATURE_INCOMPAT_MMP| \
+EXT4_FEATURE_INCOMPAT_FLEX_BG)
+
 struct xfs_super_block {
unsigned char   xs_magic[4];
__u32   xs_blocksize;

-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH, RFC] Add new development flag to the ext4 filesystem

2008-01-22 Thread Eric Sandeen
Theodore Tso wrote:
 As discussed on RFC, this flag is simply a generic this is a
 crash/burn test filesystem marker.  If it is set, then filesystem
 code which is in development will be allowed to mount the
 filesystem.  Filesystem code which is not considered ready for
 prime-time will check for this flag, and if it is not set, it will
 refuse to touch the filesystem.
 
 As we start rolling ext4 out to distro's like Fedora, et. al, this
 makes it less likely that a user might accidentally start using ext4
 on a production filesystem; a bad thing, since that will essentially
 make it be unfsckable until e2fsprogs catches up.
 
   - Ted
 

Overall, seems ok.  One other question though, when ext4 is a
fully-fledged production filesystem, and the flag requirement is gone,
what stops ext3 filesystems from being silently mounted as ext4, just as
happened with ext4dev today?  Will users need to be sure everything
explicitly mounts -t ext3 to avoid this?


+static void parse_extended_opts(ext2_filsys fs, const char *opts)

...

+   if (!strcmp(token, test_fs)) {
+   fs-super-s_flags |= EXT2_FLAGS_TEST_FILESYS;
+   printf(Setting test filesystem flag\n);
+   ext2fs_mark_super_dirty(fs);
+   } else if (!strcmp(token, production_fs)) {
+   fs-super-s_flags = ~EXT2_FLAGS_TEST_FILESYS;
+   printf(Clearing test filesystem flag\n);
+   ext2fs_mark_super_dirty(fs);
+   } else
+   r_usage++;
+   }
+   if (r_usage) {
+   fprintf(stderr, _(\nBad options specified.\n\n
+   Extended options are separated by commas, 
+   and may take an argument which\n
+   \tis set off by an equals ('=') sign.\n\n
+   Valid extended options are:\n
+   \ttestfs\n));

help text doesn't match reality here, missing a _

-Eric
-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


why are acls user xattrs off by default?

2008-01-16 Thread Eric Sandeen
I was looking through Anaconda to see what it's doing when it makes ext3
filesystems:

rc = iutil.execWithRedirect(tune2fs,
[-c0, -i0, -Odir_index,
 -ouser_xattr,acl, devicePath],
stdout = /dev/tty5,
stderr = /dev/tty5, searchPath = 1)

-Odir_index is redundant by now; it's default in e2fsprogs.

-c0 -i0 are probably subject to some debate, but...

why are user_xattr  acl required to be mount options for ext3?  (added
to defaults or explicit?)  I'm sure there is some history here, but why
build it all into the module, and then turn them off without some extra
magic passed in?  What's the advantage to disabling user xattrs  acls
at runtime by default?

Thanks  just curious,

-Eric
-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Fwd: [Bug 9732] New: oops in extent code via ext4_fallocate]

2008-01-11 Thread Eric Sandeen
---BeginMessage---
http://bugzilla.kernel.org/show_bug.cgi?id=9732

   Summary: oops in extent code via ext4_fallocate
   Product: File System
   Version: 2.5
 KernelVersion: 2.6.24-rc7
  Platform: All
OS/Version: Linux
  Tree: Mainline
Status: NEW
  Severity: normal
  Priority: P1
 Component: ext4
AssignedTo: [EMAIL PROTECTED]
ReportedBy: [EMAIL PROTECTED]


Latest working kernel version: Unknown
Earliest failing kernel version: 2.6.24-rc7
Distribution: Fedora Rawhide
Hardware Environment: x86_64
Software Environment:
Problem Description:

a simple fallocate; truncate; fallocate series oopses the kernel.

2.6.24-rc7 with the latest ext4 patch stack as of about Jan 9, 2008, minus
delalloc patches.

Steps to reproduce, using the test program at
http://oss.sgi.com/archives/xfs/2007-07/msg00092.html

[EMAIL PROTECTED] sdb8]# touch testfile
[EMAIL PROTECTED] sdb8]# ./testfallocate -f testfile 0 65536
Trying to preallocate blocks (offset=0, len=65536)
fallocate system call succedded !  ret=0
# FALLOCATE TEST REPORT #
New blocks preallocated = 16.
Number of bytes preallocated = 65536
Old file size = 0, New file size 65536.
Old num blocks = 0, New num blocks 64.


### TESTS PASSED ###
[EMAIL PROTECTED] sdb8]# ls  -lh testfile; du -hc testfile
-rw-r--r-- 1 root root 64K Jan 11 14:12 testfile
64K testfile
64K total
[EMAIL PROTECTED] sdb8]# /root/truncate testfile 32768
Truncating testfile to 32768
[EMAIL PROTECTED] sdb8]# ls  -lh testfile; du -hc testfile
-rw-r--r-- 1 root root 32K Jan 11 14:12 testfile
32K testfile
32K total
[EMAIL PROTECTED] sdb8]# ./testfallocate -f testfile 0 65536
Trying to preallocate blocks (offset=0, len=65536)
Segmentation fault


and yields:
EXT4-fs: file extents enabled
EXT4-fs: mballoc enabled
[ cut here ]
kernel BUG at fs/ext4/extents.c:1056!
invalid opcode:  [1] SMP 
CPU 2 
Modules linked in: ext4dev jbd2 crc16 autofs4 hidp nfs lockd nfs_acl rfcomm
l2cap bluetooth sunrpc ipv6 cpufreq_ondemand dm_multipath video output sbs
sbshc battery ac power_supply parport_pc lp parport sg pata_acpi
cfi_cmdset_0002 ata_generic cfi_util button jedec_probe serio_raw cfi_probe tg3
gen_probe ck804xrom mtd rtc_cmos chipreg pata_amd map_funcs k8temp libata hwmon
i2c_nforce2 shpchp i2c_core pcspkr dm_snapshot dm_zero dm_mirror dm_mod qla2xxx
scsi_transport_fc scsi_tgt mptspi mptscsih mptbase scsi_transport_spi sd_mod
scsi_mod ext3 jbd mbcache ehci_hcd ohci_hcd uhci_hcd
Pid: 3554, comm: testfallocate Not tainted 2.6.24-0.147.rc7.git2.fc9 #1
RIP: 0010:[88364497]  [88364497]
:ext4dev:ext4_ext_search_left+0x97/0xbc
RSP: 0018:81012dc65cf0  EFLAGS: 00010287
RAX: 8008 RBX: 81012dc65d78 RCX: 
RDX: 81012dc65d90 RSI: 81012e5f37e0 RDI: 81012e04c00c
RBP: 81012e04c180 R08: 0008 R09: 
R10:  R11: 81012dc65d98 R12: 0008
R13: 81012e04c180 R14: 0008 R15: 
FS:  2aab96f0() GS:81013fc01a28() knlGS:
CS:  0010 DS:  ES:  CR0: 8005003b
CR2: 00327d0414a0 CR3: 000130ce8000 CR4: 06a0
DR0:  DR1:  DR2: 
DR3:  DR6: 0ff0 DR7: 0400
Process testfallocate (pid: 3554, threadinfo 81012dc64000, task
81012dd5a000)
Stack:  88366073 8101349e71d0 81053c91 00022dcc30b8
 81012dc65e88 0008 81012e549000 81012dcc30b8
 81012dcc3090 81012e04c000 81012dcc30b8 0008
Call Trace:
 [88366073] :ext4dev:ext4_ext_get_blocks+0x5ba/0x8c1
 [81053c91] lock_release_holdtime+0x27/0x49
 [812748f6] _spin_unlock+0x17/0x20
 [883400a6] :jbd2:start_this_handle+0x4e0/0x4fe
 [88366564] :ext4dev:ext4_fallocate+0x175/0x39a
 [81053c91] lock_release_holdtime+0x27/0x49
 [81056480] __lock_acquire+0x4e7/0xc4d
 [81053c91] lock_release_holdtime+0x27/0x49
 [810a8de7] sys_fallocate+0xe4/0x10d
 [8100c043] tracesys+0xd5/0xda


Code: 0f 0b eb fe ff c8 89 02 0f b7 47 06 8b 4f 08 0f b7 57 04 48 
RIP  [88364497] :ext4dev:ext4_ext_search_left+0x97/0xbc
 RSP 81012dc65cf0
---[ end trace 9a60a6a6c694770a ]---
SysRq : Resetting


The BUG_ON is:

BUG_ON(*logical  le32_to_cpu(ex-ee_block) + le16_to_cpu(ex-ee_len));

where these were the values:

logical 8 ee_block 0 ee_len 32776

Haven't looked further into it yet.


-- 
Configure bugmail: http://bugzilla.kernel.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You reported the bug, or are watching the reporter.
---End Message---


[PATCH] Fix up uuidd man page

2008-01-09 Thread Eric Sandeen
uuidd(8) man page had a typo, and a couple of stale defaults
documented.

Signed-off-by: Eric Sandeen [EMAIL PROTECTED]

---

Index: e2fsprogs-1.40.4/misc/uuidd.8.in
===
--- e2fsprogs-1.40.4.orig/misc/uuidd.8.in
+++ e2fsprogs-1.40.4/misc/uuidd.8.in
@@ -64,11 +64,11 @@ UUID's.
 .TP
 .BI \-p   pidfile
 Specify the pathname where the pid file should be written.  By default,
-the pid file is written to /var/run/uuidd.pid.
+the pid file is written to /var/lib/libuuid/uuidd.pid.
 .TP
 .BI \-s  socketpath
 Specify the pathname used for the unix-domain socket used by uuidd.  By
-qdefault, the pathname used is /var/run/uuidd.sock.  This is primarily
+default, the pathname used is /var/lib/libuuid/request.  This is primarily
 for debugging purposes, since the pathname is hard-coded in the libuuid
 library.
 .TP

-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Fix up uuidd man page

2008-01-09 Thread Eric Sandeen
Eric Sandeen wrote:
 uuidd(8) man page had a typo, and a couple of stale defaults
 documented.
 
 Signed-off-by: Eric Sandeen [EMAIL PROTECTED]
 
 ---
 
 Index: e2fsprogs-1.40.4/misc/uuidd.8.in
 ===
 --- e2fsprogs-1.40.4.orig/misc/uuidd.8.in
 +++ e2fsprogs-1.40.4/misc/uuidd.8.in
 @@ -64,11 +64,11 @@ UUID's.
  .TP
  .BI \-p   pidfile
  Specify the pathname where the pid file should be written.  By default,
 -the pid file is written to /var/run/uuidd.pid.
 +the pid file is written to /var/lib/libuuid/uuidd.pid.

Actually, hang on.  Why is this in /var/lib, not /var/run?

I can see keeping state files in /var/lib so they're persistent across
boots, but the pidfile?

-Eric
-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] ext4: Use superblock s_raid_stripe_width as stripe size during block allocation.

2008-01-09 Thread Eric Sandeen
Aneesh Kumar K.V wrote:
 The stipe size used during block allocation is calculated as below.
 a) if we specify a stripe=value option using mount time. Use that value.
 b) if not use the value specified in super block.
 b) If the value specfied at mount time is greater than blocks per group use
 the value specified ini the super block.
 
 Signed-off-by: Aneesh Kumar K.V [EMAIL PROTECTED]
 ---
  fs/ext4/super.c |   10 ++
  1 files changed, 10 insertions(+), 0 deletions(-)
 
 diff --git a/fs/ext4/super.c b/fs/ext4/super.c
 index db1edc8..10330eb 100644
 --- a/fs/ext4/super.c
 +++ b/fs/ext4/super.c
 @@ -2136,6 +2136,16 @@ static int ext4_fill_super (struct super_block *sb, 
 void *data, int silent)
   sbi-s_rsv_window_head.rsv_alloc_hit = 0;
   sbi-s_rsv_window_head.rsv_goal_size = 0;
   ext4_rsv_window_add(sb, sbi-s_rsv_window_head);
 + /*
 +  * set the stripe size. If we have specified it via mount option, then
 +  * use the mount option value. If the value specified at mount time is
 +  * greater than the blocks per group use the super block value.
 +  * Allocator needs it be less than blocks per group.
 +  */
 + if (!sbi-s_stripe ||
 + sbi-s_stripe = sbi-s_blocks_per_group) {

Is stripe == s_blocks_per_group problematic?

Should we warn/printk in the case that the specified size is too big?

Thanks,

-Eric

 + sbi-s_stripe = le32_to_cpu(es-s_raid_stripe_width);
 + }
  
   /*
* set up enough so that it can read an inode

-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] UPDATED3: types fixup for mballoc

2008-01-08 Thread Eric Sandeen
4th time's the charm?

Note, the calculations Andreas  I were discussing only work properly
for stripe = blocks per group... I don't know if we'd need to enforce
that at mount time?

-

I ran into a potential overflow in ext4_mb_scan_aligned, 
and went looking for others in mballoc.  This patch hits a 
few spots, compile-tested only at this point, comments welcome.

This patch:

changes fe_len to an int, I don't think we need it to be a long,
looking at how it's used (should it be a grpblk_t?)  Also change
anything assigned to return value of mb_find_extent, since it returns
fe_len.

changes anything that does groupno * EXT4_BLOCKS_PER_GROUP
or pa-pa_pstart + whatever to an ext4_fsblk_t

avoids 64-bit divides  modulos, and...

fixes up any related formats

Signed-off-by: Eric Sandeen [EMAIL PROTECTED]

---

Index: linux-2.6.24-rc3/fs/ext4/mballoc.c
===
--- linux-2.6.24-rc3.orig/fs/ext4/mballoc.c
+++ linux-2.6.24-rc3/fs/ext4/mballoc.c
@@ -363,7 +363,7 @@ struct ext4_free_extent {
ext4_lblk_t fe_logical;
ext4_grpblk_t fe_start;
ext4_group_t fe_group;
-   unsigned long fe_len;
+   int fe_len;
 };
 
 /*
@@ -586,14 +586,14 @@ static void mb_free_blocks_double(struct
BUG_ON(!ext4_is_group_locked(sb, e4b-bd_group));
for (i = 0; i  count; i++) {
if (!mb_test_bit(first + i, e4b-bd_info-bb_bitmap)) {
-   unsigned long blocknr;
+   ext4_fsblk_t blocknr;
blocknr = e4b-bd_group * EXT4_BLOCKS_PER_GROUP(sb);
blocknr += first + i;
blocknr +=
le32_to_cpu(EXT4_SB(sb)-s_es-s_first_data_block);
 
ext4_error(sb, __FUNCTION__, double-free of inode
-   %lu's block %lu(bit %u in group %lu)\n,
+   %lu's block %llu(bit %u in group %lu)\n,
   inode ? inode-i_ino : 0, blocknr,
   first + i, e4b-bd_group);
}
@@ -1226,14 +1226,14 @@ static int mb_free_blocks(struct inode *
order = 0;
 
if (!mb_test_bit(block, EXT4_MB_BITMAP(e4b))) {
-   unsigned long blocknr;
+   ext4_fsblk_t blocknr;
blocknr = e4b-bd_group * EXT4_BLOCKS_PER_GROUP(sb);
blocknr += block;
blocknr +=
le32_to_cpu(EXT4_SB(sb)-s_es-s_first_data_block);
 
ext4_error(sb, __FUNCTION__, double-free of inode
-   %lu's block %lu(bit %u in group %lu)\n,
+   %lu's block %llu(bit %u in group %lu)\n,
   inode ? inode-i_ino : 0, blocknr, block,
   e4b-bd_group);
}
@@ -1416,7 +1416,7 @@ static void ext4_mb_use_best_found(struc
struct ext4_buddy *e4b)
 {
struct ext4_sb_info *sbi = EXT4_SB(ac-ac_sb);
-   unsigned long ret;
+   int ret;
 
BUG_ON(ac-ac_b_ex.fe_group != e4b-bd_group);
BUG_ON(ac-ac_status == AC_STATUS_FOUND);
@@ -1609,10 +1609,12 @@ static int ext4_mb_find_by_goal(struct e
 ac-ac_g_ex.fe_len, ex);
 
if (max = ac-ac_g_ex.fe_len  ac-ac_g_ex.fe_len == sbi-s_stripe) {
-   unsigned long start;
-   start = (e4b-bd_group * EXT4_BLOCKS_PER_GROUP(ac-ac_sb) +
-   ex.fe_start + le32_to_cpu(es-s_first_data_block));
-   if (start % sbi-s_stripe == 0) {
+   ext4_fsblk_t start;
+
+   start = (e4b-bd_group * EXT4_BLOCKS_PER_GROUP(ac-ac_sb)) +
+   ex.fe_start + le32_to_cpu(es-s_first_data_block);
+   /* use do_div to get remainder (would be 64-bit modulo) */
+   if (do_div(start, sbi-s_stripe) == 0) {
ac-ac_found++;
ac-ac_b_ex = ex;
ext4_mb_use_best_found(ac, e4b);
@@ -1724,6 +1726,7 @@ static void ext4_mb_complex_scan_group(s
 /*
  * This is a special case for storages like raid5
  * we try to find stripe-aligned chunks for stripe-size requests
+ * XXX should do so at least for multiples of stripe size as well
  */
 static void ext4_mb_scan_aligned(struct ext4_allocation_context *ac,
 struct ext4_buddy *e4b)
@@ -1732,17 +1735,19 @@ static void ext4_mb_scan_aligned(struct 
struct ext4_sb_info *sbi = EXT4_SB(sb);
void *bitmap = EXT4_MB_BITMAP(e4b);
struct ext4_free_extent ex;
-   unsigned long i;
-   unsigned long max;
+   ext4_fsblk_t first_group_block;
+   ext4_fsblk_t a;
+   ext4_grpblk_t i;
+   int max;
 
BUG_ON(sbi-s_stripe == 0

Re: [PATCH] UPDATED: types fixup for mballoc

2008-01-03 Thread Eric Sandeen
Andreas Dilger wrote:
 On Jan 02, 2008  14:01 -0600, Eric Sandeen wrote:
 This patch:

 changes fe_len to an int, I don't think we need it to be a long,
 looking at how it's used (should it be a grpblk_t?)  Also change
 anything assigned to return value of mb_find_extent, since it returns
 fe_len.

 changes anything that does groupno * EXT4_BLOCKS_PER_GROUP
 or pa-pa_pstart + whatever to an ext4_fsblk_t

 fixes up any related formats

 The change to ext4_mb_scan_aligned to avoid a 64-bit modulo
 could use an extra look I think.

 @@ -1732,17 +1735,18 @@ static void ext4_mb_scan_aligned(struct 
  BUG_ON(sbi-s_stripe == 0);
  
 -/* find first stripe-aligned block */
 -i = e4b-bd_group * EXT4_BLOCKS_PER_GROUP(sb)
 -+ le32_to_cpu(sbi-s_es-s_first_data_block);
 -i = ((i + sbi-s_stripe - 1) / sbi-s_stripe) * sbi-s_stripe;
 -i = (i - le32_to_cpu(sbi-s_es-s_first_data_block))
 -% EXT4_BLOCKS_PER_GROUP(sb);
 
 +/* find first stripe-aligned block in group */
 +a = e4b-bd_group * EXT4_BLOCKS_PER_GROUP(sb)
 ++ le32_to_cpu(sbi-s_es-s_first_data_block);
 +a = (a + sbi-s_stripe - 1)  ~((ext4_fsblk_t)sbi-s_stripe - 1);
 +a = a - le32_to_cpu(sbi-s_es-s_first_data_block);
 +i = do_div(a, EXT4_BLOCKS_PER_GROUP(sb));
 
 I don't think this is correct...  This code should only be used if s_stripe
 is NOT a power-of-two value, otherwise we can just use the buddy maps to
 find aligned chunks.  As a result I don't think that  (s_stripe - 1)
 change is equivalent to / s_stripe * s_stripe.

Hmmm ok let me re-check that then.

(...curses 64-bit math trickiness...)

  while (i  sb-s_blocksize * 8) {
  if (!mb_test_bit(i, bitmap)) {
 
 Hrmmm, I thought this should be while (i  EXT4_BLOCKS_PER_GROUP(sb))
 and not sb-s_blocksize * 8?  Did that fix get lost somewhere?

I did wonder about that wondered about the magic 8 number but forgot
to comment.

Thanks,
-Eric
-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] UPDATED2: types fixup for mballoc

2008-01-03 Thread Eric Sandeen
Andreas Dilger wrote:
 On Jan 03, 2008  14:10 -0600, Eric Sandeen wrote:
 @@ -1732,17 +1735,19 @@ static void ext4_mb_scan_aligned(struct 
 +/* find first stripe-aligned block in group */
 +a = e4b-bd_group * EXT4_BLOCKS_PER_GROUP(sb)
  + le32_to_cpu(sbi-s_es-s_first_data_block);
 +a += sbi-s_stripe - 1;
 
 Why not just have + sbi-s_stipe - 1 on the previous line, instead of +=?

hmm good point.

 Also minor coding style nit: the + is normally at the end of the previous
 line instead of at the start of the next line, so:
 
   /* find first stripe-aligned block in group */
   a = e4b-bd_group * EXT4_BLOCKS_PER_GROUP(sb) +
   le32_to_cpu(sbi-s_es-s_first_data_block) + sbi-s_stripe - 1;

Ok, that was inherited :)

 +a = (a * sbi-s_stripe) - le32_to_cpu(sbi-s_es-s_first_data_block);
 +i = do_div(a, EXT4_BLOCKS_PER_GROUP(sb));
 
 I guess this doesn't stricly need to be a modulus either, because we know
 which group this will be in and can just subtract the start.  You can just do:
  
   /* find first stripe-aligned block in this group */
   group_start = e4b-bd_group * EXT4_BLOCKS_PER_GROUP(sb) +;
   le32_to_cpu(sbi-s_es-s_first_data_block);
 
   a = group_start + sbi-s_stripe - 1;
   do_div(a, sbi-s_stripe);
   a = (a * sbi-s_stripe) - group_start;

Ok, I knew I had stared at this for too long :)

4th try coming soon.

-Eric
-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] UPDATED: types fixup for mballoc

2008-01-02 Thread Eric Sandeen
Eric Sandeen wrote:
 Eric Sandeen wrote:
   
 I ran into a potential overflow in ext4_mb_scan_aligned, 
 and went looking for others in mballoc.  This patch hits a 
 few spots, compile-tested only at this point, comments welcome.

 This patch:

 

 ... introduces a 64-bit divide.  Oops... will fix that up  resend.

   
Took a while to get back to this :)  But just used do_div return value
for remainder instead of modulo (which would have been a 64-bit modulo)

There's another do_div trick, and a bitwise round-up in there too,
now that there are more 64-bit containers...  Anyway, patch follows.

-

I ran into a potential overflow in ext4_mb_scan_aligned, 
and went looking for others in mballoc.  This patch hits a 
few spots, compile-tested only at this point, comments welcome.

This patch:

changes fe_len to an int, I don't think we need it to be a long,
looking at how it's used (should it be a grpblk_t?)  Also change
anything assigned to return value of mb_find_extent, since it returns
fe_len.

changes anything that does groupno * EXT4_BLOCKS_PER_GROUP
or pa-pa_pstart + whatever to an ext4_fsblk_t

fixes up any related formats

The change to ext4_mb_scan_aligned to avoid a 64-bit modulo
could use an extra look I think.

Signed-off-by: Eric Sandeen [EMAIL PROTECTED]

---


Index: linux-2.6.24-rc3/fs/ext4/mballoc.c
===
--- linux-2.6.24-rc3.orig/fs/ext4/mballoc.c
+++ linux-2.6.24-rc3/fs/ext4/mballoc.c
@@ -363,7 +363,7 @@ struct ext4_free_extent {
ext4_lblk_t fe_logical;
ext4_grpblk_t fe_start;
ext4_group_t fe_group;
-   unsigned long fe_len;
+   int fe_len;
 };
 
 /*
@@ -586,14 +586,14 @@ static void mb_free_blocks_double(struct
BUG_ON(!ext4_is_group_locked(sb, e4b-bd_group));
for (i = 0; i  count; i++) {
if (!mb_test_bit(first + i, e4b-bd_info-bb_bitmap)) {
-   unsigned long blocknr;
+   ext4_fsblk_t blocknr;
blocknr = e4b-bd_group * EXT4_BLOCKS_PER_GROUP(sb);
blocknr += first + i;
blocknr +=
le32_to_cpu(EXT4_SB(sb)-s_es-s_first_data_block);
 
ext4_error(sb, __FUNCTION__, double-free of inode
-   %lu's block %lu(bit %u in group %lu)\n,
+   %lu's block %llu(bit %u in group %lu)\n,
   inode ? inode-i_ino : 0, blocknr,
   first + i, e4b-bd_group);
}
@@ -1226,14 +1226,14 @@ static int mb_free_blocks(struct inode *
order = 0;
 
if (!mb_test_bit(block, EXT4_MB_BITMAP(e4b))) {
-   unsigned long blocknr;
+   ext4_fsblk_t blocknr;
blocknr = e4b-bd_group * EXT4_BLOCKS_PER_GROUP(sb);
blocknr += block;
blocknr +=
le32_to_cpu(EXT4_SB(sb)-s_es-s_first_data_block);
 
ext4_error(sb, __FUNCTION__, double-free of inode
-   %lu's block %lu(bit %u in group %lu)\n,
+   %lu's block %llu(bit %u in group %lu)\n,
   inode ? inode-i_ino : 0, blocknr, block,
   e4b-bd_group);
}
@@ -1416,7 +1416,7 @@ static void ext4_mb_use_best_found(struc
struct ext4_buddy *e4b)
 {
struct ext4_sb_info *sbi = EXT4_SB(ac-ac_sb);
-   unsigned long ret;
+   int ret;
 
BUG_ON(ac-ac_b_ex.fe_group != e4b-bd_group);
BUG_ON(ac-ac_status == AC_STATUS_FOUND);
@@ -1609,10 +1609,12 @@ static int ext4_mb_find_by_goal(struct e
 ac-ac_g_ex.fe_len, ex);
 
if (max = ac-ac_g_ex.fe_len  ac-ac_g_ex.fe_len == sbi-s_stripe) {
-   unsigned long start;
-   start = (e4b-bd_group * EXT4_BLOCKS_PER_GROUP(ac-ac_sb) +
-   ex.fe_start + le32_to_cpu(es-s_first_data_block));
-   if (start % sbi-s_stripe == 0) {
+   ext4_fsblk_t start;
+
+   start = (e4b-bd_group * EXT4_BLOCKS_PER_GROUP(ac-ac_sb)) +
+   ex.fe_start + le32_to_cpu(es-s_first_data_block);
+   /* use do_div to get remainder (would be 64-bit modulo) */
+   if (do_div(start, sbi-s_stripe) == 0) {
ac-ac_found++;
ac-ac_b_ex = ex;
ext4_mb_use_best_found(ac, e4b);
@@ -1724,6 +1726,7 @@ static void ext4_mb_complex_scan_group(s
 /*
  * This is a special case for storages like raid5
  * we try to find stripe-aligned chunks for stripe-size requests
+ * XXX should do so at least for multiples of stripe size as well
  */
 static void ext4_mb_scan_aligned

Re: [2.6.24 patch] let EXT4DEV_FS depend on BROKEN

2008-01-02 Thread Eric Sandeen
Adrian Bunk wrote:
 Most people and all distributions use CONFIG_EXPERIMENTAL=y simply 
 because too many options (including options required for hardware 
 support) depend on it.
 
 Compare e.g.:
 - Marvell SATA support (HIGHLY EXPERIMENTAL)
 - Provide NFSv4 client support (EXPERIMENTAL)
 - Ext4dev/ext4 extended fs support development (EXPERIMENTAL)

   tristate Snapshot target (EXPERIMENTAL)
   depends on BLK_DEV_DM  EXPERIMENTAL

   tristate Mirror target (EXPERIMENTAL)
   depends on BLK_DEV_DM  EXPERIMENTAL

...

It does seem that it might be a good goal to revisit options marked
EXPERIMENTAL, and see if they still should be marked as such, rather
than removing the option altogether.

init/Kconfig describes things in EXPERIMENTAL as alpha-test - I bet
there are a few things which have moved beyond this, but are still
marked as such.

-Eric


-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] export iov_shorten for ext4's use

2007-12-17 Thread Eric Sandeen
(And, take 2... follow the coding style on export declarations...)

ext4 needs to deal with 2 different max file offsets for block- and 
extent-allocated file formats, whereas the s_maxbytes scheme can only deal 
with one.  So, for block-allocated files, we must catch and fix up
too-large offsets from within the filesystem.

Having iov_shorten exported allows such things as:

if (pos + length  sbi-s_bitmap_maxbytes) {
nr_segs = iov_shorten((struct iovec *)iov, nr_segs,
  sbi-s_bitmap_maxbytes - pos);
}

to fix up too-large writes to these files in ext4_file_write().

This patch is currently living in the ext4 patch queue.

Signed-off-by: Eric Sandeen [EMAIL PROTECTED]

---


Index: linux-2.6.24-rc3/fs/read_write.c
===
--- linux-2.6.24-rc3.orig/fs/read_write.c
+++ linux-2.6.24-rc3/fs/read_write.c
@@ -450,6 +450,7 @@ unsigned long iov_shorten(struct iovec *
}
return seg;
 }
+EXPORT_SYMBOL(iov_shorten)
 
 ssize_t do_sync_readv_writev(struct file *filp, const struct iovec *iov,
unsigned long nr_segs, size_t len, loff_t *ppos, iov_fn_t fn)

-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] ext3: issue warning when bad inode found via ext3_lookup

2007-12-14 Thread Eric Sandeen
I have a hand-crafted bad filesystem image which has corruption:

[EMAIL PROTECTED] ~]# ls mnt/dir
file1  file2  file3  file4  file5
[EMAIL PROTECTED] ~]# ls mnt/dir/file4 
ls: cannot access mnt/dir/file4: No such file or directory
[EMAIL PROTECTED] ~]# ls -l mnt/dir
ls: cannot access mnt/dir/file4: No such file or directory
total 8
drwxr-xr-x 2 root root 1024 2007-09-04 13:36 file1
drwxr-xr-x 2 root root 1024 2007-09-04 13:36 file2
drwxr-xr-x 2 root root 1024 2007-09-04 13:36 file3
d? ? ??   ?? file4
drwxr-xr-x 2 root root 1024 2007-09-04 13:36 file5

e2fsck also knows it's corrupted:
Pass 2: Checking directory structure
Entry 'file4' in /dir (2049) has deleted/unused inode 13.  Clear? no

Entry 'file4' in /dir (2049) has an incorrect filetype (was 2, should be 1).
Fix? no

Pass 3: Checking directory connectivity
Unconnected directory inode 2053 (/dir/???)

BUT there are no kernel messages logged anywhere because ext3_read_inode
silently makes a bad_inode in this case, so that stale NFS filehandles
aren't noisy.  However, when we encounter such a problem after a by-name
lookup, I think a warning is appropriate, as it indicates filesystem
corruption.

Signed-off-by: Eric Sandeen [EMAIL PROTECTED]
---

Index: linux-2.6.24-rc3/fs/ext3/namei.c
===
--- linux-2.6.24-rc3.orig/fs/ext3/namei.c
+++ linux-2.6.24-rc3/fs/ext3/namei.c
@@ -1049,6 +1049,9 @@ static struct dentry *ext3_lookup(struct
return ERR_PTR(-EACCES);
 
if (is_bad_inode(inode)) {
+   ext3_warning(inode-i_sb, __FUNCTION__,
+bad inode %lu in dir #%lu,
+ inode-i_ino, dir-i_ino);
iput(inode);
return ERR_PTR(-ENOENT);

-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] ext4: issue warning when bad inode found via ext4_lookup

2007-12-14 Thread Eric Sandeen
I have a hand-crafted bad filesystem image which has corruption:

[EMAIL PROTECTED] ~]# ls mnt/dir
file1  file2  file3  file4  file5
[EMAIL PROTECTED] ~]# ls mnt/dir/file4 
ls: cannot access mnt/dir/file4: No such file or directory
[EMAIL PROTECTED] ~]# ls -l mnt/dir
ls: cannot access mnt/dir/file4: No such file or directory
total 8
drwxr-xr-x 2 root root 1024 2007-09-04 13:36 file1
drwxr-xr-x 2 root root 1024 2007-09-04 13:36 file2
drwxr-xr-x 2 root root 1024 2007-09-04 13:36 file3
d? ? ??   ?? file4
drwxr-xr-x 2 root root 1024 2007-09-04 13:36 file5

e2fsck also knows it's corrupted:
Pass 2: Checking directory structure
Entry 'file4' in /dir (2049) has deleted/unused inode 13.  Clear? no

Entry 'file4' in /dir (2049) has an incorrect filetype (was 2, should be 1).
Fix? no

Pass 3: Checking directory connectivity
Unconnected directory inode 2053 (/dir/???)

BUT there are no kernel messages logged anywhere because ext4_read_inode
silently makes a bad_inode in this case, so that stale NFS filehandles
aren't noisy.  However, when we encounter such a problem after a by-name
lookup, I think a warning is appropriate, as it indicates filesystem
corruption.

Signed-off-by: Eric Sandeen [EMAIL PROTECTED]
---

Index: linux-2.6.24-rc3/fs/ext4/namei.c
===
--- linux-2.6.24-rc3.orig/fs/ext4/namei.c
+++ linux-2.6.24-rc3/fs/ext4/namei.c
@@ -1050,6 +1050,9 @@ static struct dentry *ext4_lookup(struct
return ERR_PTR(-EACCES);
 
if (is_bad_inode(inode)) {
+   ext4_warning(inode-i_sb, __FUNCTION__,
+bad inode %lu in dir #%lu,
+inode-i_ino, dir-i_ino);
iput(inode);
return ERR_PTR(-ENOENT);
}

-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] export iov_shorten for ext4's use

2007-12-14 Thread Eric Sandeen
ext4 needs to deal with 2 different max file offsets for block- and 
extent-allocated file formats, whereas the s_maxbytes scheme can only deal 
with one.  So, for block-allocated files, we must catch and fix up
too-large offsets from within the filesystem.

Having iov_shorten exported allows such things as:

if (pos + length  sbi-s_bitmap_maxbytes) {
nr_segs = iov_shorten((struct iovec *)iov, nr_segs,
  sbi-s_bitmap_maxbytes - pos);
}

to fix up too-large writes to these files in ext4_file_write().

This patch is currently living in the ext4 patch queue.

Signed-off-by: Eric Sandeen [EMAIL PROTECTED]

---

Index: linux-2.6.24-rc3/fs/read_write.c
===
--- linux-2.6.24-rc3.orig/fs/read_write.c
+++ linux-2.6.24-rc3/fs/read_write.c
@@ -451,6 +451,8 @@ unsigned long iov_shorten(struct iovec *
return seg;
 }
 
+EXPORT_SYMBOL(iov_shorten);
+
 ssize_t do_sync_readv_writev(struct file *filp, const struct iovec *iov,
unsigned long nr_segs, size_t len, loff_t *ppos, iov_fn_t fn)
 {



-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH, RFC] - issue warning when bad inode found via ext3_lookup

2007-12-13 Thread Eric Sandeen
I have a hand-crafted bad filesystem image which has a corrupted directory
entry:

[EMAIL PROTECTED] ~]# ls mnt/Picture
LINKS_20  OBEN_20  VORNE_20  VORN_LINKS_20  VORN_RECHTS_20
[EMAIL PROTECTED] ~]# ls mnt/Picture/VORN_LINKS_20 
ls: cannot access mnt/Picture/VORN_LINKS_20: No such file or directory
[EMAIL PROTECTED] ~]# stat mnt/Picture/VORN_LINKS_20 
stat: cannot stat `mnt/Picture/VORN_LINKS_20': No such file or directory
[EMAIL PROTECTED] ~]# ls -l mnt/Picture
ls: cannot access mnt/Picture/VORN_LINKS_20: No such file or directory
total 8
drwxr-xr-x 2 root root 1024 2007-09-04 13:36 LINKS_20
drwxr-xr-x 2 root root 1024 2007-09-04 13:36 OBEN_20
drwxr-xr-x 2 root root 1024 2007-09-04 13:36 VORNE_20
d? ? ??   ?? VORN_LINKS_20
drwxr-xr-x 2 root root 1024 2007-09-04 13:36 VORN_RECHTS_20

e2fsck also knows it's corrupted:
Pass 2: Checking directory structure
Entry 'VORN_LINKS_20' in /Picture (2049) has deleted/unused inode 13.  Clear? no

Entry 'VORN_LINKS_20' in /Picture (2049) has an incorrect filetype (was 2, 
should be 1).
Fix? no

Pass 3: Checking directory connectivity
Unconnected directory inode 2053 (/Picture/???)

BUT there are no kernel messages anywhere.

I think the below makes sense; I don't think there are any instances where an
inode found via a filename passed to ext3_lookup should be returning a bad inode
without warning; NFS may pass in a stale filehandle which also makes a bad 
inode, 
but that shouldn't go through lookup...

Comments?  (if it looks good I'll resubmit for ext*_lookup)

Signed-off-by: Eric Sandeen [EMAIL PROTECTED]

---

Index: linux-2.6.24-rc3/fs/ext3/namei.c
===
--- linux-2.6.24-rc3.orig/fs/ext3/namei.c
+++ linux-2.6.24-rc3/fs/ext3/namei.c
@@ -1049,6 +1049,10 @@ static struct dentry *ext3_lookup(struct
return ERR_PTR(-EACCES);
 
if (is_bad_inode(inode)) {
+   ext3_warning(inode-i_sb, __FUNCTION__,
+  bad inode %lu for file %s in dir #%lu,
+  inode-i_ino, dentry-d_name.name,
+  dir-i_ino);
iput(inode);
return ERR_PTR(-ENOENT);
}

-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] ext4: different maxbytes functions for bitmap extent files

2007-12-05 Thread Eric Sandeen
Andreas Dilger wrote:
 On Dec 04, 2007  09:58 -0600, Eric Sandeen wrote:
 +static loff_t ext4_max_size(int blkbits)
 +{
 +loff_t res;
 +loff_t upper_limit = MAX_LFS_FILESIZE;
 +
 +/* small i_blocks in vfs inode? */
 +if (sizeof(blkcnt_t)  sizeof(u64)) {
 
 It would probably be more future-proof by having a struct inode here and
 using if (sizeof(inode-i_blocks)  sizeof(__u64)).

Ok.  Honestly these were both just lifted from the original max_size
function for bitmap files :)

-Eric

 +upper_limit = (1LL  32) - 1;
 +
 +/* total blocks in file system block size */
 +upper_limit = (blkbits - 9);
 +upper_limit = blkbits;
 
 There is probably a cleaner way to do this, maybe:

   upper_limit = (1LL  (32 + 9) - 1)  ~((1  blkbits) - 1);
 
 Cheers, Andreas
 --
 Andreas Dilger
 Sr. Staff Engineer, Lustre Group
 Sun Microsystems of Canada, Inc.
 

-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/3] ext4: different maxbytes functions for bitmap extent files

2007-12-04 Thread Eric Sandeen
use 2 different maxbytes functions for bitmapped  extent-based
files.

Signed-off-by: Eric Sandeen [EMAIL PROTECTED]

---

Index: linux-2.6.24-rc3/fs/ext4/super.c
===
--- linux-2.6.24-rc3.orig/fs/ext4/super.c
+++ linux-2.6.24-rc3/fs/ext4/super.c
@@ -1656,17 +1656,57 @@ static void ext4_orphan_cleanup (struct 
 }
 
 /*
- * Maximal file size.  There is a direct, and {,double-,triple-}indirect
+ * Maximal extent format file size.
+ * Resulting logical blkno at s_maxbytes must fit in our on-disk
+ * extent format containers, within a sector_t, and within i_blocks
+ * in the vfs.  ext4 inode has 48 bits of i_block in fsblock units,
+ * so that won't be a limiting factor.
+ *
+ * Note, this does *not* consider any metadata overhead for vfs i_blocks.
+ */
+static loff_t ext4_max_size(int blkbits)
+{
+   loff_t res;
+   loff_t upper_limit = MAX_LFS_FILESIZE;
+
+   /* small i_blocks in vfs inode? */
+   if (sizeof(blkcnt_t)  sizeof(u64)) {
+   /*
+* CONFIG_LSF is not enabled implies the inode
+* i_block represent total blocks in 512 bytes
+* 32 == size of vfs inode i_blocks * 8
+*/
+   upper_limit = (1LL  32) - 1;
+
+   /* total blocks in file system block size */
+   upper_limit = (blkbits - 9);
+   upper_limit = blkbits;
+   }
+
+   /* 32-bit extent-start container, ee_block */
+   res = 1LL  32;
+   res = blkbits;
+   res -= 1;
+
+   /* Sanity check against vm-  vfs- imposed limits */
+   if (res  upper_limit)
+   res = upper_limit;
+
+   return res;
+}
+
+/*
+ * Maximal bitmap file size.  There is a direct, and {,double-,triple-}indirect
  * block limit, and also a limit of (248 - 1) 512-byte sectors in i_blocks.
  * We need to be 1 filesystem block less than the 248 sector limit.
  */
-static loff_t ext4_max_size(int bits)
+static loff_t ext4_max_bitmap_size(int bits)
 {
loff_t res = EXT4_NDIR_BLOCKS;
int meta_blocks;
loff_t upper_limit;
/* This is calculated to be the largest file size for a
-* dense, file such that the total number of
+* dense, bitmapped file such that the total number of
 * sectors in the file, including data and all indirect blocks,
 * does not exceed 248 -1
 * __u32 i_blocks_lo and _u16 i_blocks_high representing the

-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/3] ext4: Handle different max offsets for bitmap extent-based files

2007-12-04 Thread Eric Sandeen
Basic approach: have both ext4_max_bitmap_size() and ext4_max_size()
functions to compute max offsets for both types of formats.

Use vfs sb-s_maxbytes for the native maxbytes, i.e. extent-format files.

Put the smaller bitmap limit in a new sbi-s_bitmap_maxbytes in the ext4
superblock info structure.

Catch bitmap files in ext4_file_write() and ext4_setattr() to limit
extending writes, llseeks, and truncates to too-large offsets which the
VFS let through due to the extent-format maxbytes.  On write, allow
writes up to the max, but then stop, by using iov_shorten() to limit the
size of the write to the maximum.

3 patches follow:

ext4_two_maxbytes_functions.patch - differentiate the maxbytes f'ns
ext4_bitmap_maxbytes_vfs.patch - export iov_shorten from kernel
ext4_bitmap_maxbytes.patch - store, and limit to, bitmap_maxbytes

Comments?

Thanks,
-Eric
-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3] ext4: (V2) store maxbytes for bitmapped files and return EFBIG as appropriate

2007-12-04 Thread Eric Sandeen
Crud, minor off-by-one in ext4_file_write.  V2 below:

===

Calculate  store the max offset for bitmapped files, and
catch too-large seeks, truncates, and writes in ext4, shortening
or rejecting as appropriate.

Signed-off-by: Eric Sandeen [EMAIL PROTECTED]

---

Index: linux-2.6.24-rc3/fs/ext4/super.c
===
--- linux-2.6.24-rc3.orig/fs/ext4/super.c
+++ linux-2.6.24-rc3/fs/ext4/super.c
@@ -1979,6 +1979,7 @@ static int ext4_fill_super (struct super
}
}
 
+   sbi-s_bitmap_maxbytes = ext4_max_bitmap_size(sb-s_blocksize_bits);
sb-s_maxbytes = ext4_max_size(sb-s_blocksize_bits);
 
if (le32_to_cpu(es-s_rev_level) == EXT4_GOOD_OLD_REV) {
Index: linux-2.6.24-rc3/fs/ext4/file.c
===
--- linux-2.6.24-rc3.orig/fs/ext4/file.c
+++ linux-2.6.24-rc3/fs/ext4/file.c
@@ -56,8 +56,29 @@ ext4_file_write(struct kiocb *iocb, cons
ssize_t ret;
int err;
 
-   ret = generic_file_aio_write(iocb, iov, nr_segs, pos);
+   /*
+* If we have encountered a bitmap-format file, the size limit
+* is smaller than s_maxbytes, which is for extent-mapped files.
+*/
 
+   if (!(EXT4_I(inode)-i_flags  EXT4_EXTENTS_FL)) {
+   struct ext4_sb_info *sbi = EXT4_SB(inode-i_sb);
+   size_t length = iov_length(iov, nr_segs);
+
+   if (pos = sbi-s_bitmap_maxbytes) {
+   if (length || pos  sbi-s_bitmap_maxbytes) {
+   return -EFBIG;
+   }
+   /* zero-length writes at maxbytes are OK */
+   }
+
+   if (pos + length  sbi-s_bitmap_maxbytes) {
+   nr_segs = iov_shorten((struct iovec *)iov, nr_segs,
+ sbi-s_bitmap_maxbytes - pos);
+   }
+   }
+
+   ret = generic_file_aio_write(iocb, iov, nr_segs, pos);
/*
 * Skip flushing if there was an error, or if nothing was written.
 */
Index: linux-2.6.24-rc3/fs/ext4/inode.c
===
--- linux-2.6.24-rc3.orig/fs/ext4/inode.c
+++ linux-2.6.24-rc3/fs/ext4/inode.c
@@ -309,7 +309,9 @@ static int ext4_block_to_path(struct ino
offsets[n++] = i_block  (ptrs - 1);
final = ptrs;
} else {
-   ext4_warning(inode-i_sb, ext4_block_to_path, block  big);
+   ext4_warning(inode-i_sb, ext4_block_to_path, block %u  
max,
+   i_block + direct_blocks +
+   indirect_blocks + double_blocks);
}
if (boundary)
*boundary = final - 1 - (i_block  (ptrs - 1));
@@ -3212,6 +3214,17 @@ int ext4_setattr(struct dentry *dentry, 
ext4_journal_stop(handle);
}
 
+   if (attr-ia_valid  ATTR_SIZE) {
+   if (!(EXT4_I(inode)-i_flags  EXT4_EXTENTS_FL)) {
+   struct ext4_sb_info *sbi = EXT4_SB(inode-i_sb);
+
+   if (attr-ia_size  sbi-s_bitmap_maxbytes) {
+   error = -EFBIG;
+   goto err_out;
+   }
+   }
+   }
+
if (S_ISREG(inode-i_mode) 
attr-ia_valid  ATTR_SIZE  attr-ia_size  inode-i_size) {
handle_t *handle;
Index: linux-2.6.24-rc3/include/linux/ext4_fs_sb.h
===
--- linux-2.6.24-rc3.orig/include/linux/ext4_fs_sb.h
+++ linux-2.6.24-rc3/include/linux/ext4_fs_sb.h
@@ -38,6 +38,7 @@ struct ext4_sb_info {
ext4_group_t s_groups_count;/* Number of groups in the fs */
unsigned long s_overhead_last;  /* Last calculated overhead */
unsigned long s_blocks_last;/* Last seen block count */
+   loff_t s_bitmap_maxbytes;   /* max bytes for bitmap files */
struct buffer_head * s_sbh; /* Buffer containing the super block */
struct ext4_super_block * s_es; /* Pointer to the super block in the 
buffer */
struct buffer_head ** s_group_desc;

-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] ext4: Check for the correct error return from ext4_ext_get_blocks

2007-12-03 Thread Eric Sandeen
Aneesh Kumar K.V wrote:
 ext4_ext_get_blocks returns negative values on error. We should
 check for  = 0
 
 Signed-off-by: Aneesh Kumar K.V [EMAIL PROTECTED]
 ---
  fs/ext4/extents.c |4 ++--
  1 files changed, 2 insertions(+), 2 deletions(-)
 
 diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
 index a2475d4..ce57245 100644
 --- a/fs/ext4/extents.c
 +++ b/fs/ext4/extents.c
 @@ -2558,8 +2558,8 @@ retry:
   ret = ext4_ext_get_blocks(handle, inode, block,
 max_blocks, map_bh,
 EXT4_CREATE_UNINITIALIZED_EXT, 0);
 - WARN_ON(!ret);
 - if (!ret) {
 + WARN_ON(ret = 0);
 + if (ret = 0) {
   ext4_error(inode-i_sb, ext4_fallocate,
  ext4_ext_get_blocks returned 0! inode#%lu
  , block=%llu, max_blocks=%llu,

Need to change the returned 0! part of the error to returned %d!
then too, I guess?

-Eric
-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/4] ext4: accumulate jbd2 stats in jiffies

2007-12-03 Thread Eric Sandeen
Accumulate jbd2 stats in jiffies not msecs, per akpm's suggestion.
Convert to msecs on when displayed.

jbd2_time_diff() should still be moved to a common header file.
 
Signed-off-by: Eric Sandeen [EMAIL PROTECTED]

---

Index: linux-2.6.24-rc3/fs/jbd2/checkpoint.c
===
--- linux-2.6.24-rc3.orig/fs/jbd2/checkpoint.c
+++ linux-2.6.24-rc3/fs/jbd2/checkpoint.c
@@ -326,7 +326,7 @@ int jbd2_log_do_checkpoint(journal_t *jo
goto out;
transaction = journal-j_checkpoint_transactions;
if (transaction-t_chp_stats.cs_chp_time == 0)
-   transaction-t_chp_stats.cs_chp_time = CURRENT_MSECS;
+   transaction-t_chp_stats.cs_chp_time = jiffies;
this_tid = transaction-t_tid;
 restart:
/*
Index: linux-2.6.24-rc3/fs/jbd2/commit.c
===
--- linux-2.6.24-rc3.orig/fs/jbd2/commit.c
+++ linux-2.6.24-rc3/fs/jbd2/commit.c
@@ -340,7 +340,7 @@ void jbd2_journal_commit_transaction(jou
commit_transaction-t_state = T_LOCKED;
 
stats.ts_wait = commit_transaction-t_max_wait;
-   stats.ts_locked = CURRENT_MSECS;
+   stats.ts_locked = jiffies;
stats.ts_running = jbd2_time_diff(commit_transaction-t_start,
stats.ts_locked);
 
@@ -414,7 +414,7 @@ void jbd2_journal_commit_transaction(jou
 */
jbd2_journal_switch_revoke_table(journal);
 
-   stats.ts_flushing = CURRENT_MSECS;
+   stats.ts_flushing = jiffies;
stats.ts_locked = jbd2_time_diff(stats.ts_locked, stats.ts_flushing);
 
commit_transaction-t_state = T_FLUSH;
@@ -508,7 +508,7 @@ void jbd2_journal_commit_transaction(jou
 */
commit_transaction-t_state = T_COMMIT;
 
-   stats.ts_logging = CURRENT_MSECS;
+   stats.ts_logging = jiffies;
stats.ts_flushing = jbd2_time_diff(stats.ts_flushing, stats.ts_logging);
stats.ts_blocks = commit_transaction-t_outstanding_credits;
stats.ts_blocks_logged = 0;
@@ -907,7 +907,7 @@ restart_loop:
 
J_ASSERT(commit_transaction-t_state == T_COMMIT);
 
-   commit_transaction-t_start = CURRENT_MSECS;
+   commit_transaction-t_start = jiffies;
stats.ts_logging = jbd2_time_diff(stats.ts_logging,
commit_transaction-t_start);
 
Index: linux-2.6.24-rc3/fs/jbd2/transaction.c
===
--- linux-2.6.24-rc3.orig/fs/jbd2/transaction.c
+++ linux-2.6.24-rc3/fs/jbd2/transaction.c
@@ -60,7 +60,7 @@ jbd2_get_transaction(journal_t *journal,
J_ASSERT(journal-j_running_transaction == NULL);
journal-j_running_transaction = transaction;
transaction-t_max_wait = 0;
-   transaction-t_start = CURRENT_MSECS;
+   transaction-t_start = jiffies;
 
return transaction;
 }
@@ -87,7 +87,7 @@ static int start_this_handle(journal_t *
int nblocks = handle-h_buffer_credits;
transaction_t *new_transaction = NULL;
int ret = 0;
-   unsigned long ts = CURRENT_MSECS;
+   unsigned long ts = jiffies;
 
if (nblocks  journal-j_max_transaction_buffers) {
printk(KERN_ERR JBD: %s wants too many credits (%d  %d)\n,
Index: linux-2.6.24-rc3/include/linux/jbd2.h
===
--- linux-2.6.24-rc3.orig/include/linux/jbd2.h
+++ linux-2.6.24-rc3/include/linux/jbd2.h
@@ -626,16 +626,13 @@ struct transaction_stats_s
 #define ts_written u.chp.cs_written
 #define ts_dropped u.chp.cs_dropped
 
-#define CURRENT_MSECS  (jiffies_to_msecs(jiffies))
-
-static inline unsigned int
-jbd2_time_diff(unsigned int start, unsigned int end)
+static inline unsigned long
+jbd2_time_diff(unsigned long start, unsigned long end)
 {
-   if (unlikely(start  end))
-   end = end + (~0UL - start);
-   else
-   end -= start;
-   return end;
+   if (end = start)
+   return end - start;
+
+   return end + (MAX_JIFFY_OFFSET - start);
 }
 
 /**
Index: linux-2.6.24-rc3/fs/jbd2/journal.c
===
--- linux-2.6.24-rc3.orig/fs/jbd2/journal.c
+++ linux-2.6.24-rc3/fs/jbd2/journal.c
@@ -712,15 +712,19 @@ static int jbd2_seq_history_show(struct 
return 0;
}
if (ts-ts_type == JBD2_STATS_RUN)
-   seq_printf(seq, %-4s %-5lu %-5lu %-5lu %-5lu %-5lu %-5lu 
+   seq_printf(seq, %-4s %-5lu %-5u %-5u %-5u %-5u %-5u 
%-6lu %-5lu %-5lu\n, R, ts-ts_tid,
-   ts-ts_wait, ts-ts_running, ts-ts_locked,
-   ts-ts_flushing, ts-ts_logging,
+   jiffies_to_msecs(ts-ts_wait),
+   jiffies_to_msecs(ts-ts_running

[PATCH 2/4] ext4: Address various akpm jbd2 stats comments

2007-12-03 Thread Eric Sandeen
(Andrew, cc'ing you since these address your original review comments; feel free
to just pick the patch up via Ted/Mingming if you prefer)

Address several of akpm's comments on the jbd2 stats patch:

o return -ENOMEM not -EIO on memory failure
o avoid unneeded casts of void pointers
o minor formatting changes
o size bdevname char arrays with BDEVNAME_SIZE
o use #ifdef vs. #if defined() for single test

Signed-off-by: Eric Sandeen [EMAIL PROTECTED]

---

Index: linux-2.6.24-rc3/fs/jbd2/journal.c
===
--- linux-2.6.24-rc3.orig/fs/jbd2/journal.c
+++ linux-2.6.24-rc3/fs/jbd2/journal.c
@@ -747,12 +747,12 @@ static int jbd2_seq_history_open(struct 
 
s = kmalloc(sizeof(*s), GFP_KERNEL);
if (s == NULL)
-   return -EIO;
+   return -ENOMEM;
size = sizeof(struct transaction_stats_s) * journal-j_history_max;
s-stats = kmalloc(size, GFP_KERNEL);
if (s-stats == NULL) {
kfree(s);
-   return -EIO;
+   return -ENOMEM;
}
spin_lock(journal-j_history_lock);
memcpy(s-stats, journal-j_history, size);
@@ -762,7 +762,7 @@ static int jbd2_seq_history_open(struct 
 
rc = seq_open(file, jbd2_seq_history_ops);
if (rc == 0) {
-   struct seq_file *m = (struct seq_file *)file-private_data;
+   struct seq_file *m = file-private_data;
m-private = s;
} else {
kfree(s-stats);
@@ -774,8 +774,9 @@ static int jbd2_seq_history_open(struct 
 
 static int jbd2_seq_history_release(struct inode *inode, struct file *file)
 {
-   struct seq_file *seq = (struct seq_file *)file-private_data;
+   struct seq_file *seq = file-private_data;
struct jbd2_stats_proc_session *s = seq-private;
+
kfree(s-stats);
kfree(s);
return seq_release(inode, file);
@@ -802,6 +803,7 @@ static void *jbd2_seq_info_next(struct s
 static int jbd2_seq_info_show(struct seq_file *seq, void *v)
 {
struct jbd2_stats_proc_session *s = seq-private;
+
if (v != SEQ_START_TOKEN)
return 0;
seq_printf(seq, %lu transaction, each upto %u blocks\n,
@@ -847,12 +849,12 @@ static int jbd2_seq_info_open(struct ino
 
s = kmalloc(sizeof(*s), GFP_KERNEL);
if (s == NULL)
-   return -EIO;
+   return -ENOMEM;
size = sizeof(struct transaction_stats_s);
s-stats = kmalloc(size, GFP_KERNEL);
if (s-stats == NULL) {
kfree(s);
-   return -EIO;
+   return -ENOMEM;
}
spin_lock(journal-j_history_lock);
memcpy(s-stats, journal-j_stats, size);
@@ -861,7 +863,7 @@ static int jbd2_seq_info_open(struct ino
 
rc = seq_open(file, jbd2_seq_info_ops);
if (rc == 0) {
-   struct seq_file *m = (struct seq_file *)file-private_data;
+   struct seq_file *m = file-private_data;
m-private = s;
} else {
kfree(s-stats);
@@ -873,7 +875,7 @@ static int jbd2_seq_info_open(struct ino
 
 static int jbd2_seq_info_release(struct inode *inode, struct file *file)
 {
-   struct seq_file *seq = (struct seq_file *)file-private_data;
+   struct seq_file *seq = file-private_data;
struct jbd2_stats_proc_session *s = seq-private;
kfree(s-stats);
kfree(s);
@@ -892,7 +894,7 @@ static struct proc_dir_entry *proc_jbd2_
 
 static void jbd2_stats_proc_init(journal_t *journal)
 {
-   char name[64];
+   char name[BDEVNAME_SIZE];
 
snprintf(name, sizeof(name) - 1, %s, bdevname(journal-j_dev, name));
journal-j_proc_entry = proc_mkdir(name, proc_jbd2_stats);
@@ -915,7 +917,7 @@ static void jbd2_stats_proc_init(journal
 
 static void jbd2_stats_proc_exit(journal_t *journal)
 {
-   char name[64];
+   char name[BDEVNAME_SIZE];
 
snprintf(name, sizeof(name) - 1, %s, bdevname(journal-j_dev, name));
remove_proc_entry(info, journal-j_proc_entry);
@@ -2207,7 +2209,7 @@ static void __exit jbd2_remove_debugfs_e
 
 #endif
 
-#if defined(CONFIG_PROC_FS)
+#ifdef CONFIG_PROC_FS
 
 #define JBD2_STATS_PROC_NAME fs/jbd2
 

 


-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC] support multiple max offset limits for a single superblock

2007-11-30 Thread Eric Sandeen
Reiserfs, and now ext4, both have the possibility of encountering older on-disk
format files which do not support the s_maxbytes of the newer formats.  Reiserfs
currently has spots in various places to catch these too-large offsets and
reject them, but it's replicating a bit of code in the process.

I could do the same for ext4, but Chris Mason prodded me to think of something
more generic... this is what I came up with.  A filesystem could then define
a maxbytes i_op, and if present, it would return the max offset for that 
particular inode, based on format.

I'm not wedded to this, but thought I'd send it out for comment.

(BTW another option would be to convert old-format files when accessed, but
that has its own set of tradeoffs...)

Thanks,

-Eric

Signed-off-by: Eric Sandeen [EMAIL PROTECTED]

---

Index: linux-2.6.24-rc1/fs/buffer.c
===
--- linux-2.6.24-rc1.orig/fs/buffer.c
+++ linux-2.6.24-rc1/fs/buffer.c
@@ -2175,7 +2175,7 @@ int generic_cont_expand_simple(struct in
send_sig(SIGXFSZ, current, 0);
goto out;
}
-   if (size  inode-i_sb-s_maxbytes)
+   if (size  inode_maxbytes(inode))
goto out;
 
err = pagecache_write_begin(NULL, mapping, size, 0,
Index: linux-2.6.24-rc1/fs/open.c
===
--- linux-2.6.24-rc1.orig/fs/open.c
+++ linux-2.6.24-rc1/fs/open.c
@@ -399,7 +399,7 @@ asmlinkage long sys_fallocate(int fd, in
 
ret = -EFBIG;
/* Check for wrap through zero too */
-   if (((offset + len)  inode-i_sb-s_maxbytes) || ((offset + len)  0))
+   if (((offset + len)  inode_maxbytes(inode)) || ((offset + len)  0))
goto out_fput;
 
if (inode-i_op  inode-i_op-fallocate)
Index: linux-2.6.24-rc1/fs/read_write.c
===
--- linux-2.6.24-rc1.orig/fs/read_write.c
+++ linux-2.6.24-rc1/fs/read_write.c
@@ -45,7 +45,7 @@ loff_t generic_file_llseek(struct file *
offset += file-f_pos;
}
retval = -EINVAL;
-   if (offset=0  offset=inode-i_sb-s_maxbytes) {
+   if (offset = 0  offset = inode_maxbytes(inode)) {
if (offset != file-f_pos) {
file-f_pos = offset;
file-f_version = 0;
@@ -71,7 +71,7 @@ loff_t remote_llseek(struct file *file, 
offset += file-f_pos;
}
retval = -EINVAL;
-   if (offset=0  
offset=file-f_path.dentry-d_inode-i_sb-s_maxbytes) {
+   if (offset=0  offset=inode_maxbytes(file-f_path.dentry-d_inode)) {
if (offset != file-f_pos) {
file-f_pos = offset;
file-f_version = 0;
@@ -764,7 +764,7 @@ static ssize_t do_sendfile(int out_fd, i
goto fput_out;
 
if (!max)
-   max = min(in_inode-i_sb-s_maxbytes, 
out_inode-i_sb-s_maxbytes);
+   max = min(inode_maxbytes(in_inode), inode_maxbytes(out_inode));
 
pos = *ppos;
retval = -EINVAL;
Index: linux-2.6.24-rc1/include/linux/fs.h
===
--- linux-2.6.24-rc1.orig/include/linux/fs.h
+++ linux-2.6.24-rc1/include/linux/fs.h
@@ -1217,8 +1217,17 @@ struct inode_operations {
void (*truncate_range)(struct inode *, loff_t, loff_t);
long (*fallocate)(struct inode *inode, int mode, loff_t offset,
  loff_t len);
+   unsigned long long (*maxbytes)(struct inode *inode);
 };
 
+static inline unsigned long long inode_maxbytes(struct inode *inode)
+{
+   if (unlikely(inode-i_op-maxbytes)) {
+   return inode-i_op-maxbytes(inode);
+   } else
+   return inode-i_sb-s_maxbytes;
+}
+
 struct seq_file;
 
 ssize_t rw_copy_check_uvector(int type, const struct iovec __user * uvector,
Index: linux-2.6.24-rc1/mm/filemap.c
===
--- linux-2.6.24-rc1.orig/mm/filemap.c
+++ linux-2.6.24-rc1/mm/filemap.c
@@ -1827,15 +1827,15 @@ inline int generic_write_checks(struct f
 * Linus frestrict idea will clean these up nicely..
 */
if (likely(!isblk)) {
-   if (unlikely(*pos = inode-i_sb-s_maxbytes)) {
-   if (*count || *pos  inode-i_sb-s_maxbytes) {
+   if (unlikely(*pos = inode_maxbytes(inode))) {
+   if (*count || *pos  inode_maxbytes(inode)) {
return -EFBIG;
}
-   /* zero-length writes at -s_maxbytes are OK */
+   /* zero-length writes at inode_maxbytes are OK */
}
 
-   if (unlikely(*pos + *count  inode-i_sb-s_maxbytes))
-   *count = inode-i_sb-s_maxbytes - *pos;
+   if (unlikely(*pos + *count

Re: Coverity testing freely available

2007-11-30 Thread Eric Sandeen
Andreas Dilger wrote:
 Hi,
 I've recently found scan.coverity.com, which allows OSS-licensed projects
 to be run under Coverity to find defects available in the code.  I think
 that e2fsprogs should be included there, even if some people run e2fsprogs
 under Coverity separately.  This allows the e2fsprogs developers to find
 and fix the bugs more easily.


I don't see e2fsprogs on the website yet.  Anyone care if I go ahead 
ask that e2fsprogs be included?

-Eric

 Cheers, Andreas
 --
 Andreas Dilger
 Principal Software Engineer
 Cluster File Systems, Inc.
 
 -
 To unsubscribe from this list: send the line unsubscribe linux-ext4 in
 the body of a message to [EMAIL PROTECTED]
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [EXT4 set 6][PATCH 1/1]Export jbd stats through procfs

2007-11-30 Thread Eric Sandeen
Mingming Cao wrote:
 [PATCH] jbd2 stats through procfs
 
 The patch below updates the jbd stats patch to 2.6.20/jbd2.
 The initial patch was posted by Alex Tomas in December 2005
 (http://marc.info/?l=linux-ext4m=113538565128617w=2).
 It provides statistics via procfs such as transaction lifetime and size.
 
 [ This probably should be rewritten to use debugfs?   -- Ted]
 
 Signed-off-by: Johann Lombardi [EMAIL PROTECTED]

I've started going through this one to clean it up to the point where it
can go forward.  It's been sitting at the top of the unstable portion of
the patch queue for long enough, I think :)

I've converted the msecs to jiffies until the user boundary, changed the
union #defines as suggested by Andrew, and various other little issues etc.

Remaining to do is a generic time-difference calculator (instead of
jbd2_time_diff), and looking into whether it should be made a config
option; I tend to think it should, but it's fairly well sprinkled
through the code, so I'll see how well that works.

Also did we ever decided if this should go to debugfs?

Thanks,

-Eric
-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] fix up vfs inode version patch for modules

2007-11-28 Thread Eric Sandeen
ext4 calls inode_inc_iversion(), but it's not exported, so modular ext4
has trouble.  Any reason not to just make it an inline, as below?

Signed-off-by: Eric Sandeen [EMAIL PROTECTED]

---

Index: linux-2.6.24-rc1/fs/inode.c
===
--- linux-2.6.24-rc1.orig/fs/inode.c
+++ linux-2.6.24-rc1/fs/inode.c
@@ -1243,23 +1243,6 @@ void touch_atime(struct vfsmount *mnt, s
 EXPORT_SYMBOL(touch_atime);
 
 /**
- * inode_inc_iversion  -   increments i_version
- * @inode: inode that need to be updated
- *
- * Every time the inode is modified, the i_version field
- * will be incremented.
- * The filesystem has to be mounted with i_version flag
- *
- */
-
-void inode_inc_iversion(struct inode *inode)
-{
-   spin_lock(inode-i_lock);
-   inode-i_version++;
-   spin_unlock(inode-i_lock);
-}
-
-/**
  * file_update_time-   update mtime and ctime time
  * @file: file accessed
  *
Index: linux-2.6.24-rc1/include/linux/fs.h
===
--- linux-2.6.24-rc1.orig/include/linux/fs.h
+++ linux-2.6.24-rc1/include/linux/fs.h
@@ -1396,7 +1396,21 @@ static inline void inode_dec_link_count(
mark_inode_dirty(inode);
 }
 
-extern void inode_inc_iversion(struct inode *inode);
+/**
+ * inode_inc_iversion - increments i_version
+ * @inode: inode that need to be updated
+ *
+ * Every time the inode is modified, the i_version field will be incremented.
+ * The filesystem has to be mounted with i_version flag
+ */
+
+static inline void inode_inc_iversion(struct inode *inode)
+{
+   spin_lock(inode-i_lock);
+   inode-i_version++;
+   spin_unlock(inode-i_lock);
+}
+
 extern void touch_atime(struct vfsmount *mnt, struct dentry *dentry);
 static inline void file_accessed(struct file *file)
 {

-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] ext4: fix up EXT4FS_DEBUG builds

2007-11-26 Thread Eric Sandeen
Builds with EXT4FS_DEBUG defined (to enable ext4_debug()) fail
without these changes.  Clean up some format warnings too.

Signed-off-by: Eric Sandeen [EMAIL PROTECTED]

---

Index: linux-2.6.24-rc1/fs/ext4/balloc.c
===
--- linux-2.6.24-rc1.orig/fs/ext4/balloc.c
+++ linux-2.6.24-rc1/fs/ext4/balloc.c
@@ -1579,7 +1579,7 @@ ext4_fsblk_t ext4_new_blocks_old(handle_
 
sbi = EXT4_SB(sb);
es = EXT4_SB(sb)-s_es;
-   ext4_debug(goal=%lu.\n, goal);
+   ext4_debug(goal=%llu.\n, goal);
/*
 * Allocate a block from reservation only when
 * filesystem is mounted with reservation(default,-o reservation), and
@@ -1689,7 +1689,7 @@ retry_alloc:
 
 allocated:
 
-   ext4_debug(using block group %d(%d)\n,
+   ext4_debug(using block group %lu(%d)\n,
group_no, gdp-bg_free_blocks_count);
 
BUFFER_TRACE(gdp_bh, get_write_access);
@@ -1880,7 +1880,7 @@ ext4_fsblk_t ext4_count_free_blocks(stru
brelse(bitmap_bh);
printk(ext4_count_free_blocks: stored = %llu
, computed = %llu, %llu\n,
-  EXT4_FREE_BLOCKS_COUNT(es),
+   ext4_free_blocks_count(es),
desc_count, bitmap_count);
return bitmap_count;
 #else
Index: linux-2.6.24-rc1/fs/ext4/ialloc.c
===
--- linux-2.6.24-rc1.orig/fs/ext4/ialloc.c
+++ linux-2.6.24-rc1/fs/ext4/ialloc.c
@@ -857,7 +857,7 @@ unsigned long ext4_count_free_inodes (st
continue;
 
x = ext4_count_free(bitmap_bh, EXT4_INODES_PER_GROUP(sb) / 8);
-   printk(group %d: stored = %d, counted = %lu\n,
+   printk(group %lu: stored = %d, counted = %lu\n,
i, le16_to_cpu(gdp-bg_free_inodes_count), x);
bitmap_count += x;
}
Index: linux-2.6.24-rc1/fs/ext4/mballoc.c
===
--- linux-2.6.24-rc1.orig/fs/ext4/mballoc.c
+++ linux-2.6.24-rc1/fs/ext4/mballoc.c
@@ -2953,9 +2953,6 @@ static int ext4_mb_mark_diskspace_used(s
sbi = EXT4_SB(sb);
es = sbi-s_es;
 
-   ext4_debug(using block group %d(%d)\n, ac-ac_b_group.group,
-   gdp-bg_free_blocks_count);
-
err = -EIO;
bitmap_bh = read_block_bitmap(sb, ac-ac_b_ex.fe_group);
if (!bitmap_bh)
@@ -2970,6 +2967,9 @@ static int ext4_mb_mark_diskspace_used(s
if (!gdp)
goto out_err;
 
+   ext4_debug(using block group %lu(%d)\n, ac-ac_b_ex.fe_group,
+   gdp-bg_free_blocks_count);
+
err = ext4_journal_get_write_access(handle, gdp_bh);
if (err)
goto out_err;
Index: linux-2.6.24-rc1/fs/ext4/resize.c
===
--- linux-2.6.24-rc1.orig/fs/ext4/resize.c
+++ linux-2.6.24-rc1/fs/ext4/resize.c
@@ -206,7 +206,7 @@ static int setup_new_group_blocks(struct
}
 
if (ext4_bg_has_super(sb, input-group)) {
-   ext4_debug(mark backup superblock %#04lx (+0)\n, start);
+   ext4_debug(mark backup superblock %#04llx (+0)\n, start);
ext4_set_bit(0, bh-b_data);
}
 
@@ -215,7 +215,7 @@ static int setup_new_group_blocks(struct
 i  gdblocks; i++, block++, bit++) {
struct buffer_head *gdb;
 
-   ext4_debug(update backup group %#04lx (+%d)\n, block, bit);
+   ext4_debug(update backup group %#04llx (+%d)\n, block, bit);
 
if ((err = extend_or_restart_transaction(handle, 1, bh)))
goto exit_bh;
@@ -243,7 +243,7 @@ static int setup_new_group_blocks(struct
 i  reserved_gdb; i++, block++, bit++) {
struct buffer_head *gdb;
 
-   ext4_debug(clear reserved block %#04lx (+%d)\n, block, bit);
+   ext4_debug(clear reserved block %#04llx (+%d)\n, block, bit);
 
if ((err = extend_or_restart_transaction(handle, 1, bh)))
goto exit_bh;
@@ -256,10 +256,10 @@ static int setup_new_group_blocks(struct
ext4_set_bit(bit, bh-b_data);
brelse(gdb);
}
-   ext4_debug(mark block bitmap %#04x (+%ld)\n, input-block_bitmap,
+   ext4_debug(mark block bitmap %#04llx (+%llu)\n, input-block_bitmap,
   input-block_bitmap - start);
ext4_set_bit(input-block_bitmap - start, bh-b_data);
-   ext4_debug(mark inode bitmap %#04x (+%ld)\n, input-inode_bitmap,
+   ext4_debug(mark inode bitmap %#04llx (+%llu)\n, input-inode_bitmap,
   input-inode_bitmap - start);
ext4_set_bit(input-inode_bitmap - start, bh-b_data);
 
@@ -268,7 +268,7 @@ static int setup_new_group_blocks(struct
 i  sbi-s_itb_per_group; i++, bit++, block++) {
struct buffer_head

Re: e2fsprogs-interim scm tree?

2007-11-20 Thread Eric Sandeen
Theodore Tso wrote:
 On Tue, Nov 20, 2007 at 11:24:03AM +0100, Christian Kujau wrote:
 Note that the userspace code still needs a lot of work.
 It's for this reason that I haven't been recommending people use it for
 production systems just yet.
 Understood. But the more ppl testing your stuff the better, right?
 
 Oh, absolutely.  I just don't think it's fair to encourage people to
 use something that might cause them to risk their data unless they are
 going into it with their eyes wide open.

Please do report any problems you find, though.  Ted has said that the
sourceforge bugtracker is the right place to do this for now.

-Eric (remembering he has a bug to report, too)
-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] types fixup for mballoc

2007-11-15 Thread Eric Sandeen
I ran into a potential overflow in ext4_mb_scan_aligned, 
and went looking for others in mballoc.  This patch hits a 
few spots, compile-tested only at this point, comments welcome.

This patch:

changes fe_len to an int, I don't think we need it to be a long,
looking at how it's used (should it be a grpblk_t?)  Also change
anything assigned to return value of mb_find_extent, since it returns
fe_len.

changes anything that does groupno * EXT4_BLOCKS_PER_GROUP
or pa-pa_pstart + whatever to an ext4_fsblk_t

fixes up any related formats

The change to ext4_mb_scan_aligned to add a modulo to avoid an overflow
may be too clever... it could use an extra look I think.

Signed-off-by: Eric Sandeen [EMAIL PROTECTED]

---

Index: linux-2.6.24-rc1/fs/ext4/mballoc.c
===
--- linux-2.6.23.orig/fs/ext4/mballoc.c
+++ linux-2.6.23/fs/ext4/mballoc.c
@@ -363,7 +363,7 @@ struct ext4_free_extent {
ext4_lblk_t fe_logical;
ext4_grpblk_t fe_start;
ext4_grpnum_t fe_group;
-   unsigned long fe_len;
+   int fe_len;
 };
 
 /*
@@ -586,14 +586,14 @@ static void mb_free_blocks_double(struct
BUG_ON(!ext4_is_group_locked(sb, e4b-bd_group));
for (i = 0; i  count; i++) {
if (!mb_test_bit(first + i, e4b-bd_info-bb_bitmap)) {
-   unsigned long blocknr;
+   ext4_fsblk_t blocknr;
blocknr = e4b-bd_group * EXT4_BLOCKS_PER_GROUP(sb);
blocknr += first + i;
blocknr +=
le32_to_cpu(EXT4_SB(sb)-s_es-s_first_data_block);
 
ext4_error(sb, __FUNCTION__, double-free of inode
-   %lu's block %lu(bit %u in group %lu)\n,
+   %lu's block %llu(bit %u in group %lu)\n,
   inode ? inode-i_ino : 0, blocknr,
   first + i, e4b-bd_group);
}
@@ -1226,14 +1226,14 @@ static int mb_free_blocks(struct inode *
order = 0;
 
if (!mb_test_bit(block, EXT4_MB_BITMAP(e4b))) {
-   unsigned long blocknr;
+   ext4_fsblk_t blocknr;
blocknr = e4b-bd_group * EXT4_BLOCKS_PER_GROUP(sb);
blocknr += block;
blocknr +=
le32_to_cpu(EXT4_SB(sb)-s_es-s_first_data_block);
 
ext4_error(sb, __FUNCTION__, double-free of inode
-   %lu's block %lu(bit %u in group %lu)\n,
+   %lu's block %llu(bit %u in group %lu)\n,
   inode ? inode-i_ino : 0, blocknr, block,
   e4b-bd_group);
}
@@ -1416,7 +1416,7 @@ static void ext4_mb_use_best_found(struc
struct ext4_buddy *e4b)
 {
struct ext4_sb_info *sbi = EXT4_SB(ac-ac_sb);
-   unsigned long ret;
+   int ret;
 
BUG_ON(ac-ac_b_ex.fe_group != e4b-bd_group);
BUG_ON(ac-ac_status == AC_STATUS_FOUND);
@@ -1609,7 +1609,7 @@ static int ext4_mb_find_by_goal(struct e
 ac-ac_g_ex.fe_len, ex);
 
if (max = ac-ac_g_ex.fe_len  ac-ac_g_ex.fe_len == sbi-s_stripe) {
-   unsigned long start;
+   ext4_fsblk_t start;
start = (e4b-bd_group * EXT4_BLOCKS_PER_GROUP(ac-ac_sb) +
ex.fe_start + le32_to_cpu(es-s_first_data_block));
if (start % sbi-s_stripe == 0) {
@@ -1732,13 +1732,14 @@ static void ext4_mb_scan_aligned(struct 
struct ext4_sb_info *sbi = EXT4_SB(sb);
void *bitmap = EXT4_MB_BITMAP(e4b);
struct ext4_free_extent ex;
-   unsigned long i;
-   unsigned long max;
+   ext4_grpblk_t i;
+   int max;
 
BUG_ON(sbi-s_stripe == 0);
 
-   /* find first stripe-aligned block */
-   i = e4b-bd_group * EXT4_BLOCKS_PER_GROUP(sb)
+   /* find first stripe-aligned block in group */
+   /* early modulo here to avoid 32-bit overflow */
+   i = (e4b-bd_group % sbi-s_stripe)  * EXT4_BLOCKS_PER_GROUP(sb)
+ le32_to_cpu(sbi-s_es-s_first_data_block);
i = ((i + sbi-s_stripe - 1) / sbi-s_stripe) * sbi-s_stripe;
i = (i - le32_to_cpu(sbi-s_es-s_first_data_block))
@@ -2026,35 +2027,35 @@ static int ext4_mb_seq_history_show(stru
if (hs-op == EXT4_MB_HISTORY_ALLOC) {
fmt = %-5u %-8u %-23s %-23s %-23s %-5u %-5u %-2u 
%-5u %-5s %-5u %-6u\n;
-   sprintf(buf2, %lu/%d/[EMAIL PROTECTED], hs-result.fe_group,
+   sprintf(buf2, %lu/%d/[EMAIL PROTECTED], hs-result.fe_group,
hs-result.fe_start, hs-result.fe_len,
-   (unsigned long)hs-result.fe_logical

Re: [PATCH] types fixup for mballoc

2007-11-15 Thread Eric Sandeen
Eric Sandeen wrote:
 I ran into a potential overflow in ext4_mb_scan_aligned, 
 and went looking for others in mballoc.  This patch hits a 
 few spots, compile-tested only at this point, comments welcome.
 
 This patch:
 

... introduces a 64-bit divide.  Oops... will fix that up  resend.

-Eric
-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


delalloc space accounting problem.

2007-11-08 Thread Eric Sandeen
It appears that delalloc lets me copy 50M of data onto a 30M filesystem;
at least I never get ENOSPC back, although I wind up with several files
that have 1M length but 0 blocks.

I've filed a bug in the kernel bug tracker, I think we could use a
central place to track issues:

http://bugzilla.kernel.org/show_bug.cgi?id=9329

I'll try to find time to look into this unless someone knows offhand
where the problem is...

Thanks,
-Eric

(p.s. should I get ext[234] bug mail routed to this list, or would that
be annoying?)
-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] fix check_mntent_file() to pass mode for open(O_CREAT)

2007-11-08 Thread Eric Sandeen
Andreas Dilger wrote:
 On my FC8 install, ismounted.c fails to build because open(O_CREAT) is
 used without passing a mode.  The following trivial patch fixes it.

You can add:

Acked-by: Eric Sandeen [EMAIL PROTECTED]

'cause it's an awful lot like the patch I sent for the same issue back
on 8/16  ;-)  Guess I should have followed that up with a ping.  (though
your 0600 mode is probably better than my 0644 was)

Andreas, did you also run into trouble with struct_io_manager's -open
calls triggering this test?

I sent a patch for that,

[PATCH] rename -open and -close ops in struct_io_manager

too... maybe the glibc #define tricks got smarter and don't trigger that
now?

-Eric

 Signed-off-by: Andreas Dilger [EMAIL PROTECTED]
 
 Index: e2fsprogs-1.40.2/lib/ext2fs/ismounted.c
 ===
 --- e2fsprogs-1.40.2.orig/lib/ext2fs/ismounted.c
 +++ e2fsprogs-1.40.2/lib/ext2fs/ismounted.c
 @@ -147,7 +147,7 @@ static errcode_t check_mntent_file(const
  is_root:
  #define TEST_FILE /.ismount-test-file  
   *mount_flags |= EXT2_MF_ISROOT;
 - fd = open(TEST_FILE, O_RDWR|O_CREAT);
 + fd = open(TEST_FILE, O_RDWR|O_CREAT, 0600);
   if (fd  0) {
   if (errno == EROFS)
   *mount_flags |= EXT2_MF_READONLY;
 
 Cheers, Andreas
 --
 Andreas Dilger
 Sr. Software Engineer, Lustre Group
 Sun Microsystems of Canada, Inc.
 
 -
 To unsubscribe from this list: send the line unsubscribe linux-ext4 in
 the body of a message to [EMAIL PROTECTED]
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: delalloc fragmenting files?

2007-11-07 Thread Eric Sandeen
Andreas Dilger wrote:
 On Nov 06, 2007  13:54 -0600, Eric Sandeen wrote:
 Hmm bad news is when I add uninit_groups into the mix, it goes a little
 south again, with some out-of-order extents.  Not the end of the world,
 but a little unexpected?

 I think part of the issue is that by default the groups marked BLOCK_UNINIT
 are skipped, to avoid dirtying those groups if they have never been used
 before.  This policy could be changed in the mballoc code pretty easily if
 you think it is a net loss.  Note that the size of the extents is large
 enough (120MB or more) that some small reordering is probably not going
 to affect the performance in any meaningful way.

You're probably right; on the other hand, this is about the simplest
test an allocator could wish for - a single-threaded large linear write
in big IO chunks.

In this case it's probably not a big deal; I do wonder how it might
affect the bigger picture though, with more writing threads, aged
filesystems, and the like.  Just thought it was worth pointing out, as I
started looking at allocator behavior in the simple/isolated/unrealistic
:) cases.

-Eric
-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] fix oops on corrupted ext4 mount

2007-11-07 Thread Eric Sandeen
Eric Sandeen wrote:
 When mounting an ext4 filesystem with corrupted s_first_data_block, things 
 can go very wrong and oops.
 
 Because blocks_count in ext4_fill_super is a u64, and we must use do_div, 
 the calculation of db_count is done differently than on ext4.

Urgh... than on ext3

-Eric
-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


More testing: 4x parallel 2G writes, sequential reads

2007-11-07 Thread Eric Sandeen
I tried ext4 vs. xfs doing 4 parallel 2G IO writes in 1M units to 4
different subdirectories of the root of the filesystem:

http://people.redhat.com/esandeen/seekwatcher/ext4_4_threads.png
http://people.redhat.com/esandeen/seekwatcher/xfs_4_threads.png
http://people.redhat.com/esandeen/seekwatcher/ext4_xfs_4_threads.png

and then read them back sequentially:

http://people.redhat.com/esandeen/seekwatcher/ext4_4_threads_read.png
http://people.redhat.com/esandeen/seekwatcher/xfs_4_threads_read.png
http://people.redhat.com/esandeen/seekwatcher/ext4_xfs_4_read_threads.png

At the end of the write, ext4 had on the order of 400 extents/file, xfs
had on the order of 30 extents/file.  It's clear especially from the
read graph that ext4 is interleaving the 4 files, in about 5M chunks on
average.  Throughput seems comparable between ext4  xfs nonetheless.

Again this was on a decent HW raid so seek penalties are probably not
too bad.

-Eric
-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: More testing: 4x parallel 2G writes, sequential reads

2007-11-07 Thread Eric Sandeen
Andreas Dilger wrote:

 The question is what the best result is for this kind of workload?
 In HPC applications the common case is that you will also have the data
 files read back in parallel instead of serially.

Agreed, I'm not trying to argue what's better or worse, I'm just seeing
what it's doing.

The main reason I did sequential reads back is that it more clearly
shows the file layout for each file on the graph.  :)  I'm just getting
a handle on how the allocations are going for various types of writes.

 The test shows ext4 finishing marginally faster in the write case, and
 marginally slower in the read case.  What happens if you have 4 parallel
 readers?

I'll test that a bit later (have to run now); I expect parallel readers
may go faster, since the blocks are interleaved, and it might be able to
suck them up pretty much in order across all 4 files.

I'd also like to test some of this under a single head, rather than on
HW raid...

-Eric
-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: More testing: 4x parallel 2G writes, sequential reads

2007-11-07 Thread Eric Sandeen
Shapor Naghibzadeh wrote:
 On Wed, Nov 07, 2007 at 04:42:59PM -0600, Eric Sandeen wrote:
 Again this was on a decent HW raid so seek penalties are probably not
 too bad.
 
 You may want to verify that by doing a benchmark on the raw device.  I
 recently did some benchmarks doing random I/O on a Dell 2850 w/ a PERC
 (megaraid) RAID5 w/ 128MB onboard writeback cache and 6x 15krpm drives
 and noticed appoximately one order of magnitude throughput drop on
 small (stripe-sized) random reads versus linear.  It maxed out at ~100
 random read IOPs or seeks/sec (suprisingly low).
 
 Out of curiousity, how are you counting the seeks?

Chris Mason's seekwatcher (google can find it for you) is doing the
graphing, it uses blocktrace for the raw data.

-Eric

 Shapor

-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: More testing: 4x parallel 2G writes, sequential reads

2007-11-07 Thread Eric Sandeen
Andreas Dilger wrote:

 The test shows ext4 finishing marginally faster in the write case, and
 marginally slower in the read case.  What happens if you have 4 parallel
 readers?

http://people.redhat.com/esandeen/seekwatcher/ext4_4_thread_par_read.png
http://people.redhat.com/esandeen/seekwatcher/xfs_4_thread_par_read.png
http://people.redhat.com/esandeen/seekwatcher/ext4_xfs_4_thread_par_read.png

-Eric
-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: delalloc fragmenting files?

2007-11-06 Thread Eric Sandeen
Eric Sandeen wrote:
 Alex Tomas wrote:
 thanks for the data. the attached patch should fix couple issues:
 broken history output and policy in that smaller usable chunk to
 be used. can you give it a spin, please?
 
 Hi Alex -
 
 This looks *much* better:

Hmm bad news is when I add uninit_groups into the mix, it goes a little
south again, with some out-of-order extents.  Not the end of the world,
but a little unexpected?


Discontinuity: Block 1430784 is at 24183810 (was 24181761)
Discontinuity: Block 1461760 is at 24216578 (was 24214785)
Discontinuity: Block 1492480 is at 37888 (was 24247297)
Discontinuity: Block 1519616 is at 850944 (was 65023)
Discontinuity: Block 1520640 is at 883712 (was 851967)
Discontinuity: Block 1521664 is at 1670144 (was 884735)
Discontinuity: Block 1522688 is at 2685952 (was 1671167)
Discontinuity: Block 1523712 is at 4226048 (was 2686975)
Discontinuity: Block 1524736 is at 11271168 (was 4227071)
Discontinuity: Block 1525760 is at 23952384 (was 11272191)
...

-Eric
-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: delalloc fragmenting files?

2007-11-05 Thread Eric Sandeen
Alex Tomas wrote:
 thanks for the data. the attached patch should fix couple issues:
 broken history output and policy in that smaller usable chunk to
 be used. can you give it a spin, please?

Hi Alex -

This looks *much* better:

First block: 100355
Last block: 2342657
Discontinuity: Block 30208 is at 133120 (was 130562)
Discontinuity: Block 60928 is at 165891 (was 163839)
Discontinuity: Block 90368 is at 197634 (was 195330)
Discontinuity: Block 121856 is at 232448 (was 229121)
Discontinuity: Block 150528 is at 263170 (was 261119)
Discontinuity: Block 181504 is at 296963 (was 294145)
Discontinuity: Block 210944 is at 328706 (was 326402)
Discontinuity: Block 241664 is at 361474 (was 359425)
Discontinuity: Block 272896 is at 395264 (was 392705)
Discontinuity: Block 303360 is at 427010 (was 425727)
Discontinuity: Block 334080 is at 459778 (was 457729)
Discontinuity: Block 365568 is at 493568 (was 491265)
Discontinuity: Block 395264 is at 525314 (was 523263)
Discontinuity: Block 426240 is at 558082 (was 556289)


for an 8192x1M (8G) buffered dd, I now get 88 extents, in order.

Do you want the mballoc history?

Now I can try some multithreaded tests :)

Thanks,
-Eric

 thanks, Alex
-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: delalloc fragmenting files?

2007-11-05 Thread Eric Sandeen
Eric Sandeen wrote:
 Alex Tomas wrote:
 thanks for the data. the attached patch should fix couple issues:
 broken history output and policy in that smaller usable chunk to
 be used. can you give it a spin, please?
 
 Hi Alex -
 
 This looks *much* better:

(nb: that's running w/ both of the patches you sent on this thread)

-Eric
-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: delalloc fragmenting files?

2007-11-05 Thread Eric Sandeen
Andreas Dilger wrote:

 On a related note - the FIEMAP patches to filefrag also include a new
 output format that is much more useful, IMHO.  The new format is like:
 
 {filename}
 ext: [logical start.. end kB]: phys start..end kB:   kB:lun: flags
   0: [ 0.. 30207]: 401416..522251:   120828: 0 :
   1: [ 30208.. 60927]: 532480..655359:   122880: 0 :
   2: [ 60928..121855]: 790536..916484:   125948: 0 :
 
 Hopefully Kalpak will be able to post the updated patches here soon.

yep, I hacked existing filefrag to do something like this, the existing
format is pretty hard to glance over :)

One thing I like about xfs_bmap is that it can tell you which Allocation
Group the blocks are in; most filesystems have some concept of
sub-regions of the filesystem, such as BGs or resource groups or whatnot
- do you think there is room for this in the FIEMAP interface?  Hm, or
should this just be calculated from knowing the size of the sub-regions...

-Eric
-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] allow tune2fs to set/clear resize_inode

2007-11-05 Thread Eric Sandeen
(this patch has been carried in Red Hat / Fedora rpms for a while, not 
sure why it never got sent upstream... clearing this allows for mounting 
filesystems with a resize_inode on older systems)

-

Allow tune2fs to set  clear resize_inode; requires fsck afterwards.

Signed-off-by: Eric Sandeen [EMAIL PROTECTED]
Addresses-Red-Hat-Bugzilla: #167816

Index: e2fsprogs-git/misc/tune2fs.c
===
--- e2fsprogs-git.orig/misc/tune2fs.c
+++ e2fsprogs-git/misc/tune2fs.c
@@ -96,6 +96,7 @@ static void usage(void)
 
 static __u32 ok_features[3] = {
EXT3_FEATURE_COMPAT_HAS_JOURNAL |
+   EXT2_FEATURE_COMPAT_RESIZE_INODE |
EXT2_FEATURE_COMPAT_DIR_INDEX,  /* Compat */
EXT2_FEATURE_INCOMPAT_FILETYPE, /* Incompat */
EXT2_FEATURE_RO_COMPAT_SPARSE_SUPER /* R/O compat */
@@ -283,6 +284,7 @@ static void update_feature_set(ext2_fils
 {
int sparse, old_sparse, filetype, old_filetype;
int journal, old_journal, dxdir, old_dxdir;
+   int resize_inode, old_resize_inode;
struct ext2_super_block *sb= fs-super;
__u32   old_compat, old_incompat, old_ro_compat;
 
@@ -298,6 +300,8 @@ static void update_feature_set(ext2_fils
EXT3_FEATURE_COMPAT_HAS_JOURNAL;
old_dxdir = sb-s_feature_compat 
EXT2_FEATURE_COMPAT_DIR_INDEX;
+   old_resize_inode = sb-s_feature_compat 
+   EXT2_FEATURE_COMPAT_RESIZE_INODE;
if (e2p_edit_feature(features, sb-s_feature_compat,
 ok_features)) {
fprintf(stderr, _(Invalid filesystem option set: %s\n),
@@ -312,6 +316,8 @@ static void update_feature_set(ext2_fils
EXT3_FEATURE_COMPAT_HAS_JOURNAL;
dxdir = sb-s_feature_compat 
EXT2_FEATURE_COMPAT_DIR_INDEX;
+   resize_inode = sb-s_feature_compat 
+   EXT2_FEATURE_COMPAT_RESIZE_INODE;
if (old_journal  !journal) {
if ((mount_flags  EXT2_MF_MOUNTED) 
!(mount_flags  EXT2_MF_READONLY)) {
@@ -358,7 +364,8 @@ static void update_feature_set(ext2_fils
 sb-s_feature_incompat))
ext2fs_update_dynamic_rev(fs);
if ((sparse != old_sparse) ||
-   (filetype != old_filetype)) {
+   (filetype != old_filetype) ||
+   (resize_inode != old_resize_inode)) {
sb-s_state = ~EXT2_VALID_FS;
printf(\n%s\n, _(please_fsck));
}
Index: e2fsprogs-git/misc/tune2fs.8.in
===
--- e2fsprogs-git.orig/misc/tune2fs.8.in
+++ e2fsprogs-git/misc/tune2fs.8.in
@@ -392,12 +392,16 @@ option.
 .TP
 .B sparse_super
 Limit the number of backup superblocks to save space on large filesystems.
+.TP
+.B resize_inode
+Reserve space so the block group descriptor table may grow in the future.
 .RE
 .IP
 After setting or clearing 
-.B sparse_super
-and 
-.B filetype 
+.BR sparse_super ,
+.BR filetype ,
+or
+.B resize_inode
 filesystem features,
 .BR e2fsck (8)
 must be run on the filesystem to return the filesystem to a consistent state.

-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH e2fsprogs] - remove timestamps from .po files

2007-11-05 Thread Eric Sandeen
Another one that's been in RH/Fedora specfiles a while.
Remove timestamps from .po files to avoid multilib conflicts.
It ain't pretty but it works.

Signed-off-by: Eric Sandeen [EMAIL PROTECTED]
Addresses-Red-Hat-Bugzilla: #245653

Index: e2fsprogs-git/po/Makefile.in.in
===
--- e2fsprogs-git.orig/po/Makefile.in.in
+++ e2fsprogs-git/po/Makefile.in.in
@@ -118,10 +118,12 @@ $(DOMAIN).pot-update: $(POTFILES) $(srcd
  rm -f $(DOMAIN).1po $(DOMAIN).2po $(DOMAIN).po; \
else \
  rm -f $(DOMAIN).1po $(DOMAIN).2po $(srcdir)/$(DOMAIN).pot  \
- mv $(DOMAIN).po $(srcdir)/$(DOMAIN).pot; \
+ sed -f remove-potcdate.sed  $(DOMAIN).po  
$(srcdir)/$(DOMAIN).pot  \
+ rm -f $(DOMAIN).po; \
fi; \
  else \
-   mv $(DOMAIN).po $(srcdir)/$(DOMAIN).pot; \
+   sed -f remove-potcdate.sed  $(DOMAIN).po  $(srcdir)/$(DOMAIN).pot 
 \
+   rm -f $(DOMAIN).po; \
  fi; \
}
 



-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH/RFC e2fsprogs] - replace MKINSTALLDIRS with MKDIR_P for newer autotools

2007-11-05 Thread Eric Sandeen
From: Thomas Woerner [EMAIL PROTECTED] mailto:[EMAIL PROTECTED]

(Thomas was the original author of this patch)

I do not claim to be an autotools guru, hence the RFC.
RH bugzilla #220715 claims that newer autoconf/automake no longer
support MKINSTALLDIRS, and the changelog seems to confirm that, as
does testing.  :)

The following is the patch we've been carrying for Fedora:

Signed-off-by: Eric Sandeen [EMAIL PROTECTED]
Addresses-Red-Hat-Bugzilla: #220715

---

Index: e2fsprogs-git/debugfs/Makefile.in
===
--- e2fsprogs-git.orig/debugfs/Makefile.in
+++ e2fsprogs-git/debugfs/Makefile.in
@@ -48,8 +48,8 @@ debugfs.8: $(DEP_SUBSTITUTE) $(srcdir)/d
@$(SUBSTITUTE_UPTIME) $(srcdir)/debugfs.8.in debugfs.8
 
 installdirs:
-   @echo  MKINSTALLDIRS $(root_sbindir) $(man8dir)
-   @$(MKINSTALLDIRS) $(DESTDIR)$(root_sbindir) \
+   @echo  MKDIR_P $(root_sbindir) $(man8dir)
+   @$(MKDIR_P) $(DESTDIR)$(root_sbindir) \
$(DESTDIR)$(man8dir)
 
 install: $(PROGS) $(MANPAGES) installdirs
Index: e2fsprogs-git/intl/Makefile.in
===
--- e2fsprogs-git.orig/intl/Makefile.in
+++ e2fsprogs-git/intl/Makefile.in
@@ -40,8 +40,8 @@ subdir = intl
 
 INSTALL = @INSTALL@
 INSTALL_DATA = @INSTALL_DATA@
-MKINSTALLDIRS = @MKINSTALLDIRS@
-mkinstalldirs = $(SHELL) $(MKINSTALLDIRS)
+MKDIR_P = @MKDIR_P@
+mkdir_p = @MKDIR_P@
 
 l = @INTL_LIBTOOL_SUFFIX_PREFIX@
 
Index: e2fsprogs-git/po/Makefile.in.in
===
--- e2fsprogs-git.orig/po/Makefile.in.in
+++ e2fsprogs-git/po/Makefile.in.in
@@ -26,11 +26,10 @@ datarootdir = @datarootdir@
 datadir = @datadir@
 localedir = $(datadir)/locale
 gettextsrcdir = $(datadir)/gettext/po
+mkdir_p = @MKDIR_P@
 
 INSTALL = @INSTALL@
 INSTALL_DATA = @INSTALL_DATA@
-MKINSTALLDIRS = @MKINSTALLDIRS@
-mkinstalldirs = $(SHELL) $(MKINSTALLDIRS)
 
 GMSGFMT = @GMSGFMT@
 MSGFMT = @MSGFMT@
@@ -150,7 +149,7 @@ install: install-exec install-data
 install-exec:
 install-data: [EMAIL PROTECTED]@
if test $(PACKAGE) = gettext-tools; then \
- $(mkinstalldirs) $(DESTDIR)$(gettextsrcdir); \
+ $(mkdir_p) $(DESTDIR)$(gettextsrcdir); \
  for file in $(DISTFILES.common) Makevars.template; do \
$(INSTALL_DATA) $(srcdir)/$$file \
$(DESTDIR)$(gettextsrcdir)/$$file; \
@@ -163,13 +162,13 @@ install-data: [EMAIL PROTECTED]@
fi
 install-data-no: all
 install-data-yes: all
-   $(mkinstalldirs) $(DESTDIR)$(datadir)
+   $(mkdir_p) $(DESTDIR)$(datadir)
@catalogs='$(CATALOGS)'; \
for cat in $$catalogs; do \
  cat=`basename $$cat`; \
  lang=`echo $$cat | sed -e 's/\.gmo$$//'`; \
  dir=$(localedir)/$$lang/LC_MESSAGES; \
- $(mkinstalldirs) $(DESTDIR)$$dir; \
+ $(mkdir_p) $(DESTDIR)$$dir; \
  if test -r $$cat; then realcat=$$cat; else realcat=$(srcdir)/$$cat; 
fi; \
  $(INSTALL_DATA) $$realcat $(DESTDIR)$$dir/$(DOMAIN).mo; \
  echo installing $$realcat as $(DESTDIR)$$dir/$(DOMAIN).mo; \
@@ -209,19 +208,19 @@ installdirs: installdirs-exec installdir
 installdirs-exec:
 installdirs-data: [EMAIL PROTECTED]@
if test $(PACKAGE) = gettext-tools; then \
- $(mkinstalldirs) $(DESTDIR)$(gettextsrcdir); \
+ $(mkdir_p) $(DESTDIR)$(gettextsrcdir); \
else \
  : ; \
fi
 installdirs-data-no:
 installdirs-data-yes:
-   $(mkinstalldirs) $(DESTDIR)$(datadir)
+   $(mkdir_p) $(DESTDIR)$(datadir)
@catalogs='$(CATALOGS)'; \
for cat in $$catalogs; do \
  cat=`basename $$cat`; \
  lang=`echo $$cat | sed -e 's/\.gmo$$//'`; \
  dir=$(localedir)/$$lang/LC_MESSAGES; \
- $(mkinstalldirs) $(DESTDIR)$$dir; \
+ $(mkdir_p) $(DESTDIR)$$dir; \
  for lc in '' $(EXTRA_LOCALE_CATEGORIES); do \
if test -n $$lc; then \
  if (cd $(DESTDIR)$(localedir)/$$lang  LC_ALL=C ls -l -d $$lc 
2/dev/null) | grep ' - ' /dev/null; then \
Index: e2fsprogs-git/misc/Makefile.in
===
--- e2fsprogs-git.orig/misc/Makefile.in
+++ e2fsprogs-git/misc/Makefile.in
@@ -234,8 +234,8 @@ filefrag.8: $(DEP_SUBSTITUTE) $(srcdir)/
@$(SUBSTITUTE_UPTIME) $(srcdir)/filefrag.8.in filefrag.8
 
 installdirs:
-   @echo  MKINSTALLDIRS $(sbindir) $(root_sbindir) $(bindir) $(man1dir) 
$(man8dir) $(libdir) $(root_sysconfdir)
-   @$(MKINSTALLDIRS) $(DESTDIR)$(sbindir) \
+   @echo  MKDIR_P $(sbindir) $(root_sbindir) $(bindir) $(man1dir) 
$(man8dir) $(libdir) $(root_sysconfdir)
+   @$(MKDIR_P) $(DESTDIR)$(sbindir) \
$(DESTDIR)$(root_sbindir) $(DESTDIR)$(bindir) \
$(DESTDIR)$(man1dir) $(DESTDIR)$(man8dir) \
$(DESTDIR)$(man1dir) $(DESTDIR)$(man5dir) \
Index: e2fsprogs-git/e2fsck

Re: [PATCH] allow tune2fs to set/clear resize_inode

2007-11-05 Thread Eric Sandeen
Andreas Dilger wrote:
 On Nov 05, 2007  14:45 -0600, Eric Sandeen wrote:
 (this patch has been carried in Red Hat / Fedora rpms for a while, not 
 sure why it never got sent upstream... clearing this allows for mounting 
 filesystems with a resize_inode on older systems)

 Allow tune2fs to set  clear resize_inode; requires fsck afterwards.
 
 Two things that are a bit confusing:
 - since RESIZE_INODE is a COMPAT feature, I don't see how this affects
   mounting the filesystem on older systems?  Is there a bug somewhere?

Hmm there is a RHEL3 bug, long since closed with a patch like this one
(actually less fleshed out)... but you're right, this should work.

(quick test)

...and it does.  Grrr, sorry, assumed the patch was there for a reason!

I'll retract this patch, I don't think it's needed.  Sorry for the noise.

-Eric


-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: delalloc fragmenting files?

2007-11-02 Thread Eric Sandeen
Alex Tomas wrote:
 Eric,
 
 would you mind to repeat the run and then grab /proc/fs/ext4/dev/mb_history?
 
 thanks in advance, Alex

Sure thing, attached; this is from a 1024x1M run, wound up with 32
fragments, out of order:

First block: 122880
Last block: 491519
Discontinuity: Block 7424 is at 114688 (was 130303)
Discontinuity: Block 15616 is at 106496 (was 122879)
Discontinuity: Block 23552 is at 102400 (was 114431)
Discontinuity: Block 26624 is at 155648 (was 105471)
Discontinuity: Block 33792 is at 147456 (was 162815)
Discontinuity: Block 41472 is at 162816 (was 155135)
Discontinuity: Block 42496 is at 188416 (was 163839)
Discontinuity: Block 50176 is at 180224 (was 196095)
Discontinuity: Block 58368 is at 172032 (was 188415)
Discontinuity: Block 66560 is at 167936 (was 180223)


pid 3695 is pdflush, I assume 7634 was the dd itself.

-Eric


mb_history.bz2
Description: application/bzip


  1   2   3   >