Re: [RFC][take 2] e2fsprogs: Add ext4migrate

2007-04-04 Thread Aneesh Kumar K.V



Andreas Dilger wrote:
> On Apr 03, 2007  15:37 +0530, Aneesh Kumar K.V wrote:
>> The extent insert code is derived from the latest ext4 kernel
>> source. I have tried to keep the code as close as possible to the
>> kernel sources, so that any fixes to the tree-building code in the
>> kernel can be easily applied to ext4migrate.  The ext3_ext naming
>> convention, instead of the ext4_ext found in the kernel, keeps us
>> in sync with the rest of the e2fsprogs source.


> Of course, the other way to do this would be to temporarily mount the
> filesystem as ext4, copy non-extent files via cp (lsattr can be used to
> check for the extent flag), and then rename the new file over the old one.
> Care must be taken not to mount the filesystem on a visible mountpoint,
> so that users cannot change the filesystem while the copy is being done.
>
> This can also be done to convert an ext4 filesystem back to ext3, if
> the ext4 filesystem is mounted with noextents (to disable creation
> of new files with extent mapping).
>
> The only minor issue is that the inode numbers of the files will change.



One also needs to make sure that hard links are not broken by such a
copy. Also, when using copy we are touching the data blocks, and for
large files the copy operations could take quite a lot of time. With the
patches I sent we are not touching or relocating the data blocks; we are
only converting the metadata. This results in a faster migration.
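
To illustrate the hard-link concern: a copy-based conversion would first
have to find every set of paths that share an inode, copy the data once,
and then recreate the links. A minimal sketch in Python of the discovery
step, assuming the filesystem is mounted at some private mountpoint.
find_hard_link_groups is a hypothetical helper name used only for
illustration; it is not part of e2fsprogs or of the ext4migrate patch.

```python
import os
import stat
from collections import defaultdict


def find_hard_link_groups(root):
    """Return {(st_dev, st_ino): [paths]} for regular files under root
    that are reachable through more than one path (hard links)."""
    groups = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)  # lstat: do not follow symlinks
            # Only regular files with a link count above one matter here.
            if stat.S_ISREG(st.st_mode) and st.st_nlink > 1:
                groups[(st.st_dev, st.st_ino)].append(path)
    # Keep only inodes that were actually seen under several names.
    return {key: sorted(paths)
            for key, paths in groups.items() if len(paths) > 1}
```

Each returned group would need to be copied once and re-linked under all
of its names, which is extra bookkeeping that the in-place metadata
conversion does not need.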





>> The inode modification is done only at the last stage. This is to make
>> sure that if we fail at any intermediate stage, we exit without touching
>> the disk.
>>
>> The inode update is done as below:
>> a) Walk the extent index blocks and write them to the disk. If that
>>    fails, exit.
>> b) Write the inode. If that fails, exit.
>> c) Write the updated block bitmap. If that fails, exit. (This could be a
>>    problem because we have already updated the inode i_block field to
>>    point to the new blocks.) But such an inconsistency between the inode
>>    i_block and the block bitmap can be fixed by fsck, IIUC.


> Why not mark all the relevant blocks in use (for both extent- and
> block-mapped copies) until the copy is done, then write everything out,
> and only mark the block-mapped file blocks free after the inode is written
> to disk?  This avoids the danger that the new extent-mapped file's blocks
> are marked free and get double-allocated (corrupting the file data, and
> possibly the whole filesystem).
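
The ordering suggested above can be sketched as a toy model in Python.
This is only an illustration under stated assumptions: the "disk" is a
dict, the bitmap is a set of block numbers, and the names migrate,
inode_blocks_allocated and fail_before are hypothetical and do not come
from the patch. Data blocks are left out because the migration does not
move them; only the mapping (indirect or extent index) blocks change.

```python
STEPS = ("mark_new_in_use", "write_inode", "free_old")


def migrate(disk, new_meta, fail_before=None):
    """Run the three steps in order, stopping before fail_before.

    disk["bitmap"]     -- set of block numbers marked in use
    disk["inode_meta"] -- mapping blocks the on-disk inode points to
    Returns True if every step ran, False if interrupted.
    """
    old_meta = set(disk["inode_meta"])
    new_meta = set(new_meta)
    for step in STEPS:
        if step == fail_before:
            return False  # simulated crash or I/O error before this step
        if step == "mark_new_in_use":
            # Old and new mapping blocks are both in use from here on.
            disk["bitmap"] |= new_meta
        elif step == "write_inode":
            disk["inode_meta"] = set(new_meta)
        else:  # "free_old": only after the inode is safely written
            disk["bitmap"] -= old_meta - new_meta
    return True


def inode_blocks_allocated(disk):
    """True if every block the inode points to is marked in use."""
    return disk["inode_meta"] <= disk["bitmap"]
```

Whichever step the model is interrupted before, the blocks referenced by
the inode stay marked in use, so the worst case is leaked blocks rather
than double allocation.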



Will do this.




> I don't think there is a guarantee that an impatient user will run a lengthy
> e2fsck after interrupting the migrate.  Also, you should mark the filesystem
> unclean at the first change, and leave it so unless everything completes
> successfully.  That way e2fsck will at least run automatically on the next
> boot.





Will do this



> Other general notes:
> - wrap lines at 80 columns
> - it would be good to have a -R mode that walks the whole filesystem,
>   since startup time is very long for large filesystems
> - also allow specifying multiple files on the command line
> - changing the operation to be multi-file allows avoiding syncing the
>   bitmaps twice (once after extents are allocated and the inode written,
>   once after indirect blocks are freed).  There only needs to be one sync
>   per file.




Will do this in the next patch




-aneesh

-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


(un)lock_kernel() ?

2007-04-04 Thread John Anthony Kazos Jr.
Why does ext4_fill_super release the BKL on entry and take it on both 
normal and abnormal exit? As far as I can see, ext4_fill_super is called 
by get_sb_bdev, which calls the ->get_sb method without the BKL, and 
ext4_get_sb calls get_sb_bdev without the BKL. And the ext2 code does not 
touch the BKL in ext2_fill_super.

Is the VFS code going to be changed somewhere in the future and it's being 
anticipated, or is this a bug?


Re: (un)lock_kernel() ?

2007-04-04 Thread Dave Kleikamp
On Wed, 2007-04-04 at 08:57 -0400, John Anthony Kazos Jr. wrote:
> Why does ext4_fill_super release the BKL on entry and take it on both
> normal and abnormal exit? As far as I can see, ext4_fill_super is called
> by get_sb_bdev, which calls the ->get_sb method without the BKL, and
> ext4_get_sb calls get_sb_bdev without the BKL. And the ext2 code does not
> touch the BKL in ext2_fill_super.
>
> Is the VFS code going to be changed somewhere in the future and it's being
> anticipated, or is this a bug?

According to Documentation/filesystems/Locking, ->get_sb() is called
with the BKL held, but looking through the code, I'm not able to find
where it is being taken.

Shaggy
-- 
David Kleikamp
IBM Linux Technology Center



Re: (un)lock_kernel() ?

2007-04-04 Thread John Anthony Kazos Jr.
> According to Documentation/filesystems/Locking, ->get_sb() is called
> with the BKL held, but looking through the code, I'm not able to find
> where it is being taken.

I noticed that too. Unless I'm just dumb and can't see it, I'm not able to 
find any BKL references during filesystem mounting until you get into 
FS-specific code. I looked through everything from sys_mount through to 
vfs_kern_mount. Documentation/filesystems/porting talks about several 
situations where the VFS code was modified to not take the BKL, and BKL 
calls were added by FS non-maintainers for safety until each FS could be 
audited independently, but that wouldn't be the case, would it?

The ext2 code takes the BKL in three places: ext2_update_inode, 
write_super, and ext2_compat_ioctl. Starting with ext3, it's in 
ext3_compat_ioctl and ext3_fill_super, and the same with ext4.

I suppose the BKL does have to be held, somehow, somewhere, during 
mounting, or anybody using ext3 on a multiprocessor box would lock their 
system from unmatched locking calls. Unless the first unlock_kernel() 
would make the count -1 and the lock would bring it back to zero?
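
A toy model may make the counting question concrete. It assumes a
recursive lock tracked by a per-task depth counter that starts at -1
("not held") and treats an unlock without a matching lock as an error
rather than letting the count go further negative. That matches a common
reading of the 2.6-era lock_kernel()/unlock_kernel() pair, but it is an
assumption here and should be checked against the kernel source. ToyBKL
and fill_super_pattern are hypothetical names, not kernel APIs.

```python
class ToyBKL:
    """Recursive lock model with a depth counter; -1 means not held."""

    def __init__(self):
        self.depth = -1
        self.held = False

    def lock(self):
        self.depth += 1
        if self.depth == 0:
            self.held = True  # outermost acquisition takes the lock

    def unlock(self):
        if self.depth < 0:
            # Assumption: an unmatched unlock is a bug, not a -2 count.
            raise RuntimeError("unlock without matching lock")
        self.depth -= 1
        if self.depth < 0:
            self.held = False  # outermost release drops the lock


def fill_super_pattern(bkl):
    """Release on entry, retake on exit, as described for ext3/ext4."""
    bkl.unlock()
    # ... the actual superblock setup would run here without the lock ...
    bkl.lock()
```

In this model the pattern is balanced only if the caller already holds
the lock; called with the lock not held, the first unlock fails at once.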


Re: Ext4 benchmarks

2007-04-04 Thread Cordenner jean noel

Hi,

here are the first results of the round:
http://www.bullopensource.org/ext4/20070404/

FFSB tests:
http://www.bullopensource.org/ext4/20070404/ffsb-write.html
Iozone:
http://www.bullopensource.org/ext4/20070404/iozone.html
Kernbuild:
http://www.bullopensource.org/ext4/20070404/kernbuild.html

regards,
Jean Noel


Re: Ext4 benchmarks

2007-04-04 Thread Andreas Dilger
On Apr 04, 2007  19:06 +0200, Cordenner jean noel wrote:
> here are the first results of the round:
> http://www.bullopensource.org/ext4/20070404/

Jean Noel,
thank you for the test results.  It is always nice to see that ext4 is
doing so well compared to ext3 and XFS.

Ming Ming,
it should be possible to just include the mballoc+delalloc patches that
Jean Noel used into the upstream ext4 patch series.  When Alex or Christoph
get a chance to do the VFS delalloc rewrite we can move to that new patch,
but until then it seems pointless to not include this functionality which
improves the performance so much.

Also, if we include those patches the mballoc and delalloc features (along
with extents) should be enabled by default if INCOMPAT_EXTENTS is in the
superblock unless:
- noextents, nomballoc, or nodelalloc mount options are given
- delalloc needs to be disabled if blocksize != PAGE_SIZE
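
The default-enable rule above can be sketched as follows. This is a
sketch under assumptions: it reads the rule as "each no* option disables
only its own feature", and effective_features, its parameters, and the
4096-byte default page size are hypothetical names and values chosen for
illustration, not code from the ext4 patch queue.

```python
def effective_features(has_incompat_extents, mount_opts, blocksize,
                       page_size=4096):
    """Return which of extents/mballoc/delalloc end up enabled.

    has_incompat_extents -- INCOMPAT_EXTENTS is set in the superblock
    mount_opts           -- iterable of mount option strings
    blocksize, page_size -- in bytes; delalloc needs them to be equal
    """
    opts = set(mount_opts)
    base = bool(has_incompat_extents)
    return {
        "extents": base and "noextents" not in opts,
        "mballoc": base and "nomballoc" not in opts,
        # Assumption: nomballoc does not turn delalloc off.
        "delalloc": (base and "nodelalloc" not in opts
                     and blocksize == page_size),
    }
```

For example, effective_features(True, ["nomballoc"], 4096) leaves
extents and delalloc enabled and turns only mballoc off.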

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.



Re: Ext4 benchmarks

2007-04-04 Thread Mingming Cao
On Wed, 2007-04-04 at 13:21 -0600, Andreas Dilger wrote:
> On Apr 04, 2007  19:06 +0200, Cordenner jean noel wrote:
>> here are the first results of the round:
>> http://www.bullopensource.org/ext4/20070404/
>
> Jean Noel,
> thank you for the test results.  It is always nice to see that ext4 is
> doing so well compared to ext3 and XFS.
>
> Ming Ming,
> it should be possible to just include the mballoc+delalloc patches that
> Jean Noel used into the upstream ext4 patch series.  When Alex or Christoph
> get a chance to do the VFS delalloc rewrite we can move to that new patch,
> but until then it seems pointless to not include this functionality which
> improves the performance so much.
>
According to the Bull website, the test is based on the 2.6.21-rc4 kernel +
delalloc patch. I don't think that includes the mballoc patch.

> Also, if we include those patches the mballoc and delalloc features (along
> with extents) should be enabled by default if INCOMPAT_EXTENTS is in the
> superblock unless:
> - noextents, nomballoc, or nodelalloc mount options are given

I just added noextents and nodelalloc mount options to the 2.6.21-rc5
version of the ext4 patch queue.

But we should keep delalloc with nomballoc.  The current delalloc patch
in the ext4 tree works well without mballoc.  We can still do multiple
block allocations with delayed allocation, though not as smartly as Alex's
mballoc.

> - delalloc needs to be disabled if blocksize != PAGE_SIZE
>
I believe the current ext4 delalloc code turns off delalloc in this case
already.




ext4 patch queue update

2007-04-04 Thread Mingming Cao
Ted,

I just rebased the ext4 patch queue to 2.6.21-rc5:

# Added Reserve high 32 bit for 64 bit inode version
i_version_hi.patch

# Add mount option to turn off extents
ext4_noextent_mount_opt.patch

# Add mount option to turn off delayed allocation
ext4_nodelalloc_mount_opt.patch

And moved the nanosecond patch before the delayed allocation patches.

fsx test passed on ppc64, x86 and x86_64.
http://repo.or.cz/w/ext4-patch-queue.git


# Rebased the patches to 2.6.21-rc4

# New patch to fix whitespace before applying new patches
whitespace.patch

# Replaced truncated beginning comments
extent-overlap-bugfix

persistent_allocation_1_ioctl_and_unitialized_extents

# Fixed an endian error
persistent_allocation_2_support_for_writing_to_unitialized_extent

# updated to latest version
nanosecond_timestamps.patch

# i_verion
# Reserve high 32 bit for 64 bit inode version
# Missing the full inode version patch
i_version_hi.patch


# Add mount option to turn off extents
ext4_noextent_mount_opt.patch

##
# Unstable patches
# Note: still lots of outstanding comments from linux-ext4 list, 12/2006
# Missing signed-off-by:
booked-page-flag.patch

# Missing signed-off-by:
ext4-block-reservation.patch

# fixed a bunch of endianness errors reported by sparse
# Needs a signed-off-by from Alex, then can add shaggy's
ext4-delayed-allocation.patch

ext4-delalloc-extents-48bit.patch

# Add mount option to turn off delayed allocation
ext4_nodelalloc_mount_opt.patch

