Re: [RFC][take 2] e2fsprogs: Add ext4migrate
Andreas Dilger wrote:
> On Apr 03, 2007 15:37 +0530, Aneesh Kumar K.V wrote:
>> The extent insert code is derived from the latest ext4 kernel source. I have tried to keep the code as close as possible to the kernel sources, so that any fixes to the tree-building code in the kernel can be applied easily to ext4migrate. The ext3_ext naming convention, instead of the ext4_ext found in the kernel, keeps us in sync with the rest of the e2fsprogs source.
>
> Of course, the other way to do this would be to temporarily mount the filesystem as ext4, copy the non-extent files via cp (lsattr can be used to check for the extent flag), and then rename each new file over the old one. Care must be taken not to mount the filesystem on a visible mountpoint, so that users cannot change the filesystem while the copy is being done.
>
> This can also be done to convert an ext4 filesystem back to ext3, if the ext4 filesystem is mounted with noextents (to disable creation of new files with extent mapping).
>
> The only minor issue is that the inode numbers of the files will change.

One also needs to make sure that hard links are not broken by such a copy. Also, when using copy we are touching the data blocks, and for large files the copy operations could take quite a lot of time. With the patches I sent we are not touching or relocating the data blocks; we are only converting the metadata. This results in a faster migration.

>> The inode modification is done only at the last stage. This is to make sure that if we fail at any intermediate stage, we exit without touching the disk.
>>
>> The inode update is done as below:
>> a) Walk the extent index blocks and write them to the disk. If that fails, exit.
>> b) Write the inode. If that fails, exit.
>> c) Write the updated block bitmap. If that fails, exit. (This could be a problem because we have already updated the inode i_block field to point to the new blocks.) But such an inconsistency between the inode i_block and the block bitmap can be fixed by fsck, IIUC.
> Why not mark all the relevant blocks in use (for both extent- and block-mapped copies) until the copy is done, then write everything out, and only mark the block-mapped file blocks free after the inode is written to disk? This avoids the danger that the new extent-mapped file's blocks are marked free and get double-allocated (corrupting the file data, and possibly the whole filesystem).

Will do this.

> I don't think there is a guarantee that an impatient user will run a lengthy e2fsck after interrupting the migrate.
>
> Also, you should mark the filesystem unclean at the first change unless everything completes successfully. That way e2fsck will at least run automatically on the next boot.

Will do this.

> Other general notes:
> - wrap lines at 80 columns
> - would be good to have a -R mode that walked the whole filesystem, since startup time is very long for large filesystems
> - also allow specifying multiple files on the command-line
> - changing the operation to be multi-file allows avoiding sync of bitmaps two times (once after extents are allocated and inode written, once after indirect blocks are freed). There only needs to be one sync per file.

Will do this in the next patch.

-aneesh

-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
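The ordering Andreas suggests above can be sketched with a toy in-memory model. This is only an illustration and not ext4migrate code: the `Disk` class, the `migrate` and `inode_is_safe` functions, and the `fail_at` parameter are hypothetical names, a Python set stands in for the block bitmap, and the old and new block sets are assumed to be disjoint.

```python
class Disk:
    """Toy stand-in for on-disk state: a block bitmap and one inode."""

    def __init__(self, old_blocks):
        self.in_use = set(old_blocks)        # block bitmap (blocks marked in use)
        self.inode_blocks = set(old_blocks)  # metadata blocks the inode references
        self.clean = True                    # "filesystem is clean" flag


def migrate(disk, new_blocks, fail_at=None):
    """Swap the inode's indirect blocks for extent blocks.

    fail_at names a step at which to simulate an interruption
    (hypothetical test hook). Returns True on success, False if interrupted.
    """
    old_blocks = set(disk.inode_blocks)
    steps = [
        # Mark unclean first so e2fsck runs on the next boot if we die midway.
        ("mark_unclean", lambda: setattr(disk, "clean", False)),
        # Keep BOTH old and new blocks allocated while the switch is in flight.
        ("reserve_new", lambda: disk.in_use.update(new_blocks)),
        # Point the inode at the new extent blocks.
        ("write_inode", lambda: setattr(disk, "inode_blocks", set(new_blocks))),
        # Only now is it safe to release the old block-mapped metadata.
        ("free_old", lambda: disk.in_use.difference_update(old_blocks)),
        ("mark_clean", lambda: setattr(disk, "clean", True)),
    ]
    for name, action in steps:
        if name == fail_at:
            return False  # simulated interruption before this step runs
        action()
    return True


def inode_is_safe(disk):
    """True if every block the inode references is still marked in use."""
    return disk.inode_blocks <= disk.in_use
```

In this model an interruption at any step leaves every block the inode references marked in use. The worst case is blocks that stay allocated but unreferenced, which is the kind of leak e2fsck can reclaim, rather than blocks that are referenced but free.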
(un)lock_kernel() ?
Why does ext4_fill_super release the BKL on entry and take it on both normal and abnormal exit? As far as I can see, ext4_fill_super is called by get_sb_bdev, which calls the ->get_sb method without the BKL, and ext4_get_sb calls get_sb_bdev without the BKL. And the ext2 code does not touch the BKL in ext2_fill_super.

Is the VFS code going to be changed somewhere in the future and it's being anticipated, or is this a bug?
Re: (un)lock_kernel() ?
On Wed, 2007-04-04 at 08:57 -0400, John Anthony Kazos Jr. wrote:
> Why does ext4_fill_super release the BKL on entry and take it on both normal and abnormal exit? As far as I can see, ext4_fill_super is called by get_sb_bdev, which calls the ->get_sb method without the BKL, and ext4_get_sb calls get_sb_bdev without the BKL. And the ext2 code does not touch the BKL in ext2_fill_super.
>
> Is the VFS code going to be changed somewhere in the future and it's being anticipated, or is this a bug?

According to Documentation/filesystems/Locking, ->get_sb() is called with the BKL held, but looking through the code, I'm not able to find where it is being taken.

Shaggy
--
David Kleikamp
IBM Linux Technology Center
Re: (un)lock_kernel() ?
> According to Documentation/filesystems/Locking, ->get_sb() is called with the BKL held, but looking through the code, I'm not able to find where it is being taken.

I noticed that too. Unless I'm just dumb and can't see it, I'm not able to find any BKL references during filesystem mounting until you get into FS-specific code. I looked through everything from sys_mount through to vfs_kern_mount.

Documentation/filesystems/porting talks about several situations where the VFS code was modified to not take the BKL, and BKL calls were added by FS non-maintainers for safety until each FS could be audited independently, but that wouldn't be the case here, would it?

The ext2 code takes the BKL in three places: ext2_update_inode, write_super, and ext2_compat_ioctl. Starting with ext3, it's in ext3_compat_ioctl and ext3_fill_super, and the same with ext4.

I suppose the BKL does have to be held, somehow, somewhere, during mounting, or anybody using ext3 on a multiprocessor box would lock their system from unmatched locking calls. Unless the first unlock_kernel() would make the count -1 and the lock would bring it back to zero?
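The question about the count can be explored with a toy model of a recursive, per-task lock counter. This sketch assumes the scheme I recall from the kernel's lib/kernel_lock.c of that era, where a task's lock_depth is -1 when it does not hold the BKL and an unlock at that depth is treated as a bug; that is an assumption brought in from outside this thread, not something the messages above confirm. The `Task` class and the `fill_super` function here are hypothetical stand-ins.

```python
class Task:
    """Toy model of a task's BKL nesting counter (-1 means not held)."""

    def __init__(self):
        self.lock_depth = -1
        self.holds_bkl = False

    def lock_kernel(self):
        depth = self.lock_depth + 1
        if depth == 0:
            self.holds_bkl = True   # outermost acquire takes the real lock
        self.lock_depth = depth

    def unlock_kernel(self):
        if self.lock_depth < 0:
            # Assumption: models a sanity check on an unmatched unlock.
            raise RuntimeError("unlock_kernel() without the BKL held")
        self.lock_depth -= 1
        if self.lock_depth < 0:
            self.holds_bkl = False  # outermost release drops the real lock


def fill_super(task):
    """Mimics the pattern in question: drop the BKL on entry, retake on exit."""
    task.unlock_kernel()
    # ... superblock setup would run here without the BKL ...
    task.lock_kernel()
```

Under this model the unlock/lock pair in fill_super() is balanced only if the caller already holds the BKL. Called without it, the first unlock is rejected rather than being silently balanced by the later lock.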
Re: Ext4 benchmarks
Hi,

here are the first results of the round:
http://www.bullopensource.org/ext4/20070404/

FFSB tests: http://www.bullopensource.org/ext4/20070404/ffsb-write.html
Iozone: http://www.bullopensource.org/ext4/20070404/iozone.html
Kernbuild: http://www.bullopensource.org/ext4/20070404/kernbuild.html

regards,
Jean Noel
Re: Ext4 benchmarks
On Apr 04, 2007 19:06 +0200, Cordenner jean noel wrote:
> here are the first results of the round:
> http://www.bullopensource.org/ext4/20070404/

Jean Noel, thank you for the test results. It is always nice to see that ext4 is doing so well compared to ext3 and XFS.

Ming Ming, it should be possible to just include the mballoc+delalloc patches that Jean Noel used into the upstream ext4 patch series. When Alex or Christoph get a chance to do the VFS delalloc rewrite we can move to that new patch, but until then it seems pointless not to include this functionality, which improves the performance so much.

Also, if we include those patches, the mballoc and delalloc features (along with extents) should be enabled by default if INCOMPAT_EXTENTS is in the superblock unless:
- noextents, nomballoc, or nodelalloc mount options are given
- delalloc needs to be disabled if blocksize != PAGE_SIZE

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
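The proposed defaults can be written as a small decision function. This is a sketch of the rule as stated in the message above, not kernel code: the function name, the set-of-strings representation of mount options, and the 4096-byte default for PAGE_SIZE are assumptions for illustration. The message only covers the case where INCOMPAT_EXTENTS is set, so enabling nothing otherwise is also an assumption.

```python
def default_features(has_incompat_extents, mount_opts, blocksize, page_size=4096):
    """Return the set of features enabled under the proposed defaults.

    mount_opts is a set of option strings such as {"nodelalloc"}.
    """
    enabled = set()
    if not has_incompat_extents:
        # Assumption: without INCOMPAT_EXTENTS nothing is enabled by default.
        return enabled

    # Each feature is on unless its matching "no<feature>" option is given.
    for feature in ("extents", "mballoc", "delalloc"):
        if "no" + feature not in mount_opts:
            enabled.add(feature)

    # delalloc must be disabled when the block size differs from the page size.
    if blocksize != page_size:
        enabled.discard("delalloc")
    return enabled
```

For example, `default_features(True, {"nomballoc"}, 4096)` leaves extents and delalloc enabled, while a 1024-byte block size on a 4096-byte page drops delalloc regardless of the mount options.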
Re: Ext4 benchmarks
On Wed, 2007-04-04 at 13:21 -0600, Andreas Dilger wrote:
> On Apr 04, 2007 19:06 +0200, Cordenner jean noel wrote:
>> here are the first results of the round:
>> http://www.bullopensource.org/ext4/20070404/
>
> Jean Noel, thank you for the test results. It is always nice to see that ext4 is doing so well compared to ext3 and XFS.
>
> Ming Ming, it should be possible to just include the mballoc+delalloc patches that Jean Noel used into the upstream ext4 patch series. When Alex or Christoph get a chance to do the VFS delalloc rewrite we can move to that new patch, but until then it seems pointless not to include this functionality, which improves the performance so much.

According to the Bull website, the test is based on the 2.6.21-rc4 kernel + the delalloc patch. I don't think that includes the mballoc patch.

> Also, if we include those patches, the mballoc and delalloc features (along with extents) should be enabled by default if INCOMPAT_EXTENTS is in the superblock unless:
> - noextents, nomballoc, or nodelalloc mount options are given

I just added the noextents and nodelalloc mount options in the 2.6.21-rc5 version of the ext4 patch queue. But we should keep delalloc with nomballoc. The current delalloc patch in the ext4 tree plays well without mballoc. We could still do multiple block allocations with delayed allocation, though not as smartly as Alex's mballoc.

> - delalloc needs to be disabled if blocksize != PAGE_SIZE

I believe the current ext4 delalloc code already turns off delalloc in this case.

> Cheers, Andreas
> --
> Andreas Dilger
> Principal Software Engineer
> Cluster File Systems, Inc.
ext4 patch queue update
Ted,

I just rebased the ext4 patch queue to 2.6.21-rc5,

# Added Reserve high 32 bit for 64 bit inode version
i_version_hi.patch
# Add mount option to turn off extents
ext4_noextent_mount_opt.patch
# Add mount option to turn off delayed allocation
ext4_nodelalloc_mount_opt.patch

and moved the nanosecond patch before the delayed allocation patches. The fsx test passed on ppc64, x86 and x86_64.

http://repo.or.cz/w/ext4-patch-queue.git

# Rebased the patches to 2.6.21-rc4
# New patch to fix whitespace before applying new patches
whitespace.patch
# Replaced truncated beginning comments
extent-overlap-bugfix
persistent_allocation_1_ioctl_and_unitialized_extents
# Fixed an endian error
persistent_allocation_2_support_for_writing_to_unitialized_extent
# updated to latest version
nanosecond_timestamps.patch
# i_verion
# Reserve high 32 bit for 64 bit inode version
# Missing the full inode version patch
i_version_hi.patch
# Add mount option to turn off extents
ext4_noextent_mount_opt.patch
##
# Unstable patches
# Note: still lots of outstanding comments from linux-ext4 list, 12/2006
# Missing signed-off-by:
booked-page-flag.patch
# Missing signed-off-by:
ext4-block-reservation.patch
# fixed a bunch of endianness errors reported by sparse
# Needs a signed-off-by from Alex, then can add shaggy's
ext4-delayed-allocation.patch
ext4-delalloc-extents-48bit.patch
# Add mount option to turn off delayed allocation
ext4_nodelalloc_mount_opt.patch