SOLVED: Re: files mapped funny? (related to online defragmentation)
> > I'm getting strange results when I map out the blocks used in files
> > larger than a several thousand KB. I never seem to get any more than
> > 1024 contiguous data blocks in a row.
> >
> filefrag command comes with e2fsprogs will print the file fragmentation
> info. I guess you can try filefrag -v command and see if that matches
> what your scripts reported.

Thanks for the pointer. It seems my program only counts data blocks, so it sees the indirect block for each 1024 data blocks as a space between extents. I will have to read the filefrag source and account for that.

Cheers,
Eric
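The effect described above can be reproduced on the chunk list from the original report. The sketch below is only an illustration, not the filefrag logic: it assumes 4096-byte blocks and 4-byte block pointers (which would give the 1024 data blocks per indirect block seen in the thread), and it uses a simple heuristic that treats any one-block gap between runs as an interleaved indirect block. The function names are made up for this example.

```python
def pointers_per_indirect_block(block_size, pointer_size=4):
    """Number of data-block pointers that fit in one indirect block."""
    return block_size // pointer_size


def contiguous_runs(blocks):
    """Group a non-empty list of physical block numbers into (first, last) runs."""
    runs = []
    start = prev = blocks[0]
    for b in blocks[1:]:
        if b != prev + 1:
            runs.append((start, prev))
            start = b
        prev = b
    runs.append((start, prev))
    return runs


def merge_single_block_gaps(runs):
    """Merge runs separated by exactly one block.

    Heuristic (an assumption of this sketch): the skipped block is taken
    to be an indirect block sitting between two stretches of data blocks.
    """
    merged = [runs[0]]
    for first, last in runs[1:]:
        prev_first, prev_last = merged[-1]
        if first == prev_last + 2:
            merged[-1] = (prev_first, last)
        else:
            merged.append((first, last))
    return merged


# Chunks 67-71 from the report, as (first, last) physical block numbers.
reported = [
    (2385568, 2385591),
    (2385608, 2386448),
    (2386450, 2387473),
    (2387475, 2388498),
    (2388500, 2389523),
]
```

With the reported chunks, `merge_single_block_gaps(reported)` folds chunks 68 through 71 into a single run, because each is separated from the next by exactly one block.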
[RFC][PATCH] Multiple mount protection
Hi,

There have been reported instances of a filesystem having been mounted in two places at the same time, causing a lot of damage to the filesystem. This patch reserves superblock fields and an INCOMPAT flag for adding multiple mount protection (MMP) support within the ext4 filesystem itself. The superblock will have a block number (s_mmp_block) pointing to an MMP structure, which holds a sequence number that a mounted filesystem updates every 5 seconds.

Whenever a filesystem is mounted, it will wait for s_mmp_interval seconds to make sure that the MMP sequence number does not change. To further make sure, we write a random sequence number into the MMP block and wait for another s_mmp_interval seconds. If the sequence number doesn't change, the mount will succeed. In case of failure, the nodename, bdevname and the time at which the MMP block was last updated will be displayed. tune2fs can be used to set s_mmp_interval as desired.

It will also protect against running e2fsck on a mounted filesystem by adding similar logic to ext2fs_open().

Any comments or views are welcome!
Signed-off-by: Andreas Dilger <[EMAIL PROTECTED]>
Signed-off-by: Kalpak Shah <[EMAIL PROTECTED]>

Index: e2fsprogs-1.40/lib/ext2fs/ext2_fs.h
===
--- e2fsprogs-1.40.orig/lib/ext2fs/ext2_fs.h
+++ e2fsprogs-1.40/lib/ext2fs/ext2_fs.h
@@ -568,8 +568,9 @@ struct ext2_super_block {
     __u16   s_want_extra_isize;     /* New inodes should reserve # bytes */
     __u32   s_flags;                /* Miscellaneous flags */
     __u16   s_raid_stride;          /* RAID stride */
-    __u16   s_pad;                  /* Padding */
-    __u32   s_reserved[166];        /* Padding to the end of the block */
+    __u16   s_mmp_interval;         /* Wait for # seconds in MMP checking */
+    __u64   s_mmp_block;            /* Block for multi-mount protection */
+    __u32   s_reserved[164];        /* Padding to the end of the block */
 };

 /*
@@ -631,10 +632,12 @@ struct ext2_super_block {
 #define EXT2_FEATURE_INCOMPAT_META_BG       0x0010
 #define EXT3_FEATURE_INCOMPAT_EXTENTS       0x0040
 #define EXT4_FEATURE_INCOMPAT_64BIT         0x0080
+#define EXT4_FEATURE_INCOMPAT_MMP           0x0100


 #define EXT2_FEATURE_COMPAT_SUPP    0
-#define EXT2_FEATURE_INCOMPAT_SUPP  (EXT2_FEATURE_INCOMPAT_FILETYPE)
+#define EXT2_FEATURE_INCOMPAT_SUPP  (EXT2_FEATURE_INCOMPAT_FILETYPE| \
+                                     EXT4_FEATURE_INCOMPAT_MMP)
 #define EXT2_FEATURE_RO_COMPAT_SUPP (EXT2_FEATURE_RO_COMPAT_SPARSE_SUPER| \
                                      EXT2_FEATURE_RO_COMPAT_LARGE_FILE| \
                                      EXT2_FEATURE_RO_COMPAT_BTREE_DIR)

Thanks,
Kalpak.
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
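The mount-time check described in this RFC can be sketched as a small simulation. This is not code from the patch: `MMPBlock` and `mmp_mount_check` are hypothetical names, the MMP block is an in-memory object rather than a disk block, and the sleep function is injected so that a second "node" can be simulated while the mounter waits.

```python
import random


class MMPBlock:
    """In-memory stand-in for the on-disk MMP block (hypothetical)."""

    def __init__(self, seq=0):
        self.seq = seq


def mmp_mount_check(mmp, interval, sleep, rng=random):
    """Return True if the mount may proceed, False if another mounter is active.

    Step 1: remember the sequence number, wait `interval` seconds, and
            fail if it changed (someone else is updating it).
    Step 2: write a random sequence number, wait again, and fail if it
            no longer matches what we wrote.
    """
    seen = mmp.seq
    sleep(interval)
    if mmp.seq != seen:
        return False

    ours = rng.getrandbits(32)
    mmp.seq = ours
    sleep(interval)
    return mmp.seq == ours
```

In a real implementation `sleep` would be `time.sleep` and `interval` would come from s_mmp_interval; here a callback that bumps `mmp.seq` plays the role of a filesystem already mounted elsewhere.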
Re: files mapped funny? (related to online defragmentation)
On Mon, 2007-05-21 at 06:49 -0700, Eric wrote:
> Hi,
>
> I'm getting strange results when I map out the blocks used in files
> larger than a several thousand KB. I never seem to get any more than
> 1024 contiguous data blocks in a row.
>
> Here's a portion of the output of my script when I run it on a 176MB
> file in my home directory:
> ...
> Contiguous chunk 67: 2385568 - 2385591 (24 blocks)
> Contiguous chunk 68: 2385608 - 2386448 (841 blocks)
> Contiguous chunk 69: 2386450 - 2387473 (1024 blocks)
> Contiguous chunk 70: 2387475 - 2388498 (1024 blocks)
> Contiguous chunk 71: 2388500 - 2389523 (1024 blocks)
> ...
>
> Maybe this is a bug in my script? Can anyone explain why this would
> happen?
>

The filefrag command that comes with e2fsprogs prints file fragmentation info. Try filefrag -v and see whether its output matches what your script reported.

Mingming

> I'm attaching my script in case other ext2/3/4 newbies can get any use
> out of it, and in case anyone needs to see it in order to answer my
> question. It's pretty self-explanatory, though.
>
> Cheers,
>
> Eric
>
Re: [PATCH 1/2] Add stack I/O manager.
On Mon, May 21, 2007 at 03:57:26PM +0530, Aneesh Kumar K.V wrote:
> Can't we address the FIXME by calling the respective close routine in
> the failure case.

You can, but it doesn't address the biggest problem, which is that if you're going to add all of this extra complexity, we might as well deal with the name parsing issue. In your stacking I/O layer, you pass the same name to all of the stacked modules. That's not necessarily the right thing to do. It works for test_io because it doesn't need a name parameter (it just passes the name straight down to its lower-layer module). It mostly works for the undo_io layer because, while the tdb pathname could be the passed-in open argument, you don't have to give it its tdb name until after ext2fs_openfs() succeeds, and you can pass it down as an option.

In general, though, you need to worry about what sub-parts of the name you need to pass down to each of the stacked modules. So the stack_io layer adds a lot of complexity, but it's not a fully general solution. So if it's not fully general, maybe something that's simpler is sufficient, since in truth we probably don't need unlimited levels of stacking.

> One thing i was confused about was the usage of read_blk, write_blk etc
> pointers in test_private_data. With respect to undo I/O manager do i
> need to provide them ?. If we really need them, then i was thinking a
> generic stacking layer as i send in the patches would be better. That
> means any pluggable functionality that we achieve right now by setting
> test_io_cb_read_blk etc will be implemented as a I/O manager that does
> the particular task. Later we stack all these I/O manager to get the
> full functionality.

The read_blk, write_blk, etc. pointers in test_private_data are specific to the test_io layer. That's part of the value-add which the test_io module provides, and no, the undo I/O manager doesn't need them.

It's just there as part of what the user of the test i/o manager might want to use, by allowing some arbitrary callback function to be called for each test i/o. The main place I've used it is when I effectively want to set a watchpoint on e2fsck, because I'm trying to figure out which part of e2fsck is reading or writing a particular block, and I want to dump out the contents of that block when it reads/writes it. The fastest way to do that is to add a callback function that tests for the block number, and when it is triggered, I can have the debugging function print out some or all of the contents of the block.

So that's strictly a test i/o manager thing; the undo i/o manager wouldn't need this at all!

Regards,

- Ted
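The watchpoint idea can be illustrated with a small wrapper layer. This is a sketch of the concept only, not the libext2fs test_io API: `DictBackend`, `CallbackIO` and `make_watchpoint` are hypothetical names, and blocks are byte strings held in a dictionary.

```python
class DictBackend:
    """Hypothetical lowest layer: blocks are stored in a dictionary."""

    def __init__(self, blocks=None):
        self.blocks = dict(blocks or {})

    def read_blk(self, blk):
        return self.blocks.get(blk, b"")

    def write_blk(self, blk, data):
        self.blocks[blk] = data


class CallbackIO:
    """Pass-through layer that invokes optional callbacks on every I/O."""

    def __init__(self, backing, on_read=None, on_write=None):
        self.backing = backing
        self.on_read = on_read
        self.on_write = on_write

    def read_blk(self, blk):
        data = self.backing.read_blk(blk)
        if self.on_read:
            self.on_read(blk, data)
        return data

    def write_blk(self, blk, data):
        if self.on_write:
            self.on_write(blk, data)
        self.backing.write_blk(blk, data)


def make_watchpoint(target_blk, log, dump_bytes=16):
    """Build a callback that records accesses to one block number only."""

    def callback(blk, data):
        if blk == target_blk:
            log.append((blk, data[:dump_bytes]))

    return callback
```

A debugging session would install the same watchpoint for reads and writes, run the tool under test, and then inspect `log` to see when block 9 (for example) was touched and what it contained.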
files mapped funny? (related to online defragmentation)
Hi,

I'm getting strange results when I map out the blocks used in files larger than a several thousand KB. I never seem to get any more than 1024 contiguous data blocks in a row.

Here's a portion of the output of my script when I run it on a 176MB file in my home directory:

...
Contiguous chunk 67: 2385568 - 2385591 (24 blocks)
Contiguous chunk 68: 2385608 - 2386448 (841 blocks)
Contiguous chunk 69: 2386450 - 2387473 (1024 blocks)
Contiguous chunk 70: 2387475 - 2388498 (1024 blocks)
Contiguous chunk 71: 2388500 - 2389523 (1024 blocks)
...

Maybe this is a bug in my script? Can anyone explain why this would happen?

I'm attaching my script in case other ext2/3/4 newbies can get any use out of it, and in case anyone needs to see it in order to answer my question. It's pretty self-explanatory, though.

Cheers,
Eric

#!/usr/bin/python
# Eric's first python program :)
import sys #for sys.argv
import fcntl #for fcntl.ioctl
import os #for os.access
import struct # for struct.pack and struct.unpack

print "WARNING: This script may not work for files with holes."

if len(sys.argv) != 2:
    print "Usage: " + sys.argv[0] + " "
    print "Note: This program uses the FIBMAP ioctl, so you must be root."
    sys.exit(1)

if os.access(sys.argv[1], os.F_OK) == False:
    print "File " + sys.argv[1] + " doesn't exist."
    sys.exit(1)

f = open(sys.argv[1],'r')
fsBlockSize = struct.unpack('i',fcntl.ioctl(f,2,''))[0]
numFileBlocks = os.stat(sys.argv[1])[6] / fsBlockSize
if (os.stat(sys.argv[1])[6] % fsBlockSize) != 0:
    numFileBlocks += 1

blockIterator = 0
blockmap = []
for blockIterator in range(numFileBlocks):
    h=struct.pack('i',blockIterator)
    blockmap += struct.unpack('i',fcntl.ioctl(f,1,h))

print "Filesystem block size: " + str(fsBlockSize)
print "Number of filesystem blocks in file: " + str(numFileBlocks)

extentBegin = []
extentEnd = []
extentBegin += [blockmap[0]]
for blockIterator in range(1,len(blockmap)):
    if blockmap[blockIterator]-blockmap[blockIterator-1] == 1:
        blockIterator += 1
        continue
    else:
        extentEnd += [blockmap[blockIterator-1]]
        extentBegin += [blockmap[blockIterator]]
        blockIterator += 1
extentEnd += [blockmap[blockIterator-1]]

for n in range(0,len(extentBegin)):
    print "Contiguous chunk " + str(n) + ": " + str(extentBegin[n]) + " - " + str(extentEnd[n]) + " (" + str(extentEnd[n] - extentBegin[n]+1) + " blocks)"

sys.exit(0)
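For readers who want to try the same mapping on a current interpreter, here is a minimal Python 3 sketch of the block-mapping step. It is not the script above, only a restatement of its core loop: ioctl request 2 is FIGETBSZ and request 1 is FIBMAP, as in the original, and the `ioctl` parameter is an addition of this sketch so the function can be exercised without root or a real file.

```python
import fcntl
import struct

FIBMAP = 1    # map a logical file block to a physical block (needs root)
FIGETBSZ = 2  # query the filesystem block size


def map_file_blocks(fd, file_size, ioctl=fcntl.ioctl):
    """Return (block_size, [physical block number for each logical block]).

    `ioctl` defaults to fcntl.ioctl; passing a substitute is a testing
    convenience introduced by this sketch.
    """
    reply = ioctl(fd, FIGETBSZ, struct.pack("i", 0))
    block_size = struct.unpack("i", reply)[0]

    # Round up so a partially filled last block is counted.
    num_blocks = (file_size + block_size - 1) // block_size

    physical = []
    for logical in range(num_blocks):
        reply = ioctl(fd, FIBMAP, struct.pack("i", logical))
        physical.append(struct.unpack("i", reply)[0])
    return block_size, physical


# Typical use (as root), assuming `path` names a regular file:
#     with open(path, "rb") as f:
#         size = os.fstat(f.fileno()).st_size
#         block_size, blocks = map_file_blocks(f.fileno(), size)
```

As with the original, files with holes are not handled specially; unmapped logical blocks would simply come back as physical block 0.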
Re: Online defragmentation and ext4migrate
On Mon, 2007-05-21 at 12:38 +0200, Jan Kara wrote:
> Yes. On the other hand I believe that some people would like to use
> defragmentation but stay with ext3. For them conversion to extents is
> no-go.
> [...]
> I've written a patch that defragments non-extent files but after
> discussion with XFS guys I've decided that the interfaces should be made
> more generic, so that XFS and other filesystems can use them too...

I see no reason why the ioctl to convert a file to extents and then defragment it should be different from the ioctl to defragment a non-extent file. After all, whether a file's blocks are tracked as lists of blocks or a set of extents is just bookkeeping, right? The set of data blocks that make up the file and their order is the same regardless of whether the extent flag is set in the inode.

If the user is running the ext2/3 driver or the ext4 driver with the noextents option, just defragment the file. If the user is running ext4 without the noextents option, convert to extents and then defragment.

The only problem that I can think of is that defragmenting metadata (including indirect block and/or whatever the equivalent is in extent-land) presumably has performance benefits too, so maybe a defragmenter in userspace would want to have some knowledge/control over this process.

Cheers,
Eric
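The "just bookkeeping" point can be made concrete with a toy model. The sketch below is not the ext4 on-disk format: it represents an extent as a plain (logical start, physical start, length) tuple, assumes a file without holes, and uses hypothetical function names. It shows that a per-block list and an extent list describe the same data blocks in the same order.

```python
def to_extents(block_list):
    """Compress a per-block list (index = logical block) into extents.

    Each extent is (logical_start, physical_start, length). Assumes the
    file has no holes, so logical blocks are consecutive.
    """
    extents = []
    for logical, physical in enumerate(block_list):
        if extents:
            lstart, pstart, length = extents[-1]
            if physical == pstart + length:
                # Physically adjacent to the previous block: grow the extent.
                extents[-1] = (lstart, pstart, length + 1)
                continue
        extents.append((logical, physical, 1))
    return extents


def to_block_list(extents):
    """Expand extents back into one physical block number per logical block."""
    blocks = []
    for _lstart, pstart, length in extents:
        blocks.extend(range(pstart, pstart + length))
    return blocks
```

Round-tripping a block list through `to_extents` and `to_block_list` returns the original list, which is the sense in which the choice of representation does not change which data blocks make up the file.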
Re: Online defragmentation and ext4migrate
Takashi Sato wrote:
> Hi Aneesh san,
>
> In my opinion, to keep the ioctl simple and small is very important
> for ease of maintenance. So I would rather not support indirect block
> files in the ioctl. Instead, I can add the call of the migration
> ioctl to my defrag tool in order to defragment indirect block files.
> How do you think of it?

That should be fine. So I will start moving the ext4migrate code into an ext4 ioctl and will later send you a patch to the defrag tool that will migrate and defrag.

-aneesh
Re: Online defragmentation and ext4migrate
> On 5/19/07, Eric <[EMAIL PROTECTED]> wrote:
> >On Fri, 2007-05-18 at 18:36 +0530, Aneesh Kumar K.V wrote:
> >> The reason why i am asking this is to understand the
> >> usefulness of doing a ext4migrate followed by defrag.
> >> [...]
> >> Also looking at the version 0.4 I see that defrag ioctl only work if we
> >> have EXT4_EXTENTS_FL flag set.
> >
> >ext4migrate is necessary because the current ext4 defrag routines will
> >only defragment files stored as extents. AFAIK, converting a file to
> >extents does not allow the defrag routine to defragment it "better" than
> >an indirect block map inode, but converting any file to extents has
> >performance benefits regardless of whether it is later defragmented.
> >
> >> What are the plans for making defrag work
> >> with indirect block map inode ?
> >
> >I think there is a second set of patches to defragment non-extent
> >files.
> >
>
> I was looking at this and didn't find the changes needed to defrag the
> non extent files.
>
> http://www.mail-archive.com/linux-ext4@vger.kernel.org/msg01522.html

I've written a patch that defragments non-extent files but after discussion with XFS guys I've decided that the interfaces should be made more generic, so that XFS and other filesystems can use them too...

Honza
--
Jan Kara <[EMAIL PROTECTED]>
SuSE CR Labs
Re: Online defragmentation and ext4migrate
Hello,

> >While doing online defragmentation do we move the blocks corresponding to
> >extent index ? The reason why i am asking this is to understand the
> >usefulness of doing a ext4migrate followed by defrag. I understand that
> >defragmentation in general will improve the performance. But with respect
> >to ext4migrate we are not touching the data blocks. Instead we build the
> >extent map and if that requires to have an extent index block then we
> >allocate one. I am trying to understand what would be the performance
> >impact of this and whether doing a defrag really improve the performance.
>
> I think converting a file to extents has the benefit for the performance of
> block searching. If we want to improve also the performance of reading
> file data, we have to run the defrag after that.

Yes. On the other hand I believe that some people would like to use defragmentation but stay with ext3. For them conversion to extents is no-go.

> >Also looking at the version 0.4 I see that defrag ioctl only work if we
> >have EXT4_EXTENTS_FL flag set. What are the plans for making defrag work
> >with indirect block map inode ?
>
> Unfortunately, my defrag doesn't support an indirect block file.
> But we can reduce fragments in the file with the defrag just after
> ext4migrate.
>
> In my opinion, to keep the ioctl simple and small is very important
> for ease of maintenance. So I would rather not support indirect block
> files in the ioctl. Instead, I can add the call of the migration
> ioctl to my defrag tool in order to defragment indirect block files.
> How do you think of it?

Yes that could be useful but I don't think it's a complete solution for people that don't want to migrate.

Honza
--
Jan Kara <[EMAIL PROTECTED]>
SuSE CR Labs
Re: ext2/3/4 online defrag
> On Thu, 2007-05-17 at 18:11 +0200, Jan Kara wrote:
> > But me (and several other people
> > independently as I've learnt recently) have written some tools which
> > should result in something useful. If you're interested, you can join
> > [EMAIL PROTECTED] - it's led by one guy who is doing
> > defrag and stuff as his google summer of code project.
>
> Is this different from the ext4/extent-based defrag patch that's been
> mentioned on this list?

Yes, it is different. In particular, it's an offline-only tool so far...

> > > *An implementation of an ext* filesystem driver can work with any
> > > ext2/3/4 filesystem as long as it supports the necessary revision
> > > (GOOD_OLD_REV or DYNAMIC_REV) and feature flags set in the filesystem.
> > Not sure what you mean here...
>
> The "ext2 filesystem"/"ext3 filesystem"/"ext4 filesystem" terminology
> was confusing to me when I first started reading about them. In my mind,
> it implied that those three filesystems were more different than they
> actually are.
>
> I think it would be more accurate to say that they are all essentially
> the same filesystem, and that any filesystem driver that can mount a
> given filesystem can mount any other ext2/3/4 filesystem of the same
> revision with the same feature flags set.
>
> I was asking for confirmation of this assumption, but I've since found a
> lot of really good documentation that has cleared up a lot of things.

Yes, basically it's just a question of the feature set. But, for example, the current online defrag from Takashi requires extents, which are not available for ext2 or ext3.

Honza
--
Jan Kara <[EMAIL PROTECTED]>
SuSE CR Labs
Re: Online defragmentation and ext4migrate
Hi Aneesh san,

> While doing online defragmentation do we move the blocks corresponding to
> extent index ? The reason why i am asking this is to understand the
> usefulness of doing a ext4migrate followed by defrag. I understand that
> defragmentation in general will improve the performance. But with respect
> to ext4migrate we are not touching the data blocks. Instead we build the
> extent map and if that requires to have an extent index block then we
> allocate one. I am trying to understand what would be the performance
> impact of this and whether doing a defrag really improve the performance.

I think converting a file to extents has the benefit for the performance of block searching. If we want to improve also the performance of reading file data, we have to run the defrag after that.

> Also looking at the version 0.4 I see that defrag ioctl only work if we
> have EXT4_EXTENTS_FL flag set. What are the plans for making defrag work
> with indirect block map inode ?

Unfortunately, my defrag doesn't support an indirect block file. But we can reduce fragments in the file with the defrag just after ext4migrate.

In my opinion, to keep the ioctl simple and small is very important for ease of maintenance. So I would rather not support indirect block files in the ioctl. Instead, I can add the call of the migration ioctl to my defrag tool in order to defragment indirect block files. How do you think of it?

Cheers, Takashi
Re: [PATCH 1/2] Add stack I/O manager.
Theodore Tso wrote:
> On Wed, May 09, 2007 at 01:42:17PM +0530, Aneesh Kumar K.V wrote:
> > From: Aneesh Kumar K.V <[EMAIL PROTECTED]>
> >
> > This I/O manager helps in stacking different I/O managers.
> > For example one can stack the undo I/O manager on top
> > of Unix I/O manager to achieve the undo functionality.
>
> This is probably more generality than is strictly necessary; and the
> place where the excess generality gets messy is the fact that you make
> the stacking layer responsible for calling all of the io manager's
> open routines (which is still a FIXME). So I would flush the stack_io
> layer entirely.

Can't we address the FIXME by calling the respective close routine in the failure case? One thing I didn't like about stack_io was that we will be opening the device at each stacked I/O manager layer, which I also think is okay provided we expect to use these I/O managers independently.

> What I would recommend as the fast and dirty approach. Basically, ape
> the approach used by test_io layer _exactly_, except instead of using
> a global variable test_io_backing_manager, you provide a function
> which sets the static variable, undo_io_backing_manager. This
> variable is used only by the subsequent call to the ->open method,
> which just like test_io simply passes the name down to the backing
> manager specified in the static variable.
>
> Then just make the undo_io manager work the way test_io does, where
> it does its thing, and then it calls the appropriate function in its
> private->real io_channel. Basically, make undo_io responsible for
> calling the next io_manager down in the chain.
>
> This is workable because we don't need to initialize the tdb file
> until we first try to write to the io_channel, and ext2fs_open() only
> needs to do read operations, so we can set the tdb filename via an
> option after ext2fs_open() returns.

One thing I was confused about was the usage of the read_blk, write_blk etc. pointers in test_private_data. With respect to the undo I/O manager, do I need to provide them? If we really need them, then I was thinking a generic stacking layer, as I sent in the patches, would be better. That means any pluggable functionality that we achieve right now by setting test_io_cb_read_blk etc. will be implemented as an I/O manager that does the particular task. Later we stack all these I/O managers to get the full functionality.

-aneesh
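The "fast and dirty" layering recommended in the quoted text can be sketched as follows. This is a conceptual model, not libext2fs code: `MemoryIO`, `UndoIO`, `set_undo_io_backing_manager` and the `"tdb_file"` option name are all hypothetical, and the undo log is a dictionary standing in for the tdb file.

```python
class MemoryIO:
    """Hypothetical backing manager: keeps blocks in memory."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}

    def read_blk(self, blk):
        return self.blocks.get(blk, b"\0")

    def write_blk(self, blk, data):
        self.blocks[blk] = data


# Module-level variable playing the role of undo_io_backing_manager.
_backing_manager = MemoryIO


def set_undo_io_backing_manager(manager):
    """Choose which manager the next UndoIO open will sit on top of."""
    global _backing_manager
    _backing_manager = manager


class UndoIO:
    """Undo layer: saves the old contents of a block before overwriting it."""

    def __init__(self, name):
        # Open: pass the same name straight down to the backing manager.
        self.real = _backing_manager(name)
        self.undo_name = None
        self.undo_log = None  # not created until the first write

    def set_option(self, key, value):
        if key == "tdb_file":  # hypothetical option name
            self.undo_name = value

    def read_blk(self, blk):
        return self.real.read_blk(blk)

    def write_blk(self, blk, data):
        if self.undo_log is None:
            if self.undo_name is None:
                raise RuntimeError("undo file name must be set before writing")
            self.undo_log = {}
        # Record the original contents only the first time a block changes.
        self.undo_log.setdefault(blk, self.real.read_blk(blk))
        self.real.write_blk(blk, data)

    def undo(self):
        for blk, old in (self.undo_log or {}).items():
            self.real.write_blk(blk, old)
```

The point of the design is visible in the ordering: reads work immediately after open, the undo file name arrives later through an option, and the undo store is only set up when the first write happens.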
Re: Online defragmentation and ext4migrate
On 5/19/07, Eric <[EMAIL PROTECTED]> wrote:
> On Fri, 2007-05-18 at 18:36 +0530, Aneesh Kumar K.V wrote:
> > The reason why i am asking this is to understand the
> > usefulness of doing a ext4migrate followed by defrag.
> > [...]
> > Also looking at the version 0.4 I see that defrag ioctl only work if we
> > have EXT4_EXTENTS_FL flag set.
>
> ext4migrate is necessary because the current ext4 defrag routines will
> only defragment files stored as extents. AFAIK, converting a file to
> extents does not allow the defrag routine to defragment it "better" than
> an indirect block map inode, but converting any file to extents has
> performance benefits regardless of whether it is later defragmented.
>
> > What are the plans for making defrag work
> > with indirect block map inode ?
>
> I think there is a second set of patches to defragment non-extent
> files.

I was looking at this and didn't find the changes needed to defrag the non-extent files.

http://www.mail-archive.com/linux-ext4@vger.kernel.org/msg01522.html

> An even better defragmentation routine knows how to balance the time
> lost to defragmentation with the performance gained from a defragmented
> filesystem. IMHO, this requires detailed knowledge of the layout of a
> file's blocks on the disk. Right now, we get this information by looping
> over the FIBMAP ioctl, which I understand can take quite a long time.

With Takashi's code we use ext4_ext_alloc_blocks and see if the number of extents that we got is less than the number of extents in the original file that we intend to defrag. I am not sure an ioctl is involved here.

Well, the intent of my mail was to find the advantage of doing an online migration. If we are not relocating the blocks corresponding to the extent index, then doing an online migration doesn't bring any specific performance bonus. But yes, I agree that there is a performance impact with defrag by moving the data blocks closer.

-aneesh