SOLVED: Re: files mapped funny? (related to online defragmentation)

2007-05-21 Thread Eric
> > I'm getting strange results when I map out the blocks used in files
> > larger than several thousand KB. I never seem to get any more than
> > 1024 contiguous data blocks in a row.
> >
> The filefrag command that comes with e2fsprogs will print the file
> fragmentation info. You can try the filefrag -v command and see if that
> matches what your script reported.

Thanks for the pointer. It seems my program only counts data blocks, so
it sees the indirect block for each 1024 data blocks as a space between
extents. I will have to read the filefrag source and account for that.
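
The arithmetic behind this can be sketched roughly as follows. It assumes
4 KiB blocks and 4-byte block pointers, which the thread does not state, so
one indirect block would hold 4096 / 4 = 1024 pointers, matching the
1024-block chunks in the script output. The helper names are hypothetical
and are not taken from the filefrag source.

```python
def pointers_per_block(block_size, pointer_size=4):
    """Number of block pointers that fit in one indirect block."""
    return block_size // pointer_size


def merge_across_metadata(chunks, max_gap=1):
    """Merge (first, last) chunks separated by at most max_gap blocks.

    A gap of exactly one block between two data chunks is assumed here to
    be an indirect block, so the data on either side is treated as one run.
    """
    merged = [chunks[0]]
    for first, last in chunks[1:]:
        prev_first, prev_last = merged[-1]
        if first - prev_last - 1 <= max_gap:
            merged[-1] = (prev_first, last)
        else:
            merged.append((first, last))
    return merged


# Chunks 67-71 from the script output quoted in this thread.
chunks = [
    (2385568, 2385591),
    (2385608, 2386448),
    (2386450, 2387473),
    (2387475, 2388498),
    (2388500, 2389523),
]

print(pointers_per_block(4096))
for first, last in merge_across_metadata(chunks):
    print(first, "-", last)
```

Under that assumption, chunks 68 to 71 collapse into a single run, because
each is separated from the next by exactly one block.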

Cheers,

Eric





[RFC][PATCH] Multiple mount protection

2007-05-21 Thread Kalpak Shah
Hi,

There have been reported instances of a filesystem being mounted in two
places at the same time, causing a lot of damage to the filesystem. This patch
reserves superblock fields and an INCOMPAT flag for adding multiple mount
protection (MMP) support within the ext4 filesystem itself. The superblock will
hold a block number (s_mmp_block) that points to an MMP structure. That
structure contains a sequence number, which a mounted filesystem updates
every 5 seconds. Whenever a filesystem is about to be mounted, it waits
s_mmp_interval seconds to make sure that the MMP sequence number does not
change. To make doubly sure, it then writes a random sequence number into the
MMP block and waits another s_mmp_interval seconds. If the sequence number
still hasn't changed, the mount succeeds. In case of failure, the nodename,
bdevname and the time at which the MMP block was last updated are displayed.
tune2fs can be used to set s_mmp_interval as desired.
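
The mount-time check described above might be sketched like this. It is a
user-space simulation only, not the kernel or e2fsprogs code: the MMPBlock
class, the mmp_mount_check function and the injectable wait callable are all
hypothetical names, and a real implementation would re-read the block from
disk and actually sleep between the steps.

```python
import random


class MMPBlock:
    """In-memory stand-in for the on-disk MMP block (hypothetical layout)."""

    def __init__(self, seq=0):
        self.seq = seq


def mmp_mount_check(mmp, interval, wait, rng=random):
    """Return True if the filesystem appears safe to mount.

    mmp      -- object with a 'seq' attribute, re-read after each wait
    interval -- s_mmp_interval in seconds
    wait     -- callable that sleeps for the given number of seconds
    """
    # Step 1: watch the sequence number for one interval.
    seen = mmp.seq
    wait(interval)
    if mmp.seq != seen:
        return False  # somebody else is updating the block

    # Step 2: write our own random sequence number and watch again.
    ours = rng.randrange(1, 2**32)
    mmp.seq = ours
    wait(interval)
    return mmp.seq == ours


# Idle device: nothing touches the block while we wait.
idle = MMPBlock(seq=7)
print(mmp_mount_check(idle, 5, wait=lambda seconds: None))

# Busy device: another mount bumps the sequence during every wait.
busy = MMPBlock(seq=7)


def other_node_updates(seconds):
    busy.seq += 1


print(mmp_mount_check(busy, 5, wait=other_node_updates))
```

Passing the wait function in as a parameter keeps the sketch testable
without sleeping; time.sleep would be the obvious choice outside a test.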

It will also protect against running e2fsck on a mounted filesystem by adding 
similar logic to ext2fs_open().

Any comments or views are welcome!

Signed-off-by: Andreas Dilger <[EMAIL PROTECTED]>
Signed-off-by: Kalpak Shah <[EMAIL PROTECTED]>

Index: e2fsprogs-1.40/lib/ext2fs/ext2_fs.h
===================================================================
--- e2fsprogs-1.40.orig/lib/ext2fs/ext2_fs.h
+++ e2fsprogs-1.40/lib/ext2fs/ext2_fs.h
@@ -568,8 +568,9 @@ struct ext2_super_block {
     __u16   s_want_extra_isize; /* New inodes should reserve # bytes */
     __u32   s_flags;            /* Miscellaneous flags */
     __u16   s_raid_stride;      /* RAID stride */
-    __u16   s_pad;              /* Padding */
-    __u32   s_reserved[166];    /* Padding to the end of the block */
+    __u16   s_mmp_interval;     /* Wait for # seconds in MMP checking */
+    __u64   s_mmp_block;        /* Block for multi-mount protection */
+    __u32   s_reserved[164];    /* Padding to the end of the block */
 };

 /*
@@ -631,10 +632,12 @@ struct ext2_super_block {
 #define EXT2_FEATURE_INCOMPAT_META_BG       0x0010
 #define EXT3_FEATURE_INCOMPAT_EXTENTS       0x0040
 #define EXT4_FEATURE_INCOMPAT_64BIT         0x0080
+#define EXT4_FEATURE_INCOMPAT_MMP           0x0100


 #define EXT2_FEATURE_COMPAT_SUPP    0
-#define EXT2_FEATURE_INCOMPAT_SUPP  (EXT2_FEATURE_INCOMPAT_FILETYPE)
+#define EXT2_FEATURE_INCOMPAT_SUPP  (EXT2_FEATURE_INCOMPAT_FILETYPE| \
+                                     EXT4_FEATURE_INCOMPAT_MMP)
 #define EXT2_FEATURE_RO_COMPAT_SUPP (EXT2_FEATURE_RO_COMPAT_SPARSE_SUPER| \
                                      EXT2_FEATURE_RO_COMPAT_LARGE_FILE| \
                                      EXT2_FEATURE_RO_COMPAT_BTREE_DIR)


Thanks,
Kalpak.

-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: files mapped funny? (related to online defragmentation)

2007-05-21 Thread Mingming Cao
On Mon, 2007-05-21 at 06:49 -0700, Eric wrote:
> Hi,
> 
> I'm getting strange results when I map out the blocks used in files
> larger than several thousand KB. I never seem to get any more than
> 1024 contiguous data blocks in a row. 
> 
> Here's a portion of the output of my script when I run it on a 176MB
> file in my home directory:
> ...
> Contiguous chunk 67: 2385568 - 2385591  (24 blocks)
> Contiguous chunk 68: 2385608 - 2386448  (841 blocks)
> Contiguous chunk 69: 2386450 - 2387473  (1024 blocks)
> Contiguous chunk 70: 2387475 - 2388498  (1024 blocks)
> Contiguous chunk 71: 2388500 - 2389523  (1024 blocks)
> ...
>
> Maybe this is a bug in my script? Can anyone explain why this would
> happen?
> 
The filefrag command that comes with e2fsprogs will print the file
fragmentation info. You can try the filefrag -v command and see if that
matches what your script reported.

Mingming

> I'm attaching my script in case other ext2/3/4 newbies can get any use
> out of it, and in case anyone needs to see it in order to answer my
> question. It's pretty self-explanatory, though.
> 
> Cheers,
> 
> Eric
> 



Re: [PATCH 1/2] Add stack I/O manager.

2007-05-21 Thread Theodore Tso
On Mon, May 21, 2007 at 03:57:26PM +0530, Aneesh Kumar K.V wrote:
> Can't we address the FIXME by calling the respective close routine in
> the failure case?

You can, but it doesn't address the biggest problem, which is that if
you're going to add all of this extra complexity, we might as well deal
with the name parsing issue.  In your stacking I/O layer, you pass the
same name to all of the stacked modules.  That's not necessarily the
right thing to do.  It works for test_io because it doesn't need a name
parameter (it just passes the name straight down to its lower-layer
module).  It mostly works for the undo_io layer because, while the tdb
pathname could be the passed-in open argument, you don't have to give
it its tdb name until after ext2fs_openfs() succeeds, and you can pass
it down as an option.

In general, though, you need to worry about what sub-parts of the name
you need to pass down to each of the stacked modules.  So the stack_io
layer adds a lot of complexity, but it's not a fully general solution.
So if it's not fully general, maybe something that's simpler is
sufficient, since in truth we probably don't need unlimited levels of
stacking.

> One thing I was confused about was the usage of the read_blk, write_blk etc.
> pointers in test_private_data. With respect to the undo I/O manager, do I
> need to provide them? If we really need them, then I was thinking a
> generic stacking layer, as I sent in the patches, would be better. That
> means any pluggable functionality that we achieve right now by setting
> test_io_cb_read_blk etc. will be implemented as an I/O manager that does
> the particular task. Later we stack all these I/O managers to get the
> full functionality.

The read_blk, write_blk, etc. pointers in test_private_data are
specific to the test_io layer.  That's part of the value-add which the
test_io module provides, and no, the undo I/O manager doesn't need
them.  They're just there as part of what the user of the test i/o
manager might want to use, allowing some arbitrary callback
function to be called for each test i/o.  The main place I've used it
is when I effectively want to set a watchpoint on e2fsck, because I'm
trying to figure out which part of e2fsck is reading or writing a
particular block, and I want to dump out the contents of that block
when it reads/writes it.  The fastest way to do that is to add a
callback function that tests for the block number, and when it is
triggered, I can have the debugging function print out some or all of
the contents of the block.  So that's strictly a test i/o manager thing;
the undo i/o manager wouldn't need this at all!
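
A toy version of that watchpoint idea might look like the sketch below. It
is not the libext2fs test_io code: the classes MemoryIO and CallbackIO, the
callback attribute names and the watched block number are all hypothetical,
chosen only to show a pass-through layer that calls a user-supplied
function on each read or write.

```python
class MemoryIO:
    """Toy backing store: a dict mapping block number to bytes."""

    def __init__(self, block_size=1024):
        self.block_size = block_size
        self.blocks = {}

    def read_blk(self, block):
        return self.blocks.get(block, bytes(self.block_size))

    def write_blk(self, block, data):
        self.blocks[block] = bytes(data)


class CallbackIO:
    """Pass-through layer that calls optional callbacks on each I/O."""

    def __init__(self, backing):
        self.backing = backing
        self.cb_read_blk = None   # called as cb(block, data) after a read
        self.cb_write_blk = None  # called as cb(block, data) before a write

    def read_blk(self, block):
        data = self.backing.read_blk(block)
        if self.cb_read_blk is not None:
            self.cb_read_blk(block, data)
        return data

    def write_blk(self, block, data):
        if self.cb_write_blk is not None:
            self.cb_write_blk(block, data)
        self.backing.write_blk(block, data)


WATCHED_BLOCK = 42  # hypothetical block number under investigation
hits = []


def watchpoint(block, data):
    # Record only accesses to the block we care about.
    if block == WATCHED_BLOCK:
        hits.append((block, data[:8]))


io = CallbackIO(MemoryIO())
io.cb_write_blk = watchpoint
io.cb_read_blk = watchpoint

io.write_blk(41, b"ignored")
io.write_blk(42, b"superblk-copy")
io.read_blk(42)
print(hits)
```

In a debugging session the callback would more likely dump the block or
print a stack trace than append to a list; the list just keeps the sketch
easy to check.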

Regards,

- Ted



files mapped funny? (related to online defragmentation)

2007-05-21 Thread Eric
Hi,

I'm getting strange results when I map out the blocks used in files
larger than several thousand KB. I never seem to get any more than
1024 contiguous data blocks in a row. 

Here's a portion of the output of my script when I run it on a 176MB
file in my home directory:
...
Contiguous chunk 67: 2385568 - 2385591  (24 blocks)
Contiguous chunk 68: 2385608 - 2386448  (841 blocks)
Contiguous chunk 69: 2386450 - 2387473  (1024 blocks)
Contiguous chunk 70: 2387475 - 2388498  (1024 blocks)
Contiguous chunk 71: 2388500 - 2389523  (1024 blocks)
...

Maybe this is a bug in my script? Can anyone explain why this would
happen?

I'm attaching my script in case other ext2/3/4 newbies can get any use
out of it, and in case anyone needs to see it in order to answer my
question. It's pretty self-explanatory, though.

Cheers,

Eric

#!/usr/bin/env python3

# Eric's first python program :)

import fcntl   # for fcntl.ioctl
import os      # for os.path.exists and os.fstat
import struct  # for struct.pack and struct.unpack
import sys     # for sys.argv

FIBMAP = 1    # ioctl number: map a logical file block to a physical block
FIGETBSZ = 2  # ioctl number: get the filesystem block size

print("WARNING: This script may not work for files with holes.")

if len(sys.argv) != 2:
    print("Usage: " + sys.argv[0] + " <file>")
    print("Note: This program uses the FIBMAP ioctl, so you must be root.")
    sys.exit(1)

if not os.path.exists(sys.argv[1]):
    print("File " + sys.argv[1] + " doesn't exist.")
    sys.exit(1)

f = open(sys.argv[1], 'rb')

# Both ioctls take a pointer to an int, so pass a packed int as the buffer.
fsBlockSize = struct.unpack('i', fcntl.ioctl(f, FIGETBSZ, struct.pack('i', 0)))[0]
fileSize = os.fstat(f.fileno()).st_size
numFileBlocks = (fileSize + fsBlockSize - 1) // fsBlockSize  # round up

# Ask the kernel for the physical block behind each logical block.
blockmap = []
for blockIterator in range(numFileBlocks):
    h = struct.pack('i', blockIterator)
    blockmap += struct.unpack('i', fcntl.ioctl(f, FIBMAP, h))
f.close()

print("Filesystem block size: " + str(fsBlockSize))
print("Number of filesystem blocks in file: " + str(numFileBlocks))

if not blockmap:
    # Empty file: there is nothing to map.
    sys.exit(0)

# Start a new chunk whenever the physical block number is not prev + 1.
extentBegin = [blockmap[0]]
extentEnd = []
for blockIterator in range(1, len(blockmap)):
    if blockmap[blockIterator] - blockmap[blockIterator - 1] != 1:
        extentEnd += [blockmap[blockIterator - 1]]
        extentBegin += [blockmap[blockIterator]]
extentEnd += [blockmap[-1]]

for n in range(len(extentBegin)):
    length = extentEnd[n] - extentBegin[n] + 1
    print("Contiguous chunk %d: %d - %d  (%d blocks)"
          % (n, extentBegin[n], extentEnd[n], length))
sys.exit(0)




Re: Online defragmentation and ext4migrate

2007-05-21 Thread Eric
On Mon, 2007-05-21 at 12:38 +0200, Jan Kara wrote:
>   Yes. On the other hand I believe that some people would like to use
> defragmentation but stay with ext3. For them conversion to extents is
> no-go.
> [...]
>  I've written a patch that defragments non-extent files but after
> discussion with XFS guys I've decided that the interfaces should be made
> more generic, so that XFS and other filesystems can use them too...

I see no reason why the ioctl to convert a file to extents and then
defragment it should be different from the ioctl to defragment a
non-extent file.

After all, whether a file's blocks are tracked as lists of blocks or a
set of extents is just bookkeeping, right? The set of data blocks that
make up the file and their order is the same regardless of whether the
extent flag is set in the inode.
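
As a rough illustration of the "just bookkeeping" point, the sketch below
converts a per-block list into extents and back again. The function names
and the (logical_start, physical_start, length) tuple layout are
hypothetical and much simpler than the on-disk ext4 extent format; the
block numbers are made up.

```python
def blocks_to_extents(blocks):
    """Collapse an ordered list of physical block numbers into
    (logical_start, physical_start, length) extents."""
    extents = []
    for logical, physical in enumerate(blocks):
        if extents:
            lstart, pstart, length = extents[-1]
            if physical == pstart + length:
                # Physically contiguous with the previous block: grow it.
                extents[-1] = (lstart, pstart, length + 1)
                continue
        extents.append((logical, physical, 1))
    return extents


def extents_to_blocks(extents):
    """Expand extents back into the per-block list they describe."""
    blocks = []
    for _lstart, pstart, length in extents:
        blocks.extend(range(pstart, pstart + length))
    return blocks


block_list = [100, 101, 102, 200, 201, 50]
extents = blocks_to_extents(block_list)
print(extents)
print(extents_to_blocks(extents) == block_list)
```

The round trip returns the same block list, which is the sense in which the
two representations describe the same file layout.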

If the user is running the ext2/3 driver or the ext4 driver with the
noextents option, just defragment the file. If the user is running ext4
without the noextents option, convert to extents and then defragment.

The only problem that I can think of is that defragmenting metadata
(including indirect block and/or whatever the equivalent is in
extent-land) presumably has performance benefits too, so maybe a
defragmenter in userspace would want to have some knowledge/control over
this process.

Cheers,

Eric





Re: Online defragmentation and ext4migrate

2007-05-21 Thread Aneesh Kumar K.V



Takashi Sato wrote:
> Hi Aneesh san,
>
> In my opinion, to keep the ioctl simple and small is very important
> for ease of maintenance.  So I would rather not support indirect
> block files in the ioctl.
> Instead, I can add the call of the migration ioctl to my defrag tool
> in order to defragment indirect block files.  How do you think of it?




That should be fine. So I will start moving the ext4migrate code into an
ext4 ioctl and later will send you a patch to the defrag tool that will
migrate and defrag.



-aneesh



Re: Online defragmentation and ext4migrate

2007-05-21 Thread Jan Kara
> On 5/19/07, Eric <[EMAIL PROTECTED]> wrote:
> >On Fri, 2007-05-18 at 18:36 +0530, Aneesh Kumar K.V wrote:
> >> The reason why i am asking this is to understand the
> >> usefulness of doing a ext4migrate followed by defrag.
> >> [...]
> >> Also looking at the version 0.4 I see that defrag ioctl only work if we
> >> have EXT4_EXTENTS_FL flag set.
> >
> >ext4migrate is necessary because the current ext4 defrag routines will
> >only defragment files stored as extents. AFAIK, converting a file to
> >extents does not allow the defrag routine to defragment it "better" than
> >an indirect block map inode, but converting any file to extents has
> >performance benefits regardless of whether it is later defragmented.
> >
> >> What are the plans for making defrag work
> >> with indirect block map inode ?
> >
> >I think there is a second set of patches to defragment non-extent
> >files.
> >
> 
> I was looking at this and didn't find the changes needed to defrag the
> non extent files.
> 
> http://www.mail-archive.com/linux-ext4@vger.kernel.org/msg01522.html
  I've written a patch that defragments non-extent files but after
discussion with XFS guys I've decided that the interfaces should be made
more generic, so that XFS and other filesystems can use them too...

Honza
-- 
Jan Kara <[EMAIL PROTECTED]>
SuSE CR Labs


Re: Online defragmentation and ext4migrate

2007-05-21 Thread Jan Kara
  Hello,

> >While doing online defragmentation do we move the blocks corresponding to 
> >extent index ? The reason why i am asking this is to understand the
> >usefulness of doing a ext4migrate followed by defrag. I understand that 
> >defragmentation in general will improve the performance. But with respect 
> >to ext4migrate we are not touching the data blocks. Instead we build the 
> >extent map and if that requires to have an extent index block then we 
> >allocate one. I am trying to understand what would be the performance 
> >impact of this and whether doing a defrag really improve the performance.
> 
> I think converting a file to extents has the benefit for the performance of
> block searching. If we want to improve also the performance of  reading
> file data, we have to run the defrag after that.
  Yes. On the other hand I believe that some people would like to use
defragmentation but stay with ext3. For them conversion to extents is
no-go.

> >Also looking at the version 0.4 I see that defrag ioctl only work if we 
> >have EXT4_EXTENTS_FL flag set. What are the plans for making defrag work 
> >with indirect block map inode ?
> 
> Unfortunately, my defrag doesn't support an indirect block file.
> But we can reduce fragments in the file with the defrag just after
> ext4migrate.
> 
> In my opinion, to keep the ioctl simple and small is very important
> for ease of maintenance.  So I would rather not support indirect block
> files in the ioctl.  Instead, I can add the call of the migration
> ioctl to my defrag tool in order to defragment indirect block files.
> How do you think of it?
  Yes that could be useful but I don't think it's a complete solution
for people that don't want to migrate.

Honza
-- 
Jan Kara <[EMAIL PROTECTED]>
SuSE CR Labs


Re: ext2/3/4 online defrag

2007-05-21 Thread Jan Kara
> On Thu, 2007-05-17 at 18:11 +0200, Jan Kara wrote:
> > But me (and several other people
> > independently as I've learnt recently) have written some tools which
> > should result in something useful. If you're interested, you can join
> > [EMAIL PROTECTED] - it's led by one guy who is doing
> > defrag and stuff as his google summer of code project.
> 
> Is this different from the ext4/extent-based defrag patch that's been
> mentioned on this list?
  Yes, it is different. In particular, it's an offline-only tool so far...

> > > *An implementation of an ext* filesystem driver can work with any
> > > ext2/3/4 filesystem as long as it supports the necessary revision
> > > (GOOD_OLD_REV or DYNAMIC_REV) and feature flags set in the filesystem.
> >   Not sure what you mean here...
> 
> The "ext2 filesystem"/"ext3 filesystem"/"ext4 filesystem" terminology
> was confusing to me when I first started reading about them. In my mind,
> it implied that those three filesystems were more different than they
> actually are.
> 
> I think it would be more accurate to say that they are all essentially
> the same filesystem, and that any filesystem driver that can mount a
> given filesystem can mount any other ext2/3/4 filesystem of the same
> revision with the same feature flags set.
> 
> I was asking for confirmation of this assumption, but I've since found a
> lot of really good documentation that has cleared up a lot of things.
  Yes, basically it's just a question of a feature set. But for example
current online defrag from Takashi requires extents, which are not
available for ext2 or ext3.

Honza

-- 
Jan Kara <[EMAIL PROTECTED]>
SuSE CR Labs


Re: Online defragmentation and ext4migrate

2007-05-21 Thread Takashi Sato

Hi Aneesh san,

> While doing online defragmentation do we move the blocks corresponding to
> extent index ? The reason why i am asking this is to understand the
> usefulness of doing a ext4migrate followed by defrag. I understand that
> defragmentation in general will improve the performance. But with respect
> to ext4migrate we are not touching the data blocks. Instead we build the
> extent map and if that requires to have an extent index block then we
> allocate one. I am trying to understand what would be the performance
> impact of this and whether doing a defrag really improve the performance.

I think converting a file to extents has the benefit for the performance of
block searching. If we want to improve also the performance of  reading
file data, we have to run the defrag after that.

> Also looking at the version 0.4 I see that defrag ioctl only work if we
> have EXT4_EXTENTS_FL flag set. What are the plans for making defrag work
> with indirect block map inode ?


Unfortunately, my defrag doesn't support an indirect block file.
But we can reduce fragments in the file with the defrag just after
ext4migrate.

In my opinion, to keep the ioctl simple and small is very important
for ease of maintenance.  So I would rather not support indirect
block files in the ioctl.
Instead, I can add the call of the migration ioctl to my defrag tool in order
to defragment indirect block files.  How do you think of it?

Cheers, Takashi 




Re: [PATCH 1/2] Add stack I/O manager.

2007-05-21 Thread Aneesh Kumar K.V



Theodore Tso wrote:
> On Wed, May 09, 2007 at 01:42:17PM +0530, Aneesh Kumar K.V wrote:
> > From: Aneesh Kumar K.V <[EMAIL PROTECTED]>
> >
> > This I/O manager helps in stacking different I/O managers.
> > For example one can stack the undo I/O manager on top
> > of Unix I/O manager to achieve the undo functionality.
>
> This is probably more generality than is strictly necessary; and the
> place where the excess generality gets messy is the fact that you make
> the stacking layer responsible for calling all of the io manager's
> open routines (which is still a FIXME).  So I would flush the stack_io
> layer entirely.



Can't we address the FIXME by calling the respective close routine in
the failure case?

One thing I didn't like about stack_io is that we will be opening the
device at each stacked I/O manager layer, which I also think is okay
provided we expect to use these I/O managers independently.




> What I would recommend is the fast and dirty approach.  Basically, ape
> the approach used by the test_io layer _exactly_, except instead of using
> a global variable test_io_backing_manager, you provide a function
> which sets the static variable, undo_io_backing_manager.  This
> variable is used only by the subsequent call to the ->open method,
> which just like test_io simply passes the name down to the backing
> manager specified in the static variable.  Then just make the undo_io
> manager work the way test_io does, where it does its thing, and then
> calls the appropriate function in its private->real io_channel.
> Basically, make undo_io responsible for calling the next io_manager
> down in the chain.  This is workable because we don't need to
> initialize the tdb file until we first try to write to the io_channel,
> and ext2fs_open() only needs to do read operations, so we can set the
> tdb filename via an option after ext2fs_open() returns.
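
The shape of that suggestion might be sketched as below. This is only an
illustration, not libext2fs code: DictIO, UndoIO, set_undo_backing_manager
and the "tdb_file" option name are hypothetical, and a plain dict stands in
for the tdb undo file. It shows the backing manager being chosen before
open, the name being passed straight down, and the undo store being
initialized only on the first write.

```python
_undo_backing_manager = None  # set before open, like the static variable


def set_undo_backing_manager(manager_cls):
    global _undo_backing_manager
    _undo_backing_manager = manager_cls


class DictIO:
    """Toy backing manager; 'name' would be the device path."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}

    def read_blk(self, block):
        return self.blocks.get(block, b"")

    def write_blk(self, block, data):
        self.blocks[block] = data


class UndoIO:
    def __init__(self, name):
        # Pass the name straight down to the backing manager.
        self.real = _undo_backing_manager(name)
        self.undo_file = None  # supplied later through an option
        self.undo_log = None   # created lazily on the first write

    def set_option(self, option, value):
        if option == "tdb_file":  # hypothetical option name
            self.undo_file = value

    def read_blk(self, block):
        return self.real.read_blk(block)

    def write_blk(self, block, data):
        if self.undo_log is None:
            self.undo_log = {}  # stands in for opening the tdb file
        # Save the original contents once, before the first overwrite.
        self.undo_log.setdefault(block, self.real.read_blk(block))
        self.real.write_blk(block, data)


set_undo_backing_manager(DictIO)
io = UndoIO("/dev/hypothetical")
io.read_blk(0)
print(io.undo_log)  # reads alone do not create the undo store
io.set_option("tdb_file", "/tmp/undo.tdb")
io.write_blk(0, b"new")
io.write_blk(0, b"newer")
print(io.undo_log)
```

Because reads never touch the undo store, the undo file name can be set
after the open-time reads have finished, which is the property the quoted
paragraph relies on.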



One thing I was confused about was the usage of the read_blk, write_blk etc.
pointers in test_private_data. With respect to the undo I/O manager, do I
need to provide them? If we really need them, then I was thinking a
generic stacking layer, as I sent in the patches, would be better. That
means any pluggable functionality that we achieve right now by setting
test_io_cb_read_blk etc. will be implemented as an I/O manager that does
the particular task. Later we stack all these I/O managers to get the
full functionality.


-aneesh


Re: Online defragmentation and ext4migrate

2007-05-21 Thread Aneesh Kumar

On 5/19/07, Eric <[EMAIL PROTECTED]> wrote:
> On Fri, 2007-05-18 at 18:36 +0530, Aneesh Kumar K.V wrote:
> > The reason why i am asking this is to understand the
> > usefulness of doing a ext4migrate followed by defrag.
> > [...]
> > Also looking at the version 0.4 I see that defrag ioctl only work if we
> > have EXT4_EXTENTS_FL flag set.
>
> ext4migrate is necessary because the current ext4 defrag routines will
> only defragment files stored as extents. AFAIK, converting a file to
> extents does not allow the defrag routine to defragment it "better" than
> an indirect block map inode, but converting any file to extents has
> performance benefits regardless of whether it is later defragmented.
>
> > What are the plans for making defrag work
> > with indirect block map inode ?
>
> I think there is a second set of patches to defragment non-extent
> files.



I was looking at this and didn't find the changes needed to defrag the
non extent files.

http://www.mail-archive.com/linux-ext4@vger.kernel.org/msg01522.html




> An even better defragmentation routine knows how to balance the time
> lost to defragmentation with the performance gained from a defragmented
> filesystem. IMHO, this requires detailed knowledge of the layout of a
> file's blocks on the disk. Right now, we get this information by looping
> over the FIBMAP ioctl, which I understand can take quite a long time.



With Takashi's code we use ext4_ext_alloc_blocks and see if the
number of extents that we got is less than the number of extents
in the original file that we intend to defrag. I am not sure an
ioctl is involved here.

Well, the intent of my mail was to find the advantage of doing an
online migration. If we are not relocating the blocks corresponding
to the extent index, then doing an online migration doesn't bring any
specific performance bonus.

But yes, I agree that defrag has a performance impact by moving the
data blocks closer together.

-aneesh