Re: [RFC] [PATCH 3/3] Recursive mtime for ext3

2007-11-07 Thread Jan Kara
On Tue 06-11-07 10:04:47, H. Peter Anvin wrote:
 Arjan van de Ven wrote:
 On Tue, 6 Nov 2007 18:19:45 +0100
 Jan Kara [EMAIL PROTECTED] wrote:
 
 Implement recursive mtime (rtime) feature for ext3. The feature works
 as follows: In each directory we keep a flag EXT3_RTIME_FL
 (modifiable by a user) indicating whether rtime should be updated. In case a
 directory or a file in it is modified and the flag is set, the
 directory's rtime is updated, the flag is cleared, and we move to the
 parent. If the flag is set there, we clear it, update rtime and
 continue upwards up to the root of the filesystem. In case a regular
 file or symlink is modified, we pick an arbitrary one of its parents
 (actually the one that happens to be at the head of the i_dentry list)
 and start the rtime update algorithm there.
 
 Ok since mtime (and rtime) are part of the inode and not the dentry...
 how do you deal with hardlinks? And with cases of files that have been
 unlinked? (ok the latter is a wash obviously, other than not crashing)
  Unlinked files are easy - you just don't propagate the rtime anywhere.
For hardlinks see below.

 There is only one possible answer... he only updates the directory path 
 that was used to touch the particular file involved.  Thus, the 
 semantics gets grotty not just in the presence of hard links, but also 
 in the presence of bind- and other non-root mounts.
  Update of recursive mtime does not pass filesystem boundaries (i.e.
mountpoints) so bind mounts and such are a non-issue (hmm, at least that was
my original idea but as I'm looking now I don't handle bind mounts properly,
so that needs to be fixed). With hardlinks, you are right that the
behaviour is non-deterministic - I tried to argue in the text of the mail
that this does not actually matter - there are not many hardlinks on a usual
system and so the application can check hardlinked files in the old way -
i.e. look at mtime.
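
To make the propagation concrete, here is a minimal sketch that operates on
an in-memory tree instead of real inodes; struct node, rtime_propagate() and
the field names are made up for illustration and are not the patch code:

/*
 * Minimal model of the propagation described above: update rtime, clear
 * the flag, and walk towards the root until a directory whose flag is
 * already clear is found.  For a regular file, the caller would start
 * at one of its parent directories.
 */
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

struct node {
	struct node *parent;	/* NULL at the filesystem root */
	bool rtime_flag;	/* models EXT3_RTIME_FL */
	time_t rtime;		/* models the proposed rtime field */
};

static void rtime_propagate(struct node *dir, time_t now)
{
	while (dir && dir->rtime_flag) {
		dir->rtime = now;		/* record the modification */
		dir->rtime_flag = false;	/* clear flag: no more updates until re-armed */
		dir = dir->parent;		/* continue towards the root */
	}
}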

Honza
-- 
Jan Kara [EMAIL PROTECTED]
SUSE Labs, CR
-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] [PATCH 3/3] Recursive mtime for ext3

2007-11-07 Thread Jan Kara
On Tue 06-11-07 18:01:00, Al Viro wrote:
 On Tue, Nov 06, 2007 at 06:19:45PM +0100, Jan Kara wrote:
  Implement recursive mtime (rtime) feature for ext3. The feature works as
  follows: In each directory we keep a flag EXT3_RTIME_FL (modifiable by a
  user) indicating whether rtime should be updated. In case a directory or a
  file in it is modified and the flag is set, the directory's rtime is
  updated, the flag is cleared, and we move to the parent. If the flag is set
  there, we clear it, update rtime and continue upwards up to the root of the
  filesystem. In case a regular file or symlink is modified, we pick an
  arbitrary one of its parents (actually the one that happens to be at the
  head of the i_dentry list) and start the rtime update algorithm there.
 
 *e*
 
 Nothing like non-deterministic behaviour, is there?
  Oh yes, there is :) But I tried to argue it does not really matter - an
application would have to handle hardlinks in a special way, but I find that
acceptable given how rare they are...

  Intended use case is that an application which wants to watch any
  modification in a subtree scans the subtree and sets flags for all inodes
  there. Next time, it just needs to recurse into directories having rtime
  newer than the start of the previous scan. There it can handle
  modifications and set the flag again. It is up to the application to watch
  out for hardlinked files. It can e.g. build their list and check their
  mtime separately (when a hardlink to a file is created its inode is
  modified and rtimes are properly updated, and thus any application has an
  effective way of finding new hardlinked files).
 
 You know, you can do that with aush^H^Hdit right now...
  Interesting idea, I had not thought about this. I guess you mean
watching all the VFS modification events and then doing the checking and
propagation from user space... My first feeling is that the performance
penalty would be considerably higher (currently I am at a 1% performance
penalty for a quite pessimistic test case) but in case the current patch is
considered unacceptable, I can try how large the penalty would be. Thanks
for the suggestion.

Honza
-- 
Jan Kara [EMAIL PROTECTED]
SUSE Labs, CR
-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] e2fsprogs: Handle rec_len correctly for 64KB blocksize

2007-11-07 Thread Jan Kara
  Hello,

  sorry for replying to myself but I've just found out that the patch I've
sent was an old version of the patch, which had some problems. Attached is
a new version.

On Tue 06-11-07 12:31:42, Jan Kara wrote:
   it seems the attached patch still did not get your attention. It makes
 e2fsprogs properly handle filesystems with 64KB block size. Could you put
 it into e2fsprogs git? Thanks.

Honza

-- 
Jan Kara [EMAIL PROTECTED]
SUSE Labs, CR
Subject: Support for 64KB blocksize in ext2-4 directories.

When block size is 64KB, we have to take care that rec_len does not overflow.
The kernel stores 0xffff in case 0x10000 should be stored - perform the
appropriate conversion when reading from / writing to disk.

Signed-off-by: Jan Kara [EMAIL PROTECTED]

diff --git a/lib/ext2fs/dirblock.c b/lib/ext2fs/dirblock.c
index fb20fa0..db73edd 100644
--- a/lib/ext2fs/dirblock.c
+++ b/lib/ext2fs/dirblock.c
@@ -38,9 +38,9 @@ errcode_t ext2fs_read_dir_block2(ext2_fi
 		dirent = (struct ext2_dir_entry *) p;
 #ifdef WORDS_BIGENDIAN
 		dirent->inode = ext2fs_swab32(dirent->inode);
-		dirent->rec_len = ext2fs_swab16(dirent->rec_len);
 		dirent->name_len = ext2fs_swab16(dirent->name_len);
 #endif
+		dirent->rec_len = ext2fs_rec_len_from_disk(dirent->rec_len);
 		name_len = dirent->name_len;
 #ifdef WORDS_BIGENDIAN
 		if (flags & EXT2_DIRBLOCK_V2_STRUCT)
@@ -68,12 +68,15 @@ errcode_t ext2fs_read_dir_block(ext2_fil
 errcode_t ext2fs_write_dir_block2(ext2_filsys fs, blk_t block,
   void *inbuf, int flags EXT2FS_ATTR((unused)))
 {
-#ifdef WORDS_BIGENDIAN
 	errcode_t	retval;
 	char		*p, *end;
 	char		*buf = 0;
 	struct ext2_dir_entry *dirent;
 
+#ifndef WORDS_BIGENDIAN
+	if (fs->blocksize < EXT2_MAX_REC_LEN)
+		goto just_write;
+#endif
 	retval = ext2fs_get_mem(fs->blocksize, &buf);
 	if (retval)
 		return retval;
@@ -88,19 +91,18 @@ errcode_t ext2fs_write_dir_block2(ext2_f
 			return (EXT2_ET_DIR_CORRUPTED);
 		}
 		p += dirent->rec_len;
+		dirent->rec_len = ext2fs_rec_len_to_disk(dirent->rec_len);
+#ifdef WORDS_BIGENDIAN
 		dirent->inode = ext2fs_swab32(dirent->inode);
-		dirent->rec_len = ext2fs_swab16(dirent->rec_len);
-		dirent->name_len = ext2fs_swab16(dirent->name_len);
-
-		if (flags & EXT2_DIRBLOCK_V2_STRUCT)
+		if (!(flags & EXT2_DIRBLOCK_V2_STRUCT))
 			dirent->name_len = ext2fs_swab16(dirent->name_len);
+#endif
 	}
  	retval = io_channel_write_blk(fs->io, block, 1, buf);
 	ext2fs_free_mem(&buf);
 	return retval;
-#else
+just_write:
  	return io_channel_write_blk(fs->io, block, 1, (char *) inbuf);
-#endif
 }
 
 
diff --git a/lib/ext2fs/ext2_fs.h b/lib/ext2fs/ext2_fs.h
index a316665..21747c2 100644
--- a/lib/ext2fs/ext2_fs.h
+++ b/lib/ext2fs/ext2_fs.h
@@ -717,6 +718,32 @@ struct ext2_dir_entry_2 {
 #define EXT2_DIR_ROUND			(EXT2_DIR_PAD - 1)
 #define EXT2_DIR_REC_LEN(name_len)	(((name_len) + 8 + EXT2_DIR_ROUND) & \
 	 ~EXT2_DIR_ROUND)
+#define EXT2_MAX_REC_LEN		((1<<16)-1)
+
+static inline unsigned ext2fs_rec_len_from_disk(unsigned len)
+{
+#ifdef WORDS_BIGENDIAN
+	len = ext2fs_swab16(len);
+#endif
+	if (len == EXT2_MAX_REC_LEN)
+		return 1 << 16;
+	return len;
+}
+
+static inline unsigned ext2fs_rec_len_to_disk(unsigned len)
+{
+	if (len == (1 << 16))
+#ifdef WORDS_BIGENDIAN
+		return ext2fs_swab16(EXT2_MAX_REC_LEN);
+#else
+		return EXT2_MAX_REC_LEN;
+#endif
+#ifdef WORDS_BIGENDIAN
+	return ext2fs_swab16(len);
+#else
+	return len;
+#endif
+}
 
 /*
  * This structure will be used for multiple mount protection. It will be
diff --git a/misc/e2image.c b/misc/e2image.c
index 1fbb267..4e2c9fb 100644
--- a/misc/e2image.c
+++ b/misc/e2image.c
@@ -345,10 +345,7 @@ static void scramble_dir_block(ext2_fils
 	end = buf + fs->blocksize;
 	for (p = buf; p < end-8; p += rec_len) {
 		dirent = (struct ext2_dir_entry_2 *) p;
-		rec_len = dirent->rec_len;
-#ifdef WORDS_BIGENDIAN
-		rec_len = ext2fs_swab16(rec_len);
-#endif
+		rec_len = ext2fs_rec_len_from_disk(dirent->rec_len);
 #if 0
 		printf("rec_len = %d, name_len = %d\n", rec_len, dirent->name_len);
 #endif
@@ -358,9 +355,7 @@ static void scramble_dir_block(ext2_fils
 			   "bad rec_len (%d)\n", (unsigned long) blk,
 			   rec_len);
 			rec_len = end - p;
-#ifdef WORDS_BIGENDIAN
-			dirent->rec_len = ext2fs_swab16(rec_len);
-#endif
+			dirent->rec_len = ext2fs_rec_len_to_disk(rec_len);
 			continue;
 		}
 		if (dirent->name_len + 8 > rec_len) {
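
For illustration only (this standalone program mirrors the helpers in the
patch, with the byte swapping omitted, and is not part of it), the encoding
rule round-trips as follows:

#include <assert.h>
#include <stdio.h>

#define EXT2_MAX_REC_LEN	((1 << 16) - 1)		/* 0xffff */

static unsigned rec_len_from_disk(unsigned len)
{
	return len == EXT2_MAX_REC_LEN ? 1 << 16 : len;
}

static unsigned rec_len_to_disk(unsigned len)
{
	return len == (1 << 16) ? EXT2_MAX_REC_LEN : len;
}

int main(void)
{
	/* A 64KB block covered by a single entry: 65536 does not fit in the
	 * 16-bit on-disk field, so 0xffff is stored instead. */
	assert(rec_len_to_disk(65536) == 0xffff);
	assert(rec_len_from_disk(0xffff) == 65536);
	/* Ordinary lengths pass through unchanged. */
	assert(rec_len_to_disk(4096) == 4096);
	printf("rec_len round-trip OK\n");
	return 0;
}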


Re: [PATCH] allow tune2fs to set/clear resize_inode

2007-11-07 Thread Andreas Dilger
On Nov 06, 2007  13:51 -0500, Theodore Tso wrote:
 On Tue, Nov 06, 2007 at 09:12:55AM +0800, Andreas Dilger wrote:
  What is needed is an ext2prepare-like step that involves resize2fs code
   to move the file/dir blocks and then move the inode table, as if the
  filesystem were going to be resized to the new maximum resize limit,
  and then create the resize inode but do not actually add new blocks/groups
  at the end of the filesystem.
 
 Yeah, the plan was to eventually add ext2prepare-like code into
 tune2fs, using the undo I/O manager for safety.  But that's been
 relatively low priority.
 
 BTW, I've gotten ~2 bug reports from Debian users claiming that
 ext2prepare had trashed their filesystem.  I don't have any clean
 evidence about whether it was a userspace error or some kind of bug in
 ext2prepare, possibly conflicting with some new ext3 feature that
 we've since added that ext2prepare doesn't properly account for
 (extended attributes, maybe?).  
 
 I have not had time to look into it, but thought has crossed my mind
 that a quick hack would be to splice the undo manager into
 ext2prepare, have it run e2fsck, and if it fails, do a rollback,
 create an e2image file, and then instruct the user to send in a bug
 report.  :-)

I don't think it would be very easy to splice the undo manager into
ext2prepare.  I'd rather see time spent making resize2fs handle the prepare
functionality, so that ext2resize can be entirely obsoleted.

Aneesh, adding the undo manager to resize2fs would be an excellent use of that
library, and going from resize2fs to resize2fs --prepare-only (or whatever)
would be trivial, I think.  Is that something you'd be interested in working on?

Cheers, Andreas
--
Andreas Dilger
Sr. Software Engineer, Lustre Group
Sun Microsystems of Canada, Inc.

-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: delalloc fragmenting files?

2007-11-07 Thread Eric Sandeen
Andreas Dilger wrote:
 On Nov 06, 2007  13:54 -0600, Eric Sandeen wrote:
 Hmm bad news is when I add uninit_groups into the mix, it goes a little
 south again, with some out-of-order extents.  Not the end of the world,
 but a little unexpected?

 I think part of the issue is that by default the groups marked BLOCK_UNINIT
 are skipped, to avoid dirtying those groups if they have never been used
 before.  This policy could be changed in the mballoc code pretty easily if
 you think it is a net loss.  Note that the size of the extents is large
 enough (120MB or more) that some small reordering is probably not going
 to affect the performance in any meaningful way.

You're probably right; on the other hand, this is about the simplest
test an allocator could wish for - a single-threaded large linear write
in big IO chunks.

In this case it's probably not a big deal; I do wonder how it might
affect the bigger picture though, with more writing threads, aged
filesystems, and the like.  Just thought it was worth pointing out, as I
started looking at allocator behavior in the simple/isolated/unrealistic
:) cases.

-Eric
-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: delalloc fragmenting files?

2007-11-07 Thread Andreas Dilger
On Nov 06, 2007  13:54 -0600, Eric Sandeen wrote:
 Hmm bad news is when I add uninit_groups into the mix, it goes a little
 south again, with some out-of-order extents.  Not the end of the world,
 but a little unexpected?
 
 
 Discontinuity: Block 1430784 is at 24183810 (was 24181761)
 Discontinuity: Block 1461760 is at 24216578 (was 24214785)
 Discontinuity: Block 1492480 is at 37888 (was 24247297)
 Discontinuity: Block 1519616 is at 850944 (was 65023)
 Discontinuity: Block 1520640 is at 883712 (was 851967)
 Discontinuity: Block 1521664 is at 1670144 (was 884735)
 Discontinuity: Block 1522688 is at 2685952 (was 1671167)
 Discontinuity: Block 1523712 is at 4226048 (was 2686975)
 Discontinuity: Block 1524736 is at 11271168 (was 4227071)
 Discontinuity: Block 1525760 is at 23952384 (was 11272191)

I think part of the issue is that by default the groups marked BLOCK_UNINIT
are skipped, to avoid dirtying those groups if they have never been used
before.  This policy could be changed in the mballoc code pretty easily if
you think it is a net loss.  Note that the size of the extents is large
enough (120MB or more) that some small reordering is probably not going
to affect the performance in any meaningful way.

Cheers, Andreas
--
Andreas Dilger
Sr. Software Engineer, Lustre Group
Sun Microsystems of Canada, Inc.

-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] fix oops on corrupted ext4 mount

2007-11-07 Thread Eric Sandeen
Eric Sandeen wrote:
 When mounting an ext4 filesystem with corrupted s_first_data_block, things 
 can go very wrong and oops.
 
 Because blocks_count in ext4_fill_super is a u64, and we must use do_div, 
 the calculation of db_count is done differently than on ext4.

Urgh... than on ext3
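
For context, here is a userspace model of the arithmetic involved (a
paraphrase with illustrative constants, not the actual ext4_fill_super code)
showing why a corrupted s_first_data_block blows up the descriptor count:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	uint64_t blocks_count = 1000000;	/* total blocks in the fs */
	uint32_t first_data_block = 0xffffffff;	/* corrupted s_first_data_block */
	uint32_t blocks_per_group = 32768;
	uint32_t desc_per_block = 128;

	/* first_data_block > blocks_count makes the subtraction wrap... */
	uint64_t groups_count = (blocks_count - first_data_block - 1) /
				blocks_per_group + 1;
	/* ...so the number of group-descriptor blocks becomes absurdly
	 * large, and allocating/indexing the descriptor array goes wrong. */
	uint64_t db_count = (groups_count + desc_per_block - 1) /
			    desc_per_block;

	printf("groups_count=%llu db_count=%llu\n",
	       (unsigned long long)groups_count, (unsigned long long)db_count);
	return 0;
}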

-Eric
-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


More testing: 4x parallel 2G writes, sequential reads

2007-11-07 Thread Eric Sandeen
I tried ext4 vs. xfs doing 4 parallel 2G IO writes in 1M units to 4
different subdirectories of the root of the filesystem:

http://people.redhat.com/esandeen/seekwatcher/ext4_4_threads.png
http://people.redhat.com/esandeen/seekwatcher/xfs_4_threads.png
http://people.redhat.com/esandeen/seekwatcher/ext4_xfs_4_threads.png

and then read them back sequentially:

http://people.redhat.com/esandeen/seekwatcher/ext4_4_threads_read.png
http://people.redhat.com/esandeen/seekwatcher/xfs_4_threads_read.png
http://people.redhat.com/esandeen/seekwatcher/ext4_xfs_4_read_threads.png

At the end of the write, ext4 had on the order of 400 extents/file, xfs
had on the order of 30 extents/file.  It's clear especially from the
read graph that ext4 is interleaving the 4 files, in about 5M chunks on
average.  Throughput seems comparable between ext4 & xfs nonetheless.

Again this was on a decent HW raid so seek penalties are probably not
too bad.
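
For reference, a minimal sketch of this kind of workload (four forked
writers, 1MB units, 2GB per file; the dir0/ through dir3/ paths are assumed
to exist, and this is not the actual test harness used here):

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

#define NWRITERS 4
#define CHUNK    (1 << 20)		/* 1MB write units */
#define NCHUNKS  2048			/* 2GB per file    */

int main(void)
{
	static char buf[CHUNK];
	char path[64];
	int i, n, fd;

	memset(buf, 0xab, sizeof(buf));
	for (i = 0; i < NWRITERS; i++) {
		if (fork() == 0) {
			snprintf(path, sizeof(path), "dir%d/file", i);
			fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
			if (fd < 0) { perror(path); _exit(1); }
			for (n = 0; n < NCHUNKS; n++)
				if (write(fd, buf, CHUNK) != CHUNK) {
					perror("write"); _exit(1);
				}
			close(fd);
			_exit(0);
		}
	}
	for (i = 0; i < NWRITERS; i++)
		wait(NULL);		/* reap all four writers */
	return 0;
}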

-Eric
-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: More testing: 4x parallel 2G writes, sequential reads

2007-11-07 Thread Andreas Dilger
On Nov 07, 2007  16:42 -0600, Eric Sandeen wrote:
 I tried ext4 vs. xfs doing 4 parallel 2G IO writes in 1M units to 4
 different subdirectories of the root of the filesystem:
 
 http://people.redhat.com/esandeen/seekwatcher/ext4_4_threads.png
 http://people.redhat.com/esandeen/seekwatcher/xfs_4_threads.png
 http://people.redhat.com/esandeen/seekwatcher/ext4_xfs_4_threads.png
 
 and then read them back sequentially:
 
 http://people.redhat.com/esandeen/seekwatcher/ext4_4_threads_read.png
 http://people.redhat.com/esandeen/seekwatcher/xfs_4_threads_read.png
 http://people.redhat.com/esandeen/seekwatcher/ext4_xfs_4_read_threads.png
 
 At the end of the write, ext4 had on the order of 400 extents/file, xfs
 had on the order of 30 extents/file.  It's clear especially from the
 read graph that ext4 is interleaving the 4 files, in about 5M chunks on
 average.  Throughput seems comparable between ext4 & xfs nonetheless.

The question is what the best result is for this kind of workload?
In HPC applications the common case is that you will also have the data
files read back in parallel instead of serially.

The test shows ext4 finishing marginally faster in the write case, and
marginally slower in the read case.  What happens if you have 4 parallel
readers?

Cheers, Andreas
--
Andreas Dilger
Sr. Software Engineer, Lustre Group
Sun Microsystems of Canada, Inc.

-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: More testing: 4x parallel 2G writes, sequential reads

2007-11-07 Thread Alex Tomas

Hi,

could you try a larger preallocation, like 512/1024/2048 blocks, please?

thanks, Alex

Eric Sandeen wrote:

I tried ext4 vs. xfs doing 4 parallel 2G IO writes in 1M units to 4
different subdirectories of the root of the filesystem:

http://people.redhat.com/esandeen/seekwatcher/ext4_4_threads.png
http://people.redhat.com/esandeen/seekwatcher/xfs_4_threads.png
http://people.redhat.com/esandeen/seekwatcher/ext4_xfs_4_threads.png

and then read them back sequentially:

http://people.redhat.com/esandeen/seekwatcher/ext4_4_threads_read.png
http://people.redhat.com/esandeen/seekwatcher/xfs_4_threads_read.png
http://people.redhat.com/esandeen/seekwatcher/ext4_xfs_4_read_threads.png

At the end of the write, ext4 had on the order of 400 extents/file, xfs
had on the order of 30 extents/file.  It's clear especially from the
read graph that ext4 is interleaving the 4 files, in about 5M chunks on
average.  Throughput seems comparable between ext4 & xfs nonetheless.

Again this was on a decent HW raid so seek penalties are probably not
too bad.

-Eric



-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: More testing: 4x parallel 2G writes, sequential reads

2007-11-07 Thread Eric Sandeen
Andreas Dilger wrote:

 The question is what the best result is for this kind of workload?
 In HPC applications the common case is that you will also have the data
 files read back in parallel instead of serially.

Agreed, I'm not trying to argue what's better or worse, I'm just seeing
what it's doing.

The main reason I did sequential reads back is that it more clearly
shows the file layout for each file on the graph.  :)  I'm just getting
a handle on how the allocations are going for various types of writes.

 The test shows ext4 finishing marginally faster in the write case, and
 marginally slower in the read case.  What happens if you have 4 parallel
 readers?

I'll test that a bit later (have to run now); I expect parallel readers
may go faster, since the blocks are interleaved, and it might be able to
suck them up pretty much in order across all 4 files.

I'd also like to test some of this under a single head, rather than on
HW raid...

-Eric
-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] [PATCH 3/3] Recursive mtime for ext3

2007-11-07 Thread Theodore Tso
On Wed, Nov 07, 2007 at 03:36:05PM +0100, Jan Kara wrote:
  What if more than one application wants to use this facility?

   That should be fine - let's see: Each application keeps somewhere a time
 when it started a scan of a subtree (or it can actually remember a time when
 it set the flag for each directory); during the scan, it sets the flag on
 each directory. When it wakes up to recheck the subtree it just compares
 the rtime against the stored time - if rtime is greater, the subtree has been
 modified since the last scan and we recurse into it, and when we are finished
 with it we set the flag. Now notice that we don't care about the flag when
 we check for changes - we care only about rtime - so if there are several
 applications interested in the same subtree, the flag just gets set more
 often and thus the update of rtime happens more often, but the same scheme
 still works fine.

OK, so in this case you don't need to set rtime on every single
file inode, but only on the directory inode, right?  Because you're only
checking the rtime at the directory level, and not the flag.
And it's just as easy for you to check the rtime flag for the file's
containing directory (modulo magic vis-a-vis hard links) as the file's
inode.

I'm just really wishing that rtime and the rtime flag didn't have to live
on disk, but could rather be in memory.  If you only needed to save
the directory flags and rtimes, that might actually be doable.

Note by the way that since you need to own the file/directory to set
flags, this means that only programs that are running as root or
running as the uid who owns the entire subtree will be able to use
this scheme.  One advantage of doing in kernel memory is that you
might be able to support watching a tree that is not owned by the
watcher.

   I don't get it here - you need to scan the whole subtree and set the flag
 only during the initial scan. Later, you need to scan and set the flag only
 for directories in whose subtree something changed. Similarly rtime needs
 to be updated for each inode at most once after the scan. 

OK, so in the worst case every single file in a kernel source tree
might change after doing an extreme git checkout.  That means around
36k of files get updated.  So if you have to set/clear the rtime flag
during the checkout process, 36k file inodes would have to have their
rtime flag cleared, plus 2k worth of directory inodes; but those would
probably be folded into other changes made to the inodes anyway.  But
then when trackerd goes back and scans the subtree, if you are
actually setting rtime flags for every single file inode, then that's
38k of inodes that need updating.  If you only need to set the rtime
flags for directories, that's only 2k worth of extra gratuitous inode
updates.
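
A rough sketch of such a watcher loop follows; get_rtime() and
set_rtime_flag() are hypothetical stand-ins, since the proposed rtime field
and EXT3_RTIME_FL flag have no real userspace interface here:

#define _DEFAULT_SOURCE
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Hypothetical stand-ins for the patch's interface; no such API exists: */
static time_t get_rtime(const char *dir)      { (void)dir; return 0; }
static void   set_rtime_flag(const char *dir) { (void)dir; }

/* Revisit only those parts of the tree whose rtime is newer than the time
 * recorded at the start of the previous scan (last_scan). */
static void rescan(const char *dir, time_t last_scan)
{
	DIR *d;
	struct dirent *de;
	char path[4096];

	if (get_rtime(dir) <= last_scan)
		return;			/* nothing below here changed */

	/* ... handle modified entries in this directory ... */

	d = opendir(dir);
	if (!d)
		return;
	while ((de = readdir(d)) != NULL) {
		if (de->d_type != DT_DIR || !strcmp(de->d_name, ".") ||
		    !strcmp(de->d_name, ".."))
			continue;
		snprintf(path, sizeof(path), "%s/%s", dir, de->d_name);
		rescan(path, last_scan);
	}
	closedir(d);
	set_rtime_flag(dir);		/* re-arm propagation for the next scan */
}

Recording time(NULL) just before each pass gives the last_scan value for the
next round.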

- Ted
-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: More testing: 4x parallel 2G writes, sequential reads

2007-11-07 Thread Eric Sandeen
Shapor Naghibzadeh wrote:
 On Wed, Nov 07, 2007 at 04:42:59PM -0600, Eric Sandeen wrote:
 Again this was on a decent HW raid so seek penalties are probably not
 too bad.
 
 You may want to verify that by doing a benchmark on the raw device.  I
 recently did some benchmarks doing random I/O on a Dell 2850 w/ a PERC
 (megaraid) RAID5 w/ 128MB onboard writeback cache and 6x 15krpm drives
 and noticed approximately one order of magnitude throughput drop on
 small (stripe-sized) random reads versus linear.  It maxed out at ~100
 random read IOPs or seeks/sec (surprisingly low).
 
 Out of curiosity, how are you counting the seeks?

Chris Mason's seekwatcher (google can find it for you) is doing the
graphing; it uses blktrace for the raw data.

-Eric

 Shapor

-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: More testing: 4x parallel 2G writes, sequential reads

2007-11-07 Thread Shapor Naghibzadeh
On Wed, Nov 07, 2007 at 04:42:59PM -0600, Eric Sandeen wrote:
 Again this was on a decent HW raid so seek penalties are probably not
 too bad.

You may want to verify that by doing a benchmark on the raw device.  I
recently did some benchmarks doing random I/O on a Dell 2850 w/ a PERC
(megaraid) RAID5 w/ 128MB onboard writeback cache and 6x 15krpm drives
and noticed approximately one order of magnitude throughput drop on
small (stripe-sized) random reads versus linear.  It maxed out at ~100
random read IOPs or seeks/sec (surprisingly low).

Out of curiosity, how are you counting the seeks?

Shapor
-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: More testing: 4x parallel 2G writes, sequential reads

2007-11-07 Thread Eric Sandeen
Andreas Dilger wrote:

 The test shows ext4 finishing marginally faster in the write case, and
 marginally slower in the read case.  What happens if you have 4 parallel
 readers?

http://people.redhat.com/esandeen/seekwatcher/ext4_4_thread_par_read.png
http://people.redhat.com/esandeen/seekwatcher/xfs_4_thread_par_read.png
http://people.redhat.com/esandeen/seekwatcher/ext4_xfs_4_thread_par_read.png

-Eric
-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html