Re: delalloc fragmenting files?

2007-11-02 Thread Alex Tomas

Eric,

would you mind repeating the run and then grabbing /proc/fs/ext4/dev/mb_history?

thanks in advance, Alex

Eric Sandeen wrote:

 Eric Sandeen wrote:

  Alex Tomas wrote:

   Please try the attached patch.

  Looks quite a bit better:

  http://people.redhat.com/esandeen/seekwatcher/ext4-alex.png
  http://people.redhat.com/esandeen/seekwatcher/ext4-alex-ext4-dd-write.png
  http://people.redhat.com/esandeen/seekwatcher/ext4-alex-ext4-xfs-dd-write.png

  It is much less fragmented, although still not exactly the nice linear
  allocation I'd expect from a single-threaded large write on a fresh fs...


 Note that we're still getting out-of-order extents, too:

 First block: 122880
 Last block: 2694143
 Discontinuity: Block 7424 is at 101376 (was 130303)
 Discontinuity: Block 28160 is at 133120 (was 122111)
 Discontinuity: Block 58368 is at 188416 (was 163327)
 Discontinuity: Block 66304 is at 180224 (was 196351)
 Discontinuity: Block 73984 is at 172032 (was 187903)
 Discontinuity: Block 81664 is at 167936 (was 179711)
 Discontinuity: Block 84736 is at 221184 (was 171007)
 Discontinuity: Block 92416 is at 212992 (was 228863)
 Discontinuity: Block 100096 is at 204800 (was 220671)
 Discontinuity: Block 107776 is at 198656 (was 212479)
 ...

 I'm trying to find time to look into this, but other things are knocking
 at my door, so no promises...

 -Eric



-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: delalloc fragmenting files?

2007-11-02 Thread Eric Sandeen
Alex Tomas wrote:
 Eric,
 
 would you mind repeating the run and then grabbing /proc/fs/ext4/dev/mb_history?
 
 thanks in advance, Alex

Sure thing, it's attached. This is from a 1024x1M run, which wound up with 32
fragments, out of order:

First block: 122880
Last block: 491519
Discontinuity: Block 7424 is at 114688 (was 130303)
Discontinuity: Block 15616 is at 106496 (was 122879)
Discontinuity: Block 23552 is at 102400 (was 114431)
Discontinuity: Block 26624 is at 155648 (was 105471)
Discontinuity: Block 33792 is at 147456 (was 162815)
Discontinuity: Block 41472 is at 162816 (was 155135)
Discontinuity: Block 42496 is at 188416 (was 163839)
Discontinuity: Block 50176 is at 180224 (was 196095)
Discontinuity: Block 58368 is at 172032 (was 188415)
Discontinuity: Block 66560 is at 167936 (was 180223)


PID 3695 is pdflush; I assume 7634 was the dd itself.

-Eric


mb_history.bz2
Description: application/bzip


Re: [PATCH][RFC]JBD2: Fix journal checksum kernel oops on NUMA

2007-11-02 Thread Badari Pulavarty
On Fri, 2007-11-02 at 13:20 +0800, Andreas Dilger wrote:
 On Nov 01, 2007  17:40 -0700, Mingming Cao wrote:
  The current journal checksumming patch failed the fsstress test on NUMA. The
  bh->b_data passed to the crc32_be() function could be a NULL pointer,
  which caused a kernel oops immediately when running fsstress with -o
  journal_checksum. This is because the page is in highmem on a NUMA box.
  We need to kmap the page before accessing bh->b_data to calculate
  the checksums.
 
 I have no objection to the patch, per se, but I'm surprised that there
 would ever be a buffer head pointing at a page in high memory?  That
 seems contrary to what I would expect...

I was surprised to see that too while helping Mingming/Avantika track
this issue. I was under the impression that we were checksumming only
metadata and that it should be in lowmem. But only the buffer_heads
themselves are in lowmem; the pages they point to can be in highmem.

Thanks,
Badari



[PATCH][RFC]Ext4: Use get_cpu()/put_cpu() in preemptible context

2007-11-02 Thread Mingming Cao
Fix the following warning:

printk: 369 messages suppressed.
BUG: using smp_processor_id() in preemptible [0001] code: fsx-linux/31702
caller is ext4_mb_new_blocks+0x2aa/0x1319

Signed-off-by: Mingming Cao [EMAIL PROTECTED]

---
 fs/ext4/mballoc.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Index: linux-2.6.24-rc1/fs/ext4/mballoc.c
===
--- linux-2.6.24-rc1.orig/fs/ext4/mballoc.c	2007-11-02 17:22:18.0 -0700
+++ linux-2.6.24-rc1/fs/ext4/mballoc.c	2007-11-02 17:23:02.0 -0700
@@ -4006,7 +4006,8 @@ static void ext4_mb_group_or_file(struct
 		return;
 
 	BUG_ON(ac->ac_lg != NULL);
-	ac->ac_lg = sbi->s_locality_groups[smp_processor_id()];
+	ac->ac_lg = sbi->s_locality_groups[get_cpu()];
+	put_cpu();
 
 	/* we're going to use group allocation */
 	ac->ac_flags |= EXT4_MB_HINT_GROUP_ALLOC;




Re: [PATCH][RFC]JBD2: Fix journal checksum kernel oops on NUMA

2007-11-02 Thread Andreas Dilger
On Nov 02, 2007  08:31 -0800, Badari Pulavarty wrote:
 On Fri, 2007-11-02 at 13:20 +0800, Andreas Dilger wrote:
  On Nov 01, 2007  17:40 -0700, Mingming Cao wrote:
   The current journal checksumming patch failed the fsstress test on NUMA. The
   bh->b_data passed to the crc32_be() function could be a NULL pointer,
   which caused a kernel oops immediately when running fsstress with -o
   journal_checksum. This is because the page is in highmem on a NUMA box.
   We need to kmap the page before accessing bh->b_data to calculate
   the checksums.
  
  I have no objection to the patch, per se, but I'm surprised that there
  would ever be a buffer head pointing at a page in high memory?  That
  seems contrary to what I would expect...
 
 I was surprised to see that too while helping Mingming/Avantika track
 this issue. I was under the impression that we were checksumming only
 metadata and that it should be in lowmem. But only the buffer_heads
 themselves are in lowmem; the pages they point to can be in highmem.

But...  this implies that every user of bh->b_data needs to kmap, and I
don't see that in the code anywhere else.  That makes me think something
else is going wrong here.

Cheers, Andreas
--
Andreas Dilger
Sr. Software Engineer, Lustre Group
Sun Microsystems of Canada, Inc.



Re: [PATCH][RFC]Ext4: Use get_cpu()/put_cpu() in preemptible context

2007-11-02 Thread Andreas Dilger
On Nov 02, 2007  17:35 -0700, Mingming Cao wrote:
 Index: linux-2.6.24-rc1/fs/ext4/mballoc.c
 ===
 --- linux-2.6.24-rc1.orig/fs/ext4/mballoc.c	2007-11-02 17:22:18.0 -0700
 +++ linux-2.6.24-rc1/fs/ext4/mballoc.c	2007-11-02 17:23:02.0 -0700
 @@ -4006,7 +4006,8 @@ static void ext4_mb_group_or_file(struct
  		return;
  
  	BUG_ON(ac->ac_lg != NULL);
 -	ac->ac_lg = sbi->s_locality_groups[smp_processor_id()];
 +	ac->ac_lg = sbi->s_locality_groups[get_cpu()];
 +	put_cpu();
  
  	/* we're going to use group allocation */
  	ac->ac_flags |= EXT4_MB_HINT_GROUP_ALLOC;

Shouldn't the put_cpu() come after ac->ac_lg is no longer being used?
I guess there would otherwise be a danger of other processes using
the same s_locality_groups[] struct?

Cheers, Andreas
