Re: delalloc fragmenting files?
Eric, would you mind repeating the run and then grabbing /proc/fs/ext4/dev/mb_history?

thanks in advance, Alex

Eric Sandeen wrote:
> Eric Sandeen wrote:
>> Alex Tomas wrote:
>>> please try the attached patch.
>> Looks quite a bit better:
>>
>> http://people.redhat.com/esandeen/seekwatcher/ext4-alex.png
>> http://people.redhat.com/esandeen/seekwatcher/ext4-alex-ext4-dd-write.png
>> http://people.redhat.com/esandeen/seekwatcher/ext4-alex-ext4-xfs-dd-write.png
>>
>> It is much less fragmented, although still not exactly the nice linear allocation I'd expect from a single-threaded large write on a fresh fs...
> Note, we're still getting out-of-order extents, too:
>
> First block: 122880
> Last block: 2694143
> Discontinuity: Block 7424 is at 101376 (was 130303)
> Discontinuity: Block 28160 is at 133120 (was 122111)
> Discontinuity: Block 58368 is at 188416 (was 163327)
> Discontinuity: Block 66304 is at 180224 (was 196351)
> Discontinuity: Block 73984 is at 172032 (was 187903)
> Discontinuity: Block 81664 is at 167936 (was 179711)
> Discontinuity: Block 84736 is at 221184 (was 171007)
> Discontinuity: Block 92416 is at 212992 (was 228863)
> Discontinuity: Block 100096 is at 204800 (was 220671)
> Discontinuity: Block 107776 is at 198656 (was 212479)
> ...
>
> I'm trying to find time to look into this, but other things are knocking at my door, so no promises...
>
> -Eric

-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
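Each "Discontinuity" line reads as "logical block N is now at physical block X; the previous logical block was at physical block Y". The numbers above fit that reading: with the first block at 122880, logical block 7423 sits at 122880 + 7423 = 130303, which is the "was" value on the first line. A minimal sketch of producing such a report, assuming the file's block map is already available as an array `map[i]` = physical block of logical block i; `report_discontinuities` is a hypothetical helper, not filefrag's actual source:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper (not filefrag's source): map[i] holds the physical
 * block backing logical block i. Print one "Discontinuity" line whenever
 * logical block i does not directly follow the physical block of i - 1,
 * and return the number of extents (contiguous physical runs).
 */
static size_t report_discontinuities(const unsigned long *map, size_t n)
{
    assert(map != NULL || n == 0);
    if (n == 0)
        return 0;

    size_t extents = 1;
    printf("First block: %lu\n", map[0]);
    for (size_t i = 1; i < n; i++) {
        if (map[i] != map[i - 1] + 1) {
            printf("Discontinuity: Block %zu is at %lu (was %lu)\n",
                   i, map[i], map[i - 1]);
            extents++;
        }
    }
    return extents;
}
```

For the map {100, 101, 102, 50, 51, 200} this prints "Discontinuity: Block 3 is at 50 (was 102)" and "Discontinuity: Block 5 is at 200 (was 51)" and returns 3 extents.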
Re: delalloc fragmenting files?
Alex Tomas wrote:
> Eric, would you mind repeating the run and then grabbing /proc/fs/ext4/dev/mb_history?
>
> thanks in advance, Alex

Sure thing, attached; this is from a 1024x1M run, and it wound up with 32 fragments, out of order:

First block: 122880
Last block: 491519
Discontinuity: Block 7424 is at 114688 (was 130303)
Discontinuity: Block 15616 is at 106496 (was 122879)
Discontinuity: Block 23552 is at 102400 (was 114431)
Discontinuity: Block 26624 is at 155648 (was 105471)
Discontinuity: Block 33792 is at 147456 (was 162815)
Discontinuity: Block 41472 is at 162816 (was 155135)
Discontinuity: Block 42496 is at 188416 (was 163839)
Discontinuity: Block 50176 is at 180224 (was 196095)
Discontinuity: Block 58368 is at 172032 (was 188415)
Discontinuity: Block 66560 is at 167936 (was 180223)

pid 3695 is pdflush; I assume 7634 was the dd itself.

-Eric

Attachment: mb_history.bz2 (application/bzip)
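"Out of order" here means an extent starts at a lower physical block than the end of the extent before it. A small sketch that counts such backward jumps in the ten discontinuities quoted above; the struct and function names are illustrative only:

```c
#include <assert.h>
#include <stddef.h>

/* One reported discontinuity: the logical block, its new physical start
 * ("is at"), and the physical block of the previous logical block ("was"). */
struct discontinuity {
    unsigned long logical;
    unsigned long at;
    unsigned long was;
};

/* Count extents that start below the end of the extent before them,
 * i.e. jumps that move backwards on disk. */
static size_t count_backward_jumps(const struct discontinuity *d, size_t n)
{
    size_t back = 0;
    for (size_t i = 0; i < n; i++)
        if (d[i].at < d[i].was)
            back++;
    return back;
}

/* The ten discontinuities quoted in the 1024x1M report. */
static const struct discontinuity report[] = {
    {  7424, 114688, 130303 }, { 15616, 106496, 122879 },
    { 23552, 102400, 114431 }, { 26624, 155648, 105471 },
    { 33792, 147456, 162815 }, { 41472, 162816, 155135 },
    { 42496, 188416, 163839 }, { 50176, 180224, 196095 },
    { 58368, 172032, 188415 }, { 66560, 167936, 180223 },
};
```

Seven of the ten listed jumps go backwards. For example, the extent at 114688 covers logical blocks 7424 to 15615 (8192 blocks) and ends at 122879, directly below the first extent at 122880, so consecutive chunks of the file were placed in descending order on disk.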
Re: [PATCH][RFC]JBD2: Fix journal checksum kernel oops on NUMA
On Fri, 2007-11-02 at 13:20 +0800, Andreas Dilger wrote:
> On Nov 01, 2007 17:40 -0700, Mingming Cao wrote:
> > The current journal checksumming patch fails the fsstress test on NUMA. The bh->b_data passed to the crc32_be() function could be a NULL pointer, which caused a kernel oops immediately when running fsstress with -o journal_checksum. This is because the page is part of highmem on a NUMA box. We need to kmap the page before accessing bh->b_data to calculate the checksums.
>
> I have no objection to the patch, per se, but I'm surprised that there would ever be a buffer head pointing at a page in high memory? That seems contrary to what I would expect...

I was surprised to see that too while helping Mingming/Avantika track this issue. I was under the impression that we are checksumming only metadata and that it should be in lowmem. But only the buffer_heads themselves are in lowmem; the pages they point to can be in highmem.

Thanks,
Badari
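Why b_data can look like a NULL pointer: in the kernel, set_bh_page() stores page_address(page) + offset in b_data for a lowmem page, but only the offset for a highmem page, since a highmem page has no permanent kernel mapping. Passing that value straight to crc32_be() therefore dereferences an address near 0. The following is a user-space model of that idea with hypothetical `model_*` names (it is not kernel code): the safe path maps the page first (standing in for kmap()) and adds the in-page offset (standing in for bh_offset()). The checksum is a bitwise version of the big-endian CRC32 (polynomial 0x04C11DB7, most-significant bit first) that crc32_be() computes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a page; 'highmem' means no permanent mapping. */
struct model_page {
    int highmem;
    unsigned char frame[4096];    /* stands in for the physical page */
};

/* Hypothetical model of a buffer_head. */
struct model_bh {
    struct model_page *b_page;
    uintptr_t b_data;             /* lowmem: real address; highmem: offset only */
    size_t b_size;
};

/* Mirrors the idea of set_bh_page(): a highmem page only records the offset. */
static void model_set_bh_page(struct model_bh *bh, struct model_page *page,
                              size_t offset)
{
    bh->b_page = page;
    bh->b_data = page->highmem ? (uintptr_t)offset
                               : (uintptr_t)(page->frame + offset);
}

/* Stands in for kmap(): yields a usable address for any page. */
static unsigned char *model_kmap(struct model_page *page)
{
    return page->frame;
}

/* Stands in for bh_offset(): the buffer's offset within its page. */
static size_t model_bh_offset(const struct model_bh *bh)
{
    if (bh->b_page->highmem)
        return (size_t)bh->b_data;
    return (size_t)(bh->b_data - (uintptr_t)bh->b_page->frame);
}

/* Safe access: map the page, then add the offset. */
static const unsigned char *model_bh_bytes(struct model_bh *bh)
{
    return model_kmap(bh->b_page) + model_bh_offset(bh);
}

/* Bitwise MSB-first CRC32, polynomial 0x04C11DB7, caller supplies the seed. */
static uint32_t model_crc32_be(uint32_t crc, const unsigned char *p, size_t len)
{
    while (len--) {
        crc ^= (uint32_t)*p++ << 24;
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 0x80000000u) ? (crc << 1) ^ 0x04C11DB7u : crc << 1;
    }
    return crc;
}
```

With a highmem model page, b_data holds just the offset (for example 512), while model_bh_bytes() still reaches the right bytes; seeding the CRC with 0xFFFFFFFF over "123456789" gives the standard check value 0x0376E6E7 for this CRC variant.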
[PATCH][RFC]Ext4: Use get_cpu()/put_cpu() in preemptible context
Fix the following warning:

printk: 369 messages suppressed.
BUG: using smp_processor_id() in preemptible [0001] code: fsx-linux/31702
caller is ext4_mb_new_blocks+0x2aa/0x1319

Signed-off-by: Mingming Cao [EMAIL PROTECTED]
---
 fs/ext4/mballoc.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Index: linux-2.6.24-rc1/fs/ext4/mballoc.c
===
--- linux-2.6.24-rc1.orig/fs/ext4/mballoc.c	2007-11-02 17:22:18.0 -0700
+++ linux-2.6.24-rc1/fs/ext4/mballoc.c	2007-11-02 17:23:02.0 -0700
@@ -4006,7 +4006,8 @@ static void ext4_mb_group_or_file(struct
 		return;
 
 	BUG_ON(ac->ac_lg != NULL);
-	ac->ac_lg = sbi->s_locality_groups[smp_processor_id()];
+	ac->ac_lg = sbi->s_locality_groups[get_cpu()];
+	put_cpu();
 
 	/* we're going to use group allocation */
 	ac->ac_flags |= EXT4_MB_HINT_GROUP_ALLOC;
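The warning comes from a debug check that fires when smp_processor_id() runs with preemption enabled, because the task could be moved to another CPU right after reading the id. get_cpu() disables preemption and then returns the CPU id; put_cpu() re-enables preemption. A simplified single-threaded model of that relationship, with hypothetical `model_*` names (not kernel code, and ignoring the check's special cases):

```c
#include <assert.h>

static int model_preempt_count;   /* > 0 means preemption is disabled */
static int model_cpu = 3;         /* CPU the task happens to be running on */
static int model_warnings;        /* times the debug check fired */

/* Models the debug check: reading the CPU id is only meaningful while
 * preemption is disabled. */
static int model_smp_processor_id(void)
{
    if (model_preempt_count == 0)
        model_warnings++;         /* "using smp_processor_id() in preemptible code" */
    return model_cpu;
}

/* get_cpu(): disable preemption, then read the CPU id. */
static int model_get_cpu(void)
{
    model_preempt_count++;        /* preempt_disable() */
    return model_smp_processor_id();
}

/* put_cpu(): re-enable preemption. */
static void model_put_cpu(void)
{
    assert(model_preempt_count > 0);
    model_preempt_count--;        /* preempt_enable() */
}
```

The id returned by model_get_cpu() is only guaranteed to be the current CPU until model_put_cpu(); after that, a real task may migrate, which matters if the chosen per-CPU object is still in use.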
Re: [PATCH][RFC]JBD2: Fix journal checksum kernel oops on NUMA
On Nov 02, 2007 08:31 -0800, Badari Pulavarty wrote:
> On Fri, 2007-11-02 at 13:20 +0800, Andreas Dilger wrote:
> > On Nov 01, 2007 17:40 -0700, Mingming Cao wrote:
> > > The current journal checksumming patch fails the fsstress test on NUMA. The bh->b_data passed to the crc32_be() function could be a NULL pointer, which caused a kernel oops immediately when running fsstress with -o journal_checksum. This is because the page is part of highmem on a NUMA box. We need to kmap the page before accessing bh->b_data to calculate the checksums.
> >
> > I have no objection to the patch, per se, but I'm surprised that there would ever be a buffer head pointing at a page in high memory? That seems contrary to what I would expect...
>
> I was surprised to see that too while helping Mingming/Avantika track this issue. I was under the impression that we are checksumming only metadata and that it should be in lowmem. But only the buffer_heads themselves are in lowmem; the pages they point to can be in highmem.

But... this implies that every user of bh->b_data needs to kmap, and I don't see that anywhere else in the code. That makes me think something else is going wrong here.

Cheers, Andreas
--
Andreas Dilger
Sr. Software Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
Re: [PATCH][RFC]Ext4: Use get_cpu()/put_cpu() in preemptible context
On Nov 02, 2007 17:35 -0700, Mingming Cao wrote:
> Index: linux-2.6.24-rc1/fs/ext4/mballoc.c
> ===
> --- linux-2.6.24-rc1.orig/fs/ext4/mballoc.c	2007-11-02 17:22:18.0 -0700
> +++ linux-2.6.24-rc1/fs/ext4/mballoc.c	2007-11-02 17:23:02.0 -0700
> @@ -4006,7 +4006,8 @@ static void ext4_mb_group_or_file(struct
>  		return;
>  
>  	BUG_ON(ac->ac_lg != NULL);
> -	ac->ac_lg = sbi->s_locality_groups[smp_processor_id()];
> +	ac->ac_lg = sbi->s_locality_groups[get_cpu()];
> +	put_cpu();
>  
>  	/* we're going to use group allocation */
>  	ac->ac_flags |= EXT4_MB_HINT_GROUP_ALLOC;

Shouldn't the put_cpu() be after ac->ac_lg is no longer being used? I guess there would otherwise be a danger of other processes using the same s_locality_groups[] struct?

Cheers, Andreas
--
Andreas Dilger
Sr. Software Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
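The concern is general: once preemption is re-enabled, the task can migrate, so two tasks can end up holding the same per-CPU slot. Keeping preemption disabled for the whole use only works if nothing in that span can sleep; another option is to treat the CPU id as a hint and protect each slot with its own lock. A user-space analogue of the second option, assuming glibc's sched_getcpu() and POSIX mutexes; `struct group`, `NR_SLOTS`, and the helpers are hypothetical and are not ext4's locality-group code:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <sched.h>

#define NR_SLOTS 64   /* hypothetical upper bound on CPU ids */

/* Hypothetical stand-in for a per-CPU locality group. */
struct group {
    pthread_mutex_t lock;
    unsigned long allocations;
};

static struct group groups[NR_SLOTS];

static void groups_init(void)
{
    for (int i = 0; i < NR_SLOTS; i++) {
        pthread_mutex_init(&groups[i].lock, NULL);
        groups[i].allocations = 0;
    }
}

/* Pick the slot for the CPU we are on now. The id is only a hint: the
 * thread may migrate right after sched_getcpu() returns, so another thread
 * can pick the same slot. The mutex, not the CPU id, provides exclusion. */
static struct group *group_lock_current(void)
{
    int cpu = sched_getcpu();
    if (cpu < 0)
        cpu = 0;                  /* sched_getcpu() failed: fall back to slot 0 */
    struct group *g = &groups[cpu % NR_SLOTS];
    pthread_mutex_lock(&g->lock);
    return g;
}

static void group_unlock(struct group *g)
{
    pthread_mutex_unlock(&g->lock);
}
```

A caller brackets every use of the slot with group_lock_current() and group_unlock(), so a migration between picking the slot and using it costs only some lock contention, not corruption.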