Re: ext4 write performance regression in 3.6-rc1 on RAID0/5
On Wed, 22 Aug 2012 13:47:07 -0700 Dan Williams wrote:
> On Tue, Aug 21, 2012 at 11:00 PM, NeilBrown wrote:
> > On Wed, 22 Aug 2012 11:57:02 +0800 Yuanhan Liu wrote:
> > >
> > > -#define NR_STRIPES		256
> > > +#define NR_STRIPES		1024
> >
> > Changing one magic number into another magic number might help your
> > case, but it's not really a general solution.
> >
> > Possibly making sure that max_nr_stripes is at least some multiple of
> > the chunk size might make sense, but I wouldn't want to see a very
> > large multiple.
> >
> > I think the problems with RAID5 are deeper than that. Hopefully I'll
> > figure out exactly what the best fix is soon - I'm trying to look into it.
> >
> > I don't think the size of the cache is a big part of the solution. I
> > think correct scheduling of IO is the real answer.
>
> Not sure if this is what we are seeing here, but we still have the
> unresolved fast parity effect whereby slower parity calculation gives
> a larger time to coalesce writes. I saw this effect when playing with
> xor offload.

I did find a case where inserting a printk made it go faster again.
Replacing that with msleep(2) worked as well. :-)
I'm looking for a more robust solution though.

Thanks for the reminder.

NeilBrown
Re: ext4 write performance regression in 3.6-rc1 on RAID0/5
On Tue, Aug 21, 2012 at 11:00 PM, NeilBrown wrote:
> On Wed, 22 Aug 2012 11:57:02 +0800 Yuanhan Liu wrote:
> >
> > -#define NR_STRIPES		256
> > +#define NR_STRIPES		1024
>
> Changing one magic number into another magic number might help your
> case, but it's not really a general solution.
>
> Possibly making sure that max_nr_stripes is at least some multiple of
> the chunk size might make sense, but I wouldn't want to see a very
> large multiple.
>
> I think the problems with RAID5 are deeper than that. Hopefully I'll
> figure out exactly what the best fix is soon - I'm trying to look into it.
>
> I don't think the size of the cache is a big part of the solution. I
> think correct scheduling of IO is the real answer.

Not sure if this is what we are seeing here, but we still have the
unresolved fast parity effect whereby slower parity calculation gives
a larger time to coalesce writes. I saw this effect when playing with
xor offload.

--
Dan
Re: ext4 write performance regression in 3.6-rc1 on RAID0/5
On 2012-08-22, at 12:00 AM, NeilBrown wrote:
> On Wed, 22 Aug 2012 11:57:02 +0800 Yuanhan Liu wrote:
> >
> > -#define NR_STRIPES		256
> > +#define NR_STRIPES		1024
>
> Changing one magic number into another magic number might help your
> case, but it's not really a general solution.

We've actually been carrying a patch in Lustre for a few years that
increases NR_STRIPES to 2048 and makes it a configurable module
parameter. This made a noticeable improvement to performance on fast
systems.

> Possibly making sure that max_nr_stripes is at least some multiple of
> the chunk size might make sense, but I wouldn't want to see a very
> large multiple.
>
> I think the problems with RAID5 are deeper than that. Hopefully I'll
> figure out exactly what the best fix is soon - I'm trying to look into it.

The other MD RAID-5/6 patches that we have change the page submission
order to avoid the need to merge pages in the elevator so much, plus a
patch to allow zero-copy IO submission if the caller marks the page for
direct IO (indicating it will not be modified until after IO completes).
This avoids a lot of overhead on fast systems.

This isn't really my area of expertise, but the patches against RHEL6
can be seen at http://review.whamcloud.com/1142 if you want to take a
look. I don't know if that code is at all relevant to what is in 3.x
today.

> I don't think the size of the cache is a big part of the solution. I
> think correct scheduling of IO is the real answer.

My experience is that on fast systems the IO scheduler just gets in the
way. Submitting larger contiguous IOs to each disk in the first place
is far better than trying to merge small IOs again at the back end.

Cheers, Andreas
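The Lustre patch Andreas mentions is not included in the thread. As a
rough sketch of the "configurable module parameter" approach he
describes - the parameter name, default, and this demo module are
assumptions, not the Whamcloud code - it could look like:

/*
 * Minimal sketch of exposing the stripe-cache default as a module
 * parameter, in the spirit of the Lustre-carried patch described
 * above. A real change would plumb the value into raid5.c's cache
 * setup in place of the hard-coded NR_STRIPES; this standalone demo
 * module only reports what it would do.
 */
#include <linux/module.h>
#include <linux/init.h>

static int nr_stripes = 2048;		/* Lustre reportedly used 2048 */
module_param(nr_stripes, int, 0444);
MODULE_PARM_DESC(nr_stripes, "initial number of raid5/6 stripe heads");

static int __init stripe_param_init(void)
{
	pr_info("would initialise the stripe cache with %d heads\n",
		nr_stripes);
	return 0;
}

static void __exit stripe_param_exit(void)
{
}

module_init(stripe_param_init);
module_exit(stripe_param_exit);
MODULE_LICENSE("GPL");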
Re: ext4 write performance regression in 3.6-rc1 on RAID0/5
On Wed, Aug 22, 2012 at 04:00:25PM +1000, NeilBrown wrote:
> On Wed, 22 Aug 2012 11:57:02 +0800 Yuanhan Liu wrote:
> >
> > -#define NR_STRIPES		256
> > +#define NR_STRIPES		1024
>
> Changing one magic number into another magic number might help your
> case, but it's not really a general solution.

Agreed.

> Possibly making sure that max_nr_stripes is at least some multiple of
> the chunk size might make sense, but I wouldn't want to see a very
> large multiple.
>
> I think the problems with RAID5 are deeper than that. Hopefully I'll
> figure out exactly what the best fix is soon - I'm trying to look into it.
>
> I don't think the size of the cache is a big part of the solution. I
> think correct scheduling of IO is the real answer.

Yes, it should not be. But with a smaller max_nr_stripes the chance of
getting a full stripe write is lower, and maybe that's why we block at
get_active_stripe() more often; it also means more reading. The perfect
case would be no reading at all: with max_nr_stripes set to 32768 (the
maximum that can be set), the reads almost drop to zero (please see the
iostat output attached to my former email).

Anyway, I do agree this should not be the big part of the solution. If
we can handle those stripes faster, I guess 256 would be enough.

Thanks,
Yuanhan Liu
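For experiments like the 32768 run above, md already exposes the cache
size at runtime through the standard stripe_cache_size sysfs attribute,
so the different values can be tried without rebuilding the kernel.
A trivial userspace helper (the helper itself is a hypothetical
convenience, not part of the thread) might look like:

/*
 * Set an md array's stripe cache at runtime via the sysfs attribute
 * /sys/block/<md>/md/stripe_cache_size -- the tunable for which the
 * NR_STRIPES #define only chooses the boot-time default.
 *
 * Usage: ./stripe_cache md0 32768
 */
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
	char path[128];
	FILE *f;

	if (argc != 3) {
		fprintf(stderr, "usage: %s <md-device> <nr-stripes>\n",
			argv[0]);
		return 1;
	}

	snprintf(path, sizeof(path), "/sys/block/%s/md/stripe_cache_size",
		 argv[1]);
	f = fopen(path, "w");
	if (!f) {
		perror(path);
		return 1;
	}
	fprintf(f, "%d\n", atoi(argv[2]));
	return fclose(f) == 0 ? 0 : 1;
}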
Re: ext4 write performance regression in 3.6-rc1 on RAID0/5
On Wed, 22 Aug 2012 11:57:02 +0800 Yuanhan Liu wrote:

> -#define NR_STRIPES		256
> +#define NR_STRIPES		1024

Changing one magic number into another magic number might help your
case, but it's not really a general solution.

Possibly making sure that max_nr_stripes is at least some multiple of
the chunk size might make sense, but I wouldn't want to see a very
large multiple.

I think the problems with RAID5 are deeper than that. Hopefully I'll
figure out exactly what the best fix is soon - I'm trying to look into it.

I don't think the size of the cache is a big part of the solution. I
think correct scheduling of IO is the real answer.

Thanks,
NeilBrown
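To make the "multiple of the chunk size" idea concrete, here is a
back-of-envelope sketch. The chunk size and multiple are assumed
example values, and the one-page-per-device figure reflects the
STRIPE_SIZE == PAGE_SIZE convention in raid5.c of this era; none of
this is from the thread itself:

#include <stdio.h>

/*
 * Sketch of deriving a stripe-cache floor from the chunk size instead
 * of a fixed magic number. Each stripe_head covers one 4K page per
 * member device, so handling one full chunk per device needs
 * chunk_size/4K stripe_heads.
 */
int main(void)
{
	const int chunk_kb = 512;	/* example chunk size */
	const int stripe_kb = 4;	/* one page per device per stripe_head */
	const int multiple = 4;		/* a small multiple, per the text */

	int per_chunk = chunk_kb / stripe_kb;	/* 128 stripe_heads/chunk */
	int floor = multiple * per_chunk;	/* 512 */

	printf("suggested minimum max_nr_stripes: %d (default is 256)\n",
	       floor > 256 ? floor : 256);
	return 0;
}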
Re: ext4 write performance regression in 3.6-rc1 on RAID0/5
On 8/22/12 11:57 AM, Yuanhan Liu wrote:
On Fri, Aug 17, 2012 at 10:25:26PM +0800, Fengguang Wu wrote:
> [CC md list]
>
> On Fri, Aug 17, 2012 at 09:40:39AM -0400, Theodore Ts'o wrote:
> > On Fri, Aug 17, 2012 at 02:09:15PM +0800, Fengguang Wu wrote:
> > > Ted,
> > >
> > > I find ext4 write performance dropped by 3.3% on average in the
> > > 3.6-rc1 merge window. xfs and btrfs are fine.
> > >
> > > Two machines are tested. The performance regression happens in the
> > > lkp-nex04 machine, which is equipped with 12 SSD drives. lkp-st02,
> > > which is equipped with HDD drives, does not see the regression. I'll
> > > continue to repeat the tests and report variations.
> >
> > Hmm... I've checked out the commits in "git log v3.5..v3.6-rc1 --
> > fs/ext4 fs/jbd2" and I don't see anything that I would expect would
> > cause that. There are the lock elimination changes for Direct I/O
> > overwrites, but that shouldn't matter for your tests, which are
> > measuring buffered writes, correct?
> >
> > Is there any chance you could do me a favor and do a git bisect
> > restricted to commits involving fs/ext4 and fs/jbd2?
>
> I noticed that the regressions all happen in the RAID0/RAID5 cases.
> So it may be some interaction between the RAID and ext4 code?
>
> I'll try to get some ext2/3 numbers, which should have less changes on
> the fs side.
>
> wfg@bee /export/writeback% ./compare -g ext4 lkp-nex04/*/*-{3.5.0,3.6.0-rc1+}
>                 3.5.0    3.6.0-rc1+
>    720.62  -1.5%  710.16  lkp-nex04/JBOD-12HDD-thresh=1000M/ext4-100dd-1-3.5.0
>    706.04  -0.0%  705.86  lkp-nex04/JBOD-12HDD-thresh=1000M/ext4-10dd-1-3.5.0
>    702.86  -0.2%  701.74  lkp-nex04/JBOD-12HDD-thresh=1000M/ext4-1dd-1-3.5.0
>    702.41  -0.0%  702.06  lkp-nex04/JBOD-12HDD-thresh=1000M/ext4-1dd-2-3.5.0
>    779.52  +6.5%  830.11  lkp-nex04/JBOD-12HDD-thresh=100M/ext4-100dd-1-3.5.0
>    646.70  +4.9%  678.59  lkp-nex04/JBOD-12HDD-thresh=100M/ext4-10dd-1-3.5.0
>    704.49  +2.6%  723.00  lkp-nex04/JBOD-12HDD-thresh=100M/ext4-1dd-1-3.5.0
>    704.21  +1.2%  712.47  lkp-nex04/JBOD-12HDD-thresh=100M/ext4-1dd-2-3.5.0
>    705.26  -1.2%  696.61  lkp-nex04/JBOD-12HDD-thresh=8G/ext4-100dd-1-3.5.0
>    703.37  +0.1%  703.76  lkp-nex04/JBOD-12HDD-thresh=8G/ext4-10dd-1-3.5.0
>    701.66  -0.1%  700.83  lkp-nex04/JBOD-12HDD-thresh=8G/ext4-1dd-1-3.5.0
>    701.17  +0.0%  701.36  lkp-nex04/JBOD-12HDD-thresh=8G/ext4-1dd-2-3.5.0
>    675.08 -10.5%  604.29  lkp-nex04/RAID0-12HDD-thresh=1000M/ext4-100dd-1-3.5.0
>    676.52  -2.7%  658.38  lkp-nex04/RAID0-12HDD-thresh=1000M/ext4-10dd-1-3.5.0
>    512.70  +4.0%  533.22  lkp-nex04/RAID0-12HDD-thresh=1000M/ext4-1dd-1-3.5.0
>    524.61  -0.3%  522.90  lkp-nex04/RAID0-12HDD-thresh=1000M/ext4-1dd-2-3.5.0
>    709.76 -15.7%  598.44  lkp-nex04/RAID0-12HDD-thresh=100M/ext4-100dd-1-3.5.0
>    681.39  -2.1%  667.25  lkp-nex04/RAID0-12HDD-thresh=100M/ext4-10dd-1-3.5.0
>    524.16  +0.8%  528.25  lkp-nex04/RAID0-12HDD-thresh=100M/ext4-1dd-2-3.5.0
>    699.77 -19.2%  565.54  lkp-nex04/RAID0-12HDD-thresh=8G/ext4-100dd-1-3.5.0
>    675.79  -1.9%  663.17  lkp-nex04/RAID0-12HDD-thresh=8G/ext4-10dd-1-3.5.0
>    484.84  -7.4%  448.83  lkp-nex04/RAID0-12HDD-thresh=8G/ext4-1dd-1-3.5.0
>    470.40  -3.2%  455.31  lkp-nex04/RAID0-12HDD-thresh=8G/ext4-1dd-2-3.5.0
>    167.97 -38.7%  103.03  lkp-nex04/RAID5-12HDD-thresh=1000M/ext4-100dd-1-3.5.0
>    243.67  -9.1%  221.41  lkp-nex04/RAID5-12HDD-thresh=1000M/ext4-10dd-1-3.5.0
>    248.98 +12.2%  279.33  lkp-nex04/RAID5-12HDD-thresh=1000M/ext4-1dd-1-3.5.0
>    208.45 +14.1%  237.86  lkp-nex04/RAID5-12HDD-thresh=1000M/ext4-1dd-2-3.5.0
>     71.18 -34.2%   46.82  lkp-nex04/RAID5-12HDD-thresh=100M/ext4-100dd-1-3.5.0
>    145.84  -7.3%  135.25  lkp-nex04/RAID5-12HDD-thresh=100M/ext4-10dd-1-3.5.0
>    255.22  +6.7%  272.35  lkp-nex04/RAID5-12HDD-thresh=100M/ext4-1dd-1-3.5.0
>    243.09 +20.7%  293.30  lkp-nex04/RAID5-12HDD-thresh=100M/ext4-1dd-2-3.5.0
>    209.24 -23.6%  159.96  lkp-nex04/RAID5-12HDD-thresh=8G/ext4-100dd-1-3.5.0
>    243.73 -10.9%  217.28  lkp-nex04/RAID5-12HDD-thresh=8G/ext4-10dd-1-3.5.0

Hi,

About this issue, I did some investigation and found that we are
blocked at get_active_stripe() most of the time. That's reasonable,
since max_nr_stripes is set to 256 now, which is a rather small value.
Thus I tried different values; please see the following patch for the
detailed numbers. The test machine is the same as above.

From 85c27fca12b770da5bc8ec9f26a22cb414e84c68 Mon Sep 17 00:00:00 2001
From: Yuanhan Liu
Date: Wed, 22 Aug 2012 10:51:48 +0800
Subject: [RFC PATCH] md/raid5: increase NR_STRIPES to 1024

A stripe head is a resource that must be held before doing any IO, and
the number of them is limited to 256 by default. With the 10dd case, we
found that we are blocked at get_active_stripe() most of the time
(please see the ps output attached). Thus I did some tries with
different values for NR_STRIPES, and here are the write bandwidth
numbers (EXT4 only) I got:

3.5.0-rc1-256+:    (here 256 means max stripe heads set to 256)
    write bandwidth: 280
3.5.0-rc1-1024+:
    write bandwidth: 421 (+50.4%)
3.5.0-rc1-4096+:
    write bandwidth: 506 (+80.7%)
3.5.0-rc1-32768+:
    write bandwidth: 615 (+119.6%)
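One reason not to push NR_STRIPES arbitrarily high is memory: each
active stripe_head pins roughly one page per member device. A quick
worked calculation for the 12-drive test array (page size and the
per-stripe overhead are approximations, not figures from the thread):

#include <stdio.h>

/*
 * Approximate memory pinned by the stripe cache: each stripe_head
 * holds one PAGE_SIZE buffer per member device (ignoring the small
 * struct overhead). Figures are estimates for the 12-drive lkp-nex04
 * array used in this thread: 12 MiB at 256 stripes, 1.5 GiB at 32768.
 */
int main(void)
{
	const long page_size = 4096;
	const long devices = 12;
	const long sizes[] = { 256, 1024, 4096, 32768 };
	size_t i;

	for (i = 0; i < sizeof(sizes) / sizeof(sizes[0]); i++)
		printf("max_nr_stripes = %5ld  ->  ~%4ld MiB pinned\n",
		       sizes[i], sizes[i] * devices * page_size >> 20);
	return 0;
}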
Re: ext4 write performance regression in 3.6-rc1 on RAID0/5
On Tue, Aug 21, 2012 at 05:42:21PM +0800, Fengguang Wu wrote:
> On Sat, Aug 18, 2012 at 06:44:57AM +1000, NeilBrown wrote:
> > On Fri, 17 Aug 2012 22:25:26 +0800 Fengguang Wu wrote:
> >
> > > [CC md list]
> > >
> > > On Fri, Aug 17, 2012 at 09:40:39AM -0400, Theodore Ts'o wrote:
> > > > On Fri, Aug 17, 2012 at 02:09:15PM +0800, Fengguang Wu wrote:
> > > > > Ted,
> > > > >
> > > > > I find ext4 write performance dropped by 3.3% on average in the
> > > > > 3.6-rc1 merge window. xfs and btrfs are fine.
> > > > >
> > > > > Two machines are tested. The performance regression happens in
> > > > > the lkp-nex04 machine, which is equipped with 12 SSD drives.
> > > > > lkp-st02, which is equipped with HDD drives, does not see the
> > > > > regression. I'll continue to repeat the tests and report
> > > > > variations.
> > > >
> > > > Hmm... I've checked out the commits in "git log v3.5..v3.6-rc1 --
> > > > fs/ext4 fs/jbd2" and I don't see anything that I would expect
> > > > would cause that. There are the lock elimination changes for
> > > > Direct I/O overwrites, but that shouldn't matter for your tests,
> > > > which are measuring buffered writes, correct?
> > > >
> > > > Is there any chance you could do me a favor and do a git bisect
> > > > restricted to commits involving fs/ext4 and fs/jbd2?
> > >
> > > I noticed that the regressions all happen in the RAID0/RAID5 cases.
> > > So it may be some interaction between the RAID and ext4 code?
> >
> > I'm aware of some performance regression in RAID5 which I will be
> > drilling down into next week. Some things are faster, but some are
> > slower :-(
> >
> > RAID0 should be unchanged though - I don't think I've changed
> > anything there.
> >
> > Looking at your numbers, JBOD ranges from +6.5% to -1.5%
> > RAID0 ranges from +4.0% to -19.2%
> > RAID5 ranges from +20.7% to -39.7%
> >
> > I'm guessing + is good and - is bad?
>
> Yes.
>
> > The RAID5 numbers don't surprise me. The RAID0 do.
>
> You are right. I did more tests and it's now obvious that RAID0 is
> mostly fine. The major regressions are in the RAID5 10/100dd cases.
> JBOD is performing better in 3.6.0-rc1 :-)
>
> > > I'll try to get some ext2/3 numbers, which should have less changes
> > > on the fs side.
> >
> > Thanks. That will be useful.
>
> Here are the more complete results.
>
>     RAID5  ext4  100dd   -7.3%
>     RAID5  ext4   10dd   -2.2%
>     RAID5  ext4    1dd  +12.1%
>     RAID5  ext3  100dd   -3.1%
>     RAID5  ext3   10dd  -11.5%
>     RAID5  ext3    1dd   +8.9%
>     RAID5  ext2  100dd  -10.5%
>     RAID5  ext2   10dd   -5.2%
>     RAID5  ext2    1dd  +10.0%
>     RAID0  ext4  100dd   +1.7%
>     RAID0  ext4   10dd   -0.9%
>     RAID0  ext4    1dd   -1.1%
>     RAID0  ext3  100dd   -4.2%
>     RAID0  ext3   10dd   -0.2%
>     RAID0  ext3    1dd   -1.0%
>     RAID0  ext2  100dd  +11.3%
>     RAID0  ext2   10dd   +4.7%
>     RAID0  ext2    1dd   -1.6%
>     JBOD   ext4  100dd   +5.9%
>     JBOD   ext4   10dd   +6.0%
>     JBOD   ext4    1dd   +0.6%
>     JBOD   ext3  100dd   +6.1%
>     JBOD   ext3   10dd   +1.9%
>     JBOD   ext3    1dd   +1.7%
>     JBOD   ext2  100dd   +9.9%
>     JBOD   ext2   10dd   +9.4%
>     JBOD   ext2    1dd   +0.5%

And here are the xfs/btrfs results. Very impressive RAID5 improvements!

    RAID5  btrfs  100dd  +25.8%
    RAID5  btrfs   10dd  +21.3%
    RAID5  btrfs    1dd  +14.3%
    RAID5  xfs    100dd  +32.8%
    RAID5  xfs     10dd  +21.5%
    RAID5  xfs      1dd  +25.2%
    RAID0  btrfs  100dd   -7.4%
    RAID0  btrfs   10dd   -0.2%
    RAID0  btrfs    1dd   -2.8%
    RAID0  xfs    100dd  +18.8%
    RAID0  xfs     10dd   +0.0%
    RAID0  xfs      1dd   +3.8%
    JBOD   btrfs  100dd   -0.0%
    JBOD   btrfs   10dd   +2.3%
    JBOD   btrfs    1dd   -0.1%
    JBOD   xfs    100dd   +8.3%
    JBOD   xfs     10dd   +4.1%
    JBOD   xfs      1dd   +0.1%

Thanks,
Fengguang
Re: ext4 write performance regression in 3.6-rc1 on RAID0/5
On Sat, Aug 18, 2012 at 06:44:57AM +1000, NeilBrown wrote:
> On Fri, 17 Aug 2012 22:25:26 +0800 Fengguang Wu wrote:
>
> > [CC md list]
> >
> > On Fri, Aug 17, 2012 at 09:40:39AM -0400, Theodore Ts'o wrote:
> > > On Fri, Aug 17, 2012 at 02:09:15PM +0800, Fengguang Wu wrote:
> > > > Ted,
> > > >
> > > > I find ext4 write performance dropped by 3.3% on average in the
> > > > 3.6-rc1 merge window. xfs and btrfs are fine.
> > > >
> > > > Two machines are tested. The performance regression happens in
> > > > the lkp-nex04 machine, which is equipped with 12 SSD drives.
> > > > lkp-st02, which is equipped with HDD drives, does not see the
> > > > regression. I'll continue to repeat the tests and report
> > > > variations.
> > >
> > > Hmm... I've checked out the commits in "git log v3.5..v3.6-rc1 --
> > > fs/ext4 fs/jbd2" and I don't see anything that I would expect would
> > > cause that. There are the lock elimination changes for Direct I/O
> > > overwrites, but that shouldn't matter for your tests, which are
> > > measuring buffered writes, correct?
> > >
> > > Is there any chance you could do me a favor and do a git bisect
> > > restricted to commits involving fs/ext4 and fs/jbd2?
> >
> > I noticed that the regressions all happen in the RAID0/RAID5 cases.
> > So it may be some interaction between the RAID and ext4 code?
>
> I'm aware of some performance regression in RAID5 which I will be
> drilling down into next week. Some things are faster, but some are
> slower :-(
>
> RAID0 should be unchanged though - I don't think I've changed anything
> there.
>
> Looking at your numbers, JBOD ranges from +6.5% to -1.5%
> RAID0 ranges from +4.0% to -19.2%
> RAID5 ranges from +20.7% to -39.7%
>
> I'm guessing + is good and - is bad?

Yes.

> The RAID5 numbers don't surprise me. The RAID0 do.

You are right. I did more tests and it's now obvious that RAID0 is
mostly fine. The major regressions are in the RAID5 10/100dd cases.
JBOD is performing better in 3.6.0-rc1 :-)

> > I'll try to get some ext2/3 numbers, which should have less changes
> > on the fs side.
>
> Thanks. That will be useful.

Here are the more complete results.

    RAID5  ext4  100dd   -7.3%
    RAID5  ext4   10dd   -2.2%
    RAID5  ext4    1dd  +12.1%
    RAID5  ext3  100dd   -3.1%
    RAID5  ext3   10dd  -11.5%
    RAID5  ext3    1dd   +8.9%
    RAID5  ext2  100dd  -10.5%
    RAID5  ext2   10dd   -5.2%
    RAID5  ext2    1dd  +10.0%
    RAID0  ext4  100dd   +1.7%
    RAID0  ext4   10dd   -0.9%
    RAID0  ext4    1dd   -1.1%
    RAID0  ext3  100dd   -4.2%
    RAID0  ext3   10dd   -0.2%
    RAID0  ext3    1dd   -1.0%
    RAID0  ext2  100dd  +11.3%
    RAID0  ext2   10dd   +4.7%
    RAID0  ext2    1dd   -1.6%
    JBOD   ext4  100dd   +5.9%
    JBOD   ext4   10dd   +6.0%
    JBOD   ext4    1dd   +0.6%
    JBOD   ext3  100dd   +6.1%
    JBOD   ext3   10dd   +1.9%
    JBOD   ext3    1dd   +1.7%
    JBOD   ext2  100dd   +9.9%
    JBOD   ext2   10dd   +9.4%
    JBOD   ext2    1dd   +0.5%

wfg@bee /export/writeback% ./compare-groups 'RAID5 RAID0 JBOD' 'ext4 ext3 ext2' '100dd 10dd 1dd' lkp-nex04/*/*-{3.5.0,3.6.0-rc1+}

RAID5 ext4 100dd
                3.5.0    3.6.0-rc1+
   167.97 -38.7%  103.03  lkp-nex04/RAID5-12HDD-thresh=1000M/ext4-100dd-1-3.5.0
   130.42 -21.7%  102.06  lkp-nex04/RAID5-12HDD-thresh=1000M/ext4-100dd-2-3.5.0
    83.45 +10.2%   91.96  lkp-nex04/RAID5-12HDD-thresh=1000M/ext4-100dd-3-3.5.0
   105.97 +11.5%  118.12  lkp-nex04/RAID5-12HDD-thresh=1000M/ext4-100dd-4-3.5.0
    71.18 -34.2%   46.82  lkp-nex04/RAID5-12HDD-thresh=100M/ext4-100dd-1-3.5.0
    52.79  +1.1%   53.36  lkp-nex04/RAID5-12HDD-thresh=100M/ext4-100dd-2-3.5.0
    40.75  -5.1%   38.69  lkp-nex04/RAID5-12HDD-thresh=100M/ext4-100dd-3-3.5.0
    42.79 +14.5%   48.99  lkp-nex04/RAID5-12HDD-thresh=100M/ext4-100dd-4-3.5.0
   209.24 -23.6%  159.96  lkp-nex04/RAID5-12HDD-thresh=8G/ext4-100dd-1-3.5.0
   176.21 +11.3%  196.16  lkp-nex04/RAID5-12HDD-thresh=8G/ext4-100dd-2-3.5.0
   158.12  +3.7%  163.99  lkp-nex04/RAID5-12HDD-thresh=8G/ext4-100dd-3-3.5.0
   180.18  +6.4%  191.74  lkp-nex04/RAID5-12HDD-thresh=8G/ext4-100dd-4-3.5.0
  1419.08  -7.3% 1314.88  TOTAL write_bw

RAID5 ext4 10dd
                3.5.0    3.6.0-rc1+
   243.67  -9.1%  221.41  lkp-nex04/RAID5-12HDD-thresh=1000M/ext4-10dd-1-3.5.0
   212.84 +16.7%  248.39
Re: ext4 write performance regression in 3.6-rc1 on RAID0/5
On Fri, 17 Aug 2012 22:25:26 +0800 Fengguang Wu wrote:
> [CC md list]
>
> On Fri, Aug 17, 2012 at 09:40:39AM -0400, Theodore Ts'o wrote:
> > On Fri, Aug 17, 2012 at 02:09:15PM +0800, Fengguang Wu wrote:
> > > Ted,
> > >
> > > I find ext4 write performance dropped by 3.3% on average in the
> > > 3.6-rc1 merge window. xfs and btrfs are fine.
> > >
> > > Two machines are tested. The performance regression happens in the
> > > lkp-nex04 machine, which is equipped with 12 SSD drives. lkp-st02,
> > > which is equipped with HDD drives, does not see the regression.
> > > I'll continue to repeat the tests and report variations.
> >
> > Hmm... I've checked out the commits in "git log v3.5..v3.6-rc1 --
> > fs/ext4 fs/jbd2" and I don't see anything that I would expect would
> > cause that. There are the lock elimination changes for Direct I/O
> > overwrites, but that shouldn't matter for your tests, which are
> > measuring buffered writes, correct?
> >
> > Is there any chance you could do me a favor and do a git bisect
> > restricted to commits involving fs/ext4 and fs/jbd2?
>
> I noticed that the regressions all happen in the RAID0/RAID5 cases.
> So it may be some interaction between the RAID and ext4 code?

I'm aware of some performance regression in RAID5 which I will be
drilling down into next week. Some things are faster, but some are
slower :-(

RAID0 should be unchanged though - I don't think I've changed anything
there.

Looking at your numbers, JBOD ranges from +6.5% to -1.5%
RAID0 ranges from +4.0% to -19.2%
RAID5 ranges from +20.7% to -39.7%

I'm guessing + is good and - is bad?

The RAID5 numbers don't surprise me. The RAID0 do.

> I'll try to get some ext2/3 numbers, which should have less changes on
> the fs side.

Thanks. That will be useful.

NeilBrown

> wfg@bee /export/writeback% ./compare -g ext4 lkp-nex04/*/*-{3.5.0,3.6.0-rc1+}
>                 3.5.0    3.6.0-rc1+
>    720.62  -1.5%  710.16  lkp-nex04/JBOD-12HDD-thresh=1000M/ext4-100dd-1-3.5.0
>    706.04  -0.0%  705.86  lkp-nex04/JBOD-12HDD-thresh=1000M/ext4-10dd-1-3.5.0
>    702.86  -0.2%  701.74  lkp-nex04/JBOD-12HDD-thresh=1000M/ext4-1dd-1-3.5.0
>    702.41  -0.0%  702.06  lkp-nex04/JBOD-12HDD-thresh=1000M/ext4-1dd-2-3.5.0
>    779.52  +6.5%  830.11  lkp-nex04/JBOD-12HDD-thresh=100M/ext4-100dd-1-3.5.0
>    646.70  +4.9%  678.59  lkp-nex04/JBOD-12HDD-thresh=100M/ext4-10dd-1-3.5.0
>    704.49  +2.6%  723.00  lkp-nex04/JBOD-12HDD-thresh=100M/ext4-1dd-1-3.5.0
>    704.21  +1.2%  712.47  lkp-nex04/JBOD-12HDD-thresh=100M/ext4-1dd-2-3.5.0
>    705.26  -1.2%  696.61  lkp-nex04/JBOD-12HDD-thresh=8G/ext4-100dd-1-3.5.0
>    703.37  +0.1%  703.76  lkp-nex04/JBOD-12HDD-thresh=8G/ext4-10dd-1-3.5.0
>    701.66  -0.1%  700.83  lkp-nex04/JBOD-12HDD-thresh=8G/ext4-1dd-1-3.5.0
>    701.17  +0.0%  701.36  lkp-nex04/JBOD-12HDD-thresh=8G/ext4-1dd-2-3.5.0
>    675.08 -10.5%  604.29  lkp-nex04/RAID0-12HDD-thresh=1000M/ext4-100dd-1-3.5.0
>    676.52  -2.7%  658.38  lkp-nex04/RAID0-12HDD-thresh=1000M/ext4-10dd-1-3.5.0
>    512.70  +4.0%  533.22  lkp-nex04/RAID0-12HDD-thresh=1000M/ext4-1dd-1-3.5.0
>    524.61  -0.3%  522.90  lkp-nex04/RAID0-12HDD-thresh=1000M/ext4-1dd-2-3.5.0
>    709.76 -15.7%  598.44  lkp-nex04/RAID0-12HDD-thresh=100M/ext4-100dd-1-3.5.0
>    681.39  -2.1%  667.25  lkp-nex04/RAID0-12HDD-thresh=100M/ext4-10dd-1-3.5.0
>    524.16  +0.8%  528.25  lkp-nex04/RAID0-12HDD-thresh=100M/ext4-1dd-2-3.5.0
>    699.77 -19.2%  565.54  lkp-nex04/RAID0-12HDD-thresh=8G/ext4-100dd-1-3.5.0
>    675.79  -1.9%  663.17  lkp-nex04/RAID0-12HDD-thresh=8G/ext4-10dd-1-3.5.0
>    484.84  -7.4%  448.83  lkp-nex04/RAID0-12HDD-thresh=8G/ext4-1dd-1-3.5.0
>    470.40  -3.2%  455.31  lkp-nex04/RAID0-12HDD-thresh=8G/ext4-1dd-2-3.5.0
>    167.97 -38.7%  103.03  lkp-nex04/RAID5-12HDD-thresh=1000M/ext4-100dd-1-3.5.0
>    243.67  -9.1%  221.41  lkp-nex04/RAID5-12HDD-thresh=1000M/ext4-10dd-1-3.5.0
>    248.98 +12.2%  279.33  lkp-nex04/RAID5-12HDD-thresh=1000M/ext4-1dd-1-3.5.0
>    208.45 +14.1%  237.86  lkp-nex04/RAID5-12HDD-thresh=1000M/ext4-1dd-2-3.5.0
>     71.18
Re: ext4 write performance regression in 3.6-rc1 on RAID0/5
On Fri, Aug 17, 2012 at 11:13:18PM +0800, Fengguang Wu wrote:
>
> Obviously the major regressions happen to the 100dd over raid cases.
> Some 10dd cases are also impacted.
>
> The attached graphs show that everything becomes more fluctuated in
> 3.6.0-rc1 for the lkp-nex04/RAID0-12HDD-thresh=8G/ext4-100dd-1 case.

Hmm... I'm not seeing any differences in the block allocation code, or
in ext4's buffered writeback code paths, which would be the most likely
cause of such problems.

Maybe a quick eyeball of the blktrace to see if we're doing something
pathologically stupid?

You could also try running filefrag -v on a few of the dd files to see
if there's any significant difference, although as I said, there don't
appear to be any significant changes in the block allocation code
between v3.5 and v3.6-rc1 --- although I suppose changes in timing
could have caused the block allocation decisions to be different, so
it's worth checking that out.

Thanks, regards,

					- Ted
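For reference, filefrag -v is essentially a thin wrapper around the
FIEMAP ioctl. A stripped-down equivalent for eyeballing a dd output
file's extents might look like the sketch below; it fetches only the
first 64 extents and skips all of filefrag's niceties:

/*
 * Bare-bones version of `filefrag -v`: dump a file's extent map via
 * the FS_IOC_FIEMAP ioctl. Enough to spot gross fragmentation
 * differences between files written under two kernels.
 */
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/fs.h>
#include <linux/fiemap.h>

int main(int argc, char **argv)
{
	struct fiemap *fm;
	unsigned int i;
	int fd;

	if (argc != 2) {
		fprintf(stderr, "usage: %s <file>\n", argv[0]);
		return 1;
	}
	fd = open(argv[1], O_RDONLY);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	fm = calloc(1, sizeof(*fm) + 64 * sizeof(struct fiemap_extent));
	if (!fm)
		return 1;
	fm->fm_length = ~0ULL;		/* map the whole file */
	fm->fm_extent_count = 64;	/* first 64 extents only */
	if (ioctl(fd, FS_IOC_FIEMAP, fm) < 0) {
		perror("FS_IOC_FIEMAP");
		return 1;
	}

	for (i = 0; i < fm->fm_mapped_extents; i++)
		printf("extent %2u: logical %12llu physical %12llu len %10llu\n",
		       i,
		       (unsigned long long)fm->fm_extents[i].fe_logical,
		       (unsigned long long)fm->fm_extents[i].fe_physical,
		       (unsigned long long)fm->fm_extents[i].fe_length);

	free(fm);
	close(fd);
	return 0;
}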
Re: ext4 write performance regression in 3.6-rc1 on RAID0/5
[CC md list]

On Fri, Aug 17, 2012 at 09:40:39AM -0400, Theodore Ts'o wrote:
> On Fri, Aug 17, 2012 at 02:09:15PM +0800, Fengguang Wu wrote:
> > Ted,
> >
> > I find ext4 write performance dropped by 3.3% on average in the
> > 3.6-rc1 merge window. xfs and btrfs are fine.
> >
> > Two machines are tested. The performance regression happens in the
> > lkp-nex04 machine, which is equipped with 12 SSD drives. lkp-st02,
> > which is equipped with HDD drives, does not see the regression. I'll
> > continue to repeat the tests and report variations.
>
> Hmm... I've checked out the commits in "git log v3.5..v3.6-rc1 --
> fs/ext4 fs/jbd2" and I don't see anything that I would expect would
> cause that. There are the lock elimination changes for Direct I/O
> overwrites, but that shouldn't matter for your tests, which are
> measuring buffered writes, correct?
>
> Is there any chance you could do me a favor and do a git bisect
> restricted to commits involving fs/ext4 and fs/jbd2?

I noticed that the regressions all happen in the RAID0/RAID5 cases.
So it may be some interaction between the RAID and ext4 code?

I'll try to get some ext2/3 numbers, which should have less changes on
the fs side.

wfg@bee /export/writeback% ./compare -g ext4 lkp-nex04/*/*-{3.5.0,3.6.0-rc1+}
                3.5.0    3.6.0-rc1+
   720.62  -1.5%  710.16  lkp-nex04/JBOD-12HDD-thresh=1000M/ext4-100dd-1-3.5.0
   706.04  -0.0%  705.86  lkp-nex04/JBOD-12HDD-thresh=1000M/ext4-10dd-1-3.5.0
   702.86  -0.2%  701.74  lkp-nex04/JBOD-12HDD-thresh=1000M/ext4-1dd-1-3.5.0
   702.41  -0.0%  702.06  lkp-nex04/JBOD-12HDD-thresh=1000M/ext4-1dd-2-3.5.0
   779.52  +6.5%  830.11  lkp-nex04/JBOD-12HDD-thresh=100M/ext4-100dd-1-3.5.0
   646.70  +4.9%  678.59  lkp-nex04/JBOD-12HDD-thresh=100M/ext4-10dd-1-3.5.0
   704.49  +2.6%  723.00  lkp-nex04/JBOD-12HDD-thresh=100M/ext4-1dd-1-3.5.0
   704.21  +1.2%  712.47  lkp-nex04/JBOD-12HDD-thresh=100M/ext4-1dd-2-3.5.0
   705.26  -1.2%  696.61  lkp-nex04/JBOD-12HDD-thresh=8G/ext4-100dd-1-3.5.0
   703.37  +0.1%  703.76  lkp-nex04/JBOD-12HDD-thresh=8G/ext4-10dd-1-3.5.0
   701.66  -0.1%  700.83  lkp-nex04/JBOD-12HDD-thresh=8G/ext4-1dd-1-3.5.0
   701.17  +0.0%  701.36  lkp-nex04/JBOD-12HDD-thresh=8G/ext4-1dd-2-3.5.0
   675.08 -10.5%  604.29  lkp-nex04/RAID0-12HDD-thresh=1000M/ext4-100dd-1-3.5.0
   676.52  -2.7%  658.38  lkp-nex04/RAID0-12HDD-thresh=1000M/ext4-10dd-1-3.5.0
   512.70  +4.0%  533.22  lkp-nex04/RAID0-12HDD-thresh=1000M/ext4-1dd-1-3.5.0
   524.61  -0.3%  522.90  lkp-nex04/RAID0-12HDD-thresh=1000M/ext4-1dd-2-3.5.0
   709.76 -15.7%  598.44  lkp-nex04/RAID0-12HDD-thresh=100M/ext4-100dd-1-3.5.0
   681.39  -2.1%  667.25  lkp-nex04/RAID0-12HDD-thresh=100M/ext4-10dd-1-3.5.0
   524.16  +0.8%  528.25  lkp-nex04/RAID0-12HDD-thresh=100M/ext4-1dd-2-3.5.0
   699.77 -19.2%  565.54  lkp-nex04/RAID0-12HDD-thresh=8G/ext4-100dd-1-3.5.0
   675.79  -1.9%  663.17  lkp-nex04/RAID0-12HDD-thresh=8G/ext4-10dd-1-3.5.0
   484.84  -7.4%  448.83  lkp-nex04/RAID0-12HDD-thresh=8G/ext4-1dd-1-3.5.0
   470.40  -3.2%  455.31  lkp-nex04/RAID0-12HDD-thresh=8G/ext4-1dd-2-3.5.0
   167.97 -38.7%  103.03  lkp-nex04/RAID5-12HDD-thresh=1000M/ext4-100dd-1-3.5.0
   243.67  -9.1%  221.41  lkp-nex04/RAID5-12HDD-thresh=1000M/ext4-10dd-1-3.5.0
   248.98 +12.2%  279.33  lkp-nex04/RAID5-12HDD-thresh=1000M/ext4-1dd-1-3.5.0
   208.45 +14.1%  237.86  lkp-nex04/RAID5-12HDD-thresh=1000M/ext4-1dd-2-3.5.0
    71.18 -34.2%   46.82  lkp-nex04/RAID5-12HDD-thresh=100M/ext4-100dd-1-3.5.0
   145.84  -7.3%  135.25  lkp-nex04/RAID5-12HDD-thresh=100M/ext4-10dd-1-3.5.0
   255.22  +6.7%  272.35  lkp-nex04/RAID5-12HDD-thresh=100M/ext4-1dd-1-3.5.0
   243.09 +20.7%  293.30  lkp-nex04/RAID5-12HDD-thresh=100M/ext4-1dd-2-3.5.0
   209.24 -23.6%  159.96  lkp-nex04/RAID5-12HDD-thresh=8G/ext4-100dd-1-3.5.0
   243.73 -10.9%  217.28  lkp-nex04/RAID5-12HDD-thresh=8G/ext4-10dd-1-3.5.0
   214.25  +5.6%  226.32  lkp-nex04/RAID5-12HDD-thresh=8G/ext4-1dd-1-3.5.0
   207.16 +13.4%  234.98  lkp-nex04/RAID5-12HDD-thresh=8G/ext4-1dd-2-3.5.0