On Jan 29, 2008 3:05 PM, Ciro Iriarte <[EMAIL PROTECTED]> wrote:
> 2008/1/28, Greg Freemyer <[EMAIL PROTECTED]>:
> > On Jan 28, 2008 6:41 PM, Ciro Iriarte <[EMAIL PROTECTED]> wrote:
> > >
> > Ok, I guess you know reads are not significantly impacted by the
> > tuning we're talking about.  This is mostly about tuning for raid5
> > write performance.
> >
> > Anyway, are you planning to stripe together multiple md raid5 arrays
> > via LVM?  I believe that is what --stripes and --stripesize are for.
> > (ie. If you have 8 drives, you could create 2 raid5 arrays, and use
> > LVM to interleave them by using --stripes 2.)  I've never used that
> > feature.
> >
> > You need to worry about the vg extents.  I think vgcreate
> > --physicalextentsize is what you need to tune.  I would make each
> > extent an even number of stripes in size.  ie. 768KB * N.  Maybe use
> > N=10, so -s 7680K
> >
> > Assuming you're not using LVM stripes, and since this appears to be a
> > new setup, I would also use -C or --contiguous to ensure all the data
> > is sequential.  It may be overkill, but it will further ensure you
> > _avoid_ LV extents that don't end on a stripe boundary.  (A stripe ==
> > 3 raid5 chunks for you.)
> >
> > Then if you are going to use the snapshot feature, you need to set
> > your snapshot chunksize efficiently.  If you are only going to have
> > large files, then I would use a large LVM snapshot chunksize.  256KB
> > seems like a good choice, but I have not benchmarked snapshot
> > chunksizes.
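[Editor's note: the vgcreate/lvcreate invocations being described might look like the sketch below.  The volume group, LV names, and device paths are hypothetical, and the script only echoes the commands rather than running them, since the real ones need actual block devices:]

```shell
# Hypothetical sketch of the LVM setup described above.  Echoes the
# commands instead of executing them -- adapt names/sizes before use.
CHUNK_KB=256                  # md raid5 chunk size from the thread
STRIPE_KB=$((CHUNK_KB * 3))   # full stripe = 3 data chunks on a 4-drive raid5
PE_KB=$((STRIPE_KB * 10))     # extent = a whole number of stripes, N=10

echo "vgcreate -s ${PE_KB}K datavg /dev/md2"
echo "lvcreate -C y -L 100G -n datalv datavg"        # contiguous allocation
echo "lvcreate -s -c 256K -L 10G -n datasnap /dev/datavg/datalv"  # snapshot chunksize
```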
> >
> > Greg
> > --
>
> Just for the record, dealing with a bug that made the raid hang, found
> a workaround that also gave me performance boost: "echo 4096 >
> /sys/block/md2/md/stripe_cache_size"
>
> Result:
>
> mainwks:~ # dd if=/dev/zero bs=1024k count=1000 of=/datos/test
> 1000+0 records in
> 1000+0 records out
> 1048576000 bytes (1,0 GB) copied, 6,78341 s, 155 MB/s
>
> mainwks:~ # rm /datos/test
>
> mainwks:~ # dd if=/dev/zero bs=1024k count=20000 of=/datos/test
> 20000+0 records in
> 20000+0 records out
> 20971520000 bytes (21 GB) copied, 199,135 s, 105 MB/s
>
> Ciro
>
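[Editor's note: a side note on the stripe_cache_size workaround quoted above.  My understanding of the md implementation (worth double-checking against your kernel) is that the stripe cache costs roughly one page per member device per cache entry, so raising it trades RAM for throughput:]

```python
# Approximate RAM used by the md raid5 stripe cache, assuming one
# 4 KiB page per member device per cache entry (verify for your kernel).
PAGE_SIZE = 4096  # bytes, typical on x86

def stripe_cache_bytes(stripe_cache_size, n_devices):
    return stripe_cache_size * n_devices * PAGE_SIZE

# 4096 entries on a 4-drive array:
print(stripe_cache_bytes(4096, 4) // 2**20, "MiB")  # -> 64 MiB
```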
Ciro,

105 MB/s seems strange to me.  I would have expected 75 MB/s or 225 MB/s.

ie. For normal non-full-stripe i/o, it should be 75 MB/s * 4 / 4.
Here 75 MB/s is what I typically see for one drive, the first 4 is
the number of drives that can be doing parallel i/o, and the second 4
is the number of i/o's per write.

ie. When you do a non-full-stripe write, the kernel has to read the
old checksum, read the old chunk data, recalc the checksum, write the
new chunk data, and write the new checksum.

Out of curiosity, on the dd line, do you get better performance if you
set your blocksize to exactly one stripe?  ie. 3x 256KB = a 768KB
stripe.  I've read that Linux's raid5 implementation is optimized to
handle full-stripe writes.

ie. Writing 3 chunks produces: calc the new checksum from the new data
alone, then write d1, d2, d3, and p.  So to get 3 256KB chunks of data
to the drives, the kernel ends up invoking 4 256KB writes.

Or 75 MB/s * 4 * 3 / 4 = 225 MB/s
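[Editor's note: the back-of-envelope arithmetic above can be written out as follows.  The 75 MB/s per-drive figure is Greg's observed number, not a constant:]

```python
# Back-of-envelope raid5 write throughput, per the reasoning above.
def raid5_write_mb_s(per_drive, drives, data_chunks, ios_per_stripe):
    # drives working in parallel, scaled by how much of each device
    # i/o is useful new data (data_chunks out of ios_per_stripe)
    return per_drive * drives * data_chunks / ios_per_stripe

# Non-full-stripe write: 4 i/o's (2 reads + 2 writes) move 1 new chunk
print(raid5_write_mb_s(75, 4, 1, 4))  # -> 75.0 MB/s
# Full-stripe write: 4 writes (d1, d2, d3, p) move 3 new chunks
print(raid5_write_mb_s(75, 4, 3, 4))  # -> 225.0 MB/s
```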

If you have everything optimized, I think you should see the same
performance with a 2-stripe write, ie. 6x 256KB = 1536KB.  If your
optimization is wrong, you will instead see a speed improvement with
the bigger write: even though the alignment between your writes and
stripes is off, the bigger write guarantees at least one full-stripe
write per call.
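[Editor's note: the suggested test invocations would look like the sketch below.  It writes a small temp file just to show the blocksizes; on the real array you would use of=/datos/test and a large count, as in the quoted runs above:]

```shell
# Full-stripe-sized dd writes as suggested above: bs = 3 x 256 KB chunks.
# Small count and a temp file here only to demonstrate the invocation.
TESTFILE=$(mktemp)
dd if=/dev/zero of="$TESTFILE" bs=768k count=4 2>/dev/null    # one stripe per write
dd if=/dev/zero of="$TESTFILE" bs=1536k count=2 2>/dev/null   # two stripes per write
SIZE=$(stat -c %s "$TESTFILE")
echo "$SIZE bytes"   # 4 * 768 KiB = 2 * 1536 KiB = 3145728 bytes
rm -f "$TESTFILE"
```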

Thanks
Greg
-- 
Greg Freemyer
Litigation Triage Solutions Specialist
http://www.linkedin.com/in/gregfreemyer
First 99 Days Litigation White Paper -
http://www.norcrossgroup.com/forms/whitepapers/99%20Days%20whitepaper.pdf

The Norcross Group
The Intersection of Evidence & Technology
http://www.norcrossgroup.com