Re: The SX4 challenge
Jeff Garzik wrote:
> ...
> Thus, the "SX4 challenge" is a challenge to developers to figure out the
> most optimal configuration for this hardware, given the existing MD and
> DM work going on.
> ...

This sort of RAID optimization hardware is not unique to the SX4, so hopefully we can work out a way to take advantage of similar/different RAID throughput features of other chipsets too (eventually).

This could be a good topic for discussion/beer in San Jose next month.

Cheers
The SX4 challenge
Promise just gave permission to post the docs for their PDC20621 (i.e. SX4) hardware:

http://gkernel.sourceforge.net/specs/promise/pdc20621-pguide-1.2.pdf.bz2

joining the existing PDC20621 DIMM and PLL docs:

http://gkernel.sourceforge.net/specs/promise/pdc20621-pguide-dimm-1.6.pdf.bz2
http://gkernel.sourceforge.net/specs/promise/pdc20621-pguide-pll-ata-timing-1.2.pdf.bz2

So, the SX4 is now open. Yay :)

I am hoping to talk Mikael into becoming the sata_sx4 maintainer, and finally integrating my 'new-eh' conversion in libata-dev.git. But now is a good time to remind people how lame the sata_sx4 driver software really is -- and I should know, I wrote it.

The SX4 hardware, simplified, is three pieces: XOR engine (for raid5), host<->board memcpy engine, and several ATA engines (plus some helpful transaction sequencing features). Data for each WRITE command is first copied to the board RAM, then the ATA engines DMA to/from the board RAM. Data for each READ command is copied to board RAM via the ATA engines, then DMA'd across PCI to your host memory.

Therefore, while it is not hardware RAID, the SX4 provides all the pieces necessary to offload RAID1 and RAID5, and to handle other RAID levels optimally. RAID1 and RAID5 copies can be offloaded (provided all copies go to SX4-attached devices, of course). RAID5 XOR generation and checking can be offloaded, allowing the OS to see a single request while the hardware processes a sequence of low-level requests sent in a batch.

This hardware presents an interesting challenge: it does not really fit into the software RAID (i.e. no RAID) /or/ hardware RAID categories. The sata_sx4 driver presents the no-RAID configuration, which is terribly inefficient:

WRITE:
	submit host DMA (copy to board)
	host DMA completion via interrupt
	submit ATA command
	ATA command completion via interrupt

READ:
	submit ATA command
	ATA command completion via interrupt
	submit host DMA (copy from board)
	host DMA completion via interrupt

Thus, the "SX4 challenge" is a challenge to developers to figure out the most optimal configuration for this hardware, given the existing MD and DM work going on.

Now, it must be noted that the SX4 is not current-gen technology. Most vendors have moved towards an "IOP" model, where the hw vendor puts most of their hard work into an ARM/MIPS firmware, running on an embedded chip specially tuned for storage purposes. (ref the "hptiop" and "stex" drivers, both very small SCSI drivers)

I know Dan Williams @ Intel is working on very similar issues on the IOP -- async memcpy, XOR offload, etc. -- and I am hoping that, due to that current work, some of the good ideas can be reused with the SX4.

Anyway... it's open, it's interesting, even if it's not current-gen tech anymore. You can probably find the cards on eBay or in an out-of-the-way computer shop somewhere.

Jeff
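(A quick way to fetch and unpack the three documents listed above; a minimal sketch assuming wget and bzip2 are installed:)

for url in \
  http://gkernel.sourceforge.net/specs/promise/pdc20621-pguide-1.2.pdf.bz2 \
  http://gkernel.sourceforge.net/specs/promise/pdc20621-pguide-dimm-1.6.pdf.bz2 \
  http://gkernel.sourceforge.net/specs/promise/pdc20621-pguide-pll-ata-timing-1.2.pdf.bz2
do
  wget "$url"                     # fetch the compressed PDF
  bunzip2 "$(basename "$url")"    # replaces foo.pdf.bz2 with foo.pdf
done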
Re: How do I get rid of old device?
On Thu, 17 Jan 2008, Neil Brown wrote:

> On Wednesday January 16, [EMAIL PROTECTED] wrote:
> > p34:~# mdadm /dev/md3 --zero-superblock
> > p34:~# mdadm --examine --scan
> > ARRAY /dev/md0 level=raid1 num-devices=2 UUID=f463057c:9a696419:3bcb794a:7aaa12b2
> > ARRAY /dev/md1 level=raid1 num-devices=2 UUID=98e4948c:c6685f82:e082fd95:e7f45529
> > ARRAY /dev/md2 level=raid1 num-devices=2 UUID=330c9879:73af7d3e:57f4c139:f9191788
> > ARRAY /dev/md3 level=raid0 num-devices=10 UUID=6dc12c36:b3517ff9:083fb634:68e9eb49
> > p34:~#
> >
> > I cannot seem to get rid of /dev/md3; it's almost as if there is a piece
> > of it on the root (2) disks or a reference to it?
> >
> > I also dd'd the other 10 disks (non-root) and /dev/md3 persists.
>
> You don't zero the superblock on the array device, because the array
> device does not have a superblock. The component devices have the
> superblock.
>
> So
>   mdadm --zero-superblock /dev/sd*
> or whatever. Maybe
>   mdadm --examine --scan -v
> then get the list of devices it found for the array you want to kill, and
> --zero-superblock that list.
>
> NeilBrown

Thanks, will keep this in mind for the future -- I just checked and the dd's have finished and there is no longer a /dev/md3, but mdadm --zero-superblock /dev/sd[c-l] would have been much easier.

Justin.
Re: How do I get rid of old device?
On Wednesday January 16, [EMAIL PROTECTED] wrote:
> p34:~# mdadm /dev/md3 --zero-superblock
> p34:~# mdadm --examine --scan
> ARRAY /dev/md0 level=raid1 num-devices=2 UUID=f463057c:9a696419:3bcb794a:7aaa12b2
> ARRAY /dev/md1 level=raid1 num-devices=2 UUID=98e4948c:c6685f82:e082fd95:e7f45529
> ARRAY /dev/md2 level=raid1 num-devices=2 UUID=330c9879:73af7d3e:57f4c139:f9191788
> ARRAY /dev/md3 level=raid0 num-devices=10 UUID=6dc12c36:b3517ff9:083fb634:68e9eb49
> p34:~#
>
> I cannot seem to get rid of /dev/md3; it's almost as if there is a piece
> of it on the root (2) disks or a reference to it?
>
> I also dd'd the other 10 disks (non-root) and /dev/md3 persists.

You don't zero the superblock on the array device, because the array device does not have a superblock. The component devices have the superblock.

So
  mdadm --zero-superblock /dev/sd*
or whatever. Maybe
  mdadm --examine --scan -v
then get the list of devices it found for the array you want to kill, and --zero-superblock that list.

NeilBrown
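(A minimal end-to-end sketch of the suggestion above, assuming the array is /dev/md3 and its components are /dev/sdc1 through /dev/sdl1 as in the --create commands elsewhere in this thread; substitute whatever devices --examine actually reports:)

mdadm --stop /dev/md3                  # make sure the array itself is not assembled
mdadm --examine --scan -v              # verbose scan lists each array's component devices
mdadm --zero-superblock /dev/sd[c-l]1  # wipe the md superblock on every component
mdadm --examine --scan                 # no ARRAY line for md3 should remain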
Re: How do I get rid of old device?
On Wed, 16 Jan 2008, Justin Piszcz wrote:

> p34:~# mdadm /dev/md3 --zero-superblock
> p34:~# mdadm --examine --scan
> ARRAY /dev/md0 level=raid1 num-devices=2 UUID=f463057c:9a696419:3bcb794a:7aaa12b2
> ARRAY /dev/md1 level=raid1 num-devices=2 UUID=98e4948c:c6685f82:e082fd95:e7f45529
> ARRAY /dev/md2 level=raid1 num-devices=2 UUID=330c9879:73af7d3e:57f4c139:f9191788
> ARRAY /dev/md3 level=raid0 num-devices=10 UUID=6dc12c36:b3517ff9:083fb634:68e9eb49
> p34:~#
>
> I cannot seem to get rid of /dev/md3; it's almost as if there is a piece
> of it on the root (2) disks or a reference to it?
>
> I also dd'd the other 10 disks (non-root) and /dev/md3 persists.

Hopefully this will clear it out:

p34:~# for i in /dev/sd[c-l]; do /usr/bin/time dd if=/dev/zero of=$i bs=1M & done
[1] 4625
[2] 4626
[3] 4627
[4] 4628
[5] 4629
[6] 4630
[7] 4631
[8] 4632
[9] 4633
[10] 4634
p34:~#

Good aggregate bandwidth at least, writing to all 10 disks:

procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free    buff  cache  si  so  bi     bo    in   cs  us sy id wa
 1  9      0  46472 7201008 734240   0   0   0 658756  2339 2242   0 22 24 54
 3 10      0  44132 7204680 732920   0   0   0 660040  2335 2276   0 22 19 59
 5  8      0  48196 7201840 737360   0   0   0 652708  2403 1645   0 23 11 66
 2  9      0  45728 7205036 726280   0   0   0 659844  2296 1891   0 23 11 66
 0 11      0  47672 7202992 725640   0   0   0 672856  2327 1616   0 22  7 71
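(A small follow-up sketch, assuming the same shell session is still open: wait for the ten background dd jobs, then confirm the array really is gone:)

wait                     # blocks until all ten background dd jobs have exited
cat /proc/mdstat         # md3 should no longer appear
mdadm --examine --scan   # double-check that no ARRAY line for md3 remains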
How do I get rid of old device?
p34:~# mdadm /dev/md3 --zero-superblock
p34:~# mdadm --examine --scan
ARRAY /dev/md0 level=raid1 num-devices=2 UUID=f463057c:9a696419:3bcb794a:7aaa12b2
ARRAY /dev/md1 level=raid1 num-devices=2 UUID=98e4948c:c6685f82:e082fd95:e7f45529
ARRAY /dev/md2 level=raid1 num-devices=2 UUID=330c9879:73af7d3e:57f4c139:f9191788
ARRAY /dev/md3 level=raid0 num-devices=10 UUID=6dc12c36:b3517ff9:083fb634:68e9eb49
p34:~#

I cannot seem to get rid of /dev/md3; it's almost as if there is a piece of it on the root (2) disks or a reference to it?

I also dd'd the other 10 disks (non-root) and /dev/md3 persists.
Re: Linux Software RAID 5 + XFS Multi-Benchmarks / 10 Raptors Again
On Thu, 17 Jan 2008, Al Boldi wrote:

> Justin Piszcz wrote:
> > On Wed, 16 Jan 2008, Al Boldi wrote:
> > > Also, can you retest using dd with different block-sizes?
> >
> > I can do this, moment..
> >
> > I know about oflag=direct but I choose to use dd with sync and measure
> > the total time it takes.
> >
> > /usr/bin/time -f %E -o ~/$i=chunk.txt bash -c 'dd if=/dev/zero
> > of=/r1/bigfile bs=1M count=10240; sync'
> >
> > So I was asked on the mailing list to test dd with various chunk sizes;
> > here is the length of time it took to write 10 GiB and sync for each
> > chunk size:
> >
> > 4=chunk.txt:0:25.46
> > 8=chunk.txt:0:25.63
> > 16=chunk.txt:0:25.26
> > 32=chunk.txt:0:25.08
> > 64=chunk.txt:0:25.55
> > 128=chunk.txt:0:25.26
> > 256=chunk.txt:0:24.72
> > 512=chunk.txt:0:24.71
> > 1024=chunk.txt:0:25.40
> > 2048=chunk.txt:0:25.71
> > 4096=chunk.txt:0:27.18
> > 8192=chunk.txt:0:29.00
> > 16384=chunk.txt:0:31.43
> > 32768=chunk.txt:0:50.11
> > 65536=chunk.txt:2:20.80
>
> What do you get with bs=512,1k,2k,4k,8k,16k...
>
> Thanks!
>
> --
> Al

Done testing for now, but I did test bs=256k with a 256 KiB chunk, and obviously that got good results, just like bs=1M with a 1 MiB chunk: 460-480 MiB/s.

Justin.
Re: Linux Software RAID 5 + XFS Multi-Benchmarks / 10 Raptors Again
On Thu, 17 Jan 2008, Al Boldi wrote:

> Justin Piszcz wrote:
> > On Wed, 16 Jan 2008, Al Boldi wrote:
> > > Also, can you retest using dd with different block-sizes?
> >
> > I can do this, moment..
> >
> > I know about oflag=direct but I choose to use dd with sync and measure
> > the total time it takes.
> >
> > /usr/bin/time -f %E -o ~/$i=chunk.txt bash -c 'dd if=/dev/zero
> > of=/r1/bigfile bs=1M count=10240; sync'
> >
> > So I was asked on the mailing list to test dd with various chunk sizes;
> > here is the length of time it took to write 10 GiB and sync for each
> > chunk size:
> >
> > 4=chunk.txt:0:25.46
> > 8=chunk.txt:0:25.63
> > 16=chunk.txt:0:25.26
> > 32=chunk.txt:0:25.08
> > 64=chunk.txt:0:25.55
> > 128=chunk.txt:0:25.26
> > 256=chunk.txt:0:24.72
> > 512=chunk.txt:0:24.71
> > 1024=chunk.txt:0:25.40
> > 2048=chunk.txt:0:25.71
> > 4096=chunk.txt:0:27.18
> > 8192=chunk.txt:0:29.00
> > 16384=chunk.txt:0:31.43
> > 32768=chunk.txt:0:50.11
> > 65536=chunk.txt:2:20.80
>
> What do you get with bs=512,1k,2k,4k,8k,16k...
>
> Thanks!
>
> --
> Al

root      4621  0.0  0.0  12404   760 pts/2    D+   17:53   0:00 mdadm -S /dev/md3
root      4664  0.0  0.0   4264   728 pts/5    S+   17:54   0:00 grep D

Tried to stop it while it was re-syncing, DEADLOCK :(

[  305.464904] md: md3 still in use.
[  314.595281] md: md_do_sync() got signal ... exiting

Anyhow, done testing; time to move the data back on if I can kill the resync process without deadlocking.

Justin.
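(One way to avoid wedging mdadm -S against an in-progress resync is to ask md to pause the sync first via its sysfs interface; a hedged sketch -- whether this sidesteps the particular deadlock above is not certain:)

cat /sys/block/md3/md/sync_action           # reports e.g. "resync" while the sync runs
echo idle > /sys/block/md3/md/sync_action   # request that the resync stop
mdadm -S /dev/md3                           # then try stopping the array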
Re: [PATCH 001 of 6] md: Fix an occasional deadlock in raid5
On Tuesday January 15, [EMAIL PROTECTED] wrote:
> On Wed, 16 Jan 2008 00:09:31 -0700 "Dan Williams" <[EMAIL PROTECTED]> wrote:
> > > heheh.
> > >
> > > it's really easy to reproduce the hang without the patch -- i could
> > > hang the box in under 20 min on 2.6.22+ w/XFS and raid5 on 7x750GB.
> > > i'll try with ext3... Dan's experiences suggest it won't happen with
> > > ext3 (or is even more rare), which would explain why this is overall
> > > a rare problem.
> >
> > Hmmm... how rare?
> >
> > http://marc.info/?l=linux-kernel&m=119461747005776&w=2
> >
> > There is nothing specific that prevents other filesystems from hitting
> > it; perhaps XFS is just better at submitting large i/o's. -stable
> > should get some kind of treatment. I'll take altered performance over
> > a hung system.
>
> We can always target 2.6.25-rc1 then 2.6.24.1 if Neil is still feeling
> wimpy.

I am feeling wimpy. There have been a few too many raid5 breakages recently, and it is very hard to really judge the performance impact of this change. I even have a small uncertainty about correctness - could it still hang in some other way? I don't think so, but this is complex code...

If it were really common I would have expected more noise on the mailing list. Sure, there has been some, but not much. Then again, maybe people are searching the archives, finding the "increase stripe cache size" trick, and not reporting anything... though that seems unlikely.

How about we queue it for 2.6.25-rc1, and then about when -rc2 comes out, we queue it for 2.6.24.y? Anyone (or any distro) that really needs it can of course grab the patch themselves... ??

NeilBrown
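(For reference, the "increase stripe cache size" workaround mentioned above is presumably the per-array sysfs knob that also appears in the benchmark threads below; the value is in pages per device:)

cat /sys/block/md3/md/stripe_cache_size            # default is typically 256
echo 16384 > /sys/block/md3/md/stripe_cache_size   # enlarge the raid5 stripe cache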
Re: Linux Software RAID 5 + XFS Multi-Benchmarks / 10 Raptors Again
Justin Piszcz wrote:
> On Wed, 16 Jan 2008, Al Boldi wrote:
> > Also, can you retest using dd with different block-sizes?
>
> I can do this, moment..
>
> I know about oflag=direct but I choose to use dd with sync and measure
> the total time it takes.
>
> /usr/bin/time -f %E -o ~/$i=chunk.txt bash -c 'dd if=/dev/zero
> of=/r1/bigfile bs=1M count=10240; sync'
>
> So I was asked on the mailing list to test dd with various chunk sizes;
> here is the length of time it took to write 10 GiB and sync for each
> chunk size:
>
> 4=chunk.txt:0:25.46
> 8=chunk.txt:0:25.63
> 16=chunk.txt:0:25.26
> 32=chunk.txt:0:25.08
> 64=chunk.txt:0:25.55
> 128=chunk.txt:0:25.26
> 256=chunk.txt:0:24.72
> 512=chunk.txt:0:24.71
> 1024=chunk.txt:0:25.40
> 2048=chunk.txt:0:25.71
> 4096=chunk.txt:0:27.18
> 8192=chunk.txt:0:29.00
> 16384=chunk.txt:0:31.43
> 32768=chunk.txt:0:50.11
> 65536=chunk.txt:2:20.80

What do you get with bs=512,1k,2k,4k,8k,16k...

Thanks!

--
Al
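(A minimal sketch of the requested block-size sweep, keeping the total write at 10 GiB for every block size; assumes the array is mounted at /r1 as in Justin's runs:)

for bs in 512 1024 2048 4096 8192 16384 32768 65536 131072 262144 524288 1048576
do
  count=$((10737418240 / bs))   # 10 GiB total, regardless of block size
  /usr/bin/time -f %E -o ~/$bs=bs.txt bash -c \
    "dd if=/dev/zero of=/r1/bigfile bs=$bs count=$count; sync"
done
grep : ~/*=bs.txt | sort -n     # collect the timings, smallest block size first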
Re: Linux Software RAID 5 + XFS Multi-Benchmarks / 10 Raptors Again
On Wed, 16 Jan 2008, Greg Cormier wrote:

> What sort of tools are you using to get these benchmarks, and can I use
> them for ext3? Very interested in running this on my server.
>
> Thanks,
> Greg

You can use whatever suits you, such as untarring a kernel source tree, copying files, or untarring backups; you should benchmark specifically what *your* workload is.

Here is the skeleton, using bash (don't forget to turn off the cron daemon):

for i in 4 8 16 32 64 128 256 512 1024 2048 4096 8192 16384 32768 65536
do
  cd /
  umount /r1
  mdadm -S /dev/md3
  mdadm --create --assume-clean --verbose /dev/md3 --level=5 \
    --raid-devices=10 --chunk=$i --run /dev/sd[c-l]1
  /etc/init.d/oraid.sh   # to optimize my raid stuff
  mkfs.xfs -f /dev/md3
  mount /dev/md3 /r1 -o logbufs=8,logbsize=262144
  # then simply add what you do often here
  # everyone's workload is different
  /usr/bin/time -f %E -o ~/$i=chunk.txt bash -c \
    'dd if=/dev/zero of=/r1/bigfile bs=1M count=10240; sync'
done

Then just run:

grep : /root/*chunk* | sort -n

to get the results in the same format.

Justin.
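(For the ext3 question specifically: the same skeleton should work with the mkfs/mount lines swapped out; a hedged sketch using mke2fs's -E stride option to align the filesystem to the RAID chunk -- stride is the chunk size divided by the 4 KiB block size:)

mkfs.ext3 -b 4096 -E stride=$((i/4)) /dev/md3   # $i is the chunk size in KiB
mount /dev/md3 /r1 -o noatime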
Re: Linux Software RAID 5 + XFS Multi-Benchmarks / 10 Raptors Again
What sort of tools are you using to get these benchmarks, and can I use them for ext3? Very interested in running this on my server.

Thanks,
Greg

On Jan 16, 2008 11:13 AM, Justin Piszcz <[EMAIL PROTECTED]> wrote:
> For these benchmarks I timed how long it takes to extract a standard 4.4
> GiB DVD:
>
> Settings: Software RAID 5 with the following settings (until I change
> those too):
>
> Base setup:
> blockdev --setra 65536 /dev/md3
> echo 16384 > /sys/block/md3/md/stripe_cache_size
> echo "Disabling NCQ on all disks..."
> for i in $DISKS
> do
>    echo "Disabling NCQ on $i"
>    echo 1 > /sys/block/"$i"/device/queue_depth
> done
>
> p34:~# grep : *chunk* | sort -n
> 4-chunk.txt:0:45.31
> 8-chunk.txt:0:44.32
> 16-chunk.txt:0:41.02
> 32-chunk.txt:0:40.50
> 64-chunk.txt:0:40.88
> 128-chunk.txt:0:40.21
> 256-chunk.txt:0:40.14***
> 512-chunk.txt:0:40.35
> 1024-chunk.txt:0:41.11
> 2048-chunk.txt:0:43.89
> 4096-chunk.txt:0:47.34
> 8192-chunk.txt:0:57.86
> 16384-chunk.txt:1:09.39
> 32768-chunk.txt:1:26.61
>
> It would appear a 256 KiB chunk-size is optimal.
>
> So what about NCQ?
>
> 1=ncq_depth.txt:0:40.86***
> 2=ncq_depth.txt:0:40.99
> 4=ncq_depth.txt:0:42.52
> 8=ncq_depth.txt:0:43.57
> 16=ncq_depth.txt:0:42.54
> 31=ncq_depth.txt:0:42.51
>
> Keeping it off seems best.
>
> 1=stripe_and_read_ahead.txt:0:40.86
> 2=stripe_and_read_ahead.txt:0:40.99
> 4=stripe_and_read_ahead.txt:0:42.52
> 8=stripe_and_read_ahead.txt:0:43.57
> 16=stripe_and_read_ahead.txt:0:42.54
> 31=stripe_and_read_ahead.txt:0:42.51
> 256=stripe_and_read_ahead.txt:1:44.16
> 1024=stripe_and_read_ahead.txt:1:07.01
> 2048=stripe_and_read_ahead.txt:0:53.59
> 4096=stripe_and_read_ahead.txt:0:45.66
> 8192=stripe_and_read_ahead.txt:0:40.73
> 16384=stripe_and_read_ahead.txt:0:38.99**
> 16384=stripe_and_65536_read_ahead.txt:0:38.67
> 16384=stripe_and_65536_read_ahead.txt:0:38.69 (again, this is what I use
> from earlier benchmarks)
> 32768=stripe_and_read_ahead.txt:0:38.84
>
> What about logbufs?
>
> 2=logbufs.txt:0:39.21
> 4=logbufs.txt:0:39.24
> 8=logbufs.txt:0:38.71
>
> (again)
>
> 2=logbufs.txt:0:42.16
> 4=logbufs.txt:0:38.79
> 8=logbufs.txt:0:38.71** (yes)
>
> What about logbsize?
>
> 16k=logbsize.txt:1:09.22
> 32k=logbsize.txt:0:38.70
> 64k=logbsize.txt:0:39.04
> 128k=logbsize.txt:0:39.06
> 256k=logbsize.txt:0:38.59** (best)
>
> What about allocsize? (default=1024k)
>
> 4k=allocsize.txt:0:39.35
> 8k=allocsize.txt:0:38.95
> 16k=allocsize.txt:0:38.79
> 32k=allocsize.txt:0:39.71
> 64k=allocsize.txt:1:09.67
> 128k=allocsize.txt:0:39.04
> 256k=allocsize.txt:0:39.11
> 512k=allocsize.txt:0:39.01
> 1024k=allocsize.txt:0:38.75** (default)
> 2048k=allocsize.txt:0:39.07
> 4096k=allocsize.txt:0:39.15
> 8192k=allocsize.txt:0:39.40
> 16384k=allocsize.txt:0:39.36
>
> What about the agcount?
>
> 2=agcount.txt:0:37.53
> 4=agcount.txt:0:38.56
> 8=agcount.txt:0:40.86
> 16=agcount.txt:0:39.05
> 32=agcount.txt:0:39.07** (default)
> 64=agcount.txt:0:39.29
> 128=agcount.txt:0:39.42
> 256=agcount.txt:0:38.76
> 512=agcount.txt:0:38.27
> 1024=agcount.txt:0:38.29
> 2048=agcount.txt:1:08.55
> 4096=agcount.txt:0:52.65
> 8192=agcount.txt:1:06.96
> 16384=agcount.txt:1:31.21
> 32768=agcount.txt:1:09.06
> 65536=agcount.txt:1:54.96
>
> So far I have:
>
> p34:~# mkfs.xfs -f -l lazy-count=1,version=2,size=128m -i attr=2 /dev/md3
> meta-data=/dev/md3        isize=256    agcount=32, agsize=10302272 blks
>          =                sectsz=4096  attr=2
> data     =                bsize=4096   blocks=329671296, imaxpct=25
>          =                sunit=64     swidth=576 blks, unwritten=1
> naming   =version 2       bsize=4096
> log      =internal log    bsize=4096   blocks=32768, version=2
>          =                sectsz=4096  sunit=1 blks, lazy-count=1
> realtime =none            extsz=2359296 blocks=0, rtextents=0
>
> p34:~# grep /dev/md3 /etc/fstab
> /dev/md3  /r1  xfs  noatime,nodiratime,logbufs=8,logbsize=262144  0  1
>
> Notice how mkfs.xfs 'knows' the sunit and swidth, and in the correct
> units too, because this is software raid and it pulls the information
> from that layer, unlike HW raid, which will not have a clue of what is
> underneath and will say sunit=0,swidth=0.
>
> However, in earlier testing I actually made them both 0 and it actually
> made performance better:
>
> http://home.comcast.net/~jpiszcz/sunit-swidth/results.html
>
> In any case, I am re-running bonnie++ once more with a 256 KiB chunk and
> will compare to those values in a bit.
>
> Justin.
Re: Linux Software RAID 5 + XFS Multi-Benchmarks / 10 Raptors Again
On Wed, 16 Jan 2008, Al Boldi wrote:

> Justin Piszcz wrote:
> > For these benchmarks I timed how long it takes to extract a standard 4.4
> > GiB DVD:
> >
> > Settings: Software RAID 5 with the following settings (until I change
> > those too):
> >
> > Base setup:
> > blockdev --setra 65536 /dev/md3
> > echo 16384 > /sys/block/md3/md/stripe_cache_size
> > echo "Disabling NCQ on all disks..."
> > for i in $DISKS
> > do
> >    echo "Disabling NCQ on $i"
> >    echo 1 > /sys/block/"$i"/device/queue_depth
> > done
> >
> > p34:~# grep : *chunk* | sort -n
> > 4-chunk.txt:0:45.31
> > 8-chunk.txt:0:44.32
> > 16-chunk.txt:0:41.02
> > 32-chunk.txt:0:40.50
> > 64-chunk.txt:0:40.88
> > 128-chunk.txt:0:40.21
> > 256-chunk.txt:0:40.14***
> > 512-chunk.txt:0:40.35
> > 1024-chunk.txt:0:41.11
> > 2048-chunk.txt:0:43.89
> > 4096-chunk.txt:0:47.34
> > 8192-chunk.txt:0:57.86
> > 16384-chunk.txt:1:09.39
> > 32768-chunk.txt:1:26.61
> >
> > It would appear a 256 KiB chunk-size is optimal.
>
> Can you retest with different max_sectors_kb on both md and sd?

Remember this is SW RAID, so max_sectors_kb will only affect the individual disks underneath the SW RAID. I have benchmarked this in the past; the defaults chosen by the kernel are optimal, and changing them did not make any noticeable improvements.

> Also, can you retest using dd with different block-sizes?

I can do this, moment..

I know about oflag=direct but I choose to use dd with sync and measure the total time it takes.

/usr/bin/time -f %E -o ~/$i=chunk.txt bash -c 'dd if=/dev/zero of=/r1/bigfile bs=1M count=10240; sync'

So I was asked on the mailing list to test dd with various chunk sizes; here is the length of time it took to write 10 GiB and sync for each chunk size:

4=chunk.txt:0:25.46
8=chunk.txt:0:25.63
16=chunk.txt:0:25.26
32=chunk.txt:0:25.08
64=chunk.txt:0:25.55
128=chunk.txt:0:25.26
256=chunk.txt:0:24.72
512=chunk.txt:0:24.71
1024=chunk.txt:0:25.40
2048=chunk.txt:0:25.71
4096=chunk.txt:0:27.18
8192=chunk.txt:0:29.00
16384=chunk.txt:0:31.43
32768=chunk.txt:0:50.11
65536=chunk.txt:2:20.80

Justin.
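(For anyone who wants to experiment anyway, the knob lives in sysfs per block device; a minimal sketch assuming the same sdc through sdl member disks -- the value 128 here is purely illustrative:)

for d in sd{c..l}; do
  cat /sys/block/$d/queue/max_sectors_kb          # current per-disk value
  echo 128 > /sys/block/$d/queue/max_sectors_kb   # try a smaller maximum request size
done
cat /sys/block/md3/queue/max_sectors_kb           # the md device exposes the same knob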
Re: Linux Software RAID 5 + XFS Multi-Benchmarks / 10 Raptors Again
Justin Piszcz wrote:
> For these benchmarks I timed how long it takes to extract a standard 4.4
> GiB DVD:
>
> Settings: Software RAID 5 with the following settings (until I change
> those too):
>
> Base setup:
> blockdev --setra 65536 /dev/md3
> echo 16384 > /sys/block/md3/md/stripe_cache_size
> echo "Disabling NCQ on all disks..."
> for i in $DISKS
> do
>    echo "Disabling NCQ on $i"
>    echo 1 > /sys/block/"$i"/device/queue_depth
> done
>
> p34:~# grep : *chunk* | sort -n
> 4-chunk.txt:0:45.31
> 8-chunk.txt:0:44.32
> 16-chunk.txt:0:41.02
> 32-chunk.txt:0:40.50
> 64-chunk.txt:0:40.88
> 128-chunk.txt:0:40.21
> 256-chunk.txt:0:40.14***
> 512-chunk.txt:0:40.35
> 1024-chunk.txt:0:41.11
> 2048-chunk.txt:0:43.89
> 4096-chunk.txt:0:47.34
> 8192-chunk.txt:0:57.86
> 16384-chunk.txt:1:09.39
> 32768-chunk.txt:1:26.61
>
> It would appear a 256 KiB chunk-size is optimal.

Can you retest with different max_sectors_kb on both md and sd?

Also, can you retest using dd with different block-sizes?

Thanks!

--
Al
Re: Linux Software RAID 5 + XFS Multi-Benchmarks / 10 Raptors Again
On Wed, 16 Jan 2008, Justin Piszcz wrote:

> For these benchmarks I timed how long it takes to extract a standard 4.4
> GiB DVD:
>
> Settings: Software RAID 5 with the following settings (until I change
> those too):

http://home.comcast.net/~jpiszcz/sunit-swidth/newresults.html

Any idea why an sunit and swidth of 0 (and -d agcount=4) is faster, at least for sequential input/output, than the proper sunit/swidth values? It does not make sense.

Justin.
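(For anyone wanting to reproduce the comparison, the geometry can be forced at mkfs time; a hedged sketch -- sunit/swidth are given to mkfs.xfs in 512-byte sectors, so a 256 KiB chunk across 9 data disks would correspond to sunit=512, swidth=4608, and the second command forces the unaligned case from the results linked above:)

# aligned: let mkfs.xfs pull sunit/swidth from the md layer
mkfs.xfs -f /dev/md3

# unaligned: force sunit=0/swidth=0 (and agcount=4), as in the linked results
mkfs.xfs -f -d sunit=0,swidth=0,agcount=4 /dev/md3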
Linux Software RAID 5 + XFS Multi-Benchmarks / 10 Raptors Again
For these benchmarks I timed how long it takes to extract a standard 4.4 GiB DVD:

Settings: Software RAID 5 with the following settings (until I change those too):

Base setup:

blockdev --setra 65536 /dev/md3
echo 16384 > /sys/block/md3/md/stripe_cache_size
echo "Disabling NCQ on all disks..."
for i in $DISKS
do
   echo "Disabling NCQ on $i"
   echo 1 > /sys/block/"$i"/device/queue_depth
done

p34:~# grep : *chunk* | sort -n
4-chunk.txt:0:45.31
8-chunk.txt:0:44.32
16-chunk.txt:0:41.02
32-chunk.txt:0:40.50
64-chunk.txt:0:40.88
128-chunk.txt:0:40.21
256-chunk.txt:0:40.14***
512-chunk.txt:0:40.35
1024-chunk.txt:0:41.11
2048-chunk.txt:0:43.89
4096-chunk.txt:0:47.34
8192-chunk.txt:0:57.86
16384-chunk.txt:1:09.39
32768-chunk.txt:1:26.61

It would appear a 256 KiB chunk-size is optimal.

So what about NCQ?

1=ncq_depth.txt:0:40.86***
2=ncq_depth.txt:0:40.99
4=ncq_depth.txt:0:42.52
8=ncq_depth.txt:0:43.57
16=ncq_depth.txt:0:42.54
31=ncq_depth.txt:0:42.51

Keeping it off seems best.

1=stripe_and_read_ahead.txt:0:40.86
2=stripe_and_read_ahead.txt:0:40.99
4=stripe_and_read_ahead.txt:0:42.52
8=stripe_and_read_ahead.txt:0:43.57
16=stripe_and_read_ahead.txt:0:42.54
31=stripe_and_read_ahead.txt:0:42.51
256=stripe_and_read_ahead.txt:1:44.16
1024=stripe_and_read_ahead.txt:1:07.01
2048=stripe_and_read_ahead.txt:0:53.59
4096=stripe_and_read_ahead.txt:0:45.66
8192=stripe_and_read_ahead.txt:0:40.73
16384=stripe_and_read_ahead.txt:0:38.99**
16384=stripe_and_65536_read_ahead.txt:0:38.67
16384=stripe_and_65536_read_ahead.txt:0:38.69 (again, this is what I use from earlier benchmarks)
32768=stripe_and_read_ahead.txt:0:38.84

What about logbufs?

2=logbufs.txt:0:39.21
4=logbufs.txt:0:39.24
8=logbufs.txt:0:38.71

(again)

2=logbufs.txt:0:42.16
4=logbufs.txt:0:38.79
8=logbufs.txt:0:38.71** (yes)

What about logbsize?

16k=logbsize.txt:1:09.22
32k=logbsize.txt:0:38.70
64k=logbsize.txt:0:39.04
128k=logbsize.txt:0:39.06
256k=logbsize.txt:0:38.59** (best)

What about allocsize? (default=1024k)

4k=allocsize.txt:0:39.35
8k=allocsize.txt:0:38.95
16k=allocsize.txt:0:38.79
32k=allocsize.txt:0:39.71
64k=allocsize.txt:1:09.67
128k=allocsize.txt:0:39.04
256k=allocsize.txt:0:39.11
512k=allocsize.txt:0:39.01
1024k=allocsize.txt:0:38.75** (default)
2048k=allocsize.txt:0:39.07
4096k=allocsize.txt:0:39.15
8192k=allocsize.txt:0:39.40
16384k=allocsize.txt:0:39.36

What about the agcount?

2=agcount.txt:0:37.53
4=agcount.txt:0:38.56
8=agcount.txt:0:40.86
16=agcount.txt:0:39.05
32=agcount.txt:0:39.07** (default)
64=agcount.txt:0:39.29
128=agcount.txt:0:39.42
256=agcount.txt:0:38.76
512=agcount.txt:0:38.27
1024=agcount.txt:0:38.29
2048=agcount.txt:1:08.55
4096=agcount.txt:0:52.65
8192=agcount.txt:1:06.96
16384=agcount.txt:1:31.21
32768=agcount.txt:1:09.06
65536=agcount.txt:1:54.96

So far I have:

p34:~# mkfs.xfs -f -l lazy-count=1,version=2,size=128m -i attr=2 /dev/md3
meta-data=/dev/md3        isize=256    agcount=32, agsize=10302272 blks
         =                sectsz=4096  attr=2
data     =                bsize=4096   blocks=329671296, imaxpct=25
         =                sunit=64     swidth=576 blks, unwritten=1
naming   =version 2       bsize=4096
log      =internal log    bsize=4096   blocks=32768, version=2
         =                sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none            extsz=2359296 blocks=0, rtextents=0

p34:~# grep /dev/md3 /etc/fstab
/dev/md3  /r1  xfs  noatime,nodiratime,logbufs=8,logbsize=262144  0  1

Notice how mkfs.xfs 'knows' the sunit and swidth, and in the correct units too, because this is software raid and mkfs.xfs pulls the information from that layer, unlike HW raid, which will not have a clue of what is underneath and will say sunit=0,swidth=0.
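(A sketch of how one of these single-knob sweeps can be driven, using logbsize as the example; the tar command is a hypothetical stand-in for whatever actually extracts the 4.4 GiB DVD image in these tests:)

for lb in 16384 32768 65536 131072 262144
do
  umount /r1
  mount /dev/md3 /r1 -o logbufs=8,logbsize=$lb
  /usr/bin/time -f %E -o ~/$lb=logbsize.txt bash -c \
    'tar xf /path/to/dvd.tar -C /r1; sync'   # placeholder for the DVD extraction
done
grep : ~/*logbsize* | sort -n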
However, in earlier testing I made sunit and swidth both 0, and it actually made performance better:

http://home.comcast.net/~jpiszcz/sunit-swidth/results.html

In any case, I am re-running bonnie++ once more with a 256 KiB chunk and will compare to those values in a bit.

Justin.