Re: Fastest Chunk Size w/XFS For MD Software RAID = 1024k

2007-06-28 Thread David Greaves

David Chinner wrote:

On Wed, Jun 27, 2007 at 07:20:42PM -0400, Justin Piszcz wrote:

For drives with 16MB of cache (in this case, raptors).


That's four (4) drives, right?


I'm pretty sure he's using 10 - see his email from a few days back...

Justin Piszcz wrote:
Running test with 10 RAPTOR 150 hard drives, expect it to take 
awhile until I get the results, avg them etc. :)



If so, how do you get a block read rate of 578MB/s from
4 drives? That's 145MB/s per drive


Which gives a far more reasonable 60MB/s per drive...

David



Re: Fastest Chunk Size w/XFS For MD Software RAID = 1024k

2007-06-28 Thread Justin Piszcz

mdadm --create \
  --verbose /dev/md3 \
  --level=5 \
  --raid-devices=10 \
  --chunk=1024 \
  --force \
  --run \
  /dev/sd[cdefghijkl]1
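
As a quick sanity check - a minimal sketch, with only the /dev/md3 name taken 
from the command above - the chunk size and member count that mdadm actually 
applied can be read back with:

  mdadm --detail /dev/md3 | grep -iE 'level|devices|chunk'
  cat /proc/mdstat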

Justin.


On Thu, 28 Jun 2007, Peter Rabbitson wrote:


Justin Piszcz wrote:

The results speak for themselves:

http://home.comcast.net/~jpiszcz/chunk/index.html




What is the array layout (-l ? -n ? -p ?)




Re: Fastest Chunk Size w/XFS For MD Software RAID = 1024k

2007-06-28 Thread Justin Piszcz

10 disks total.

Justin.

On Thu, 28 Jun 2007, David Chinner wrote:


On Wed, Jun 27, 2007 at 07:20:42PM -0400, Justin Piszcz wrote:

For drives with 16MB of cache (in this case, raptors).


That's four (4) drives, right?

If so, how do you get a block read rate of 578MB/s from
4 drives? That's 145MB/s per drive

Cheers,

Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group




Re: mdadm usage: creating arrays with helpful names?

2007-06-28 Thread David Greaves
(back on list for google's benefit ;) and because there are some good questions 
and I don't know all the answers... )


Oh, and Neil 'cos there may be a bug ...

Richard Michael wrote:

On Wed, Jun 27, 2007 at 08:49:22AM +0100, David Greaves wrote:

http://linux-raid.osdl.org/index.php/Partitionable



Thanks.  I didn't know this site existed (Googling even just 'mdadm'
doesn't yield it in the first 100 results), and it's helpful.
Good ... I got permission to wikify the 'official' linux raid FAQ but it takes 
time (and motivation!) to update it :)

Hopefully it will snowball as people who use it then contribute back hint ;)

As it becomes more valuable to people then more links will be created and Google 
will notice...




What if I don't want a partitioned array?  I simply want the name to be
nicer than the /dev/mdX or /dev/md/XX style.  (p1 still gives me
/dev/nicename /dev/nicename0, as your page indicates.)

--auto md

mdadm --create /dev/strawberry --auto md ...
[EMAIL PROTECTED]:/tmp # mdadm --detail /dev/strawberry
/dev/strawberry:
Version : 00.90.03
  Creation Time : Thu Jun 28 08:25:06 2007
 Raid Level : raid4





Also, when I use --create /dev/nicename --auto=p1 (for example), I
also see /dev/md_d126 created.  Why?  There is then a /sys/block/md_d126
entry (presumably created by the md driver), but no /sys/block/nicename
entry.  Why?

Not sure who creates this, mdadm or udev
The code isn't that hard to read and you sound like you'd follow it if you 
fancied a skim-read...


I too would expect that there should be a /sys/block/nicename - is this a bug 
Neil?

These options don't see a lot of use - I recently came across a bug in the 
--auto pX option...



Finally --stop /dev/nicename doesn't remove any of the aforementioned
/dev or /sys entries.  I don't suppose that it should, but an mdadm
command to do this would be helpful.  So, how do I remove the oddly
named /sys entries? (I removed the /dev entries with rm.)  man mdadm
indicates --stop releases all resources, but it doesn't (and probably
shouldn't).

rm !

'--stop' with mdadm does release the 'resources', i.e. the components you used. 
It doesn't remove the array. There is no delete - I guess since an rm is just as 
effective unless you use a nicename...



[I think there should be a symmetry to the mdadm options
--create/--delete and --start/--stop.  It's *convenient* --create
also starts the array, but this conflates the issue a bit..]

I want to stop and completely remove all trace of the array.
(Especially as I'm experimenting with this over loopback, and stuff
hanging around irritates the lo driver.)

You're possibly mixing two things up here...

Releasing the resources with a --stop would let you re-use a lo device in 
another array. You don't _need_ --delete (or rm).
However md does write superblocks to the components and *mdadm* warns you that 
the loopback has a valid superblock..


mdadm: /dev/loop1 appears to be part of a raid array:
level=raid4 devices=6 ctime=Thu Jun 21 09:46:27 2007

[hmm, I can see why you may think it's part of an 'active' array]

You could do mdadm --zero-superblock to clean the component or just say yes 
when mdadm asks you to continue.


see:
# mdadm --create /dev/strawberry --auto md --level=4 -n 6 /dev/loop1 /dev/loop2 
/dev/loop3 /dev/loop4 /dev/loop5 /dev/loop6

mdadm: /dev/loop1 appears to be part of a raid array:
level=raid4 devices=6 ctime=Thu Jun 28 08:25:06 2007
blah
Continue creating array? yes
mdadm: array /dev/strawberry started.

# mdadm --stop /dev/strawberry
mdadm: stopped /dev/strawberry

# mdadm --create /dev/strawberry --auto md --level=4 -n 6 /dev/loop1 /dev/loop2 
/dev/loop3 /dev/loop4 /dev/loop5 /dev/loop6

mdadm: /dev/loop1 appears to be part of a raid array:
level=raid4 devices=6 ctime=Thu Jun 28 09:07:29 2007
blah
Continue creating array? yes
mdadm: array /dev/strawberry started.

David



Re: Fastest Chunk Size w/XFS For MD Software RAID = 1024k

2007-06-28 Thread Peter Rabbitson

Justin Piszcz wrote:

mdadm --create \
  --verbose /dev/md3 \
  --level=5 \
  --raid-devices=10 \
  --chunk=1024 \
  --force \
  --run
  /dev/sd[cdefghijkl]1

Justin.


Interesting, I came up with the same results (1M chunk being superior) 
with a completely different raid set with XFS on top:


mdadm   --create \
--level=10 \
--chunk=1024 \
--raid-devices=4 \
--layout=f3 \
...
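
For reference, a hypothetical fully spelled-out form of that command - the 
device names below are made up, since the elided arguments are not given:

mdadm   --create /dev/md0 \
        --level=10 \
        --chunk=1024 \
        --raid-devices=4 \
        --layout=f3 \
        /dev/sd[abcd]1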

Could it be attributed to XFS itself?

Peter



Re: Fastest Chunk Size w/XFS For MD Software RAID = 1024k

2007-06-28 Thread Justin Piszcz



On Thu, 28 Jun 2007, Peter Rabbitson wrote:


Justin Piszcz wrote:

mdadm --create \
  --verbose /dev/md3 \
  --level=5 \
  --raid-devices=10 \
  --chunk=1024 \
  --force \
  --run
  /dev/sd[cdefghijkl]1

Justin.


Interesting, I came up with the same results (1M chunk being superior) with a 
completely different raid set with XFS on top:


mdadm   --create \
--level=10 \
--chunk=1024 \
--raid-devices=4 \
--layout=f3 \
...

Could it be attributed to XFS itself?

Peter



Good question. By the way, how much cache do the drives you are testing 
with have?


Justin.


Re: Fastest Chunk Size w/XFS For MD Software RAID = 1024k

2007-06-28 Thread Peter Rabbitson

Justin Piszcz wrote:


On Thu, 28 Jun 2007, Peter Rabbitson wrote:

Interesting, I came up with the same results (1M chunk being superior) 
with a completely different raid set with XFS on top:


...

Could it be attributed to XFS itself?

Peter



Good question, by the way how much cache do the drives have that you are 
testing with?




I believe 8MB, but I am not sure I am looking at the right number:

[EMAIL PROTECTED]:~# hdparm -i /dev/sda

/dev/sda:

 Model=aMtxro7 2Y050M  , FwRev=AY5RH10W, 
SerialNo=6YB6Z7E4

 Config={ Fixed }
 RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=4
 BuffType=DualPortCache, BuffSize=7936kB, MaxMultSect=16, MultSect=?0?
 CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=268435455
 IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
 PIO modes:  pio0 pio1 pio2 pio3 pio4
 DMA modes:  mdma0 mdma1 mdma2
 UDMA modes: udma0 udma1 udma2 udma3 udma4 udma5
 AdvancedPM=yes: disabled (255) WriteCache=enabled
 Drive conforms to: ATA/ATAPI-7 T13 1532D revision 0:  ATA/ATAPI-1 
ATA/ATAPI-2 ATA/ATAPI-3 ATA/ATAPI-4 ATA/ATAPI-5 ATA/ATAPI-6 ATA/ATAPI-7


 * signifies the current active mode

[EMAIL PROTECTED]:~#

1M chunk consistently delivered best performance with:

o A plain dumb dd run
o bonnie
o two bonnie threads
o iozone with 4 threads

My RA is set at 256 for the drives and 16384 for the array (128k and 8M 
respectively)
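
A minimal sketch of how those readahead values are typically set - the device 
names are assumptions, and the figures are in 512-byte sectors (256 = 128k, 
16384 = 8M):

  blockdev --setra 256 /dev/sda      # per-member drive readahead
  blockdev --setra 16384 /dev/md0    # array readahead
  blockdev --getra /dev/md0          # read the current value back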



Re: Fastest Chunk Size w/XFS For MD Software RAID = 1024k

2007-06-28 Thread Justin Piszcz



On Thu, 28 Jun 2007, Peter Rabbitson wrote:


Justin Piszcz wrote:


On Thu, 28 Jun 2007, Peter Rabbitson wrote:

Interesting, I came up with the same results (1M chunk being superior) 
with a completely different raid set with XFS on top:


...

Could it be attributed to XFS itself?

Peter



Good question, by the way how much cache do the drives have that you are 
testing with?




I believe 8MB, but I am not sure I am looking at the right number:

[EMAIL PROTECTED]:~# hdparm -i /dev/sda

/dev/sda:

Model=aMtxro7 2Y050M  , FwRev=AY5RH10W, 
SerialNo=6YB6Z7E4

Config={ Fixed }
RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=4
BuffType=DualPortCache, BuffSize=7936kB, MaxMultSect=16, MultSect=?0?
CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=268435455
IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
PIO modes:  pio0 pio1 pio2 pio3 pio4
DMA modes:  mdma0 mdma1 mdma2
UDMA modes: udma0 udma1 udma2 udma3 udma4 udma5
AdvancedPM=yes: disabled (255) WriteCache=enabled
Drive conforms to: ATA/ATAPI-7 T13 1532D revision 0:  ATA/ATAPI-1 
ATA/ATAPI-2 ATA/ATAPI-3 ATA/ATAPI-4 ATA/ATAPI-5 ATA/ATAPI-6 ATA/ATAPI-7


* signifies the current active mode

[EMAIL PROTECTED]:~#

1M chunk consistently delivered best performance with:

o A plain dumb dd run
o bonnie
o two bonnie threads
o iozone with 4 threads

My RA is set at 256 for the drives and 16384 for the array (128k and 8M 
respectively)




8MB yup: BuffSize=7936kB.

My read ahead is set to 64 megabytes and stripe_cache_size to 16384.
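
A sketch of how those two settings are applied (the /dev/md3 name is carried 
over from the mdadm command earlier in the thread; 131072 sectors * 512 bytes 
= 64 megabytes of readahead):

  blockdev --setra 131072 /dev/md3
  echo 16384 > /sys/block/md3/md/stripe_cache_size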

Justin.


Re: Fastest Chunk Size w/XFS For MD Software RAID = 1024k

2007-06-28 Thread Justin Piszcz



On Thu, 28 Jun 2007, Peter Rabbitson wrote:


Justin Piszcz wrote:


On Thu, 28 Jun 2007, Peter Rabbitson wrote:

Interesting, I came up with the same results (1M chunk being superior) 
with a completely different raid set with XFS on top:


...

Could it be attributed to XFS itself?

Peter



Good question, by the way how much cache do the drives have that you are 
testing with?




I believe 8MB, but I am not sure I am looking at the right number:

[EMAIL PROTECTED]:~# hdparm -i /dev/sda

/dev/sda:

Model=aMtxro7 2Y050M  , FwRev=AY5RH10W, 
SerialNo=6YB6Z7E4

Config={ Fixed }
RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=4
BuffType=DualPortCache, BuffSize=7936kB, MaxMultSect=16, MultSect=?0?
CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=268435455
IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
PIO modes:  pio0 pio1 pio2 pio3 pio4
DMA modes:  mdma0 mdma1 mdma2
UDMA modes: udma0 udma1 udma2 udma3 udma4 udma5
AdvancedPM=yes: disabled (255) WriteCache=enabled
Drive conforms to: ATA/ATAPI-7 T13 1532D revision 0:  ATA/ATAPI-1 
ATA/ATAPI-2 ATA/ATAPI-3 ATA/ATAPI-4 ATA/ATAPI-5 ATA/ATAPI-6 ATA/ATAPI-7


* signifies the current active mode

[EMAIL PROTECTED]:~#

1M chunk consistently delivered best performance with:

o A plain dumb dd run
o bonnie
o two bonnie threads
o iozone with 4 threads

My RA is set at 256 for the drives and 16384 for the array (128k and 8M 
respectively)




Have you also tried tuning:

1. nr_requests for each disk? I noticed a 10-20 second speedup (overall) 
in the bonnie tests when I set nr_requests to 512 for all disks in the array.

  echo 512 > /sys/block/$i/queue/nr_requests

2. Also disable NCQ.
  echo 1 > /sys/block/$i/device/queue_depth
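
Spelled out, the loop implied by $i above might look like this - a sketch, 
with the sd[c-l] member list assumed from the 10-disk array earlier in the 
thread:

  for i in sdc sdd sde sdf sdg sdh sdi sdj sdk sdl; do
      echo 512 > /sys/block/$i/queue/nr_requests    # deeper request queue
      echo 1   > /sys/block/$i/device/queue_depth   # effectively disables NCQ
  done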



Re: Fastest Chunk Size w/XFS For MD Software RAID = 1024k

2007-06-28 Thread Justin Piszcz



On Thu, 28 Jun 2007, Justin Piszcz wrote:




On Thu, 28 Jun 2007, Peter Rabbitson wrote:


Justin Piszcz wrote:


On Thu, 28 Jun 2007, Peter Rabbitson wrote:

Interesting, I came up with the same results (1M chunk being superior) 
with a completely different raid set with XFS on top:


...

Could it be attributed to XFS itself?

Peter



Good question, by the way how much cache do the drives have that you are 
testing with?




I believe 8MB, but I am not sure I am looking at the right number:

[EMAIL PROTECTED]:~# hdparm -i /dev/sda

/dev/sda:

Model=aMtxro7 2Y050M  , FwRev=AY5RH10W, 
SerialNo=6YB6Z7E4

Config={ Fixed }
RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=4
BuffType=DualPortCache, BuffSize=7936kB, MaxMultSect=16, MultSect=?0?
CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=268435455
IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
PIO modes:  pio0 pio1 pio2 pio3 pio4
DMA modes:  mdma0 mdma1 mdma2
UDMA modes: udma0 udma1 udma2 udma3 udma4 udma5
AdvancedPM=yes: disabled (255) WriteCache=enabled
Drive conforms to: ATA/ATAPI-7 T13 1532D revision 0:  ATA/ATAPI-1 
ATA/ATAPI-2 ATA/ATAPI-3 ATA/ATAPI-4 ATA/ATAPI-5 ATA/ATAPI-6 ATA/ATAPI-7


* signifies the current active mode

[EMAIL PROTECTED]:~#

1M chunk consistently delivered best performance with:

o A plain dumb dd run
o bonnie
o two bonnie threads
o iozone with 4 threads

My RA is set at 256 for the drives and 16384 for the array (128k and 8M 
respectively)




Have you also tried tuning:

1. nr_requests for each disk? I noticed a 10-20 second speedup (overall) 
in the bonnie tests when I set nr_requests to 512 for all disks in the array.

 echo 512 > /sys/block/$i/queue/nr_requests

2. Also disable NCQ.
 echo 1 > /sys/block/$i/device/queue_depth




Also per XFS:

noatime,logbufs=8

I am testing various options; so far the logbufs=8 option is detrimental, 
making the entire bonnie++ run a little slower.  I believe the default is 
2 and it uses 32k(?) buffers (shown below) if the blocksize is less than 
16K.  I am trying noatime,logbufs=8,logbsize=262144 currently.


   logbufs=value
  Set  the  number  of in-memory log buffers.  Valid numbers range
  from 2-8 inclusive.  The default value is 8 buffers for filesys-
  tems  with  a blocksize of 64K, 4 buffers for filesystems with a
  blocksize of 32K, 3 buffers for filesystems with a blocksize  of
  16K, and 2 buffers for all other configurations.  Increasing the
  number of buffers may increase performance on some workloads  at
  the  cost  of the memory used for the additional log buffers and
  their associated control structures.
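
A minimal sketch of a mount line using those options (the device and mount 
point are placeholders; 262144 bytes is the 256k log buffer size discussed 
above):

  mount -t xfs -o noatime,logbufs=8,logbsize=262144 /dev/md3 /mnt/test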



Re: Fastest Chunk Size w/XFS For MD Software RAID = 1024k

2007-06-28 Thread Matti Aarnio
On Thu, Jun 28, 2007 at 10:24:54AM +0200, Peter Rabbitson wrote:
 Interesting, I came up with the same results (1M chunk being superior) 
 with a completely different raid set with XFS on top:
 
 mdadm --create \
   --level=10 \
   --chunk=1024 \
   --raid-devices=4 \
   --layout=f3 \
   ...
 
 Could it be attributed to XFS itself?

Sort of..

 /dev/md4:
         Version : 00.90.03
      Raid Level : raid5
    Raid Devices : 4
   Total Devices : 4
 Preferred Minor : 4

  Active Devices : 4
 Working Devices : 4

          Layout : left-symmetric
      Chunk Size : 256K

This means there are 3x 256k of user data per stripe.
Now I had to carefully tune the XFS bsize/sunit/swidth to match that:

 meta-data=/dev/DataDisk/lvol0   isize=256    agcount=32, agsize=7325824 blks
          =                      sectsz=512   attr=1
 data     =                      bsize=4096   blocks=234426368, imaxpct=25
          =                      sunit=64     swidth=192 blks, unwritten=1
 ...

That is, 4k * 64 = 256k, and 64 * 3 = 192.
With that, bulk writing on the file system runs without needing to
read back blocks of disk space to calculate RAID5 parity data, which is
what happens when the filesystem's idea of a block does not align with
the RAID5 stripe.

I do have LVM in between the MD-RAID5 and XFS, so I did also align
the LVM to that  3 * 256k.

Doing this alignment boosted write performance by nearly a factor of 2
compared with mkfs.xfs default parameters.
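
For the array above (4-drive RAID5, 256k chunk, so 3 data disks) the same 
alignment can be requested at mkfs time with the su/sw shorthand - a sketch, 
reusing the device name from the xfs_info output:

  mkfs.xfs -d su=256k,sw=3 /dev/DataDisk/lvol0

Here su is the stripe unit (the MD chunk size) and sw the number of 
data-bearing disks; mkfs.xfs turns them into the sunit=64/swidth=192 values 
shown above.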


With a very wide RAID5, like the one in the original question, I would
find it very surprising if the alignment of the upper layers to the
MD-RAID layer were not important there as well.

Very small sequential writes do not make good use of the disk mechanics
(seek time, rotational delay), so something on the order of 128k-1024k will
speed things up -- presuming that when you are writing, you are doing
it many MB at a time.  Database transactions are a lot smaller, and
are indeed harmed by such large, megachunk-IO-oriented layouts.

RAID levels 0 and 1 (and 10) do not need to read back parts of the
stripe when an incoming write alters only a subset of it.

Some DB application on top of the filesystem would benefit if we had
a way for it to ask about these alignment boundaries, so it could
read the whole alignment block even though it writes out only a subset of it.
(Theory being that those same blocks would also exist in memory cache
and thus be available for write-back parity calculation.)


 Peter

/Matti Aarnio


Re: Fastest Chunk Size w/XFS For MD Software RAID = 1024k

2007-06-28 Thread Justin Piszcz



On Thu, 28 Jun 2007, Matti Aarnio wrote:


On Thu, Jun 28, 2007 at 10:24:54AM +0200, Peter Rabbitson wrote:

Interesting, I came up with the same results (1M chunk being superior)
with a completely different raid set with XFS on top:

mdadm   --create \
--level=10 \
--chunk=1024 \
--raid-devices=4 \
--layout=f3 \
...

Could it be attributed to XFS itself?


If anyone is interested, I also tested a 2048k chunk; 1024k definitely gives
the optimal configuration.

p34-128k-chunk,15696M,77236.3,99,445653,86.,192267,34.,78773.7,99,524463,41,594.9,0,16:10:16/64,1298.67,10.6667,5964.33,17.,3035.67,18.,1512,13.6667,5334.33,16,2634.67,19
p34-512k-chunk,15696M,78383,99,436842,86,162969,27,79624,99,486892,38,583.0,0,16:10:16/64,2019,17,9715,29,4272,23,2250,22,17095,45,3691,30
p34-1024k-chunk,15696M,77672.3,99,455267,87.,183772,29.6667,79601.3,99,578225,43.,595.933,0,16:10:16/64,2085.67,18,12953,39,3908.33,23.,2375.33,23.,18492,51.6667,3388.33,27
p34-2048k-chunk,15696M,76822,98,435439,86,164140,26.,77065.3,99,582948,44,631.467,0,16:10:16/64,1795.33,15,17612.3,49.,3668.67,20.6667,2040.67,19,13384,38,3255.33,25
p34-4096k-chunk,15696M,33791.1,43.5556,176630,37.,72235.1,11.5556,34424.9,44,247925,18.,271.644,0,16:10:16/64,560,4.9,2928,8.9,1039.56,5.8,571.556,5.3,1729.78,5.3,1289.33,9.3

http://home.comcast.net/~jpiszcz/chunk/

Justin.



Re: raid=noautodetect is apparently ignored?

2007-06-28 Thread Ian Dall
On Wed, 2007-06-27 at 08:48 -0700, Andrew Burgess wrote:
  Odd
   Maybe you have an initrd which is loading md as a module, then
   running raidautorun or similar?
 ..
 I suspect that the last comment is the clue, after pivotroot I bet it 
 runs another init, not from the boot/initrd images, but from the init.d 
 in the root filesystem.

You are absolutely correct. On Fedora Core 5, in rc.sysinit:

echo raidautorun /dev/md0 | nash --quiet
if [ -f /etc/mdadm.conf ]; then
/sbin/mdadm -A -s
fi


But my original observation was correct. The noautodetect option was/is being
ignored whenever there is an initrd. FC5 doesn't support raid
root partitions (mkinitrd doesn't put the right stuff in the initrd), but
FC7 tries to. I have upgraded and things are mostly correct, although FC7
doesn't support my nested raid configuration, so it took some coaxing
to get the upgrade done and a hack to coax mkinitrd into doing the
right thing.

Putting mdadm.conf on a floppy disk plus a little intervention with a
virtual console early in the upgrade process worked wonders.

 One quick way to test this is to boot with init=/bin/sh
 This lets all the initrd stuff run but nothing from the
 root filesystem.

Neat idea. I'll try and remember that for the future.

-- 
Ian Dall [EMAIL PROTECTED]


Re: mdadm usage: creating arrays with helpful names?

2007-06-28 Thread Richard Michael
On Thu, Jun 28, 2007 at 09:12:56AM +0100, David Greaves wrote:
 (back on list for google's benefit ;) and because there are some good 
 questions and I don't know all the answers... )

Thanks, I didn't realize I didn't 'reply-all' to stay on the list.

 Hopefully it will snowball as people who use it then contribute back
 hint ;)

I will, I'm also keeping notes and changes to the man page. :)

 --auto md

Ah. Thanks for the example(s).

 Also, when I use --create /dev/nicename --auto=p1 (for example), I
 also see /dev/md_d126 created.  Why?  There is then a /sys/block/md_d126
 entry (presumably created by the md driver), but no /sys/block/nicename
 entry.  Why?
 Not sure who creates this, mdadm or udev

I'm guessing the kernel's md driver creates it; neither mdadm nor udev
(just as the kernel creates, for example, sd* disk entries in /sys, but
udev creates the nice entries in /dev).

 The code isn't that hard to read and you sound like you'd follow it if
 you fancied a skim-read...

I read it for the --create option to see who created /dev/mdXX. :) I'll
take another look.

Thanks David.

Cheers.


Does --write-behind= have to be done at create time?

2007-06-28 Thread Ian Dall
I was wanting to try out the --write-behind option. I have a raid1
with bitmaps and write-mostly enabled, which are all the pre-requisites,
I think.

It would be nice if you could tweak this parameter on a live array, but
failing that, it is hard to see why it couldn't be done at assemble
time. mdadm won't let me, though.

Is this a fundamental limitation?

A related question, if I do recreate the same array, with exactly the
same parameters (except for the write-behind value) will my data still
be OK?


-- 
Ian Dall [EMAIL PROTECTED]


Re: [linux-lvm] 2.6.22-rc4 XFS fails after hibernate/resume

2007-06-28 Thread Pavel Machek
Hi!

 FWIW, I'm on record stating that sync is not sufficient to quiesce an XFS
 filesystem for a suspend/resume to work safely and have argued that the only

Hmm, so XFS writes to disk even when its threads are frozen?

 safe thing to do is freeze the filesystem before suspend and thaw it after
 resume. This is why I originally asked you to test that with the other problem

Could you add that to the XFS threads if it is really required? They
do know that they are being frozen for suspend.

Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Re: [linux-lvm] 2.6.22-rc4 XFS fails after hibernate/resume

2007-06-28 Thread Rafael J. Wysocki
On Wednesday, 27 June 2007 22:49, Pavel Machek wrote:
 Hi!
 
  FWIW, I'm on record stating that sync is not sufficient to quiesce an XFS
  filesystem for a suspend/resume to work safely and have argued that the only
 
 Hmm, so XFS writes to disk even when its threads are frozen?
 
  safe thing to do is freeze the filesystem before suspend and thaw it after
  resume. This is why I originally asked you to test that with the other 
  problem
 
 Could you add that to the XFS threads if it is really required? They
 do know that they are being frozen for suspend.

Well, do you remember the workqueues?  They are still nonfreezable.

Greetings,
Rafael


-- 
Premature optimization is the root of all evil. - Donald Knuth


Re: spare not becoming active

2007-06-28 Thread Simon

Number   Major   Minor   RaidDevice   State
   0       0       0        0         removed
   1       8      34        1         active sync   /dev/sdc2
   2       0       0        2         removed

   3       8      82        -         spare   /dev/sdf2
   4       8      66        -         spare   /dev/sde2
   5       8      50        -         faulty spare
   6       8      18        -         faulty spare


I was trying a couple of things, but never got this status to change.
At one point I stopped the array and restarted it, and it didn't work. (I did it 
before, so I don't see why...)


# /sbin/mdadm -R /dev/md0
mdadm: failed to run array /dev/md0: Invalid argument
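
In case it helps narrow things down, the usual next step is to compare what 
the array and each member think the state is - a sketch, assuming the device 
names from the listing above:

  mdadm --detail /dev/md0
  mdadm --examine /dev/sdc2 /dev/sde2 /dev/sdf2   # per-member superblocks (event counts, states)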

I'm starting to think the documentation I read was very outdated, or only 
scratched the surface of the subject. Can you recommend some good reading, like 
a companion to the man page?


Thanks,
  Simon





XFS mount option performance on Linux Software RAID 5

2007-06-28 Thread Justin Piszcz


Still reviewing, but it appears that logbufs=8 plus logbsize=256k looks good.

p34-noatime-logbufs=2-lbsize=256k,15696M,78172.3,99,450320,86.6667,178683,29,79808,99,565741,42.,610.067,0,16:10:16/64,2362,19.6667,15751.7,46,3993.33,22,2545.67,24.,13976,41,3781.33,28.6667
p34-noatime-logbufs=8-lbsize=256k,15696M,78238,99,455532,86.6667,182382,30,79741.7,99,571631,43,597.633,0,16:10:16/64,3421,29,12130,38.,5943.33,33,3671.33,35.6667,13521.3,41.,5162.33,38.
p34-noatime-logbufs=8-lbsize=default,15696M,77872,98.6667,438661,86.6667,179848,29.,79368,99,555999,42,632.733,0.33,16:10:16/64,2090,17.6667,11183,33,3922.67,23,2271.33,22.,11709,35,3391.33,26.
p34-noatime-only,15696M,77473,99,449689,86.6667,176960,29.,80186.3,99,568503,42.6667,592.633,0,16:10:16/64,2102,18,15935.3,44.6667,3825.67,22.,2353,23.6667,9727.33,29.,3265,25.6667

http://home.comcast.net/~jpiszcz/chunk/logbufs.html



Re: [linux-pm] Re: [linux-lvm] 2.6.22-rc4 XFS fails after hibernate/resume

2007-06-28 Thread Pavel Machek
On Thu 2007-06-28 17:27:34, Rafael J. Wysocki wrote:
 On Wednesday, 27 June 2007 22:49, Pavel Machek wrote:
  Hi!
  
   FWIW, I'm on record stating that sync is not sufficient to quiesce an 
   XFS
   filesystem for a suspend/resume to work safely and have argued that the 
   only
  
  Hmm, so XFS writes to disk even when its threads are frozen?
  
   safe thing to do is freeze the filesystem before suspend and thaw it after
   resume. This is why I originally asked you to test that with the other 
   problem
  
  Could you add that to the XFS threads if it is really required? They
  do know that they are being frozen for suspend.
 
 Well, do you remember the workqueues?  They are still nonfreezable.

Oops, that would explain it :-(. Can we make XFS stop using them?
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Re: Fastest Chunk Size w/XFS For MD Software RAID = 1024k

2007-06-28 Thread David Chinner
On Thu, Jun 28, 2007 at 04:27:15AM -0400, Justin Piszcz wrote:
 
 
 On Thu, 28 Jun 2007, Peter Rabbitson wrote:
 
 Justin Piszcz wrote:
 mdadm --create \
   --verbose /dev/md3 \
   --level=5 \
   --raid-devices=10 \
   --chunk=1024 \
   --force \
   --run
   /dev/sd[cdefghijkl]1
 
 Justin.
 
 Interesting, I came up with the same results (1M chunk being superior) 
 with a completely different raid set with XFS on top:
 
 mdadm --create \
  --level=10 \
  --chunk=1024 \
  --raid-devices=4 \
  --layout=f3 \
  ...
 
 Could it be attributed to XFS itself?

More likely it's related to the I/O size being sent to the disks. The larger
the chunk size, the larger the I/O hitting each disk. I think the maximum I/O
size is 512k ATM on x86(_64), so a chunk of 1MB will guarantee that there are
maximally sized I/Os being sent to the disk.
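
A related per-device limit can be inspected via sysfs - a sketch, with the 
device name as a placeholder:

  cat /sys/block/sda/queue/max_sectors_kb      # current maximum request size, in KB
  cat /sys/block/sda/queue/max_hw_sectors_kb   # hardware ceiling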

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group


Re: [linux-lvm] 2.6.22-rc4 XFS fails after hibernate/resume

2007-06-28 Thread David Chinner
On Wed, Jun 27, 2007 at 08:49:24PM +, Pavel Machek wrote:
 Hi!
 
  FWIW, I'm on record stating that sync is not sufficient to quiesce an XFS
  filesystem for a suspend/resume to work safely and have argued that the only
 
 Hmm, so XFS writes to disk even when its threads are frozen?

They issue async I/O before they sleep and expect
processing to be done on I/O completion via workqueues.

  safe thing to do is freeze the filesystem before suspend and thaw it after
  resume. This is why I originally asked you to test that with the other 
  problem
 
 Could you add that to the XFS threads if it is really required? They
 do know that they are being frozen for suspend.

We don't suspend the threads on a filesystem freeze - they continue to
run. A filesystem freeze guarantees the filesystem is clean and that
the in-memory state matches what is on disk. It is not possible for
the filesystem to issue I/O or have outstanding I/O when it is in the
frozen state, so the state of the threads and/or workqueues does not
matter because they will be idle.

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group


Re: [linux-pm] Re: [linux-lvm] 2.6.22-rc4 XFS fails after hibernate/resume

2007-06-28 Thread David Chinner
On Fri, Jun 29, 2007 at 12:16:44AM +0200, Rafael J. Wysocki wrote:
 There are two solutions possible, IMO.  One would be to make these workqueues
 freezable, which is possible, but hacky and Oleg didn't like that very much.
 The second would be to freeze XFS from within the hibernation code path,
 using freeze_bdev().

The second is much more likely to work reliably. If freezing the
filesystem leaves something in an inconsistent state, then it's
something I can reproduce and debug without needing to
suspend/resume.

FWIW, don't forget you need to thaw the filesystem on resume.

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group