Re: using update-initramfs: how to get new mdadm.conf into the /boot? Or is it XFS?

2008-02-04 Thread Robin Hill
On Mon Feb 04, 2008 at 02:59:44PM -0600, Moshe Yudkowsky wrote:

> I've managed to get myself into a little problem.
>
> Since power hits were taking out the /boot partition, I decided to split 
> /boot out of root. Working from my emergency partition,  I copied all files 
> from /root, re-partitioned what had been /root into room for /boot and 
> /root, and then created the drives. This left me with /dev/md/boot, 
> /dev/md/root, and /dev/md/base (everything else).
>
> I modified mdadm.conf on the emergency partition, used update-initramfs to 
> make certain that the new md drives would be recognized, and rebooted. This 
> worked as expected.
>
> I then mounted the entire new file system on a mount point, copied the 
> mdadm.conf to that point, did a chroot to that point, and did an 
> update-initramfs so that the non-emergency partition would have the updated 
> mdadm.conf. This worked -- but with complaints about missing the file 
> /proc/modules (which is not present under chroot). If I use the -v option I 
> can see the raid456, raid1, etc. modules loading.
>
> I modified menu.lst to make certain that boot=/dev/md/boot, then ran grub 
> (thanks, Robin!) successfully.
>
> Problem: on reboot, I get an error message:
>
> root (hd0,1)  (Moshe comment: as expected)
> Filesystem type is xfs, partition type 0xfd (Moshe comment: as expected)
> kernel /boot/vmliuz-etc.-amd64 root=/dev/md/boot ro
>
> Error 15: File not found
>
"File not found" at that point would suggest it can't find the kernel
file.  The path here should be relative to the root of the partition
/boot is on, so if your /boot is its own partition then you should
either use "kernel /vmlinuz" or (the more usual solution from what
I've seen) make sure there's a symlink:
ln -s . /boot/boot
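
For illustration, a minimal sketch of the sort of menu.lst stanza this
gives you (the kernel version string and root device here are examples,
not taken from your setup):

  root (hd0,1)
  kernel /vmlinuz-2.6.18-6-amd64 root=/dev/md/root ro

i.e. with /boot as its own partition the kernel path loses its /boot
prefix (unless you add the symlink above, which makes both forms work).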

HTH,
Robin
-- 
 ___
( ' } |   Robin Hill<[EMAIL PROTECTED]> |
   / / )  | Little Jim says |
  // !!   |  "He fallen in de water !!" |




Re: using update-initramfs: how to get new mdadm.conf into the /boot? Or is it XFS?

2008-02-04 Thread Robin Hill
On Mon Feb 04, 2008 at 02:59:44PM -0600, Moshe Yudkowsky wrote:

> I've managed to get myself into a little problem.
>
> Since power hits were taking out the /boot partition, I decided to split 
> /boot out of root. Working from my emergency partition,  I copied all files 
> from /root, re-partitioned what had been /root into room for /boot and 
> /root, and then created the drives. This left me with /dev/md/boot, 
> /dev/md/root, and /dev/md/base (everything else).
>
> I modified mdadm.conf on the emergency partition, used update-initramfs to 
> make certain that the new md drives would be recognized, and rebooted. This 
> worked as expected.
>
> I then mounted the entire new file system on a mount point, copied the 
> mdadm.conf to that point, did a chroot to that point, and did an 
> update-initramfs so that the non-emergency partition would have the updated 
> mdadm.conf. This worked -- but with complaints about missing the file 
> /proc/modules (which is not present under chroot). If I use the -v option I 
> can see the raid456, raid1, etc. modules loading.
>
> I modified menu.lst to make certain that boot=/dev/md/boot, then ran grub 
> (thanks, Robin!) successfully.
>
> Problem: on reboot, I get an error message:
>
> root (hd0,1)  (Moshe comment: as expected)
> Filesystem type is xfs, partition type 0xfd (Moshe comment: as expected)
> kernel /boot/vmliuz-etc.-amd64 root=/dev/md/boot ro
------------------^^

Are you sure that's right?  Looks like a typo to me.

Cheers,
Robin
-- 
 ___
( ' } |   Robin Hill<[EMAIL PROTECTED]> |
   / / )  | Little Jim says |
  // !!   |  "He fallen in de water !!" |




Re: raid1 or raid10 for /boot

2008-02-04 Thread Robin Hill
On Mon Feb 04, 2008 at 12:21:40PM +0100, Keld Jørn Simonsen wrote:

> On Mon, Feb 04, 2008 at 09:17:35AM +0000, Robin Hill wrote:
> > On Mon Feb 04, 2008 at 07:34:54AM +0100, Keld Jørn Simonsen wrote:
> > 
> > > I understand that lilo and grub only can boot partitions that look like
> > > a normal single-drive partition. And then I understand that a plain
> > > raid10 has a layout which is equivalent to raid1. Can such a raid10
> > > partition be used with grub or lilo for booting?
> > > And would there be any advantages in this, for example better disk
> > > utilization in the raid10 driver compared with raid1?
> > > 
> > A plain RAID-10 does _not_ have a layout equivalent to RAID-1 and
> > _cannot_ be used for booting (well, possibly a 2-disk RAID-10 could -
> > I'm not sure how that'd be laid out).  RAID-10 uses striping as well as
> > mirroring, and the striping breaks both grub and lilo (and, AFAIK, every
> > other boot manager currently out there).
> 
> Yes, it is understood that raid10,f2 uses striping, but a raid10,near=2,
> far=1 does not use striping, and this is what you get if you just do
> mdadm --create /dev/md0 -l 10 -n 2 /dev/sda1 /dev/sdb1
> 
Well yes, if you do a two-disk RAID-10 then (as I said above) you
probably end up with a RAID-1 (as you do with a two-disk RAID-5).  I
don't see how this would work any differently (or better) than a RAID-1
though (and only serves to confuse things).

If you have more than two disks then RAID-10 will _always_ stripe (no
matter whether you use near, far or offset layout - these affect only
where the mirrored chunks are put) and grub/lilo will fail to work.
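
For what it's worth, the layout can also be given explicitly at create
time rather than relying on the default - a sketch, with illustrative
device names:

  mdadm --create /dev/md0 --level=10 --layout=n2 --raid-devices=2 \
        /dev/sda1 /dev/sdb1    # near layout, 2 copies
  mdadm --create /dev/md0 --level=10 --layout=f2 --raid-devices=4 \
        /dev/sd[abcd]1         # far layout - striped, so not bootable

(--layout takes n2, f2, o2 and so on for the near/far/offset variants.)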

Cheers,
Robin

-- 
 ___
( ' } |   Robin Hill<[EMAIL PROTECTED]> |
   / / )  | Little Jim says |
  // !!   |  "He fallen in de water !!" |




Re: RAID needs more to survive a power hit, different /boot layout for example (was Re: draft howto on making raids for surviving a disk crash)

2008-02-04 Thread Robin Hill
On Mon Feb 04, 2008 at 05:06:09AM -0600, Moshe Yudkowsky wrote:

> Robin, thanks for the explanation. I have a further question.
>
> Robin Hill wrote:
>
>> Once the file system is mounted then hdX,Y maps according to the
>> device.map file (which may actually bear no resemblance to the drive
>> order at boot - I've had issues with this before).  At boot time it maps
>> to the BIOS boot order though, and (in my experience anyway) hd0 will
>> always map to the drive the BIOS is booting from.
>
> At the time that I use grub to write to the MBR, hd2,1 is /dev/sdc1. 
> Therefore, I don't quite understand why this would not work:
>
> grub <<EOF
> root (hd2,1)
> setup (hd2)
> EOF
>
> This would seem to be a command to have the MBR on hd2 written to use the 
> boot on hd2,1. It's valid when written. Are you saying that it's a command 
> for the MBR on /dev/sdc to find the data on (hd2,1), the location of which 
> might change at any time? That's... a  very strange way to write the tool. 
> I thought it would be a command for the MBR on hd2 (sdc) to look at hd2,1 
> (sdc1) to find its data, regardless of the boot order that caused sdc to be 
> the boot disk.
>
This is exactly what it does, yes - the hdX,Y are mapped by GRUB into
BIOS disk interfaces (0x80 being the first, 0x81 the second and so on)
and it writes (to sdc in this case) the instructions to look on the
first partition of BIOS drive 0x82 (whichever drive that ends up being)
for the rest of the bootloader.
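
If what you actually want is for the MBR on sdc to point at whichever
drive the BIOS ends up booting from, the trick (as in my earlier mail)
is to remap sdc to hd0 before running setup - a sketch:

  grub <<EOF
  device (hd0) /dev/sdc
  root (hd0,1)
  setup (hd0)
  EOF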

It is a bit of a strange way to work, but it's really the only way it
_can_ work (and cover all circumstances).  Unfortunately when you start
playing with bootloaders you have to get down to the BIOS level, and
things weren't written to make sense at that level (after all, when
these standards were put in place everyone was booting from a single
floppy disk system).  If EFI becomes more standard then hopefully things
will get simpler, but we're stuck with them as they are for now.

Cheers,
Robin

-- 
 ___
( ' } |   Robin Hill<[EMAIL PROTECTED]> |
   / / )  | Little Jim says |
  // !!   |  "He fallen in de water !!" |




Re: raid1 or raid10 for /boot

2008-02-04 Thread Robin Hill
On Mon Feb 04, 2008 at 07:34:54AM +0100, Keld Jørn Simonsen wrote:

> I understand that lilo and grub only can boot partitions that look like
> a normal single-drive partition. And then I understand that a plain
> raid10 has a layout which is equivalent to raid1. Can such a raid10
> partition be used with grub or lilo for booting?
> And would there be any advantages in this, for example better disk
> utilization in the raid10 driver compared with raid1?
> 
A plain RAID-10 does _not_ have a layout equivalent to RAID-1 and
_cannot_ be used for booting (well, possibly a 2-disk RAID-10 could -
I'm not sure how that'd be laid out).  RAID-10 uses striping as well as
mirroring, and the striping breaks both grub and lilo (and, AFAIK, every
other boot manager currently out there).

Cheers,
Robin
-- 
 ___    
    ( ' } |   Robin Hill<[EMAIL PROTECTED]> |
   / / )  | Little Jim says |
  // !!   |  "He fallen in de water !!" |




Re: RAID needs more to survive a power hit, different /boot layout for example (was Re: draft howto on making raids for surviving a disk crash)

2008-02-03 Thread Robin Hill
On Sun Feb 03, 2008 at 02:46:54PM -0600, Moshe Yudkowsky wrote:

> Robin Hill wrote:
>
>> This is wrong - the disk you boot from will always be hd0 (no matter
>> what the map file says - that's only used after the system's booted).
>> You need to remap the hd0 device for each disk:
>> grub --no-floppy <<EOF
>> root (hd0,1)
>> setup (hd0)
>> device (hd0) /dev/sdb
>> root (hd0,1)
>> setup (hd0)
>> device (hd0) /dev/sdc
>> root (hd0,1)
>> setup (hd0)
>> device (hd0) /dev/sdd
>> root (hd0,1)
>> setup (hd0)
>> EOF
>
> For my enlightenment: if the file system is mounted, then hd2,1 is a 
> sensible grub operation, isn't it? For the record, given my original script 
> when I boot I am able to edit the grub boot options to read
>
> root (hd2,1)
>
> and proceed to boot.
>
Once the file system is mounted then hdX,Y maps according to the
device.map file (which may actually bear no resemblance to the drive
order at boot - I've had issues with this before).  At boot time it maps
to the BIOS boot order though, and (in my experience anyway) hd0 will
always map to the drive the BIOS is booting from.

So initially you may have:
SATA-1: hd0
SATA-2: hd1
SATA-3: hd2

Now, if the SATA-1 drive dies totally you will have:
SATA-1: -
SATA-2: hd0
SATA-3: hd1

or if SATA-2 dies:
SATA-1: hd0
SATA-2: -
SATA-3: hd1

Note that in the case where the drive is still detected but fails to
boot then the behaviour seems to be very BIOS dependent - some will
continue to drive 2 as above, whereas others will just sit and complain.

So to answer the second part of your question, yes - at boot time
currently you can do "root (hd2,1)" or "root (hd3,1)".  If a disk dies,
however (whichever disk it is), then "root (hd3,1)" will fail to work.

Note also that the above is only my experience - if you're depending on
certain behaviour under these circumstances then you really need to test
it out on your hardware by disconnecting drives, substituting
non-bootable drives, etc.

HTH,
Robin
-- 
 ___
( ' } |   Robin Hill<[EMAIL PROTECTED]> |
   / / )  | Little Jim says |
  // !!   |  "He fallen in de water !!" |




Re: RAID needs more to survive a power hit, different /boot layout for example (was Re: draft howto on making raids for surviving a disk crash)

2008-02-03 Thread Robin Hill
On Sun Feb 03, 2008 at 01:15:10PM -0600, Moshe Yudkowsky wrote:

> I've been reading the draft and checking it against my experience. Because 
> of local power fluctuations, I've just accidentally checked my system:  My 
> system does *not* survive a power hit. This has happened twice already 
> today.
>
> I've got /boot and a few other pieces in a 4-disk RAID 1 (three running, 
> one spare). This partition is on /dev/sd[abcd]1.
>
> I've used grub to install grub on all three running disks:
>
> grub --no-floppy <<EOF
> root (hd0,1)
> setup (hd0)
> root (hd1,1)
> setup (hd1)
> root (hd2,1)
> setup (hd2)
> EOF
>
> (To those reading this thread to find out how to recover: According to 
> grub's "map" option, /dev/sda1 maps to hd0,1.)
>
This is wrong - the disk you boot from will always be hd0 (no matter
what the map file says - that's only used after the system's booted).
You need to remap the hd0 device for each disk:

grub --no-floppy <<EOF
root (hd0,1)
setup (hd0)
device (hd0) /dev/sdb
root (hd0,1)
setup (hd0)
device (hd0) /dev/sdc
root (hd0,1)
setup (hd0)
device (hd0) /dev/sdd
root (hd0,1)
setup (hd0)
EOF

> After the power hit, I get:
>
> > Error 16
> > Inconsistent filesystem mounted
>
> I then tried to boot up on hda1,1, hdd2,1 -- none of them worked.
>
> The culprit, in my opinion, is the reiserfs file system. During the power 
> hit, the reiserfs file system of /boot was left in an inconsistent state; 
> this meant I had up to three bad copies of /boot.
>
Could well be - I always use ext2 for the /boot filesystem and don't
have it automounted.  I only mount the partition to install a new
kernel, then unmount it again.
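
A sketch of what that looks like in /etc/fstab (the md device name is
illustrative - use whatever your /boot array actually is):

  /dev/md0   /boot   ext2   noauto   0   0

With noauto nothing touches the filesystem during boot, so a power hit
can't catch it mid-write; you just mount /boot by hand before a kernel
install and umount it again afterwards.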

Cheers,
Robin
-- 
 ___
( ' } |   Robin Hill<[EMAIL PROTECTED]> |
   / / )  | Little Jim says |
  // !!   |  "He fallen in de water !!" |




Re: Error mounting a reiserfs on renamed raid1

2008-01-25 Thread Robin Hill
On Fri Jan 25, 2008 at 01:48:32AM +0100, Clemens Koller wrote:

> Hi there.
>
> I am new to this list, and didn't find this effect or a solution
> to my problem in the archives or with Google:
>
> short story:
> 
> A single raid1 as /dev/md0 containing a reiserfs (with important data)
> assembled during boot works just fine:
> $ cat /proc/mdstat
> Personalities : [linear] [raid0] [raid1]
> md0 : active raid1 hdg1[1] hde1[0]
>   293049600 blocks [2/2] [UU]
>
> The same raid1 moved to another machine as a fourth raid can be
> assembled manually as /dev/md3 (to work around naming conflicts),
> but it cannot be mounted anymore:
> $ mdadm --assemble /dev/md3 --update=super-minor -m0 /dev/hde /dev/hdg
> does not complain. /dev/md3 is created. But
>
It looks like you should be assembling the partitions, not the disks.
Certainly the mdstat entry above shows the array being formed from the
partitions (hde1 and hdg1).  Try:
  mdadm --assemble /dev/md3 --update=super-minor -m0 /dev/hde1 /dev/hdg1

> $ mount /dev/md3 /raidmd3 gives:
>
> Jan 24 20:24:10 rio kernel: md: md3 stopped.
Jan 24 20:24:10 rio kernel: md: bind<hde>
Jan 24 20:24:10 rio kernel: md: bind<hdg>
> Jan 24 20:24:10 rio kernel: raid1: raid set md3 active with 2 out of 2 
> mirrors
> Jan 24 20:24:12 rio kernel: ReiserFS: md3: warning: sh-2021: 
> reiserfs_fill_super: can not find reiserfs on md3
>
> Adding  -t reiserfs doesn't work either.
>
Presumably the superblock for the file system cannot be found because
it's now offset due to the above issue.
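
A quick way to confirm is to check where mdadm actually finds a RAID
superblock - on the whole disk or on the partition:

  mdadm --examine /dev/hde
  mdadm --examine /dev/hde1

Whichever of the two reports a valid superblock is what the array was
originally built from, and that's what you should assemble.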

HTH,
Robin
-- 
 ___
( ' } |   Robin Hill<[EMAIL PROTECTED]> |
   / / )  | Little Jim says |
  // !!   |  "He fallen in de water !!" |




Re: mdadm error when trying to replace a failed drive in RAID5 array

2008-01-20 Thread Robin Hill
On Sat Jan 19, 2008 at 11:08:43PM -, Steve Fairbairn wrote:

> 
> Hi All,
> 
> I have a Software RAID 5 device configured, but one of the drives
> failed. I removed the drive with the following command...
> 
> mdadm /dev/md0 --remove /dev/hdc1
> 
> Now, when I try to insert the replacement drive back in, I get the
> following...
> 
> [EMAIL PROTECTED] ~]# mdadm /dev/md0 --add /dev/hdc1
> mdadm: add new device failed for /dev/hdc1 as 5: Invalid argument
> 
> [EMAIL PROTECTED] mdadm-2.6.4]# dmesg | tail
> ...
> md: hdc1 has invalid sb, not importing!
> md: md_import_device returned -22
> md: hdc1 has invalid sb, not importing!
> md: md_import_device returned -22
> 
I've had the same error message trying to add a drive into an array
myself - in my case I'm almost certain it's because the drive is
slightly smaller than the others in the array (the array's currently
growing so I haven't delved any further yet).  Have you checked the
actual partition sizes?  Particularly if it's a different type of drive,
as sizes from different manufacturers can vary by quite a large
amount.
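
A sketch for comparing exact sizes (device names illustrative - compare
the replacement against a surviving member):

  cat /proc/partitions             # block count for every partition
  blockdev --getsize64 /dev/hdc1   # exact size in bytes

If the new partition comes out even slightly smaller than the existing
members, the add can fail like this.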

Cheers,
Robin
-- 
 ___
( ' } |   Robin Hill<[EMAIL PROTECTED]> |
   / / )  | Little Jim says |
  // !!   |  "He fallen in de water !!" |




Re: Linux RAID Partition Offset 63 cylinders / 30% performance hit?

2007-12-19 Thread Robin Hill
On Wed Dec 19, 2007 at 09:50:16AM -0500, Justin Piszcz wrote:

> The (up to) 30% percent figure is mentioned here:
> http://insights.oetiker.ch/linux/raidoptimization.html
>
That looks to be referring to partitioning a RAID device - this'll only
apply to hardware RAID or partitionable software RAID, not to the normal
use case.  When you're creating an array out of standard partitions then
you know the array stripe size will align with the disks (there's no way
it cannot), and you can set the filesystem stripe size to align as well
(XFS will do this automatically).
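
As a sketch (the geometry is made up - a 4-drive RAID-5 with 64k
chunks, i.e. 3 data disks per stripe):

  mkfs.xfs /dev/md0                  # picks the su/sw up from md itself
  mkfs.xfs -d su=64k,sw=3 /dev/md0   # or spell it out by hand

Both should end up with the same alignment when the array is built
directly from partitions.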

I've actually done tests on this with hardware RAID to try to find the
correct partition offset, but wasn't able to see any difference (using
bonnie++ and moving the partition start by one sector at a time).

> # fdisk -l /dev/sdc
>
> Disk /dev/sdc: 150.0 GB, 150039945216 bytes
> 255 heads, 63 sectors/track, 18241 cylinders
> Units = cylinders of 16065 * 512 = 8225280 bytes
> Disk identifier: 0x5667c24a
>
>Device Boot  Start End  Blocks   Id  System
> /dev/sdc1   1   18241   146520801   fd  Linux raid 
> autodetect
>
This looks to be a normal disk - the partition offsets shouldn't be
relevant here (barring any knowledge of the actual physical disk layout
anyway, and block remapping may well make that rather irrelevant).

That's my take on this one anyway.

Cheers,
Robin
-- 
 ___
( ' } |   Robin Hill<[EMAIL PROTECTED]> |
   / / )  | Little Jim says |
  // !!   |  "He fallen in de water !!" |




Re: Raid over 48 disks

2007-12-18 Thread Robin Hill
On Tue Dec 18, 2007 at 12:29:27PM -0500, Norman Elton wrote:

> We're investigating the possibility of running Linux (RHEL) on top of Sun's 
> X4500 Thumper box:
>
> http://www.sun.com/servers/x64/x4500/
>
> Basically, it's a server with 48 SATA hard drives. No hardware RAID. It's 
> designed for Sun's ZFS filesystem.
>
> So... we're curious how Linux will handle such a beast. Has anyone run MD 
> software RAID over so many disks? Then piled LVM/ext3 on top of that? Any 
> suggestions?
>
> Are we crazy to think this is even possible?
>
The most I've done is 28 drives in RAID-10 (SCSI drives, with the array
formatted as XFS).  That keeps failing one drive, but I've not had time
to give the drive a full test yet to confirm it's a drive issue.  It's
been running quite happily (under pretty heavy database load) on 27
disks for a couple of months now though.

Cheers,
Robin
-- 
 ___
( ' } |   Robin Hill<[EMAIL PROTECTED]> |
   / / )  | Little Jim says |
  // !!   |  "He fallen in de water !!" |




Re: Fwd: issues rebuilding raid array.

2007-10-22 Thread Robin Hill
On Mon Oct 22, 2007 at 09:46:08PM +1000, Sam Redfern wrote:

> Greetings happy mdadm users.
> 
> I have a little problem that after many hours of searching around I
> couldn't seem to solve.
> 
> I have upgraded my motherboard and kernel (bad practice I know, but the
> ICH9R controller needs 2.6.2*+) at the same time.
> 
> The array was built using 2.6.18-7; now I'm using 2.6.21-2.
> 
> I'm trying to recreate the raid array with the following command and
> this is the error I get:
> 
> mca4:~# mdadm -Av /dev/md1 /dev/sdb /dev/sdc /dev/sdd /dev/sde
> /dev/sdf /dev/sdg
> mdadm: looking for devices for /dev/md1
> mdadm: no RAID superblock on /dev/sdc
> mdadm: /dev/sdc has no superblock - assembly aborted
> 
You're trying to assemble the array from 6 disks here and one looks to
be dodgy.  That's okay so far.

> So I figure, oh look the disk sdc has gone cactus, I'll just remove it
> from the list. One of the advantages of mdadm.
> 
> mca4:~# mdadm -Av /dev/md1 /dev/sdb /dev/sdd /dev/sde /dev/sdf /dev/sdg
> mdadm: looking for devices for /dev/md1
> mdadm: /dev/sdb is identified as a member of /dev/md1, slot -1.
> mdadm: /dev/sdd is identified as a member of /dev/md1, slot 0.
> mdadm: /dev/sde is identified as a member of /dev/md1, slot 1.
> mdadm: /dev/sdf is identified as a member of /dev/md1, slot 5.
> mdadm: /dev/sdg is identified as a member of /dev/md1, slot 4.
> mdadm: added /dev/sde to /dev/md1 as 1
> mdadm: no uptodate device for slot 2 of /dev/md1
> mdadm: no uptodate device for slot 3 of /dev/md1
> mdadm: added /dev/sdg to /dev/md1 as 4
> mdadm: added /dev/sdf to /dev/md1 as 5
> mdadm: failed to add /dev/sdb to /dev/md1: Invalid argument
> mdadm: added /dev/sdd to /dev/md1 as 0
> mdadm: /dev/md1 assembled from 4 drives - not enough to start the array.
> 
Now you're trying to assemble with 5 disks and getting 4 out of 6 in the
array, and one at slot -1 (i.e. a spare).

> I found this really difficult to understand, considering that I can
> get the output of mdadm -E /dev/sdb (other disks included to overload
> you with information)
> 
> mdadm -E /dev/sd[b-h]
> 
> /dev/sdb:
>   Magic : a92b4efc
> Version : 00.90.00
>UUID : 4e3b82e1:f5604e19:a9c9775f:49745adf
>   Creation Time : Fri Oct  5 09:18:25 2007
>  Raid Level : raid5
> Device Size : 312571136 (298.09 GiB 320.07 GB)
>  Array Size : 1562855680 (1490.46 GiB 1600.36 GB)
>Raid Devices : 6
>   Total Devices : 6
> Preferred Minor : 1
> 
> Update Time : Tue Oct 16 20:03:13 2007
>   State : clean
>  Active Devices : 6
> Working Devices : 6
>  Failed Devices : 0
>   Spare Devices : 0
>Checksum : 80d47486 - correct
>  Events : 0.623738
> 
>  Layout : left-symmetric
>  Chunk Size : 64K
> 
>   Number   Major   Minor   RaidDevice State
> this     6       8       16       -1      spare   /dev/sdb
> 
>    0     0       8       80        0      active sync   /dev/sdf
>    1     1       8      128        1      active sync   /dev/.static/dev/sdi
>    2     2       8      144        2      active sync   /dev/.static/dev/sdj
>    3     3       8       16        3      active sync   /dev/sdb
>    4     4       8       64        4      active sync   /dev/sde
>    5     5       8       96        5      active sync   /dev/sdg
>
And here we see that the array has 6 active devices and a spare.  You
currently have 4 working active devices, a failed active device and the
spare.  What's happened to the other device?  You can't get the array
working with 4 out of 6 devices so you'll need to either find the other
active device (and rebuild onto the spare) or get the failed disk
working again.
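
A sketch for seeing what state each member thinks the array is in
(event counts are the key field):

  mdadm -E /dev/sd[b-g] | egrep 'Event|State|this'

Members whose event counts lag behind the rest are the ones that have
dropped out; if the missing disk turns up with a close event count then
assembling with --force may be enough to get the array running.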

HTH,
Robin
-- 
 ___
( ' } |   Robin Hill<[EMAIL PROTECTED]> |
   / / )  | Little Jim says |
  // !!   |  "He fallen in de water !!" |




Re: Partitions with == or \approx same size ?

2007-07-20 Thread Robin Hill
On Fri Jul 20, 2007 at 07:54:54PM +0200, Seb wrote:

> But the number of blocks cannot be specified when creating a partition,
> only the number of cylinders.
> 
If you hit "u" in fdisk then you can create partitions by sector rather
than by cylinder.
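
A sketch (device name illustrative):

  fdisk -u /dev/sdb   # start in sector units directly, or press 'u'
                      # inside an existing fdisk session to toggle

With units set to sectors you can give every member partition exactly
the same size in blocks.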

HTH,
Robin
-- 
 ___
    ( ' } |   Robin Hill<[EMAIL PROTECTED]> |
   / / )  | Little Jim says |
  // !!   |  "He fallen in de water !!" |




Re: Software RAID5 Horrible Write Speed On 3ware Controller!!

2007-07-18 Thread Robin Hill
On Wed Jul 18, 2007 at 01:26:11PM +0200, Hannes Dorbath wrote:

> I think what you might be experiencing is that XFS can read su,sw values 
> from the MD device and will automatically optimize itself, while it 
> can't do that for the HW RAID device. It is absolutely essential to 
> align your file system, to prevent implicit reads, needed for parity 
> calculations.
> 
> Set su to the stripe size you have configured in your controller (like 
> 128K) and sw to 9 (for a 10 disk RAID 5 array).
> 
Just to pick up on this one (as I'm about to reformat my array as XFS) -
does this actually work with a hardware controller?  Is there any
assurance that the XFS stripes align with the hardware RAID stripes?  Or
could you just end up offsetting everything so that every 128k chunk on
the XFS side of things fits half-and-half into two hardware raid chunks
(especially if the array has been partitioned)?  In which case would
it be better (performance-wise) to provide the su,sw values or not?
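
For reference, the suggestion above would come out as something like
this (device name illustrative, su/sw taken from Hannes's figures):

  mkfs.xfs -d su=128k,sw=9 /dev/sda1

with the open question being whether the start of sda1 is itself
aligned to a 128k boundary on the hardware array.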

I'm planning on doing some benchmarking, but thought I'd check whether
there are any definitive answers first.

Cheers,
    Robin
-- 
 ___
( ' } |   Robin Hill<[EMAIL PROTECTED]> |
   / / )  | Little Jim says |
  // !!   |  "He fallen in de water !!" |




Re: Software based SATA RAID-5 expandable arrays?

2007-06-19 Thread Robin Hill
On Tue Jun 19, 2007 at 06:43:27AM -0700, Michael wrote:

> I have yet to get info on what build and/or distro I should use.  What
> commands I need in Linux to build an array.  What commands I need in
> Linux to expand the array, and what type of RAID-5 setup linux option
> I should use to   What SATA adapter cards I should use.  Or any other
> ideas and suggestions.
> 
You didn't actually _ask_ any of these questions in your original mail,
which may be why you've not had any answers yet!  Anyway - if you're
planning on going with a dedicated server build then you may be best
looking at one of the distros specifically designed for storage servers
(e.g. NASLite - http://www.serverelements.com/).  Alternatively, as a
linux newbie, one of the more user-friendly distributions like Ubuntu
would probably be a good option.

All the linux RAID configuration is done through the mdadm command - the
manual page should give you a pretty good idea of what can be done and
how.  You're best coming back for more details when you know what disks,
controller, distribution, etc. you're going with as that'll influence
the exact command line to use.
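
As a sketch of the two operations you asked about (device names are
illustrative - the real ones depend on your controller and distro):

  # create a 3-disk RAID-5
  mdadm --create /dev/md0 --level=5 --raid-devices=3 /dev/sd[bcd]1

  # later: add a fourth disk and grow the array onto it
  mdadm --add /dev/md0 /dev/sde1
  mdadm --grow /dev/md0 --raid-devices=4

followed by growing whatever filesystem sits on top (resize2fs,
xfs_growfs, etc.).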

I can't really offer much advice in the way of SATA cards as I've only
used the onboard stuff so far.  You're best looking at somewhere like
www.linux-drivers.org to check for compatibility, or ask here about
specific cards - someone probably has experience, good or bad, they can
share.

HTH,
Robin
-- 
 ___
( ' } |   Robin Hill<[EMAIL PROTECTED]> |
   / / )  | Little Jim says |
  // !!   |  "He fallen in de water !!" |




Re: Questions about the speed when MD-RAID array is being initialized.

2007-05-10 Thread Robin Hill
On Thu May 10, 2007 at 05:33:17PM -0400, Justin Piszcz wrote:

> 
> 
> On Thu, 10 May 2007, Liang Yang wrote:
> 
> >Hi,
> >
> >I created a MD-RAID5 array using 8 Maxtor SAS Disk Drives (chunk size is 
> >256k). I have measured the data transfer speed for single SAS disk drive 
> >(physical drive, not filesystem on it), it is roughly about 80~90MB/s.
> >
> >However, I notice MD also reports the speed for the RAID5 array when it is 
> >being initialized (cat /proc/mdstat). The speed reported by MD is not 
> >constant, ranging roughly from 70MB/s to 90MB/s (average is 85MB/s, which 
> >is very close to the single disk data transfer speed).
> >
> >I just have three questions:
> >1. What is the exact meaning of the array speed reported by MD? Is that 
> >measured for the whole array (I used 8 disks) or for just a single underlying 
> >disk? If it is for the whole array, then 70~90MB/s seems too low 
> >considering 8 disks are used for this array.
> >
> >2. How is this speed measured and what is the I/O packet size being used 
> >when the speed is measured?
> >
> >3. From the beginning when MD-RAID 5 array is initialized to the end when 
> >the initialization is done, the speed reported by MD gradually decreases from 
> >90MB/s down to 70MB/s. Why does the speed change? Why does the speed 
> >gradually decrease?
> >
> >Could anyone give me some explanation?
> >
> >I'm using RHEL 4U4 with 2.6.18 kernel. MDADM version is 1.6.
> >
> >Thanks a lot,
> >
> >Liang
> >
> 
> For no 3. because it starts from the fast end of the disk and works its 
> way to the slower part (slower speeds).
> 
And I'd assume for no 1 it's because it's only writing to a single disk
at this point, so will obviously be limited to the transfer rate of a
single disk.  RAID5 arrays are created as a degraded array, then the
final disk is "recovered" - this is done so that the array is ready for
use very quickly.  So what you're seeing in /proc/mdstat is the speed in
calculating and writing the data for the final drive (and is, unless
computationally limited, going to be the write speed of the single
drive).
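
A sketch of what this looks like in practice (device names and chunk
size illustrative):

  mdadm --create /dev/md0 --level=5 --chunk=256 --raid-devices=8 /dev/sd[b-i]
  cat /proc/mdstat   # shows 7 active members plus 1 "recovering", at
                     # roughly the sequential write speed of one drive

Once that recovery of the last member finishes, the array is fully
redundant.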

HTH,
Robin

-- 
 ___
( ' } |   Robin Hill<[EMAIL PROTECTED]> |
   / / )  | Little Jim says |
  // !!   |  "He fallen in de water !!" |

