Re: How many drives are bad?

2008-02-21 Thread pg_mh
 On Thu, 21 Feb 2008 13:12:30 -0500, Norman Elton

[ ... ]

normelton Assuming we go with Guy's layout of 8 arrays of 6
normelton drives (picking one from each controller),

Guy Watkins proposed another one too:

   «Assuming the 6 controllers are equal, I would make 3 16 disk
RAID6 arrays using 2 disks from each controller.  That way
any 1 controller can fail and your system will still be
running. 6 disks will be used for redundancy.

Or 6 8 disk RAID6 arrays using 1 disk from each controller).
That way any 2 controllers can fail and your system will
still be running. 12 disks will be used for redundancy.
Might be too excessive!»

So, I would not be overjoyed with either physical configuration,
except in a few particular cases. It is very amusing to read such
worries about host adapter failures, and somewhat depressing to
see «too excessive» used to describe 4+2 parity RAID.

normelton how would you setup the LVM VolGroups over top of
normelton these already distributed arrays?

That looks like a trick question, or at least an incorrect
question; because I would rather not do anything like that
except in a very few cases.

However, if one wants to do a bad thing in the least bad way,
perhaps a volume group per array would be least bad.

Going back to your original question:

  «So... we're curious how Linux will handle such a beast. Has
   anyone run MD software RAID over so many disks? Then piled
   LVM/ext3 on top of that?»

I haven't because it sounds rather inappropriate to me.

  «Any suggestions?»

Not easy to respond without a clear statement of what the array
will be used for: RAID levels and file systems are very anisotropic
in both performance and resilience, so a particular configuration
may be very good for something but not for something else.

For example a 48 drive RAID0 with 'ext2' on top would be very
good for some cases, but perhaps not for archival :-).
In general, I'd use RAID10, RAID5 in
very few cases and RAID6 almost never.

In general current storage practices do not handle large
single-computer storage pools well (just consider 'fsck'
times) and beyond 10TB I reckon that currently only multi-host
parallel/cluster file systems are good enough, for example
Lustre (for smaller multi-TB filesystems I'd use JFS or XFS).

But then Lustre can also be used on a single machine with
multiple (say 2TB) block devices, and this may be the best
choice here too if a single virtual filesystem is the goal:
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at

Re: raid 10 su, sw settings

2007-12-31 Thread pg_mh
 On Sun, 30 Dec 2007 19:00:39 -0500, Brad Langhorst

[ ... VMware virtual disks over RAID ... ]

brad  - 4 disk raid 10
brad  - 64k stripe size

Stripe size or chunk size? Try reducing the chunk size if that
is the chunk size, and applications in the VM do short reads or
writes with intervals. But things seem not to require a lot of

[ ... ]

brad Typical blocks/sec from iostat during large file movements
brad is about 100M/s read and 80M/s write.

That's fine, you are getting more or less the combined speed of
2 drives, which is what standard RAID10 over 4 drives should
give you.

brad  - is the partition aligned correctly? i fear not... [ ... ]
brad  - What should the sunit and swidth settings be during
brad mount? [ ... ]

These really matter for parity RAID, which does read-modify-write
of unaligned sector clusters. But they are rather less essential,
to say the least, for non-parity RAID, where alignment only affects
speed if operations are of the same order of size as the chunk
size or smaller.
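
For anyone who still wants to set them, here is a minimal sketch of the
usual arithmetic. It assumes the filesystem is XFS (whose 'sunit' and
'swidth' mount options are given in 512-byte units), that the 64k figure
is the chunk size, and that a 4 drive RAID10 with 2 copies counts as 2
data-bearing drives. The function names and the start sectors 63 and 128
are made up for illustration.

```python
SECTOR_BYTES = 512  # XFS sunit/swidth mount options are in 512-byte units


def xfs_stripe_units(chunk_kib, data_disks):
    """Return (sunit, swidth) in 512-byte sectors for a given chunk size."""
    sunit = chunk_kib * 1024 // SECTOR_BYTES
    return sunit, sunit * data_disks


def is_chunk_aligned(start_sector, chunk_kib):
    """True if a partition starting at start_sector begins on a chunk boundary."""
    return (start_sector * SECTOR_BYTES) % (chunk_kib * 1024) == 0


# Assumed: 4 drive RAID10 with 2 copies, i.e. 2 drives of distinct data per stripe.
sunit, swidth = xfs_stripe_units(chunk_kib=64, data_disks=2)
print(f"sunit={sunit},swidth={swidth}")

# Hypothetical partition start sectors, checked against a 64KiB chunk.
print(is_chunk_aligned(63, 64))
print(is_chunk_aligned(128, 64))
```

A partition that fails the alignment check has every chunk-sized
operation straddling two chunks, which is the case that hurts parity
RAID most.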

If the applications in your VM do mostly reads, try switching to
RAID10 'f2' (far layout) software RAID, unless they often do
concurrent reads, in which case that's a bad idea.

Re: Raid over 48 disks

2007-12-25 Thread pg_mh
 On Wed, 19 Dec 2007 07:28:20 +1100, Neil Brown

[ ... what to do with 48 drive Sun Thumpers ... ]

neilb I wouldn't create a raid5 or raid6 on all 48 devices.
neilb RAID5 only survives a single device failure and with that
neilb many devices, the chance of a second failure before you
neilb recover becomes appreciable.

That's just one of the many problems, others are:

* If a drive fails, rebuild traffic is going to hit hard, with
  47 blocks read in parallel to compute a new 48th.

* With a parity strip length of 48 it will be that much harder
  to avoid read-modify before write, as it will be avoidable
  only for writes of at least 48 blocks aligned on 48 block
  boundaries. And reading 47 blocks to write one is going to be
  quite painful.
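
To make the rebuild cost concrete, here is a small sketch of single
parity (XOR) reconstruction for one stripe of a 47+1 array. The 16 byte
block size, the synthetic block contents and the function names are
assumptions for illustration only, and this is not how MD is actually
implemented.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_block(stripe, lost_index):
    """Recompute one lost block from all survivors; return (block, blocks_read)."""
    survivors = stripe[:lost_index] + stripe[lost_index + 1:]
    return xor_blocks(survivors), len(survivors)


# One stripe of a 47+1 RAID5: 47 synthetic data blocks plus their XOR parity.
data = [bytes((i * 7 + j) % 256 for j in range(16)) for i in range(47)]
stripe = data + [xor_blocks(data)]

rebuilt, reads = rebuild_block(stripe, lost_index=12)
print(rebuilt == stripe[12], reads)
```

Every stripe on the replacement drive needs the same number of reads, so
the whole array is busy for the duration of the rebuild.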

[ ... ]

neilb RAID10 would be a good option if you are happy with 24
neilb drives worth of space. [ ... ]

That sounds like the only feasible option (except for the 3
drive case in most cases). Parity RAID does not scale much
beyond 3-4 drives.

neilb Alternately, 8 6drive RAID5s or 6 8raid RAID6s, and use
neilb RAID0 to combine them together. This would give you
neilb adequate reliability and performance and still a large
neilb amount of storage space.

That sounds optimistic to me: the reason to do a RAID50 of
8x(5+1) can only be to have a single filesystem, else one could
have 8 distinct filesystems each with a subtree of the whole.
With a single filesystem the failure of any one of the 8 RAID5
components of the RAID0 will cause the loss of the whole lot.

So in the 47+1 case a loss of any two drives would lead to
complete loss; in the 8x(5+1) case only a loss of two drives in
the same RAID5 will.

It does not sound like a great improvement to me (especially
considering the thoroughly inane practice of building arrays out
of disks of the same make and model taken out of the same box).
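
As a rough way to put numbers on the two cases, the sketch below counts
which pairs of failed drives are fatal. It assumes the two failures are
independent and equally likely on any drive, which is exactly what
same-batch disks make doubtful, so take it as a best case. The function
name is made up for illustration.

```python
from math import comb


def fatal_pair_fraction(groups, drives_per_group):
    """Fraction of two-drive failure pairs that fall inside one RAID5 group."""
    total_drives = groups * drives_per_group
    fatal_pairs = groups * comb(drives_per_group, 2)
    return fatal_pairs / comb(total_drives, 2)


print(fatal_pair_fraction(1, 48))  # single 47+1 array
print(fatal_pair_fraction(8, 6))   # RAID0 over 8 x (5+1)
```

Under those assumptions every pair is fatal for the 47+1, and 120 of the
1128 possible pairs (5 in 47) are fatal for the 8x(5+1).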

There are also modest improvements in the RMW strip size and in
the cost of a rebuild after a single drive loss. Probably the
reduction in the RMW strip size is the best improvement.

Anyhow, let's assume 0.5TB drives; with a 47+1 we get a single
23.5TB filesystem, and with 8*(5+1) we get a 20TB filesystem.
With current filesystem technology either size is worrying, for
example as to time needed for an 'fsck'.
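
The capacity arithmetic above, plus the 24 drive RAID10 case for
comparison, can be sketched as below. It assumes 0.5TB drives as stated,
2 copies for RAID10, and ignores filesystem and metadata overhead. The
function names are made up for illustration.

```python
DRIVE_TB = 0.5  # assumed drive size, as in the text


def raid5_sets_usable_tb(sets, drives_per_set, drive_tb=DRIVE_TB):
    """Usable space of `sets` RAID5 arrays, each losing one drive to parity."""
    return sets * (drives_per_set - 1) * drive_tb


def raid10_usable_tb(drives, copies=2, drive_tb=DRIVE_TB):
    """Usable space of a RAID10 keeping `copies` copies of every block."""
    return drives // copies * drive_tb


print(raid5_sets_usable_tb(1, 48))  # 47+1
print(raid5_sets_usable_tb(8, 6))   # 8 x (5+1)
print(raid10_usable_tb(48))         # 24 mirrored pairs
```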

In practice RAID5 beyond 3-4 drives seems only useful for almost
read-only filesystems where restoring from backups is quick and
easy, never mind the 47+1 case or the 8x(5+1) one, and I think
that giving some credit even to the latter arrangement is not
quite right...