Re: suns raid-z / zfs

2008-02-21 Thread Mario 'BitKoenig' Holbe
Keld Jørn Simonsen [EMAIL PROTECTED] wrote:
> On Mon, Feb 18, 2008 at 09:51:15PM +1100, Neil Brown wrote:
>> Recovery after a failed drive would not be an easy operation, and I
>> cannot imagine it being even close to the raw speed of the device.
> I thought this was a problem with most RAID types: while
> reconstructing, performance is quite slow. And as there has been some

There is a difference between "recovery is quite slow" and "raid
device access is quite slow". The former is an issue since it
stretches the window during which you are without redundancy,
while the latter is just inconvenient.
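As an aside, md lets you trade one against the other during a rebuild
via the resync speed limits; the numbers below are purely illustrative:

  # md resync/rebuild throttle, in KiB/s (values are only illustrative)
  # raise the floor to shorten the non-redundant window, at the cost
  # of slower array access while rebuilding:
  echo 50000 > /proc/sys/dev/raid/speed_limit_min
  # or lower the ceiling to keep the array responsive, stretching the
  # rebuild (and the non-redundant window) instead:
  echo 10000 > /proc/sys/dev/raid/speed_limit_max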


regards
   Mario
-- 
I heard, if you play a NT-CD backwards, you get satanic messages...
That's nothing. If you play it forwards, it installs NT.



Re: How many drives are bad?

2008-02-21 Thread Peter Grandi
 On Tue, 19 Feb 2008 14:25:28 -0500, Norman Elton
 [EMAIL PROTECTED] said:

[ ... ]

normelton The box presents 48 drives, split across 6 SATA
normelton controllers. So disks sda-sdh are on one controller,
normelton etc. In our configuration, I run a RAID5 MD array for
normelton each controller, then run LVM on top of these to form
normelton one large VolGroup.

Pure genius! I wonder how many Thumpers have been configured in
this well thought out way :-).

BTW, just to be sure -- you are running LVM in default linear
mode over those 6 RAID5s, aren't you?
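Roughly this, I presume -- a sketch only, where device names and
groupings are assumptions:

  # one RAID5 per 8-disk controller
  mdadm --create /dev/md0 --level=5 --raid-devices=8 /dev/sd[a-h]
  mdadm --create /dev/md1 --level=5 --raid-devices=8 /dev/sd[i-p]
  # ... and likewise md2-md5 for the remaining controllers ...
  # one big volume group, linear (default) allocation across the PVs
  pvcreate /dev/md[0-5]
  vgcreate bigvg /dev/md[0-5]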

normelton I found that it was easiest to set up ext3 with a max
normelton of 2TB partitions. So running on top of the massive
normelton LVM VolGroup are a handful of ext3 partitions, each
normelton mounted in the filesystem.

Uhm, assuming 500GB drives, each RAID set has a capacity of
3.5TB, and odds are that a bit over half of those 2TB volumes
will straddle array boundaries. Such attention to detail is
quite remarkable :-).
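How the pieces actually landed can be checked with something like
this (a sketch; the VG name is an assumption):

  # list each LV's segments and the PV (i.e. the RAID5) backing them;
  # an LV that lists more than one /dev/mdN straddles an array boundary
  lvs --segments -o lv_name,seg_start_pe,seg_size,devices bigvg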

normelton This is less than ideal (ZFS would allow us one large
normelton partition),

That would be another stroke of genius! (especially if you were
still using a set of underlying RAID5s instead of letting ZFS do
its RAIDZ thing). :-)

normelton but we're rewriting some software to utilize the
normelton multi-partition scheme.

Good luck!


Re: LVM performance (was: Re: RAID5 to RAID6 reshape?)

2008-02-21 Thread Peter Grandi
>> This might be related to raid chunk positioning with respect
>> to LVM chunk positioning. If they interfere there indeed may
>> be some performance drop. Best to make sure that those chunks
>> are aligned together.

> Interesting. I'm seeing a 20% performance drop too, with default
> RAID and LVM chunk sizes of 64K and 4M, respectively. Since 64K
> divides 4M evenly, I'd think there shouldn't be such a big
> performance penalty. [ ... ]

Those numbers are not, by themselves, very meaningful. What
matters most is whether the starting physical address of each
logical volume extent is stripe-aligned (and whether the
filesystem makes use of that), and then the stripe size of the
parity RAID set, not the chunk sizes in themselves.

I am often surprised by how many people who use parity RAID
don't seem to realize the crucial importance of physical stripe
alignment, but I am getting used to it.

Because of stripe alignment it is usually better to build parity
arrays on top of partitions or volumes than vice versa, as it is
often more difficult to align the start of a partition or volume
to the underlying stripes than the reverse.
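For instance -- a sketch only, assuming a reasonably recent LVM2 and
e2fsprogs, with made-up device names -- a 5-disk RAID5 with 64KiB
chunks has a 4 x 64KiB = 256KiB stripe, which also divides the
default 4MiB LVM extent evenly:

  # 5-disk RAID5, 64KiB chunk -> 4 data chunks -> 256KiB stripe
  mdadm --create /dev/md0 --level=5 --raid-devices=5 --chunk=64 /dev/sd[b-f]
  # start the LVM data area on a stripe boundary, so every 4MiB extent
  # (and hence every LV) starts on one as well
  pvcreate --dataalignment 256k /dev/md0
  vgcreate vg0 /dev/md0
  lvcreate -L 100G -n lv0 vg0
  # tell the filesystem about the geometry:
  # stride = 64KiB chunk / 4KiB block = 16, stripe_width = 16 * 4 = 64
  mke2fs -j -E stride=16,stripe_width=64 /dev/vg0/lv0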

But then those who understand the vital importance of stripe
aligned writes for parity RAID often avoid using parity RAID
:-).


Re: How many drives are bad?

2008-02-21 Thread Norman Elton

> Pure genius! I wonder how many Thumpers have been configured in
> this well thought out way :-).


I'm sorry I missed your contributions to the discussion a few weeks ago.

As I said up front, this is a test system. We're still trying a number  
of different configurations, and are learning how best to recover from  
a fault. Guy Watkins proposed one a few weeks ago that we haven't yet  
tried, but given our current situation... it may be a good time to  
give it a shot.


I'm still not convinced we were running a degraded array before this.
One drive mysteriously dropped from the array, showing up as "removed"
but not "failed". We did not receive the notification that we did when
the second actually failed. I'm still thinking it's just one drive that
actually failed.
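Presumably something like the following would settle it -- device
names here are placeholders:

  # array-level view: which slots are active, failed, or removed
  mdadm --detail /dev/md0
  # per-disk superblocks: compare the Events counters and State of the
  # dropped disk against the rest of the set
  mdadm --examine /dev/sd[a-h]1 | grep -E '/dev/|Events|State'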


Assuming we go with Guy's layout of 8 arrays of 6 drives (picking one
from each controller), how would you set up the LVM VolGroups over top
of these already distributed arrays?


Thanks again,

Norman





Re: How many drives are bad?

2008-02-21 Thread pg_mh
 On Thu, 21 Feb 2008 13:12:30 -0500, Norman Elton
 [EMAIL PROTECTED] said:

[ ... ]

normelton Assuming we go with Guy's layout of 8 arrays of 6
normelton drives (picking one from each controller),

Guy Watkins proposed another one too:

  «Assuming the 6 controllers are equal, I would make 3 16 disk
   RAID6 arrays using 2 disks from each controller. That way
   any 1 controller can fail and your system will still be
   running. 6 disks will be used for redundancy.

   Or 6 8 disk RAID6 arrays using 1 disk from each controller.
   That way any 2 controllers can fail and your system will
   still be running. 12 disks will be used for redundancy.
   Might be too excessive!»

So, I would not be overjoyed with either physical configuration,
except in a few particular cases. It is very amusing to read such
worries about host adapter failures, and somewhat depressing to
see "too excessive" used to describe 4+2 parity RAID.

normelton how would you set up the LVM VolGroups over top of
normelton these already distributed arrays?

That looks like a trick question, or at least an incorrect
question, because I would rather not do anything like that
except in a very few cases.

However, if one wants to do a bad thing in the least bad way,
perhaps a volume group per array would be least bad.
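Something like this, that is -- a sketch with placeholder names, one
volume group per RAID6 array so that no logical volume can span
arrays:

  for i in 0 1 2 3 4 5 6 7; do
      pvcreate /dev/md$i
      vgcreate vg$i /dev/md$i
  done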

Going back to your original question:

  «So... we're curious how Linux will handle such a beast. Has
   anyone run MD software RAID over so many disks? Then piled
   LVM/ext3 on top of that?»

I haven't because it sounds rather inappropriate to me.

  «Any suggestions?»

Not easy to respond without a clear statement of what the array
will be used for: RAID levels and file systems are very anisotropic
in both performance and resilience, so a particular configuration
may be very good for one purpose but not for another.

For example, a 48-drive RAID0 with 'ext2' on top would be very
good for some cases, but perhaps not for archival :-).
In general, I'd use RAID10 (http://WWW.BAARF.com/), RAID5 in
very few cases and RAID6 almost never.

In general, current storage practices do not handle very large
single-computer storage pools well (just consider 'fsck' times),
and beyond 10TB I reckon that currently only multi-host
parallel/cluster file systems are good enough, for example
Lustre (for smaller multi-TB filesystems I'd use JFS or XFS).

But then Lustre can also be used on a single machine with
multiple (say 2TB) block devices, and this may be the best
choice here too if a single virtual filesystem is the goal:

  http://wiki.Lustre.org/index.php?title=Lustre_Howto


Re: How many drives are bad?

2008-02-21 Thread Peter Rabbitson

Peter Grandi wrote:

> In general, I'd use RAID10 (http://WWW.BAARF.com/), RAID5 in


Interesting movement. What do you think is their stance on Raid Fix? :)


RAID10 far (f2) read throughput on random and sequential / read-ahead

2008-02-21 Thread Nat Makarevitch
'md' performs wonderfully. Thanks to every contributor!

I pitted it against a 3ware 9650 and 'md' won on nearly every count
(although for sequential I/O on RAID5 the 3ware is the clear winner):
http://www.makarevitch.org/rant/raid/#3wmd

On RAID10 f2 a small read-ahead reduces the throughput on sequential read, but
even a low value (768 for the whole 'md' block device, 0 for the underlying
spindles) enables very good sequential read performance (300 MB/s on 6 low-end
Hitachi 500 GB spindles).
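For reference, the settings described above amount to roughly this
(device names are assumptions):

  # far-copies RAID10 over the 6 spindles
  mdadm --create /dev/md0 --level=10 --layout=f2 --raid-devices=6 /dev/sd[b-g]
  # read-ahead is set in 512-byte sectors: 768 on the array, 0 on members
  blockdev --setra 768 /dev/md0
  for d in /dev/sd[b-g]; do blockdev --setra 0 $d; done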

What baffles me is that, on a 1.4TB array served by a box with 12 GB of
RAM (hence a low cache-hit ratio), the random access performance remains
stable and high (450 IOPS with 48 threads, 20% writes, 10% fsync'ed),
even with a fairly high read-ahead (16k). How come?!
