Re: suns raid-z / zfs
Keld Jørn Simonsen [EMAIL PROTECTED] wrote:
> On Mon, Feb 18, 2008 at 09:51:15PM +1100, Neil Brown wrote:
> > Recovery after a failed drive would not be an easy operation, and I
> > cannot imagine it being even close to the raw speed of the device.
>
> I thought this was a problem with most raid types: while reconstructing,
> performance is quite slow. And as there has been some [ ... ]

There is a difference between "recovery is quite slow" and "raid device
access is quite slow". The former is an issue since it stretches the time
during which you're in non-redundant danger, while the latter is just
inconvenient.

regards
Mario
--
I heard, if you play a NT-CD backwards, you get satanic messages...
That's nothing. If you play it forwards, it installs NT.
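[The trade-off Mario describes is tunable on md: rebuild speed can be
raised to shorten the non-redundant window, at the cost of foreground
I/O. A minimal sketch of the relevant knobs; the device name and values
are illustrative, not taken from this thread:]

  # Resync/rebuild throttling for md arrays (values in KB/s per device).
  # Raising the minimum shortens the time spent without redundancy;
  # lowering the maximum favours foreground I/O instead.
  cat /proc/sys/dev/raid/speed_limit_min
  cat /proc/sys/dev/raid/speed_limit_max
  echo 50000 > /proc/sys/dev/raid/speed_limit_min

  # Progress and current rebuild speed of a given array:
  cat /proc/mdstat
  cat /sys/block/md0/md/sync_speed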
Re: How many drives are bad?
On Tue, 19 Feb 2008 14:25:28 -0500, Norman Elton [EMAIL PROTECTED] said:

[ ... ]

normelton> The box presents 48 drives, split across 6 SATA
normelton> controllers. So disks sda-sdh are on one controller,
normelton> etc. In our configuration, I run a RAID5 MD array for
normelton> each controller, then run LVM on top of these to form
normelton> one large VolGroup.

Pure genius! I wonder how many Thumpers have been configured in
this well thought out way :-).

BTW, just to be sure -- you are running LVM in default linear
mode over those 6 RAID5s, aren't you?

normelton> I found that it was easiest to set up ext3 with a max
normelton> of 2TB partitions. So running on top of the massive
normelton> LVM VolGroup are a handful of ext3 partitions, each
normelton> mounted in the filesystem.

Uhm, assuming 500GB drives, each RAID set has a capacity of
3.5TB, and odds are that a bit over half of those 2TB volumes
will straddle array boundaries. Such attention to detail is
quite remarkable :-).

normelton> This is less than ideal (ZFS would allow us one large
normelton> partition),

That would be another stroke of genius! (especially if you were
still using a set of underlying RAID5s instead of letting ZFS do
its RAIDZ thing). :-)

normelton> but we're rewriting some software to utilize the
normelton> multi-partition scheme.

Good luck!
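[For reference, the layout being discussed looks roughly like the sketch
below. Device names, volume group name and sizes are assumptions for
illustration only, not Norman's actual configuration; the last command
answers Peter's question about linear allocation and shows which logical
volumes straddle array boundaries:]

  # One RAID5 per 8-disk controller (shown for the first controller only):
  mdadm --create /dev/md0 --level=5 --raid-devices=8 /dev/sd[a-h]

  # All six arrays pooled into a single volume group, then carved into
  # <=2TB logical volumes for ext3:
  pvcreate /dev/md0 /dev/md1 /dev/md2 /dev/md3 /dev/md4 /dev/md5
  vgcreate bigvg /dev/md0 /dev/md1 /dev/md2 /dev/md3 /dev/md4 /dev/md5
  lvcreate -L 2T -n data00 bigvg
  mkfs.ext3 /dev/bigvg/data00

  # "linear" vs "striped" segments, and the underlying devices of each LV:
  lvs --segments -o lv_name,segtype,seg_size,devices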
Re: LVM performance (was: Re: RAID5 to RAID6 reshape?)
>> This might be related to raid chunk positioning with respect to LVM
>> chunk positioning. If they interfere there indeed may be some
>> performance drop. Best to make sure that those chunks are aligned
>> together.

> Interesting. I'm seeing a 20% performance drop too, with default RAID
> and LVM chunk sizes of 64K and 4M, respectively. Since 64K divides 4M
> evenly, I'd think there shouldn't be such a big performance penalty.

[ ... ]

Those are as such not very meaningful. What matters most is whether the
starting physical address of each logical volume extent is stripe
aligned (and whether the filesystem makes use of that), and then the
stripe size of the parity RAID set, not the chunk sizes in themselves.

I am often surprised by how many people who use parity RAID don't seem
to realize the crucial importance of physical stripe alignment, but I am
getting used to it.

Because of stripe alignment it is usually better to build parity arrays
on top of partitions or volumes than vice versa, as it is often more
difficult to align the start of a partition or volume to the underlying
stripes than the reverse.

But then those who understand the vital importance of stripe-aligned
writes for parity RAID often avoid using parity RAID :-).
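[A short sketch of the arithmetic behind Peter's point, using assumed
numbers (an 8-disk RAID5 with 64K chunks and 4K filesystem blocks); the
device name and mkfs invocation are illustrative only:]

  # Full stripe of an N-disk RAID5 = (N - 1) data chunks.
  # e.g. 8 disks, 64K chunk:  (8 - 1) * 64K = 448K per stripe.
  #
  # For an LV extent to be stripe aligned, the start of the PV data area
  # plus the extent offset must be a multiple of the stripe size.
  # pe_start reports where LVM actually starts placing data on each PV:
  pvs -o pv_name,pe_start,vg_name

  # Writes smaller than a stripe, or straddling a stripe boundary, force
  # a parity read-modify-write; aligned full-stripe writes avoid it.
  # The filesystem can only exploit this if told the geometry, e.g.:
  mkfs.ext3 -E stride=16 /dev/bigvg/data00   # 64K chunk / 4K block = 16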
Re: How many drives are bad?
> Pure genius! I wonder how many Thumpers have been configured in this
> well thought out way :-).

I'm sorry I missed your contributions to the discussion a few weeks ago.
As I said up front, this is a test system. We're still trying a number
of different configurations, and are learning how best to recover from a
fault. Guy Watkins proposed one a few weeks ago that we haven't yet
tried, but given our current situation... it may be a good time to give
it a shot.

I'm still not convinced we were running a degraded array before this.
One drive mysteriously dropped from the array, showing up as "removed"
but not "failed". We did not receive the notification that we did when
the second actually failed. So I'm still thinking it's just one drive
that actually failed.

Assuming we go with Guy's layout of 8 arrays of 6 drives (picking one
from each controller), how would you set up the LVM VolGroups on top of
these already distributed arrays?

Thanks again,

Norman

On Feb 20, 2008, at 2:21 AM, Peter Grandi wrote:

[ ... ]
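[For concreteness, the 8-arrays-of-6 layout Norman mentions (one drive
per controller per array) would look something like the sketch below.
The device names assume the sda-sdh-per-controller ordering described
earlier, and RAID6 per array is an assumption, not something settled in
the thread; only the first two of the eight arrays are shown:]

  # One array per "column": one disk from each of the 6 controllers.
  mdadm --create /dev/md0 --level=6 --raid-devices=6 \
        /dev/sda /dev/sdi /dev/sdq /dev/sdy /dev/sdag /dev/sdao
  mdadm --create /dev/md1 --level=6 --raid-devices=6 \
        /dev/sdb /dev/sdj /dev/sdr /dev/sdz /dev/sdah /dev/sdap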
Re: How many drives are bad?
On Thu, 21 Feb 2008 13:12:30 -0500, Norman Elton [EMAIL PROTECTED] said:

[ ... ]

normelton> Assuming we go with Guy's layout of 8 arrays of 6
normelton> drives (picking one from each controller),

Guy Watkins proposed another one too:

 «Assuming the 6 controllers are equal, I would make 3 16 disk
  RAID6 arrays using 2 disks from each controller. That way any 1
  controller can fail and your system will still be running. 6
  disks will be used for redundancy.

  Or 6 8 disk RAID6 arrays using 1 disk from each controller.
  That way any 2 controllers can fail and your system will still
  be running. 12 disks will be used for redundancy. Might be too
  excessive!»

So I would not be overjoyed with either physical configuration,
except in a few particular cases. It is very amusing to read such
worries about host adapter failures, and somewhat depressing to
see "too excessive" used to describe 4+2 parity RAID.

normelton> how would you setup the LVM VolGroups over top of
normelton> these already distributed arrays?

That looks like a trick question, or at least an incorrect
question, because I would rather not do anything like that except
in a very few cases. However, if one wants to do a bad thing in
the least bad way, perhaps a volume group per array would be
least bad.

Going back to your original question:

 «So... we're curious how Linux will handle such a beast. Has
  anyone run MD software RAID over so many disks? Then piled
  LVM/ext3 on top of that?»

I haven't, because it sounds rather inappropriate to me.

 «Any suggestions?»

Not easy to respond without a clear statement of what the array
will be used for: RAID levels and file systems are very
anisotropic in both performance and resilience, so a particular
configuration may be very good for something but not for
something else. For example a 48-drive RAID0 with 'ext2' on top
would be very good for some cases, but perhaps not for archival
:-).

In general, I'd use RAID10 (http://WWW.BAARF.com/), RAID5 in very
few cases and RAID6 almost never.

Current storage practices do not handle large single-computer
storage pools that well (just consider 'fsck' times), and beyond
10TB I reckon that currently only multi-host parallel/cluster
file systems are good enough, for example Lustre (for smaller
multi-TB filesystems I'd use JFS or XFS). But then Lustre can
also be used on a single machine with multiple (say 2TB) block
devices, and this may be the best choice here too if a single
virtual filesystem is the goal:

  http://wiki.Lustre.org/index.php?title=Lustre_Howto
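[A minimal sketch of the "volume group per array" arrangement Peter
calls the least bad option; array and volume names are illustrative
and assume eight md arrays already exist:]

  # One volume group per array keeps each filesystem's failure and
  # performance domain confined to a single RAID set, instead of
  # letting logical volumes straddle array boundaries.
  for i in 0 1 2 3 4 5 6 7; do
      pvcreate /dev/md$i
      vgcreate vg$i /dev/md$i
      lvcreate -l 100%FREE -n data vg$i
  done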
Re: How many drives are bad?
Peter Grandi wrote:
> In general, I'd use RAID10 (http://WWW.BAARF.com/), RAID5 in

Interesting movement. What do you think is their stance on Raid Fix? :)
RAID10 far (f2) read throughput on random and sequential / read-ahead
'md' performs wonderfully. Thanks to every contributor! I pitted it
against a 3ware 9650 and 'md' won on nearly every count (albeit for
sequential I/O on RAID5 the 3ware is a distant winner):

  http://www.makarevitch.org/rant/raid/#3wmd

On RAID10 f2, too small a read-ahead reduces sequential read throughput,
but even a fairly low value (768 for the whole 'md' block device, 0 for
the underlying spindles) enables very good sequential read performance
(300 MB/s on 6 low-end Hitachi 500 GB spindles).

What baffles me is that, on a 1.4TB array served by a box with 12 GB RAM
(so a low cache-hit ratio), the random access performance remains stable
and high (450 IOPS with 48 threads, 20% writes - 10% fsync'ed), even
with a fairly high read-ahead (16k). How come?!
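[For anyone wanting to reproduce the read-ahead experiment, a minimal
sketch; the md device name, spindle names and sector counts are
illustrative, and read-ahead is given in 512-byte sectors:]

  # Read-ahead is set per block device, in 512-byte sectors.
  blockdev --getra /dev/md0                 # current value
  blockdev --setra 768 /dev/md0             # modest read-ahead on the array
  for d in /dev/sd[a-f]; do
      blockdev --setra 0 $d                 # none on the underlying spindles
  done

  # Quick sequential read check, bypassing the page cache:
  dd if=/dev/md0 of=/dev/null bs=1M count=4096 iflag=direct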