Re: Two-disk RAID5?

2006-04-26 Thread Jon Lewis

On Wed, 26 Apr 2006, John Rowe wrote:


I'm about to create a RAID1 file system and a strange thought occurs to
me: if I create a two-disk RAID5 array then I can grow it later by the
simple expedient of adding a third disk and hence doubling its size.


No.  When one of the 2 drives in your RAID5 dies, and all you have for 
some blocks is parity info, how will the missing data be reconstructed?


You could [I suspect] create a 2 disk RAID5 in degraded mode (3rd member 
missing), but it'll obviously lack redundancy until you add a 3rd disk, 
which won't add anything to your RAID5 storage capacity.


--
 Jon Lewis   |  I route
 Senior Network Engineer |  therefore you are
 Atlantic Net|
_ http://www.lewis.org/~jlewis/pgp for PGP public key_


Re: Two-disk RAID5?

2006-04-26 Thread Tuomas Leikola
 No.  When one of the 2 drives in your RAID5 dies, and all you have for
 some blocks is parity info, how will the missing data be reconstructed?

 You could [I suspect] create a 2 disk RAID5 in degraded mode (3rd member
 missing), but it'll obviously lack redundancy until you add a 3rd disk,
 which won't add anything to your RAID5 storage capacity.

IMO if you have a 2-disk raid5, the parity for each block is the same
as the data. There is a performance drop, as I suspect md isn't smart
enough to read data from both disks, but that's all.

When one disk fails, the (lone) parity block is quite enough to
reconstruct from. With XOR parity, you can always assume any number of
additional disks full of zeros; it doesn't really change the algorithm.
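
A tiny C sketch of that argument (purely illustrative, nothing to do with
md's actual code): fold a single data block into a zeroed parity buffer and
the parity comes out byte-for-byte identical to the data, so rebuilding from
the surviving disk is just another XOR with zero.

/* Illustrative only -- not md code.  2-disk raid5: one data block per
 * stripe plus its XOR parity. */
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define BLOCK 16

static void xor_blocks(unsigned char *dst, const unsigned char *src)
{
	for (int i = 0; i < BLOCK; i++)
		dst[i] ^= src[i];
}

int main(void)
{
	unsigned char data[BLOCK] = "two-disk raid5!";
	unsigned char parity[BLOCK] = {0};   /* xor accumulator starts at zero */

	xor_blocks(parity, data);            /* only one data block to fold in */
	assert(memcmp(parity, data, BLOCK) == 0);   /* parity == data */

	/* "Lose" the data disk and rebuild it from the parity disk alone;
	 * any number of extra all-zero disks would not change the result. */
	unsigned char rebuilt[BLOCK] = {0};
	xor_blocks(rebuilt, parity);
	assert(memcmp(rebuilt, data, BLOCK) == 0);

	printf("reconstructed: %s\n", (char *)rebuilt);
	return 0;
}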

(maybe mdadm could/can change a raid-1 into raid5 by just changing the
superblocks, for the purpose of expanding into more disks..)

- tuomas


RE: Two-disk RAID5?

2006-04-26 Thread Jon Lewis

On Wed, 26 Apr 2006, Jansen, Frank wrote:


It is not possible to flip a bit to change a set of disks from RAID 1 to
RAID 5, as the physical layout is different.


As Tuomas pointed out though, a 2-disk RAID5 is kind of a special case 
where all you have is data and parity, which is actually also just data. 
Seems kind of like a RAID1 with extra overhead.  I don't think I've ever 
heard of a RAID5 implementation willing to handle fewer than 3 drives though.


I suspect I should have just kept out of this, and waited for someone like 
Neil to answer authoritatively.


So...Neil, what's the right answer to Tuomas's 2 disk RAID5 question? :)

--
 Jon Lewis   |  I route
 Senior Network Engineer |  therefore you are
 Atlantic Net|
_ http://www.lewis.org/~jlewis/pgp for PGP public key_


RE: Two-disk RAID5?

2006-04-26 Thread Neil Brown
On Wednesday April 26, [EMAIL PROTECTED] wrote:
 
 I suspect I should have just kept out of this, and waited for someone like 
 Neil to answer authoritatively.
 
 So...Neil, what's the right answer to Tuomas's 2 disk RAID5 question? :)
 

... and a deep resounding voice from on high spoke and in its infinite
wisdom it said
 
   yeh, whatever


The data layout on a 2-disk raid5 and a 2-disk raid1 is identical (if
you ignore chunksize issues (raid1 doesn't need one) and the
superblock (which isn't part of the data)).  Each drive contains
identical data(*).

Write throughput to the r5 would be a bit slower because data is
always copied in memory first, then written.
Read throughput would be largely the same if the r5 chunksize was
fairly large, but much poorer for r5 if the chunksize was small.

Converting a raid1 to a raid5 while offline would be quite
straightforward except for the chunksize issue.  If the r1 size wasn't
a multiple of the chunksize you chose for the r5, then you would lose
the last fraction of a chunk.  So if you are planning to do this, set
the size of your r1 to something that is nice and round (e.g. a
multiple of 128k).
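
For concreteness, a back-of-envelope sketch of the rounding (the raid1
component size below is an arbitrary example; only the 128k chunk comes from
the advice above):

/* Illustrative arithmetic only: how much of a raid1 would be lost when
 * re-describing it as a raid5 with a 128 KiB chunk. */
#include <stdio.h>

int main(void)
{
	unsigned long long chunk_kib = 128;          /* chosen raid5 chunk size */
	unsigned long long r1_kib = 10000050ULL;     /* arbitrary raid1 size in KiB */

	unsigned long long usable = (r1_kib / chunk_kib) * chunk_kib;  /* round down */
	printf("raid1 component : %llu KiB\n", r1_kib);
	printf("raid5 would see : %llu KiB (last %llu KiB lost)\n",
	       usable, r1_kib - usable);
	return 0;
}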

Converting a raid1 to a raid5 while online is something I have been
thinking about, but it is not likely to happen any time soon.

I think that answers all the issues.

NeilBrown

(*) The term 'mirror' for raid1 has always bothered me because a
mirror presents a reflected image, while raid1 copies the data without
any transformation.

With a 2-drive raid5, one drive gets the original data, and the other
drive gets the data after it has been 'reflected' through an XOR
operation, so maybe a 2-drive raid5 is really a 'mirrored' pair...
Except that the data is still the same, as XOR with 0 produces no
change.
So, if we made a tiny change to raid5 and got the xor operation to
start with 0xff in every byte, then the XOR would reflect each byte
in a reasonably meaningful way, and we might actually get a mirrored
pair!!!  

But I don't think that would provide any real value :-)
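
For the curious, a few lines of C showing the idea anyway (a hypothetical
variant, not anything md does): seeding the XOR with 0xff turns the copy into
a bitwise complement, and since XOR is its own inverse, reconstruction would
still work.

/* Hypothetical "raid5 with the xor seeded at 0xff" -- not real md code. */
#include <assert.h>
#include <stdio.h>

int main(void)
{
	unsigned char data = 0x5a;

	unsigned char plain_parity  = 0x00 ^ data;  /* today's 2-drive raid5: identical copy */
	unsigned char mirror_parity = 0xff ^ data;  /* the hypothetical variant */

	assert(plain_parity == data);
	assert(mirror_parity == (unsigned char)~data);  /* each byte "reflected" */

	/* Reconstruction still works: XOR the seed back in. */
	assert((unsigned char)(0xff ^ mirror_parity) == data);

	printf("data=%02x copy=%02x reflected=%02x\n",
	       data, plain_parity, mirror_parity);
	return 0;
}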


Trying to start dirty, degraded RAID6 array

2006-04-26 Thread Christopher Smith

The short version:

I have a 12-disk RAID6 array that has lost a device and now whenever I 
try to start it with:


mdadm -Af /dev/md0 /dev/sd[abcdefgijkl]1

I get:

mdadm: failed to RUN_ARRAY /dev/md0: Input/output error

And in dmesg:

md: bind<sdk1>
md: bind<sdi1>
md: bind<sdj1>
md: bind<sde1>
md: bind<sdf1>
md: bind<sdg1>
md: bind<sdb1>
md: bind<sdd1>
md: bind<sda1>
md: bind<sdc1>
md: bind<sdl1>
md: md0: raid array is not clean -- starting background reconstruction
raid6: device sdl1 operational as raid disk 0
raid6: device sdc1 operational as raid disk 11
raid6: device sda1 operational as raid disk 10
raid6: device sdd1 operational as raid disk 9
raid6: device sdb1 operational as raid disk 8
raid6: device sdg1 operational as raid disk 6
raid6: device sdf1 operational as raid disk 5
raid6: device sde1 operational as raid disk 4
raid6: device sdj1 operational as raid disk 3
raid6: device sdi1 operational as raid disk 2
raid6: device sdk1 operational as raid disk 1
raid6: cannot start dirty degraded array for md0
RAID6 conf printout:
 --- rd:12 wd:11 fd:1
 disk 0, o:1, dev:sdl1
 disk 1, o:1, dev:sdk1
 disk 2, o:1, dev:sdi1
 disk 3, o:1, dev:sdj1
 disk 4, o:1, dev:sde1
 disk 5, o:1, dev:sdf1
 disk 6, o:1, dev:sdg1
 disk 8, o:1, dev:sdb1
 disk 9, o:1, dev:sdd1
 disk 10, o:1, dev:sda1
 disk 11, o:1, dev:sdc1
raid6: failed to run raid set md0
md: pers->run() failed ...


I'm 99% sure the data is ok and I'd like to know how to force the array 
online.




Longer version:

A couple of days ago I started having troubles with my fileserver 
mysteriously hanging during boot (I was messing with trying to get Xen 
running at the time, so lots of reboots were involved).  I finally 
nailed it down to the autostarting of the RAID array.


After several hours of pulling CPUs, SATA cards, RAM (not to mention 
some scary problems with memtest86+ that turned out to be because USB 
Legacy was enabled) I finally managed to figure out that one of my 
drives would simply stop transferring data after about the first gig 
(tested with dd, monitoring with iostat).  About 30 seconds after the 
drive stops, the rest of the machine also hangs.


Interestingly, there are no error messages anywhere I could find 
indicating the drive was having problems.  Even its SMART test (smartctl 
-t long) says it's ok.  This made the problem substantially more 
difficult to figure out.


I then tried to start the array without the broken disk and had the 
problem mentioned in the short version above - the array wouldn't start, 
presumably because its rebuild had been started and (uncleanly) stopped 
about a dozen times since it last succeeded.  I finally managed to get 
the array online by starting it with all the disks, then immediately 
knocking the one I knew to be bad offline with 'mdadm /dev/md0 -f 
/dev/sdh1' before it hit the point where it would hang.  After that the 
rebuild completed without error (I didn't touch the machine at all while 
it was rebuilding).


However, a few hours after the rebuild completed, a power failure killed 
the machine again and now I can't start the array, as outlined in the 
short version above.  I must admit I find it a bit weird that the 
array is dirty and degraded after it had successfully completed a rebuild.


Unfortunately the original failed drive (/dev/sdh) is no longer 
available, so I can't do my original trick again.  I'm pretty sure - 
based on the rebuild completing previously - that the data will be fine 
if I can just get the array back online; is there some sort of 
--really-force switch to mdadm ?  Can the array be brought back online 
*without* triggering a rebuild, so I can get as much data as possible 
off and then start from scratch again ?


CS

Here is the 'mdadm --examine /dev/sdX' output for each of the remaining 
drives, if it is helpful:


/dev/sda1:
          Magic : a92b4efc
        Version : 00.90.02
           UUID : 78ddbb47:e4dfcf9e:5f24461a:19104298
  Creation Time : Wed Feb  1 01:09:11 2006
     Raid Level : raid6
    Device Size : 244195904 (232.88 GiB 250.06 GB)
     Array Size : 2441959040 (2328.83 GiB 2500.57 GB)
   Raid Devices : 12
  Total Devices : 11
Preferred Minor : 0

    Update Time : Wed Apr 26 22:30:01 2006
          State : active
 Active Devices : 11
Working Devices : 11
 Failed Devices : 1
  Spare Devices : 0
       Checksum : 1685ebfc - correct
         Events : 0.11176511


      Number   Major   Minor   RaidDevice State
this    10       8        1       10      active sync   /dev/sda1

   0     0       8      177        0      active sync   /dev/sdl1
   1     1       8      161        1      active sync   /dev/sdk1
   2     2       8      129        2      active sync   /dev/sdi1
   3     3       8      145        3      active sync   /dev/sdj1
   4     4       8       65        4      active sync   /dev/sde1
   5     5       8       81        5      active sync   /dev/sdf1
   6     6       8       97        6      active sync   /dev/sdg1
   7     7       0        0        7      faulty removed
  

Re: Trying to start dirty, degraded RAID6 array

2006-04-26 Thread Neil Brown
On Thursday April 27, [EMAIL PROTECTED] wrote:
 The short version:
 
 I have a 12-disk RAID6 array that has lost a device and now whenever I 
 try to start it with:
 
 mdadm -Af /dev/md0 /dev/sd[abcdefgijkl]1
 
 I get:
 
 mdadm: failed to RUN_ARRAY /dev/md0: Input/output error
 
...
 raid6: cannot start dirty degraded array for md0

The '-f' is meant to make this work.  However it seems there is a bug.

Could you please test this patch?  It isn't exactly the right fix, but
it definitely won't hurt.

Thanks,
NeilBrown

Signed-off-by: Neil Brown [EMAIL PROTECTED]

### Diffstat output
 ./super0.c |1 +
 1 file changed, 1 insertion(+)

diff ./super0.c~current~ ./super0.c
--- ./super0.c~current~ 2006-03-28 17:10:51.0 +1100
+++ ./super0.c  2006-04-27 10:03:40.0 +1000
@@ -372,6 +372,7 @@ static int update_super0(struct mdinfo *
 		if (sb->level == 5 || sb->level == 4 || sb->level == 6)
 			/* need to force clean */
 			sb->state |= (1 << MD_SB_CLEAN);
+		rv = 1;
 	}
 	if (strcmp(update, "assemble")==0) {
 		int d = info->disk.number;


Re: linear writes to raid5

2006-04-26 Thread Neil Brown
On Thursday April 20, [EMAIL PROTECTED] wrote:
 Neil Brown wrote:
  
  What is the rationale for your position?
 
 My rationale was that if the md layer receives *write* requests not smaller
 than a full stripe size, it is able to omit reading data to update, and
 can just calculate new parity from the new data.  Hence, combining a
 dozen small write requests coming from a filesystem to form a single
 request >= full stripe size should dramatically increase
 performance.

That makes sense.
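
A rough C sketch of that rationale (assumed chunk size and disk count, not
the md/raid5 code paths): a write covering the whole stripe lets the new
parity be computed from the new data alone, while a small write first needs
the old data and old parity read back from disk.

/* Illustrative only.  Hypothetical 4-disk raid5: 3 data chunks + 1 parity. */
#include <stddef.h>

#define CHUNK 4096
#define DATA_DISKS 3

static void xor_into(unsigned char *dst, const unsigned char *src)
{
	for (size_t i = 0; i < CHUNK; i++)
		dst[i] ^= src[i];
}

/* Full-stripe write: parity is the XOR of all the *new* data chunks,
 * so nothing has to be read from disk first. */
void parity_full_stripe(unsigned char parity[CHUNK],
                        unsigned char new_data[DATA_DISKS][CHUNK])
{
	for (size_t i = 0; i < CHUNK; i++)
		parity[i] = 0;
	for (int d = 0; d < DATA_DISKS; d++)
		xor_into(parity, new_data[d]);
}

/* Small write (read-modify-write): the old chunk and the old parity must
 * be read back before the new parity can be computed. */
void parity_rmw(unsigned char parity[CHUNK],          /* old parity, read from disk */
                const unsigned char old_chunk[CHUNK], /* old data, read from disk */
                const unsigned char new_chunk[CHUNK])
{
	xor_into(parity, old_chunk);   /* cancel the old data's contribution */
	xor_into(parity, new_chunk);   /* add the new data's contribution */
}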

However in both cases (above and below raid5), the device receiving
the requests is in a better position to know what size is a good
size than the client sending the requests.
That is exactly what the 'plugging' concept is for.  When a request
arrives, the device is 'plugged' so that it won't process new
requests, and the request plus any following requests are queued.  At
some point the queue is unplugged and the device should be able to
collect related requests to make large requests of an appropriate size
and alignment for the device.

The current suggestion is that plugging isn't quite working right for
raid5.  That is certainly possible.


 
 E.g., when I use dd in O_DIRECT mode (oflag=direct) and experiment with
 different block sizes, write performance increases a lot when bs becomes the
 full stripe size.  Of course it decreases again when bs is increased a
 bit further (as md starts reading again, to construct parity blocks).
 

Yes, O_DIRECT is essentially saying "I know what I am doing and I
want to bypass all the smarts and go straight to the device."
O_DIRECT requests should certainly be sized and aligned to suit the
device.  For non-O_DIRECT it shouldn't matter so much.
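
As a minimal sketch of what "sized and aligned" means in practice (assumes
Linux, a scratch /dev/md0 you don't mind scribbling on, and a 128 KiB stripe;
adjust for your own array):

/* Sketch of a sized-and-aligned O_DIRECT write.  Not a benchmark; it only
 * shows the alignment mechanics, and it writes to the raw device. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	const size_t stripe = 128 * 1024;   /* one full stripe of data (assumed) */
	void *buf;

	/* O_DIRECT wants the buffer, size and offset suitably aligned. */
	if (posix_memalign(&buf, 4096, stripe) != 0)
		return 1;
	memset(buf, 0xab, stripe);

	int fd = open("/dev/md0", O_WRONLY | O_DIRECT);  /* scratch array only! */
	if (fd < 0) {
		perror("open");
		return 1;
	}

	if (pwrite(fd, buf, stripe, 0) < 0)  /* one aligned, full-stripe write */
		perror("pwrite");

	close(fd);
	free(buf);
	return 0;
}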

NeilBrown


Re: Trying to start dirty, degraded RAID6 array

2006-04-26 Thread Christopher Smith

Neil Brown wrote:

The '-f' is meant to make this work.  However it seems there is a bug.

Could you please test this patch?  It isn't exactly the right fix, but
it definitely won't hurt.


Thanks, Neil, I'll give this a go when I get home tonight.

Is there any way to start an array without kicking off a rebuild ?

CS


Re: Trying to start dirty, degraded RAID6 array

2006-04-26 Thread Neil Brown
On Thursday April 27, [EMAIL PROTECTED] wrote:
 Neil Brown wrote:
  The '-f' is meant to make this work.  However it seems there is a bug.
  
  Could you please test this patch?  It isn't exactly the right fix, but
  it definitely won't hurt.
 
 Thanks, Neil, I'll give this a go when I get home tonight.
 
 Is there any way to start an array without kicking off a rebuild ?

echo 1 > /sys/module/md_mod/parameters/start_ro

If you do this, then arrays will be read-only when they are started,
and so will not do a rebuild.  The first write request to the array
(e.g. if you mount a filesystem) will cause a switch to read/write and
any required rebuild will start. 

echo 0 > /sys/module/md_mod/parameters/start_ro
will revert the effect.

This requires a reasonably recent kernel.

NeilBrown