safe mode for grow?

2008-01-27 Thread Nagilum

Hi,

After having been through a few ups and downs with md and mdadm, I
think it would be a good idea if mdadm had a safe mode for growing
raid5.

Safe mode would entail several things:
1. It would kick off a resync.
2. It would write to the spare drive to ensure it is OK (we have to
   wait for the resync anyway).
3. It would make a backup file mandatory.
4. It would only allow growing the raid by one device (because if
   more than one of the new drives goes bad, we have a big problem).


I know it would be a bit of a deviation from the usual mdadm
behaviour, but I think it's worth it.
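
As a rough illustration only (device names are placeholders, not part of
the proposal), growing by a single device with a mandatory backup file
might look something like this with today's mdadm:

  mdadm /dev/md0 --add /dev/sde1                  # add the new disk as a spare first
  echo check > /sys/block/md0/md/sync_action      # scrub the existing members (stands in for the proposed resync)
  mdadm --grow /dev/md0 --raid-devices=4 \
        --backup-file=/root/md0-grow.backup       # grow by exactly one device, with a backup file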

Kind regards,
Alex.




write-intent bitmaps

2008-01-27 Thread Russell Coker
http://lists.debian.org/debian-devel/2008/01/msg00921.html

Are they regarded as a stable feature?  If so, I'd like to see distributions
supporting them by default.  I've started a discussion in Debian on this
topic; see the above URL for details.

-- 
[EMAIL PROTECTED]
http://etbe.coker.com.au/  My Blog

http://www.coker.com.au/sponsorship.html Sponsoring Free Software development


Re: write-intent bitmaps

2008-01-27 Thread Neil Brown
On Sunday January 27, [EMAIL PROTECTED] wrote:
 http://lists.debian.org/debian-devel/2008/01/msg00921.html
 
 Are they regarded as a stable feature?  If so I'd like to see distributions 
 supporting them by default.  I've started a discussion in Debian on this 
 topic, see the above URL for details.

Yes, it is regarded as stable.

However, it can be expected to reduce write throughput.  A reduction of
several percent would not be surprising, and depending on the workload it
could be much higher.

It is quite easy to add or remove a bitmap on an active array, so
making it a default would probably be fine, provided it was easy for
an admin to find out about it and remove the bitmap if they wanted the
extra performance.
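
For reference, adding and removing a write-intent bitmap on a running
array is a single command each way (the device name is just an example):

  mdadm --grow /dev/md0 --bitmap=internal   # add an internal write-intent bitmap
  mdadm --grow /dev/md0 --bitmap=none       # remove it again if the write overhead matters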

NeilBrown


Re: write-intent bitmaps

2008-01-27 Thread Russell Coker
On Sunday 27 January 2008 22:21, Neil Brown [EMAIL PROTECTED] wrote:
 On Sunday January 27, [EMAIL PROTECTED] wrote:
  http://lists.debian.org/debian-devel/2008/01/msg00921.html
 
  Are they regarded as a stable feature?  If so I'd like to see
  distributions supporting them by default.  I've started a discussion in
  Debian on this topic, see the above URL for details.

 Yes, it is regarded as stable.

Thanks for that information.

 However, it can be expected to reduce write throughput.  A reduction of
 several percent would not be surprising, and depending on the workload it
 could be much higher.

It seems to me that losing a few percent of performance all the time is better 
than a dramatic performance loss for an hour or two when things go wrong.

 It is quite easy to add or remove a bitmap on an active array, so
 making it a default would probably be fine, provided it was easy for
 an admin to find out about it and remove the bitmap if they wanted the
 extra performance.

I hadn't realised that.  So having this in the installer is not as important 
as I previously thought.

-- 
[EMAIL PROTECTED]
http://etbe.coker.com.au/  My Blog

http://www.coker.com.au/sponsorship.html Sponsoring Free Software development


2 failed disks RAID 5 behavior bug?

2008-01-27 Thread TJ Harrell
Hi!

Let me apologize in advance for not having as much information as I'd like
to.

I have a RAID 5 array with 3 elements. Kernel is 2.6.23.

I had a SATA disk fail. On analysis, its SMART data claimed it had an
'electrical failure'. The drive sounded like an angry buzz-saw, so I'm
guessing more was going on with it. Anyway, when the drive failed,
/proc/mdstat showed two drives marked as failed [__U]. The other failed
drive was on the other channel of the same SATA controller. On inspection,
this second drive works fine. I'm guessing the failing drive somehow caused
the SATA controller to lock up, which caused the RAID layer to
think the second drive had failed.

The problematic behavior is that once two elements were marked as failed,
any read or write access resulted in an I/O failure message.
Unfortunately, I believe some writes were still made to the array, as the
event counter did not match on the two functional elements, and there was
quite a bit of corruption of the filesystem superblock.
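
For anyone hitting the same situation, comparing the event counters and
forcing an assembly from the surviving elements would look roughly like
this (device names are only placeholders for my setup):

  mdadm --examine /dev/sd[abc]1 | egrep '/dev/sd|Events'   # compare event counters of the members
  mdadm --stop /dev/md0
  mdadm --assemble --force /dev/md0 /dev/sda1 /dev/sdc1    # force assembly from the two good disks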

I'm sorry I don't have more specifics, but I hope perhaps Mr. Brown or
someone else who knows the RAID code will consider adding some sort of
safeguard to prevent writing to a RAID 5 array when more than one element
has failed.

PS: Please CC: me. :)

Thank You!
TJ Harrell
[EMAIL PROTECTED]



striping of a 4 drive raid10

2008-01-27 Thread Keld Jørn Simonsen
Hi

I have tried to make a striped raid out of my new 4 x 1 TB
SATA-2 disks. I tried raid10,f2 in several ways (a command-line sketch of
setup 1 follows the list):

1: md0 = raid10,f2 of sda1+sdb1, md1= raid10,f2 of sdc1+sdd1, md2 = raid0
of md0+md1

2: md0 = raid0 of sda1+sdb1, md1= raid0 of sdc1+sdd1, md2 = raid01,f2
of md0+md1

3: md0 = raid10,f2 of sda1+sdb1, md1 = raid10,f2 of sdc1+sdd1, chunksize of
md0 = md1 = 128 KB, md2 = raid0 of md0+md1, chunksize = 256 KB

4: md0 = raid0 of sda1+sdb1, md1 = raid0 of sdc1+sdd1, chunksize
of md0 = md1 = 128 KB, md2 = raid01,f2 of md0+md1, chunksize = 256 KB

5: md0= raid10,f4 of sda1+sdb1+sdc1+sdd1
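
For concreteness, setup 1 corresponds to something like the following
mdadm commands (device names as in the list; chunk sizes left at their
defaults unless stated):

  mdadm --create /dev/md0 --level=10 --layout=f2 --raid-devices=2 /dev/sda1 /dev/sdb1
  mdadm --create /dev/md1 --level=10 --layout=f2 --raid-devices=2 /dev/sdc1 /dev/sdd1
  mdadm --create /dev/md2 --level=0 --raid-devices=2 /dev/md0 /dev/md1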

My new disks give a transfer rate of about 80 MB/s, so I expected
to have something like 320 MB/s for the whole raid, but I did not get
more than about 180 MB/s.
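
A simple way to measure this kind of raw sequential throughput is a plain
read of the array device; the tools and sizes below are only examples:

  hdparm -t /dev/md2                              # quick sequential read test
  dd if=/dev/md2 of=/dev/null bs=1M count=4096    # read 4 GB sequentially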

I think it may be something with the layout; in effect the data on
the drives should be laid out something like:

  sda1  sdb1  sdc1  sdd1
   0     1     2     3
   4     5     6     7

And this was not really doable with the combinations of raids above,
because those combinations give different block layouts.

How can it be done? Do we need a new raid type?

Best regards
keld


Re: striping of a 4 drive raid10

2008-01-27 Thread Peter Grandi
 On Sun, 27 Jan 2008 20:33:45 +0100, Keld Jørn Simonsen
 [EMAIL PROTECTED] said:

keld Hi I have tried to make a striping raid out of my new 4 x
keld 1 TB SATA-2 disks. I tried raid10,f2 in several ways:

keld 1: md0 = raid10,f2 of sda1+sdb1, md1 = raid10,f2 of sdc1+sdd1,
keld    md2 = raid0 of md0+md1
keld 2: md0 = raid0 of sda1+sdb1, md1 = raid0 of sdc1+sdd1,
keld    md2 = raid01,f2 of md0+md1
keld 3: md0 = raid10,f2 of sda1+sdb1, md1 = raid10,f2 of sdc1+sdd1, chunksize of
keld    md0 = md1 = 128 KB, md2 = raid0 of md0+md1, chunksize = 256 KB
keld 4: md0 = raid0 of sda1+sdb1, md1 = raid0 of sdc1+sdd1, chunksize
keld    of md0 = md1 = 128 KB, md2 = raid01,f2 of md0+md1, chunksize = 256 KB

These stacked RAID levels don't make a lot of sense.

keld 5: md0= raid10,f4 of sda1+sdb1+sdc1+sdd1

This also does not make a lot of sense. Why have four mirrors
instead of two?

Instead, try 'md0 = raid10,f2' of all four drives, for example. The first
mirror will be striped across the outer half of all four drives, and the
second mirrors will be rotated in the inner half of each drive.

Which of course means that reads will be quite quick, but writes
and degraded operation will be slower.
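
A minimal sketch of that suggestion, assuming the four partitions from the
original post:

  mdadm --create /dev/md0 --level=10 --layout=f2 --raid-devices=4 \
        /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1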

Consider this post for more details:

  http://www.spinics.net/lists/raid/msg18130.html

[ ... ]


Re: striping of a 4 drive raid10

2008-01-27 Thread Neil Brown
On Sunday January 27, [EMAIL PROTECTED] wrote:
 Hi
 
 I have tried to make a striping raid out of my new 4 x 1 TB
 SATA-2 disks. I tried raid10,f2 in several ways:
 
 1: md0 = raid10,f2 of sda1+sdb1, md1= raid10,f2 of sdc1+sdd1, md2 = raid0
 of md0+md1
 
 2: md0 = raid0 of sda1+sdb1, md1= raid0 of sdc1+sdd1, md2 = raid01,f2
 of md0+md1
 
 3: md0 = raid10,f2 of sda1+sdb1, md1= raid10,f2 of sdc1+sdd1, chunksize of 
 md0 =md1 =128 KB,  md2 = raid0 of md0+md1 chunksize = 256 KB
 
 4: md0 = raid0 of sda1+sdb1, md1= raid0 of sdc1+sdd1, chunksize
 of md0 = md1 = 128 KB, md2 = raid01,f2 of md0+md1 chunksize = 256 KB
 
 5: md0= raid10,f4 of sda1+sdb1+sdc1+sdd1

Try
  6: md0 = raid10,f2 of sda1+sdb1+sdc1+sdd1

Also try raid10,o2 with a largeish chunksize (256KB is probably big
enough).
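
A sketch of the offset-layout suggestion, assuming the same four partitions
(raid10,o2 needs a reasonably recent kernel and mdadm):

  mdadm --create /dev/md0 --level=10 --layout=o2 --chunk=256 --raid-devices=4 \
        /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1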

NeilBrown


 
 My new disks give a transfer rate of about 80 MB/s, so I expected
 to have something like 320 MB/s for the whole raid, but I did not get
 more than about 180 MB/s.
 
 I think it may be something with the layout, that in effect 
 the drives should be something like:
 
   sda1  sdb1  sdc1  sdd1
    0     1     2     3
    4     5     6     7
 
 And this was not really doable with the combinations of raids above,
 because those combinations give different block layouts.
 
 How can it be done? Do we need a new raid type?
 
 Best regards
 keld


Re: striping of a 4 drive raid10

2008-01-27 Thread Keld Jørn Simonsen
On Mon, Jan 28, 2008 at 07:13:30AM +1100, Neil Brown wrote:
 On Sunday January 27, [EMAIL PROTECTED] wrote:
  Hi
  
  I have tried to make a striping raid out of my new 4 x 1 TB
  SATA-2 disks. I tried raid10,f2 in several ways:
  
  1: md0 = raid10,f2 of sda1+sdb1, md1= raid10,f2 of sdc1+sdd1, md2 = raid0
  of md0+md1
  
  2: md0 = raid0 of sda1+sdb1, md1= raid0 of sdc1+sdd1, md2 = raid01,f2
  of md0+md1
  
  3: md0 = raid10,f2 of sda1+sdb1, md1= raid10,f2 of sdc1+sdd1, chunksize of 
  md0 =md1 =128 KB,  md2 = raid0 of md0+md1 chunksize = 256 KB
  
  4: md0 = raid0 of sda1+sdb1, md1= raid0 of sdc1+sdd1, chunksize
  of md0 = md1 = 128 KB, md2 = raid01,f2 of md0+md1 chunksize = 256 KB
  
  5: md0= raid10,f4 of sda1+sdb1+sdc1+sdd1
 
 Try
   6: md0 = raid10,f2 of sda1+sdb1+sdc1+sdd1

That I already tried (and I wrongly stated that I used f4 instead of
f2). Twice I got a throughput of about 300 MB/s, but since then I could
not reproduce that behaviour. Are there errors in this area that have been
corrected in newer kernels?


 Also try raid10,o2 with a largeish chunksize (256KB is probably big
 enough).

I tried that too, but my mdadm did not allow me to use the o flag.

My kernel is 2.6.12 and mdadm is v1.12.0 (14 June 2005).
Can I upgrade mdadm alone to a newer version, and if so, which one is
recommended?

best regards
keld


Re: striping of a 4 drive raid10

2008-01-27 Thread Keld Jørn Simonsen
On Sun, Jan 27, 2008 at 08:11:35PM +, Peter Grandi wrote:
  On Sun, 27 Jan 2008 20:33:45 +0100, Keld Jørn Simonsen
  [EMAIL PROTECTED] said:
 
 keld Hi I have tried to make a striping raid out of my new 4 x
 keld 1 TB SATA-2 disks. I tried raid10,f2 in several ways:
 
 keld 1: md0 = raid10,f2 of sda1+sdb1, md1 = raid10,f2 of sdc1+sdd1,
 keld    md2 = raid0 of md0+md1
 keld 2: md0 = raid0 of sda1+sdb1, md1 = raid0 of sdc1+sdd1,
 keld    md2 = raid01,f2 of md0+md1
 keld 3: md0 = raid10,f2 of sda1+sdb1, md1 = raid10,f2 of sdc1+sdd1, chunksize of
 keld    md0 = md1 = 128 KB, md2 = raid0 of md0+md1, chunksize = 256 KB
 keld 4: md0 = raid0 of sda1+sdb1, md1 = raid0 of sdc1+sdd1, chunksize
 keld    of md0 = md1 = 128 KB, md2 = raid01,f2 of md0+md1, chunksize = 256 KB
 
 These stacked RAID levels don't make a lot of sense.
 
 keld 5: md0= raid10,f4 of sda1+sdb1+sdc1+sdd1
 
 This also does not make a lot of sense. Why have four mirrors
 instead of two?

My error, I did mean f2.

Anyway, 4 mirrors would make reads about 2 times faster than with 2 disks,
and given disk prices these days this could make a lot of sense.

 Instead, try 'md0 = raid10,f2' of all four drives, for example. The first
 mirror will be striped across the outer half of all four drives, and the
 second mirrors will be rotated in the inner half of each drive.
 
 Which of course means that reads will be quite quick, but writes
 and degraded operation will be slower.
 
 Consider this post for more details:
 
   http://www.spinics.net/lists/raid/msg18130.html

Thanks for the reference.

There is also more in the original article on possible layouts of what
is now known as raid10,f2:

http://marc.info/?l=linux-raid&m=107427614604701&w=2

including performance enhancements due to the use of the faster outer
sectors, and smaller average seek times because seeks only need to cover
half of each disk.

best regards
keld


Re: striping of a 4 drive raid10

2008-01-27 Thread Neil Brown
On Sunday January 27, [EMAIL PROTECTED] wrote:
 On Mon, Jan 28, 2008 at 07:13:30AM +1100, Neil Brown wrote:
  On Sunday January 27, [EMAIL PROTECTED] wrote:
   Hi
   
   I have tried to make a striping raid out of my new 4 x 1 TB
   SATA-2 disks. I tried raid10,f2 in several ways:
   
   1: md0 = raid10,f2 of sda1+sdb1, md1= raid10,f2 of sdc1+sdd1, md2 = raid0
   of md0+md1
   
   2: md0 = raid0 of sda1+sdb1, md1= raid0 of sdc1+sdd1, md2 = raid01,f2
   of md0+md1
   
   3: md0 = raid10,f2 of sda1+sdb1, md1 = raid10,f2 of sdc1+sdd1, chunksize of
   md0 = md1 = 128 KB, md2 = raid0 of md0+md1, chunksize = 256 KB
   
   4: md0 = raid0 of sda1+sdb1, md1= raid0 of sdc1+sdd1, chunksize
   of md0 = md1 = 128 KB, md2 = raid01,f2 of md0+md1 chunksize = 256 KB
   
   5: md0= raid10,f4 of sda1+sdb1+sdc1+sdd1
  
  Try
6: md0 = raid10,f2 of sda1+sdb1+sdc1+sdd1
 
 That I already tried (and I wrongly stated that I used f4 instead of
 f2). Twice I got a throughput of about 300 MB/s, but since then I could
 not reproduce that behaviour. Are there errors in this area that have been
 corrected in newer kernels?

No, I don't think any performance-related changes have been made to
raid10 lately.

You could try increasing the read-ahead size.  For a 4-drive raid10 it
defaults to 4 times the read-ahead setting of a single drive, but
increasing it substantially (e.g. 64 times) seems to increase the speed of
dd reading a gigabyte.
Whether that will actually affect your target workload is a different question.
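
As an illustration (the value is only an example), the array's read-ahead
can be inspected and raised with blockdev:

  blockdev --getra /dev/md0          # current read-ahead, in 512-byte sectors
  blockdev --setra 65536 /dev/md0    # raise it substantially (65536 sectors = 32 MB)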

 
 
  Also try raid10,o2 with a largeish chunksize (256KB is probably big
  enough).
 
 I tried that too, but my mdadm did not allow me to use the o flag.
 
 My kernel is 2.6.12 and mdadm is v1.12.0 (14 June 2005).
 Can I upgrade mdadm alone to a newer version, and if so, which one is
 recommended?

You would need both a newer kernel and a newer mdadm to get raid10
offset mode.

NeilBrown