Re: [gentoo-user] A drive in my RAID6 has failed

2013-09-06 Thread Paul Hartman
On Fri, Sep 6, 2013 at 12:46 AM, Paul Hartman
<paul.hartman+gen...@gmail.com> wrote:
> So, I simply inserted and partitioned the new drive, added it to the
> array and away we go!
>
> md0 : active raid6 sde1[6] sdd1[5] sdg1[4] sdh1[2] sdf1[1] sdi1[0]
>   11720009728 blocks super 1.2 level 6, 512k chunk, algorithm 2 [6/5] [UUU_UU]
>   [>....................]  recovery =  2.3% (69513216/2930002432) finish=428.7min speed=111206K/sec
>
> When I wake up in the morning, I hope there won't be any errors.

Success! It took 10 hours to rebuild the drive (speeds near the start
of the disk are significantly faster than those near the end, so the
early estimate reported by /proc/mdstat above was overly optimistic):

[3720270.120695] md: bind<sde1>
[3720270.162933] RAID conf printout:
[3720270.162942]  --- level:6 rd:6 wd:5
[3720270.162949]  disk 0, o:1, dev:sdi1
[3720270.162954]  disk 1, o:1, dev:sdf1
[3720270.162958]  disk 2, o:1, dev:sdh1
[3720270.162962]  disk 3, o:1, dev:sde1
[3720270.162965]  disk 4, o:1, dev:sdg1
[3720270.162969]  disk 5, o:1, dev:sdd1
[3720270.163060] md: recovery of RAID array md0
[3720270.163067] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
[3720270.163071] md: using maximum available idle IO bandwidth (but not more than 20 KB/sec) for recovery.
[3720270.163085] md: using 128k window, over a total of 2930002432k.
[3756293.459324] md: md0: recovery done.
[3756294.797961] RAID conf printout:
[3756294.797969]  --- level:6 rd:6 wd:6
[3756294.797974]  disk 0, o:1, dev:sdi1
[3756294.797979]  disk 1, o:1, dev:sdf1
[3756294.797982]  disk 2, o:1, dev:sdh1
[3756294.797986]  disk 3, o:1, dev:sde1
[3756294.797989]  disk 4, o:1, dev:sdg1
[3756294.797992]  disk 5, o:1, dev:sdd1
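The 10-hour figure can be checked against the bracketed dmesg timestamps, which are seconds since boot: recovery started at 3720270.163060 and finished at 3756293.459324. A minimal awk sketch of that arithmetic (the function name rebuild_stats is hypothetical) also divides the per-disk size from the "over a total of 2930002432k" line by the elapsed time. It prints about 10.0 hours and an average of 81336 K/sec, well below the 111206K/sec that mdstat showed at 2.3%, which fits the slowdown toward the end of the disk.

```shell
#!/bin/sh
# Elapsed rebuild time and average per-disk throughput, computed from two
# dmesg timestamps (seconds since boot) and the member size in KiB.
rebuild_stats() {
    awk -v s="$1" -v e="$2" -v k="$3" 'BEGIN {
        secs = e - s
        printf "elapsed: %.1f hours\n", secs / 3600
        printf "average: %.0f K/sec\n", k / secs
    }'
}

# Values from the log above: "recovery of RAID array md0",
# "md0: recovery done." and "over a total of 2930002432k".
rebuild_stats 3720270.163060 3756293.459324 2930002432
```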



Re: [gentoo-user] A drive in my RAID6 has failed

2013-09-05 Thread Paul Hartman
On Thu, Sep 5, 2013 at 11:52 AM, Michael Orlitzky <mich...@orlitzky.com> wrote:
> This is the process I always follow:
>
>   http://www.howtoforge.com/replacing_hard_disks_in_a_raid1_array
>
> The sfdisk trick will save you a bit of hassle.

Thanks, it looks like I was on the right path! Crossing my fingers...



Re: [gentoo-user] A drive in my RAID6 has failed

2013-09-05 Thread Michael Orlitzky
On 09/05/2013 12:49 PM, Paul Hartman wrote:
> Hi,
>
> I woke up this morning to see the dreaded email from mdadm telling me
> one of my drives failed overnight, while I was happily dreaming about
> cute puppies and kittens installing a rainbow-colored roof on my
> house. The array is a RAID6 (two parity drives) and this is the
> current state:
>
> md0 : active raid6 sdd1[5] sdg1[4] sde1[3](F) sdh1[2] sdf1[1] sdi1[0]
>   11720009728 blocks super 1.2 level 6, 512k chunk, algorithm 2 [6/5] [UUU_UU]
>
> I've been using RAID in Linux for years, but this is actually the
> first time I've had a disk fail in one.
>
> If I remember correctly, the process should be as simple as:
>
> #remove the failed disk from the array:
> mdadm /dev/md0 -r /dev/sde1
>
> #pull the drive, replace with new one, partition it, then add it to the array:
> mdadm /dev/md0 -a /dev/sde1
>
> and sit back and eat popcorn while I enjoy the blinkenlights for the
> next several hours/days? :) Any advice/suggestions for managing this
> process any differently?
>

This is the process I always follow:

  http://www.howtoforge.com/replacing_hard_disks_in_a_raid1_array

The sfdisk trick will save you a bit of hassle.
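The "sfdisk trick" in the howtoforge guide is to dump the partition table of a healthy member and write it onto the replacement disk, so the new partition matches the old one exactly. A minimal sketch, assuming /dev/sdd (a surviving member listed in /proc/mdstat) as the source and /dev/sde as the new disk; both are placeholders, and the helper names are hypothetical. It only prints the commands unless RUN=1 is set, since swapping source and destination would overwrite the good disk's table. Each member partition here is about 2.7 TiB, more than an MBR table can address with 512-byte sectors, so these disks are presumably GPT; older sfdisk releases could not handle GPT, which is why the sketch also offers gdisk's sgdisk with -R (replicate the table onto another disk) and -G (randomize the copy's GUIDs).

```shell
#!/bin/sh
# Print a command string, or run it through sh when RUN=1 is set.
run() {
    if [ "${RUN:-0}" = 1 ]; then
        sh -c "$1"
    else
        echo "$1"
    fi
}

# Copy the partition layout of a healthy member (src) onto a blank
# replacement disk (dst). Pass "gpt" as the third argument to use sgdisk.
copy_layout() {
    src=$1
    dst=$2
    label=${3:-mbr}
    if [ "$src" = "$dst" ]; then
        echo "refusing: source and destination are the same disk" >&2
        return 1
    fi
    if [ "$label" = gpt ]; then
        run "sgdisk -R $dst $src"   # replicate src's table onto dst
        run "sgdisk -G $dst"        # give dst its own disk/partition GUIDs
    else
        run "sfdisk -d $src | sfdisk $dst"
    fi
}

# Placeholder device names: confirm which disk is which before using RUN=1.
copy_layout /dev/sdd /dev/sde gpt
```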




[gentoo-user] A drive in my RAID6 has failed

2013-09-05 Thread Paul Hartman
Hi,

I woke up this morning to see the dreaded email from mdadm telling me
one of my drives failed overnight, while I was happily dreaming about
cute puppies and kittens installing a rainbow-colored roof on my
house. The array is a RAID6 (two parity drives) and this is the
current state:

md0 : active raid6 sdd1[5] sdg1[4] sde1[3](F) sdh1[2] sdf1[1] sdi1[0]
  11720009728 blocks super 1.2 level 6, 512k chunk, algorithm 2 [6/5] [UUU_UU]
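In that status line, [6/5] means the array is defined with 6 members and 5 are currently working, and each character of [UUU_UU] is one member slot in order, counting from 0: U for up, _ for missing or failed. A minimal POSIX shell sketch (the name missing_slots is hypothetical) that reports which slots are down; here it is slot 3. The bracketed number after a device name, as in sde1[3], is md's internal descriptor number and need not equal the slot: elsewhere in this thread the replacement shows up as sde1[6] while the kernel's conf printout lists it as disk 3.

```shell
#!/bin/sh
# Report which member slots are down, given the status field from
# /proc/mdstat such as "[UUU_UU]" (U = up, _ = missing or failed).
missing_slots() {
    s=${1#\[}          # drop leading "["
    s=${s%\]}          # drop trailing "]"
    i=0
    out=""
    while [ -n "$s" ]; do
        c=${s%"${s#?}"}    # first character of s
        s=${s#?}           # rest of s
        if [ "$c" = "_" ]; then
            out="$out $i"
        fi
        i=$((i + 1))
    done
    echo "${out# }"        # space-separated slot numbers, counting from 0
}

missing_slots "[UUU_UU]"
```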

I've been using RAID in Linux for years, but this is actually the
first time I've had a disk fail in one.

If I remember correctly, the process should be as simple as:

#remove the failed disk from the array:
mdadm /dev/md0 -r /dev/sde1

#pull the drive, replace with new one, partition it, then add it to the array:
mdadm /dev/md0 -a /dev/sde1

and sit back and eat popcorn while I enjoy the blinkenlights for the
next several hours/days? :) Any advice/suggestions for managing this
process any differently?
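The two-step process above can be sketched as a pair of shell functions that bracket the physical swap. This assumes the names from this post (/dev/md0, /dev/sde1) and uses mdadm's long options --fail, --remove and --add (the long forms of -f, -r and -a). Marking the member failed first is redundant here, since /proc/mdstat already shows (F), and is included only for the general case. The helper names are hypothetical, and the commands are only printed unless RUN=1 is set.

```shell
#!/bin/sh
# Print each command, or execute it when RUN=1 is set.
run() {
    if [ "${RUN:-0}" = 1 ]; then
        "$@"
    else
        echo "$*"
    fi
}

# Step 1, before pulling the drive: mark the member failed (redundant if
# /proc/mdstat already shows (F)) and detach it from the array.
detach_member() {
    run mdadm "$1" --fail "$2"
    run mdadm "$1" --remove "$2"
}

# Step 2, after inserting and partitioning the new drive: add the new
# partition and show the rebuild status.
attach_member() {
    run mdadm "$1" --add "$2"
    run cat /proc/mdstat
}

detach_member /dev/md0 /dev/sde1
# ... physically swap the disk and partition it here ...
attach_member /dev/md0 /dev/sde1
```

Before running anything with RUN=1, `mdadm --detail /dev/md0` is a quick way to confirm which device md actually considers faulty.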

For now I have unmounted the filesystem that sits atop it, to prevent
any more writes from occurring, just in case...

Thanks,
Paul



Re: [gentoo-user] A drive in my RAID6 has failed

2013-09-05 Thread Paul Hartman
On Thu, Sep 5, 2013 at 12:11 PM, Paul Hartman
<paul.hartman+gen...@gmail.com> wrote:
> On Thu, Sep 5, 2013 at 11:52 AM, Michael Orlitzky <mich...@orlitzky.com>
> wrote:
>> This is the process I always follow:
>>
>>   http://www.howtoforge.com/replacing_hard_disks_in_a_raid1_array
>>
>> The sfdisk trick will save you a bit of hassle.
>
> Thanks, it looks like I was on the right path! Crossing my fingers...

So, I probably should not have attempted to do this immediately after
eating dinner. My brain was not operating at full speed, and I went
ahead and pulled the drive before removing it from the array. Oops! As
soon as I pulled the latch to release the drive, I had that "oh no!"
moment. Luckily, as it turns out, md (or mdadm? or udev?) was nice
enough to automatically remove it for me when the drive ceased to
exist.

So, I simply inserted and partitioned the new drive, added it to the
array and away we go!

md0 : active raid6 sde1[6] sdd1[5] sdg1[4] sdh1[2] sdf1[1] sdi1[0]
  11720009728 blocks super 1.2 level 6, 512k chunk, algorithm 2 [6/5] [UUU_UU]
  [>....................]  recovery =  2.3% (69513216/2930002432) finish=428.7min speed=111206K/sec
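While a rebuild runs, the interesting fields are the percentage, the finish= estimate and the current speed=. A small awk sketch (the name mdstat_progress is hypothetical) that pulls them out of /proc/mdstat-style text. The finish= value is derived from the current rate, so on spinning disks it tends to be optimistic: 428.7 min is about 7.1 hours, while the follow-up message at the top of this thread reports that the rebuild took about 10 hours.

```shell
#!/bin/sh
# Pull the progress, ETA and current speed out of mdstat-formatted text.
# Reads stdin; prints "percent finish speed" for each resync/recovery line.
mdstat_progress() {
    awk '/(recovery|resync) =/ {
        pct = fin = spd = ""
        for (i = 1; i <= NF; i++) {
            if ($i == "=")       pct = $(i + 1)
            if ($i ~ /^finish=/) fin = substr($i, 8)   # strip "finish="
            if ($i ~ /^speed=/)  spd = substr($i, 7)   # strip "speed="
        }
        print pct, fin, spd
    }'
}

# Sample line copied from the status output above; on a live system
# use: mdstat_progress < /proc/mdstat
mdstat_progress <<'EOF'
  [>....................]  recovery =  2.3% (69513216/2930002432) finish=428.7min speed=111206K/sec
EOF
```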

When I wake up in the morning, I hope there won't be any errors.


BTW, here are a couple of tips I found that speed up RAID building/recovery
tremendously (season to taste):

echo 32768 > /sys/block/md0/md/stripe_cache_size
echo 20 > /proc/sys/dev/raid/speed_limit_max
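Both knobs are plain files, so a cautious approach is to note the current value before changing it. The sketch below (function names hypothetical) prints the current value and the write it would perform, and only writes when RUN=1 is set. It also estimates the RAM the stripe cache pins, using the commonly quoted rule of page size × member disks × stripe_cache_size; with 4 KiB pages and the 6-disk md0 here, 32768 entries come to 805306368 bytes (768 MiB), which is a good reason to season to taste. The speed_limit_* values are in KB/sec per disk (compare the "KB/sec/disk" in the kernel log), and the value for speed_limit_max is left to the caller.

```shell
#!/bin/sh
# Approximate RAM pinned by the RAID5/6 stripe cache (assumed rule:
# page size * member disks * stripe_cache_size entries).
stripe_cache_bytes() {
    entries=$1                        # value for stripe_cache_size
    disks=$2                          # member devices in the array
    page=${3:-$(getconf PAGESIZE)}    # bytes per page; defaults to this system's
    echo $((page * disks * entries))
}

# Show the current value of a tuning file, then print (or with RUN=1,
# perform) the write of the new value.
set_knob() {
    file=$1
    value=$2
    current=$(cat "$file" 2>/dev/null) || current="?"
    echo "# $file currently: $current"
    if [ "${RUN:-0}" = 1 ]; then
        echo "$value" > "$file"
    else
        echo "echo $value > $file"
    fi
}

stripe_cache_bytes 32768 6 4096
set_knob /sys/block/md0/md/stripe_cache_size 32768
```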