Re: raid5 stuck in degraded, inactive and dirty mode

2008-01-10 Thread CaT
On Wed, Jan 09, 2008 at 07:16:34PM +1100, CaT wrote:
  But I suspect that --assemble --force would do the right thing.
  Without more details, it is hard to say for sure.
 
 I suspect so as well, but throwing caution to the wind irks me wrt this
 raid array. :)

Sorry, not to be a pain, but considering the previous email with all the
--examine dumps, etc., would the above be the way to go? I just don't want
to have missed something and bugger the array up totally.



Re: raid5 stuck in degraded, inactive and dirty mode

2008-01-10 Thread CaT
On Fri, Jan 11, 2008 at 07:21:42AM +1100, Neil Brown wrote:
 On Thursday January 10, [EMAIL PROTECTED] wrote:
  On Wed, Jan 09, 2008 at 07:16:34PM +1100, CaT wrote:
But I suspect that --assemble --force would do the right thing.
Without more details, it is hard to say for sure.
   
   I suspect so as well, but throwing caution to the wind irks me wrt this
   raid array. :)
  
  Sorry, not to be a pain, but considering the previous email with all the
  --examine dumps, etc., would the above be the way to go? I just don't want
  to have missed something and bugger the array up totally.
 
 Yes, definitely.

Cool.

 The superblocks look perfectly normal for a single drive failure
 followed by a crash.  So --assemble --force is the way to go.
 
 Technically you could have some data corruption if a write was under
 way at the time of the crash.  In that case the parity block of that

I'd expect so, as the crash was one of rather severe abruptness.

 stripe could be wrong, so the recovered data for the missing device
 could be wrong.
 This is why you are required to use --force - to confirm that you
 are aware that there could be a problem.

Right.

 It would be worth running fsck just to be sure that nothing critical
 has been corrupted.  Also if you have a recent backup, I wouldn't
 recycle it until I was fairly sure that all your data was really safe.

I'll be doing an fsck and checking what data I can over the weekend to
see what was fragged. I suspect it'll just be something that was being
rsynced at the time of the crash.

 But in my experience the chance of actual data corruption in this
 situation is fairly low.

Yaay. :)

Thanks. I'll now go and put Humpty together again. For some reason
Johnny Cash's 'Ring of Fire' is playing in my head.
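
For reference, the sequence being discussed boils down to something like
this sketch (array and member names are taken from the --examine dumps
elsewhere in the thread; whether the filesystem sits directly on md3, and
what device name the replacement disk gets, are assumptions):

mdadm --stop /dev/md3                    # clear the inactive, half-assembled array
mdadm --assemble --force /dev/md3 /dev/sdd1 /dev/sde1 /dev/sdf1   # bring it up degraded on the 3 good members
fsck -n /dev/md3                         # read-only filesystem check before trusting the data
mdadm /dev/md3 --add /dev/sdc1           # add the replacement disk (name assumed) to start the rebuild

The --force is what tells mdadm to ignore the dirty state left behind by
the crash.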



Re: raid5 stuck in degraded, inactive and dirty mode

2008-01-09 Thread CaT
On Wed, Jan 09, 2008 at 05:52:57PM +1100, Neil Brown wrote:
 On Wednesday January 9, [EMAIL PROTECTED] wrote:
  
  I'd provide data dumps of --examine and friends but I'm in a situation
  where transferring the data would be a right pain. I'll do it if need
  be, though.
  
  So, what can I do? 
 
 Well, providing the output of --examine would help a lot.

Here's the output of the 3 remaining drives, the array and mdstat.
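For reference, these were gathered with something like:

cat /proc/mdstat
mdadm --detail /dev/md3
mdadm --examine /dev/sdd1 /dev/sde1 /dev/sdf1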

/proc/mdstat:
Personalities : [raid1] [raid6] [raid5] [raid4] 
...
md3 : inactive sdf1[0] sde1[2] sdd1[1]
  1465151808 blocks
...
unused devices: <none>

/dev/md3:
        Version : 00.90.03
  Creation Time : Thu Aug 30 15:50:01 2007
     Raid Level : raid5
    Device Size : 488383936 (465.76 GiB 500.11 GB)
   Raid Devices : 4
  Total Devices : 3
Preferred Minor : 3
    Persistence : Superblock is persistent

    Update Time : Thu Jan  3 08:51:00 2008
          State : active, degraded
 Active Devices : 3
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

           UUID : f60a1be0:5a10f35f:164afef4:10240419
         Events : 0.45649

    Number   Major   Minor   RaidDevice State
       0       8       81        0      active sync   /dev/sdf1
       1       8       49        1      active sync   /dev/sdd1
       2       8       65        2      active sync   /dev/sde1
       3       0        0        -      removed

/dev/sdd1:
          Magic : a92b4efc
        Version : 00.90.03
           UUID : f60a1be0:5a10f35f:164afef4:10240419
  Creation Time : Thu Aug 30 15:50:01 2007
     Raid Level : raid5
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 3

    Update Time : Thu Jan  3 08:51:00 2008
          State : active
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0
       Checksum : cb259d08 - correct
         Events : 0.45649

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     1       8       49        1      active sync   /dev/sdd1

   0     0       8       81        0      active sync   /dev/sdf1
   1     1       8       49        1      active sync   /dev/sdd1
   2     2       8       65        2      active sync   /dev/sde1
   3     3       8       33        3      active sync   /dev/sdc1

/dev/sde1:
          Magic : a92b4efc
        Version : 00.90.03
           UUID : f60a1be0:5a10f35f:164afef4:10240419
  Creation Time : Thu Aug 30 15:50:01 2007
     Raid Level : raid5
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 3

    Update Time : Thu Jan  3 08:51:00 2008
          State : active
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0
       Checksum : cb259d1a - correct
         Events : 0.45649

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     2       8       65        2      active sync   /dev/sde1

   0     0       8       81        0      active sync   /dev/sdf1
   1     1       8       49        1      active sync   /dev/sdd1
   2     2       8       65        2      active sync   /dev/sde1
   3     3       8       33        3      active sync   /dev/sdc1

/dev/sdf1:
          Magic : a92b4efc
        Version : 00.90.03
           UUID : f60a1be0:5a10f35f:164afef4:10240419
  Creation Time : Thu Aug 30 15:50:01 2007
     Raid Level : raid5
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 3

    Update Time : Thu Jan  3 08:51:00 2008
          State : active
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0
       Checksum : cb259d26 - correct
         Events : 0.45649

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     0       8       81        0      active sync   /dev/sdf1

   0     0       8       81        0      active sync   /dev/sdf1
   1     1       8       49        1      active sync   /dev/sdd1
   2     2       8       65        2      active sync   /dev/sde1
   3     3       8       33        3      active sync   /dev/sdc1


 But I suspect that --assemble --force would do the right thing.
 Without more details, it is hard to say for sure.

I suspect so as well, but throwing caution to the wind irks me wrt this
raid array. :)



raid5 stuck in degraded, inactive and dirty mode

2008-01-08 Thread CaT
Hi,

I've got a 4-disk RAID5 array that had one of the disks die. The hassle
is that the death was not graceful and triggered a bug in the nforce4
chipset that wound up freezing the northbridge and hence the PC. This
has left the array in a degraded state where I cannot add the swanky new
HD to the array and bring it back up to its snazzy self. Normally I would
tinker until I got it working, but this being the actual backup box, I'd
rather not lose the data. :)

After a bit of pondering I have come to the conclusion that what may be
biting me is that each individual left-over component of the RAID array
still thinks the failed drive is around, whilst the array as a whole
knows better. Marking the now-missing drive as failed produces a 'device
not found' error. The components all have different checksums (which
seems to be the right thing, judging by other, whole arrays) and the
checksums are marked correct. Event numbers are all the same. The status
on each drive is 'active', which I also assume is wrong. Where the
components list the other members of the array, the missing drive is
marked 'active sync'.
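
One quick way to sanity-check that reading of the superblocks is to pull
just the state and event fields from each surviving member, e.g. (the
device names here match the --examine output that ended up in the
follow-up mail):

mdadm --examine /dev/sdd1 /dev/sde1 /dev/sdf1 | grep -E '(State|Events) :'

If the event counts all agree and only the state bookkeeping is off, the
array is normally a good candidate for a forced assembly.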

I'd provide data dumps of --examine and friends but I'm in a situation
where transferring the data would be a right pain. I'll do it if need
be, though.

So, what can I do? 



raid5 resizing

2007-12-19 Thread CaT
Hi,

I'm thinking of slowly replacing the disks in my raid5 array with bigger
disks and then resizing the array to fill up the new disks. Is this
possible? Basically I would like to go from:

3 x 500 GB RAID5 to 3 x 1 TB RAID5, thereby going from 1 TB to 2 TB of
storage.

It seems like it should be, but... :)



Re: raid5 resizing

2007-12-19 Thread CaT
On Wed, Dec 19, 2007 at 10:59:41PM +1100, Neil Brown wrote:
 On Wednesday December 19, [EMAIL PROTECTED] wrote:
  Hi,
  
  I'm thinking of slowly replacing the disks in my raid5 array with bigger
  disks and then resizing the array to fill up the new disks. Is this
  possible? Basically I would like to go from:
  
  3 x 500 GB RAID5 to 3 x 1 TB RAID5, thereby going from 1 TB to 2 TB of
  storage.
  
  It seems like it should be, but... :)
 
 Yes.
 
 mdadm --grow /dev/mdX --size=max

Oh -joy-. I love Linux software RAID. :) The only thing it seems to lack
is a battery-backed cache.

Thank you.
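
Spelled out, the replace-then-grow cycle looks something like the sketch
below (device names are purely illustrative, and the filesystem on top
still has to be grown separately afterwards, e.g. with resize2fs, which
the thread doesn't cover):

# repeat for each member, one at a time; the array runs degraded during each swap
mdadm /dev/mdX --fail /dev/sda1 --remove /dev/sda1
# physically swap in the bigger disk, partition it, then:
mdadm /dev/mdX --add /dev/sda1
# wait for the resync in /proc/mdstat to finish before touching the next member

# once every member sits on a bigger disk:
mdadm --grow /dev/mdX --size=max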



Re: strange RAID5 problem

2006-05-09 Thread CaT
On Mon, May 08, 2006 at 11:30:52PM -0600, Maurice Hilarius wrote:
 [EMAIL PROTECTED] ~]# mdadm
 --assemble /dev/md3 /dev/sdq1 /dev/sdr1 /dev/sds1 /dev/sdt1 /dev/sdu1
 /dev/sdv1 /dev/sdw1 /dev/sdx1 /dev/sdy1 /dev/sdz1 /dev/sdaa1 /dev/sdab1
 /dev/sdac1 /dev/sdad1 /dev/sdae1 /dev/sdaf1
 mdadm: superblock on /dev/sdw1 doesn't match others - assembly aborted

Have you tried zeroing the superblock with

mdadm --misc --zero-superblock /dev/sdw1

and then adding it in?

 [EMAIL PROTECTED] ~]# mount /dev/md3 /all/boxw16/
 /dev/md3: Invalid argument
 mount: /dev/md3: can't read superblock

Wow, that looks messy. Umm, about the only thing I can think of is
failing /dev/sdw1 and removing it (I know it says it's not there,
but...)
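
Roughly, and assuming md3 can first be brought up degraded without
sdw1, that would be something like:

mdadm /dev/md3 --fail /dev/sdw1 --remove /dev/sdw1   # if md still considers it a member at all
mdadm /dev/md3 --add /dev/sdw1                       # re-add it as a fresh member and let it resync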

Also, I'm not the biggest expert on RAID around here. ;)



Re: help wanted - 6-disk raid5 borked: _ _ U U U U

2006-04-16 Thread CaT
On Sun, Apr 16, 2006 at 08:46:52PM -0300, Carlos Carvalho wrote:
 Neil Brown ([EMAIL PROTECTED]) wrote on 17 April 2006 09:30:
  The easiest thing to do when you get an error on a drive is to kick
  the drive from the array, so that is what the code always did, and
  still does in many cases.
  It is arguable that for a read error on a degraded raid5, that may not
  be the best thing to do, but I'm not completely convinced.
 
 I don't see how it could be different. If the array is degraded and
 one more disk fails there's no way to obtain the information, so the
 md device just fails like a single disk.

Not necessarily. You probably have something like (say) 200GB of data
striped across that disk. That one read error may affect just one or a
few stripes, which means there's a whole buttload of data that could
still be retrieved. Perhaps setting the entire raid array read-only on
such an error would be better? That makes it a choice between potentially
losing everything and having writes and some reads fail while you have a
mild stroke trying to get another drive in on things. Put the drive in,
let the array do the best it can to restore things, fail the bad drive,
put another disk in, have it come up fully and then fsck it good.

At least this way you probably have less of a chance of losing the
entire array of data and, who knows, maybe only the 'less important'
files will be lost. :)
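
A manual approximation of that behaviour (the proposal itself is about
md doing this automatically on the error) would be something like:

mdadm --readonly /dev/mdX   # mark the array read-only so nothing else gets written to it

which at least freezes things while you work out how to get another
drive in.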

Anyway, my 2c. :)



Re: help wanted - 6-disk raid5 borked: _ _ U U U U

2006-04-16 Thread CaT
On Sun, Apr 16, 2006 at 09:42:34PM -0300, Carlos Carvalho wrote:
 CaT ([EMAIL PROTECTED]) wrote on 17 April 2006 10:25:
 Not necessarily. You probably have something like (say) 200GB of data
 striped across that disk. That one read error may affect just one or a
 few stripes, which means there's a whole buttload of data that could
 still be retrieved. Perhaps setting the entire raid array read-only on
 such an error would be better? That makes it a choice between potentially
 losing everything and having writes and some reads fail while you have a
 mild stroke trying to get another drive in on things. Put the drive in,
 let the array do the best it can to restore things, fail the bad drive,
 put another disk in, have it come up fully and then fsck it good.
 
 You want the array to stay on and jump here and there getting the
 stripes from wherever it can, each time from a different set of disks.
 That's surely nice but I think it's too much to ask...

That would be nice, but even just setting it read-only would help: if a
read fails, let it fail as it normally would and move on. Nothing
special, but it might let you recover a vast chunk of your data. Then you
can decide whether what is lost is worth crying over. That's still better
than complete data loss.

Hope that makes sense. :)



Re: 2+ raid sets, sata and a missing hd question

2006-02-15 Thread CaT
On Wed, Feb 15, 2006 at 07:50:28AM +0100, Luca Berra wrote:
 On Wed, Feb 15, 2006 at 01:45:21PM +1100, CaT wrote:
 Seeing as how SATA drives can move around if one removes one from a set
 (i.e. given sda, sdb, sdc, if sdb is removed, sdc drops to sdb), would md6
 come back up without problems if I were to remove either sda or sdb?
 
 if you configured mdadm correctly, you will have no problem :)
 
 hint
 echo DEVICE partitions > /etc/mdadm.conf
 mdadm -Esc partitions | grep ARRAY >> /etc/mdadm.conf

So the md5 array will reconstruct itself after initial bootup where the
kernel reconstructs the raid1 (as well as it can) for booting?

 All md partitions are of type fd (Linux raid autodetect).

 this is surprisingly not at all relevant

Awww. But I like it when the kernel just, well, does it all and makes it
all ready. :)
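
For completeness, with that hint in place assembly keys off the array
UUID rather than the device names, so something like the following finds
the members wherever they end up (the UUID is just a placeholder for
whatever mdadm -E reports):

# /etc/mdadm.conf
DEVICE partitions
ARRAY /dev/md6 UUID=xxxxxxxx:xxxxxxxx:xxxxxxxx:xxxxxxxx

mdadm --assemble --scan   # assembles everything listed in mdadm.conf, by UUID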
