Re: raid5 stuck in degraded, inactive and dirty mode

2008-01-10 Thread CaT
On Wed, Jan 09, 2008 at 07:16:34PM +1100, CaT wrote:
  But I suspect that --assemble --force would do the right thing.
  Without more details, it is hard to say for sure.
 
 I suspect so as well, but throwing caution to the wind irks me wrt this
 raid array. :)

Sorry. Not to be a pain, but considering the previous email with all the
--examine dumps etc., would the above be the way to go? I just don't want
to have missed something and bugger the array up totally.

-- 
To the extent that we overreact, we proffer the terrorists the
greatest tribute.
- High Court Judge Michael Kirby


Re: raid5 stuck in degraded, inactive and dirty mode

2008-01-10 Thread Neil Brown
On Thursday January 10, [EMAIL PROTECTED] wrote:
 On Wed, Jan 09, 2008 at 07:16:34PM +1100, CaT wrote:
   But I suspect that --assemble --force would do the right thing.
   Without more details, it is hard to say for sure.
  
  I suspect so as well, but throwing caution to the wind irks me wrt this
  raid array. :)
 
 Sorry. Not to be a pain, but considering the previous email with all the
 --examine dumps etc., would the above be the way to go? I just don't want
 to have missed something and bugger the array up totally.

Yes, definitely.

The superblocks look perfectly normal for a single drive failure
followed by a crash.  So --assemble --force is the way to go.
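
For this array that would be something along these lines (a sketch only;
double-check the member devices against your --examine output first):

    # the array is currently assembled but inactive, so stop it first
    mdadm --stop /dev/md3

    # then force-assemble from the three surviving members
    mdadm --assemble --force /dev/md3 /dev/sdd1 /dev/sde1 /dev/sdf1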

Technically you could have some data corruption if a write was under
way at the time of the crash.  In that case the parity block of that
stripe could be wrong, so the recovered data for the missing device
could be wrong.
This is why you are required to use --force - to confirm that you
are aware that there could be a problem.

It would be worth running fsck just to be sure that nothing critical
has been corrupted.  Also if you have a recent backup, I wouldn't
recycle it until I was fairly sure that all your data was really safe.
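
Something like this, for instance (I don't know what filesystem is on md3,
so this is just a sketch assuming a checker that understands -n):

    # read-only pass first; reports problems without changing anything
    fsck -n /dev/md3

    # only if that looks sane, let it repair
    fsck /dev/md3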

But in my experience the chance of actual data corruption in this
situation is fairly low.

NeilBrown


Re: raid5 stuck in degraded, inactive and dirty mode

2008-01-10 Thread CaT
On Fri, Jan 11, 2008 at 07:21:42AM +1100, Neil Brown wrote:
 On Thursday January 10, [EMAIL PROTECTED] wrote:
  On Wed, Jan 09, 2008 at 07:16:34PM +1100, CaT wrote:
But I suspect that --assemble --force would do the right thing.
Without more details, it is hard to say for sure.
   
   I suspect so as well, but throwing caution to the wind irks me wrt this
   raid array. :)
  
  Sorry. Not to be a pain, but considering the previous email with all the
  --examine dumps etc., would the above be the way to go? I just don't want
  to have missed something and bugger the array up totally.
 
 Yes, definitely.

Cool.

 The superblocks look perfectly normal for a single drive failure
 followed by a crash.  So --assemble --force is the way to go.
 
 Technically you could have some data corruption if a write was under
 way at the time of the crash.  In that case the parity block of that

I'd expect so, as the crash was a rather abrupt one.

 stripe could be wrong, so the recovered data for the missing device
 could be wrong.
 This is why you are required to use --force - to confirm that you
 are aware that there could be a problem.

Right.

 It would be worth running fsck just to be sure that nothing critical
 has been corrupted.  Also if you have a recent backup, I wouldn't
 recycle it until I was fairly sure that all your data was really safe.

I'll be doing an fsck and checking what data I can over the weekend to
see what was fragged. I suspect it'll just be something that was being
rsynced at the time of the crash.

 But in my experience the chance of actual data corruption in this
 situation is fairly low.

Yaay. :)

Thanks. I'll now go and put Humpty together again. For some reason
Johnny Cash's 'Ring of Fire' is playing in my head.
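
Once the forced assembly and the fsck look OK, the plan is roughly this
(the device name for the replacement disk is a guess on my part; the old
one sat at /dev/sdc1):

    # partition the new disk to match the others, then add it so the rebuild starts
    mdadm /dev/md3 --add /dev/sdc1

    # and keep an eye on the resync
    cat /proc/mdstat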

-- 
To the extent that we overreact, we proffer the terrorists the
greatest tribute.
- High Court Judge Michael Kirby


Re: raid5 stuck in degraded, inactive and dirty mode

2008-01-09 Thread CaT
On Wed, Jan 09, 2008 at 05:52:57PM +1100, Neil Brown wrote:
 On Wednesday January 9, [EMAIL PROTECTED] wrote:
  
  I'd provide data dumps of --examine and friends but I'm in a situation
  where transferring the data would be a right pain. I'll do it if need
  be, though.
  
  So, what can I do? 
 
 Well, providing the output of --examine would help a lot.

Here's the output for the 3 remaining drives, the array itself, and /proc/mdstat.
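
(Roughly: /proc/mdstat plus mdadm --detail on the array and mdadm --examine
on each remaining member, i.e. something like:

    cat /proc/mdstat
    mdadm --detail /dev/md3
    mdadm --examine /dev/sdd1 /dev/sde1 /dev/sdf1
)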

/proc/mdstat:
Personalities : [raid1] [raid6] [raid5] [raid4]
...
md3 : inactive sdf1[0] sde1[2] sdd1[1]
      1465151808 blocks
...
unused devices: <none>

/dev/md3:
        Version : 00.90.03
  Creation Time : Thu Aug 30 15:50:01 2007
     Raid Level : raid5
    Device Size : 488383936 (465.76 GiB 500.11 GB)
   Raid Devices : 4
  Total Devices : 3
Preferred Minor : 3
    Persistence : Superblock is persistent

    Update Time : Thu Jan  3 08:51:00 2008
          State : active, degraded
 Active Devices : 3
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

           UUID : f60a1be0:5a10f35f:164afef4:10240419
         Events : 0.45649

    Number   Major   Minor   RaidDevice State
       0       8       81        0      active sync   /dev/sdf1
       1       8       49        1      active sync   /dev/sdd1
       2       8       65        2      active sync   /dev/sde1
       3       0        0        -      removed

/dev/sdd1:
          Magic : a92b4efc
        Version : 00.90.03
           UUID : f60a1be0:5a10f35f:164afef4:10240419
  Creation Time : Thu Aug 30 15:50:01 2007
     Raid Level : raid5
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 3

    Update Time : Thu Jan  3 08:51:00 2008
          State : active
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0
       Checksum : cb259d08 - correct
         Events : 0.45649

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     1       8       49        1      active sync   /dev/sdd1

   0     0       8       81        0      active sync   /dev/sdf1
   1     1       8       49        1      active sync   /dev/sdd1
   2     2       8       65        2      active sync   /dev/sde1
   3     3       8       33        3      active sync   /dev/sdc1

/dev/sde1:
          Magic : a92b4efc
        Version : 00.90.03
           UUID : f60a1be0:5a10f35f:164afef4:10240419
  Creation Time : Thu Aug 30 15:50:01 2007
     Raid Level : raid5
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 3

    Update Time : Thu Jan  3 08:51:00 2008
          State : active
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0
       Checksum : cb259d1a - correct
         Events : 0.45649

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     2       8       65        2      active sync   /dev/sde1

   0     0       8       81        0      active sync   /dev/sdf1
   1     1       8       49        1      active sync   /dev/sdd1
   2     2       8       65        2      active sync   /dev/sde1
   3     3       8       33        3      active sync   /dev/sdc1

/dev/sdf1:
          Magic : a92b4efc
        Version : 00.90.03
           UUID : f60a1be0:5a10f35f:164afef4:10240419
  Creation Time : Thu Aug 30 15:50:01 2007
     Raid Level : raid5
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 3

    Update Time : Thu Jan  3 08:51:00 2008
          State : active
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0
       Checksum : cb259d26 - correct
         Events : 0.45649

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     0       8       81        0      active sync   /dev/sdf1

   0     0       8       81        0      active sync   /dev/sdf1
   1     1       8       49        1      active sync   /dev/sdd1
   2     2       8       65        2      active sync   /dev/sde1
   3     3       8       33        3      active sync   /dev/sdc1


 But I suspect that --assemble --force would do the right thing.
 Without more details, it is hard to say for sure.

I suspect so as well, but throwing caution to the wind irks me wrt this
raid array. :)

-- 
To the extent that we overreact, we proffer the terrorists the
greatest tribute.
- High Court Judge Michael Kirby


raid5 stuck in degraded, inactive and dirty mode

2008-01-08 Thread CaT
Hi,

I've got a 4-disk RAID5 array that had one of the disks die. The hassle
is that the death was not graceful and triggered a bug in the nforce4
chipset that wound up freezing the northbridge and hence the PC. This
has left the array in a degraded state where I cannot add the swanky new
HD to the array and have it back up to its snazzy self. Normally I would
tinker until I got it working, but this being the actual backup box, I'd
rather not lose the data. :)

After a bit of pondering I have come to the conclusion that what may be
biting me is that each individual left-over component of the RAID array
still thinks the failed drive is around, whilst the array as a whole
knows better. Trying to mark the drive that used to be there as failed
produces a 'device not found' error. The components all have different
checksums (which seems to be the right thing, judging by other, whole
arrays) and the checksums are marked correct. Event numbers are all
the same. The status on each drive is 'active', which I also assume is
wrong. Where the components list the other members of the array, the
missing drive is marked 'active sync'.
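
(The quick comparison I've been doing is more or less:

    # Events, State and Checksum for the surviving members, side by side
    mdadm --examine /dev/sd[def]1 | grep -E '^/dev/|Events|State :|Checksum'
)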

I'd provide data dumps of --examine and friends but I'm in a situation
where transferring the data would be a right pain. I'll do it if need
be, though.

So, what can I do? 

-- 
To the extent that we overreact, we proffer the terrorists the
greatest tribute.
- High Court Judge Michael Kirby


Re: raid5 stuck in degraded, inactive and dirty mode

2008-01-08 Thread Neil Brown
On Wednesday January 9, [EMAIL PROTECTED] wrote:
 
 I'd provide data dumps of --examine and friends but I'm in a situation
 where transferring the data would be a right pain. I'll do it if need
 be, though.
 
 So, what can I do? 

Well, providing the output of --examine would help a lot.

But I suspect that --assemble --force would do the right thing.
Without more details, it is hard to say for sure.

NeilBrown