Re: raid5 reshape/resync

2007-11-25 Thread Nagilum

- Message from [EMAIL PROTECTED] -
Date: Sat, 24 Nov 2007 12:02:09 +0100
From: Nagilum [EMAIL PROTECTED]
Reply-To: Nagilum [EMAIL PROTECTED]
 Subject: raid5 reshape/resync
  To: linux-raid@vger.kernel.org


Hi,
I'm running 2.6.23.8 x86_64 using mdadm v2.6.4.
I was adding a disk (/dev/sdf) to an existing raid5 (/dev/sd[a-e] - md0).
During that reshape (at around 4%) /dev/sdd reported read errors and
went offline.
I replaced /dev/sdd with a new drive and tried to reassemble the array
(/dev/sdd was shown as removed and the new drive as a spare).
Assembly worked, but the array would not run unless I used --force.
Since I'm always reluctant to use force, I put the bad disk back in,
this time as /dev/sdg. I re-added that drive and could then run the array.
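
Roughly, the steps so far correspond to the following mdadm commands
(a reconstruction for clarity, not an exact transcript):

  # add the new disk and grow the array from 5 to 6 devices
  mdadm /dev/md0 --add /dev/sdf
  mdadm --grow /dev/md0 --raid-devices=6
  # after swapping the failed disk, plain assembly would only run with
  # --force, so the bad disk went back in as /dev/sdg and was re-added
  mdadm --assemble /dev/md0
  mdadm /dev/md0 --re-add /dev/sdg
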
The array started to resync (since the disk can be read up to the 4%
mark), and then I marked the disk as failed. Now the array is active,
degraded, recovering:

nas:~# mdadm -Q --detail /dev/md0
/dev/md0:
Version : 00.91.03
  Creation Time : Sat Sep 15 21:11:41 2007
 Raid Level : raid5
 Array Size : 1953234688 (1862.75 GiB 2000.11 GB)
  Used Dev Size : 488308672 (465.69 GiB 500.03 GB)
   Raid Devices : 6
  Total Devices : 7
Preferred Minor : 0
Persistence : Superblock is persistent

Update Time : Sat Nov 24 10:10:46 2007
  State : active, degraded, recovering
 Active Devices : 5
Working Devices : 6
 Failed Devices : 1
  Spare Devices : 1

 Layout : left-symmetric
 Chunk Size : 16K

 Reshape Status : 19% complete
  Delta Devices : 1, (5-6)

   UUID : 25da80a6:d56eb9d6:0d7656f3:2f233380
 Events : 0.726347

    Number   Major   Minor   RaidDevice State
       0       8        0        0      active sync   /dev/sda
       1       8       16        1      active sync   /dev/sdb
       2       8       32        2      active sync   /dev/sdc
       6       8       96        3      faulty spare rebuilding   /dev/sdg
       4       8       64        4      active sync   /dev/sde
       5       8       80        5      active sync   /dev/sdf

       7       8       48        -      spare   /dev/sdd

iostat:
Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda 129.48  1498.01  1201.59   7520   6032
sdb 134.86  1498.01  1201.59   7520   6032
sdc 127.69  1498.01  1201.59   7520   6032
sdd   0.40 0.00 3.19  0 16
sde 111.55  1498.01  1201.59   7520   6032
sdf 117.73 0.00  1201.59  0   6032
sdg   0.00 0.00 0.00  0  0

What I find somewhat confusing/disturbing is that md does not appear to
be using /dev/sdd. What I see here could be explained by md doing a
RAID5 resync from the four drives sd[a-c,e] onto sd[a-c,e,f], but I
would have expected it to use the new spare sdd for that. Also, the
speed is unusually low, which seems to indicate a lot of seeking, as if
two operations were happening at the same time.
Also, when I look at the data rates it looks more like the reshape is
continuing even though one drive is missing (possible, but risky).
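
For the record, which operation md is actually running can be read
directly from /proc/mdstat and the md sysfs entries, e.g.:

  cat /proc/mdstat                    # shows reshape/recovery progress and speed
  cat /sys/block/md0/md/sync_action   # current action: idle, resync, recover, reshape, check, ...
  cat /sys/block/md0/md/sync_speed    # current rate in K/sec
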
Can someone relieve my doubts as to whether md is doing the right thing here?
Thanks,


- End message from [EMAIL PROTECTED] -

Ok, so the reshape tried to continue without the failed drive and  
after that resynced to the new spare.
Unfortunately the result is a mess. On top of the RAID5 I have  
dm-crypt and LVM.
Although dm-crypt and LVM don't appear to have a problem, the filesystems  
on top are a mess now.
I still have the failed drive: I can read the superblock from it, plus  
up to 4% from the beginning and probably backwards from the end  
towards that point.
So in theory it could be possible to reorder the stripe blocks that  
appear to have been messed up(?).
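
For what it's worth, comparing the superblocks on the disks would at
least show how far the reshape had progressed on each one (the exact
fields printed depend on the superblock version):

  mdadm --examine /dev/sdg            # failed original disk, readable up to ~4%
  mdadm --examine /dev/sdd            # replacement that was resynced afterwards
  mdadm --examine /dev/sd[a-c,e,f]
  # compare the event counters and the recorded reshape position
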
Unfortunately I'm not sure what exactly went wrong or what I did  
wrong. Can someone please give me a hint?

Thanks,
Alex.


#_  __  _ __ http://www.nagilum.org/ \n icq://69646724 #
#   / |/ /__  _(_) /_  _  [EMAIL PROTECTED] \n +491776461165 #
#  // _ `/ _ `/ / / // /  ' \  Amiga (68k/PPC): AOS/NetBSD/Linux   #
# /_/|_/\_,_/\_, /_/_/\_,_/_/_/_/   Mac (PPC): MacOS-X / NetBSD /Linux #
#   /___/ x86: FreeBSD/Linux/Solaris/Win2k  ARM9: EPOC EV6 #




cakebox.homeunix.net - all the machine one needs..





Re: HELP! New disks being dropped from RAID 6 array on every reboot

2007-11-25 Thread Joshua Johnson
On Nov 24, 2007 9:27 PM, Bill Davidsen [EMAIL PROTECTED] wrote:

> No, I think you read that backward. Using PARTITIONS is the right way to
> do it, but I was suggesting that the boot mdadm.conf in initrd was still
> using the old deleted partition names. And I assume that the old drives
> were either physically removed or you used the zero-superblock option to
> prevent the old partitions from being found and confusing the issue.
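
For reference, wiping a stale superblock so a partition is no longer
picked up at assembly time is one command per partition (the device
name below is only a placeholder, not one from this thread):

  mdadm --zero-superblock /dev/hdc1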

I doubt a second mdadm.conf is the problem (unless I misunderstand
something about the boot process), as I am not using an initrd on this
kernel.  One PATA drive was physically removed and one was moved to
the new controller.  The new SATA drive took the place of the removed
PATA drive in the array.

> I assume you made the old partitions go away in one of these ways, so
> PARTITIONS should work, and from what you said I had the impression it
> did work after boot, which would fit having a non-functional mdadm
> config in initrd.
>
> Any of that match what you're doing? I've never had to use the explicit
> partitions except when I forgot to zap the old superblocks.
>
> --
> Bill Davidsen [EMAIL PROTECTED]

I'm not sure exactly why the array wasn't being assembled with the sd*
disks, but I suspect that the md driver was being loaded before the
SCSI disk driver had finished its partition scan.  At any rate, using
the wildcarded device names resolved the issue and the server is
humming along happily.  Thanks for the help!
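
A minimal mdadm.conf along those lines might look like this (the device
name, UUID and wildcard pattern are illustrative; the exact line used
here isn't quoted in the thread):

  # /etc/mdadm/mdadm.conf
  # scan every partition the kernel knows about ...
  DEVICE partitions
  # ... or list candidate devices explicitly with wildcards, e.g.:
  # DEVICE /dev/sd*[0-9]
  ARRAY /dev/md0 UUID=<uuid from mdadm --detail /dev/md0>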