RE: How many drives are bad?

2008-02-19 Thread Steve Fairbairn

 
 The box presents 48 drives, split across 6 SATA controllers. 
 So disks sda-sdh are on one controller, etc. In our 
 configuration, I run a RAID5 MD array for each controller, 
 then run LVM on top of these to form one large VolGroup.
 

I might be missing something here, and I realise you'd lose 8 drives to
redundancy rather than 6, but wouldn't it have been better to have 8
arrays of 6 drives, with each array using a single drive from each
controller?  That way a single controller failure (assuming no other HD
failures) wouldn't take any array down.  I do realise that 2 controller
failures at the same time would still lose everything.
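
Something like the following is what I have in mind (device names are
purely illustrative and depend on how the controllers enumerate; RAID5
per array assumed, as in your current setup):

# Hypothetical naming: controller 1 = sda..sdh, controller 2 = sdi..sdp,
# controller 3 = sdq..sdx, and so on.  Each array takes the Nth disk from
# every controller, so losing one controller costs each array one member.
mdadm --create /dev/md0 --level=5 --raid-devices=6 \
      /dev/sda1 /dev/sdi1 /dev/sdq1 /dev/sdy1 /dev/sdag1 /dev/sdao1
mdadm --create /dev/md1 --level=5 --raid-devices=6 \
      /dev/sdb1 /dev/sdj1 /dev/sdr1 /dev/sdz1 /dev/sdah1 /dev/sdap1
# ...and likewise for md2 to md7, then LVM across the eight md devices.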

Steve.



HDD errors in dmesg, but don't know why...

2008-02-18 Thread Steve Fairbairn

Hi All,

I've got a degraded RAID5 to which I'm trying to add the replacement
disk.  Trouble is, every time the recovery starts, it flies along at
70MB/s or so.  Then, after doing about 1%, it starts dropping rapidly,
until eventually a device is marked failed.

When I look in dmesg, I get the following...

SCSI device sdd: 976773168 512-byte hdwr sectors (500108 MB)
sdd: Write Protect is off
sdd: Mode Sense: 00 3a 00 00
SCSI device sdd: drive cache: write back
ata5.00: exception Emask 0x0 SAct 0x7ff SErr 0x0 action 0x0
ata5.00: (irq_stat 0x00060002, device error via SDB FIS)
ata5.00: cmd 60/00:10:3f:0e:f9/01:00:00:00:00/40 tag 2 cdb 0x0 data
131072 in
 res 41/40:00:50:0e:f9/9c:00:00:00:00/40 Emask 0x9 (media error)
ata5.00: configured for UDMA/100
ata5: EH complete
SCSI device sdd: 976773168 512-byte hdwr sectors (500108 MB)
sdd: Write Protect is off
sdd: Mode Sense: 00 3a 00 00
SCSI device sdd: drive cache: write back
ata5.00: exception Emask 0x0 SAct 0x7ff SErr 0x0 action 0x0
ata5.00: (irq_stat 0x00060002, device error via SDB FIS)
ata5.00: cmd 60/00:18:3f:02:f9/01:00:00:00:00/40 tag 3 cdb 0x0 data
131072 in
 res 41/40:00:c3:02:f9/9c:00:00:00:00/40 Emask 0x9 (media error)
ata5.00: configured for UDMA/100
ata5: EH complete
SCSI device sdd: 976773168 512-byte hdwr sectors (500108 MB)
sdd: Write Protect is off
sdd: Mode Sense: 00 3a 00 00
SCSI device sdd: drive cache: write back
ata5.00: exception Emask 0x0 SAct 0x7ff SErr 0x0 action 0x0
ata5.00: (irq_stat 0x00060002, device error via SDB FIS)
ata5.00: cmd 60/00:10:3f:0e:f9/01:00:00:00:00/40 tag 2 cdb 0x0 data
131072 in
 res 41/40:00:50:0e:f9/9c:00:00:00:00/40 Emask 0x9 (media error)
ata5.00: configured for UDMA/100
ata5: EH complete
SCSI device sdd: 976773168 512-byte hdwr sectors (500108 MB)
sdd: Write Protect is off
sdd: Mode Sense: 00 3a 00 00
SCSI device sdd: drive cache: write back
ata5.00: exception Emask 0x0 SAct 0x7ff SErr 0x0 action 0x0
ata5.00: (irq_stat 0x00060002, device error via SDB FIS)
ata5.00: cmd 60/00:18:3f:02:f9/01:00:00:00:00/40 tag 3 cdb 0x0 data
131072 in
 res 41/40:00:c3:02:f9/9c:00:00:00:00/40 Emask 0x9 (media error)
ata5.00: configured for UDMA/100
ata5: EH complete
SCSI device sdd: 976773168 512-byte hdwr sectors (500108 MB)
sdd: Write Protect is off
sdd: Mode Sense: 00 3a 00 00
SCSI device sdd: drive cache: write back
ata5.00: exception Emask 0x0 SAct 0x3ff SErr 0x0 action 0x0
ata5.00: (irq_stat 0x00060002, device error via SDB FIS)
ata5.00: cmd 60/00:10:3f:0e:f9/01:00:00:00:00/40 tag 2 cdb 0x0 data
131072 in
 res 41/40:00:50:0e:f9/9c:00:00:00:00/40 Emask 0x9 (media error)
ata5.00: configured for UDMA/100
ata5: EH complete
SCSI device sdd: 976773168 512-byte hdwr sectors (500108 MB)
sdd: Write Protect is off
sdd: Mode Sense: 00 3a 00 00
SCSI device sdd: drive cache: write back

I've no idea what to make of these errors.  As far as I can work out,
the HDs themselves should be fine; they are all less than 2 months old.
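
If it helps, I could run a SMART check on the drive to double-check it
(a sketch assuming smartmontools is installed; adjust the device name
to suit):

smartctl -a /dev/sdd          # look at Reallocated_Sector_Ct and Current_Pending_Sector
smartctl -t long /dev/sdd     # kick off an offline surface scan
smartctl -l selftest /dev/sdd # check the result once the scan finishes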

The box is CentOS 5.1.  Linux space.homenet.com 2.6.18-53.1.13.el5 #1
SMP Tue Feb 12 13:02:30 EST 2008 x86_64 x86_64 x86_64 GNU/Linux

Any suggestions on what I can do to resolve this issue?

Steve.



Any inexpensive hardware recommendations for PCI interface cards?

2008-02-08 Thread Steve Fairbairn

Hi All,

I currently have a couple of IT8212 PCI ATA RAID (1, 0 or 1+0) cards
which Linux RAID doesn't seem to like too well.  Initially I tried
creating an array out of 4 disks on the 4 primaries across the 2 cards.
Although this seemed to work, the access performance was unusably low
and I never actually got to the point of leaving it for a week to build
the array.
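
To be concrete about what I mean by that first attempt, it was along
these lines (device names and RAID level from memory, so treat them as
illustrative):

# 4 primaries across the 2 IT8212 cards, presented to Linux as plain disks
mdadm --create /dev/md2 --level=5 --raid-devices=4 \
      /dev/hde1 /dev/hdg1 /dev/hdi1 /dev/hdk1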

Since then, I have striped 2 of the drives (just primaries again, and
known to be good drives) on a single card using the card's own striping
capability.  I've then added this striped device to my RAID 5 array as a
single 500GB disk.

The grow into this disk worked fine, as did the resize2fs.  But as soon
as I tried to copy data onto the array, the system marked the disk as
faulty.  I'm currently running a degraded array anyway, as I'm waiting
for the replacement of the drive that failed with bad sectors when I
initially started the grow, so I had to use assemble --force to get the
md device back online.  I've not mounted it since.  (I know it's wise
not to use the array while it's degraded, but if I can get the data off
a 320GB HD, then I can stripe that with another disk on the other ITE
card, and add in a spare to the array.)
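
For clarity, the sequence I'm describing was roughly this (commands
reconstructed from memory; the device name of the striped pair and the
device count are illustrative):

# the card's striped pair appears to Linux as a single ~500GB disk, say /dev/hdm
mdadm /dev/md0 --add /dev/hdm1
mdadm --grow /dev/md0 --raid-devices=6
resize2fs /dev/md0                  # grow the filesystem into the new space
# copying data then kicked the striped disk out as faulty, so to get back online:
mdadm --stop /dev/md0
mdadm --assemble --force /dev/md0   # members listed in mdadm.conf, or given explicitly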

Can anyone see any issues with what I'm trying to do?
Are there any known issues with IT8212 cards (they worked fine as
straight disks on Linux)?
Is anyone using an array with disks on PCI interface cards?
Is there an issue with mixing motherboard interfaces and PCI card based
ones?
Does anyone recommend any inexpensive (probably SATA-II) PCI interface
cards?

The motherboard has run out of sensible interfaces (I won't use both the
primary and secondary of one IDE channel in the same array), but I'd
still like the capacity to grow my array further.

Thanks again for the help.

Steve.



Disk failure during grow, what is the current state.

2008-02-06 Thread Steve Fairbairn

Hi All,

I was wondering if someone might be willing to confirm what the current
state of my RAID array is, given the following sequence of events
(sorry, it's pretty long).

I had a clean, running /dev/md0 using 5 disks in RAID 5 (sda1, sdb1,
sdc1, sdd1, hdd1).  It had been clean like that for a while, so last
night I decided it was safe to grow the array onto a sixth disk...

[EMAIL PROTECTED] ~]# mdadm /dev/md0 --add /dev/hdi1
mdadm: added /dev/hdi1
[EMAIL PROTECTED] ~]# mdadm -D /dev/md0
/dev/md0:
Version : 00.90.03
  Creation Time : Wed Jan  9 18:57:53 2008
 Raid Level : raid5
 Array Size : 1953535744 (1863.04 GiB 2000.42 GB)
  Used Dev Size : 488383936 (465.76 GiB 500.11 GB)
   Raid Devices : 5
  Total Devices : 6
Preferred Minor : 0
Persistence : Superblock is persistent

Update Time : Tue Feb  5 23:55:59 2008
  State : clean
 Active Devices : 5
Working Devices : 6
 Failed Devices : 0
  Spare Devices : 1

 Layout : left-symmetric
 Chunk Size : 64K

   UUID : 382c157a:405e0640:c30f9e9e:888a5e63
 Events : 0.429616

    Number   Major   Minor   RaidDevice State
       0       8        1        0      active sync   /dev/sda1
       1       8       17        1      active sync   /dev/sdb1
       2       8       33        2      active sync   /dev/sdc1
       3      22       65        3      active sync   /dev/hdd1
       4       8       49        4      active sync   /dev/sdd1

       5      56        1        -      spare   /dev/hdi1
[EMAIL PROTECTED] ~]# mdadm --grow /dev/md0 --raid-devices=6
mdadm: Need to backup 1280K of critical section..
mdadm: ... critical section passed.
[EMAIL PROTECTED] ~]# cat /proc/mdstat 
Personalities : [raid6] [raid5] [raid4] 
md0 : active raid5 hdi1[5] sdd1[4] sdc1[2] sdb1[1] sda1[0] hdd1[3]
      1953535744 blocks super 0.91 level 5, 64k chunk, algorithm 2 [6/6] [UUUUUU]
      [>....................]  reshape =  0.0% (29184/488383936) finish=2787.4min speed=2918K/sec

unused devices: <none>
[EMAIL PROTECTED] ~]# 

OK, so that would take nearly 2 days to complete, so I went to bed happy
about 10 hours ago.

I came to the machine this morning and found the following...

[EMAIL PROTECTED] ~]# cat /proc/mdstat 
Personalities : [raid6] [raid5] [raid4] 
md0 : active raid5 hdi1[5] sdd1[6](F) sdc1[2] sdb1[1] sda1[0] hdd1[3]
      1953535744 blocks super 0.91 level 5, 64k chunk, algorithm 2 [6/5] [UUUU_U]

unused devices: <none>
You have new mail in /var/spool/mail/root
[EMAIL PROTECTED] ~]# mdadm -D /dev/md0
/dev/md0:
Version : 00.91.03
  Creation Time : Wed Jan  9 18:57:53 2008
 Raid Level : raid5
 Array Size : 1953535744 (1863.04 GiB 2000.42 GB)
  Used Dev Size : 488383936 (465.76 GiB 500.11 GB)
   Raid Devices : 6
  Total Devices : 6
Preferred Minor : 0
Persistence : Superblock is persistent

Update Time : Wed Feb  6 05:28:09 2008
  State : clean, degraded
 Active Devices : 5
Working Devices : 5
 Failed Devices : 1
  Spare Devices : 0

 Layout : left-symmetric
 Chunk Size : 64K

  Delta Devices : 1, (5-6)

   UUID : 382c157a:405e0640:c30f9e9e:888a5e63
 Events : 0.470964

    Number   Major   Minor   RaidDevice State
       0       8        1        0      active sync   /dev/sda1
       1       8       17        1      active sync   /dev/sdb1
       2       8       33        2      active sync   /dev/sdc1
       3      22       65        3      active sync   /dev/hdd1
       4       0        0        4      removed
       5      56        1        5      active sync   /dev/hdi1

       6       8       49        -      faulty spare
[EMAIL PROTECTED] ~]# df -k
Filesystem   1K-blocks  Used Available Use% Mounted on
/dev/mapper/VolGroup00-LogVol00
  56086828  11219432  41972344  22% /
/dev/hda1   101086 18281 77586  20% /boot
/dev/md0 1922882096 1775670344  69070324  97% /Downloads
tmpfs                   513556         0    513556   0% /dev/shm
[EMAIL PROTECTED] ~]# mdadm /dev/md0 --remove /dev/sdd1
mdadm: cannot find /dev/sdd1: No such file or directory
[EMAIL PROTECTED] ~]#

As you can see, one of the original 5 devices (sdd1) has failed and been
automatically removed.  The reshape has stopped, but the new disk seems
to be in and clean, which is the bit I don't understand.  The new disk
hasn't been added to the array size, so it would seem that md has
switched it to being used as a spare instead (possibly because the grow
hadn't completed?).

How come it seems to have recovered so nicely?
Is there something I can do to check its integrity?
Was it just so much quicker than 2 days because it switched to only
having to sort out the 1 disk?  Would it be safe to run an fsck to check
the integrity of the fs?  I don't want to inadvertently blat the raid
array by 'using' it when it's in a dodgy state.
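
By way of illustration, this is the sort of read-only checking I have in
mind, if I've understood the md sysfs interface correctly (I haven't run
any of it yet, and I don't know whether md will even accept a check
while the array is degraded and mid-reshape):

echo check > /sys/block/md0/md/sync_action   # read and compare, no rewrites
cat /proc/mdstat                             # watch the check progress
cat /sys/block/md0/md/mismatch_cnt           # 0 afterwards means parity is consistent
e2fsck -n /dev/md0                           # read-only fs check, no repairs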

I have unmounted the drive for the time being, so that it doesn't get
any writes until I know what state it's in.

FW: Disk failure during grow, what is the current state.

2008-02-06 Thread Steve Fairbairn

I'm having a nightmare with emails today.  I can't get a single one
right first time.  Apologies to Alex for sending it directly to him and
not to the list on first attempt.

Steve

 -Original Message-
 From: Steve Fairbairn [mailto:[EMAIL PROTECTED] 
 Sent: 06 February 2008 15:02
 To: 'Nagilum'
 Subject: RE: Disk failure during grow, what is the current state.
 
 
 
 
  -Original Message-
  From: [EMAIL PROTECTED]
  [mailto:[EMAIL PROTECTED] On Behalf Of Nagilum
  Sent: 06 February 2008 14:34
  To: Steve Fairbairn
  Cc: linux-raid@vger.kernel.org
  Subject: Re: Disk failure during grow, what is the current state.
  
  
  
  If a drive fails during reshape, the reshape will just continue.
  The blocks which were on the failed drive are calculated from the
  other disks, and writes to the failed disk are simply omitted.
  The result is a raid5 with a failed drive.  You should get a
  new drive asap to restore the redundancy.  Also it's kinda
  important that you don't run 2.6.23, because it has a
  nasty bug which would be triggered in this scenario.
  The reshape probably increased in speed after the system was
  no longer actively used and io bandwidth freed up.
  Kind regards,
  Alex.
  
 
 Thanks for the response Alex, but
 
  Array Size : 1953535744 (1863.04 GiB 2000.42 GB)
   Used Dev Size : 488383936 (465.76 GiB 500.11 GB)
 
 Surely the added disk should now have been added to the Array
 Size?  5 * 500GB is 2500GB, not 2000GB.  This is why I don't
 think the reshape has continued.  As for speeding up because
 of no IO bandwidth, this also doesn't actually hold very true,
 because the system was at a point of not being used anyway
 before I added the disk, and I didn't unmount the drive until
 this morning, after it claimed it had finished doing anything.
 
 It's because the size doesn't match up to all 5 disks being 
 used that I still wonder at the state of the array.
 
 Steve.
 
 
 
 


RE: Disk failure during grow, what is the current state.

2008-02-06 Thread Steve Fairbairn
  -Original Message-
  From: Steve Fairbairn [mailto:[EMAIL PROTECTED]
  Sent: 06 February 2008 15:02
  To: 'Nagilum'
  Subject: RE: Disk failure during grow, what is the current state.
  
  
   Array Size : 1953535744 (1863.04 GiB 2000.42 GB)
Used Dev Size : 488383936 (465.76 GiB 500.11 GB)
  
  Surely the added disk should now have been added to the Array
  Size?  5 * 500GB is 2500GB, not 2000GB.  This is why I don't
  think the reshape has continued.  As for speeding up because
  of no IO bandwidth, this also doesn't actually hold very true,
  because the system was at a point of not being used anyway
  before I added the disk, and I didn't unmount the drive until
  this morning, after it claimed it had finished doing anything.
  

Thanks again to Alex for his comments.  I've just rebooted the box, the
reshape has continued on the degraded array, and an RMA has been raised
for the faulty disk.

Thanks,

Steve.



RE: mdadm error when trying to replace a failed drive in RAID5 array

2008-01-20 Thread Steve Fairbairn

Thanks for the response, Bill.  Neil has responded to me a few times,
but I'm more than happy to try and keep it on this list instead, as it
feels like I'm badgering Neil, which really isn't fair...

Since my initial email, I got to the point of believing it was down to
the superblock and that --zero-superblock wasn't working, so a good few
hours and a dd if=/dev/zero of=/dev/hdc later, I tried adding it again,
with the same result.

As it happens, I did the --zero-superblock, tried to insert the drive
again, and then examined it (mdadm -E) again, and the superblock was
'still there'.  What really happened was that the act of trying to add
the device writes a new superblock.  So --zero-superblock is working
fine for me, but md is still refusing to add the device.
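
For completeness, the sequence was roughly this (the replacement drive
was hdc at the time):

mdadm --zero-superblock /dev/hdc1
mdadm -E /dev/hdc1               # "No md superblock detected" - zeroing worked
mdadm /dev/md0 --add /dev/hdc1   # fails with "Invalid argument"...
mdadm -E /dev/hdc1               # ...but a superblock is back - the failed add wrote one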

The only other thing I've tried is moving the replacement drive to
/dev/hdd instead (secondary slave), with a small old HD I had lying
around as hdc.

[EMAIL PROTECTED] ~]# mdadm -E /dev/hdd1
mdadm: No md superblock detected on /dev/hdd1.

[EMAIL PROTECTED] ~]# mdadm /dev/md0 --add /dev/hdd1
mdadm: add new device failed for /dev/hdd1 as 5: Invalid argument

[EMAIL PROTECTED] ~]# dmesg | tail
...
md: hdd1 has invalid sb, not importing!
md: md_import_device returned -22

[EMAIL PROTECTED] ~]# mdadm -E /dev/hdd1
/dev/hdd1:
Magic : a92b4efc
Version : 00.90.00
UUID : 382c157a:405e0640:c30f9e9e:888a5e63
Creation Time : Wed Jan 9 18:57:53 2008
Raid Level : raid5
Used Dev Size : 488383936 (465.76 GiB 500.11 GB)
Array Size : 1953535744 (1863.04 GiB 2000.42 GB)
Raid Devices : 5
Total Devices : 4
Preferred Minor : 0
Update Time : Sun Jan 20 13:02:00 2008
State : clean
Active Devices : 4
Working Devices : 4
Failed Devices : 1
Spare Devices : 0
Checksum : 198f8fb4 - correct
Events : 0.348270
Layout : left-symmetric
Chunk Size : 64K
      Number   Major   Minor   RaidDevice State
this     5      22       65       -1      spare   /dev/hdd1
   0     0       8        1        0      active sync   /dev/sda1
   1     1       8       17        1      active sync   /dev/sdb1
   2     2       8       33        2      active sync   /dev/sdc1
   3     3       0        0        3      faulty removed
   4     4       8       49        4      active sync   /dev/sdd1

I have mentioned it to Neil, but didn't mention it here before: I am a
C developer by trade, so I can easily delve into the mdadm source for
extra debug if anyone thinks it could help.  I could also delve into md
in the kernel if really needed, but my knowledge of building kernels on
Linux is some 4+ years out of date and forgotten, so if that's a yes,
then some pointers on how to get the CentOS kernel config and a choice
of kernel from www.kernel.org, or from the CentOS distro, would be
invaluable.
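
(For my own notes, if it does come to building a kernel, I believe the
running config can be recovered along these lines on CentOS - untested
on this box, and the source tree path is just an example:)

ls /boot/config-$(uname -r)                      # CentOS ships the config of each packaged kernel
cp /boot/config-$(uname -r) ~/linux-2.6.24/.config
cd ~/linux-2.6.24 && make oldconfig              # carry the config forward to a newer tree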

I'm away for a few days from tomorrow and probably won't be able to do
much, if anything, until I'm back on Thursday, so please be patient if I
don't respond before then.

Many Thanks,

Steve.



RE: mdadm error when trying to replace a failed drive in RAID5 array

2008-01-20 Thread Steve Fairbairn
 -Original Message-
 From: Neil Brown [mailto:[EMAIL PROTECTED] 
 Sent: 20 January 2008 20:37
 
  md: hdd1 has invalid sb, not importing!
  md: md_import_device returned -22
 
 In 2.6.18, the only things that can return this message
 without other more explanatory messages are:
 
   2/ If the device appears to be too small.
 
 Maybe it is the latter, though that seems unlikely.
 

[EMAIL PROTECTED] ~]# mdadm /dev/md0 --verbose --add /dev/hdd1
mdadm: added /dev/hdd1

HUGE thanks to Neil, and one white gold plated donkey award to me.

OK.  When I created /dev/md1 after creating /dev/md0, I was using a
mishmash of disks I had lying around.  As this selection of disks were
of differing sizes, I chose to create the raid partitions from the
first block to a set size (+250G).  When I reinstalled the disk destined
for /dev/md0, I partitioned it the same way (+500G), which it turns out
isn't how I created the partitions when I built that array.

So the device I was trying to add was about 22 blocks too small.  Taking
Neil's suggestion and looking at /proc/partitions showed this up
incredibly quickly.
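
For anyone hitting the same thing, the check really is as simple as this
(device names from my setup; the size column is what matters):

grep -E 'sdd1|hdd1' /proc/partitions   # compare the #blocks column
blockdev --getsz /dev/sdd1             # or compare sizes in 512-byte sectors
blockdev --getsz /dev/hdd1
# the new partition must be at least as large as the existing members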

My sincere apologies for wasting all your time on a stupid error, and
again many many thanks for the solution...

md0 : active raid5 hdd1[5] sdd1[4] sdc1[2] sdb1[1] sda1[0]
      1953535744 blocks level 5, 64k chunk, algorithm 2 [5/4] [UUU_U]
      [>....................]  recovery =  0.9% (4430220/488383936) finish=1110.8min speed=7259K/sec

Steve.



mdadm error when trying to replace a failed drive in RAID5 array

2008-01-19 Thread Steve Fairbairn

Hi All,

Firstly, I must express my thanks to Neil Brown for being willing to
respond to the direct email I sent him, as I couldn't for the life of me
find any forums on mdadm or this list...

I have a Software RAID 5 device configured, but one of the drives
failed. I removed the drive with the following command...

mdadm /dev/md0 --remove /dev/hdc1

[EMAIL PROTECTED] ~]# cat /proc/mdstat 
Personalities : [raid6] [raid5] [raid4] 
md1 : active raid5 hdk1[5] hdi1[3] hdh1[2] hdg1[1] hde1[0]
      976590848 blocks level 5, 64k chunk, algorithm 2 [5/4] [UUUU_]
      [====>................]  recovery = 22.1% (54175872/244147712) finish=3615.3min speed=872K/sec

md0 : active raid5 sdd1[4] sdc1[2] sdb1[1] sda1[0]
      1953535744 blocks level 5, 64k chunk, algorithm 2 [5/4] [UUU_U]

unused devices: <none>

Please ignore /dev/md1 for now at least.  Now my array (/dev/md0) shows
the following...

[EMAIL PROTECTED] ~]# mdadm -QD /dev/md0
/dev/md0:
Version : 00.90.03
Creation Time : Wed Jan 9 18:57:53 2008
Raid Level : raid5
Array Size : 1953535744 (1863.04 GiB 2000.42 GB)
Used Dev Size : 488383936 (465.76 GiB 500.11 GB)
Raid Devices : 5
Total Devices : 4
Preferred Minor : 0
Persistence : Superblock is persistent

Update Time : Tue Jan 4 04:28:03 2005
State : clean, degraded
Active Devices : 4
Working Devices : 4
Failed Devices : 0
Spare Devices : 0

Layout : left-symmetric
Chunk Size : 64K

UUID : 382c157a:405e0640:c30f9e9e:888a5e63
Events : 0.337650

    Number   Major   Minor   RaidDevice State
       0       8        1        0      active sync   /dev/sda1
       1       8       17        1      active sync   /dev/sdb1
       2       8       33        2      active sync   /dev/sdc1
       3       0        0        3      removed
       4       8       49        4      active sync   /dev/sdd1

Now, when I try to insert the replacement drive back in, I get the
following...

[EMAIL PROTECTED] ~]# mdadm /dev/md0 --add /dev/hdc1
mdadm: add new device failed for /dev/hdc1 as 5: Invalid argument

It seems to be that mdadm is trying to add the device as number 5
instead of replacing number 3, but I have no idea why, or how to make it
replace number 3.

--- Neil has explained to me already that the drive should be added as
5, and then switched to 3 after a rebuild is complete.  Neil also asked
me if dmesg showed up anything when I tried adding the drive...

[EMAIL PROTECTED] mdadm-2.6.4]# dmesg | tail
...
md: hdc1 has invalid sb, not importing!
md: md_import_device returned -22
md: hdc1 has invalid sb, not importing!
md: md_import_device returned -22

I have updated mdadm to the latest version I can find...

[EMAIL PROTECTED] ~]# mdadm --version
mdadm - v2.6.4 - 19th October 2007

I still get the same error.  I'm hoping someone will have some
suggestion as to how to sort this out.  Backing up nearly 2TB of data
isn't really a viable option for me, so I'm quite desperate to get the
redundancy back.

My linux distribution is a relatively new installation from CentOS 5.1
ISOs.  The Kernel version is 

[EMAIL PROTECTED] ~]# uname -a
Linux space.homenet.com 2.6.18-53.1.4.el5 #1 SMP Fri Nov 30 00:45:55 EST
2007 x86_64 x86_64 x86_64 GNU/Linux

Many Thanks,

Steve.

